A Cloud Browser is a browser running and executing on a server. This document describes the concepts and architecture for the Cloud Browser using use cases and requirements.

This is an Interest Group Note from the Cloud Browser Task Force. The Cloud Browser Task Force is part of the Media and Entertainment Interest Group. It has no official standing of any kind and does not represent the support or consensus of any standards organisation or contributor.


The web is becoming feature-rich. Media capturing, web workers, and many other features ensure an engaging experience. However, more features mean higher hardware resource usage, and not all features work on a resource-constrained device. An application will use features depending on the device capabilities, and managing all those device differences is complex. This is especially true for devices used in the media and entertainment industry. Devices in this industry have long lifespans and are resource-constrained. A typical example is a set-top box, which has a lifespan of 8 years. Furthermore, updating those devices is discouraged because of the risk that a device will fail. Consequently, such a device is limited in web features, and its web experience is very different from that of a desktop computer or mobile phone.

Content providers would like to provide the same experience on all devices. There are numerous reasons: they would not have to discriminate between their users based on the device; branding and communications are only needed for one experience; implementation and design are easier without having to deal with a variety of experiences. However, providing the same experience on a variety of devices is difficult.

The industry is shifting to another approach. Leading content providers will no longer use a general purpose web browser; instead, they will use a constrained, customised web browser. Some will constrain it to a subset of the web specifications. Some will constrain it to their own proprietary language. Others will constrain it to a (native) JavaScript framework. This approach assures (nearly) the same experience on each device.

Having several web browsers on a single device is not desirable. The device will only be able to handle a few, because each web browser adds to the burden on the hardware resources. This limits the set of possible applications on a device. Furthermore, the end user will lose the coherent experience of a general purpose web browser. Moving between web browsers will slightly change the experience: another browser may use other page transitions or other key shortcuts. Likewise, each web browser needs to be configured independently. For example, enabling an accessibility feature needs to be done per web browser. Also, web developers will find it harder to develop for multiple browsers, each with its own constraints such as JavaScript only. The authoring complexity will be moved from device diversity to language diversity.

Cloud Browser addresses these issues by putting the browser on a powerful server, or cloud. The execution is shifted to the cloud, where the user interface is rendered and sent to the Client Device. The main purpose of the client device is presenting the user interface to the end user. This solution makes it possible to provide a uniform experience for a large range of devices. Furthermore, it reduces the need for resource intensive processing on the client device and helps to deploy new browser technologies faster.

Providing web technology over a non-internet infrastructure is difficult. The Cloud Browser is agnostic to infrastructures: it is terminated in the cloud and is able to use any infrastructure to reach the client. A typical example is a broadcast infrastructure where the client is not able to connect to a web server.

A great number of users are experiencing Cloud Browser technology today. The technology is based on the foundation of the World Wide Web. However, it differs from the traditional (client-server) model of the World Wide Web, and this causes incompatibilities. The Cloud Browser Task Force will explore these as a joint effort between a variety of stakeholders. The task force provides requirements, use cases and a formalised architecture of a Cloud Browser technology. This group note is the result and will be used for further standardisation of Cloud Browser technology.

Cloud Browser compared to a local browser

Most people are familiar with local browsers. A content server serves resources such as HTML, JavaScript, and media. The local browser requests the resources and makes the User Interface perceivable for the end-user. The local browser operates in the operating system. Starting and stopping the browser is the responsibility of the operating system. In addition, the operating system provides hardware functionality such as keyboard inputs. A Cloud Browser differs in various aspects:

Cloud Browser compared with a split browser

A split browser, or transformation proxy, transforms the content into the most appropriate format for the client device. The Cloud Browser is comparable but different.

Cloud Browser compared with a thin/fat paradigm

In a thin/fat paradigm, all processing is done on the server and the user interface is provided to a thin- or zero-client. Cloud Browser is very comparable but has three key differences:


Cloud Browser Client
Program which communicates with the Orchestration.
Client Device
Actual hardware which is running the Cloud Browser Client.
Cloud Browser
Web browser terminated in the Orchestration.
Orchestration
Server which abstracts functionality and manages sessions for the Cloud Browser.
Media Server
A generalised term for an infrastructure which provides various media streams such as VoD, linear broadcast and even a local PVR.
Graphics Library
A library responsible for rendering graphics to the end-user.
Out-of-band media
The Media Server delivers its media to the Cloud Browser Client directly, without going through the Orchestration.
Web Application
A software program running in a web browser.
Cloud Browser Server
The host for the Orchestration is called the Cloud Browser Server.
Transformation Proxy
Transformation proxies alter requests sent by web browsers to servers and responses returned by servers so that the appearance, structure or control flow of Web applications is modified.
User Interface
The User Interface represents the part with which the end-user can interact. Typically this is the graphical user interface on a Client Device display, but it could have any other form as well.


The following section provides the Cloud Browser Architecture. The goal is a consistent architecture between Cloud Browser vendors.

Client Device

A Client Device is an apparatus which presents, or delegates the presentation of, the User Interface. To be operable, a connection to a network is required. A Client Device can be a set-top box, a mobile phone, or even a smart watch, provided a Cloud Browser Client is available.

Cloud Browser Client

In a Cloud Browser solution, the execution is done in the cloud; the result is sent to the Cloud Browser Client. This Cloud Browser Client is only responsible for showing the user interface and providing essential information such as keystrokes. The latter depends on the infrastructure.

The Cloud Browser Client does not have any context. It connects to the Orchestration and shows the user interface. This will make sure that all the logic is in the cloud. This avoids otherwise required updates on the Client Device. A way to visualise the Cloud Browser Client is a remote display which sends essential information to the Cloud Browser Orchestration.

The transport and format used between the Cloud Browser Client and the Orchestration are not defined by this architecture; they are vendor specific.

Cloud Browser

The Cloud Browser denoted in the architecture is a web browser instance, terminated in the orchestration. It acts as a local web browser. Consequently, re-authoring for a Cloud Browser Solution is not necessary for existing applications.


The Cloud Browser exists in an Orchestration, which is responsible for session management and abstraction. Multiple Cloud Browser Clients could connect to the Orchestration. The Orchestration makes sure a client connects to the right Cloud Browser instance. Furthermore, it abstracts differences with a local browser. For example, a Client Device may have constraints on codec capabilities. The Orchestration could then either expose this constraint through a W3C standardised API, or virtually lift it by transcoding to a supported codec. Either way, the Cloud Browser is unaware of the decision. This ensures that the Client Device could run any Web Application, regardless of the codec it mandates. Details on abstraction are out of scope for this document and up to the implementor.
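
As a minimal sketch of the codec example above, the decision taken by the Orchestration could look like the following. The function name `planMediaDelivery`, its parameters, and the shape of the returned object are assumptions made for illustration only; the actual abstraction is left to the implementor.

```javascript
// Hypothetical helper: decide how the Orchestration handles a codec that
// a Web Application asks for. None of these names come from a specification.
function planMediaDelivery(requestedCodec, clientCodecs, transcodableTo) {
  // The Client Device decodes the codec itself: pass the media through.
  if (clientCodecs.includes(requestedCodec)) {
    return { mode: "native", codec: requestedCodec };
  }
  // Otherwise pick the first codec that the client supports and that the
  // server is able to transcode to.
  const target = clientCodecs.find((codec) => transcodableTo.includes(codec));
  if (target !== undefined) {
    return { mode: "transcode", codec: target };
  }
  // Neither option works: report the constraint (for example through a
  // standardised capability API) instead of hiding it.
  return { mode: "unsupported", codec: null };
}
```

In this sketch the Web Application only ever sees whether playback is possible; whether the media is passed through or transcoded stays inside the Orchestration.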

The Orchestration receives triggers from the Cloud Browser Client, such as keystrokes originating from the remote control. Triggers are not restricted to input; a trigger could be any information from the Client Device. The Orchestration interprets the triggers and may decide to delegate them to the Cloud Browser. Depending on the context, a red button press could mean a regular key press in the Cloud Browser or could open a new Web Application.
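
The context-dependent handling of the red button can be sketched as a small routing function. The trigger and session shapes used here (`type`, `key`, `applicationRunning`, `redButtonUrl`) are hypothetical and only serve to make the decision visible.

```javascript
// Hypothetical trigger router for the Orchestration. The trigger and
// session shapes are illustrative assumptions.
function routeTrigger(trigger, session) {
  if (trigger.type !== "key") {
    // Triggers are not restricted to input; in this sketch any other
    // information is simply recorded with the session.
    return { action: "record", data: trigger };
  }
  if (trigger.key === "red" && !session.applicationRunning) {
    // No Web Application is active: the red button opens one.
    return { action: "launch", url: session.redButtonUrl };
  }
  // Otherwise the key is a regular key press for the Cloud Browser.
  return { action: "forwardKey", key: trigger.key };
}
```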

How the User Interface is sent to the Cloud Browser Client is decided by the Orchestration. There are two main approaches: a Single Stream and a Double Stream approach. The architecture is agnostic to the stream format because this encourages innovation.

Single Stream

In the Single Stream approach, the Orchestration provides both the user interface and the media streams.

Architecture single stream

This approach executes the Web Application in the Cloud Browser and delivers the user interface to the Cloud Browser Client. A typical use case: the user starts the Client Device. The Cloud Browser Client triggers a signal. The signal is received and resolved in the Orchestration, which delegates further processing to the Cloud Browser. The Cloud Browser requests, downloads, and parses the resources. These resources include HTML, CSS, and JavaScript. JavaScript is processed and executed. User Interface painting commands are sent to the Graphics Library.

If the Web Application requests a media stream from a Media Server, the Graphics Library will combine the media stream with the User Interface. The Orchestration sends a single media stream to the Cloud Browser Client, encoded with a codec that the Client Device supports. The Cloud Browser Client decodes the stream and presents it on the Client Device display.

Double Stream

With a Double Stream approach, the Cloud Browser renders the user interface only, while the media is delivered from another source. The User Interface and media streams are delivered separately to the Client Device, which then has to combine both of these streams. The Cloud Browser Client presents them to the end-user in a unified form. This approach leverages existing video delivery infrastructures. A typical example of a delivery infrastructure is a broadcast network. Here the media is sent to all receivers at once, and rerouting the traffic is not desired.

Architecture Double Stream

This approach executes the Web Application and delivers the User Interface to the Cloud Browser Client, as in the Single Stream approach. A media stream is delivered out of band from another source. Blending is done on the Cloud Browser Client. The user interface is usually provided as a sequence of images which a Client Device could process within its own Graphics Library.
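
To illustrate the blending step on the Cloud Browser Client, the sketch below composites one User Interface pixel over one video pixel. It assumes non-premultiplied RGBA pixels with channels in the range 0 to 255 and an opaque video frame; a real client would normally leave this to its graphics hardware, and the function name `blendPixel` is hypothetical.

```javascript
// Blend one UI pixel over one video pixel ("source over" compositing).
// Pixels are [r, g, b, a] arrays with channels in 0..255; the video
// pixel is treated as opaque. This pixel layout is an assumption.
function blendPixel(ui, video) {
  const alpha = ui[3] / 255;
  const mix = (top, bottom) => Math.round(top * alpha + bottom * (1 - alpha));
  return [mix(ui[0], video[0]), mix(ui[1], video[1]), mix(ui[2], video[2]), 255];
}
```

A fully transparent User Interface pixel leaves the video untouched, which is what lets the video show through the regions that the User Interface does not cover.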

Use Cases

Cloud Browser Use Cases are split into two categories: communication and execution.

Communication Use Cases

Enabling a Cloud Browser solution will need communication between a Cloud Browser Client and the Orchestration. Outcomes from these use cases will lead to new standards.

Cloud Browser Client establishes a control channel with the Cloud Browser

The control channel is used to communicate the state of the Cloud Browser Client to the Cloud Browser and to receive requests from the Cloud Browser.

  1. Cloud Browser Client connects to the Orchestration.
  2. Cloud Browser Client authenticates itself to the Orchestration.
  3. If the authentication is successful, the Orchestration connects the Cloud Browser with the client through the control channel.
  • The control channel must be reliable, so that no state change or instruction gets lost.
  • Both the Cloud Browser and the Cloud Browser Client communicate through this channel.
  • The cloud environment needs a way to authenticate the client.
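
One common way to meet the reliability requirement is to number every message and keep it until the peer acknowledges it. The sketch below shows that idea only; the class and function names are hypothetical, and a deployment could equally rely on a transport that is already reliable.

```javascript
// Sketch of a reliable sender: every message gets a sequence number and
// stays pending until the peer acknowledges it. Names are illustrative.
class ReliableSender {
  constructor(transmit) {
    this.transmit = transmit; // function that puts one message on the wire
    this.nextSeq = 1;
    this.pending = new Map(); // seq -> message awaiting acknowledgement
  }
  send(payload) {
    const message = { seq: this.nextSeq++, payload };
    this.pending.set(message.seq, message);
    this.transmit(message);
    return message.seq;
  }
  acknowledge(seq) {
    this.pending.delete(seq);
  }
  // Called on a timer or after a reconnect: resend what was never acknowledged.
  retransmit() {
    for (const message of [...this.pending.values()]) this.transmit(message);
  }
}

// Receiver side: deliver each sequence number once, even if retransmitted.
function createReceiver(deliver) {
  const seen = new Set();
  return (message) => {
    if (!seen.has(message.seq)) {
      seen.add(message.seq);
      deliver(message.payload);
    }
    return message.seq; // value to acknowledge back to the sender
  };
}
```

The receiver acknowledges duplicates as well, so a lost acknowledgement does not cause an instruction to be applied twice.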

Cloud Browser sends request through control channel to the Cloud Browser Client

The application running within the Cloud Browser issues an API call whose execution depends partly or wholly on the Cloud Browser Client.

  1. Application issues API call.
  2. Cloud Browser begins executing the API call.
  3. If part or all of the call depends on client-side execution, the Cloud Browser sends a request to the Cloud Browser Client.
  4. Client processes the request and sends the result back to the Cloud Browser (if necessary).
  5. Cloud Browser goes on with the execution of the API call.

If the API call is synchronous then the request has to be synchronous too.

List of possible requests:

  • Set the volume.
  • Load a URL for playback (for Double Stream Approach with a local player).
  • Play, pause, stop, seek current playing media.
  • Control client local tuner.

  • There must be a control channel.
  • All communication must be validated, to make sure that no malicious data has been inserted.
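
The validation requirement can be sketched on the client side as an allow-list of request types with a parameter check for each. The request format `{ id, type, params }`, the type names, and the limits are all assumptions for illustration; tuner control is left out for brevity.

```javascript
// Hypothetical request format: { id, type, params }. The request types
// mirror the list above; the validation rules are assumptions.
const validators = {
  setVolume: (p) => Number.isInteger(p.level) && p.level >= 0 && p.level <= 100,
  loadUrl: (p) => typeof p.url === "string" && /^https?:\/\//.test(p.url),
  play: () => true,
  pause: () => true,
  stop: () => true,
  seek: (p) => Number.isFinite(p.seconds) && p.seconds >= 0,
};

// Client side: validate a request, run it, and build the reply. The id
// lets the Cloud Browser match the reply to its pending API call.
function handleRequest(request, player) {
  const validate = Object.prototype.hasOwnProperty.call(validators, request.type)
    ? validators[request.type]
    : null;
  const params = request.params || {};
  if (validate === null || !validate(params)) {
    return { id: request.id, ok: false, error: "invalid request" };
  }
  const result = player[request.type](params);
  return { id: request.id, ok: true, result };
}
```

Here `player` stands for whatever local component carries out the request; unknown request types and out-of-range parameters are rejected before they reach it.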

Cloud Browser Client creates a session with the Orchestration

For each session there will be one User Interface and one or more control channels. The Orchestration can also store resources associated with the session.

  1. The Cloud Browser Client connects to the Orchestration.
  2. The Cloud Browser Client authenticates.
  3. After a successful authentication, the Orchestration responds with a session id.

A possible implementation would be a kind of RPC. A possible representation for session ids would be a UUID (http://www.ietf.org/rfc/rfc4122.txt). It is recommended to use a secure communication protocol (HTTPS, TLS).


The session id must be unique.
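
The session creation steps and the uniqueness requirement can be sketched as a small in-memory registry. Everything named here (`createSessionRegistry`, `authenticate`, `generateId`, the session fields) is hypothetical. The default id source assumes a runtime that exposes `crypto.randomUUID()`, which produces a version 4 UUID; any other unique id source can be injected instead.

```javascript
// Sketch of session creation in the Orchestration. `authenticate` and
// `generateId` are injected; the default id source assumes a runtime that
// exposes crypto.randomUUID() (modern browsers, recent Node.js).
function createSessionRegistry({
  authenticate,
  generateId = () => globalThis.crypto.randomUUID(),
}) {
  const sessions = new Map(); // session id -> session state

  return {
    create(clientId, credentials) {
      if (!authenticate(clientId, credentials)) {
        throw new Error("authentication failed");
      }
      // Regenerate on the (unlikely) collision so that ids stay unique.
      let id = generateId();
      while (sessions.has(id)) id = generateId();
      sessions.set(id, { clientId, controlChannels: [], resources: new Map() });
      return id;
    },
    has: (id) => sessions.has(id),
    size: () => sessions.size,
  };
}
```

Injecting the id source keeps the uniqueness check testable and avoids tying the sketch to one UUID implementation.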

Cloud Browser Client closes the session

The Cloud Browser Client wants to close the session.

  1. The Cloud Browser Client requests the Orchestration to close the session with the specified session id.

There must be a way to close the session. It may be required to close the session from the orchestration.
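
A minimal sketch of closing a session follows. It assumes sessions are kept in a `Map` keyed by session id and that each session holds `controlChannels` (objects with a `close()` method) and a `resources` map; these shapes are illustrative and not defined by this document. The same function serves both the Cloud Browser Client and the Orchestration as initiator.

```javascript
// Sketch: close a session and release what was associated with it.
// Returns false when the id is unknown or the session was already closed,
// so that a repeated close request is harmless.
function closeSession(sessions, sessionId) {
  const session = sessions.get(sessionId);
  if (session === undefined) return false;
  for (const channel of session.controlChannels) channel.close();
  session.resources.clear();
  sessions.delete(sessionId);
  return true;
}
```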

Cloud Browser Client starts streaming the User Interface

The Cloud Browser Client begins the streaming of the User Interface associated with a session id.

  1. The Cloud Browser Client sends a red key press.
  2. The Orchestration uses this as a trigger to start a Web Application.
  3. The Web Application is executed in the Cloud Browser.
  4. The Orchestration sends the User Interface to the Cloud Browser Client.

The User Interface stream must have as low a latency as possible.

Execution Use Cases

Execution Use Cases describe the challenges that the Cloud Browser encounters. Some use cases will work on a local browser but are problematic on a Cloud Browser. These use cases will be used to extend or modify existing standards.

User interface is downscaled on the client device

As a Web Application, I would like to know how the User Interface is perceived by the end-user.

  1. The Cloud Browser renders the User Interface at a resolution of 1920x1080.
  2. The client is only capable of displaying 1280x720.
  3. The Cloud Browser downscales the User Interface to 1280x720.

Typical example: the user interface is specifically authored for a higher resolution. The screen properties provide the same resolution but the scale is changed.


A workable solution is to look at the devicePixelRatio on the window object (window.devicePixelRatio). However, this is not how the property is used today.


There should be means to see how the end user perceives the User Interface.
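
The scale factor involved in the steps above is simple arithmetic, sketched below. The function name `perceivedScale` is hypothetical, and how a Cloud Browser would expose the value to the Web Application (for example through `window.devicePixelRatio`) remains an open question in this document.

```javascript
// Scale at which the end user perceives the User Interface: displayed
// size divided by rendered size. Sizes are { width, height } in pixels.
function perceivedScale(rendered, displayed) {
  const horizontal = displayed.width / rendered.width;
  const vertical = displayed.height / rendered.height;
  // With a uniform downscale both factors match; take the smaller one
  // so that the reported scale never overstates the visible detail.
  return Math.min(horizontal, vertical);
}

const scale = perceivedScale({ width: 1920, height: 1080 }, { width: 1280, height: 720 });
console.log(scale.toFixed(3)); // prints "0.667"
```

A User Interface authored for 1920x1080 is therefore perceived at two thirds of its intended size, which matters for small text and fine detail.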

Application needs to know when the User Interface is visible

As an application, I would like to know when the User Interface is perceived by the end-user.

  1. The cloud browser renders the User Interface.
  2. The DOM loaded event is fired.
  3. The client receives the User Interface and renders it to the end user.

Since the UI is terminated in the cloud, it is hard to tell when the user actually sees it. On a local browser this happens almost instantaneously after a load event. However, in a cloud solution, the UI still needs to be sent to the Client Device. Depending on the network infrastructure (broadband, managed IP), this could take a significant amount of time. Therefore, the application needs an event, such as a rendered event, that reflects the time at which the user perceives the UI.


There should be means to see when the end user perceives the User Interface. A solution could be to have an additional event that notifies the application when the User Interface is rendered on the screen.
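
If such an additional event existed, an application could use it as sketched below. The event name `rendered` is an assumption taken from the proposal above; no browser fires an event with that name today. The helper accepts any object with `addEventListener`, such as `window`.

```javascript
// Measure the delay between the load event and a hypothetical "rendered"
// event. `target` is anything with addEventListener, such as window;
// `now` can be replaced by a different clock.
function measurePerceivedDelay(target, onMeasured, now = () => Date.now()) {
  let loadedAt = null;
  target.addEventListener("load", () => {
    loadedAt = now();
  });
  target.addEventListener("rendered", () => {
    // Ignore a "rendered" event that arrives before any load event.
    if (loadedAt !== null) onMeasured(now() - loadedAt);
  });
}
```

An application might call `measurePerceivedDelay(window, (ms) => console.log(ms))` and postpone time-critical actions, such as starting an animation, until the callback has run.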

Application needs to know about virtual media capabilities

As a Web Application, I would like to know which media capabilities are virtual and which are native.

  1. The Cloud Browser requests media capabilities.
  2. The Cloud Browser provides both virtual and native capabilities.
  3. As a content provider, you would like to serve the best possible content the users could perceive on their device.

The canPlayType method may be used to detect the supported types. However, in a Cloud Browser solution, you would also like to know which types are played natively on the client device and which are transcoded. Apart from avoiding the additional processing cost on the server, it makes more sense to play a media asset natively and leverage the device capabilities when possible.


There should be a way to identify virtual and native media capabilities. A solution could be to let canPlayType return an additional "virtually" value. Here is an example of how it could look:

                  // "virtually" is the return value proposed here; canPlayType does not return it today.
                  var support = videoElement.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
                  if (support === "probably") console.log("natively supported");
                  else if (support === "virtually") console.log("supported by transcoding");
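
Building on the proposed "virtually" value, a content provider could prefer sources that the device plays natively and fall back to transcoded ones. The sketch below assumes that proposal; the helper `pickSource` and the ranking are hypothetical, and `canPlayType` is passed in as a function so the helper does not depend on a particular media element.

```javascript
// Rank canPlayType answers; "virtually" is the value proposed above and
// is ranked below native playback. An empty string means unsupported.
const PREFERENCE = { probably: 3, maybe: 2, virtually: 1 };

// `canPlayType` is passed in so the helper works with any media element,
// for example (type) => videoElement.canPlayType(type).
function pickSource(sources, canPlayType) {
  let best = null;
  let bestRank = 0;
  for (const source of sources) {
    const rank = PREFERENCE[canPlayType(source.type)] || 0;
    if (rank > bestRank) {
      best = source;
      bestRank = rank;
    }
  }
  return best; // null when nothing can be played
}
```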

Application needs to know how to interpret identification

As a Web Application, I would like to know how to interpret identification.

  1. The Cloud Browser requests geolocation.
  2. The Cloud Browser provides the location of the Orchestration.

Sometimes it could be confusing which part is identified. For example, the geolocation API could be used to identify the location from which the user is interacting, and different content could be served based on this information. However, it should be clear that the location must be that of the user and not that of the Cloud Browser. The same confusion gives analytics services incorrect insight: these services usually use the IP address as the reference, which may not work with a Cloud Browser.


There should be a convention on how to interpret identification APIs. Where possible, a Cloud Browser solution could provide an X-Forwarded-For HTTP header. The client type information could be obtained from the user agent string together with the Orchestration information.
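
On the content server, the X-Forwarded-For header could be read as sketched below. By convention the header holds a comma-separated list of addresses with the original client first. The function name `endUserAddress` and the lower-case header map are assumptions; the header should only be trusted when it was set by a known Cloud Browser deployment, since any client can send it.

```javascript
// Origin-server side: take the left-most X-Forwarded-For entry as the
// end user's address, falling back to the connection's peer address.
// `headers` is assumed to be a plain object with lower-case keys.
function endUserAddress(headers, peerAddress) {
  const forwarded = headers["x-forwarded-for"];
  if (typeof forwarded !== "string") return peerAddress;
  const first = forwarded.split(",")[0].trim();
  return first === "" ? peerAddress : first;
}
```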

Application needs to have additional error handling

As a Web Application, I would like to know how to provide Cloud Browser specific errors.

  1. The client generates a decode error.
  2. The Cloud Browser provides the error to the application.

In a Cloud Browser, you may need additional error reporting. For example, when something goes wrong between the Cloud Browser and the Client Device, you need feedback on what went wrong.


There needs to be a way to provide Cloud Browser specific errors. A proper error message may be reported in the global window object using the error event.
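
On the application side, listening for such errors could look like the sketch below. The error name `CloudBrowserError` and its `source` field are hypothetical and not part of any specification; only the use of the `error` event on the global object follows the text above.

```javascript
// Application side: listen for error events and pick out the ones that
// report a Cloud Browser problem. The error name "CloudBrowserError" and
// its `source` field are hypothetical, not part of any specification.
function watchCloudBrowserErrors(target, report) {
  target.addEventListener("error", (event) => {
    const error = event.error;
    if (error && error.name === "CloudBrowserError") {
      report({ message: error.message, source: error.source || "unknown" });
    }
  });
}
```

In a browser, `target` would be `window`; ordinary script errors pass through untouched because their error name differs.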


Special thanks to Oliver Friedrich, Nilo Mitra, Kaz Ashimura and Steve Morris for their contributions to this document.