A Cloud Browser is a browser running and executing on a server. This document describes the concepts and architecture for the Cloud Browser using use cases and requirements.
This is an Interest Group Note from the Cloud Browser Task Force. The Cloud Browser Task Force is part of the Media and Entertainment Interest Group. It has no official standing of any kind and does not represent the support or consensus of any standards organisation or contributor.
The web is becoming feature-rich. Media capturing, web workers, and many other features ensure an engaging experience. However, a growing feature set increases hardware resource usage, and not all features work on a resource-constrained device. An application will use features depending on the device capabilities, and managing all those device differences is complex. This is especially true for devices used in the media and entertainment industry. Devices in this industry have a long lifespan and are resource-constrained. A typical example is a set-top-box, which has a lifespan of 8 years. Furthermore, updating those devices is discouraged because of the risk that a device will fail. Consequently, such a device is limited in web features, and its web experience is very different from that of a desktop computer or mobile phone.
Content providers would like to provide the same experience to all devices. There are numerous reasons, such as: they would not have to discriminate between their users based on the device; branding and communications are only needed for one experience; implementation and design are easier without having to deal with a variety of experiences. However, providing the same experience to a variety of devices is difficult.
The industry is shifting to another approach. Leading content providers will no longer use a general purpose web browser; instead, they will use a constrained, customised web browser. Some will constrain it to a subset of the web specifications. Some will constrain it to their own proprietary language. Others will constrain it to a (native) JavaScript framework. This approach assures (nearly) the same experience on each device.
Having several web browsers on a single device is not desirable. The device will only be able to handle a few, because each web browser adds to the burden on the hardware resources. This limits the set of possible applications on a device. Furthermore, the end user will lose the coherent experience of a general purpose web browser. Moving between different web browsers will slightly change the experience: a browser may use other page transitions or other key shortcuts. Likewise, each web browser needs to be configured independently. For example, enabling an accessibility feature needs to be done per web browser. Also, web developers will find it harder to develop for multiple browsers that each have their own constraints, such as JavaScript only. The authoring complexity will be moved from device diversity to language diversity.
The Cloud Browser addresses these issues by putting the browser on a powerful server, or cloud. The execution is shifted to the cloud, where the user interface is rendered and sent to the Client Device. The main purpose of the Client Device is presenting the user interface to the end user. This solution makes it possible to provide a uniform experience for a large range of devices. Furthermore, it reduces the need for resource-intensive processing on the Client Device and helps to deploy new browser technologies faster.
Providing the web technology to a non-internet infrastructure is difficult. The Cloud Browser is agnostic to infrastructures. The Cloud Browser is terminated in the cloud and is able to use any infrastructure to the client. A typical example is a broadcast infrastructure where the client is not able to connect to a web server.
A great number of users are experiencing the Cloud Browser Technology today. The technology is based on the foundation of the World Wide Web. However, there are differences with the traditional (client-server) model of the World Wide Web, and these cause incompatibilities. The Cloud Browser Task Force will explore these differences as a joint effort between a variety of stakeholders. The task force provides requirements, use cases and a formalised architecture of a Cloud Browser Technology. This group note is the result and will be used for further standardisation of the Cloud Browser Technology.
Most people are familiar with local browsers. A content server serves resources such as HTML, JavaScript, and media. The local browser requests the resources and makes the User Interface perceivable for the end user. The local browser runs within the operating system, which is responsible for starting and stopping the browser. In addition, the operating system provides hardware functionality such as keyboard input. A Cloud Browser differs in various aspects:
A split browser, or transformation proxy, transforms the content into the most appropriate format for the client device. The Cloud Browser is comparable but different.
In a thin/fat client paradigm, all processing is done on the server and the user interface is provided to a thin or zero client. The Cloud Browser is comparable but has three key differences:
The following section provides the Cloud Browser Architecture. The goal is a consistent architecture between Cloud Browser vendors.
A Client Device is an apparatus which presents, or delegates the presentation of, the User Interface. To be operable, a connection to a network is required. A Client Device can be a set-top-box, a mobile phone, or even a smart watch, provided that a Cloud Browser Client is available.
In a Cloud Browser solution, the execution is done in the cloud; the result is sent to the Cloud Browser Client. This Cloud Browser Client is only responsible for showing the user interface and providing essential information such as keystrokes. The latter depends on the infrastructure.
The Cloud Browser Client does not have any context. It connects to the Orchestration and shows the user interface. This will make sure that all the logic is in the cloud. This avoids otherwise required updates on the Client Device. A way to visualise the Cloud Browser Client is a remote display which sends essential information to the Cloud Browser Orchestration.
The transport and format between the Cloud Browser Client and the Orchestration are not defined by this architecture; they are vendor specific.
The Cloud Browser denoted in the architecture is a web browser instance, terminated in the orchestration. It acts as a local web browser. Consequently, re-authoring for a Cloud Browser Solution is not necessary for existing applications.
The Cloud Browser exists in an orchestration, which is responsible for session management and abstraction. Multiple Cloud Browser Clients could connect to the orchestration. The orchestration makes sure a client connects to the right Cloud Browser instance. Furthermore, it abstracts differences with a local browser. For example, a Client Device may have constraints on codec capabilities. The orchestration could then either expose this constraint through a W3C standardized API, or virtually lift the constraint by transcoding to a supported codec. Either way, the Cloud Browser is unaware of the decision. This typical example ensures that the Client Device could run any Web Application, regardless of the codec it mandates. Details on abstraction are out of scope for this document and up to the implementor.
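The codec example above can be sketched as follows. This is only an illustration under stated assumptions, not part of the architecture: the function name resolveCodec, the list of codecs reported by the Client Device, and the transcodeTargets map are all hypothetical.

```javascript
// Hypothetical sketch: how an Orchestration might choose between native
// playback and transcoding. None of these names come from a specification.
function resolveCodec(requestedCodec, clientCodecs, transcodeTargets) {
  // The Client Device can decode the requested codec itself.
  if (clientCodecs.includes(requestedCodec)) {
    return { mode: 'native', codec: requestedCodec };
  }
  // Otherwise look for a codec that the Orchestration can transcode to
  // and that the Client Device supports.
  const candidates = transcodeTargets[requestedCodec] || [];
  const target = candidates.find((codec) => clientCodecs.includes(codec));
  if (target !== undefined) {
    return { mode: 'transcoded', codec: target };
  }
  // Neither native playback nor transcoding is possible.
  return { mode: 'unsupported', codec: null };
}
```

Whatever the outcome, the decision stays inside the Orchestration, so the Cloud Browser itself does not need to know which path was taken.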
The orchestration receives triggers from the Cloud Browser Client, such as keystrokes originating from the remote control. Triggers are not restricted to input; a trigger could be any information from the Client Device. The Orchestration interprets the triggers and may decide to delegate them to the Cloud Browser. Depending on the context, a red button could mean a regular key press in the Cloud Browser or could open a new Web Application.
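The red button example can be illustrated with a small sketch. The trigger shape ({ type, key }), the context fields, and the action names are assumptions made for this example; the architecture does not define them.

```javascript
// Hypothetical sketch of trigger handling in the Orchestration.
function interpretTrigger(trigger, context) {
  // Triggers are not restricted to input; in this sketch anything that is
  // not a key press is handled by the Orchestration itself.
  if (trigger.type !== 'key') {
    return { action: 'handle-in-orchestration', trigger };
  }
  // When no application is running, the red button opens a Web Application.
  if (trigger.key === 'red' && !context.applicationRunning) {
    return { action: 'open-application', url: context.redButtonUrl };
  }
  // In every other case the key press is delegated to the Cloud Browser.
  return { action: 'delegate-to-browser', key: trigger.key };
}
```

The same key therefore produces a different action depending on the context that the Orchestration holds for the session.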
How the User Interface is sent to the Cloud Browser Client is decided by the Orchestration. There are two main approaches: a Single Stream and a Double Stream approach. The architecture is agnostic to the stream format because this encourages innovation.
In the Single Stream approach, the orchestration provides both the user interface and the media streams.
This approach executes the Web Application in the Cloud Browser and delivers the user interface to the Cloud Browser Client. A typical use case: the user starts the Client Device. The Cloud Browser Client triggers a signal. The signal is received and resolved in the orchestration, which delegates further processing to the Cloud Browser. The Cloud Browser requests, downloads, and parses the resources. These resources include HTML, CSS, and JavaScript. JavaScript is processed and executed. User Interface painting commands are sent to the Graphic Library.
In case the Web Application requests a media stream from a Media Server, the Graphic Library will combine the media stream with the User Interface. The Orchestration sends a single media stream to the Cloud Browser Client, encoded with a codec supported by the Client Device. The Cloud Browser Client decodes the stream and presents it on the Client Device display.
With a Double Stream approach, the Cloud Browser renders the user interface only, while the media is delivered from another source. The User Interface and media streams are delivered separately to the Client Device, which then has to combine both of these streams. The Cloud Browser Client presents them to the end user in a unified form. This approach leverages existing video delivery infrastructures. A typical example of a delivery infrastructure is a broadcast network. Here the media is sent to all receivers at once, and rerouting the traffic is not desired.
This approach executes the Web Application and delivers the User Interface to the Cloud Browser Client in the same way as the Single Stream approach. A media stream is delivered out of band from another source. Blending is done on the Cloud Browser Client. The user interface is usually provided as a sequence of images which a Client Device could process within its own Graphic Library.
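The blending step on the Cloud Browser Client can be sketched as a per-pixel composition of a User Interface image over a decoded video frame. This is a minimal illustration that assumes both inputs are RGBA byte buffers of the same dimensions; the function name blendOverlay is hypothetical, and a real Client Device would normally do this in its Graphic Library or in hardware.

```javascript
// Hypothetical sketch of client-side blending for the Double Stream approach.
// Both buffers hold RGBA pixels (4 bytes per pixel) of the same dimensions.
function blendOverlay(videoFrame, uiImage) {
  if (videoFrame.length !== uiImage.length) {
    throw new RangeError('frame and overlay must have the same size');
  }
  const out = new Uint8ClampedArray(videoFrame.length);
  for (let i = 0; i < videoFrame.length; i += 4) {
    const alpha = uiImage[i + 3] / 255; // overlay opacity in the range 0..1
    for (let c = 0; c < 3; c++) {
      // "Source over" compositing onto an opaque video background.
      out[i + c] = uiImage[i + c] * alpha + videoFrame[i + c] * (1 - alpha);
    }
    out[i + 3] = 255; // the blended frame is fully opaque
  }
  return out;
}
```

Transparent regions of the User Interface image let the out-of-band media show through, while opaque regions cover it.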
Cloud Browser Use Cases are split into two categories: communication and execution.
Enabling a Cloud Browser solution will need communication between a Cloud Browser Client and the Orchestration. Outcomes from these use cases will lead to new standards.
The control channel is used to communicate the state of the Cloud Browser Client to the Cloud Browser and to receive requests from the Cloud Browser.
The application running within the Cloud Browser issues an API call that depends on partial or whole execution in the Cloud Browser Client.
If the API call is synchronous then the request has to be synchronous too.
List of possible requests:
For each session there will be one User Interface and one or more control channels. The Orchestration can also store resources associated with the session.
A possible implementation would be a kind of RPC. A possible representation for session ids would be UUID (http://www.ietf.org/rfc/rfc4122.txt). It is recommended to use a secure communication protocol (HTTPS, TLS).
The session id must be unique.
The Cloud Browser Client wants to close the session.
There must be a way to close the session. It may be required to close the session from the orchestration.
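The session requirements above (a unique session id and a way to close the session) can be sketched as a small registry kept by the Orchestration. The class name SessionRegistry, its methods, and the stored fields are assumptions made for this example. The default id generator relies on crypto.randomUUID() from the Web Crypto API, which produces version 4 UUIDs; a different generator can be injected.

```javascript
// Hypothetical sketch of session bookkeeping in the Orchestration.
// Matches the textual form of an RFC 4122 UUID (versions 1 to 5).
const UUID_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

class SessionRegistry {
  // generateId is only invoked when a session is opened.
  constructor(generateId = () => globalThis.crypto.randomUUID()) {
    this.generateId = generateId;
    this.sessions = new Map();
  }

  open() {
    const id = this.generateId();
    if (!UUID_PATTERN.test(id)) {
      throw new TypeError('session id is not an RFC 4122 UUID');
    }
    if (this.sessions.has(id)) {
      throw new Error('session id must be unique');
    }
    // One User Interface and one or more control channels per session,
    // plus any resources stored for the session.
    this.sessions.set(id, { controlChannels: [], resources: new Map() });
    return id;
  }

  close(id) {
    // Returns true when a session was actually closed.
    return this.sessions.delete(id);
  }
}
```

Because close() takes only the session id, it could be called on behalf of either the Cloud Browser Client or the Orchestration.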
The Cloud Browser Client begins the streaming of the User Interface associated with a session id.
The User Interface stream must be as low latency as possible.
Execution Use Cases describe the challenges that the Cloud Browser encounters. Some use cases will work on a local browser but are problematic on a Cloud Browser. These use cases will be used to extend or modify existing standards.
As a Web Application, I would like to know how the User Interface is perceived by the end-user.
Typical example: the user interface is specifically authored for a higher resolution. The screen properties provide the same resolution but the scale is changed.
A workable solution is to look at the devicePixelRatio on the window object (window.devicePixelRatio). However, this is not how the property is used today.
There should be means to see how the end user perceives the User Interface.
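As an illustration of the devicePixelRatio idea, the sketch below derives the resolution the end user perceives from the reported screen size and the pixel ratio. It assumes that a Cloud Browser would report the scale of the Client Device through window.devicePixelRatio, which, as noted above, is not how the property is used today. The function name perceivedResolution is hypothetical.

```javascript
// Hypothetical sketch: estimating the resolution the end user perceives.
// In a browser, pass window.screen and window.devicePixelRatio.
function perceivedResolution(screen, devicePixelRatio) {
  // devicePixelRatio is the ratio of device pixels to CSS pixels.
  return {
    width: Math.round(screen.width * devicePixelRatio),
    height: Math.round(screen.height * devicePixelRatio),
  };
}
```

A Web Application could compare this result with the resolution it was authored for and select matching assets.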
As a Web Application, I would like to know when the User Interface is perceived by the end user.
Since the UI is terminated in the cloud, it is hard to tell when the user actually sees it. On a local browser this happens almost instantaneously after the load event. However, in a cloud solution, the UI needs to be sent to the Client Device. Depending on the network infrastructure (broadband, managed IP), this could take a significant amount of time. Therefore, an event is needed that reflects the time at which the user perceives the UI.
There should be means to see when the end user perceives the User Interface. A solution could be to have an additional event to notify when the User Interface is rendered on the screen.
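A sketch of how a Web Application might use such an event is given below. The event name "rendered" is an assumption taken from the proposal above; no current specification defines it. The helper name measurePerceivedDelay and the injectable clock are also hypothetical.

```javascript
// Hypothetical sketch: measuring the delay between the load event and a
// proposed "rendered" event that would fire when the end user sees the UI.
function measurePerceivedDelay(target, onMeasured, now = () => Date.now()) {
  let loadedAt = null;
  target.addEventListener('load', () => {
    loadedAt = now();
  }, { once: true });
  target.addEventListener('rendered', () => {
    const renderedAt = now();
    // If load was never observed, report a delay of zero.
    onMeasured(loadedAt === null ? 0 : renderedAt - loadedAt);
  }, { once: true });
}
```

In a browser the target would be the window object; the reported delay approximates the time needed to deliver the User Interface to the Client Device.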
As a Web Application, I would like to know which media capabilities are virtual and which are native.
The canPlayType method may be used to detect the supported types. However, in a Cloud Browser solution, you would also like to know which types are played natively on the client device and which are transcoded. Transcoding adds processing cost on the server, so it makes more sense to play a media asset natively and leverage the device capabilities when possible.
There should be a way to identify virtual and native media capabilities. A solution could be to add a "virtually" return value to canPlayType. Here is an example of how it could look:
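// "virtually" is the proposed additional return value; it is not part of the current canPlayType definition.
var videoElement = document.createElement('video');
var support = videoElement.canPlayType('video/mp4; codecs="avc1.42E01E, mp4a.40.2"');
if (support === "probably") console.log("natively supported");
else if (support === "virtually") console.log("supported by transcoding");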
As a Web Application, I would like to know how to interpret identification.
Sometimes it could be unclear which party is identified. For example, the geolocation API could be used to identify the location from which the user is interacting. Based on this information, different content could be served. However, it should be clear that the reported location is that of the user and not that of the Cloud Browser. Unclear identification also gives analytics services incorrect insight. Usually, these services use the IP address as the reference, which may not work with a Cloud Browser.
There should be a convention on how to interpret identification APIs. Where possible, a Cloud Browser solution could provide an X-Forwarded-For HTTP header. The client type information could be obtained from the user agent string together with the Orchestration information.
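On the content server side, the X-Forwarded-For suggestion could be consumed as sketched below. The function name clientAddress is hypothetical, and the sketch assumes that header names have already been lower-cased, as Node.js does for incoming requests.

```javascript
// Hypothetical sketch: resolving the end user's address on a content server
// when the request originates from a Cloud Browser.
function clientAddress(headers, connectionAddress) {
  const forwarded = headers['x-forwarded-for'];
  if (typeof forwarded !== 'string' || forwarded.trim() === '') {
    // No header: the only known address is that of the Cloud Browser.
    return connectionAddress;
  }
  // The header is a comma-separated list; by convention the left-most
  // entry is the original client.
  return forwarded.split(',')[0].trim();
}
```

Since any sender can set this header, a content server should only rely on it when the request is known to come from a trusted Cloud Browser deployment.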
As a Web Application, I would like to know how to provide Cloud Browser specific errors.
In a Cloud Browser, you may need additional error reporting. For example, when something goes wrong between the Cloud Browser and the Client Device, you need feedback on what went wrong.
There needs to be a way to provide Cloud Browser specific errors. A proper error message may be reported in the global window object using the error event.
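One way this could look is sketched below. The event class CloudBrowserErrorEvent, its component field, and the helper reportCloudBrowserError are assumptions made for this illustration; no specification defines them.

```javascript
// Hypothetical sketch: reporting a Cloud Browser specific error through
// the error event of an event target such as the window object.
class CloudBrowserErrorEvent extends Event {
  constructor(message, component) {
    super('error');
    this.message = message;     // human-readable description
    this.component = component; // e.g. 'client-connection' or 'orchestration'
  }
}

function reportCloudBrowserError(target, message, component) {
  target.dispatchEvent(new CloudBrowserErrorEvent(message, component));
}
```

A Web Application would register an error listener as usual and inspect the component field to tell Cloud Browser specific failures apart from ordinary script errors.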
Special thanks to Oliver Friedrich, Nilo Mitra, Kaz Ashimura and Steve Morris for their contributions to this document.