WebSocket
Overview
WebSocket is a full-duplex, bidirectional protocol used for client-server communication. A WebSocket address starts with ws:// or wss://. It is a stateful protocol: the connection between the client and the server is kept alive until either side terminates it. Once either the client or the server closes the connection, it is terminated at both ends. Today, WebSockets are a ubiquitous and important tool in Internet architecture.
What is a WebSocket?
Formally stated, WebSocket is a bidirectional, full-duplex protocol used chiefly in client-server communication. Because it is bidirectional, messages flow back and forth between the client and the server. A WebSocket connection exists until one of the participants (client or server) terminates it.
Once either the client or the server terminates the WebSocket connection, the other can no longer communicate, because the connection is automatically closed at its end as well. WebSockets rely on HTTP (Hypertext Transfer Protocol) to initialize a connection.
How Does a WebSocket Work?
The WebSocket protocol was developed because of the limitations of HTTP. WebSocket lets developers build real-time applications without resorting to long polling. In HTTP, the client first sends a request for a resource, and the server then returns the requested data as a response. Communication is therefore always initiated by the client: the server cannot send data on its own. To work around this limitation, developers used long polling, in which the server holds a client's request open for as long as possible and responds only when data becomes available or the timeout threshold is reached, whichever comes first. Long polling, however, consumes a lot of server resources.
This is where WebSocket enters the picture. It is a bidirectional, full-duplex communication protocol that runs over TCP (Transmission Control Protocol) and lets a server send a continuous stream of message-based data. WebSocket uses HTTP for the opening handshake, but it keeps the underlying TCP connection open after receiving the HTTP response and uses that connection to exchange messages between the client and the server.
Note that although HTTP and WebSocket both rely on TCP, they are distinct protocols that differ in design. WebSocket is designed to operate over the HTTP and HTTPS ports, 80 and 443, and to support HTTP proxies and intermediaries. This is what makes it compatible with HTTP. To achieve this compatibility, the WebSocket handshake uses the HTTP Upgrade header to switch the connection from HTTP to the WebSocket protocol.
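As a minimal sketch of one step in this handshake, assuming Node.js and its built-in crypto module: the client's Upgrade request carries a random Sec-WebSocket-Key header, and per RFC 6455 the server answers with a Sec-WebSocket-Accept header derived from that key. The function name computeAccept is illustrative and not part of any library.

```javascript
const { createHash } = require("node:crypto");

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derive the Sec-WebSocket-Accept value the server returns in its
// "101 Switching Protocols" response: base64(SHA-1(key + GUID)).
function computeAccept(secWebSocketKey) {
  return createHash("sha1")
    .update(secWebSocketKey + WEBSOCKET_GUID)
    .digest("base64");
}

// Sample key from RFC 6455, section 1.3.
console.log(computeAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation to confirm that the response really came from a WebSocket-aware server and not from a cache or an unrelated HTTP endpoint.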
Testing a WebSocket
Black-Box Testing
- Determine whether the application uses WebSocket:
  - Inspect the frontend source code for the ws:// or wss:// URI scheme.
  - Use Google Chrome’s Developer Tools to view the WebSocket network communication.
  - Use ZAP’s WebSocket tab.
- Origin: Try to connect to the remote WebSocket server using a WebSocket client (examples are mentioned in the Tools section below). If the connection is established, the server may not be checking the Origin header of the WebSocket handshake.
- Integrity and Confidentiality:
  - Verify that the WebSocket connection uses SSL/TLS (wss://) when transporting sensitive information.
  - Check the SSL/TLS implementation for security issues such as an invalid certificate, RC4, CRIME, and BEAST.
- Authentication: WebSocket does not handle authentication, so the normal black-box authentication tests must be carried out.
- Authorization: WebSocket does not handle authorization, so the normal black-box authorization tests must be carried out.
- Input Sanitization: Use ZAP’s WebSocket tab to replay and fuzz WebSocket requests and responses.
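The Origin check from the list above can be sketched as follows. This is only an illustration, assuming Node.js: buildHandshakeRequest and upgradeAccepted are hypothetical helper names, and target.example and attacker.example are placeholder hosts. Writing the request to a TCP or TLS socket opened to the server under test is left out.

```javascript
const { randomBytes } = require("node:crypto");

// Build a raw WebSocket opening handshake with a caller-chosen Origin.
function buildHandshakeRequest({ host, path = "/", origin }) {
  // The key is 16 random bytes, base64-encoded, as RFC 6455 requires.
  const key = randomBytes(16).toString("base64");
  return [
    `GET ${path} HTTP/1.1`,
    `Host: ${host}`,
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Key: ${key}`,
    "Sec-WebSocket-Version: 13",
    `Origin: ${origin}`, // deliberately not the application's own origin
    "",
    "",
  ].join("\r\n");
}

// A status of 101 means the server agreed to switch protocols.
function upgradeAccepted(responseText) {
  return /^HTTP\/1\.[01] 101\b/.test(responseText);
}

// Example: a request that claims to come from a foreign origin.
// const request = buildHandshakeRequest({
//   host: "target.example",
//   path: "/chat",
//   origin: "https://attacker.example",
// });
```

If the server answers such a request with status 101, it is probably not validating the Origin header, which is the inference the test step draws.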
Gray-Box Testing
Gray-box testing does not differ much from black-box testing. In gray-box testing, the pen-tester has partial knowledge of the application. The relevant difference is that, unlike in black-box testing, the tester may have API documentation for the application that describes the expected WebSocket requests and responses.
Tools
- A simple-to-use, integrated, free, and open-source penetration testing tool for discovering vulnerabilities in web applications.
- A WebSocket client for interacting with a WebSocket server.
- Google Chrome Simple WebSocket Client: build custom WebSocket requests and handle responses for directly testing WebSocket services.
WebSocket Protocols and Their Components
The WebSocket wire protocol (RFC 6455) consists of two high-level components:
Handshake
An opening HTTP handshake that is required at the start of the connection to negotiate its parameters.
Binary Message Framing Mechanism
This mechanism enables low-overhead, message-based delivery of both binary and text data. WebSocket applications communicate via a message-oriented API: the sender provides an arbitrary binary or UTF-8 payload, and the receiver is notified once the whole message is available. To achieve this, WebSocket uses a custom binary framing format, described below. The sender splits each application message into one or more frames and transmits them to the destination, where they are reassembled; the receiver is notified only after the entire message has been received.
- Frame
A frame is the smallest unit of communication. Each frame contains a frame header of variable length and a payload that may carry all or part of an application message.
- Message
A sequence of frames that corresponds to a complete logical application message.
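The relationship between a message and its frames can be sketched as below. This is a simplified model, not a wire-level implementation: fragmentMessage and reassembleMessage are hypothetical helpers, and frames are plain objects instead of encoded bytes. It follows the RFC 6455 rule that the first frame carries the message opcode, later frames carry opcode 0 (continuation), and only the last frame has FIN set.

```javascript
// Split a message payload (Uint8Array) into frames of at most
// maxFragmentSize bytes. Opcode 0x1 is text, 0x2 is binary.
function fragmentMessage(payload, maxFragmentSize, opcode = 0x1) {
  if (!(maxFragmentSize > 0)) {
    throw new RangeError("maxFragmentSize must be positive");
  }
  const frames = [];
  // "start === 0 ||" makes sure an empty message still yields one frame.
  for (let start = 0; start === 0 || start < payload.length; start += maxFragmentSize) {
    const end = Math.min(start + maxFragmentSize, payload.length);
    frames.push({
      fin: end >= payload.length,          // set only on the final fragment
      opcode: start === 0 ? opcode : 0x0,  // 0x0 marks a continuation frame
      payload: payload.subarray(start, end),
    });
  }
  return frames;
}

// Concatenate the frame payloads back into the original message.
function reassembleMessage(frames) {
  const total = frames.reduce((sum, frame) => sum + frame.payload.length, 0);
  const message = new Uint8Array(total);
  let offset = 0;
  for (const frame of frames) {
    message.set(frame.payload, offset);
    offset += frame.payload.length;
  }
  return message;
}
```

For example, fragmenting the five bytes of "Hello" with a maximum fragment size of 2 yields three frames, and reassembling them returns the original bytes.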
The client and server framing code decides whether to divide an application message into several frames. The application therefore stays unaware of the individual WebSocket frames and of how the framing is performed. The fields of a WebSocket frame are as follows:
- Beginning from the left-hand side, the first bit (FIN) of each frame indicates whether the frame is the final fragment of a message. A message may consist of just one frame.
- The 4-bit opcode indicates the type of frame being transferred, for example text (1), binary (2), connection close (8), ping (9), or pong (10).
- The mask bit indicates whether the payload is masked. Masking applies only to frames sent from the client to the server.
- The length of the payload is stored in a variable-length field as follows:
  - If 0–125, then that is the payload length.
  - If 126, then the 2 bytes that follow represent a 16-bit unsigned integer that is the payload length.
  - If 127, then the 8 bytes that follow represent a 64-bit unsigned integer that is the payload length.
- The masking key is a 32-bit field that stores the value used for masking the payload. It is present only when the mask bit is set.
- The payload consists of the application data. If the client and server negotiated an extension when the connection was opened, the payload also includes custom extension data.
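The field layout above can be read back with a few bit operations. The following is a rough sketch, not a complete implementation: parseFrame is a hypothetical helper, it assumes the whole frame is already held in a single Uint8Array, and it ignores the reserved bits and any extension data. The sample bytes are the masked "Hello" text frame given as an example in RFC 6455.

```javascript
// Decode one WebSocket frame held in a Uint8Array.
function parseFrame(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const fin = (bytes[0] & 0x80) !== 0;     // first bit: final fragment?
  const opcode = bytes[0] & 0x0f;          // low 4 bits: frame type
  const masked = (bytes[1] & 0x80) !== 0;  // mask bit
  let length = bytes[1] & 0x7f;            // 0-125, or the marker 126 / 127
  let offset = 2;
  if (length === 126) {
    length = view.getUint16(offset);       // 16-bit big-endian length
    offset += 2;
  } else if (length === 127) {
    // 64-bit big-endian length; Number() is exact only up to 2^53 - 1.
    length = Number(view.getBigUint64(offset));
    offset += 8;
  }
  let payload;
  if (masked) {
    const key = bytes.subarray(offset, offset + 4); // 32-bit masking key
    offset += 4;
    // Unmask: XOR each payload byte with the key byte at index i mod 4.
    payload = bytes
      .slice(offset, offset + length)
      .map((byte, i) => byte ^ key[i % 4]);
  } else {
    payload = bytes.slice(offset, offset + length);
  }
  return { fin, opcode, masked, length, payload };
}

// Masked single-frame text message from RFC 6455, section 5.7.
const sample = Uint8Array.from([
  0x81, 0x85, 0x37, 0xfa, 0x21, 0x3d, 0x7f, 0x9f, 0x4d, 0x51, 0x58,
]);
const frame = parseFrame(sample);
console.log(frame.fin, frame.opcode, frame.length); // prints true 1 5
console.log(new TextDecoder().decode(frame.payload)); // prints Hello
```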
Each WebSocket frame that the server sends incurs a framing overhead of 2–10 bytes. A client must also send a masking key with every frame, which adds 4 bytes and brings its overhead to 6–14 bytes.
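These figures follow directly from the header fields: 2 fixed bytes, plus 0, 2, or 8 bytes of extended length, plus 4 bytes of masking key for client frames. A small sketch of the arithmetic, where frameOverhead is a hypothetical helper name:

```javascript
// Header size in bytes for a frame carrying payloadLength bytes.
function frameOverhead(payloadLength, masked) {
  let header = 2;                 // FIN/opcode byte + mask/length byte
  if (payloadLength > 65535) {
    header += 8;                  // 64-bit extended length
  } else if (payloadLength > 125) {
    header += 2;                  // 16-bit extended length
  }
  if (masked) {
    header += 4;                  // masking key (client-to-server frames)
  }
  return header;
}

console.log(frameOverhead(5, false));      // prints 2
console.log(frameOverhead(1000, false));   // prints 4
console.log(frameOverhead(100000, false)); // prints 10
console.log(frameOverhead(100000, true));  // prints 14
```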
Uses of WebSocket
WebSockets are well suited to real-time operations. They establish continuous, full-duplex communication between a server and a client. Because data can travel immediately in both directions over a single open connection, unnecessary network traffic is reduced and applications can respond in real time. WebSockets also allow a server to keep track of connected clients and push data to them as required, which HTTP alone does not natively support.
WebSocket streams text strings and binary data as messages, each carried in one or more frames made up of a small header and a payload. Only a very small amount of non-payload data crosses the existing network connection this way, which reduces overhead and latency, especially in comparison with HTTP streaming and request-response models.
Following are some practical use cases of WebSocket:
- Online multiplayer gaming.
- Live updates of events such as sports, weather, etc.
- Real-time data visualization.
- Live online multimedia chat.
- Geo-location applications that deal with the live data flow.
- Audio / Video chat with WebRTC.
- Real-time multi-party collaboration applications such as Zoom, Slack, online whiteboard, etc.
- Real-time feeds and notifications.
WebSocket is compatible with HTML5 and is supported on all major platforms, including Linux, macOS, Windows, Android, iOS, and the web.
How to Create a WebSocket Connection?
To establish a WebSocket connection, we create a new WebSocket object, passing it a URL that uses the special ws or wss protocol.
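A minimal sketch of this step, using the standard WebSocket constructor available in browsers and recent Node.js releases. The endpoint wss://example.com/chat is a placeholder, and createSocket is a hypothetical wrapper whose second parameter exists only so that another implementation can be substituted, for instance in tests.

```javascript
// Placeholder endpoint: replace with a real WebSocket server URL.
const SOCKET_URL = "wss://example.com/chat";

// Open a connection. WebSocketImpl defaults to the built-in WebSocket.
function createSocket(url = SOCKET_URL, WebSocketImpl = globalThis.WebSocket) {
  return new WebSocketImpl(url);
}

// Usage:
// const socket = createSocket();
```

Using wss instead of ws runs the connection over TLS, in the same way that https relates to http.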
Once the WebSocket has been created, we can listen for events on it. There are four events:
- open
Connection established.
- message
Data received.
- error
WebSocket error.
- close
Connection closed.
A message is sent by calling the send method of the WebSocket object and passing it the data to transmit, for example socket.send(data).
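Sending only works once the connection is open, so a sender usually has to account for the socket's state. The sketch below shows one way to do that; sendWhenOpen is a hypothetical helper, while readyState, send, and addEventListener are part of the standard WebSocket API.

```javascript
// readyState values defined by the WebSocket API.
const CONNECTING = 0;
const OPEN = 1;

// Send data now if the socket is open, or as soon as it opens.
function sendWhenOpen(socket, data) {
  if (socket.readyState === OPEN) {
    socket.send(data);
  } else if (socket.readyState === CONNECTING) {
    // Calling send() while still connecting would throw, so wait for "open".
    socket.addEventListener("open", () => socket.send(data), { once: true });
  } else {
    throw new Error("WebSocket is closing or closed");
  }
}

// Usage:
// sendWhenOpen(socket, "My name is Wick");
```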
In a complete example, the client opens a connection, sends its name, and logs every event. When it is run against a server that replies “Hello from the server, Wick”, waits for 5 seconds, and then terminates the connection, the client prints that greeting and the following sequence of events is observed: open → message → close.
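A complete client along the lines described above might look like the following sketch. The endpoint wss://example.com/chat, the text the client sends, and the function name runGreetingClient are assumptions made for illustration; the greeting in the reply and the 5-second delay depend entirely on the server. The optional WebSocketImpl and log parameters exist only to make the function easy to exercise without a network.

```javascript
// Connect, greet the server, and log each of the four events.
function runGreetingClient(url, WebSocketImpl = globalThis.WebSocket, log = console.log) {
  const socket = new WebSocketImpl(url);

  socket.addEventListener("open", () => {
    log("[open] Connection established");
    socket.send("My name is Wick");
  });

  socket.addEventListener("message", (event) => {
    log(`[message] Data received: ${event.data}`);
  });

  socket.addEventListener("error", () => {
    log("[error] WebSocket error");
  });

  socket.addEventListener("close", (event) => {
    // code is the close status code; wasClean tells whether the
    // closing handshake completed normally.
    log(`[close] Connection closed, code=${event.code}, clean=${event.wasClean}`);
  });

  return socket;
}

// Usage (hypothetical endpoint):
// runGreetingClient("wss://example.com/chat");
```

The error event does not appear in the sequence above because it fires only when something goes wrong, such as a failed connection attempt.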
How to Scale WebSocket?
For every user to be able to see every other user, all of them must be connected to the same WebSocket server. The number of active users a deployment can support is therefore directly tied to the hardware capacity of that server. Modern JavaScript runtimes such as Node.js and Deno handle concurrency well, but once the user count passes a certain threshold, keeping all users in sync requires vertical scaling: adding resources such as memory, CPU cores, and storage to the server as required.
Once the limits of a single machine are reached, horizontal scaling (adding machines or nodes to the infrastructure to keep up with demand) is required. The major difficulty is that connections to a WebSocket server must be persistent. Even after the server nodes have been scaled both vertically and horizontally, the nodes need a way to share data with each other. To ensure that every node has the same view of the state, all state must be stored out-of-process.
Apart from sharing state, which requires additional technology, broadcasting to all subscribed clients becomes troublesome, because any given WebSocket server node is aware only of the clients connected to it. One way around the server capacity challenge is to shift the traffic to the cloud. Organizations can build, extend, and deliver real-time capabilities using serverless WebSockets, gaining elastic WebSocket capacity that scales on demand without having to scale or manage WebSocket servers themselves.
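The broadcasting problem can be illustrated with a toy model. In the sketch below, MessageBus is an in-memory stand-in for whatever out-of-process publish/subscribe service a real deployment would use, and ServerNode stands for one WebSocket server process; both class names are hypothetical. Each node publishes broadcasts to the bus instead of writing to its own clients directly, so clients connected to other nodes receive them too.

```javascript
// Stand-in for an external pub/sub service shared by all nodes.
class MessageBus {
  constructor() {
    this.subscribers = new Set();
  }
  subscribe(handler) {
    this.subscribers.add(handler);
  }
  publish(message) {
    for (const handler of this.subscribers) handler(message);
  }
}

// One WebSocket server node; it only knows its own clients.
class ServerNode {
  constructor(name, bus) {
    this.name = name;
    this.bus = bus;
    this.clients = new Set();
    // Every node listens on the bus and relays to its local clients.
    bus.subscribe((message) => this.deliverLocally(message));
  }
  connect(client) {
    this.clients.add(client);
  }
  broadcast(message) {
    // Publishing to the bus reaches clients on every node, including this one.
    this.bus.publish(message);
  }
  deliverLocally(message) {
    for (const client of this.clients) client.send(message);
  }
}
```

Without the bus, a broadcast issued on one node would reach only that node's own clients, which is exactly the limitation described above.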
Parameter | HTTP | WebSocket |
---|---|---|
Duplex | Half | Full |
Message pattern | Request-Response | Bi-directional |
Service Push | Not natively supported. Client polling or streaming download required | Core feature |
Overhead | Moderate overhead per request/connection | Moderate overhead to establish and maintain the connection, then minimal overhead per message |
Intermediary/edge caching | Core feature | Not possible |
Supported clients | Broad support | Modern languages and clients |
Conclusion
- WebSocket is a full-duplex, bidirectional protocol used for client-server communication.
- A WebSocket connection exists until either participant (client or server) terminates it. Once either the client or the server terminates the WebSocket connection, the other can no longer communicate, because the connection is automatically closed at its end as well.
- The WebSocket protocol was developed because of the limitations of HTTP. WebSocket lets developers build real-time applications without resorting to long polling.
- WebSocket is designed to operate over the HTTP and HTTPS ports, 80 and 443, and to support HTTP proxies and intermediaries.
- The WebSocket wire protocol (RFC 6455) consists of two high-level components: the handshake and a binary message framing mechanism.