Using Server Sent Events to Simplify Real-time Streaming at Scale

We walk through how we implemented an SSE server that's scalable and load-balanced to simplify and improve a real-time data visualization application.

Overview

The article discusses the implementation of Server Sent Events (SSE) to enhance real-time data streaming for Shopify's BFCM Live Map. It highlights the advantages of SSE over traditional polling and WebSocket methods, detailing the architecture and performance improvements achieved through this transition.

What You'll Learn

1. How to implement Server Sent Events in a Golang application
2. Why Server Sent Events are preferable for unidirectional data streaming
3. How to ensure your SSE server can handle high load during peak traffic

Prerequisites & Requirements

  • Understanding of real-time data streaming concepts
  • Familiarity with Golang and Kafka

Key Questions Answered

What are the benefits of using Server Sent Events over WebSocket?
Server Sent Events (SSE) provide a secure, unidirectional push from the server to the client, eliminating the need for polling. Data is delivered as soon as it becomes available, which significantly reduces latency compared to polling. WebSocket can also push data, but it is bidirectional and may introduce unnecessary complexity for applications that only require server-to-client communication.
How did Shopify improve data latency for the BFCM Live Map?
Shopify improved data latency by implementing Server Sent Events, which allowed data to be pushed to clients immediately as it became available. This replaced the previous polling method, which had a minimum delay of 10 seconds. With SSE in place, data was visualized on the Live Map UI within 21 seconds of its creation time.
What architecture changes were made to support SSE?
The architecture was simplified by replacing the previous complex system with a Flink-based data pipeline that directly pushes data to an SSE server. This change allowed for better scalability and reduced bottlenecks, enabling the system to handle millions of concurrent connections during peak traffic.
What challenges did the SSE server face under load?
The SSE server needed to handle a high volume of concurrent connections, especially during peak BFCM traffic. To address this, Shopify built a horizontally scalable server architecture behind NGINX load balancers, allowing them to dynamically adjust the number of server pods based on traffic demands.
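The summary does not describe the internals of the SSE server, so the following is only a minimal sketch of one common pattern: each server pod keeps a registry of connected clients and fans every incoming message out to all of them. The names `Broker`, `Subscribe`, `Unsubscribe`, and `Publish` are hypothetical, and the messages passed to `Publish` stand in for whatever the upstream data pipeline delivers.

```go
package main

import (
	"fmt"
	"sync"
)

// Broker fans each published message out to every subscribed client.
// In a real server, each client channel would feed one open SSE connection.
type Broker struct {
	mu      sync.Mutex
	clients map[chan string]struct{}
}

func NewBroker() *Broker {
	return &Broker{clients: make(map[chan string]struct{})}
}

// Subscribe registers a new client with a buffered channel of size buf.
func (b *Broker) Subscribe(buf int) chan string {
	ch := make(chan string, buf)
	b.mu.Lock()
	b.clients[ch] = struct{}{}
	b.mu.Unlock()
	return ch
}

// Unsubscribe removes a client and closes its channel.
func (b *Broker) Unsubscribe(ch chan string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if _, ok := b.clients[ch]; ok {
		delete(b.clients, ch)
		close(ch)
	}
}

// Publish sends msg to every client whose buffer has room and returns
// how many clients received it. Clients with a full buffer are skipped
// so that one slow consumer cannot block the others.
func (b *Broker) Publish(msg string) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	delivered := 0
	for ch := range b.clients {
		select {
		case ch <- msg:
			delivered++
		default: // buffer full: drop for this client
		}
	}
	return delivered
}

func main() {
	b := NewBroker()
	fast := b.Subscribe(2)
	slow := b.Subscribe(1)

	fmt.Println(b.Publish("sale-1")) // both buffers have room
	fmt.Println(b.Publish("sale-2")) // slow's buffer is already full
	fmt.Println(<-fast, <-fast, <-slow)

	b.Unsubscribe(slow)
	fmt.Println(b.Publish("sale-3")) // only fast remains
}
```

Because each pod holds only its own clients in memory, adding pods behind the load balancer spreads the connections without any shared state between pods. Dropping messages for slow clients is a design choice made for this sketch; a production server might disconnect them instead.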

Key Statistics & Figures

Data latency improvement
Data was delivered to clients within milliseconds of becoming available
This was a significant improvement from the previous system, which had a minimum polling delay of 10 seconds.
Data processed during BFCM 2021
323 billion rows of data
This highlights the scale at which the Shopify platform operates during peak sales events.
Visualization time on Live Map UI
Within 21 seconds
This is the time between the creation of a data point and its appearance on the BFCM Live Map UI.

Key Actionable Insights

1. Implement Server Sent Events for applications that require real-time data updates without the overhead of bidirectional communication.
   This is particularly useful for data visualization applications where the server needs to push updates to clients, reducing latency and simplifying the architecture.
2. Conduct load testing to determine the maximum number of concurrent connections your SSE server can handle.
   Simulating high-traffic scenarios helps identify potential bottlenecks and informs scaling strategies, ensuring reliability during peak usage times.
3. Use the familiar HTTP protocol for SSE connections to simplify implementation and reduce the learning curve for developers.
   Because SSE runs over plain HTTP, teams can adopt it without learning a new protocol, which shortens development cycles.

Common Pitfalls

1. Failing to properly simulate high traffic conditions can lead to underestimating the load requirements of the SSE server.
   Without accurate load testing, you may not identify bottlenecks until they occur in production, which can lead to service outages during peak times.

Related Concepts

Real-time Data Streaming
Server Sent Events
WebSocket
Data Visualization Techniques