Sidekick’s Improved Streaming Experience

In this post, learn how Shopify's Sidekick solves Markdown rendering jank and response delays in LLM chatbots with a buffering parser and asynchronous content resolution.

Ates Goral
7 min read · Beginner

Overview

The article discusses improvements made to Sidekick's streaming experience for Large Language Model (LLM) chatbots, focusing on resolving issues like Markdown rendering jank and response delays. It introduces a buffering Markdown parser and an event emitter to enhance user experience by allowing immediate response streaming while handling asynchronous content resolution.

What You'll Learn

1. How to implement a buffering Markdown parser for LLM responses
2. Why multiplexing streams improves user experience in chatbots
3. How to manage asynchronous content resolution in LLM applications

Prerequisites & Requirements

  • Understanding of Markdown syntax and LLM functionality
  • Familiarity with Node.js and JavaScript

Key Questions Answered

What are the main user experience disruptions in LLM chatbots?
The main disruptions are Markdown rendering jank, where syntax fragments appear as raw text until fully formed, and response delays caused by multiple LLM roundtrips for external data. These issues lead to a frustrating user experience, often leaving users waiting without clear feedback.
How does the buffering Markdown parser work?
The buffering Markdown parser collects characters that may form Markdown elements and flushes the buffer when it encounters unexpected characters or when a complete Markdown element is formed. This allows for smoother rendering of Markdown during streaming, reducing visual disruptions.
What is the benefit of multiplexing asynchronously resolved content?
Multiplexing allows the initial LLM response to be streamed to the user while placeholders for additional content are filled in asynchronously. This approach enhances the perceived speed of the interaction and provides users with immediate feedback, improving overall experience.
How are special card links used in the LLM responses?
Special card links use the 'card:' protocol in their URLs, allowing the LLM to indicate where additional content will be resolved asynchronously. This integration helps streamline the response rendering process by combining Markdown parsing with content resolution.
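
The buffering behavior and the 'card:' links described in the answers above can be sketched together. The following is a minimal illustration, not Sidekick's actual parser: it handles only inline links of the form [label](url), and the class name BufferingLinkParser, its method names, and the event shapes are all assumptions made for this example. Characters that might belong to a link are held back; they are released as plain text if an unexpected character arrives, or emitted as a single link or card event once the closing parenthesis is seen.

```javascript
// Sketch only: handles inline links of the form [label](url) and nothing else.
// BufferingLinkParser and its event shapes are hypothetical names for illustration.
class BufferingLinkParser {
  constructor(emit) {
    this.emit = emit;    // called with { type: "text" | "link" | "card", ... }
    this.state = "text"; // "text" | "label" | "afterLabel" | "url"
    this.pending = "";   // plain text that is safe to show
    this.buffer = "";    // raw characters held back while a link may be forming
    this.label = "";
    this.url = "";
  }

  // Accept one streamed chunk; chunk boundaries may fall anywhere.
  write(chunk) {
    for (const ch of chunk) this.feed(ch);
    this.flushText(); // plain text is released right away, buffered syntax is not
  }

  // Call when the stream ends: an unfinished link is released as raw text.
  end() {
    this.abandon();
    this.flushText();
  }

  feed(ch) {
    switch (this.state) {
      case "text":
        if (ch === "[") {
          this.flushText();
          this.state = "label";
          this.buffer = ch;
          this.label = "";
        } else {
          this.pending += ch;
        }
        return;
      case "label":
        if (ch === "\n") return this.retryAsText(ch);
        this.buffer += ch;
        if (ch === "]") this.state = "afterLabel";
        else this.label += ch;
        return;
      case "afterLabel":
        if (ch !== "(") return this.retryAsText(ch);
        this.buffer += ch;
        this.url = "";
        this.state = "url";
        return;
      case "url":
        if (ch === ")") return this.completeLink();
        if (ch === "\n" || ch === " ") return this.retryAsText(ch);
        this.buffer += ch;
        this.url += ch;
        return;
    }
  }

  // A complete link was formed; 'card:' URLs are reported as cards.
  completeLink() {
    const type = this.url.startsWith("card:") ? "card" : "link";
    this.emit({ type, label: this.label, url: this.url });
    this.buffer = "";
    this.state = "text";
  }

  // Unexpected character: the buffer was not a link after all.
  retryAsText(ch) {
    this.abandon();
    this.feed(ch);
  }

  // Move held-back characters to plain text and return to the text state.
  abandon() {
    this.pending += this.buffer;
    this.buffer = "";
    this.state = "text";
  }

  flushText() {
    if (this.pending) this.emit({ type: "text", value: this.pending });
    this.pending = "";
  }
}
```

To try it, construct the parser with a callback that collects events, call write() once per streamed chunk, and call end() when the stream closes. A link split across several chunks still arrives as one event, so the user never sees a stray "[" or "](" on screen. A production parser would need to cover emphasis, code spans, escapes, and the other Markdown constructs this sketch ignores.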

Key Actionable Insights

1. Implement a buffering Markdown parser to enhance user experience in chatbots.
   By buffering Markdown syntax during streaming, you can avoid rendering jank and give users a smoother visual experience. This is particularly important in applications where user engagement is critical.
2. Use multiplexing to improve perceived response times in LLM applications.
   By sending the initial response while additional data is still being resolved, you keep users engaged and reduce the frustration of waiting for information.
3. Incorporate asynchronous content resolution to handle complex user queries.
   This approach allows for more dynamic interactions, enabling chatbots to respond to multiple user intents simultaneously without compromising the speed of the conversation.
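
The multiplexing idea in insights 2 and 3 can be sketched with Node's built-in EventEmitter. This is an illustration under stated assumptions, not Sidekick's implementation: the ResponseMultiplexer class, the single "event" channel, the event shapes, and the resolveCard callback are hypothetical. The point it shows is that text and placeholders go out immediately, while resolved content is emitted on the same stream whenever it becomes ready.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical multiplexer: all names and event shapes are assumptions.
class ResponseMultiplexer extends EventEmitter {
  constructor(resolveCard) {
    super();
    this.resolveCard = resolveCard; // async (url) => content
    this.pending = [];              // in-flight resolutions
    this.nextId = 1;
  }

  // Streamed text from the main LLM response is forwarded immediately.
  text(value) {
    this.emit("event", { type: "text", value });
  }

  // A card link yields a placeholder now and its content later.
  card(url) {
    const id = this.nextId++;
    this.emit("event", { type: "placeholder", id, url });
    const task = Promise.resolve()
      .then(() => this.resolveCard(url))
      .then(
        (content) => this.emit("event", { type: "resolved", id, content }),
        (error) => this.emit("event", { type: "failed", id, message: String(error) })
      );
    this.pending.push(task);
  }

  // The combined stream is done only after every placeholder has settled.
  async end() {
    await Promise.all(this.pending);
    this.emit("event", { type: "done" });
  }
}

// Demo with a fake resolver that takes 20 ms per card.
async function demo() {
  const events = [];
  const mux = new ResponseMultiplexer(async (url) => {
    await new Promise((resolve) => setTimeout(resolve, 20));
    return `content for ${url}`;
  });
  mux.on("event", (e) => events.push(e));
  mux.text("Here is your product: ");
  mux.card("card:product/123");
  mux.text(" Anything else?");
  await mux.end();
  return events;
}
```

Calling demo() yields events of type text, placeholder, text, resolved, done, in that order: the second piece of text is not held up by the card, which is the source of the improved perceived speed. A failed resolution is reported as its own event rather than breaking the stream, so the client can decide how to render it.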

Common Pitfalls

1. Failing to properly handle Markdown ambiguity can lead to rendering issues.
   This happens when the parser does not correctly identify the end of Markdown elements, resulting in jank. To avoid this, ensure that your buffering logic is robust and can handle a variety of Markdown scenarios, including elements that are never closed.
2. Neglecting to give users feedback during asynchronous operations.
   Users may feel frustrated if they do not receive timely updates. Implementing a system that provides immediate responses while additional content is being resolved can significantly enhance user satisfaction.
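
One way to address the second pitfall on the client side is to render every placeholder in a visible loading state until its content arrives. The sketch below assumes a hypothetical event format ({ type: "text" }, { type: "placeholder" }, { type: "resolved" }, { type: "failed" }) and hypothetical function names; it is meant to show the shape of the idea rather than any particular UI framework.

```javascript
// Hypothetical event shapes:
//   { type: "text", value }            streamed text
//   { type: "placeholder", id }        content that will arrive later
//   { type: "resolved", id, content }  content for an earlier placeholder
//   { type: "failed", id }             resolution did not succeed

// Pure reducer: returns a new list of parts for each incoming event.
function applyEvent(parts, event) {
  switch (event.type) {
    case "text":
      return [...parts, { kind: "text", value: event.value }];
    case "placeholder":
      return [...parts, { kind: "card", id: event.id, status: "loading", content: null }];
    case "resolved":
      return parts.map((p) =>
        p.kind === "card" && p.id === event.id
          ? { ...p, status: "ready", content: event.content }
          : p
      );
    case "failed":
      return parts.map((p) =>
        p.kind === "card" && p.id === event.id ? { ...p, status: "failed" } : p
      );
    default:
      return parts; // unknown events are ignored
  }
}

// Text rendering stand-in for a real UI: pending cards stay visible as "loading".
function render(parts) {
  return parts
    .map((p) => {
      if (p.kind === "text") return p.value;
      if (p.status === "ready") return `[${p.content}]`;
      if (p.status === "failed") return "[content unavailable]";
      return "[loading...]";
    })
    .join("");
}
```

Reducing the events text "Top seller: ", placeholder 1, and text " Want details?" renders as "Top seller: [loading...] Want details?". Once a resolved event for id 1 arrives with content "Blue T-shirt", the same position renders as "[Blue T-shirt]" without shifting the surrounding text.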

Related Concepts

Markdown Parsing
Asynchronous Programming
Large Language Models
User Experience Design