We’re All Just Looking for Connection

We’ve been working to bring components of Quip’s technology into Slack with the canvas feature, while also maintaining the stand-alone Quip product. Quip’s backend, which powers both Quip and canvas, is written in Python. This is the story of a tricky bug we encountered last July and the lessons we learned along the way about…

Brett Wines
9 min read · advanced

Overview

The article discusses a challenging bug encountered while integrating Quip's technology into Slack, focusing on TCP state management and EOFError issues. It details the investigation process, resolutions implemented, and the impact on the migration to asyncio in Python.

What You'll Learn

1. How to troubleshoot EOFError in SQL queries
2. Why understanding TCP connection states is crucial for database interactions
3. How to implement effective connection state management in Python using asyncio

Prerequisites & Requirements

  • Understanding of TCP/IP and database connection management
  • Familiarity with Python and asyncio

Key Questions Answered

What caused the EOFError during SQL queries in Slack's integration with Quip?
The EOFError was caused by database proxies closing connections after 24 hours. The application did not detect that these connections had been closed, so SQL queries sent over them failed. Incorrect connection state checks in the code compounded the problem.
How did the team resolve the issues related to TCP connection states?
The team identified that the proxy was closing connections and implemented fixes to ensure proper connection state management, which significantly reduced EOFError occurrences during SQL queries. They also improved the handling of client-initiated connections.
What lessons were learned regarding connection state management in Python?
The team learned that relying solely on the StreamWriter's closing state was insufficient, as it did not account for the reader's EOF state. They discovered multiple bugs related to connection management that needed addressing to ensure reliability.
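The gap between the writer's closing state and the reader's EOF state can be reproduced with a small, self-contained sketch. It is not the team's code: the helper `connection_is_usable`, the local test server, and the polling loop are all assumptions made for illustration. Only `StreamWriter.is_closing()` and `StreamReader.at_eof()` are real asyncio calls. On a plain (non-TLS) TCP connection, the writer-only check is expected to keep reporting the connection as open after the peer has hung up, while the combined check does not.

```python
import asyncio


def connection_is_usable(reader: asyncio.StreamReader,
                         writer: asyncio.StreamWriter) -> bool:
    """Hypothetical helper: consult both halves of the stream pair."""
    return not writer.is_closing() and not reader.at_eof()


async def _close_immediately(reader, writer):
    # Stand-in for a proxy that drops the connection from its side.
    writer.close()
    await writer.wait_closed()


async def demo():
    # Port 0 asks the OS for any free local port.
    server = await asyncio.start_server(_close_immediately, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)

    # Give the event loop time to observe the peer's close (up to ~1 s).
    for _ in range(100):
        if reader.at_eof():
            break
        await asyncio.sleep(0.01)

    writer_only_says_open = not writer.is_closing()
    combined_says_usable = connection_is_usable(reader, writer)

    writer.close()
    await writer.wait_closed()
    server.close()
    return writer_only_says_open, combined_says_usable


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A connection pool could run a check like this before handing out a cached connection and discard any connection that fails it.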

Key Statistics & Figures

Reduction in EOFError occurrences
Near-total reduction
This was achieved after deploying fixes related to connection state management.

Technologies & Tools

Backend
Python
Used for the backend that powers both Quip and Slack canvas, including its database connection handling.
Backend
asyncio
Python's standard-library framework for asynchronous I/O, used for managing asynchronous connections.

Key Actionable Insights

1. Implement robust connection state checks in your database interactions to avoid unexpected EOFErrors.
   This is crucial for maintaining application stability, especially in high-load environments where connection management is critical.
2. Regularly review and test your connection handling logic to identify potential bugs early.
   Proactive testing can help catch issues before they impact users, especially when integrating new technologies or features.
3. Utilize metrics to monitor connection states and errors in real-time.
   This allows for quicker identification of issues and can inform necessary adjustments in your connection management strategy.
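As a rough illustration of the metrics insight, the sketch below counts connection states and error types in process. The class `ConnectionStateMetrics`, its method names, and the state labels are all hypothetical. A production system would export such counts to its monitoring stack rather than hold them in a `Counter`.

```python
from collections import Counter


class ConnectionStateMetrics:
    """Hypothetical in-process counters for connection health."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def observe(self, *, writer_closing: bool, reader_at_eof: bool) -> str:
        """Classify one connection check and count it."""
        if writer_closing:
            state = "writer_closing"
        elif reader_at_eof:
            # The writer looks open but the peer has already sent EOF.
            state = "reader_eof_only"
        else:
            state = "healthy"
        self.counts[state] += 1
        return state

    def record_error(self, exc: BaseException) -> None:
        """Count errors by exception type, e.g. 'error.EOFError'."""
        self.counts[f"error.{type(exc).__name__}"] += 1
```

Counting the "reader_eof_only" state separately makes it visible how often a writer-only check would have missed a closed connection.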

Common Pitfalls

1. Assuming that checking the StreamWriter's closing state is sufficient, without also considering the reader's EOF state.
   This can lead to unhandled exceptions and application instability, especially in environments with frequent connection closures.

Related Concepts

TCP/IP Connection Management
Asynchronous Programming In Python
Database Error Handling