Open-sourcing KingPin, building blocks for scaling Pinterest

Pinterest Engineering

Overview

This article covers the open-sourcing of KingPin, a toolset Pinterest built to make its infrastructure more scalable and reliable. It describes KingPin's key components, use cases, and architecture, with a focus on configuration management and service discovery.

What You'll Learn

1. How to implement a local daemon to address ZooKeeper's single point of failure
2. Why using a Python Thrift client wrapper enhances functionality in high-request environments
3. When to utilize KingPin for configuration management in Python-oriented stacks on AWS
4. How to manage configurations in real-time using KingPin's MetaConfig Manager

Prerequisites & Requirements

  • Understanding of ZooKeeper and its role in distributed systems
  • Familiarity with Python and AWS services (optional)

Key Questions Answered

What are the main components of KingPin and their functions?
KingPin consists of several components, including Kazoo Utils, a wrapper around the Kazoo ZooKeeper client; Thrift Utils, a greenlet-safe Python Thrift client wrapper; and Config Utils, for configuration management using S3 and ZooKeeper. Each component addresses specific challenges in service discovery and configuration management.
How does KingPin improve service discovery within Pinterest?
KingPin facilitates service discovery by letting servers register their endpoints with ZooKeeper. Clients watch the resulting serverset and pick up changes as server nodes join or leave, so they keep connecting only to live endpoints.
When should you consider using KingPin for your projects?
You should consider using KingPin if your stack is Python-oriented and hosted on AWS, if you need to enhance the robustness of your ZooKeeper cluster, or if you require a configuration system that supports complex data structures.
What is the role of the ZK Update Monitor in KingPin?
The ZK Update Monitor is a local daemon that syncs subscribed configurations and serversets from ZooKeeper and S3 to local disk, ensuring fault tolerance and real-time updates for configuration management.
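
The local-daemon pattern described in the last answer can be sketched as follows. This is a minimal illustration, not KingPin's actual code: the JSON file format and the names write_serverset_atomically and LocalServerset are assumptions made for the example. The daemon side rewrites a file on local disk whenever the serverset changes; the client side reads only that file, so a ZooKeeper outage leaves clients with the last known-good copy.

```python
import json
import os
import tempfile


def write_serverset_atomically(path, endpoints):
    """Daemon side (hypothetical helper): swap in the new file in one step
    so that readers never observe a partially written serverset."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump(sorted(endpoints), tmp)
    os.replace(tmp_path, path)  # atomic rename within one filesystem


class LocalServerset:
    """Client side (hypothetical class): serve endpoints from local disk and
    reload only when the file has changed."""

    def __init__(self, path):
        self._path = path
        self._stamp = None
        self._endpoints = []

    def endpoints(self):
        try:
            st = os.stat(self._path)
            stamp = (st.st_ino, st.st_mtime_ns, st.st_size)
            if stamp != self._stamp:
                with open(self._path) as f:
                    self._endpoints = json.load(f)
                self._stamp = stamp
        except (OSError, ValueError):
            # File missing or unreadable: keep serving the last good copy.
            pass
        return list(self._endpoints)
```

Because clients never talk to ZooKeeper directly in this sketch, only the daemon needs to handle reconnects and watches.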

Key Statistics & Figures

Hosts supported by local daemon
20K
The local daemon delivers configuration data in less than 10 seconds across these hosts.
Requests per second via Python Thrift client
hundreds of thousands
This demonstrates the high throughput capabilities of the Thrift client wrapper implemented in KingPin.
Configurations managed
over 400
These configurations are updated and consumed through KingPin's configuration management framework.

Technologies & Tools

Programming Language
Python
Used as the primary development language for building KingPin and its components.
Service
ZooKeeper
Used for managing configuration data and service discovery within KingPin.
Storage
S3
Used for storing configuration data as the ground truth in KingPin.
Protocol
Thrift
Used for communication between services, enhanced by KingPin's Thrift Utils.
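
One way S3 and ZooKeeper can share the work is sketched below, under a stated assumption: the full configuration lives in the object store as the ground truth, while the coordination service carries only a small version marker that tells subscribers when to refetch. The summary above does not spell out this split, so treat it as one plausible design. FakeObjectStore, FakeNotifier, publish, and Subscriber are hypothetical in-memory stand-ins, not real S3 or ZooKeeper APIs.

```python
import json


class FakeObjectStore:
    """In-memory stand-in for S3: one immutable blob per (key, version)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, version, body):
        self._blobs[(key, version)] = body

    def get(self, key, version):
        return self._blobs[(key, version)]


class FakeNotifier:
    """In-memory stand-in for a ZooKeeper node holding a version marker."""

    def __init__(self):
        self._version = None
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)
        if self._version is not None:
            callback(self._version)  # late subscribers catch up at once

    def set(self, version):
        self._version = version
        for callback in self._watchers:
            callback(version)


def publish(store, notifier, key, version, config):
    # Store the content first, then announce it, so that a subscriber
    # never fetches a version that does not exist yet.
    store.put(key, version, json.dumps(config))
    notifier.set(version)


class Subscriber:
    """Refetches the config from the store whenever the marker changes."""

    def __init__(self, store, notifier, key):
        self._store = store
        self._key = key
        self.config = None
        notifier.watch(self._on_version)

    def _on_version(self, version):
        self.config = json.loads(self._store.get(self._key, version))
```

Keeping only a version marker in the notifier keeps the coordination payload small even when a config is a large nested JSON document.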

Key Actionable Insights

1. Implement a local daemon with KingPin to mitigate ZooKeeper's single point of failure, enhancing your system's resilience.
This is particularly important in large-scale environments where configuration data needs to be delivered quickly and reliably across numerous hosts.
2. Utilize the Thrift Utils component of KingPin to improve the performance of your Python applications by managing connection pools and retries effectively.
This can significantly enhance the responsiveness and reliability of your service interactions, especially under high load.
3. Leverage KingPin's configuration management framework to handle complex data structures like JSON, which can simplify your application's configuration needs.
This is beneficial when building systems that require dynamic configuration updates without downtime.
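
The connection-pool-and-retry idea from the second insight can be sketched in a few lines. This is not the Thrift Utils API: PooledClient, its parameters, and the convention that a connection is a plain callable are all assumptions for illustration. The pool bounds how many connections exist at once, and a call that hits a ConnectionError discards the broken connection and retries on a fresh one.

```python
import queue


class PooledClient:
    """Hypothetical client wrapper with a bounded pool and simple retries."""

    def __init__(self, connect, pool_size=2, max_retries=2):
        self._connect = connect          # factory returning a callable connection
        self._max_retries = max_retries
        self._pool = queue.Queue()
        for _ in range(pool_size):
            self._pool.put(None)         # empty slot; connect lazily on first use

    def call(self, request):
        last_error = None
        for _ in range(self._max_retries + 1):
            conn = self._pool.get()      # blocks while every slot is in use
            healthy = None
            try:
                if conn is None:
                    conn = self._connect()
                result = conn(request)
                healthy = conn           # reached only if the call succeeded
                return result
            except ConnectionError as exc:
                last_error = exc         # retry with a fresh connection
            finally:
                # Return the slot every time; a failed connection is dropped
                # and the slot goes back empty.
                self._pool.put(healthy)
        raise last_error
```

queue.Queue is thread-safe; a gevent-based service would need a greenlet-aware queue in its place, which this sketch does not attempt.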

Common Pitfalls

1. Neglecting to implement a local daemon can lead to reliance on ZooKeeper's availability, creating a single point of failure.
This can result in significant downtime or configuration delivery delays, especially in large-scale deployments.
2. Failing to manage connection pools effectively in high-load scenarios can lead to performance bottlenecks.
Without proper connection management, applications may experience increased latency and reduced throughput.

Related Concepts

Distributed Systems
Configuration Management
Service Discovery
Scalability In Cloud Environments