How Women Lead Data Engineering at Slack

The Data Engineering team is responsible for Slack’s data lake, analytics dashboards, and other data services. The team’s mission is to empower users to leverage data to make decisions quickly, accurately, and easily. Slack’s data lake grew in size from sub-petabyte to over 100 petabytes in recent years and it now spans millions of tables.…

Slack Engineering
11 min read, intermediate

Overview

The article explores the significant contributions of women in the Data Engineering team at Slack, highlighting their roles in managing complex data systems and fostering a diverse work culture. It features personal stories from female engineers who share their experiences and the technologies they utilize to drive innovation in data management.

What You'll Learn

1. How to leverage Apache Airflow for data workflow management
2. Why diverse teams enhance problem-solving in data engineering
3. How to migrate from a virtual machine setup to a cloud-native Kubernetes infrastructure

Prerequisites & Requirements

  • Understanding of data engineering concepts and tools
  • Experience with cloud-native technologies (optional)

Key Questions Answered

What role do women play in data engineering at Slack?
Women in data engineering at Slack are pivotal in managing complex data systems and driving innovation. Their diverse perspectives contribute to creative problem-solving, enabling the team to navigate intricate challenges effectively. The article highlights their leadership roles and the technologies they utilize.
How has Slack's data lake evolved over the years?
Slack's data lake has expanded from sub-petabyte to over 100 petabytes, now encompassing millions of tables. This growth reflects the increasing complexity of data management and the need for a diverse engineering team to support the ecosystem.
What technologies are used by Slack's Data Engineering team?
The Data Engineering team at Slack utilizes various technologies including Apache Airflow for workflow management, Apache Pinot for data querying, and Kubernetes for cloud-native infrastructure. These tools help maintain high performance and reliability in data operations.
What challenges does the Data Engineering team face?
The team faces challenges such as migrating from a virtual machine setup to a cloud-native Kubernetes infrastructure, which involves customizing solutions to meet performance requirements while managing costs and maintenance overhead.

Key Statistics & Figures

Growth of Slack's data lake
over 100 petabytes
This growth highlights the increasing complexity of data management at Slack.
Query success rate SLA
99.95%
This performance metric reflects the reliability of the data systems managed by the engineering team.
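
To make the 99.95% figure concrete, the following minimal sketch works out how many failed queries such an SLA would tolerate for a given query volume. The volume of 1,000,000 queries is a made-up illustration, not a number from Slack, and the helper name `allowed_failures` is hypothetical.

```python
from fractions import Fraction


def allowed_failures(total_queries: int, sla_percent: str = "99.95") -> int:
    """Return the largest number of failed queries that still meets the SLA.

    Fractions are used instead of floats so the percentage arithmetic is exact.
    """
    success_rate = Fraction(sla_percent) / 100        # 99.95% -> 1999/2000
    failure_budget = (1 - success_rate) * total_queries
    return int(failure_budget)                        # round down to whole queries


# Hypothetical volume, chosen only for illustration.
HYPOTHETICAL_QUERIES = 1_000_000
print(allowed_failures(HYPOTHETICAL_QUERIES))  # prints 500
```

Put differently, a 99.95% success rate leaves a failure budget of one query in every 2,000.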

Key Actionable Insights

1. Emphasize the importance of diversity in engineering teams to enhance problem-solving capabilities.
Diverse teams bring varied perspectives that can lead to innovative solutions, especially in complex fields like data engineering. This approach can improve agility and insight in tackling challenges.
2. Utilize Apache Airflow to streamline data workflows and improve efficiency.
Airflow allows for better management of data pipelines, ensuring timely data processing and accuracy, which is crucial for decision-making in organizations.
3. Consider cloud-native solutions for infrastructure to reduce costs and maintenance overhead.
Migrating to cloud-native platforms can optimize resource usage and enhance scalability, which is essential for growing data needs.

Common Pitfalls

1. Underestimating the complexity of migrating to cloud-native infrastructure.
Teams may overlook the customizations and adjustments a successful migration requires, which leads to higher costs and operational challenges.

Related Concepts

Data Engineering Best Practices
Cloud-native Infrastructure
Diversity in Tech Teams
Data Workflow Management