How Women Lead Data Engineering at Slack

Slack Engineering

The Data Engineering team is responsible for Slack’s data lake, analytics dashboards, and other data services. The team’s mission is to empower users to leverage data to make decisions quickly, accurately, and easily. Slack’s data lake grew in size from sub-petabyte to over 100 petabytes in recent years and it now spans millions of tables.…

11 min read • Intermediate

Ansible, Apache, AWS, AWS EC2, Chef, Helm, Jenkins, Kubernetes, Python, React

Overview

The article explores the significant contributions of women in the Data Engineering team at Slack, highlighting their roles in managing complex data systems and fostering a diverse work culture. It features personal stories from female engineers who share their experiences and the technologies they utilize to drive innovation in data management.

What You'll Learn

1. How to leverage Apache Airflow for data workflow management

2. Why diverse teams enhance problem-solving in data engineering

3. How to migrate from a virtual machine setup to a cloud-native Kubernetes infrastructure

Prerequisites & Requirements

Understanding of data engineering concepts and tools
Experience with cloud-native technologies (optional)

Key Questions Answered

What role do women play in data engineering at Slack?

Women in data engineering at Slack are pivotal in managing complex data systems and driving innovation. Their diverse perspectives contribute to creative problem-solving, enabling the team to navigate intricate challenges effectively. The article highlights their leadership roles and the technologies they utilize.

How has Slack's data lake evolved over the years?

Slack's data lake has expanded from sub-petabyte to over 100 petabytes, now encompassing millions of tables. This growth reflects the increasing complexity of data management and the need for a diverse engineering team to support the ecosystem.

What technologies are used by Slack's Data Engineering team?

The Data Engineering team at Slack utilizes various technologies including Apache Airflow for workflow management, Apache Pinot for data querying, and Kubernetes for cloud-native infrastructure. These tools help maintain high performance and reliability in data operations.

What challenges does the Data Engineering team face?

The team faces challenges such as migrating from a virtual machine setup to a cloud-native Kubernetes infrastructure, which involves customizing solutions to meet performance requirements while managing costs and maintenance overhead.

Key Statistics & Figures

Growth of Slack's data lake

From sub-petabyte to over 100 petabytes

This growth highlights the increasing complexity of data management at Slack.

Query success rate SLA

99.95%

This performance metric reflects the reliability of the data systems managed by the engineering team.
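
To make the 99.95% figure concrete, the sketch below works out how many failed queries a reporting window can absorb before the target is missed. It is a minimal illustration, not Slack's monitoring code: the function names and the example query volume are hypothetical, and exact fractions are used so that rounding in floating-point arithmetic does not shift the boundary.

```python
import math
from fractions import Fraction

# 99.95% query success rate, the target quoted above.
SLA_TARGET = Fraction(9995, 10000)


def success_rate(total: int, failed: int) -> Fraction:
    """Exact share of queries that succeeded in one window."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    return Fraction(total - failed, total)


def allowed_failures(total: int, target: Fraction = SLA_TARGET) -> int:
    """Largest number of failed queries that still meets the target."""
    return math.floor(total * (1 - target))


def meets_sla(total: int, failed: int, target: Fraction = SLA_TARGET) -> bool:
    """True when the window's success rate is at or above the target."""
    return success_rate(total, failed) >= target


# Hypothetical window of one million queries.
budget = allowed_failures(1_000_000)
print(budget, meets_sla(1_000_000, budget), meets_sla(1_000_000, budget + 1))
```

At a 99.95% target, the failure budget is 5 queries per 10,000, so a window of one million queries tolerates 500 failures and misses the target on the 501st.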

Technologies & Tools

Workflow Management

Apache Airflow

Used to manage Slack's internal data workflows.

Data Querying

Apache Pinot

Used to run a query system that supports sub-second latency.

Infrastructure

Kubernetes

Adopted as the target of the migration from virtual machines to a cloud-native setup, with the aim of lowering cost and maintenance overhead.

Key Actionable Insights

1. Emphasize the importance of diversity in engineering teams to enhance problem-solving capabilities.
Diverse teams bring varied perspectives that can lead to innovative solutions, especially in complex fields like data engineering. This approach can improve agility and insight in tackling challenges.

2. Utilize Apache Airflow to streamline data workflows and improve efficiency.
Airflow allows for better management of data pipelines, ensuring timely data processing and accuracy, which is crucial for decision-making in organizations.

3. Consider cloud-native solutions for infrastructure to reduce costs and maintenance overhead.
Migrating to cloud-native platforms can optimize resource usage and enhance scalability, which is essential for growing data needs.
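
The second insight rests on the idea that a workflow manager runs each task only after the tasks it depends on have finished. The sketch below illustrates that idea with Python's standard library alone. It does not use Airflow's API and is not how Slack's pipelines are written; the `run_pipeline` helper and the extract/transform/load task names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_pipeline(
    tasks: Dict[str, Callable[[], None]],
    dependencies: Dict[str, Set[str]],
) -> List[str]:
    """Run every task once, after all of its upstream tasks have run.

    `dependencies` maps a task name to the names it must wait for.
    Returns the task names in the order they were executed.
    """
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    unknown = {up for ups in graph.values() for up in ups} - set(tasks)
    if unknown:
        raise ValueError(f"unknown upstream tasks: {sorted(unknown)}")

    executed = []
    # static_order() yields upstream tasks before the tasks that need them
    # and raises graphlib.CycleError (a ValueError) if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        executed.append(name)
    return executed


# Hypothetical three-step pipeline, declared out of order on purpose.
log: List[str] = []
tasks = {
    "load": lambda: log.append("load"),
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

executed = run_pipeline(tasks, dependencies)
print(executed)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering guarantee. The sketch covers only the dependency resolution.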

Common Pitfalls

1

Underestimating the complexity of migrating to cloud-native infrastructure.

Teams can overlook the customizations and adjustments a successful migration requires, which leads to higher costs and operational problems.

Related Concepts

Data Engineering Best Practices

Cloud-native Infrastructure

Diversity In Tech Teams

Data Workflow Management
