How Slack Uses AWS
48 engineering articles about AWS from Slack's engineering team
Articles
This article details Slack's approach to making Chef infrastructure deployments safer by splitting a single production Chef environment into six bucketed environments (prod-1 through prod-6) mapped...
Archie Gunasekara
16 min read
Includes Code
Has Summary
--
Slack's Deploy Safety Program, launched in mid-2023, achieved a 90% reduction in customer impact hours by January 2025 through automated detection, remediation, and cultural changes across all depl...
Sam Bailey
12 min read
Has Summary
--
The article discusses how Slack's DevXP team optimized their end-to-end (E2E) testing pipeline, significantly reducing build times and eliminating unnecessary frontend builds.
--
The article discusses the development of Slack's enterprise search functionality, emphasizing its security and privacy features.
Ian Hoffman
7 min read
Has Summary
--
The article discusses Slack's audit logs and the detection of anomalous activity within its platform.
--
The article discusses how Slack is utilizing AI-powered tools to enhance developer productivity and streamline processes.
Anirudh Janga
10 min read
Has Summary
--
The article discusses a challenging bug encountered while integrating Quip's technology into Slack, focusing on TCP state management and EOFError issues.
--
The article discusses the evolution of Slack's Chef infrastructure, focusing on enhancing safety and scalability through a transition from a single Chef stack to a sharded infrastructure.
Archie Gunasekara
16 min read
Includes Code
Has Summary
--
The article discusses the re-architecture of Slack's backend to create the Unified Grid, aimed at improving user experience for large customers by providing a single view of data across multiple wo...
--
This article details Slack's migration from AWS EMR 5 with Spark 2 to EMR 6 with Spark 3, highlighting the challenges faced and the performance improvements achieved.
--
The article discusses proactive measures taken by Slack to enhance user security against password breaches and cookie hijacking.
--
The article discusses advanced rollout techniques for stateful applications in Kubernetes, focusing on the development of the Bedrock Rollout Operator at Slack.
--
The article explores the significant contributions of women in the Data Engineering team at Slack, highlighting their roles in managing complex data systems and fostering a diverse work culture.
--
The article discusses the development of Slack AI with a focus on ensuring security and privacy for customer data.
Kelly Moran
9 min read
Has Summary
--
The article discusses the complexities and challenges of automating deployments at Slack, particularly in a monolithic service environment.
Sean McIlroy
16 min read
Includes Code
Has Summary
--
The article discusses Slack's migration from AWS Instance Metadata Service version 1 (IMDSv1) to version 2 (IMDSv2), emphasizing the security enhancements and challenges faced during the transition.
Archie Gunasekara
13 min read
Includes Code
Has Summary
--
The article discusses enhancements made to the Workflow Builder in Slack, focusing on the implementation of custom animations to improve user experience.
--
The article 'Traffic 101: Packets Mostly Flow' provides an in-depth look at how Slack processes billions of network requests daily through its edge network and AWS infrastructure.
--
The article discusses the challenges and experiences encountered while building GovSlack, a version of Slack designed for government agencies, utilizing AWS GovCloud infrastructure.
Archie Gunasekara
12 min read
Includes Code
Has Summary
--
The article discusses how Slack utilizes Terraform for managing its infrastructure across multiple cloud providers, including AWS, DigitalOcean, NS1, and GCP.
--
The article discusses the Mobile Developer Experience Team at Slack, focusing on how they enhance developer productivity and satisfaction through targeted improvements in the mobile development pro...
--
The article provides an insightful glimpse into the daily routine of Georgi Knox, a Senior Cloud Engineer at Slack Australia.
Georgi Knox
10 min read
Has Summary
--
The article discusses BuildRock, Slack's new build platform designed to enhance the efficiency and safety of code deployment.
Joel Bartlett
13 min read
Has Summary
--
This article discusses how Slack implemented orchestration-level circuit breakers to enhance developer productivity and prevent cascading failures in their CI/CD processes.
Frank Chen
19 min read
Includes Code
Has Summary
--
The article discusses AutoTransform, an open-source framework developed by Slack to automate the maintenance, modification, and upgrading of large codebases.
--
The article discusses the transition to remote development environments at Slack, highlighting the challenges faced with local setups and the benefits of using AWS EC2 instances for development.
--
The article discusses the implementation of background effects, specifically background blur and background image replacement, for Slack Clips, utilizing web technologies like WebGL and WebAssembly...
Albert Xing
8 min read
Has Summary
--
The article discusses the implementation of continuous load testing at Slack using a tool called Koi Pond.
Shreya Ramesh
16 min read
Has Summary
--
The article discusses how Slack's Mobile Developer Experience Team tackled the challenge of flaky tests in their CI/CD pipeline by implementing an automated detection and suppression system.
--
This article discusses how Slack built and operationalized self-driving Kafka clusters using open source components over four years.
Suman Karumuri
14 min read
Has Summary
--
The article discusses the challenges and outcomes of Slack's attempt to implement DNSSEC, a security extension for the Domain Name System.
Rafael Elvira
19 min read
Includes Code
Has Summary
--
This article provides a retrospective on the evolution of cloud networks at Slack, focusing on the lessons learned and improvements made since the implementation of a new network architecture calle...
--
The article discusses how Slack achieved a significant reduction in infrastructure spending through improved observability and changes in their Continuous Integration (CI) infrastructure.
--
The article discusses how Slack automates the building and maintenance of data lineage to manage increasing data complexity.
--
This article discusses Slack's migration of millions of concurrent WebSocket connections from HAProxy to Envoy Proxy.
--
This article details the outage experienced by Slack on January 4th, 2021, highlighting the causes, the incident response, and the lessons learned.
--
The article highlights the contributions and experiences of women in the security team at Slack, showcasing their diverse backgrounds and the company's commitment to inclusion and diversity.
Suzanna Khatchatrian
12 min read
Has Summary
--
The article discusses the development of Email Bridge, a feature that allows Slack users to interact with invited users who have not yet activated their accounts, facilitating communication and onb...
--
The article discusses Slack's evolution of cloud networking, detailing the redesign of their AWS infrastructure through a project named Whitecastle.
--
The article delves into the intricacies of web forms, particularly those used by Slack for lead generation.
Frances Coronel
16 min read
Includes Code
Has Summary
--
The article provides an in-depth analysis of a significant outage experienced by Slack on May 12, 2020, detailing the technical issues that led to the incident.
--
The article provides an in-depth look at a typical day for a Frontend Foundations Engineer at Slack, detailing daily routines, tasks, and the engineering challenges faced, particularly focusing on ...
--
This article details Slack's experience upgrading Apache Airflow from version 1.8 to a later 1.x release.
Ashwin Shankar
11 min read
Has Summary
--
The article discusses Slack's approach to Chaos Engineering through a process called Disasterpiece Theater, which aims to enhance the reliability of their systems by intentionally causing failures ...
--
The article discusses how Slack hires a red team for security assessments and provides guidance for organizations looking to implement similar practices.
John Sonnenschein
11 min read
Has Summary
--
This article provides an in-depth look at Slack Enterprise Key Management (EKM), detailing how the Slack engineering team designed a solution to enhance data security for customers.
--
The article discusses the challenges and solutions involved in scaling Slack's job queue system, which processes billions of tasks efficiently using Kafka and Redis.
--
The article discusses Flannel, an application-level edge cache developed by Slack to enhance scalability and performance for large teams.