How Two Interns Are Helping Secure Millions of Lines of Code

Slack

At Slack, proactively securing our systems is a top priority. One way we achieve this is by automating the detection of security issues with static code analysis tools, which inspect programs without executing them. They’re often used with security-based rules to automate identification of vulnerabilities and insecure programming practices, which frees up more…

9 min read, advanced

Chef, OCaml, PHP, Python, TypeScript

Overview

The article describes how two interns at Slack, Nicholas Lin and David Frankel, helped secure millions of lines of Hack code by extending Semgrep, an existing static analysis tool, to support the language. Their work addresses the lack of existing static analysis tools for Hack, so that security vulnerabilities in Slack's Hack code can be detected automatically.

What You'll Learn

1. How to extend an existing static analysis tool to support a new programming language
2. Why static code analysis is critical for security in software development
3. How to create a custom parser for a programming language using Tree-sitter
4. When to apply Semgrep rules for vulnerability detection in code

Prerequisites & Requirements

Understanding of static code analysis concepts
Familiarity with Semgrep and Tree-sitter (optional)
Experience with the Hack programming language (optional)

Key Questions Answered

How did interns at Slack contribute to securing millions of lines of code?

Nicholas Lin and David Frankel developed a static analysis tool for Hack by extending Semgrep, which scans for vulnerabilities in multiple programming languages. They wrote a grammar for Hack that achieved a parse rate of over 99.999% on more than 5 million lines of code, bringing Slack's Hack codebase within reach of automated security scanning.

What challenges exist in static code analysis for Hack?

The main challenge is the absence of existing static analysis tools for Hack, a distinct language derived from PHP. Before Semgrep could detect vulnerabilities in Hack code, a custom grammar and parser had to be developed for the language.

What is the significance of using Tree-sitter in this project?

Tree-sitter generates a parser from Hack's grammar rules. That parser converts source code into a concrete syntax tree (CST), which gives Semgrep a structured representation of Hack code to analyze instead of raw text. Without it, Semgrep could not detect vulnerabilities in Hack at all.
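To make the idea of a concrete syntax tree more tangible, here is a minimal sketch in Python. It is not Tree-sitter and not the Hack grammar: the two-rule arithmetic grammar, the `Node` class, and every function name are hypothetical, chosen only to show that a CST keeps every token (including parentheses), while a later AST step keeps only the meaning.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    """One CST node: inner nodes have children, leaf tokens have text."""
    kind: str
    children: list = field(default_factory=list)
    text: str = ""


TOKEN = re.compile(r"\s*(\d+|[()+])")


def tokenize(src):
    """Split the source into number, '+', '(' and ')' tokens."""
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if not match:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expr(tokens, i=0):
    """Toy rule: expr := term ('+' term)*. Returns (CST node, next index)."""
    term, i = parse_term(tokens, i)
    children = [term]
    while i < len(tokens) and tokens[i] == "+":
        children.append(Node("+", text="+"))
        term, i = parse_term(tokens, i + 1)
        children.append(term)
    return Node("expr", children), i


def parse_term(tokens, i):
    """Toy rule: term := NUMBER | '(' expr ')'."""
    if i >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[i]
    if tok.isdigit():
        return Node("term", [Node("number", text=tok)]), i + 1
    if tok == "(":
        inner, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise SyntaxError("expected ')'")
        return Node("term", [Node("(", text="("), inner, Node(")", text=")")]), i + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def leaves(node):
    """Every token stored in the CST, in source order."""
    if not node.children:
        return [node.text]
    return [text for child in node.children for text in leaves(child)]


def to_ast(node):
    """Reduce the CST to an AST by dropping punctuation and wrapper nodes."""
    if node.kind == "number":
        return int(node.text)
    if node.kind == "term":
        # A term is either [number] or ['(', expr, ')'].
        inner = node.children[0] if len(node.children) == 1 else node.children[1]
        return to_ast(inner)
    if node.kind == "expr":
        operands = [to_ast(c) for c in node.children if c.kind == "term"]
        result = operands[0]
        for operand in operands[1:]:
            result = ("add", result, operand)
        return result
    raise ValueError(f"unexpected node kind {node.kind!r}")


cst, _ = parse_expr(tokenize("(1 + 2) + 3"))
print(leaves(cst))   # ['(', '1', '+', '2', ')', '+', '3']
print(to_ast(cst))   # ('add', ('add', 1, 2), 3)
```

The parentheses survive in the CST but disappear in the AST, where nesting alone carries the grouping. A real grammar for a language like Hack has far more rules, but the relationship between the two trees is the same.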

How does Semgrep enhance security in Slack's codebase?

Semgrep applies custom rules to the abstract syntax tree (AST) derived from Hack code to identify security vulnerabilities. This process automates the detection of issues, ensuring that new code adheres to security standards before deployment.
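The idea of applying a rule to a syntax tree can be illustrated with a small stand-in. The sketch below uses Python's standard `ast` module on Python source, not Semgrep on Hack, and the `find_calls` helper and the sample snippet are hypothetical. It shows why matching on the tree is more precise than searching text: a call to a banned function is flagged, while the same characters inside a string literal are not.

```python
import ast


def find_calls(source, banned):
    """Return sorted (line, name) pairs for direct calls to banned functions."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only real call expressions whose callee is a plain name are matched.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in banned:
                findings.append((node.lineno, node.func.id))
    return sorted(findings)


SAMPLE = '''
user_input = input()
result = eval(user_input)
note = "eval(x) inside a string is not a call"
'''

print(find_calls(SAMPLE, {"eval"}))   # [(3, 'eval')]
```

A plain text search for `eval(` would report both line 3 and line 4 of the sample. The tree-based check reports only the real call, which is the kind of precision that syntax-aware rules are meant to provide.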

Key Statistics & Figures

Lines of code secured

More than 5 million

This is the amount of Hack code at Slack that is now subject to automated security analysis.

Parse rate achieved

Over 99.999%

This is the share of Slack's Hack code that the custom grammar parses successfully.

Reduction in unparsable lines

From over 120,000 to 15

This decrease shows how far the grammar improved over the course of its development.
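The figures above are consistent with one another, which a quick calculation confirms. This sketch assumes the parse rate is the fraction of lines parsed successfully and uses exactly 5,000,000 total lines and 120,000 initially unparsable lines. Both are simplifications, since the source gives them only as "more than 5 million" and "over 120,000".

```python
def parse_rate(total_lines, unparsable_lines):
    """Percentage of lines that the parser handles successfully."""
    return 100.0 * (total_lines - unparsable_lines) / total_lines


TOTAL_LINES = 5_000_000  # assumption: the source says "more than 5 million"

before = parse_rate(TOTAL_LINES, 120_000)  # assumption: "over 120,000" unparsable
after = parse_rate(TOTAL_LINES, 15)

print(f"before: {before:.4f}%")  # before: 97.6000%
print(f"after:  {after:.4f}%")   # after:  99.9997%
```

Under these assumptions, 15 unparsable lines out of 5 million gives 99.9997%, which matches the reported rate of over 99.999%.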

Technologies & Tools

Static Analysis Tool

Semgrep

Used to scan code for vulnerabilities across multiple programming languages, including Hack.

Parser Generator

Tree-sitter

Utilized to create a parser for Hack, converting source code into a concrete syntax tree for analysis.

Key Actionable Insights

1. Integrate static code analysis into your CI/CD pipeline to automate vulnerability detection.
This ensures that security checks are performed on every code change, significantly reducing the risk of vulnerabilities being introduced into production.

2. Develop a custom grammar for your programming language if existing tools do not support it.
This allows you to leverage existing static analysis frameworks like Semgrep, enhancing your security posture without the need for building a tool from scratch.

3. Regularly update and maintain your static analysis tools and rules.
As programming languages evolve, keeping your analysis tools current ensures that they remain effective in identifying new vulnerabilities.
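The first insight, running analysis on every change, usually comes down to a pipeline step that fails the build when blocking findings appear. The sketch below is one possible gate, not Slack's setup. It assumes the scanner writes a JSON report with a top-level `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`. That shape is modeled on Semgrep's JSON output and should be checked against real output from the version in use. The `gate` function and the blocking severity set are hypothetical.

```python
import json

BLOCKING_SEVERITIES = {"ERROR"}  # assumption: only ERROR findings fail the build


def gate(report_json, blocking=BLOCKING_SEVERITIES):
    """Return a process exit code: 1 if any blocking finding exists, else 0."""
    report = json.loads(report_json)
    blocking_findings = [
        finding
        for finding in report.get("results", [])
        if finding.get("extra", {}).get("severity") in blocking
    ]
    for finding in blocking_findings:
        line = finding.get("start", {}).get("line")
        print(f"{finding.get('path')}:{line}: {finding.get('check_id')}")
    return 1 if blocking_findings else 0


# Hypothetical report with one blocking and one non-blocking finding.
EXAMPLE_REPORT = json.dumps({
    "results": [
        {"check_id": "example.sql-injection", "path": "src/db.hack",
         "start": {"line": 42}, "extra": {"severity": "ERROR"}},
        {"check_id": "example.weak-hash", "path": "src/auth.hack",
         "start": {"line": 7}, "extra": {"severity": "WARNING"}},
    ]
})

exit_code = gate(EXAMPLE_REPORT)
```

In a pipeline, the step would pass this exit code to `sys.exit`, so that a blocking finding stops the change from merging while lower-severity findings are left for review.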

Common Pitfalls

1. Neglecting to integrate security checks into the development workflow can lead to vulnerabilities being deployed.
Without automated checks, security issues may go unnoticed until they cause significant problems in production, making proactive measures essential.

2. Assuming existing tools will support all programming languages can result in gaps in security.
It's crucial to assess whether your tools can handle the specific languages your team uses, and if not, be prepared to develop custom solutions.

Related Concepts

Static Code Analysis

Vulnerability Detection

Programming Language Grammars

CI/CD Integration
