How Two Interns Are Helping Secure Millions of Lines of Code

At Slack, proactively securing our systems is a top priority. One way we achieve this is by automating the detection of security issues with static code analysis tools, which inspect programs without executing them. They’re often used with security-based rules to automate identification of vulnerabilities and insecure programming practices, which frees up more…


Overview

The article describes how two Slack interns, Nicholas Lin and David Frankel, helped secure millions of lines of Hack code by extending Semgrep, an existing static analysis tool, to support the language. Their work addresses the lack of static analysis tooling for Hack, so that security vulnerabilities can be detected automatically.

What You'll Learn

  1. How to extend an existing static analysis tool to support a new programming language
  2. Why static code analysis is critical for security in software development
  3. How to create a custom parser for a programming language using Tree-sitter
  4. When to apply Semgrep rules for vulnerability detection in code

Prerequisites & Requirements

  • Understanding of static code analysis concepts
  • Familiarity with Semgrep and Tree-sitter (optional)
  • Experience with the Hack programming language (optional)

Key Questions Answered

How did interns at Slack contribute to securing millions of lines of code?
Nicholas Lin and David Frankel extended Semgrep, a static analysis tool that scans code for vulnerabilities in multiple programming languages, so that it supports Hack. They wrote a grammar for Hack that achieves a parse rate of over 99.999% on more than 5 million lines of code, which brings Slack's Hack codebase within reach of automated security scanning.
What challenges exist in static code analysis for Hack?
The main challenge is that no existing static analysis tool supported Hack, a language derived from PHP. A custom grammar and parser therefore had to be developed before Semgrep could be used for vulnerability detection.
What is the significance of using Tree-sitter in this project?
Tree-sitter generates a parser from the grammar rules written for Hack. The parser converts source code into a concrete syntax tree (CST), which gives Semgrep a structural view of Hack code rather than plain text and is the basis for the abstract syntax tree (AST) that its rules match against.
How does Semgrep enhance security in Slack's codebase?
Semgrep applies custom rules to the abstract syntax tree (AST) derived from Hack code to identify security vulnerabilities. This automates the detection of issues and helps catch code that violates security standards before it is deployed.
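
To make the rule-matching step in the last answer more concrete, here is a minimal sketch of applying one security rule to a syntax tree. It is an analogy only: it uses Python's standard `ast` module on Python source as a stand-in for Hack, and the rule itself (flagging `eval` calls with a non-literal argument) and the function name `find_dynamic_eval` are invented for illustration. It does not reproduce Semgrep's rule syntax or the rules Slack uses.

```python
import ast

# Hypothetical snippet to scan; line 1 is blank because of the leading newline.
SOURCE = '''
user_input = input()
eval(user_input)
eval("1 + 1")
'''


def find_dynamic_eval(source: str) -> list[int]:
    """Return line numbers of eval() calls whose first argument is not a literal."""
    tree = ast.parse(source)  # source text -> abstract syntax tree
    findings = []
    for node in ast.walk(tree):
        is_eval_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "eval"
        )
        # The rule matches on tree structure, not on raw text, so spacing and
        # formatting differences in the source do not affect the result.
        if is_eval_call and node.args and not isinstance(node.args[0], ast.Constant):
            findings.append(node.lineno)
    return sorted(findings)


print(find_dynamic_eval(SOURCE))  # prints [3]
```

Only the call on line 3 is reported, because the call on line 4 passes a string literal. Matching on the tree is what lets a rule distinguish these two cases, which a plain text search for `eval(` could not do.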

Key Statistics & Figures

Lines of code secured
More than 5 million
This is the amount of Hack code at Slack that is now subject to automated security analysis.
Parse rate achieved
Over 99.999%
This is the share of lines that the custom Hack grammar parses successfully.
Reduction in unparsable lines
From over 120,000 to 15
This decrease shows how far the grammar development effort progressed.
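
The figures above can be cross-checked with a short calculation. The sketch below assumes round numbers (exactly 5,000,000 total lines and exactly 120,000 unparsable lines at the start), whereas the article only gives "more than" and "over" bounds, and it assumes both counts refer to the same corpus. The function name `parse_rate` is illustrative.

```python
def parse_rate(total_lines: int, unparsable_lines: int) -> float:
    """Percentage of lines that parse successfully."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= unparsable_lines <= total_lines:
        raise ValueError("unparsable_lines must be between 0 and total_lines")
    return 100.0 * (total_lines - unparsable_lines) / total_lines


TOTAL_LINES = 5_000_000  # assumption: the article says "more than 5 million"

before = parse_rate(TOTAL_LINES, 120_000)  # assumption: "over 120,000" unparsable lines
after = parse_rate(TOTAL_LINES, 15)

print(f"before: {before:.4f}%")  # prints before: 97.6000%
print(f"after:  {after:.4f}%")   # prints after:  99.9997%
```

Under these assumptions, 15 unparsable lines out of 5 million gives roughly 99.9997%, which is consistent with the stated parse rate of over 99.999%.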

Technologies & Tools

Static Analysis Tool
Semgrep
Used to scan code for vulnerabilities across multiple programming languages, including Hack.
Parser Generator
Tree-sitter
Utilized to create a parser for Hack, converting source code into a concrete syntax tree for analysis.
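
To illustrate what a parser built from grammar rules produces, here is a toy sketch of turning source text into a concrete syntax tree. It is not Tree-sitter and not the Hack grammar: Tree-sitter generates its parsers from a grammar definition, while this example is a hand-written recursive-descent parser for an invented two-rule grammar (`expr := term ('+' term)*`, `term := NUMBER | IDENT`). All names here are hypothetical.

```python
import re
from dataclasses import dataclass, field

# One token per match: optional leading whitespace, then a number, identifier, or '+'.
TOKEN_RE = re.compile(r"\s*(?:(?P<number>\d+)|(?P<ident>[A-Za-z_]\w*)|(?P<plus>\+))")


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def tokenize(source: str) -> list[Node]:
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near offset {pos}")
        tokens.append(Node(match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


def parse_term(tokens: list[Node], pos: int) -> tuple[Node, int]:
    # term := NUMBER | IDENT
    if pos < len(tokens) and tokens[pos].kind in ("number", "ident"):
        return Node("term", children=[tokens[pos]]), pos + 1
    raise SyntaxError(f"expected a term at token {pos}")


def parse_expr(tokens: list[Node], pos: int = 0) -> tuple[Node, int]:
    # expr := term ('+' term)*
    first, pos = parse_term(tokens, pos)
    children = [first]
    while pos < len(tokens) and tokens[pos].kind == "plus":
        operator = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        children += [operator, right]  # a CST keeps every token, operators included
    return Node("expr", children=children), pos


def parse(source: str) -> Node:
    tokens = tokenize(source)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree


def to_sexp(node: Node) -> str:
    if not node.children:
        return f'({node.kind} "{node.text}")'
    return f"({node.kind} " + " ".join(to_sexp(child) for child in node.children) + ")"


print(to_sexp(parse("x + 42")))
# prints (expr (term (ident "x")) (plus "+") (term (number "42")))
```

Input that the grammar does not cover, such as `x + $`, raises `SyntaxError` here. That is the toy equivalent of an unparsable line: the grammar has to be extended until such cases become rare.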

Key Actionable Insights

  1. Integrate static code analysis into your CI/CD pipeline to automate vulnerability detection.
     This ensures that security checks are performed on every code change, significantly reducing the risk of vulnerabilities being introduced into production.
  2. Develop a custom grammar for your programming language if existing tools do not support it.
     This allows you to leverage existing static analysis frameworks like Semgrep, enhancing your security posture without the need for building a tool from scratch.
  3. Regularly update and maintain your static analysis tools and rules.
     As programming languages evolve, keeping your analysis tools current ensures that they remain effective in identifying new vulnerabilities.
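
The first insight, running static analysis on every change, could look roughly like the following CI helper. This is a sketch under assumptions, not Slack's setup: it assumes the `semgrep` CLI is installed and invoked as `semgrep scan --config <rules> --json <target>`, and that its JSON report has a `results` list whose entries carry `check_id`, `path`, `start.line`, and `extra.severity`. Those details are worth checking against the documentation for the Semgrep version in use. The policy of failing only on `ERROR` findings and all function names are hypothetical.

```python
import json
import subprocess

# Assumption: only ERROR-level findings should fail the build.
BLOCKING_SEVERITIES = {"ERROR"}


def run_semgrep(config: str, target: str) -> str:
    """Run the semgrep CLI and return its JSON report as text."""
    proc = subprocess.run(
        ["semgrep", "scan", "--config", config, "--json", target],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: exit codes 0 and 1 both mean the scan itself completed.
    if proc.returncode not in (0, 1):
        raise RuntimeError(f"semgrep failed: {proc.stderr.strip()}")
    return proc.stdout


def blocking_findings(report_json: str) -> list[str]:
    """Format the findings that should block a merge, one line each."""
    report = json.loads(report_json)
    lines = []
    for result in report.get("results", []):
        severity = result.get("extra", {}).get("severity", "")
        if severity in BLOCKING_SEVERITIES:
            location = f'{result["path"]}:{result["start"]["line"]}'
            lines.append(f'{location}: {result["check_id"]}')
    return lines


def ci_exit_code(report_json: str) -> int:
    """Print blocking findings and return the exit code the CI job should use."""
    findings = blocking_findings(report_json)
    for line in findings:
        print(line)
    return 1 if findings else 0
```

A CI job would then call something like `sys.exit(ci_exit_code(run_semgrep("rules/", "src/")))`, where both paths are placeholders. Keeping the report parsing separate from the subprocess call makes the pass/fail policy testable without Semgrep installed.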

Common Pitfalls

  1. Neglecting to integrate security checks into the development workflow can lead to vulnerabilities being deployed.
     Without automated checks, security issues may go unnoticed until they cause significant problems in production, making proactive measures essential.
  2. Assuming existing tools will support all programming languages can result in gaps in security.
     It's crucial to assess whether your tools can handle the specific languages your team uses, and if not, be prepared to develop custom solutions.

Related Concepts

Static Code Analysis
Vulnerability Detection
Programming Language Grammars
CI/CD Integration