Increasing velocity by modernizing a Python codebase

How we modernized a machine learning codebase to increase velocity, quality, and confidence

Peyton McCullough
15 min readintermediate
--
View Original

Overview

The article discusses how Ramp modernized its Python codebase to increase development velocity by adopting industry best practices in dependency management, formatting, linting, testing, and type checking. It highlights the challenges faced with a monorepo of machine learning workflows and the solutions implemented to improve code quality and developer confidence.

What You'll Learn

1

How to manage dependencies effectively using Poetry

2

Why enforcing consistent code formatting improves collaboration

3

How to implement automated testing with pytest

4

How to use mypy for type checking in Python

Prerequisites & Requirements

  • Basic understanding of Python programming and dependency management
  • Familiarity with Poetry, pytest, and mypy(optional)

Key Questions Answered

What challenges did Ramp face with its Python codebase?
Ramp faced issues with a monorepo that became unwieldy as the codebase and number of contributors grew. This led to difficulties in managing dependencies, increased risk in making changes, and ultimately slowed development velocity.
How did Ramp improve dependency management?
Ramp adopted Poetry for dependency management, which replaced the traditional requirements.txt file with a pyproject.toml file. This change allowed for better handling of direct and transitive dependencies, ensuring consistency across environments and reducing debugging efforts.
What tools did Ramp use for code formatting and linting?
Ramp adopted Ruff for both formatting and linting, which helped enforce consistent code style and catch small errors. This tool was integrated into the CI process to ensure that all code submitted via pull requests adhered to formatting standards.
Why is automated testing important for machine learning workflows?
Automated testing is crucial for machine learning workflows as it allows developers to catch corner cases and bad assumptions in their code. This testing provides confidence when making changes and ensures that critical functions, like data chunking, work correctly across various scenarios.
How does type checking with mypy enhance code quality?
Using mypy for type checking helps catch subtle bugs related to incorrect type annotations. This practice allows developers to make sweeping changes confidently, as type checking can identify potential issues across many parts of the codebase before runtime.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

Dependency Management
Poetry
Used for managing project dependencies and ensuring consistency across environments.
Linting And Formatting
Ruff
Adopted for code formatting and linting to enforce consistent coding standards.
Testing
Pytest
Utilized for writing and running automated tests to validate code functionality.
Type Checking
Mypy
Employed for static type checking to catch type-related bugs in the codebase.

Key Actionable Insights

1
Implementing Poetry for dependency management can significantly streamline the process of handling complex dependencies in Python projects.
By using Poetry, developers can avoid the pitfalls of transitive dependencies and ensure that their environments are consistent, which is especially beneficial in collaborative settings.
2
Adopting an auto-formatter like Ruff can eliminate formatting debates among team members, allowing them to focus on more critical aspects of development.
This approach not only saves time but also improves code readability and maintainability, fostering a more collaborative environment.
3
Regularly integrating automated tests into your development workflow can catch errors early and improve overall code quality.
This practice is particularly important in machine learning projects where assumptions about data can lead to significant downstream issues if not validated.
4
Using type checking with mypy can help enforce better coding practices and reduce runtime errors in Python applications.
By gradually introducing type annotations and using mypy to enforce them, teams can enhance code clarity and prevent bugs related to type mismatches.

Common Pitfalls

1
Failing to manage transitive dependencies can lead to inconsistencies between development environments.
This often occurs when using traditional dependency management methods like requirements.txt, which do not account for the full tree of dependencies, leading to different versions being installed on different machines.
2
Neglecting to enforce code formatting can result in lengthy discussions during code reviews.
Without an auto-formatter, developers may spend excessive time debating formatting styles, which detracts from productive discussions about code functionality.

Related Concepts

Dependency Management
Code Quality Assurance
Automated Testing Strategies
Static Type Checking