How Ramp Accelerated Machine Learning Development to Simplify Finance

A walkthrough of why and how Ramp uses Metaflow for machine learning engineering

Peyton McCullough
11 min readadvanced
--
View Original

Overview

The article discusses how Ramp utilized Metaflow to enhance their machine learning development process, thereby simplifying financial operations. It highlights the challenges faced with traditional ML pipelines and how Metaflow improved model deployment speed and developer experience.

What You'll Learn

1

How to shorten the feedback loop for machine learning models using Metaflow

2

Why using AWS-managed services can simplify machine learning infrastructure

3

How to define and run ML pipelines locally and in the cloud with Metaflow

4

When to choose Metaflow over other orchestration tools like Airflow for ML tasks

Prerequisites & Requirements

  • Basic understanding of machine learning concepts and workflows
  • Familiarity with AWS services and infrastructure management(optional)

Key Questions Answered

How did Ramp improve their machine learning model deployment speed?
Ramp improved their model deployment speed by adopting Metaflow, which allowed them to ship eight additional models in just ten months compared to the previous lengthy process. This shift significantly reduced the time from prototype to production, enhancing overall productivity.
What challenges did Ramp face with their initial machine learning setup?
Ramp faced several challenges including long job run times, lack of local pipeline execution, poor logging, and excessive platform involvement, which hindered the developer experience and slowed down model iteration and deployment.
Why did Ramp choose Metaflow over other ML platforms?
Ramp chose Metaflow for its simplicity and velocity, allowing data scientists to define ML pipelines purely in Python and run them both locally and in the cloud without extensive setup. This streamlined their workflow and improved developer autonomy.
What is the role of AWS Batch in Ramp's machine learning setup?
AWS Batch is used by Ramp to schedule and manage jobs on AWS-managed ECS clusters, providing job queue functionality that allows for efficient resource management and scaling. This integration supports their machine learning workflows effectively.

Key Statistics & Figures

Number of models shipped after adopting Metaflow
8
Ramp was able to ship eight additional models in just ten months after adopting Metaflow.
Time taken to build the initial riskiness model
months
The initial riskiness model took months to build due to various challenges in the previous setup.
Total Flow runs at Ramp
6000
Currently, Ramp has more than 6000 Flow runs, which include both scheduled and ad hoc tasks.

Technologies & Tools

Some links below are affiliate links. We may earn a commission if you make a purchase.

ML Framework
Metaflow
Used for training and managing machine learning models.
Cloud Service
AWS Batch
Used to schedule and manage jobs on AWS-managed ECS clusters.
Infrastructure As Code
Terraform
Used to define and manage Metaflow's infrastructure.
Workflow Orchestrator
Airflow
Used for scheduling and triggering Metaflow flows.

Key Actionable Insights

1
Implementing Metaflow can drastically reduce the time taken to deploy machine learning models.
By adopting Metaflow, Ramp was able to ship eight models in ten months, showcasing how the right tools can enhance productivity and streamline workflows.
2
Utilizing AWS-managed services can simplify infrastructure management for machine learning projects.
Ramp's decision to leverage AWS services allowed them to focus more on model development rather than infrastructure complexities, which is crucial for teams looking to scale their ML efforts.
3
Defining ML pipelines in Python with Metaflow enhances flexibility and ease of use.
This approach allows data scientists to quickly iterate on models and run them locally or in the cloud, which is essential for rapid experimentation and deployment.

Common Pitfalls

1
Relying solely on a single orchestration tool like Airflow for machine learning tasks can lead to inefficiencies.
Airflow is designed for orchestrating compute workloads rather than processing them, which can create bottlenecks in ML workflows. It's important to choose tools that are specifically tailored for ML tasks to avoid such issues.

Related Concepts

Machine Learning Development
AWS Services For ML
ML Pipeline Management
Infrastructure As Code