Omphalos, Uber’s Parallel and Language-Extensible Time Series Backtesting Tool

Roy Yang

Uber

Overview

Omphalos is Uber's parallel, language-extensible time series backtesting tool, built to evaluate forecasting models and compare them consistently across programming languages. The article covers its design, its implementation, and the backtesting methods it uses to improve Uber's forecasting.

What You'll Learn

1

How to implement a sliding window backtesting procedure for time series forecasting

2

Why language-extensibility is crucial for model evaluation in diverse programming environments

3

How to leverage Omphalos for efficient model comparison across different languages

Prerequisites & Requirements

Understanding of time series forecasting concepts
Familiarity with programming in Go, R, and Python (optional)

Key Questions Answered

What are the two forms of backtesting used in Omphalos?

Omphalos employs two forms of backtesting: sliding window and expanding window. The sliding window method balances model accuracy and training time, while the expanding window is suited for time series with limited historical data. Each method has specific hyperparameters that dictate how training and forecasting are conducted.
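The difference between the two window types is easiest to see from the index ranges they produce. The following is a minimal sketch, not Omphalos's actual implementation: the function names and the hyperparameters `train_size`, `min_train_size`, `horizon`, and `step` are assumptions chosen for illustration, since the summary does not name the real ones.

```python
from typing import Iterator, List, Tuple

Split = Tuple[range, range]  # (training indices, forecast indices)


def sliding_window_splits(n: int, train_size: int, horizon: int, step: int) -> Iterator[Split]:
    """Yield fixed-length training windows that move forward by `step`."""
    start = 0
    while start + train_size + horizon <= n:
        train_end = start + train_size
        yield range(start, train_end), range(train_end, train_end + horizon)
        start += step


def expanding_window_splits(n: int, min_train_size: int, horizon: int, step: int) -> Iterator[Split]:
    """Yield training windows that always start at index 0 and grow by `step`."""
    train_end = min_train_size
    while train_end + horizon <= n:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step


def describe(splits) -> List[Tuple[int, int, int, int]]:
    """Summarize each split as (train_start, train_end, test_start, test_end), ends exclusive."""
    return [(tr.start, tr.stop, te.start, te.stop) for tr, te in splits]


if __name__ == "__main__":
    print(describe(sliding_window_splits(10, train_size=4, horizon=2, step=2)))
    print(describe(expanding_window_splits(10, min_train_size=4, horizon=2, step=2)))
```

In both cases the forecast indices always come after the training indices, so chronological order is preserved. The sliding window keeps the training cost constant per pass, while the expanding window uses all available history, which matters when history is short.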

How does Omphalos facilitate model comparison across programming languages?

Omphalos is designed as a language-extensible framework that allows for the comparison of model performance measures across different programming languages. This ensures that as long as the same backtesting configuration is used, models can be evaluated consistently regardless of the language they are implemented in.
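The idea that a shared backtesting configuration makes scores comparable can be sketched as follows. This is an illustration under stated assumptions, not Omphalos's interface: the `CONFIG` fields, the `backtest` and `compare` helpers, and the two toy models are all hypothetical, and plain Python callables stand in for models that would in practice be written in R, Python, or Go. MAPE is used here only as an example error measure.

```python
from typing import Callable, Dict, List, Sequence

Forecaster = Callable[[Sequence[float], int], List[float]]

# Hypothetical shared configuration; the field names are illustrative only.
CONFIG = {"train_size": 4, "horizon": 2, "step": 2}


def naive_last(history: Sequence[float], horizon: int) -> List[float]:
    """Toy model: repeat the last observed value."""
    return [history[-1]] * horizon


def window_mean(history: Sequence[float], horizon: int) -> List[float]:
    """Toy model: repeat the mean of the training window."""
    return [sum(history) / len(history)] * horizon


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (actuals must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def backtest(series: Sequence[float], model: Forecaster, config: Dict[str, int]) -> float:
    """Average MAPE of `model` over the sliding-window passes defined by `config`."""
    t, h, s = config["train_size"], config["horizon"], config["step"]
    errors = []
    start = 0
    while start + t + h <= len(series):
        train = series[start:start + t]
        actual = series[start + t:start + t + h]
        errors.append(mape(actual, model(train, h)))
        start += s
    if not errors:
        raise ValueError("series is too short for this configuration")
    return sum(errors) / len(errors)


def compare(series: Sequence[float], models: Dict[str, Forecaster], config: Dict[str, int]) -> Dict[str, float]:
    """Score every model on exactly the same passes so the results are comparable."""
    return {name: backtest(series, model, config) for name, model in models.items()}
```

Because every model sees the same training windows and is scored on the same forecast windows, a lower score reflects the model rather than a difference in how the data was split.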

What improvements does Omphalos bring to Uber's forecasting process?

Omphalos enhances Uber's forecasting process by enabling fast, flexible, and accurate comparisons of forecasting models. It streamlines model development: data scientists can quickly identify the best-performing models and integrate them into their workflows, which in turn improves the customer experience.

Key Statistics & Figures

Time saved in forecasting execution

Roughly 150 hours

Omphalos reduced the time taken to run forecasts from 155 hours to a quarter of a day (about 6 hours) by running them concurrently.
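A quarter of a day is 6 hours, so the reported change from 155 hours is roughly a 26-fold reduction. The fan-out pattern behind that kind of gain can be sketched as below. This is only an illustration of the shape of the computation: `forecast_one` and `forecast_all` are hypothetical names, the placeholder model simply returns the last value, and Python threads are used for brevity (CPU-bound Python work would need processes, and Omphalos itself is written in Go).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Sequence


def forecast_one(series: Sequence[float]) -> float:
    """Placeholder model: forecast the next point as the last observed value."""
    return series[-1]


def forecast_all(series_by_id: Dict[str, Sequence[float]], max_workers: int = 8) -> Dict[str, float]:
    """Forecast every series independently, spreading the work over a pool of workers."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, so they line up with the keys.
        results = pool.map(forecast_one, series_by_id.values())
        return dict(zip(series_by_id.keys(), results))
```

Each series is forecast independently of the others, which is what makes the workload easy to run in parallel.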

Average forecast time per time series

0.5 seconds

With the Go implementation of Auto-Forecaster, generating a forecast took 0.5 seconds per time series on average.

Technologies & Tools


Backend

Go

Used to build the Omphalos framework for its robustness and scalability.

Data Analysis

R

Utilized for traditional statistical algorithms in time series forecasting.

Data Analysis

Python

Used for machine learning applications in time series forecasting.

Key Actionable Insights

1
Use the sliding window backtesting method to evaluate model accuracy when dealing with high-frequency time series data.
This method allows for a more reliable assessment of model performance by preserving the chronological order of data, which is crucial for accurate forecasting.

2
Incorporate multiple programming languages into your forecasting models to leverage the strengths of each language.
By using Omphalos, data scientists can compare models developed in R, Python, and Go, ensuring that the best algorithms are chosen based on performance metrics.

3
Implement the Auto-Forecaster API to streamline the forecasting process across various use cases.
This tool allows for quick and accurate predictions with minimal input, making it easier for data scientists to handle a wide range of forecasting scenarios.

Common Pitfalls

1

Failing to preserve the chronological order of data during backtesting can lead to inaccurate model performance assessments.

This mistake often occurs when using arbitrary splits for training and validation sets, which can skew results, especially in rapidly changing environments like Uber's marketplace.
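A small check makes the pitfall concrete. The sketch below contrasts a chronological split with an arbitrary shuffled one and flags any split in which the model would train on observations that come after the ones it is tested on. All function names and the `test_size` parameter are assumptions for illustration.

```python
import random
from typing import List, Tuple


def chronological_split(n: int, test_size: int) -> Tuple[List[int], List[int]]:
    """Train on the earliest points, test on the most recent `test_size` points."""
    cut = n - test_size
    return list(range(cut)), list(range(cut, n))


def shuffled_split(n: int, test_size: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Arbitrary split that ignores time order (the pitfall described above)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - test_size
    return sorted(idx[:cut]), sorted(idx[cut:])


def leaks_future(train_idx: List[int], test_idx: List[int]) -> bool:
    """True if any training point is later in time than the earliest test point."""
    return max(train_idx) > min(test_idx)
```

With a chronological split `leaks_future` is always false. With a shuffled split it is almost always true, which means the reported accuracy partly reflects information the model would not have had at forecast time.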

Related Concepts

Time Series Forecasting Methodologies

Backtesting Techniques

Model Evaluation Strategies