Improving Cursor Tab with online RL

Jacob Jackson, Phillip Kravtsov, Shomil Jain

Our new Tab model makes 21% fewer suggestions while achieving a 28% higher accept rate.

Cursor

Overview

The article describes how Cursor uses online reinforcement learning to improve Tab, its model for predicting a developer's next action. It covers the problem of noisy suggestions and how policy gradient methods are used to improve suggestion accuracy, which makes for a smoother coding experience.

What You'll Learn

1. How to implement online reinforcement learning to improve predictive models
2. Why maintaining a high accept rate for suggestions is crucial in coding environments
3. When to apply policy gradient methods for optimizing machine learning models

Key Questions Answered

How does Cursor improve its Tab model using online reinforcement learning?

Cursor improves its Tab model by frequently rolling out new models to users and training on how users respond to the suggestions those models show. This contrasts with providers that train on static datasets, and it lets the model keep improving from live user feedback.
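A minimal sketch of how logged interactions could be turned into training data, assuming each logged request records the context, whether a suggestion was shown, and whether it was accepted. The `InteractionEvent` type, its fields, and the reward values are hypothetical illustrations, not Cursor's actual schema or rewards.

```python
from dataclasses import dataclass

# Hypothetical reward values, chosen only for illustration.
REWARD_ACCEPT = 0.75
REWARD_REJECT = -0.25
REWARD_HIDDEN = 0.0


@dataclass
class InteractionEvent:
    """One logged Tab request (hypothetical schema)."""
    context: str    # stand-in for the editor state the model saw
    shown: bool     # did the model show a suggestion?
    accepted: bool  # did the user accept it? (False when nothing was shown)


def reward(event: InteractionEvent) -> float:
    """Map a logged user reaction to a scalar reward."""
    if not event.shown:
        return REWARD_HIDDEN
    return REWARD_ACCEPT if event.accepted else REWARD_REJECT


def to_training_batch(events):
    """Turn logged events into (context, action, reward) tuples.

    The action is 1 if a suggestion was shown and 0 otherwise.
    """
    return [(e.context, int(e.shown), reward(e)) for e in events]


logs = [
    InteractionEvent("def add(a, b):", shown=True, accepted=True),
    InteractionEvent("# TODO", shown=True, accepted=False),
    InteractionEvent("import os", shown=False, accepted=False),
]
print(to_training_batch(logs))
```

Because the rewards come from the deployed model's own suggestions, each new batch reflects the model currently in users' hands, which is what makes the training "online".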

What is the significance of the accept rate in Cursor's Tab model?

The accept rate is the fraction of shown suggestions that users accept. A low accept rate means many suggestions are wrong, and each unwanted suggestion interrupts the developer's coding flow. Keeping the accept rate high is therefore essential for user productivity.
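The link between a reward signal and the accept rate can be made concrete with a short derivation. Assume, purely for illustration, that a shown suggestion earns reward a > 0 if accepted and -b (with b > 0) if rejected, and that showing nothing earns 0. If p is the probability that the user accepts a given suggestion, showing it has positive expected reward only when p clears a threshold set by the two reward values:

```latex
\mathbb{E}[r \mid \text{show}] = p\,a - (1 - p)\,b > 0
\;\Longleftrightarrow\;
p\,(a + b) > b
\;\Longleftrightarrow\;
p > \frac{b}{a + b}
```

With a = 0.75 and b = 0.25, for example, the threshold is p > 0.25, so a policy that maximizes this reward learns to stay quiet unless it estimates more than a 25% chance of acceptance. Raising b relative to a raises that bar, which tends to push the accept rate up.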

What are policy gradient methods and how are they applied in Cursor's Tab model?

Policy gradient methods update a model's parameters (its policy) in the direction that increases the expected reward of the actions it takes. In Cursor's Tab model, the reward comes from how users respond to suggestions, so the model learns to show suggestions that are likely to be accepted and to hold back those that are not.
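A toy sketch of REINFORCE, the simplest policy gradient estimator, learning when to show a suggestion. Everything here is an assumption for illustration: the four context buckets, their simulated acceptance probabilities, the reward values, and the per-bucket logistic show/hide policy. It only models the show/hide decision, not the content of the suggestion, and is not Cursor's implementation.

```python
import math
import random

# Hypothetical rewards, chosen only for illustration.
REWARD_ACCEPT, REWARD_REJECT, REWARD_HIDDEN = 0.75, -0.25, 0.0

# Toy simulated user: in each context bucket, a shown suggestion is accepted
# with this fixed probability. The numbers are made up.
ACCEPT_PROB = [0.05, 0.10, 0.60, 0.90]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(steps: int = 20_000, lr: float = 0.1, seed: int = 0) -> list[float]:
    """REINFORCE on a Bernoulli show/hide policy with one logit per bucket."""
    rng = random.Random(seed)
    logits = [0.0] * len(ACCEPT_PROB)  # start at P(show) = 0.5 everywhere
    for _ in range(steps):
        ctx = rng.randrange(len(ACCEPT_PROB))
        p_show = sigmoid(logits[ctx])
        show = rng.random() < p_show  # on-policy: sample from the current policy
        if show:
            accepted = rng.random() < ACCEPT_PROB[ctx]
            reward = REWARD_ACCEPT if accepted else REWARD_REJECT
        else:
            reward = REWARD_HIDDEN
        # Derivative of log pi(action | ctx) w.r.t. the logit of a sigmoid policy.
        grad_log_prob = (1.0 if show else 0.0) - p_show
        logits[ctx] += lr * reward * grad_log_prob  # gradient ascent on expected reward
    return [sigmoid(z) for z in logits]


show_probs = train()
for p_acc, p_show in zip(ACCEPT_PROB, show_probs):
    print(f"P(accept)={p_acc:.2f}  learned P(show)={p_show:.3f}")
```

With these toy numbers, showing has expected reward 0.75p - 0.25(1 - p) = p - 0.25, so the learned P(show) should fall toward 0 for the two buckets with acceptance probability below 0.25 and rise toward 1 for the other two. Because each update uses actions sampled from the current policy, data logged under an older model would bias this estimator unless corrected (for example, with importance weighting), which is one reason fast rollouts matter for online RL.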

What improvements were made in the new Tab model compared to the previous version?

The new Tab model makes 21% fewer suggestions while achieving a 28% higher accept rate. In other words, it interrupts users with fewer unwanted suggestions, and the suggestions it does show are accepted more often.
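A quick check of what the two headline numbers imply together, assuming both are relative changes measured against the previous model (the summary does not say exactly how they were measured). Since accepted suggestions equal suggestions shown times accept rate, the two factors multiply; `relative_accepted` is a hypothetical helper for this arithmetic.

```python
def relative_accepted(suggestion_change: float, accept_rate_change: float) -> float:
    """Ratio of accepted suggestions (new / old), given relative changes.

    accepted = shown * accept_rate, so the ratio is the product of the factors.
    """
    return (1.0 + suggestion_change) * (1.0 + accept_rate_change)


ratio = relative_accepted(-0.21, 0.28)  # 0.79 * 1.28
print(f"accepted suggestions: {ratio:.4f}x the previous model")
```

Under this reading, the new model yields about the same number of accepted suggestions (roughly 1% more) while showing 21% fewer suggestions overall.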

Key Statistics & Figures

Reduction in suggestions

21%

The new Tab model makes 21% fewer suggestions compared to the previous version.

Increase in accept rate

28%

The new Tab model has a 28% higher accept rate for its suggestions.

Daily requests handled

400 million

The Tab model processes over 400 million requests per day.

Time to roll out a new model

1.5 to 2 hours

It currently takes Cursor 1.5 to 2 hours to roll out a new model and collect user interaction data.

Key Actionable Insights

1. Implement online reinforcement learning to continuously improve predictive models based on real user data.
This approach allows for rapid adaptation to user behavior, leading to more relevant suggestions and improved user satisfaction.

2. Focus on maintaining a high accept rate for suggestions to enhance developer productivity.
Minimizing incorrect suggestions lets developers stay in their coding flow, with fewer distractions and better overall efficiency.

3. Use policy gradient methods to optimize models against a reward derived from user interactions.
These methods learn from how users respond to the model's own suggestions, so they depend on fresh feedback from the currently deployed model.

Common Pitfalls

1. Failing to account for user context when making suggestions can lead to a low accept rate.
If the model suggests actions without sufficient information about the user's intent, it risks providing irrelevant suggestions, which can frustrate users.