Netomi’s lessons for scaling agentic systems into the enterprise

Built with OpenAI GPT‑4.1 and GPT‑5.2, Netomi provides a blueprint for scaling safe, predictable agentic systems across the enterprise.

OpenAI Team
6 min read · advanced

Overview

The article discusses Netomi's approach to scaling agentic systems within enterprises, emphasizing the importance of building for real-world complexity, ensuring low latency through parallelization, and integrating governance into the runtime. It highlights lessons learned from deploying AI solutions for major clients like United Airlines and DraftKings.

What You'll Learn

1. How to design AI systems that handle complex workflows reliably
2. Why low latency is crucial for user trust in AI systems
3. How to integrate governance into AI runtime for safety and compliance

Key Questions Answered

How does Netomi ensure AI agents handle real-world complexities?
Netomi built its Agentic OS to orchestrate the multiple systems that real-world workflows touch, so that AI agents can work across varied data sources and processes without breaking when inputs or steps vary. This design lets the agents operate reliably in unpredictable environments.
What strategies does Netomi use to meet enterprise latency expectations?
Netomi employs a concurrency framework that allows tasks to be executed in parallel rather than sequentially. This approach leverages the low-latency capabilities of GPT-4.1 to maintain responsiveness even during high-demand scenarios, ensuring user trust and satisfaction.
What governance mechanisms does Netomi implement in its AI systems?
Netomi integrates governance directly into the AI runtime, which includes schema validation, policy enforcement, and PII protection. This ensures that AI systems operate safely and in compliance with regulations, especially in sensitive industries.
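To make the governance answer above more concrete, here is a minimal sketch of runtime checks applied to a model's proposed tool call before it executes: schema validation, one policy rule, and PII redaction. Everything in it is an assumption made for illustration, including the tool names, `REFUND_LIMIT`, and the function names; it does not describe Netomi's implementation, and the regex-based PII patterns are deliberately simplistic.

```python
import re
from dataclasses import dataclass

# Hypothetical tool schemas: required argument names and their types.
TOOL_SCHEMAS = {
    "issue_refund": {"order_id": str, "amount": float},
    "lookup_booking": {"confirmation_code": str},
}

# Hypothetical policy: refunds above this amount need human review.
REFUND_LIMIT = 200.0

# Deliberately simple PII patterns (illustrative, not production-grade).
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
]


@dataclass
class Decision:
    allowed: bool
    reason: str


def validate_schema(tool: str, args: dict) -> Decision:
    """Reject unknown tools and arguments that do not match the schema."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        return Decision(False, f"unknown tool: {tool}")
    if set(args) != set(schema):
        return Decision(False, "argument names do not match schema")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            return Decision(False, f"{name} must be {expected.__name__}")
    return Decision(True, "schema ok")


def enforce_policy(tool: str, args: dict) -> Decision:
    """Apply business rules to a call that already passed schema validation."""
    if tool == "issue_refund" and args["amount"] > REFUND_LIMIT:
        return Decision(False, "refund exceeds limit; escalate to a human")
    return Decision(True, "policy ok")


def redact_pii(text: str) -> str:
    """Replace recognizable PII with placeholder labels before logging or replying."""
    for pattern, label in PII_PATTERNS:
        text = pattern.sub(label, text)
    return text


def govern(tool: str, args: dict) -> Decision:
    """Run the checks in order and stop at the first failure."""
    for check in (validate_schema, enforce_policy):
        decision = check(tool, args)
        if not decision.allowed:
            return decision
    return Decision(True, "approved")


if __name__ == "__main__":
    print(govern("issue_refund", {"order_id": "A1", "amount": 500.0}))
    print(redact_pii("Contact jane.doe@example.com or 123-45-6789"))
```

Running the checks in the execution path, rather than auditing after the fact, means a rejected call never reaches the backend system, which is the point of putting governance in the runtime.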

Key Statistics & Figures

Response time during peak traffic
sub-three-second responses
Netomi has maintained this performance while handling over 40,000 concurrent customer requests per second.
Intent classification accuracy
98%
This accuracy was achieved even as workflows involved multiple systems and checks.

Technologies & Tools

AI Model
GPT-4.1
Used for fast, reliable reasoning and tool-calling in real-time workflows.
AI Model
GPT-5.2
Utilized for deeper, multi-step planning and reasoning.
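The division of labor between the two models listed above could be expressed as a small router. The sketch below is only an illustration under stated assumptions: the model identifier strings are taken from the names in this list, and the `Task` fields, the `choose_model` function, and the step threshold are hypothetical rather than anything the article specifies.

```python
from dataclasses import dataclass

# Assumed model identifiers, based on the model names listed above.
FAST_MODEL = "gpt-4.1"
PLANNING_MODEL = "gpt-5.2"


@dataclass
class Task:
    description: str
    expected_steps: int  # hypothetical estimate of how many steps the task needs
    realtime: bool       # True when a customer is waiting on the reply


def choose_model(task: Task, step_threshold: int = 3) -> str:
    """Send real-time or short tasks to the fast model, the rest to the planner."""
    if task.realtime or task.expected_steps <= step_threshold:
        return FAST_MODEL
    return PLANNING_MODEL


if __name__ == "__main__":
    print(choose_model(Task("check flight status", 1, True)))
    print(choose_model(Task("rebook a multi-leg itinerary", 6, False)))
```

A real router would likely weigh more signals than step count, but the shape stays the same: a cheap decision up front that keeps latency-sensitive requests on the faster model.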

Key Actionable Insights

1. Design AI systems to manage real-world complexities by integrating multiple data sources and workflows.
   This approach is essential for applications in industries like airlines, where requests often span various systems and require real-time decision-making.
2. Implement a concurrency framework to handle tasks in parallel, ensuring low latency during peak usage.
   This is particularly important in high-pressure environments, such as customer service during major events, where delays can lead to user abandonment.
3. Embed governance mechanisms within the AI runtime to ensure compliance and safety.
   In regulated industries, such as insurance, this is critical to avoid risks associated with incorrect responses and to maintain trust with users.

Common Pitfalls

1. Failing to account for real-world complexities can lead to system failures.
   Many AI systems are designed with idealized workflows that do not reflect the messy realities of enterprise operations, resulting in brittleness and collapse under variability.
2. Neglecting latency can erode user trust.
   Systems that execute tasks sequentially can hesitate, leading users to abandon them during critical moments, such as high-demand scenarios.
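The latency cost of sequential execution described in the pitfall above can be shown with a small timing comparison. This is a sketch using Python's `asyncio`, not Netomi's concurrency framework: the backend names and the simulated delays are hypothetical, and `asyncio.sleep` stands in for real network calls.

```python
import asyncio
import time

# Hypothetical backends and simulated latencies in seconds.
BACKENDS = [("booking", 0.1), ("loyalty", 0.1), ("policy", 0.1)]


async def call_system(name: str, delay: float) -> str:
    """Stand-in for a backend call; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return f"{name}: ok"


async def run_sequential() -> list:
    """Await each call in turn, so total time is the sum of the delays."""
    return [await call_system(name, delay) for name, delay in BACKENDS]


async def run_parallel() -> list:
    """Start all calls together, so total time is roughly the longest delay."""
    return list(await asyncio.gather(*(call_system(n, d) for n, d in BACKENDS)))


def timed(coro_fn):
    """Run an async function to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn())
    return results, time.perf_counter() - start


if __name__ == "__main__":
    _, t_seq = timed(run_sequential)
    _, t_par = timed(run_parallel)
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

Both versions return the same results in the same order; only the waiting overlaps. This works when the calls are independent of each other, so steps that depend on an earlier result still have to run in sequence.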