Amdocs Accelerates Generative AI Performance and Lowers Costs with NVIDIA NIM

Liad Levi-Raz

Telecommunications companies (telcos) are leveraging generative AI to increase employee productivity by automating processes, improving customer experiences…

NVIDIA

•

Liad Levi-Raz

•10 min read•intermediate•

--

•View Original

Generative AIGPTGPT-4JSONLangChainXML

Overview

The article discusses how Amdocs is leveraging NVIDIA NIM to enhance generative AI performance while reducing operational costs in telecommunications. It highlights the development of the amAIz platform, which utilizes NVIDIA DGX Cloud and various LLMs to improve customer service efficiency through AI-driven solutions.

What You'll Learn

1

How to deploy LLMs using NVIDIA NIM for enhanced performance

2

Why parameter-efficient fine-tuning methods like LoRA are beneficial for LLMs

3

How to reduce token consumption in AI applications through data reformatting

Prerequisites & Requirements

Understanding of generative AI and LLMs
Familiarity with NVIDIA DGX Cloud and NVIDIA NIM(optional)

Key Questions Answered

How does Amdocs utilize NVIDIA NIM for generative AI?

Amdocs uses NVIDIA NIM to deploy finetuned LLMs, enabling high throughput and low latency for generative AI applications. This deployment is facilitated through self-hosted instances that expose OpenAI-like API endpoints, streamlining AI application development and improving operational efficiency.

What improvements did Amdocs achieve in AI response accuracy?

Amdocs reported accuracy improvements of up to 30% in AI-generated responses after collaborating with NVIDIA. This enhancement is crucial for the adoption of generative AI services in the telecommunications industry, ensuring that customer inquiries are addressed more effectively.

What are the cost reductions achieved by Amdocs using NVIDIA infrastructure?

Amdocs achieved a reduction of tokens consumed by 60% in data preprocessing and 40% in inferencing while maintaining accuracy. This significant cost efficiency allows for lower operational expenses in deploying generative AI solutions.

How did Amdocs improve latency in their AI applications?

By deploying LLMs on NVIDIA NIM, Amdocs reduced query latency by approximately 80%. This improvement ensures that end users receive near real-time responses, enhancing the overall customer experience across various services.

Key Statistics & Figures

Accuracy improvement

up to 30%

This improvement was achieved in AI-generated responses after integrating NVIDIA technologies.

Reduction in tokens consumed for data preprocessing

60%

This reduction was noted in the operational costs associated with deploying generative AI applications.

Reduction in query latency

approximately 80%

This enhancement was made possible through the deployment of LLMs on NVIDIA NIM.

Technologies & Tools

Backend

Nvidia Nim

Used for deploying LLMs and optimizing AI inference.

Backend

Nvidia Dgx Cloud

Provides the infrastructure for training and fine-tuning LLMs.

AI/ML

Openai Gpt-4

Utilized for filtering transcripts and generating question-answer pairs.

AI/ML

Llama2

Used as a baseline model for enhancing customer service chatbots.

AI/ML

Mixtral

Another LLM used in the fine-tuning experiments.

Key Actionable Insights

1
Leverage NVIDIA NIM to deploy LLMs for faster AI inference in your applications.
Using NVIDIA NIM can significantly enhance the performance of AI applications by reducing latency and improving throughput, which is crucial for real-time customer interactions.

2
Implement parameter-efficient fine-tuning techniques like LoRA to optimize your LLMs.
These techniques allow for effective model training with limited data, making it easier to adapt models to specific use cases without extensive computational resources.

3
Reformat your input data to minimize token consumption and improve processing efficiency.
By reducing the complexity of input data formats, you can achieve substantial savings in operational costs while maintaining the quality of AI outputs.

Common Pitfalls

1

Failing to properly format input data can lead to inefficiencies in LLM performance.

Complex data formats can overwhelm the model's context window, resulting in suboptimal responses. Simplifying and reformatting data is essential for maximizing the effectiveness of AI models.

2

Neglecting the importance of fine-tuning can result in poor model accuracy.

Without proper fine-tuning, LLMs may not perform well in specific domains, leading to inaccuracies in generated responses. Employing techniques like LoRA can significantly enhance model performance.

Related Concepts

Generative AI Applications

Large Language Models (llms)

Parameter-efficient Fine-tuning

AI Inference Optimization