NVIDIA AI Blueprints are fast, accessible examples of complete workload pipelines, designed to inspire end users to build their own custom pipelines.
The NVIDIA Enterprise RAG Blueprint is a robust example of such work. It shows how Retrieval Augmented Generation (RAG), one of the most active areas of AI deployment, lets you implement effective AI applications without hyperscaler-level computing resources.
You can get started with the NVIDIA Enterprise RAG Blueprint through a few paths:
- Deploy it via Docker Using On Premises Models
- Deploy it via Docker with NVIDIA Hosted Models
- Use Microway’s Pre-Configured Environment
Whichever Docker path you choose, you’ll need to start at least three sets of containers:
The vector database:
docker compose -f deploy/compose/vectordb.yaml up -d
The ingestor server:
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
And the RAG server:
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
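Once the three `up -d` commands return, it's worth confirming that everything actually came up before moving on. One simple sketch of that check, using the standard `docker compose ps` subcommand against the same compose files:

```shell
# List the running services for each compose file; any container that
# exited or is restarting will show a non-"running" state here.
docker compose -f deploy/compose/vectordb.yaml ps
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml ps
docker compose -f deploy/compose/docker-compose-rag-server.yaml ps
```

If a service is missing or unhealthy, `docker compose -f <file> logs <service>` is the usual next step for diagnosis.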
Implementation Choices with the NVIDIA Enterprise RAG Blueprint
The Blueprint is flexible enough to permit customization to your exact requirements. You should have sufficient tooling to do the following (and much more):
1. Change the inference model
One of the key decisions in your RAG workload is determining a balance between response time, computational requirements, and computational cost. Different GPUs behave differently for different workloads. The same goes for which model you deploy.
A larger model typically delivers higher-quality responses, but it demands more GPU memory and compute, which drives up both response time and cost.
You may wish to proactively test various model sizes to determine what will deliver the right balance. To switch to the much larger llama-3.3-nemotron-super-49b-v1.5, execute the following commands:
export APP_LLM_MODELNAME='nvidia/llama-3.3-nemotron-super-49b-v1.5'
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Three NVIDIA models form the core of the pipeline:
- nvidia/llama-3.3-nemotron-super-49b-v1.5 (the inference LLM)
- nvidia/llama-3.2-nv-embedqa-1b-v2 (the embedding model)
- nvidia/llama-3.2-nv-rerankqa-1b-v2 (the reranking model)
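To build intuition for why the 49B model is so much more demanding than the 1B embedding and reranking models, a rough back-of-the-envelope sizing helps. This is an assumption-laden rule of thumb, not an official sizing guide: weight memory is roughly parameter count times bytes per parameter (2 bytes for FP16/BF16), before any KV cache or activation overhead.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough estimate of GPU memory (GB) needed just to hold model weights.

    Assumes FP16/BF16 weights (2 bytes per parameter) and ignores the
    KV cache, activations, and framework overhead, which add more on top.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9

# The 49B LLM vs. a 1B embedding model:
print(weight_memory_gb(49))  # 98.0 -> ~98 GB of weights alone
print(weight_memory_gb(1))   # 2.0  -> ~2 GB of weights
```

Quantized deployments (e.g. FP8 or INT4) shrink this footprint, which is one reason testing multiple model and precision combinations on your actual hardware is worthwhile.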
2. Don’t just support audio, transcribe it
Diverse data types are a fact of AI life. As the goal of the Enterprise RAG Blueprint is to give you the bones of an AI implementation and inspire your own ideas on new ways to use AI, changing data modality is a good hook.
You don’t have to stick to a single data modality if you implement the right NVIDIA tools. The Blueprint is built to assist by using NVIDIA Riva to enable audio transcription. Follow the instructions to start the audio ingestion containers.
Now you can use the ingestion API to transcribe your audio. In the Blueprint’s ingestion notebook this looks roughly like the following (note that the file list must actually be passed to the upload call; the exact parameter name may differ in your Blueprint version):
FILEPATHS = [
    '../data/audio/sample.mp3',
    '../data/audio/sample.wav'
]
await upload_documents(filepaths=FILEPATHS, collection_name="audio_data")
3. Experiment with Chunk Size to Learn about Tuning
Data ingestion performance matters just as much as inference response time. Tuning chunk size can help deliver reasonable ingestion times and reduce the resources required.
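To see the trade-off concretely, consider how chunk size and overlap change the number of chunks, and therefore the number of embedding calls, for a document. This is an illustrative sketch only; the Blueprint's ingestor performs its own chunking, and the function below just models the arithmetic you are tuning:

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap.

    Larger chunk_size -> fewer chunks -> fewer embedding calls and faster
    ingestion, at the cost of coarser retrieval granularity. Overlap adds
    chunks back in exchange for better context continuity at boundaries.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 10_000  # stand-in for a 10,000-character document

print(len(chunk_text(doc, chunk_size=512)))               # 20 chunks
print(len(chunk_text(doc, chunk_size=512, overlap=128)))  # 27 chunks
print(len(chunk_text(doc, chunk_size=2048)))              # 5 chunks
```

Running a small sweep like this against your own corpus, then measuring ingestion time and retrieval quality at each setting, is a quick way to learn how tuning works end to end.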
Workload Customization Ideas
RAG is what we often call a “technology of possibility.” Once you start to see examples of it in action, it will inspire new, loosely related uses in your own domain.
Here are a few workload customization ideas to get you going.
- Research Document Cache. Get fast answers to questions about trends across research output (for example, all published papers) at an academic institution, using plain natural-language queries. Or feed in a large corpus of PDFs on a focused topic and let the pipeline surface the commonalities and trends across papers.
- Knowledge Bot for Documentation. Have a large cache of technical documentation? Don’t waste time searching it. Ingest it and build a chatbot to retrieve the information for you. We’ve heard of this being used in call centers, but it could be equally useful for systems administrators sharing cluster information with users, or even for systems integrators like Microway to rapidly retrieve technical information from documentation.
- Graph Summarizer or Intermediate Data Processor. Looking to better understand what’s going on with a particular dataset? Need to interpret a long list of graphs? Or tabular data? Let AI do it for you! Ingest all the source data and start asking questions.
Try on Microway’s Cluster
Looking to try out a RAG pipeline? Microway provides the Blueprint on our own GPU Test Drive Cluster. Submit a request today and get started quickly.
