NVIDIA AI Blueprints are fast, accessible examples of complete workload pipelines, designed to inspire end users to build their own custom pipelines.
The NVIDIA Enterprise RAG Blueprint is a robust example of such work. It shows how Retrieval Augmented Generation (RAG), one of the most active areas of AI deployment, lets you implement effective AI applications without hyperscaler-level computing resources.
You can get started with the NVIDIA Enterprise RAG Blueprint through a few paths:
- Deploy it via Docker Using On Premises Models
- Deploy it via Docker with NVIDIA Hosted Models
- Use Microway’s Pre-Configured Environment
Whichever Docker path you choose, you’ll need to start at least three sets of containers:
The vector database:
docker compose -f deploy/compose/vectordb.yaml up -d
The ingestor server:
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml up -d
And the RAG server:
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
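Once the three `up -d` commands return, it's worth confirming that everything actually came up before moving on. One simple sketch of that check, using the standard `docker compose ps` subcommand against the same compose files:

```shell
# List the running services for each compose file; any container that
# exited or is restarting will show a non-"running" state here.
docker compose -f deploy/compose/vectordb.yaml ps
docker compose -f deploy/compose/docker-compose-ingestor-server.yaml ps
docker compose -f deploy/compose/docker-compose-rag-server.yaml ps
```

If a service is missing or unhealthy, `docker compose -f <file> logs <service>` is the usual next step for diagnosis.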
Implementation Choices with the NVIDIA Enterprise RAG Blueprint
The Blueprint is flexible enough to permit customization to your exact requirements. You should have sufficient tooling to do the following (and much more):
1. Change the inference model
One of the key decisions in your RAG workload is determining a balance between response time, computational requirements, and computational cost. Different GPUs behave differently for different workloads. The same goes for which model you deploy.
A larger model typically delivers higher-quality responses, but it demands more GPU memory and compute, which drives up both response time and cost.
You may wish to proactively test various model sizes to determine what will deliver the right balance. To switch to the much larger llama-3.3-nemotron-super-49b-v1.5, execute the following commands:
export APP_LLM_MODELNAME='nvidia/llama-3.3-nemotron-super-49b-v1.5'
docker compose -f deploy/compose/docker-compose-rag-server.yaml up -d
Three NVIDIA models form the core of the pipeline:
- nvidia/llama-3.3-nemotron-super-49b-v1.5 (the inference LLM)
- nvidia/llama-3.2-nv-embedqa-1b-v2 (the embedding model)
- nvidia/llama-3.2-nv-rerankqa-1b-v2 (the reranking model)
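To build intuition for why the 49B model is so much more demanding than the 1B embedding and reranking models, a rough back-of-the-envelope sizing helps. This is an assumption-laden rule of thumb, not an official sizing guide: weight memory is roughly parameter count times bytes per parameter (2 bytes for FP16/BF16), before any KV cache or activation overhead.

```python
def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough estimate of GPU memory (GB) needed just to hold model weights.

    Assumes FP16/BF16 weights (2 bytes per parameter) and ignores the
    KV cache, activations, and framework overhead, which add more on top.
    """
    return params_billions * 1e9 * bytes_per_param / 1e9

# The 49B LLM vs. a 1B embedding model:
print(weight_memory_gb(49))  # 98.0 -> ~98 GB of weights alone
print(weight_memory_gb(1))   # 2.0  -> ~2 GB of weights
```

Quantized deployments (e.g. FP8 or INT4) shrink this footprint, which is one reason testing multiple model and precision combinations on your actual hardware is worthwhile.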
2. Don’t just support audio, transcribe it
Diverse data types are a fact of AI life. As the goal of the Enterprise RAG Blueprint is to give you the bones of an AI implementation and inspire your own ideas on new ways to use AI, changing data modality is a good hook.
You don’t have to stick to a single data modality if you implement the right NVIDIA tools. The Blueprint is built to assist by using NVIDIA Riva to enable audio transcription. Follow the instructions to start the audio ingestion containers.
Now you can use the ingestion API to transcribe your audio. In the Blueprint’s ingestion notebook this looks roughly like the following (note that the file list must actually be passed to the upload call; the exact parameter name may differ in your Blueprint version):
FILEPATHS = [
    '../data/audio/sample.mp3',
    '../data/audio/sample.wav'
]
await upload_documents(filepaths=FILEPATHS, collection_name="audio_data")
3. Experiment with Chunk Size to Learn about Tuning
Data ingestion performance matters just as much as inference response time. Tuning chunk size can help deliver reasonable ingestion times and reduce the resources required.
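To see the trade-off concretely, consider how chunk size and overlap change the number of chunks, and therefore the number of embedding calls, for a document. This is an illustrative sketch only; the Blueprint's ingestor performs its own chunking, and the function below just models the arithmetic you are tuning:

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap.

    Larger chunk_size -> fewer chunks -> fewer embedding calls and faster
    ingestion, at the cost of coarser retrieval granularity. Overlap adds
    chunks back in exchange for better context continuity at boundaries.
    """
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "x" * 10_000  # stand-in for a 10,000-character document

print(len(chunk_text(doc, chunk_size=512)))               # 20 chunks
print(len(chunk_text(doc, chunk_size=512, overlap=128)))  # 27 chunks
print(len(chunk_text(doc, chunk_size=2048)))              # 5 chunks
```

Running a small sweep like this against your own corpus, then measuring ingestion time and retrieval quality at each setting, is a quick way to learn how tuning works end to end.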
Workload Customization Ideas
RAG is what we often call a “technology of possibility.” Once you start to see examples of it in action, it will inspire new, loosely related uses in your own domain.
Here are a few workload customization ideas to get you going.
- Research Document Cache. Get fast answers to questions about trends across research output (for example, all published papers) at an academic institution, using plain natural-language queries. Or feed in a large corpus of PDFs on a focused topic and let the pipeline surface the commonalities and trends across papers.
- Knowledge Bot for Documentation. Have a large cache of technical documentation? Don’t waste time searching it. Ingest it and build a chatbot to retrieve the information for you. We’ve heard of this being used in call centers, but it could be equally useful for systems administrators sharing cluster information with users, or even for systems integrators like Microway to rapidly retrieve technical information from documentation.
- Graph Summarizer or Intermediate Data Processor. Looking to better understand what’s going on with a particular dataset? Need to interpret a long list of graphs? Or tabular data? Let AI do it for you! Ingest all the source data and start asking questions.
Try on Microway’s Cluster
Looking to try out a RAG pipeline? Microway provides the Blueprint on our own GPU Test Drive Cluster. Submit a request today and get started quickly.
