July 17, 2024

NVIDIA AI Workbench powers application development

Editor's Note: This post is part of the AI Decoded series, which demystifies AI by making the technology more accessible, and showcases new hardware, software, tools and accelerations for NVIDIA RTX PC and workstation users.

The demand for tools that simplify and optimize generative AI development is skyrocketing. Applications based on retrieval-augmented generation (RAG), a technique for improving the accuracy and reliability of generative AI models with data fetched from specific external sources, and on custom models let developers adapt AI models to their specific needs.
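The RAG pattern described above can be sketched in a few lines: retrieve the most relevant external text for a query, then prepend it to the model prompt as grounding context. This is a minimal illustrative sketch, not the Workbench implementation; a real pipeline would use an embedding model and a vector database, for which a toy word-overlap score stands in here.

```python
# Toy RAG sketch: retrieval by word overlap, then prompt construction.

def score(query: str, doc: str) -> float:
    """Toy relevance score: fraction of query words found in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model's answer in the retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

docs = [
    "AI Workbench runs on laptops, workstations, data centers and the cloud.",
    "RAG improves model accuracy with data from external sources.",
]
prompt = build_prompt("What does RAG improve?", docs)
```

The key design point is that the model never needs retraining: relevance is decided at query time, so swapping in new source documents immediately changes the grounding the model sees.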

While this type of work may have required complex setup in the past, new tools make it easier than ever.

NVIDIA AI Workbench simplifies AI developer workflows by helping users create their own RAG projects, customize models and more. It is part of the RTX AI Toolkit, a suite of software development kits and tools for customizing, optimizing and deploying AI capabilities, launched at Computex earlier this month. AI Workbench removes the complexity of technical tasks that can derail experts and stall beginners.

What is NVIDIA AI Workbench?

Available for free, NVIDIA AI Workbench lets users develop, experiment with, test and prototype AI applications on the GPU systems of their choice, from laptops and workstations to data centers and the cloud. It offers a new approach to creating, using and sharing GPU-enabled development environments across people and systems.

A simple installation lets users get AI Workbench running on a local or remote machine in just a few minutes. Users can then start a new project or replicate one from the examples on GitHub. Everything works through GitHub or GitLab, so users can collaborate and share work easily. Learn more about getting started with AI Workbench.

How AI Workbench helps address AI project challenges

Developing AI workloads can require manual, often complex processes from the beginning.

Configuring GPUs, updating drivers and managing version incompatibilities can be cumbersome. Reproducing projects on different systems may mean repeating manual processes over and over. Inconsistencies when replicating projects, such as data fragmentation and version-control issues, can hinder collaboration. And varied setup processes, the movement of credentials and secrets, and changes to the environment, data, models and file locations can all limit a project's portability.

AI Workbench makes it easy for data scientists and developers to manage their work and collaborate across heterogeneous platforms. It integrates and automates various aspects of the development process, offering:

  • Ease of setup: AI Workbench streamlines the process of setting up a GPU-accelerated developer environment, even for users with limited technical knowledge.
  • Seamless collaboration: AI Workbench integrates with version control and project management tools like GitHub and GitLab, reducing friction when collaborating.
  • Consistency when scaling from on-premises to the cloud: AI Workbench ensures consistency across multiple environments and supports scaling up or down from local workstations or PCs to data centers or the cloud.

RAG for documents, easier than ever

NVIDIA offers sample Workbench projects to help users get started with AI Workbench. The Hybrid RAG Workbench project is one example: it runs a custom, text-based RAG web application with a user's documents on their local workstation, PC or remote system.

Each Workbench project runs in a "container," software that includes all the components needed to run the AI application. The hybrid RAG sample pairs a Gradio chat interface on the host machine with a containerized RAG server: the backend that serves a user's requests and routes queries to and from the vector database and the selected large language model.
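The routing the RAG server performs can be summarized in one function: the query first goes to the vector database for context, then on to the selected LLM together with that context. This is an illustrative sketch only; the function names and stub services below are assumptions, not the project's actual code.

```python
# Minimal sketch of the RAG server's routing logic.

def handle_query(query, vector_db_search, llm_generate):
    """Route a user query through retrieval, then generation."""
    context = vector_db_search(query)        # nearest chunks from the vector DB
    prompt = f"{context}\n\nUser: {query}"   # ground the LLM in retrieved text
    return llm_generate(prompt)

# Stubs standing in for the real vector database and LLM services:
answer = handle_query(
    "What is AI Workbench?",
    vector_db_search=lambda q: "AI Workbench simplifies AI development.",
    llm_generate=lambda p: f"Answer based on: {p.splitlines()[0]}",
)
```

Because the server only ever sees two callable services, either side can be swapped independently, which is what makes the container-plus-host split workable.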

This Workbench project supports a wide variety of LLMs available on NVIDIA's GitHub page. Additionally, the hybrid nature of the project allows users to select where to run inference.

Workbench Projects allows users to version the code and development environment.

Developers can run the embedding model on the host machine and run inference locally on a Hugging Face Text Generation Inference server, on target cloud resources using NVIDIA inference endpoints such as the NVIDIA API Catalog, or with self-hosted microservices such as NVIDIA NIM or third-party services.
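Picking among those inference modes amounts to selecting an endpoint at runtime. The sketch below shows one way that choice might be modeled; the mode names and URLs are illustrative assumptions, not the project's actual configuration.

```python
# Hedged sketch of selecting an inference target for the hybrid RAG app.
from dataclasses import dataclass

@dataclass
class InferenceTarget:
    name: str
    url: str

# Illustrative endpoints for each mode (URLs are placeholders):
TARGETS = {
    "local": InferenceTarget("Hugging Face TGI (local)", "http://localhost:8080/generate"),
    "cloud": InferenceTarget("NVIDIA API Catalog endpoint", "https://integrate.api.nvidia.com/v1"),
    "self-hosted": InferenceTarget("NVIDIA NIM microservice", "http://nim.internal:8000/v1"),
}

def select_target(mode: str) -> InferenceTarget:
    """Resolve an inference mode to the endpoint that will serve generation."""
    if mode not in TARGETS:
        raise ValueError(f"unknown inference mode: {mode!r}")
    return TARGETS[mode]
```

The rest of the pipeline stays identical whichever target is chosen, which is what makes the project "hybrid": only the generation endpoint moves.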

The Hybrid RAG Workbench project also includes:

  • Performance metrics: Users can evaluate how RAG-based and non-RAG-based user queries perform in each inference mode. Metrics tracked include retrieval time, time to first token (TTFT) and token velocity.
  • Retrieval transparency: A panel displays the exact text chunks, retrieved from the most contextually relevant content in the vector database, that are fed into the LLM to improve the relevance of the answer to the user's query.
  • Response customization: Responses can be modified with a variety of parameters, such as maximum tokens to generate, temperature, and frequency penalty.
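The latency metrics in the list above have simple definitions that are worth making concrete. This is an illustrative sketch of how TTFT and token velocity might be computed from a streamed response's token arrival timestamps, not the project's actual metrics code.

```python
# TTFT and token velocity from token arrival timestamps (seconds).

def ttft(request_time: float, token_times: list[float]) -> float:
    """Time to first token: delay between sending the request and the first token."""
    return token_times[0] - request_time

def token_rate(token_times: list[float]) -> float:
    """Tokens per second once generation has started."""
    if len(token_times) < 2:
        return 0.0
    elapsed = token_times[-1] - token_times[0]
    return (len(token_times) - 1) / elapsed

# Request sent at t=10.0 s; four tokens arrive 0.1 s apart starting at t=10.5 s.
times = [10.5, 10.6, 10.7, 10.8]
```

TTFT captures perceived responsiveness (how long the user waits before anything appears), while token rate captures sustained throughput; the two can differ sharply between local and cloud inference modes.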

To get started with this project, simply install AI Workbench on a local system. The Hybrid RAG Workbench project can be copied from GitHub into the user's own account and duplicated on the local system.

There are more resources available in the AI Decoded User Guide. Additionally, community members provide helpful video tutorials, like the one from Joe Freeman below.

Customize, optimize, implement

Developers often look to customize AI models for specific use cases. Fine-tuning, a technique that alters a model by training it with additional data, can be useful for style transfer or for changing the model's behavior. AI Workbench helps with fine-tuning, too.

The Llama Factory AI Workbench project enables QLoRA, a fine-tuning method that minimizes memory requirements, for a variety of models, as well as model quantization, through a simple graphical user interface. Developers can use public or their own datasets to meet the needs of their applications.
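A back-of-the-envelope calculation shows why QLoRA keeps memory requirements low: the base model's weights are held in 4-bit precision, while only small low-rank adapter matrices are trained at higher precision. The sketch below is illustrative arithmetic under stated assumptions (a 7B-parameter model, 32 layers, a 4096-wide hidden dimension, four adapted weight matrices per layer), not the Llama Factory implementation.

```python
# Rough memory arithmetic behind QLoRA.

def base_weights_gib(n_params: float, bits: int) -> float:
    """Memory for the frozen base weights at the given precision, in GiB."""
    return n_params * bits / 8 / 2**30

def lora_params(n_layers: int, d_model: int, rank: int,
                matrices_per_layer: int = 4) -> int:
    """Trainable LoRA parameters: two rank-r factors (d_model x r) per adapted matrix."""
    return n_layers * matrices_per_layer * 2 * d_model * rank

# A 7B-parameter model: 16-bit vs 4-bit base weights.
fp16 = base_weights_gib(7e9, 16)  # roughly 13 GiB
nf4 = base_weights_gib(7e9, 4)    # roughly 3.3 GiB, a 4x reduction
```

With the base weights quantized to 4 bits, the trainable adapter parameters number in the tens of millions rather than billions, which is what brings fine-tuning within reach of a single RTX workstation or PC.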

Once fine-tuning is complete, the model can be quantized for improved performance and a smaller memory footprint, then deployed to native Windows applications for local inference or to NVIDIA NIM for cloud inference. Find a complete tutorial for this project in the NVIDIA RTX AI Toolkit repository.

Truly Hybrid: Run AI Workloads Anywhere

The Hybrid-RAG Workbench project described above is hybrid in more ways than one. In addition to offering an inference mode option, the project can run locally on NVIDIA RTX workstations and GeForce RTX PCs, or scale out to remote cloud servers and data centers.

The ability to run projects on systems of the user's choosing, without the infrastructure setup overhead, extends to all Workbench projects. Find more examples and instructions for fine-tuning and customization in the AI Workbench quick-start guide.

Generative AI is transforming gaming, video conferencing, and interactive experiences of all kinds. Get the latest and greatest by subscribing to the AI Decoded Newsletter.
