LlamaIndex Agentic RAG
Connecting data-aware agents to your private knowledge base using advanced indexing.
Learn how to build intelligent agents that can reason over your private data using LlamaIndex's powerful RAG capabilities.
What is Agentic RAG?#
Agentic RAG combines the retrieval power of RAG with the reasoning capabilities of AI agents. Instead of simple query-response patterns, agents can:
- Plan multi-step retrieval strategies
- Combine information from multiple sources
- Reason about retrieved context
- Take actions based on findings
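Stripped of any framework, that behavior is a control loop: a policy repeatedly chooses a tool until it decides it can answer. A toy sketch in plain Python (stub functions only; none of these names are LlamaIndex APIs):

```python
def agentic_answer(question, tools, decide, max_steps=5):
    """Toy agentic-RAG loop: `decide` (standing in for the LLM)
    picks a tool each turn until it chooses to answer."""
    context = []
    for _ in range(max_steps):
        action = decide(question, context)
        if action["tool"] == "answer":
            return action["input"]
        # Run the chosen tool and add its output to the working context
        context.append(tools[action["tool"]](action["input"]))
    return "gave up after max_steps"

# Stub policy: search first, then answer from what was retrieved
def decide(question, context):
    if not context:
        return {"tool": "search", "input": question}
    return {"tool": "answer", "input": context[0]}

tools = {"search": lambda q: f"retrieved passage about: {q}"}
print(agentic_answer("vacation policy", tools, decide))
# → retrieved passage about: vacation policy
```

Real agents replace `decide` with an LLM call and `tools` with retrievers, which is exactly what the rest of this guide wires up.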
Installation#
```bash
pip install llama-index llama-index-agent-openai
pip install llama-index-vector-stores-chroma
```

Building Your Knowledge Base#
Document Loading#
```python
from llama_index.core import SimpleDirectoryReader

# Load documents from a directory
documents = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    filename_as_id=True,
).load_data()

print(f"Loaded {len(documents)} documents")
```

Creating the Index#
```python
from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure models
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-4-turbo")

# Create vector index
index = VectorStoreIndex.from_documents(documents)

# Persist for later use
index.storage_context.persist(persist_dir="./storage")
```

Query Engine as a Tool#
Transform your index into an agent tool:
```python
from llama_index.core.tools import QueryEngineTool

# Create a query engine over the index
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",
)
```
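What `similarity_top_k` controls can be seen in a bare-bones similarity search, independent of LlamaIndex (toy two-dimensional vectors; real embeddings have hundreds of dimensions):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunks, k):
    """Return the k chunk texts whose embeddings are closest to the query."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["text"] for c in scored[:k]]

chunks = [
    {"text": "vacation policy", "vec": [1.0, 0.1]},
    {"text": "expense reports", "vec": [0.0, 1.0]},
    {"text": "paid time off",   "vec": [0.9, 0.2]},
]
print(top_k([1.0, 0.0], chunks, k=2))
# → ['vacation policy', 'paid time off']
```

A larger `k` gives the LLM more context at the cost of more tokens and more chance of irrelevant passages.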
```python
# Wrap as tool
knowledge_tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="knowledge_base",
    description=(
        "Search the internal knowledge base for information "
        "about company policies and procedures."
    ),
)
```

Creating the Agent#
Basic Agent Setup#
```python
from llama_index.agent.openai import OpenAIAgent

agent = OpenAIAgent.from_tools(
    tools=[knowledge_tool],
    verbose=True,
    system_prompt=(
        "You are a helpful assistant with access to a company knowledge base. "
        "Always search the knowledge base before answering questions about policies."
    ),
)

# Query the agent
response = agent.chat("What is our vacation policy?")
print(response)
```

Multi-Tool Agent#
Combine multiple data sources:
```python
from llama_index.core.tools import FunctionTool

# Create additional tools
def get_current_date() -> str:
    """Returns the current date."""
    from datetime import datetime
    return datetime.now().strftime("%Y-%m-%d")

def calculate(expression: str) -> str:
    """Evaluates a simple arithmetic expression, e.g. '3 * 14'."""
    # Restrict eval to arithmetic; no builtins exposed
    return str(eval(expression, {"__builtins__": {}}, {}))

date_tool = FunctionTool.from_defaults(fn=get_current_date)
calculator_tool = FunctionTool.from_defaults(fn=calculate)

# Create agent with multiple tools
agent = OpenAIAgent.from_tools(
    tools=[knowledge_tool, date_tool, calculator_tool],
    verbose=True,
)
```

Advanced RAG Patterns#
Sub-Question Query Engine#
Break complex queries into simpler sub-queries:
```python
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Create tools for different indices
# (hr_index and tech_index are VectorStoreIndex instances built as above)
tools = [
    QueryEngineTool.from_defaults(
        query_engine=hr_index.as_query_engine(),
        name="hr_policies",
        description="HR policies and employee handbook",
    ),
    QueryEngineTool.from_defaults(
        query_engine=tech_index.as_query_engine(),
        name="technical_docs",
        description="Technical documentation and APIs",
    ),
]
```
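The pattern these tools enable is decompose → answer each sub-question with the right engine → synthesize. A framework-free sketch with stub functions (illustrative names only, not LlamaIndex APIs):

```python
def sub_question_answer(question, decompose, engines, synthesize):
    """Toy sub-question pipeline: split a complex question, answer
    each part with the matching engine, then merge the answers."""
    answers = []
    for sub_q, engine_name in decompose(question):
        answers.append((sub_q, engines[engine_name](sub_q)))
    return synthesize(question, answers)

# Stubs standing in for the LLM decomposer and the query engines
decompose = lambda q: [("vacation days?", "hr"), ("API rate limits?", "tech")]
engines = {"hr": lambda q: "20 days", "tech": lambda q: "100 req/min"}
synthesize = lambda q, pairs: "; ".join(f"{s} {a}" for s, a in pairs)

print(sub_question_answer("Compare HR and API limits", decompose, engines, synthesize))
# → vacation days? 20 days; API rate limits? 100 req/min
```

`SubQuestionQueryEngine` does exactly this, with the LLM performing the decomposition and synthesis steps.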
```python
# Create sub-question engine
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,
)
```

Router Query Engine#
Automatically route queries to the right index:
```python
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector

query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=tools,
)
```

Production Considerations#
Vector Store Integration#
Use production-grade vector stores:
```python
import chromadb
from llama_index.vector_stores.chroma import ChromaVectorStore

# Setup Chroma
chroma_client = chromadb.PersistentClient(path="./chroma_db")
collection = chroma_client.get_or_create_collection("knowledge_base")

# Attach an index to the Chroma-backed store
# (from_vector_store connects to an already-populated collection; to ingest
# new documents, pass a StorageContext built from this vector_store to
# VectorStoreIndex.from_documents instead)
vector_store = ChromaVectorStore(chroma_collection=collection)
index = VectorStoreIndex.from_vector_store(vector_store)
```

Caching and Performance#
```python
from llama_index.core import Settings

# Batch embedding requests to cut latency and cost in production
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    embed_batch_size=100,
)
```

For caching, LlamaIndex provides an `IngestionCache` that attaches to an `IngestionPipeline`, so unchanged documents are not re-parsed or re-embedded on subsequent ingestion runs.

Best Practices#
- Chunking Strategy: Experiment with chunk sizes (512-1024 tokens)
- Metadata Filtering: Add metadata for better retrieval precision
- Hybrid Search: Combine vector and keyword search
- Evaluation: Use RAGAS or LlamaIndex evaluation modules
- Observability: Integrate with tracing tools like Phoenix
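On the chunking point: in LlamaIndex the knob is the node parser (e.g. `Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=50)`), but the underlying trade-off is visible with a bare character-window splitter (plain Python, no framework):

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size windows.
    Overlap keeps sentences that straddle a boundary retrievable
    from either neighboring chunk."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(1200))
chunks = chunk_text(text, chunk_size=512, overlap=50)
print(len(chunks))  # → 3
```

Smaller chunks retrieve more precisely but fragment context; larger chunks preserve context but dilute similarity scores, which is why the 512 to 1024 token range is the usual starting point for experiments.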