A production-ready Retrieval-Augmented Generation (RAG) system that answers questions from your document knowledge base, deployed on AWS with a serverless architecture.
Upload documents → Ask questions → Get accurate, cited answers.
curl -X POST https://your-api.execute-api.us-east-1.amazonaws.com/dev/query \
-H "Content-Type: application/json" \
-d '{"question": "What is the return policy?"}'
{
"answer": "Based on the provided context, items may be returned within 30 days of purchase...",
"sources": [{"source": "return_policy.md", "score": 0.84}]
}
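The same request can be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_query_request`, `format_answer`, and `ask` are illustrative and not part of this repository, and the response shape is assumed to match the example above.

```python
import json
import urllib.request


def build_query_request(api_url: str, question: str) -> urllib.request.Request:
    """Build a POST request carrying the question as a JSON body."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def format_answer(response: dict) -> str:
    """Render the answer followed by its cited sources, best score first."""
    sources = sorted(
        response.get("sources", []), key=lambda s: s["score"], reverse=True
    )
    lines = [response["answer"]]
    for src in sources:
        lines.append(f"  [{src['source']}] score={src['score']:.2f}")
    return "\n".join(lines)


def ask(api_url: str, question: str) -> str:
    """Send the question to a deployed endpoint and return a printable answer."""
    request = build_query_request(api_url, question)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return format_answer(json.load(resp))
```

Calling `ask("https://your-api.execute-api.us-east-1.amazonaws.com/dev/query", "What is the return policy?")` against your own deployment should print the answer with one line per cited source.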
┌─────────────────────────────────────────────────────────────────────────────┐
│                                USER REQUEST                                 │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
                            ┌─────────────────────┐
                            │     API Gateway     │
                            │     (REST API)      │
                            └──────────┬──────────┘
                                       │
                                       ▼
                            ┌─────────────────────┐
                            │   Lambda (Query)    │
                            │                     │
                            │  1. Embed question  │
                            │  2. Search vectors  │
                            │  3. Call Claude     │
                            └──────────┬──────────┘
                                       │
                 ┌─────────────────────┼─────────────────────┐
                 ▼                     ▼                     ▼
        ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
        │  Bedrock Titan  │   │   OpenSearch    │   │   Claude API    │
        │  (Embeddings)   │   │   Serverless    │   │    (Answers)    │
        └─────────────────┘   └─────────────────┘   └─────────────────┘
┌─────────────────────────────────────────────────────────────────────────────┐
│                             DOCUMENT INGESTION                              │
└─────────────────────────────────────────────────────────────────────────────┘
  ┌──────────┐       ┌──────────┐       ┌─────────────────┐
  │    S3    │──────▶│   SQS    │──────▶│ Lambda (Ingest) │
  │  Bucket  │trigger│  Queue   │       │                 │
  └──────────┘       └──────────┘       │ 1. Download doc │
                                        │ 2. Chunk text   │
  Drop files here                       │ 3. Embed chunks │
  Auto-indexed!                         │ 4. Store vectors│
                                        └────────┬────────┘
                                                 │
                                                 ▼
                                        ┌─────────────────┐
                                        │   OpenSearch    │
                                        │   Serverless    │
                                        └─────────────────┘
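In this flow the ingest Lambda receives SQS messages whose bodies are S3 event notifications. A minimal sketch of pulling the uploaded objects out of such an event follows. `extract_s3_objects` is a hypothetical helper, not code from this repository, and the actual handler may be organized differently.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_event: dict) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for every S3 object referenced in an SQS event."""
    objects = []
    for message in sqs_event.get("Records", []):
        notification = json.loads(message["body"])
        # S3 test events ("s3:TestEvent") carry no "Records" key, so they are skipped.
        for record in notification.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (for example, spaces become "+").
            key = unquote_plus(record["s3"]["object"]["key"])
            objects.append((bucket, key))
    return objects
```

Decoding the key before calling S3 matters: passing the encoded form to a download call would fail for any file name that contains spaces or special characters.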
| Component | Service | Purpose |
|---|---|---|
| API | API Gateway | REST endpoint for queries |
| Query Processing | Lambda | Orchestrates RAG pipeline |
| Document Ingestion | Lambda + SQS | Async document processing |
| Vector Store | OpenSearch Serverless | k-NN similarity search |
| Embeddings | Bedrock Titan | 1536-dimension vectors |
| Generation | Claude API | Answer synthesis with citations |
| Storage | S3 | Document uploads |
| IaC | SAM/CloudFormation | Infrastructure as Code |
- Semantic Search: Finds relevant content even without keyword matches
- Source Citations: Every answer references its source documents
- Auto-Ingestion: Drop files in S3 → automatically indexed
- Serverless: Pay only for what you use, scales automatically
- Production-Ready: Error handling, logging, observability
- Python 3.12+
- AWS CLI configured
- SAM CLI installed
- Anthropic API key
# Clone and setup
git clone https://github.com/woodstocksoftware/rag-documentation-assistant.git
cd rag-documentation-assistant
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Set your API key
export ANTHROPIC_API_KEY="your-key-here"
# Run locally with Gradio UI
python app.py
# Open http://localhost:7860
# Build
sam build --template infrastructure/template.yaml
# Deploy
sam deploy \
--capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM CAPABILITY_AUTO_EXPAND \
--parameter-overrides AnthropicApiKey="$ANTHROPIC_API_KEY"
# Configure S3 notifications (replace with your bucket name)
aws s3api put-bucket-notification-configuration \
--bucket rag-documents-YOUR_ACCOUNT_ID-dev \
--notification-configuration '{
"QueueConfigurations": [{
"QueueArn": "arn:aws:sqs:us-east-1:YOUR_ACCOUNT_ID:rag-document-processing-dev",
"Events": ["s3:ObjectCreated:*"]
}]
}'
Upload documents:
aws s3 cp your-document.pdf s3://rag-documents-YOUR_ACCOUNT_ID-dev/
Query the API:
curl -X POST https://YOUR_API.execute-api.us-east-1.amazonaws.com/dev/query \
-H "Content-Type: application/json" \
-d '{"question": "Your question here"}'
rag-documentation-assistant/
├── app.py                      # Gradio UI for local development
├── infrastructure/
│   └── template.yaml           # SAM/CloudFormation template
├── src/
│   ├── ingestion/
│   │   ├── chunker.py          # Document chunking logic
│   │   ├── embeddings.py       # Local embedding model
│   │   ├── loader.py           # Document text extraction
│   │   └── pipeline.py         # Ingestion orchestration
│   ├── query/
│   │   ├── generator.py        # Claude response generation
│   │   └── rag.py              # RAG pipeline
│   ├── shared/
│   │   └── vector_store.py     # ChromaDB wrapper
│   └── lambda/
│       ├── query/
│       │   └── handler.py      # Query Lambda function
│       └── ingest/
│           └── handler.py      # Ingest Lambda function
├── sample_docs/                # Test documents
└── requirements.txt
- Chunking: Documents are split into overlapping chunks (~500 tokens)
- Embedding: Each chunk is converted to a 1536-dimension vector
- Indexing: Vectors are stored in OpenSearch with k-NN indexing
- Query: User question is embedded and used for similarity search
- Retrieval: Top-k most similar chunks are retrieved
- Generation: Claude generates an answer using retrieved context
- Citation: Sources are tracked and included in the response
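The chunking and retrieval steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `chunker.py` or `vector_store.py`: the 50-token overlap, the pre-tokenized input, and the function names are all hypothetical, and the tiny vectors in the usage note stand in for real 1536-dimension embeddings.

```python
import math


def chunk_tokens(tokens: list[str], size: int = 500, overlap: int = 50) -> list[list[str]]:
    """Split tokens into windows of `size` sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 3):
    """Return the k (source, score) pairs most similar to the query, highest first."""
    scored = [(source, cosine_similarity(query_vec, vec)) for source, vec in indexed]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

For example, `top_k([1.0, 0.0], [("a.md", [1.0, 0.0]), ("b.md", [0.0, 1.0])], k=1)` ranks `a.md` first because its vector points in the same direction as the query. In the deployed system this brute-force scan is replaced by the k-NN index in OpenSearch.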
| Service | Monthly Cost (Dev) |
|---|---|
| OpenSearch Serverless | ~$25-30 |
| Lambda | < $1 |
| Bedrock Titan | < $1 |
| Claude API | ~$5-20 (usage dependent) |
| S3, SQS, API Gateway | < $1 |
| Total | ~$35-50/month |
Delete the stack when not in use to minimize costs.
# Delete all AWS resources
sam delete --stack-name rag-documentation-assistant
- Runtime: Python 3.12
- LLM: Claude Sonnet (Anthropic)
- Embeddings: Amazon Bedrock Titan (AWS) / sentence-transformers (local)
- Vector DB: OpenSearch Serverless (AWS) / ChromaDB (local)
- Infrastructure: AWS SAM, CloudFormation
- UI: Gradio
MIT
Built by Jim Williams | GitHub
The API requires an API key for all requests. Include it in the x-api-key header:
curl -X POST https://YOUR_API.execute-api.us-east-1.amazonaws.com/dev/query \
-H "Content-Type: application/json" \
-H "x-api-key: YOUR_API_KEY" \
-d '{"question": "Your question here"}'
| Limit | Value |
|---|---|
| Daily quota | 1,000 requests |
| Rate limit | 5 requests/second |
| Burst limit | 10 requests |
To get an API key, deploy your own instance or contact the maintainer.
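API Gateway throttling follows a token-bucket model, so a client can mirror the rate and burst limits locally and avoid sending requests that would be rejected. The class below is a minimal sketch of that idea. `TokenBucket` is a hypothetical client-side helper, not part of this project, and it does not track the daily quota.

```python
import time


class TokenBucket:
    """Client-side limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float = 5.0, capacity: int = 10, clock=time.monotonic):
        self.rate = rate          # steady-state requests per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable for testing
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        """Take one token if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.updated
        # Refill in proportion to elapsed time, never beyond the burst capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A caller would check `bucket.try_acquire()` before each request and sleep briefly when it returns `False`. With the defaults, ten requests can go out immediately and further requests are admitted at five per second.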
View real-time metrics at:
https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#dashboards:name=RAG-Documentation-Assistant
| Alarm | Condition | Action |
|---|---|---|
| RAG-Query-Errors | Any Lambda errors | Email alert |
| RAG-Ingest-Errors | Any ingest errors | Email alert |
| RAG-Query-HighLatency | Avg response > 10s | Email alert |
| RAG-API-5xxErrors | Any server errors | Email alert |
| RAG-API-4xxErrors | > 50 client errors/5min | Email alert |
| RAG-SQS-Backlog | > 10 messages stuck | Email alert |
| RAG-Lambda-Throttled | Any throttling | Email alert |
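To make one row of the table concrete, the sketch below builds the parameters for a "any Lambda errors" alarm in the shape accepted by the CloudWatch `PutMetricAlarm` API. It is an assumption about how such an alarm could be defined, not the contents of `scripts/create-alarms.sh`. The function name `rag-query-dev` in the usage note, the 300-second period, and the missing-data handling are illustrative choices.

```python
def lambda_error_alarm(alarm_name: str, function_name: str, topic_arn: str) -> dict:
    """Build keyword arguments for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 300,  # seconds; assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",  # fires on any error
        "TreatMissingData": "notBreaching",  # no invocations is not a failure
        "AlarmActions": [topic_arn],  # SNS topic that sends the email
    }
```

With boto3 installed, the result could be passed as `boto3.client("cloudwatch").put_metric_alarm(**lambda_error_alarm("RAG-Query-Errors", "rag-query-dev", "arn:aws:sns:us-east-1:YOUR_ACCOUNT_ID:rag-alerts-dev"))`.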
# Create SNS topic
aws sns create-topic --name rag-alerts-dev
# Subscribe your email
aws sns subscribe \
--topic-arn arn:aws:sns:us-east-1:YOUR_ACCOUNT_ID:rag-alerts-dev \
--protocol email \
--notification-endpoint your@email.com
# Create alarms
./scripts/create-alarms.sh
