RAG (Retrieval-Augmented Generation): A Complete Guide That Takes You from Beginner to Clear Understanding
How are documents chunked? How does search work when a query arrives? A step-by-step explanation
Introduction: Why Learn RAG?
Many of today's systems, such as AI chatbots, document search systems, and company assistants, are built using RAG (Retrieval-Augmented Generation).
If you:
- are learning Data Science
- want to learn AI/LLM engineering
- want to build a chatbot or a document search system
then understanding RAG is an essential skill.
In this blog we explain RAG step by step, so that even a beginner with zero prior knowledge can follow along.
What Is RAG? (Simple Definition)
RAG = Retrieval + Generation
This means:
- Retrieval → finding the relevant information in your documents
- Generation → the LLM generating an answer
In simple words:
RAG is a system in which the AI first looks up the right information in your documents and then generates its answer based on that information.
Understanding It Through a Real-Life Example
Imagine you have:
- 1000 PDF files
- Company policy
- Research notes
A user asks:
"How many days of sick leave do I get?"
Instead of guessing an answer, the AI:
- finds the document section related to sick leave
- reads that part
- answers based on what it read
This process is RAG.
Main Parts of a RAG System
A RAG system typically has these 5 components:
- Documents
- Chunking
- Embeddings
- Vector Database
- LLM
Step 1: Document Loading (Preparing the Data)
We start with a large document.
Example:
- Leave policy
- Sick leave
- Work from home
This document is written as long paragraphs.
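As a minimal sketch of this loading step, assume the documents are plain-text `.txt` files in a single folder. PDFs would first need a separate text-extraction step, which is not shown. The function name `load_documents` is hypothetical and chosen only for illustration.

```python
from pathlib import Path


def load_documents(folder: str) -> dict[str, str]:
    """Read every .txt file in `folder` and return {file name: file text}."""
    documents = {}
    # sorted() keeps the loading order stable between runs
    for path in sorted(Path(folder).glob("*.txt")):
        documents[path.name] = path.read_text(encoding="utf-8")
    return documents
```

Calling `load_documents("policies")` would return one entry per text file in a folder named `policies`; the folder name is only an example.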
Problem:
Searching one large document directly is slow.
So the next step is:
Chunking
Step 2: Chunking (the Most Important Concept)
Chunking means:
splitting a large document into small pieces (chunks).
Example:
Original document:
- Section 1 → Leave policy
- Section 2 → Sick leave
- Section 3 → Work from home
After chunking:
- Chunk 1 (Leave Policy): Employees can take 20 days annual leave.
- Chunk 2 (Sick Leave): Employees can take sick leave when ill.
- Chunk 3 (Work From Home): Employees can work from home when approved.
Why Create Chunks?
- Search becomes faster
- Only the relevant part is retrieved
- Memory is used more efficiently
- Accuracy improves, because the LLM sees only focused, relevant text
What Should the Chunk Size Be?
A typical rule of thumb:
- 300–500 words per chunk
- 50–100 words of overlap
Example (500-word chunks with a 50-word overlap):
- Chunk 1 → words 1–500
- Chunk 2 → words 451–950
- Chunk 3 → words 901–1400
Why overlap?
So that an important sentence is not lost when it falls on the boundary between two chunks.
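Word-based chunking with overlap can be sketched as a sliding window. This is one simple approach under the assumption that splitting on whitespace is good enough; real pipelines often split on sentences or tokens instead. The function name `chunk_words` and its defaults are illustrative, not taken from any library.

```python
def chunk_words(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into chunks of `chunk_size` words that share `overlap` words."""
    if overlap < 0 or overlap >= chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_words("a b c d e f g h", chunk_size=4, overlap=1))
# → ['a b c d', 'd e f g', 'g h']
```

In the small example, each chunk starts with the last word of the previous one, which is the overlap at work.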
Step 3: Embedding (Turning Text into Numbers)
To let a computer work with the meaning of text,
the text is converted into a vector (a list of numbers).
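Example (shortened to four numbers for illustration; real embeddings usually have hundreds of dimensions or more):
Text: "Sick leave allowed when ill"
Vector: [0.78, 0.22, 0.19, 0.55]
This process is called embedding.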
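To make the idea of "text in, numbers out" concrete, here is a toy stand-in for an embedding model. It only counts words from a small, made-up vocabulary, so it captures word overlap rather than real meaning. An actual RAG system would call a trained embedding model instead. `VOCABULARY` and `toy_embed` are hypothetical names.

```python
import math
import re

# Hypothetical toy vocabulary; a real embedding model learns its own features.
VOCABULARY = ["leave", "sick", "ill", "annual", "home", "work"]


def toy_embed(text: str) -> list[float]:
    """Count vocabulary words in `text`, then scale the vector to unit length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = [float(tokens.count(word)) for word in VOCABULARY]
    norm = math.sqrt(sum(c * c for c in counts))
    # Leave an all-zero vector unchanged to avoid dividing by zero
    return [c / norm for c in counts] if norm else counts


print(toy_embed("Sick leave allowed when ill"))
```

The output has one number per vocabulary word, and texts that share words end up with vectors that point in similar directions.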
Step 4: Vector Database (Storage)
Now the vectors for all chunks are stored in a vector database.
Common vector databases and vector search libraries:
- FAISS
- Pinecone
- Chroma
- Weaviate
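What such a store does can be sketched in a few lines: keep (text, vector) pairs and return the ones closest to a query vector. The class below is a hypothetical in-memory stand-in, not the API of any of the tools listed above. It compares the query against every stored vector, whereas real vector databases use specialised indexes to stay fast on large collections. Cosine similarity is assumed as the closeness measure.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: a list plus brute-force search."""

    def __init__(self) -> None:
        self._items: list[tuple[str, list[float]]] = []

    def add(self, text: str, vector: list[float]) -> None:
        self._items.append((text, vector))

    def search(self, query_vector: list[float], k: int = 1) -> list[tuple[float, str]]:
        """Return the k stored texts most similar to the query, best first."""
        scored = [(cosine_similarity(query_vector, vec), text) for text, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = InMemoryVectorStore()
store.add("Leave policy", [1.0, 0.0, 0.0])    # made-up vectors for illustration
store.add("Sick leave", [0.7, 0.7, 0.0])
store.add("Work from home", [0.0, 0.0, 1.0])
print(store.search([0.6, 0.8, 0.0], k=1)[0][1])  # → Sick leave
```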
Step 5: What Happens When a Query Arrives?
A user asks:
"How many days of sick leave do I get?"
Query Processing: Step by Step
Step 1: Convert the Query into a Vector
"How many days of sick leave?" → Query Vector
Step 2: Vector Search
The system compares the query vector with each stored chunk vector:
- Chunk 1 → Leave policy
- Chunk 2 → Sick leave ← MATCH
- Chunk 3 → Work from home
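The text does not say how the comparison is scored. A common choice, assumed here, is cosine similarity between the query vector q and a chunk vector c. The worked numbers use made-up two-dimensional vectors.

```latex
\[
\operatorname{sim}(q, c)
  = \frac{q \cdot c}{\lVert q \rVert \, \lVert c \rVert}
  = \frac{\sum_{i} q_i c_i}{\sqrt{\sum_{i} q_i^2}\,\sqrt{\sum_{i} c_i^2}}
\]

% Worked example with q = (1, 0) and c = (0.6, 0.8)
\[
\operatorname{sim}(q, c)
  = \frac{1 \cdot 0.6 + 0 \cdot 0.8}{\sqrt{1^2 + 0^2}\,\sqrt{0.6^2 + 0.8^2}}
  = \frac{0.6}{1 \cdot 1}
  = 0.6
\]
```

A score near 1 means the two vectors point in almost the same direction. The chunk with the highest score is treated as the match.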
Step 3: Send the Relevant Chunk to the LLM
Question: "How many days of sick leave?"
Context: "Sick leave allowed for 10 days."
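In practice the question and the retrieved context are combined into a single prompt string before being sent to the LLM. The template below is one possible wording, not a required format, and `build_prompt` is a hypothetical helper name.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Combine the retrieved chunks and the user's question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt("How many days of sick leave?", ["Sick leave allowed for 10 days."]))
```

Telling the model to rely only on the supplied context is what keeps it from guessing when the documents do not cover the question.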
Step 4: Generate the Final Answer
Employees can take 10 days of sick leave.
A Simple Real-Life Analogy
Imagine you are in a library.
Someone asks:
"Where is the Python book?"
You don't read the whole library.
Instead, you:
- Search the index
- Find the relevant shelf
- Open the book
- Give the answer
Library = Vector DB
Search = Retrieval
Answer = Generation
That is RAG.
Simple RAG Pipeline Summary
- Load document
- Split into chunks
- Convert chunks to vectors
- Store in vector database
- Convert query to vector
- Search similar chunks
- Send chunks to LLM
- Generate final answer
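The pipeline above can be sketched end to end in one small script. Everything here is a simplified stand-in chosen for illustration: the embedding is a word-count vector, the "database" is a plain list, similarity is cosine, and `echo_llm` is a placeholder that returns the retrieved context instead of calling a real model. Loading and splitting are assumed to be done already.

```python
import math
import re
from typing import Callable


def embed(text: str, vocabulary: list[str]) -> list[float]:
    """Toy embedding: count how often each vocabulary word appears in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in vocabulary]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def answer(question: str, chunks: list[str], llm: Callable[[str], str], k: int = 1) -> str:
    # Steps 1-2 (load and split) are assumed done: `chunks` is a list of texts.
    # Step 3: build a vocabulary from the chunks and turn each chunk into a vector.
    vocabulary = sorted({w for c in chunks for w in re.findall(r"[a-z]+", c.lower())})
    # Step 4: "store" the (chunk, vector) pairs in a plain list.
    index = [(chunk, embed(chunk, vocabulary)) for chunk in chunks]
    # Step 5: convert the query to a vector in the same space.
    query_vector = embed(question, vocabulary)
    # Step 6: rank chunks by similarity to the query and keep the top k.
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    context = "\n".join(chunk for chunk, _ in ranked[:k])
    # Steps 7-8: send context plus question to the LLM and return its answer.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)


def echo_llm(prompt: str) -> str:
    """Placeholder LLM: returns the first context line instead of calling a model."""
    return prompt.split("\n")[1]


chunks = [
    "Leave Policy: Employees can take 20 days annual leave.",
    "Sick Leave: Sick leave allowed for 10 days.",
    "Work From Home: Employees can work from home when approved.",
]
print(answer("How many days of sick leave?", chunks, echo_llm))
# → Sick Leave: Sick leave allowed for 10 days.
```

Swapping `echo_llm` for a function that calls a real model, and `embed` for a real embedding model, would leave the overall flow unchanged.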
Final Summary
If you remember only these four points, you understand RAG:
- The document is split into chunks
- The chunks are converted into vectors
- The query is used to search those vectors
- The LLM generates the answer
That is the full RAG workflow.