RAG

AWS: Retrieval-Augmented Generation (RAG) is the process of optimizing the output of a large language model, so it references an authoritative knowledge base outside of its training data sources before generating a response.
OpenAI: Retrieval Augmented Generation (RAG) is a technique that improves a model’s responses by injecting external context into its prompt at runtime.
Google: RAG (Retrieval-Augmented Generation) is an AI framework that combines the strengths of traditional information retrieval systems (such as search and databases) with the capabilities of generative large language models (LLMs).

RAG can be understood as a method that retrieves useful data from external data sources (Retrieval) and passes it to the LLM to augment it (Augmented), so that the LLM can produce a better answer (Generation).

1. RAG Architecture

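According to the AWS documentation [1], RAG works as shown in the figure below.

Previously, the user's query was passed to the LLM as-is. With RAG, the system first queries an external data source (in AWS terms, a Knowledge Base) for related documents and passes the relevant information to the LLM as context. The LLM thus receives information it did not have as context, which lets it give a better answer.

Step by step, a preprocessing stage first converts the data into vectors and stores them in a Vector DB.

Then, at runtime, similar documents are retrieved from the Vector DB and the Query + Relevant Information is passed to the LLM, augmenting what the LLM can do.

Like AWS, IBM presents the following basic concept for a RAG architecture [2].

The AI engineer has to process the data in advance: unstructured data is turned into vector embeddings and stored in a Vector DB. Later, when the model is queried, the relevant information is retrieved from the Vector DB and passed to the LLM together with the query (Query + Relevant Information).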
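The two phases above (offline preprocessing, then retrieval and augmentation at runtime) can be sketched end to end in plain Python. This is a toy illustration under clear assumptions: the hypothetical `embed()` below is a bag-of-words counter standing in for a real embedding model, an in-memory list stands in for the Vector DB, and `retrieve()` / `build_prompt()` are illustrative names, not part of any library.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    A real system would call a model such as all-MiniLM-L6-v2 here."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# 1) Preprocessing: embed each document once and keep (vector, text) pairs.
#    This list plays the role of the Vector DB.
documents = [
    "EKS cluster upgrade runbook and troubleshooting notes",
    "CI/CD pipeline built with GitHub Actions and Argo CD",
    "Team lunch menu for Friday",
]
vector_store = [(embed(doc), doc) for doc in documents]


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """2) Runtime retrieval: embed the query and rank stored documents by similarity."""
    q = embed(query)
    ranked = sorted(vector_store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str) -> str:
    """3) Augmentation: combine Query + Relevant Information into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query))
    return f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


print(build_prompt("EKS cluster troubleshooting experience"))
```

In the real setup later in this post, ChromaDB plays the role of `vector_store` and all-MiniLM-L6-v2 replaces the word counter.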

2. Embedding

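The most essential step in a RAG system is converting text data into vectors. This is called embedding, and it is performed by an embedding model.

What is an embedding?

An embedding converts text into an array of numbers (a vector). For example, the sentence “RAG augments the LLM” is represented by hundreds of numbers such as [0.041, 0.056, -0.018, -0.012, ...]. These vectors make it possible to compute the semantic similarity between pieces of text mathematically.

Sentences with similar meanings are placed close together in the vector space, while sentences with different meanings end up far apart. For example, “The puppy runs” and “The dog is running” are close together in the vector space, while “The sky is blue” is far away.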
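Closeness in the vector space is usually measured with cosine similarity, the cosine of the angle between two vectors. The sketch below uses hand-picked 3-dimensional vectors that are invented purely for illustration (they are not output from any real model) to show the similar pair scoring close to 1 and the unrelated pair scoring much lower.

```python
import math


def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|); 1.0 means same direction, ~0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical 3-dimensional "embeddings" chosen by hand for illustration;
# real models output hundreds of dimensions (384 for all-MiniLM-L6-v2).
puppy_runs = [0.9, 0.8, 0.1]
dog_running = [0.85, 0.75, 0.2]
sky_is_blue = [0.1, 0.2, 0.95]

print(round(cosine_similarity(puppy_runs, dog_running), 3))  # close to 1
print(round(cosine_similarity(puppy_runs, sky_is_blue), 3))  # much lower
```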

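Vector types

Embedding models mainly produce two types of vectors:

Floating-point Vector (float32)

Uses 32 bits per dimension
Represents the meaning of the text with high precision
The default output of most embedding models
Example: [0.041, 0.056, -0.018, -0.012, -0.020, ...]

Binary Vector

Uses only 1 bit per dimension (0 or 1)
Needs 32 times less storage (32 bits → 1 bit)
Less precise, but efficient for processing large volumes of data
Example: [1, 1, 0, 0, 0, ...]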
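One common way to obtain a binary vector is sign-based quantization: keep a 1 where the float32 value is positive and a 0 otherwise, which turns the float32 example above into the binary example. Treat this as an assumption about the method; individual models and vector DBs may binarize differently. The NumPy sketch shows the 32x storage reduction for a 384-dimensional vector and the Hamming distance (number of differing bits) typically used to compare binary vectors.

```python
import numpy as np


def binarize(vec: np.ndarray) -> np.ndarray:
    """Sign-based binary quantization: 1 where the float is positive, else 0."""
    return (vec > 0).astype(np.uint8)


float_vec = np.array([0.041, 0.056, -0.018, -0.012, -0.020], dtype=np.float32)
print(binarize(float_vec))  # [1 1 0 0 0]

# Storage for one 384-dimensional vector (the all-MiniLM-L6-v2 size)
rng = np.random.default_rng(0)
full = rng.standard_normal(384).astype(np.float32)
packed = np.packbits(binarize(full))  # packs 8 dimensions into each byte
print(full.nbytes, packed.nbytes)     # 1536 48 -> 32x smaller

# Binary vectors are compared with Hamming distance (count of differing bits)
other = np.packbits(binarize(rng.standard_normal(384).astype(np.float32)))
hamming = int(np.unpackbits(np.bitwise_xor(packed, other)).sum())
print(hamming)
```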

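Representative embedding models

How well an embedding model captures the meaning of text largely determines how well a RAG system performs. Representative models include:

all-MiniLM-L6-v2: 384 dimensions, a fast and efficient open-source model (HuggingFace)
OpenAI text-embedding-3: high-accuracy commercial models (text-embedding-3-small, text-embedding-3-large)
Cohere Embed: a commercial model with strong multilingual support
Amazon Titan Text Embeddings: AWS's embedding model, available through Amazon Bedrock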

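Vectorization process

Text chunking: split long documents into chunks of a suitable size (e.g., 1,000 characters)
Apply the embedding model: convert each chunk into a vector
Store in a Vector DB: store the resulting vectors in a vector DB such as ChromaDB or Pinecone
Similarity search: vectorize the user's query and retrieve the most similar documents (using a metric such as cosine similarity)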
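Steps 1 and 4 can be sketched in a few lines. `chunk_text()` uses the same fixed-size, overlapping character windows as the indexing code later in this post (1,000 characters with a 200-character overlap, i.e. a new chunk every 800 characters), and `top_k_cosine()` ranks stored vectors by cosine similarity. Both names are hypothetical, and the 2-dimensional vectors are made up for the example.

```python
import numpy as np


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Fixed-size character chunks; consecutive chunks share `overlap` characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)
            if text[i:i + chunk_size].strip()]


def top_k_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Indices of the k rows of doc_vecs most cosine-similar to query_vec."""
    doc_norm = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    q_norm = query_vec / np.linalg.norm(query_vec)
    scores = doc_norm @ q_norm           # one cosine score per stored vector
    return np.argsort(-scores)[:k].tolist()


doc = "".join(chr(97 + i % 26) for i in range(2500))  # 2,500-character dummy document
chunks = chunk_text(doc)
print([len(c) for c in chunks])  # [1000, 1000, 900, 100]

doc_vecs = np.array([[1.0, 0.0], [0.7, 0.7], [0.0, 1.0]])
print(top_k_cosine(np.array([1.0, 0.1]), doc_vecs, k=2))  # [0, 1]
```

With these settings the last 100-character chunk of a 2,500-character document lies entirely inside the chunk before it; the indexing function later in this post behaves the same way.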

Data vectorized this way is used at runtime to quickly find the documents that are semantically closest to the user's question.

3. Test

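Let's set up a local test that applies RAG to a model served through Ollama.

Concept: augment the LLM with my past work through RAG, and feed it my LinkedIn and Wanted resumes so that it can answer the way I would, based on the work I have actually done.
Structure:
- All of my work is stored as Markdown files under a specific path (rag_path), so only .md files are searched.
- My profile is stored in the me/ folder as linkedin.pdf, wanted.pdf, and summary.txt (the code below reads them from ../me/).

First, the test environment needs a vector DB and an embedding model that can vectorize text chunks. This test uses the following:

Vector DB: Chroma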
Embedding Model: all-MiniLM-L6-v2

from dotenv import load_dotenv
from openai import OpenAI
from pypdf import PdfReader
import gradio as gr
import chromadb
from chromadb.utils import embedding_functions

print("✅ Libraries imported")

Ollama setup

# Load environment variables
load_dotenv(override=True)

# OpenAI client pointed at Ollama's OpenAI-compatible endpoint
openai = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama"  # required by the client, but ignored by Ollama
)
model_name = "gpt-oss:20b-cloud"

print(f"✅ OpenAI client configured (model: {model_name})")

Download your profile PDFs from LinkedIn, Wanted, etc., save them in the me/ folder, and run the code below. summary.txt is a short self-introduction.

# Read the LinkedIn PDF
reader = PdfReader("../me/linkedin.pdf")
linkedin = ""
for page in reader.pages:
    text = page.extract_text()
    if text:
        linkedin += text

print(f"✅ LinkedIn profile loaded ({len(linkedin)} characters)")

# Read the Wanted PDF (if present)
try:
    reader = PdfReader("../me/wanted.pdf")
    wanted = ""
    for page in reader.pages:
        text = page.extract_text()
        if text:
            wanted += text
    print(f"✅ Wanted profile loaded ({len(wanted)} characters)")
except FileNotFoundError:
    wanted = ""
    print("⚠️  No Wanted profile (optional)")

# Read the summary
with open("../me/summary.txt", "r", encoding="utf-8") as f:
    summary = f.read()

print(f"✅ Summary loaded ({len(summary)} characters)")

System prompt setup

name = "bys"
print(f"✅ Name set: {name}")

system_prompt = f"""You are acting as {name}. You are answering questions on {name}'s website, \
particularly questions related to {name}'s career, background, skills and experience. \
Your responsibility is to represent {name} for interactions on the website as faithfully as possible. \
You are given a summary of {name}'s background and LinkedIn profile which you can use to answer questions. \
Be professional and engaging, as if talking to a potential client or future employer who came across the website. \
If you don't know the answer, say so."""

system_prompt += f"\n\n## Summary:\n{summary}\n\n"
system_prompt += f"## LinkedIn Profile:\n{linkedin}\n\n"
if wanted:
    system_prompt += f"## Wanted Profile:\n{wanted}\n\n"
system_prompt += f"With this context, please chat with the user, always staying in character as {name}."

print(f"✅ System prompt built ({len(system_prompt)} characters)")

ChromaDB initialization

# ============================================
# RAG system initialization
# ============================================

import chromadb
from chromadb.utils import embedding_functions
import os
from tqdm import tqdm

print("🚀 Initializing the RAG system...")

# Create a ChromaDB client (persisted to local files)
chroma_client = chromadb.PersistentClient(path="./chroma_db")

# Embedding model (all-MiniLM-L6-v2 via sentence-transformers)
embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"  # 384 dimensions, fast and efficient
)

print("✅ ChromaDB client created")
print("✅ Embedding model loaded (all-MiniLM-L6-v2)")

Defining the indexing function

# ============================================
# Markdown file indexing function
# ============================================

def index_markdown_files(rag_path, collection_name="work_cases"):
    """
    Read Markdown files and store them in ChromaDB as vectors.
    
    Args:
        rag_path: path containing the Markdown files to index
        collection_name: name of the ChromaDB collection
    
    Returns:
        collection: the ChromaDB collection object
    """
    
    print(f"\n📁 Scanning path: {rag_path}")
    
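    # Delete the existing collection (full re-index)
    try:
        chroma_client.delete_collection(name=collection_name)
        print(f"⚠️  Deleted existing collection '{collection_name}'")
    except Exception:
        pass  # the collection did not exist yet
    
    # Create a new collection. "hnsw:space": "cosine" makes query() return
    # cosine distance (1 - cosine similarity) instead of Chroma's default L2,
    # which is what search_relevant_cases() assumes when it computes 1 - distance.
    collection = chroma_client.create_collection(
        name=collection_name,
        embedding_function=embedding_function,
        metadata={
            "description": "Work cases and project documentation",
            "hnsw:space": "cosine",
        }
    )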
    
    # Find all Markdown files
    markdown_files = []
    for root, dirs, files in os.walk(rag_path):
        for file in files:
            if file.endswith('.md'):
                file_path = os.path.join(root, file)
                markdown_files.append(file_path)
    
    print(f"📄 Markdown files found: {len(markdown_files)}")
    
    if len(markdown_files) == 0:
        print("⚠️  No Markdown files found!")
        print(f"💡 Check the path: {rag_path}")
        return collection
    
    # Process each file
    documents = []
    metadatas = []
    ids = []
    doc_id = 0
    
    for file_path in tqdm(markdown_files, desc="📝 Indexing"):
        try:
            # Read the file
            with open(file_path, 'r', encoding='utf-8') as f:
                content = f.read()
            
            # Skip empty files
            if not content.strip():
                continue
            
            # Split long files into chunks
            chunk_size = 1000  # 1,000 characters per chunk
            overlap = 200      # 200-character overlap (preserves context across chunk boundaries)
            
            for i in range(0, len(content), chunk_size - overlap):
                chunk = content[i:i+chunk_size]
                
                if not chunk.strip():
                    continue
                
                # Add the chunk and its metadata
                documents.append(chunk)
                metadatas.append({
                    "file_path": file_path,
                    "file_name": os.path.basename(file_path),
                    "chunk_id": i // (chunk_size - overlap),
                    "total_length": len(content)
                })
                ids.append(f"doc_{doc_id}")
                doc_id += 1
                
        except Exception as e:
            print(f"\n⚠️  Failed to read file: {file_path} - {e}")
    
    # Add to ChromaDB in batches
    if documents:
        print(f"\n🔄 Converting {len(documents)} chunks to vectors...")
        
        # Add in fixed-size batches (memory efficiency)
        batch_size = 100
        for i in range(0, len(documents), batch_size):
            batch_docs = documents[i:i+batch_size]
            batch_metas = metadatas[i:i+batch_size]
            batch_ids = ids[i:i+batch_size]
            
            collection.add(
                documents=batch_docs,
                metadatas=batch_metas,
                ids=batch_ids
            )
        
        print("✅ Indexing complete!")
        print(f"   📁 Files: {len(markdown_files)}")
        print(f"   📦 Total chunks: {len(documents)}")
        print("   💾 Stored in: ./chroma_db")
    else:
        print("⚠️  Nothing to index!")
    
    return collection

print("✅ Indexing function defined")

Indexing
In my case, the path is “/Users/bys/workspace/work/cases”.

# ============================================
# Index the cases data
# ============================================

# 🔥 Change this to your actual path!
rag_path = "/Users/bys/workspace/work/cases"

# Check that the path exists
if os.path.exists(rag_path):
    print(f"✅ Path found: {rag_path}")
    
    # Run indexing
    collection = index_markdown_files(rag_path)
    
    # Check the result
    count = collection.count()
    print("\n📊 Final result:")
    print(f"   {count} vectors stored in total.")
    
else:
    collection = None  # lets later cells detect that indexing did not run
    print(f"❌ Path not found: {rag_path}")
    print("💡 Check the path and try again.")

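Defining the search function

def search_relevant_cases(query, collection, top_k=3, min_similarity=0.6):
    """
    Search for cases related to the question.
    
    Args:
        query: the search question
        collection: the ChromaDB collection
        top_k: number of results to retrieve
        min_similarity: minimum similarity threshold (default: 0.6 = 60%)
    
    Returns:
        relevant_content: formatted search results as text
        search_results: list of search results
    """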
    
    if collection is None:
        return "", []
    
    try:
        # Run the vector search
        results = collection.query(
            query_texts=[query],
            n_results=top_k
        )
        
        # No results
        if not results['documents'][0]:
            return "", []
        
        # Format the results
        relevant_content = ""
        search_results = []
        
        for i, (doc, metadata, distance) in enumerate(zip(
            results['documents'][0],
            results['metadatas'][0],
            results['distances'][0]
        )):
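            # Convert distance to similarity. Assumes the collection uses cosine
            # distance ("hnsw:space": "cosine"), where distance = 1 - cosine similarity;
            # with Chroma's default L2 distance this conversion is not valid.
            similarity = 1 - distance
            
            # Keep only results at or above the threshold
            if similarity >= min_similarity: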
                search_results.append({
                    'content': doc,
                    'file_name': metadata['file_name'],
                    'file_path': metadata['file_path'],
                    'chunk_id': metadata['chunk_id'],
                    'similarity': similarity
                })
                
                relevant_content += f"\n\n## 📌 Related case {len(search_results)} (similarity: {similarity:.2%})\n"
                relevant_content += f"**File:** {metadata['file_name']}\n"
                relevant_content += f"**Chunk:** {metadata['chunk_id']}\n\n"
                relevant_content += f"```\n{doc[:500]}{'...' if len(doc) > 500 else ''}\n```\n"
                relevant_content += "-" * 80
        
        return relevant_content, search_results
        
    except Exception as e:
        print(f"⚠️  Search failed: {e}")
        return "", []

print("✅ Search function defined (minimum similarity: 60%)")
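`search_relevant_cases()` turns a distance into a similarity with `1 - distance`, which is only valid when the collection reports cosine distance. As far as I know, Chroma collections default to squared L2 distance unless cosine is requested (for example with the `"hnsw:space": "cosine"` collection metadata), so it is worth checking how the two relate. The NumPy sketch below verifies the identity for unit-length vectors (to my understanding all-MiniLM-L6-v2 outputs normalized embeddings, but check your model): squared L2 = 2 - 2 * cos, so under L2 the similarity would be `1 - distance / 2`, not `1 - distance`.

```python
import numpy as np

# Two random unit-length 384-dimensional vectors (stand-ins for normalized embeddings)
rng = np.random.default_rng(42)
a = rng.standard_normal(384)
a /= np.linalg.norm(a)
b = rng.standard_normal(384)
b /= np.linalg.norm(b)

cos_sim = float(a @ b)
squared_l2 = float(np.sum((a - b) ** 2))  # what an "l2" space reports
cosine_distance = 1.0 - cos_sim           # what a "cosine" space reports

# For unit vectors: squared L2 = 2 - 2 * cos, so cos = 1 - squared_l2 / 2
print(np.isclose(squared_l2, 2 - 2 * cos_sim))   # True
print(np.isclose(1 - cosine_distance, cos_sim))  # True
print(np.isclose(1 - squared_l2 / 2, cos_sim))   # True
```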

Defining the chatbot function

def chat(message, history):
    """
    Chatbot with integrated RAG:
    - searches ChromaDB for cases related to the question (only similarity >= 60% is used)
    - adds the search results to the system prompt
    - the model generates an answer grounded in the actual cases
    """
    
    # 1. Search for related cases (similarity >= 60%)
    print(f"\n🔍 Search query: '{message}'")
    
    try:
        relevant_cases, search_results = search_relevant_cases(
            message, 
            collection, 
            top_k=3,
            min_similarity=0.6  # use only results with similarity >= 60%
        )
        
        if search_results:
            print(f"✅ Found {len(search_results)} related case(s) (similarity >= 60%):")
            for i, result in enumerate(search_results):
                print(f"   {i+1}. {result['file_name']} (similarity: {result['similarity']:.2%})")
        else:
            print("⚠️  No related cases with similarity >= 60% (answering from general knowledge)")
    except Exception as e:
        print(f"⚠️  Search failed: {e}")
        relevant_cases = ""
        search_results = []
    
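    # 2. Build the system prompt
    if "patent" in message:
        system = system_prompt + ("\n\nEverything in your reply needs to be in pig latin - "
                                  "it is mandatory that you respond only and entirely in pig latin")
    else:
        system = system_prompt
    
    # 3. Add the related cases (the core of RAG!)
    if relevant_cases:
        system += f"\n\n## 🔍 Related work experience (actual cases from ChromaDB):\n{relevant_cases}\n"
        system += "\n**Important:** Use the cases above to give specific, accurate answers. "
        system += "Base your answer on real project experience, but keep the conversation natural.\n"
    
    # 4. Build the messages. Only role/content are kept from the history, since
    # Gradio's messages-format history may carry extra keys (e.g. "metadata").
    messages = [
        {"role": "system", "content": system}
    ] + [
        {"role": m["role"], "content": m["content"]} for m in history
    ] + [
        {"role": "user", "content": message}
    ]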
    
    # 5. Generate the model's response
    print("🤖 Generating response...")
    response = openai.chat.completions.create(
        model=model_name,
        messages=messages
    )
    reply = response.choices[0].message.content
    
    print("✅ Response generated")
    return reply

print("✅ RAG chatbot function defined")
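The prompt-assembly part of `chat()` (augment the system prompt only when retrieval found something, then append the history and the new user turn) can be isolated into pure functions that are testable without Ollama or ChromaDB. This is a minimal sketch: `build_messages()` and `augment_system_prompt()` are hypothetical helper names, and the extra `"metadata"` key in the sample history is an assumption about what Gradio's messages format may carry.

```python
def augment_system_prompt(system: str, relevant_cases: str) -> str:
    """Append retrieved context to the system prompt only when retrieval found something."""
    if not relevant_cases:
        return system
    return system + f"\n\n## Related work experience (retrieved from ChromaDB):\n{relevant_cases}\n"


def build_messages(system: str, history: list[dict], message: str) -> list[dict]:
    """Assemble an OpenAI-style chat request: system prompt, prior turns, new user turn.
    Only "role" and "content" are kept from each history entry."""
    turns = [{"role": m["role"], "content": m["content"]} for m in history]
    return [{"role": "system", "content": system}] + turns + [{"role": "user", "content": message}]


history = [
    {"role": "user", "content": "Hi", "metadata": None},
    {"role": "assistant", "content": "Hello!", "metadata": None},
]
msgs = build_messages(augment_system_prompt("You are bys.", "EKS upgrade case"), history, "Tell me about EKS")
print([m["role"] for m in msgs])  # ['system', 'user', 'assistant', 'user']
```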

Final check before running the chatbot

print("🔍 Checking the RAG system status...\n")

# 1. Check the ChromaDB client
try:
    chroma_client.heartbeat()  # raises NameError if the client cell was not run
    print("✅ ChromaDB client: OK")
except NameError:
    print("❌ ChromaDB client not found!")

# 2. Check the collection
try:
    if collection is not None:
        count = collection.count()
        print(f"✅ Collection loaded: {count} vectors")
    else:
        print("⚠️  Collection is None. Run initialize_rag.ipynb first.")
except NameError:
    collection = None
    print("❌ Collection not loaded!")

# 3. Check the search function
if collection is not None:
    try:
        test_results = search_relevant_cases("test", collection, top_k=1)
        print("✅ Search function works")
    except Exception as e:
        print(f"❌ Search function error: {e}")

print("\n🚀 All set! Run the chatbot.")

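Gradio chatbot

print("=" * 80)
print("🤖 Starting the RAG chatbot!")
print("=" * 80)
print("\n💡 Tips:")
print("   - Each question automatically searches for related cases")
print("   - Search results are shown in the console\n")

# Launch the Gradio interface
gr.ChatInterface(
    chat, 
    type="messages",
    title="🤖 RAG-Powered Personal Chatbot",
    description=f"An AI chatbot that answers based on {name}'s actual project experience",
    examples=[
        "Do you have EKS troubleshooting experience?",
        "Tell me about your experience building CI/CD pipelines",
        "What experience do you have designing AWS architectures?",
        "Kubernetes deployment automation experience"
    ]
).launch()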

Running the code above lets you chat through the Gradio interface with an LLM that knows my information. The answers are not perfectly accurate, but even this basic RAG architecture lets the model analyze my information and answer reasonably accurately.