How do we even tell whether performance got better or not?!

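Import the required packages
Load the data we've been using and build an index
Create a RetrievalQA chain
Specify the language model, the chain type, the retriever, and the verbosity level (verbose)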
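Running these steps for real needs an OpenAI key and the course's product catalog, so here is a minimal pure-Python sketch of the same wiring instead. ToyRetrievalQA, keyword_retriever, fake_llm and the three catalog lines are all hypothetical stand-ins (in LangChain the roles are played by RetrievalQA.from_chain_type, a vector-store retriever, and ChatOpenAI). The sketch only shows which piece plays which role.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical mini-catalog; product names borrowed from the debug log further down.
DOCS = [
    "name: Cozy Comfort Pullover Set, Stripe. Pull-on pants have side pockets.",
    "name: Cozy Comfort Fleece Pullover. Kangaroo handwarmer pockets.",
    "name: Ultra-Lofty 850 Stretch Down Hooded Jacket. Part of the DownTek collection.",
]


def words(text: str) -> set:
    """Lowercased word set with punctuation dropped."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_retriever(query: str, k: int = 2) -> List[str]:
    """Toy retriever: the k docs sharing the most words with the query.
    A real vector store would rank by embedding similarity instead."""
    return sorted(DOCS, key=lambda d: len(words(query) & words(d)), reverse=True)[:k]


def fake_llm(prompt: str) -> str:
    """Pretend language model: answers "Yes" only if the context part mentions side pockets."""
    context = prompt.split("Question:")[0]
    return "Yes" if "side pockets" in context else "I don't know"


@dataclass
class ToyRetrievalQA:
    llm: Callable[[str], str]              # the language model
    retriever: Callable[[str], List[str]]  # returns relevant docs for a query
    chain_type: str = "stuff"              # only "stuff" is sketched: all docs go into one prompt
    verbose: bool = False                  # print the prompt before calling the model

    def run(self, query: str) -> str:
        context = "\n\n".join(self.retriever(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        if self.verbose:
            print(prompt)
        return self.llm(prompt)


qa = ToyRetrievalQA(llm=fake_llm, retriever=keyword_retriever, verbose=True)
print(qa.run("Do the Cozy Comfort Pullover Set have side pockets?"))  # Yes
```

Swapping in a retriever that misses the right document makes the same "model" answer "I don't know", a small preview of the retrieval-versus-model point that comes up when reading the debug output.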
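Figure out which data points we want to evaluate the application on
1. Pick data points we consider good examples
Look through some of the data by hand
and write example questions with their correct answers, which can be reused later for evaluation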
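The hand-written examples can be as simple as a list of dicts. The sketch assumes the key names query and answer, which as far as I know are the defaults LangChain's QA evaluation chains look for; the two pairs are the ones that appear as Example 0 and Example 1 in the graded output further down. check_examples is a hypothetical helper that catches a missing key early.

```python
from typing import Dict, List

# Hand-written evaluation examples: a question plus the answer we consider correct.
examples: List[Dict[str, str]] = [
    {
        "query": "Do the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes",
    },
    {
        "query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection",
    },
]


def check_examples(exs: List[Dict[str, str]]) -> int:
    """Raise early if any example lacks a key the evaluation step needs; return the count."""
    for i, ex in enumerate(exs):
        missing = {"query", "answer"} - set(ex)
        if missing:
            raise ValueError(f"example {i} is missing {sorted(missing)}")
    return len(exs)


print(check_examples(examples))  # 2
```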


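This approach doesn't scale well
Going through the examples one by one and figuring out what's going on takes quite a while
Can we automate this?
-> Use the language model itself!
LangChain has a chain that does exactly this
Import the QA generation chain: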
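from langchain.chat_models import ChatOpenAI
from langchain.evaluation.qa import QAGenerateChain
llm_model = "gpt-3.5-turbo"  # model name as it appears in the debug log below
example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI(model=llm_model))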
It loads the documents and generates a question-answer pair from each one
The language model does the generating

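Building the chain from an OpenAI language model lets us generate many more examples
Use the apply_and_parse method (it applies an output parser to the results)
The results come back as dictionaries holding query-answer pairs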
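Under the hood, each generated pair starts out as plain text that an output parser turns into a dict. As far as I know, QAGenerateChain's prompt asks the model for a "QUESTION: ... / ANSWER: ..." layout and parses it with a regex; the pattern and parse_qa below are an assumed, simplified re-creation of that step, not the library's code. The raw string is made up from the text of Example 2 further down.

```python
import re
from typing import Dict

# Assumed raw layout: "QUESTION: ...", one or more newlines, then "ANSWER: ...".
# DOTALL lets a multi-line answer be captured in full.
QA_PATTERN = re.compile(r"QUESTION: (.*?)\n+ANSWER: (.*)", re.DOTALL)


def parse_qa(raw: str) -> Dict[str, str]:
    """Turn one generated QUESTION/ANSWER block into a {'query', 'answer'} dict."""
    match = QA_PATTERN.search(raw)
    if match is None:
        raise ValueError(f"could not parse a QA pair from: {raw!r}")
    return {"query": match.group(1).strip(), "answer": match.group(2).strip()}


# Made-up model output, built from the text of Example 2 further down.
raw_outputs = [
    "QUESTION: What is the approximate weight of the Women's Campside Oxfords per pair?\n"
    "ANSWER: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.",
]
new_examples = [parse_qa(raw) for raw in raw_outputs]
print(new_examples[0]["answer"])
```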
We have examples now, but how do we evaluate?
Run an example through the chain and look at the result

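We pass in a query and get an answer back, but
it's hard to see what's actually happening inside the chain
(which prompt goes into the language model, which documents get retrieved, and so on)
Looking only at the final answer, it's hard to spot errors or risks inside the chain
-> LangChain provides a tool to help with this: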
langchain.debug

Set langchain.debug = True (after import langchain) and rerun the example
[chain/start] [1:chain:RetrievalQA] Entering Chain run with input:
{
"query": "Do the Cozy Comfort Pullover Set have side pockets?"
}
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] Entering Chain run with input:
{
"question": "Do the Cozy Comfort Pullover Set have side pockets?",
"context": ": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.<<<<>>>>>: 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features \nRelaxed fit top with raglan sleeves and rounded hem. \nPull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg. \nImported.<<<<>>>>>: 632\nname: Cozy Comfort Fleece Pullover\ndescription: The ultimate sweater fleece \u2013 made from superior fabric and offered at an unbeatable price. \n\nSize & Fit\nSlightly Fitted: Softly shapes the body. Falls at hip. \n\nWhy We Love It\nOur customers (and employees) love the rugged construction and heritage-inspired styling of our popular Sweater Fleece Pullover and wear it for absolutely everything. From high-intensity activities to everyday tasks, you'll find yourself reaching for it every time.\n\nFabric & Care\nRugged sweater-knit exterior and soft brushed interior for exceptional warmth and comfort. Made from soft, 100% polyester. 
Machine wash and dry.\n\nAdditional Features\nFeatures our classic Mount Katahdin logo. Snap placket. Front princess seams create a feminine shape. Kangaroo handwarmer pockets. Cuffs and hem reinforced with jersey binding. Imported.\n\n \u2013 Official Supplier to the U.S. Ski Team\nTHEIR WILL TO WIN, WOVEN RIGHT IN. LEARN MORE<<<<>>>>>: 151\nname: Cozy Quilted Sweatshirt\ndescription: Our sweatshirt is an instant classic with its great quilted texture and versatile weight that easily transitions between seasons. With a traditional fit that is relaxed through the chest, sleeve, and waist, this pullover is lightweight enough to be worn most months of the year. The cotton blend fabric is super soft and comfortable, making it the perfect casual layer. To make dressing easy, this sweatshirt also features a snap placket and a heritage-inspired Mt. Katahdin logo patch. For care, machine wash and dry. Imported."
}
[llm/start] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] Entering LLM run with input:
{
"prompts": [
"System: Use the following pieces of context to answer the users question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\n: 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.<<<<>>>>>: 73\nname: Cozy Cuddles Knit Pullover Set\ndescription: Perfect for lounging, this knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out. \n\nSize & Fit \nPants are Favorite Fit: Sits lower on the waist. \nRelaxed Fit: Our most generous fit sits farthest from the body. \n\nFabric & Care \nIn the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features \nRelaxed fit top with raglan sleeves and rounded hem. \nPull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg. \nImported.<<<<>>>>>: 632\nname: Cozy Comfort Fleece Pullover\ndescription: The ultimate sweater fleece \u2013 made from superior fabric and offered at an unbeatable price. \n\nSize & Fit\nSlightly Fitted: Softly shapes the body. Falls at hip. \n\nWhy We Love It\nOur customers (and employees) love the rugged construction and heritage-inspired styling of our popular Sweater Fleece Pullover and wear it for absolutely everything. 
From high-intensity activities to everyday tasks, you'll find yourself reaching for it every time.\n\nFabric & Care\nRugged sweater-knit exterior and soft brushed interior for exceptional warmth and comfort. Made from soft, 100% polyester. Machine wash and dry.\n\nAdditional Features\nFeatures our classic Mount Katahdin logo. Snap placket. Front princess seams create a feminine shape. Kangaroo handwarmer pockets. Cuffs and hem reinforced with jersey binding. Imported.\n\n \u2013 Official Supplier to the U.S. Ski Team\nTHEIR WILL TO WIN, WOVEN RIGHT IN. LEARN MORE<<<<>>>>>: 151\nname: Cozy Quilted Sweatshirt\ndescription: Our sweatshirt is an instant classic with its great quilted texture and versatile weight that easily transitions between seasons. With a traditional fit that is relaxed through the chest, sleeve, and waist, this pullover is lightweight enough to be worn most months of the year. The cotton blend fabric is super soft and comfortable, making it the perfect casual layer. To make dressing easy, this sweatshirt also features a snap placket and a heritage-inspired Mt. Katahdin logo patch. For care, machine wash and dry. Imported.\nHuman: Do the Cozy Comfort Pullover Set have side pockets?"
]
}
[llm/end] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain > 4:llm:ChatOpenAI] [85.273ms] Exiting LLM run with output:
{
"generations": [
[
{
"text": "Yes, the Cozy Comfort Pullover Set does have side pockets.",
"generation_info": null,
"message": {
"content": "Yes, the Cozy Comfort Pullover Set does have side pockets.",
"additional_kwargs": {},
"example": false
}
}
]
],
"llm_output": {
"token_usage": {
"prompt_tokens": 732,
"completion_tokens": 14,
"total_tokens": 746
},
"model_name": "gpt-3.5-turbo"
}
}
[chain/end] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain > 3:chain:LLMChain] [85.76899999999999ms] Exiting Chain run with output:
{
"text": "Yes, the Cozy Comfort Pullover Set does have side pockets."
}
[chain/end] [1:chain:RetrievalQA > 2:chain:StuffDocumentsChain] [86.165ms] Exiting Chain run with output:
{
"output_text": "Yes, the Cozy Comfort Pullover Set does have side pockets."
}
[chain/end] [1:chain:RetrievalQA] [170.14399999999998ms] Exiting Chain run with output:
{
"result": "Yes, the Cozy Comfort Pullover Set does have side pockets."
}
'Yes, the Cozy Comfort Pullover Set does have side pockets.'
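Much more information gets printed
First it enters the RetrievalQA chain,
then goes into the document chain (StuffDocumentsChain),
which uses the "stuff" method (all retrieved documents are stuffed into a single prompt)
then enters the LLMChain (which needs a few inputs: question and context)
For context, it passes in the context built from all the documents the retriever returned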
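The "stuff" step is easy to mimic: join every retrieved document into one context string and drop it into the prompt. The system text and the <<<<>>>>> separator below are copied from the debug log above; stuff_prompt is a hypothetical helper that only reproduces the layout of the logged prompt, not LangChain's implementation.

```python
from typing import List

# System text copied from the debug log; {context} is filled by the "stuff" step.
SYSTEM_TEMPLATE = (
    "Use the following pieces of context to answer the users question. \n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "----------------\n"
    "{context}"
)
SEPARATOR = "<<<<>>>>>"  # separator visible between products in the logged context


def stuff_prompt(docs: List[str], question: str) -> str:
    """Concatenate every retrieved doc into one context and render a System/Human prompt."""
    context = SEPARATOR.join(docs)
    system = SYSTEM_TEMPLATE.format(context=context)
    return f"System: {system}\nHuman: {question}"


docs = [
    ": 10\nname: Cozy Comfort Pullover Set, Stripe",
    ": 73\nname: Cozy Cuddles Knit Pullover Set",
]
prompt = stuff_prompt(docs, "Do the Cozy Comfort Pullover Set have side pockets?")
print(prompt)
```

The upside of "stuff" is that it needs a single model call; the catch is that every retrieved document must fit in the context window (the logged call already used 732 prompt tokens for four products).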
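When a question gets a wrong answer,
it's usually not a problem with the language model itself;
more often the problem is in the retrieval step
-> Looking at exactly what the question and the context were helps a lot when debugging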
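A crude first check along those lines: did the product the question is about even make it into the context? diagnose is a hypothetical helper, and plain substring matching is an assumption (real documents may name a product differently), but it captures the "check retrieval first" habit.

```python
from typing import List


def diagnose(product: str, retrieved_docs: List[str]) -> str:
    """Rough triage for a wrong answer: if the product the question asks about never
    reached the context, suspect retrieval before suspecting the language model."""
    if not any(product.lower() in doc.lower() for doc in retrieved_docs):
        return "retrieval: target document missing from context"
    return "context looks fine: inspect the prompt and the model's answer"


# Near-miss documents similar to the "Cozy" products in the log above.
retrieved = ["name: Cozy Cuddles Knit Pullover Set", "name: Cozy Comfort Fleece Pullover"]
print(diagnose("Cozy Comfort Pullover Set", retrieved))
# retrieval: target document missing from context
```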
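We can also look at the OpenAI (ChatOpenAI) call itself to see exactly what goes into the language model
It shows the full prompt that was used (system instructions, retrieved context, and the human question)
It also shows more about what comes back:
the return value carries much more than the answer string (generation_info, token_usage, model_name, ...)
-> Very useful for tracking how many tokens get used when calling chains or language models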
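Since each LLM run reports token_usage, totalling it over several calls takes a few lines of plain Python. total_tokens is a hypothetical helper; the numbers are the single run from the log above, counted twice as if the same call had been made twice. LangChain also ships a get_openai_callback context manager for this kind of tracking, which is probably the more convenient route in practice.

```python
from collections import Counter
from typing import Dict, List


def total_tokens(llm_outputs: List[Dict]) -> Dict[str, int]:
    """Sum the token_usage blocks of several LLM runs (same shape as llm_output in the log)."""
    totals: Counter = Counter()
    for out in llm_outputs:
        totals.update(out.get("token_usage", {}))
    return dict(totals)


# The single run from the debug log above.
run = {
    "token_usage": {"prompt_tokens": 732, "completion_tokens": 14, "total_tokens": 746},
    "model_name": "gpt-3.5-turbo",
}
print(total_tokens([run, run]))
# {'prompt_tokens': 1464, 'completion_tokens': 28, 'total_tokens': 1492}
```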
We could grade these by hand like before, but that's a P!A!I!N!, so
let's ask a language model to do it
First, create predictions for all the examples

We had 7 examples in total, so the chain runs 7 times in a loop and we get a prediction for each
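The prediction loop itself is simple. In the course code this is, I believe, a single qa.apply(examples) call; predict_all below is a hypothetical plain-Python equivalent that keeps each example's query and reference answer next to the chain's output under a result key (the same key the RetrievalQA output uses in the log above). stub_chain stands in for the real chain.

```python
from typing import Callable, Dict, List


def predict_all(run_chain: Callable[[str], str], examples: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Run the chain once per example; keep query, reference answer and prediction together."""
    return [{**ex, "result": run_chain(ex["query"])} for ex in examples]


def stub_chain(query: str) -> str:
    # Hypothetical stand-in for running the real RetrievalQA chain.
    return "stub answer for: " + query


examples = [
    {"query": "Do the Cozy Comfort Pullover Set have side pockets?", "answer": "Yes"},
    {"query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
     "answer": "The DownTek collection"},
]
predictions = predict_all(stub_chain, examples)
print(len(predictions), sorted(predictions[0]))  # 2 ['answer', 'query', 'result']
```

Building new dicts (rather than writing result into each example) leaves the original examples untouched, so they can be reused for another run.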

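Import QAEvalChain (QA = question answering)
To use a language model for the evaluation, build the chain with a language model
call evaluate on that chain,
passing in the examples and the predictions,
and get back a grade for each example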
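Lining up examples, predictions and grades is just index bookkeeping. The grading call needs the real model, so the grades here are copied from the output below instead of coming from eval_chain.evaluate(examples, predictions); report is a hypothetical helper that reproduces the printed layout. The course-era code read each verdict from a text key; newer LangChain versions may use a different key, so treat that name as an assumption and check your version.

```python
from typing import Dict, List

# Copied from Examples 0 and 1 in the graded output below.
predictions: List[Dict[str, str]] = [
    {
        "query": "Do the Cozy Comfort Pullover Set have side pockets?",
        "answer": "Yes",
        "result": "Yes, the Cozy Comfort Pullover Set does have side pockets.",
    },
    {
        "query": "What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection",
        "result": "The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.",
    },
]
# Grades copied from the same output; in the real run they come from the eval chain.
graded_outputs = [{"text": "CORRECT"}, {"text": "CORRECT"}]


def report(predictions: List[Dict[str, str]], graded_outputs: List[Dict[str, str]]) -> str:
    """Rebuild the 'Example i: / Question: / ...' layout by pairing entries by index."""
    lines: List[str] = []
    for i, (pred, graded) in enumerate(zip(predictions, graded_outputs)):
        lines += [
            f"Example {i}:",
            "Question: " + pred["query"],
            "Real Answer: " + pred["answer"],
            "Predicted Answer: " + pred["result"],
            "Predicted Grade: " + graded["text"],
        ]
    return "\n".join(lines)


print(report(predictions, graded_outputs))
```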
Example 0:
Question: Do the Cozy Comfort Pullover Set have side pockets?
Real Answer: Yes
Predicted Answer: Yes, the Cozy Comfort Pullover Set does have side pockets.
Predicted Grade: CORRECT
Example 1:
Question: What collection is the Ultra-Lofty 850 Stretch Down Hooded Jacket from?
Real Answer: The DownTek collection
Predicted Answer: The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection.
Predicted Grade: CORRECT
Example 2:
Question: What is the approximate weight of the Women's Campside Oxfords per pair?
Real Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicted Answer: The approximate weight of the Women's Campside Oxfords per pair is 1 lb. 1 oz.
Predicted Grade: CORRECT
Example 3:
Question: What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?
Real Answer: The small size has dimensions of 18" x 28" and the medium size has dimensions of 22.5" x 34.5".
Predicted Answer: The dimensions of the small size of the Recycled Waterhog Dog Mat, Chevron Weave are 18" x 28", and the dimensions of the medium size are 22.5" x 34.5".
Predicted Grade: CORRECT
Example 4:
Question: What are some key features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece as described in the document?
Real Answer: The key features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece include bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom for secure fit and maximum coverage.
Predicted Answer: Some key features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece are:
- Bright colors, ruffles, and exclusive whimsical prints
- Four-way-stretch and chlorine-resistant fabric
- UPF 50+ rated fabric for high sun protection
- Crossover no-slip straps for a secure fit
- Fully lined bottom for maximum coverage
- Machine washable and line dry for best results
Predicted Grade: CORRECT
Example 5:
Question: What is the fabric composition of the Refresh Swimwear V-Neck Tankini Contrasts?
Real Answer: The body of the tankini top is made of 82% recycled nylon and 18% Lycra® spandex, while the lining is made of 90% recycled nylon and 10% Lycra® spandex.
Predicted Answer: The fabric composition of the Refresh Swimwear V-Neck Tankini Contrasts is as follows:
- Body: 82% recycled nylon, 18% Lycra® spandex
- Lining: 90% recycled nylon, 10% Lycra® spandex
Predicted Grade: CORRECT
Example 6:
Question: What technology sets the EcoFlex 3L Storm Pants apart from other waterproof pants?
Real Answer: The EcoFlex 3L Storm Pants feature TEK O2 technology, which offers the most breathability ever tested in waterproof pants.
Predicted Answer: The EcoFlex 3L Storm Pants feature TEK O2 technology, which offers the most breathability tested. This technology allows air to permeate while keeping water out, providing maximum comfort and dryness during various outdoor activities.
Predicted Grade: CORRECT
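These questions were also generated by the language model (apart from the hand-written ones)
Next comes the real answer to that question
The model that generated the question had the whole document, which is why it can also produce a reference answer
Then the predicted answer is printed
The prediction comes from running the QA chain: documents are retrieved using embeddings and the vector DB,
then passed to the language model,
which predicts the answer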
Then the grade is printed
This is also produced by a language model:
it comes from asking the eval chain to assess what's going on
and decide whether the prediction is correct
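Conceptually the eval chain formats question, reference answer and prediction into one prompt and reads a verdict back. The template below is a hypothetical one written for illustration (LangChain's own QA-eval prompt is not reproduced here). parse_grade shows one detail worth getting right: check for INCORRECT before CORRECT, because "CORRECT" is a substring of "INCORRECT".

```python
from typing import Dict

# Hypothetical grading prompt written for illustration; not LangChain's own template.
GRADE_TEMPLATE = (
    "You are grading a quiz. Judge only factual accuracy and ignore differences in wording.\n"
    "QUESTION: {query}\n"
    "TRUE ANSWER: {answer}\n"
    "STUDENT ANSWER: {result}\n"
    "GRADE (CORRECT or INCORRECT):"
)


def grading_prompt(prediction: Dict[str, str]) -> str:
    """Fill the template from one prediction dict (query, answer, result)."""
    return GRADE_TEMPLATE.format(**prediction)


def parse_grade(raw: str) -> str:
    """Map free-form grader output to CORRECT / INCORRECT.
    INCORRECT is checked first because 'CORRECT' is a substring of 'INCORRECT'."""
    text = raw.upper()
    if "INCORRECT" in text:
        return "INCORRECT"
    if "CORRECT" in text:
        return "CORRECT"
    raise ValueError(f"unrecognised grade: {raw!r}")


pred = {
    "query": "Do the Cozy Comfort Pullover Set have side pockets?",
    "answer": "Yes",
    "result": "Yes, the Cozy Comfort Pullover Set does have side pockets.",
}
print(grading_prompt(pred))
print(parse_grade("GRADE: INCORRECT"))  # INCORRECT
```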
Printing all of this for every example lets us check the details of each one
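If answers can be arbitrary strings,
there's no such thing as a single correct answer string;
there will be lots of variations
If two answers mean the same thing, they should get the same grade
This is where a language model helps a lot, unlike exact-match comparison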
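To see why string matching falls short, compare Examples 0 and 1 with two plain string metrics: exact match, and a token-overlap F1 in the style of extractive QA benchmarks (swapped in here as a familiar non-LLM baseline; it is not what the eval chain does). Both predictions were graded CORRECT by the language model, yet exact match rejects both and F1 scores them low.

```python
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercased word tokens, punctuation dropped."""
    return re.findall(r"\w+", text.lower())


def exact_match(reference: str, prediction: str) -> bool:
    return reference.strip() == prediction.strip()


def token_f1(reference: str, prediction: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-words overlap)."""
    ref, pred = Counter(tokens(reference)), Counter(tokens(prediction))
    common = sum((ref & pred).values())
    if common == 0:
        return 0.0
    precision = common / sum(pred.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


pairs = [
    ("Yes", "Yes, the Cozy Comfort Pullover Set does have side pockets."),
    ("The DownTek collection",
     "The Ultra-Lofty 850 Stretch Down Hooded Jacket is from the DownTek collection."),
]
for ref, pred in pairs:
    print(exact_match(ref, pred), f"{token_f1(ref, pred):.3f}")
# False 0.182
# False 0.375
```

Both metrics punish the longer, perfectly correct phrasing, which is exactly the gap an LLM grader fills.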
One of the most interesting (and hottest) heuristics right now
is to use a language model as the evaluator