

RAG - Industry Applications

https://learn.deeplearning.ai/courses/building-multimodal-search-and-rag/

 

Building Multimodal Search and RAG - DeepLearning.AI

Build smarter search and RAG applications for multimodal retrieval and generation.


 

Examples that can be applied in real industry settings!

If the input is a receipt or an invoice, the information can be extracted as JSON data.

 

If it is a company investor deck, it can be extracted as a markdown table.

 

If you want the implementation of a logical flow chart as text or Python code, that is possible too.

 

import warnings
warnings.filterwarnings("ignore")

import os
from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file
GOOGLE_API_KEY=os.getenv("GOOGLE_API_KEY")

import google.generativeai as genai
from google.api_core.client_options import ClientOptions
genai.configure(
    api_key=GOOGLE_API_KEY,
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=os.getenv("GOOGLE_API_BASE"),
    )
)

# Vision Function
import textwrap
import PIL.Image
from IPython.display import Markdown, Image

def to_markdown(text):
    text = text.replace("•", "  *")
    return Markdown(textwrap.indent(text, "> ", predicate=lambda _: True))

def call_LMM(image_path: str, prompt: str, plain_text: bool = False):
    """Send an image and a prompt to Gemini; return plain text or a Markdown object."""
    img = PIL.Image.open(image_path)

    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content([prompt, img], stream=False)
    response.resolve()

    if plain_text:
        return response.text
    return to_markdown(response.text)

 

Extracting Structured Data from Retrieved Images

# Analyzing an invoice
Image(url="invoice.png")

# Ask for the line items as JSON
call_LMM("invoice.png",
    """Identify items on the invoice. Make sure you output
    JSON with quantity, description, unit price and amount.""")

# Ask something else
call_LMM("invoice.png",
    """How much would four sets of pedal arms cost
    and 6 hours of labour?""",
    plain_text=True
)

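The vision model has to extract that information from the image and then calculate the price,

and it was in fact able to deduce the price of two pedal arms.

It also correctly calculated 6 hours of labour at $5 per hour as $30 in total.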
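The arithmetic above can also be double-checked locally instead of trusting the model's answer. The following is a minimal sketch, assuming the JSON reply is a list of objects with `description` and `unit_price` keys (the model may choose different key names) and that it may arrive wrapped in a Markdown code fence. `parse_json_reply`, `line_total`, and `sample_reply` are hypothetical names, and the pedal-arm price and quantities in `sample_reply` are placeholders, not values from the real invoice. Only the $5 per hour labour rate comes from the note above.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid a literal one here


def parse_json_reply(reply: str):
    """Return the JSON payload from a model reply, tolerating a Markdown fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, reply, flags=re.DOTALL)
    payload = match.group(1) if match else reply
    return json.loads(payload)


def line_total(items, description: str, quantity: float) -> float:
    """Price `quantity` units of the first item whose description contains `description`."""
    for item in items:
        if description.lower() in item["description"].lower():
            return quantity * float(item["unit_price"])
    raise KeyError(description)


# Hypothetical reply: the pedal-arm price and the quantities are placeholders.
sample_reply = FENCE + "json\n" + json.dumps([
    {"quantity": 2, "description": "Pedal arms", "unit_price": 10.0, "amount": 20.0},
    {"quantity": 3, "description": "Labour", "unit_price": 5.0, "amount": 15.0},
]) + "\n" + FENCE

items = parse_json_reply(sample_reply)
labour = line_total(items, "labour", 6)      # 6 hours at $5 per hour
pedals = line_total(items, "pedal arms", 4)  # 4 sets at the placeholder price
print(labour, pedals, labour + pedals)
```

With a real reply, `call_LMM(..., plain_text=True)` would supply the string passed to `parse_json_reply`.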

 

# Extracting Tables from Images
Image("prosus_table.png")

call_LMM("prosus_table.png", 
    "Print the contents of the image as a markdown table.")

call_LMM("prosus_table.png", 
    """Analyse the contents of the image as a markdown table.
    Which of the business units has the highest revenue growth?""")
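The model's answer to the growth question can be cross-checked against the extracted table itself. This is a rough sketch, assuming the reply is a standard pipe-delimited Markdown table with a separator row. `parse_markdown_table`, `percent`, and the column names are hypothetical, and `sample_table` holds made-up rows, not the actual contents of `prosus_table.png`.

```python
def parse_markdown_table(table: str) -> list[dict[str, str]]:
    """Turn a Markdown pipe table into a list of row dicts keyed by header."""
    rows = []
    for line in table.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip any prose around the table
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator
    return [dict(zip(header, row)) for row in body]


def percent(value: str) -> float:
    """Convert a string such as '37%' to the float 37.0."""
    return float(value.strip().rstrip("%"))


# Made-up table for illustration only.
sample_table = """
| Business unit | Revenue growth |
|---|---|
| Unit A | 12% |
| Unit B | 37% |
| Unit C | -4% |
"""

rows = parse_markdown_table(sample_table)
best = max(rows, key=lambda row: percent(row["Revenue growth"]))
print(best["Business unit"])
```

Computing the maximum locally keeps the comparison deterministic even when the model's free-text answer varies.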

 

# Analyzing Flow Charts
Image("swimlane-diagram-01.png")

call_LMM("swimlane-diagram-01.png", 
    """Provide a summarized breakdown of the flow chart in the image
    in a format of a numbered list.""")
    
call_LMM("swimlane-diagram-01.png", 
    """Analyse the flow chart in the image,
    then output Python code
    that implements this logical flow in one function""")
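Code returned by the second call arrives as text, often inside a Markdown fence, so it can be worth checking that it at least parses before using it. The sketch below uses the standard `ast` module and assumes the reply was requested with `plain_text=True`. `extract_python`, `defined_functions`, and `sample_reply` are hypothetical names, and the sample reply is invented for illustration.

```python
import ast

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid a literal one here


def extract_python(reply: str) -> str:
    """Return the body of the first fenced block in `reply`, or `reply` itself if unfenced."""
    if FENCE not in reply:
        return reply
    body = reply.split(FENCE)[1]
    # Drop an optional language tag such as "python" on the opening fence line.
    first_line, _, rest = body.partition("\n")
    return rest if first_line.strip().isalpha() else body


def defined_functions(source: str) -> list[str]:
    """Parse `source` (raises SyntaxError if invalid) and list its top-level function names."""
    tree = ast.parse(source)
    return [node.name for node in tree.body if isinstance(node, ast.FunctionDef)]


# Invented reply for illustration only.
sample_reply = (
    "Here is the function:\n"
    + FENCE + "python\n"
    + "def order_fulfillment(client):\n"
    + "    return client.place_order()\n"
    + FENCE
)

source = extract_python(sample_reply)
print(defined_functions(source))
```

A successful parse only shows the code is syntactically valid; whether it matches the flow chart still needs a human review.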

Because of the vision model's randomness, re-running the same call can produce a different function each time.

 

def order_fulfillment(client, online_shop, courier_company):
   # This function takes three objects as input:
   # - client: the client who placed the order
   # - online_shop: the online shop that received the order
   # - courier_company: the courier company that will deliver the order

   # First, the client places an order.
   order = client.place_order()

   # Then, the client makes a payment for the order.
   payment = client.make_payment(order)

   # If the payment is successful, the order is shipped.
   if payment.status == "successful":
       online_shop.ship_order(order)
       courier_company.transport_order(order)
   
   # If the payment is not successful, the order is canceled.
   else:
       online_shop.cancel_order(order)
       client.refund_order(order)

   # Finally, the order is invoiced.
   online_shop.invoice_order(order)
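The run-to-run variation noted above can usually be reduced, though not necessarily eliminated, by lowering the sampling temperature. The sketch below is one way to do that, assuming `model` is a `genai.GenerativeModel` like the one created in `call_LMM` and that `generation_config` is passed as a plain dict. `generate_low_variance` and `RecordingModel` are hypothetical names; the latter is only a stand-in so the example runs without an API key.

```python
def generate_low_variance(model, parts, temperature: float = 0.0):
    """Call `model.generate_content` with a fixed sampling temperature."""
    return model.generate_content(
        parts,
        generation_config={"temperature": temperature},
        stream=False,
    )


class RecordingModel:
    """Stand-in for a real model: records the arguments it receives."""

    def __init__(self):
        self.calls = []

    def generate_content(self, parts, **kwargs):
        self.calls.append((parts, kwargs))
        return "stub response"


stub = RecordingModel()
result = generate_low_variance(stub, ["prompt text", "image placeholder"])
print(stub.calls[0][1]["generation_config"])
```

In real use, `parts` would be `[prompt, img]` as in `call_LMM`, and the return value would be a response object whose `.text` holds the generated code.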

It would be worth trying this on other examples too.
