YBIGTA NLP

Jiho Kim

๐Ÿ“ ์ƒ์„ธ ์ •๋ฆฌ
#

Classical NLP

  • ML๋กœ ํ…์ŠคํŠธ๋ฅผ ์ดํ•ดํ•˜๋ ค๋Š” ์‹œ๋„
    • ์ž์—ฐ์–ด ์ฒ˜๋ฆฌ๋Š” ํ…์ŠคํŠธ์˜ ํŒจํ„ด์„ ์ปดํ“จํ„ฐ์—๊ฒŒ ์–ด๋–ป๊ฒŒ ๋จน์ด๊ณ  / ์ฒ˜๋ฆฌ๋ฅผ ํ• ๊ฒƒ์ด๋ƒ?
  • NLP์˜ ์—ญ์‚ฌ
    • ๊ทœ์น™ ๊ธฐ๋ฐ˜ NLP
      • Rule Base: ์‚ฌ์ „์— ๋งŒ๋“ค์–ด๋‘” ๊ทœ์น™์— ๊ธฐ๋ฐ˜ํ•ด ์ฒ˜๋ฆฌํ•˜์ž
      • nltk/wordnet: ์œ ์˜์–ด ์‚ฌ์ „(์‹œ์†Œ๋Ÿฌ์Šค) ๊ธฐ๋ฐ˜์œผ๋กœ ๋‹จ์–ด์˜ ์˜๋ฏธ๋ฅผ ์ธ์‹
      • ๋น„์‹ธ๊ณ 
      • ์ •์ ์ด๊ณ 
      • ๋ชจ๋“  ์ƒํ™ฉ ํ‘œํ˜„์ด ๋ถ€์กฑํ•˜๋‹ค
    • ํ†ต๊ณ„ ๊ธฐ๋ฐ˜ NLP
      • Corpus(๋ง๋ญ‰์น˜)์—์„œ ํ…์ŠคํŠธ์˜ ๊ทœ์น™์„ ์ฐพ์ž
      • ๋‹จ์–ด์˜ ๋ฒกํ„ฐํ‘œํ˜„
      • ๋ถ„ํฌ ๊ฐ€์„ค
        • ๋‹จ์–ด์˜ ์˜๋ฏธ๋Š” ์ฃผ๋ณ€ ๋‹จ์–ด์— ์˜ํ•ด ํ˜•์„ฑ๋œ๋‹ค
      • Cosine ์œ ์‚ฌ๋„…
      • ํ•˜์ง€๋งŒ ๋‹จ์–ด ๋ฒกํ„ฐ๊ฐ€ ๋„ˆ๋ฌด ๊ณ ์ฐจ์›์ด๋‹ค
      • SVD๋กœ ์ฐจ์›์ถ•์†Œ๋ฅผ ํ•˜๊ธฐ์—”, ๊ณ„์‚ฐ๋Ÿ‰์ด ๋„ˆ๋ฌด ๋งŽ๋‹ค
      • ๊ฒฐ๊ตญ ํฐ Corpus ์•ˆ์—์„œ ๋‹ค์–‘ํ•œ ๋‹จ์–ด๋“ค์˜ ์˜๋ฏธ๋ฅผ ๋ฒกํ„ฐํ™”ํ•ด์•ผํ•˜๋Š”๋ฐ
        • Corpus๊ฐ€ ์ปค์ง€๋ฉด ํž˜๋“ค๋‹ค

NN Based NLP

  • Word2vec
    • Embed words using a neural network
    • CBOW
      • Continuous Bag of Words
      • Predict the center word from the surrounding words
      • "you say goodbye and I say hello"
        • you, goodbye -> say
        • say, and -> goodbye
        • goodbye, I -> and
        • …and so on
      • Map words to ids, then train with the one-hot vector as the label: forward pass, softmax, output, backpropagation…
    • Skip-gram
      • Predict the context of the n surrounding words from a single center word
    • Either way, the result is an embedding for each word (every word gets vectorized)

Sequential & Contextual NLP

  • Language Model
    • Up to Word2vec, words are embedded with no regard for order or long-range context.
    • Build a model that evaluates how natural a context is and predicts a natural next word!
  • RNN
    • one to many: predict a sentence from the first input word
    • many to one: predict a sentiment label
    • many to many: machine translation
    • …and various other model shapes!
    • Problem: vanishing / exploding gradients
      • When the sequence is too long, the error backpropagated from predicting later words is barely reflected at the earlier steps.
  • LSTM
    • The old weights seem to fade away, so attach gates to carry them forward!
  • GRU
    • LSTM is too complex, so simplify it a little

Transformer & Attention

  • Seq2Seq
    • An architecture, not a cell
      • Combines two of the architectures above (encoder / decoder)
    • Listen to the whole input sentence, then generate one complete sentence
    • Input and output sequences may differ in length!
      • Used most of all for translation
    • Problems
      • The step-by-step computation is too slow
      • On long sequences, information is not passed along well enough
  • Transformer
    • Improvements
      • Drops RNN-family cells -> uses Transformer blocks
      • Positional encoding - sequences fed non-sequentially -> parallel processing
      • Self-attention - long-range context in long sentences
    • The encoder + decoder structure is kept as-is!
    • Attention
      • Attention in seq2seq
        • When translating, shouldn't the Korean word "정보" be strongly related to "information"?
          • That score is the attention score
      • Measure how relevant each token is, in context, to the embeddings of the other tokens most related to it!
      • Solve it with context and positioning
    • Self-Attention
      • Selectively aggregate the information needed from other tokens to update the current token's representation
    • Multi-Head Attention
      • Capture several kinds of relations in parallel, increasing the diversity of the aggregated signal
    • Transformer variants
      • Encoder & Decoder
        • Keep only the encoder -> BERT
        • Keep only the decoder -> GPT
        • LLMs are usually decoder-based
      • BERT family
        • Representation model
      • GPT family
        • Generative Pre-trained Transformer
  • LLM
    • Pretrain
      • Acquire knowledge + learn to write grammatically
      • Take a dataset such as a corpus,
      • tokenize it,
      • apply embedding + positional encoding,
      • feed it through masked multi-head attention,
      • then compute the loss and backpropagate!
    • Post-train
      • After pretraining the model writes well, but question -> answer behavior and hallucination prevention are still missing.
        • Fix that.
      • Tune it with domain-specific knowledge + to match preferences and knowledge
      • Supervised Fine-Tuning
        • e.g. instruction tuning
        • Learn to answer in the expected format when given an instruction
      • Reinforcement Learning
        • e.g. RLHF, GRPO…
        • Follow instructions, but polish outputs to better match human preferences
    • Agent
      • Give the LLM (language model) tools
โ”์งˆ๋ฌธ ์‚ฌํ•ญ
#

๐Ÿ”— ์ฐธ๊ณ  ์ž๋ฃŒ
#