[AI] StanfordNLP, Khaiii (Kakao Hangul Analyzer III): Explanation and Examples

ASHE SUN 2020. 12. 18. 09:52

StanfordNLP

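StanfordNLP is a natural language processing library packaged for Python (it is now released under the name Stanza, which is what the links and code in this post use).
It is built on PyTorch and uses LSTM-based sequence tagging models → a many-to-many LSTM produces one output per token, and each output is assigned a tag.
It scored highly in the CoNLL 2018 shared task.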
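To make the "one output per token" idea concrete, here is a minimal sketch of a many-to-many LSTM tagger in plain Python. It is only an illustration under loud assumptions: the class name `TinyLSTMTagger`, the dimensions, and the random untrained weights are all made up for this example, biases are omitted, and it is not how Stanza's actual models are implemented. The point is just that the loop emits exactly one tag for every input token.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMTagger:
    """Untrained many-to-many LSTM (hypothetical, for illustration only)."""

    def __init__(self, vocab, tags, embed_dim=4, hidden_dim=5, seed=0):
        rng = random.Random(seed)
        self.vocab = {word: i for i, word in enumerate(vocab)}
        self.tags = list(tags)
        self.hidden_dim = hidden_dim
        self.embeddings = random_matrix(len(self.vocab), embed_dim, rng)
        # One weight matrix per gate: input, forget, candidate, output.
        self.gates = {
            name: random_matrix(hidden_dim, embed_dim + hidden_dim, rng)
            for name in ("i", "f", "g", "o")
        }
        # Projection from the hidden state to one score per tag.
        self.out = random_matrix(len(self.tags), hidden_dim, rng)

    def step(self, x, h, c):
        """Run one LSTM step and return the new hidden and cell states."""
        z = x + h  # concatenate the input with the previous hidden state
        i = [sigmoid(v) for v in matvec(self.gates["i"], z)]
        f = [sigmoid(v) for v in matvec(self.gates["f"], z)]
        g = [math.tanh(v) for v in matvec(self.gates["g"], z)]
        o = [sigmoid(v) for v in matvec(self.gates["o"], z)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def tag(self, tokens):
        """Return one (token, tag) pair per input token."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        result = []
        for token in tokens:
            x = self.embeddings[self.vocab[token]]
            h, c = self.step(x, h, c)
            scores = matvec(self.out, h)
            best = max(range(len(scores)), key=scores.__getitem__)
            result.append((token, self.tags[best]))
        return result
```

Because the weights are random, the predicted tags are meaningless; a trained model differs in the weights, not in the shape of the output.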

https://stanfordnlp.github.io/stanza/

https://stanfordnlp.github.io/stanza/performance.html

It provides all the basic preprocessing steps.

https://stanfordnlp.github.io/stanza/available_models.html
→ Lists the available models, along with their performance

Recommended when you need to process other languages → its performance is good

StanfordNLP hands-on - StanfordNLP.ipynb

https://stanfordnlp.github.io/stanza/#getting-started

!pip install stanza

# Download the pretrained model
import stanza
stanza.download('en')

# Initialization: just pass the language you want
nlp = stanza.Pipeline('en')

https://stanfordnlp.github.io/stanza/pipeline.html

→ Running the Pipeline applies the basic preprocessing steps, such as tokenize and mwt (multi-word token expansion), and returns the result.

result = nlp('Barack Obama was born in Hawaii')

print(result)
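Printing the whole document is verbose, so it can help to pull out just the fields you need. The sketch below assumes the object returned by the pipeline is a Stanza `Document` whose `sentences` hold `words` with `text`, `lemma`, and `upos` attributes (this relies on the default English pipeline including the lemma and POS processors). The helper name `word_rows` is made up for this example.

```python
def word_rows(doc):
    """Return one (text, lemma, upos) tuple per word in a Stanza Document."""
    return [
        (word.text, word.lemma, word.upos)
        for sentence in doc.sentences
        for word in sentence.words
    ]


# Usage with the pipeline created earlier in the notebook:
# for text, lemma, upos in word_rows(nlp('Barack Obama was born in Hawaii')):
#     print(text, lemma, upos, sep='\t')
```

Keeping the helper free of any Stanza import means it can be tried out on any object with the same attribute layout.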

Because these are deep learning models, running on CPU alone is very slow, so the library is implemented to use a GPU automatically whenever one is available.
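If you want to control this yourself, `stanza.Pipeline` accepts a `processors` string (a comma-separated list of steps) and a `use_gpu` flag. The following is a rough sketch, not the only way to do it: the helper names `pipeline_options` and `build_pipeline` are hypothetical, and the particular processor list is just an example.

```python
def pipeline_options(lang="en", processors=None, use_gpu=True):
    """Build keyword arguments for stanza.Pipeline (hypothetical helper)."""
    options = {"lang": lang, "use_gpu": use_gpu}
    if processors:
        # Stanza expects processors as one comma-separated string.
        options["processors"] = ",".join(processors)
    return options


def build_pipeline(**kwargs):
    """Create a Stanza pipeline from the options above."""
    import stanza  # imported lazily so pipeline_options works without Stanza

    return stanza.Pipeline(**pipeline_options(**kwargs))


# Example: tokenization and POS tagging only, forced onto the CPU.
# nlp_cpu = build_pipeline(processors=["tokenize", "pos"], use_gpu=False)
```

Running fewer processors is also a simple way to cut the runtime when you only need tokens or POS tags.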

https://stanfordnlp.github.io/stanza/performance.html

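→ Shows the performance on each dataset

→ Accuracy for tokens, sentences, words, UPOS, and so on

→ Named entity recognition was not part of the original datasets, so the authors trained NER models separately on data from WikiNER and released them alongside the rest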
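When the pipeline includes the NER processor, the recognized entities are available on the document as `doc.ents`, each with a `text` and a `type`. Here is a small sketch that groups them by type; the helper name `entities_by_type` is made up, and the type labels you see depend on which NER model was loaded, so none are assumed here.

```python
from collections import defaultdict


def entities_by_type(doc):
    """Group entity texts from a Stanza Document by their entity type."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.type].append(ent.text)
    return dict(grouped)


# Usage with the pipeline created earlier in the notebook:
# print(entities_by_type(nlp('Barack Obama was born in Hawaii')))
```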

Khaiii (Kakao Hangul Analyzer III)

A Korean morphological analyzer released by Kakao

https://github.com/kakao/khaiii

It does not run well on Windows.

Khaiii hands-on - khaiii.ipynb

Khaiii can be used from Python, but internally it runs as C++ code, for speed.
Install commands - the download takes quite a while.

!git clone https://github.com/kakao/khaiii.git
!pip install cmake
!mkdir build
!cd build && cmake /content/khaiii
!cd /content/build/ && make all
!cd /content/build/ && make resource
!cd /content/build && make install
!cd /content/build && make package_python
!pip install /content/build/package_python

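→ Because the core is written in C++ for speed, the setup is complicated, but in return it runs fast

→ Installation takes a long time, but on a server you only need to install it once and after that it is easy to use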

from khaiii import KhaiiiApi

api = KhaiiiApi()

sentence = "안녕, 세상."
tagged = api.analyze(sentence)

for word in tagged:
  print(word)

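→ Morphological analysis result: the sentence is split into eojeol (space-delimited units), and for each eojeol the output shows how its morphemes were tagged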
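Instead of printing each word object, you can also read the morphemes directly. In the Khaiii Python API each analyzed word exposes `lex` (the original eojeol) and `morphs`, and each morph has `lex` and `tag`. The sketch below builds on that; the helper names `morph_table` and `format_word` are hypothetical and only for illustration.

```python
def morph_table(words):
    """Return (eojeol, [(morpheme, tag), ...]) for each analyzed word."""
    return [
        (word.lex, [(morph.lex, morph.tag) for morph in word.morphs])
        for word in words
    ]


def format_word(word):
    """Render one word as 'eojeol<TAB>morpheme/TAG + morpheme/TAG'."""
    parts = " + ".join(f"{morph.lex}/{morph.tag}" for morph in word.morphs)
    return f"{word.lex}\t{parts}"


# Usage with the objects created earlier in the notebook:
# for eojeol, morphs in morph_table(api.analyze(sentence)):
#     print(eojeol, morphs)
```

Having plain tuples makes it easy to filter by tag afterwards, for example to keep only nouns.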

This works fine on Colab but may not work on your own server → in that case, see the build and installation documentation in the GitHub repository
