Package 'elbird' reference manual

Title:	Blazing Fast Morphological Analyzer Based on Kiwi(Korean Intelligent Word Identifier)
Description:	This is the R wrapper package for Kiwi (Korean Intelligent Word Identifier), a blazing fast morphological analyzer for Korean. It supports user dictionary configuration and frequency-based detection of unregistered nouns.
Authors:	Chanyub Park [aut, cre]
Maintainer:	Chanyub Park <[email protected]>
License:	LGPL (>= 3)
Version:	0.2.5
Built:	2025-03-30 07:28:23 UTC
Source:	https://github.com/mrchypark/elbird

Simple version of analyze function.

Description

Simple version of analyze function.

Usage

analyze(text, top_n = 3, match_option = Match$ALL, stopwords = FALSE)

Arguments

`text`	target text.
`top_n`	`integer`: Number of results. Default is 3.
`match_option`	`Match`: match option to use. Default is Match$ALL.
`stopwords`	stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a character string, treat it as the path of a dictionary txt file and use that file. If a `Stopwords` object, use it. Any other value works the same as FALSE. See the examples for how to use the stopwords parameter.

Examples

## Not run: 
  analyze("Test text.")
  analyze("Please use Korean.", top_n = 1)
  analyze("Test text.", 1, Match$ALL_WITH_NORMALIZING)
  analyze("Test text.", stopwords = FALSE)
  analyze("Test text.", stopwords = TRUE)
  analyze("Test text.", stopwords = "user_dict.txt")
  analyze("Test text.", stopwords = Stopwords$new(TRUE))

## End(Not run)

Get kiwi language model file.

Description

Get kiwi language model file.

Usage

get_model(size = "base", path = model_home(), clean = FALSE)

Arguments

`size`	model size: "small", "base", or "large". Default is "base". "all" is also available.
`path`	path for model files. Default is `model_home()`.
`clean`	remove previous model files before getting new ones.

Source

https://github.com/bab2min/Kiwi/releases

Examples

## Not run: 
  get_model("small")

## End(Not run)

Kiwi Class

Description

The Kiwi class provides methods for Korean morphological analysis.

Methods

Method `print()`

print method for Kiwi objects

Usage

Kiwi$print(x, ...)

Arguments

x: self
...: ignored

Method `new()`

Create a kiwi instance.

Usage

Kiwi$new(
  num_workers = 0,
  model_size = "base",
  integrate_allomorph = TRUE,
  load_default_dict = TRUE
)

Arguments

num_workers: int(optional): number of cores to use for multi-threading. Default is 0, which means use all cores.
model_size: char(optional): kiwi model to use. Default is "base". "small" and "large" are also available.
integrate_allomorph: bool(optional): default is TRUE.
load_default_dict: bool(optional): use the default dictionary. Default is TRUE.

Method `add_user_word()`

Add a user word with its POS tag and score.

Usage

Kiwi$add_user_word(word, tag, score, orig_word = "")

Arguments

word: char(required): target word to add.
tag: Tags(required): tag information about word.
score: num(required): score information about word.
orig_word: char(optional): origin word.
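
As a minimal sketch of `add_user_word()`, the snippet below registers a few user words on a Kiwi instance and then tokenizes a sentence containing them. The words, the score of 0, and the choice of the "small" model are illustrative assumptions, not values taken from the package documentation. The elbird calls run only when the package is installed and the model files are already present.

```r
# Hypothetical user words to register; the score of 0 is an assumed value.
user_words <- data.frame(
  word  = c("엘버드", "키위"),
  score = c(0, 0),
  stringsAsFactors = FALSE
)

# Run the elbird part only when the package and the "small" model are available.
if (requireNamespace("elbird", quietly = TRUE) &&
    isTRUE(elbird::model_exists("small"))) {
  kw <- elbird::Kiwi$new(model_size = "small")
  for (i in seq_len(nrow(user_words))) {
    # Every word is tagged as a proper noun (Tags$nnp) in this sketch.
    kw$add_user_word(user_words$word[i], elbird::Tags$nnp, user_words$score[i])
  }
  print(kw$tokenize("엘버드는 키위 기반 형태소 분석기입니다."))
}
```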

Method `add_pre_analyzed_words()`

TODO

Usage

Kiwi$add_pre_analyzed_words(form, analyzed, score)

Arguments

form: char(required): target word to add analyzed result.
analyzed: data.frame(required): analyzed result expected.
score: num(required): score information about pre analyzed result.

Method `add_rules()`

TODO

Usage

Kiwi$add_rules(tag, pattern, replacement, score)

Arguments

tag: Tags(required): target tag to add rules.
pattern: char(required): regular expression.
replacement: char(required): replace text.
score: num(required): score information about rules.

Method `load_user_dictionarys()`

Add a user dictionary from a text file.

Usage

Kiwi$load_user_dictionarys(user_dict_path)

Arguments

user_dict_path: char(required): path of user dictionary file.

Method `extract_words()`

Extract noun word candidates from texts.

Usage

Kiwi$extract_words(
  input,
  min_cnt,
  max_word_len,
  min_score,
  pos_threshold,
  apply = FALSE
)

Arguments

input: char(required): target text data
min_cnt: int(required): minimum count of word in text.
max_word_len: int(required): max word length.
min_score: num(required): minimum score.
pos_threshold: num(required): pos threshold.
apply: bool(optional): apply extracted word as user word dict.

Method `analyze()`

Analyze text into token and tag results.

Usage

Kiwi$analyze(text, top_n = 3, match_option = Match$ALL, stopwords = FALSE)

Arguments

text: char(required): target text.
top_n: int(optional): number of results. Default is 3.
match_option: Match(optional): match option to use. Default is Match$ALL.
stopwords: stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a character string, treat it as the path of a dictionary txt file and use that file. If a Stopwords object, use it. Any other value works the same as FALSE.

Returns

list of results.

Method `tokenize()`

Analyze text into tokens and POS tags, returning only the top 1 result.

Usage

Kiwi$tokenize(
  text,
  match_option = Match$ALL,
  stopwords = FALSE,
  form = "tibble"
)

Arguments

text: char(required): target text.
match_option: Match(optional): match option to use. Default is Match$ALL.
stopwords: stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a character string, treat it as the path of a dictionary txt file and use that file. If a Stopwords object, use it. Any other value works the same as FALSE.
form: char(optional): return form. Default is "tibble". "list" and "tidytext" are also available.
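
The `form` argument can be compared side by side with a short loop. This is only a sketch: the sample sentence and the "small" model are assumptions, and the shape of each result is left for the reader to inspect rather than asserted here.

```r
# The three return forms listed for the `form` argument.
forms <- c("tibble", "list", "tidytext")
results <- list()

# Run the elbird part only when the package and the "small" model are available.
if (requireNamespace("elbird", quietly = TRUE) &&
    isTRUE(elbird::model_exists("small"))) {
  kw <- elbird::Kiwi$new(model_size = "small")
  for (f in forms) {
    results[[f]] <- kw$tokenize("안녕하세요. 반갑습니다.", form = f)
  }
  # Show only the top-level structure of each result.
  str(results, max.level = 1)
}
```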

Method `split_into_sents()`

Some text may not be split sentence by sentence. split_into_sents splits such text into individual sentences.

Usage

Kiwi$split_into_sents(text, match_option = Match$ALL, return_tokens = FALSE)

Arguments

text: char(required): target text.
match_option: Match(optional): match option to use. Default is Match$ALL.
return_tokens: bool(optional): add the tokenized result.

Method `get_tidytext_func()`

Get a tokenizer function to use with tidytext unnest_tokens.

Usage

Kiwi$get_tidytext_func(match_option = Match$ALL, stopwords = FALSE)

Arguments

match_option: Match(optional): match option to use. Default is Match$ALL.
stopwords: stopwords option. Default is FALSE, which uses no stopwords. If TRUE, use the embedded stopwords dictionary. If a character string, treat it as the path of a dictionary txt file and use that file. If a Stopwords object, use it. Any other value works the same as FALSE.

Returns

function

Examples

\dontrun{
   kw <- Kiwi$new()
   tidytoken <- kw$get_tidytext_func()
   tidytoken("test")
}
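
The returned function is meant to be passed to tidytext. A minimal sketch follows, assuming the tidytext package is installed and the "small" model is present. The sample data frame and the `to_lower = FALSE` setting are choices made for this illustration, not requirements stated by elbird.

```r
# Hypothetical two-document corpus.
docs <- data.frame(
  id   = 1:2,
  text = c("안녕하세요.", "형태소 분석을 해봅시다."),
  stringsAsFactors = FALSE
)

# Run only when both packages and the "small" model are available.
if (requireNamespace("elbird", quietly = TRUE) &&
    requireNamespace("tidytext", quietly = TRUE) &&
    isTRUE(elbird::model_exists("small"))) {
  kw <- elbird::Kiwi$new(model_size = "small")
  tidytoken <- kw$get_tidytext_func()
  # unnest_tokens() accepts a function as `token`; lowercasing is turned off
  # here so that the tokens are left exactly as the tokenizer returned them.
  tokens <- tidytext::unnest_tokens(
    docs, output = word, input = text,
    token = tidytoken, to_lower = FALSE
  )
  print(tokens)
}
```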

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Kiwi$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

## Not run: 
  kw <- Kiwi$new()
  kw$analyze("test")
  kw$tokenize("test")
  
## End(Not run)

## ------------------------------------------------
## Method `Kiwi$get_tidytext_func`
## ------------------------------------------------

## Not run: 
   kw <- Kiwi$new()
   tidytoken <- kw$get_tidytext_func()
   tidytoken("test")

## End(Not run)

Analyze Match Options.

Description

ALL option contains URL, EMAIL, HASHTAG, MENTION.

Usage

Match

Format

An object of class EnumGenerator of length 13.

Examples

## Not run: 
 Match
 Match$ALL

## End(Not run)

Verifies that model files exist.

Description

Verifies that model files exist.

Usage

model_exists(size = "all")

Arguments

`size`	model size. Default is "all", which is TRUE only when all three models are present.

Value

logical: whether the model files exist.

Examples

## Not run: 
  get_model("small")
  model_exists("small")

## End(Not run)

A simple exported version of `kiwi_model_path()`. Returns the kiwi model path.

Description

TODO explain ELBIRD_MODEL_HOME

Usage

model_home()

Value

character: file path

Examples

 model_home()

Verifies if models work fine.

Description

Verifies if models work fine.

Usage

model_works(size = "all")

Arguments

`size`	model size. Default is "all", which is TRUE only when all three models are present.

Value

logical: whether the models work.

Examples

## Not run: 
  get_model("small")
  model_works("small")

## End(Not run)
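
The model helpers can be combined into a single setup step. The sketch below checks for the "small" model, downloads it if missing, and retries once with `clean = TRUE` if the model does not work. The retry policy is an assumption made for illustration. Nothing is downloaded unless the elbird package is installed.

```r
size  <- "small"
ready <- FALSE

if (requireNamespace("elbird", quietly = TRUE)) {
  # Download the model into model_home() only when it is missing.
  if (!isTRUE(elbird::model_exists(size))) {
    elbird::get_model(size)
  }
  ready <- isTRUE(elbird::model_works(size))

  # Assumed recovery step: remove the old files and fetch a fresh copy once.
  if (!ready) {
    elbird::get_model(size, clean = TRUE)
    ready <- isTRUE(elbird::model_works(size))
  }
}

print(ready)
```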

Split Sentences

Description

Some text may not be split sentence by sentence. split_into_sents splits such text into individual sentences.

Usage

split_into_sents(text, return_tokens = FALSE)

Arguments

`text`	target text.
`return_tokens`	add the tokenized result.

Examples

## Not run: 
 split_into_sents("text")
 split_into_sents("text", return_tokens = TRUE)

## End(Not run)

Stopwords Class

Description

Stopwords is used to filter results.

Methods

Public methods

Stopwords$print()
Stopwords$new()
Stopwords$add()
Stopwords$add_from_dict()
Stopwords$remove()
Stopwords$save_dict()
Stopwords$get()
Stopwords$clone()

Method `print()`

print method for Stopwords objects

Usage

Stopwords$print(x, ...)

Arguments

x: self
...: ignored

Method `new()`

Create a Stopwords object for filtering stopwords from analyze() and tokenize() results.

Usage

Stopwords$new(use_system_dict = TRUE)

Arguments

use_system_dict: bool(optional): use the system stopwords dictionary or not. Default is TRUE.

Method `add()`

add stopword one at a time.

Usage

Stopwords$add(form = NA, tag = Tags$nnp)

Arguments

form: char(optional): Form information. Default is NA.
tag: char(optional): Tag information. Default is "NNP". Please check Tags.

Examples

 \dontrun{
  sw <- Stopwords$new()
  sw$add("word", "NNG")
  sw$add("word", Tags$nng)
  }

Method `add_from_dict()`

Add stopwords from a text file. Each line of the text file must have the form "TEXT/TAG". TEXT can be omitted, as in "/NNP". TAG is required, as in "FORM/NNP".

Usage

Stopwords$add_from_dict(path, dict_name = "user")

Arguments

path: char(required): dictionary file path.
dict_name: char(optional): default is "user"
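
To make the "TEXT/TAG" file format concrete, the sketch below writes a two-line dictionary to a temporary file and loads it. The entries and the temporary path are hypothetical. Only the base R part runs when elbird is not installed.

```r
# Two hypothetical entries: a tag-only line and a form-with-tag line.
dict_lines <- c(
  "/NNP",
  "단어/NNG"
)

# Write the dictionary as UTF-8 to a temporary text file.
dict_path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(dict_lines), dict_path, useBytes = TRUE)

if (requireNamespace("elbird", quietly = TRUE)) {
  # Start from an empty list so only the file entries are shown.
  sw <- elbird::Stopwords$new(use_system_dict = FALSE)
  sw$add_from_dict(dict_path, dict_name = "user")
  print(sw$get())
}
```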

Method `remove()`

remove stopword one at a time.

Usage

Stopwords$remove(form = NULL, tag = NULL)

Arguments

form: char(optional): Form information. If form is not set, stopwords are removed by the tag given in the input.
tag: char(required): Tag information. Please check Tags.

Method `save_dict()`

save current stopwords list in text file.

Usage

Stopwords$save_dict(path)

Arguments

path: char(required): file path to save stopwords list.

Method `get()`

return tibble of stopwords.

Usage

Stopwords$get()

Returns

a tibble of stopwords, usable as the stopwords option of the analyze() / tokenize() functions.
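
Putting the Stopwords methods together, a minimal sketch might build a list, save it, and pass the object to `tokenize()`. The stopword entry, the sample sentence, the temporary file path, and the "small" model are all assumptions made for this illustration.

```r
# Hypothetical custom stopword and a temporary file to save the list to.
custom     <- list(form = "단어", tag = "NNG")
saved_path <- tempfile(fileext = ".txt")

if (requireNamespace("elbird", quietly = TRUE)) {
  sw <- elbird::Stopwords$new(use_system_dict = TRUE)
  sw$add(custom$form, custom$tag)   # add one stopword
  sw$save_dict(saved_path)          # save the current list to a text file
  print(sw$get())                   # inspect the list as a tibble

  # Tokenizing needs a model, so this step is guarded separately.
  if (isTRUE(elbird::model_exists("small"))) {
    kw <- elbird::Kiwi$new(model_size = "small")
    print(kw$tokenize("이 단어는 걸러집니다.", stopwords = sw))
  }
}
```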

Method `clone()`

The objects of this class are cloneable with this method.

Usage

Stopwords$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples

## Not run: 
  Stopwords$new()

## End(Not run)

## ------------------------------------------------
## Method `Stopwords$add`
## ------------------------------------------------

 ## Not run: 
  sw <- Stopwords$new()
  sw$add("word", "NNG")
  sw$add("word", Tags$nng)
  
## End(Not run)

Tag list

Description

Tags contains tag list for elbird.

Usage

Tags

Format

An object of class EnumGenerator of length 47.

Source

https://github.com/bab2min/Kiwi

Examples

 ## Not run: 
  Tags
  Tags$nnp
 
## End(Not run)

Simple version of tokenizer function.

Description

Simple version of tokenizer function.

Usage

tokenize(text, match_option = Match$ALL, stopwords = TRUE)

tokenize_tbl(text, match_option = Match$ALL, stopwords = TRUE)

tokenize_tidytext(text, match_option = Match$ALL, stopwords = TRUE)

tokenize_tidy(text, match_option = Match$ALL, stopwords = TRUE)

Arguments

`text`	target text.
`match_option`	`Match`: match option to use. Default is Match$ALL.
`stopwords`	stopwords option. Default is TRUE, which uses the embedded stopwords dictionary. If FALSE, no stopwords dictionary is used. If a character string, treat it as the path of a dictionary txt file and use that file. If a `Stopwords` object, use it. Any other value works the same as FALSE. See `analyze()` for how to use the stopwords parameter.

Value

list of results.

Examples

## Not run: 
  tokenize("Test text.")
  tokenize("Please use Korean.", Match$ALL_WITH_NORMALIZING)

## End(Not run)
