gibasa 1.0.1
- Added wrappers around dictionary compiler of MeCab.
gibasa 0.9.5
- Removed audubon dependency for maintainability.
- pack now preserves the doc_id type when it is a factor.
gibasa 0.9.4
- Updated Makevars for Unix-alikes. Users can now use a file specified by the MECABRC environment variable or ~/.mecabrc to set up dictionaries.
gibasa 0.9.3
- Removed unnecessary C++ files.
gibasa 0.9.2
- Prepared for CRAN release.
gibasa 0.8.1
- For performance, tokenize now skips resetting the output encodings to UTF-8.
gibasa 0.8.0
- [Breaking Change] Changed the numbering style of 'sentence_id' when split is FALSE.
- Added the grain_size argument to tokenize.
- Added a new bind_lr function.
gibasa 0.7.4
- Uses RcppParallel::parallelFor instead of tbb::parallel_for. There are no user-visible changes.
gibasa 0.7.1
- Fixed documentation. There are no user-visible changes.
gibasa 0.7.0
- tokenize can now accept a character vector in addition to a data.frame-like object.
- gbs_tokenize is now deprecated. Please use the tokenize function instead.
gibasa 0.6.4
gibasa 0.6.3
- Added the partial argument to gbs_tokenize and tokenize. This argument controls the partial parsing mode, which, when activated, forces the given chunks of sentences to be extracted.
gibasa 0.6.2
- Friendlier errors are now returned when an invalid dictionary path is provided.
- Added a new posDebugRcpp function.
gibasa 0.6.1
- Restored some missing examples.
gibasa 0.6.0
- Functions added in version 0.5.1 were moved to the ‘audubon’ package (>= 0.4.0).
gibasa 0.5.1
- Added some new functions.
  - bind_tf_idf2 can calculate and bind the term frequency, inverse document frequency, and tf-idf of a tidy text dataset.
  - collapse_tokens, mute_tokens, and lexical_density can be used for handling a tidy text dataset of tokens.
gibasa 0.5.0
- gibasa now includes the MeCab source, so users do not need to pre-install the MeCab library when building and installing the package (using tokenize still requires MeCab and its dictionaries to be installed and available).
gibasa 0.4.1
- tokenize now preserves the original order of docid_field.
gibasa 0.4.0
- Added the bind_tf_idf2 and is_blank functions.
gibasa 0.3.1
gibasa 0.3.0
- Changed build process on Windows.
- Added a vignette.
gibasa 0.2.1
- prettify can now extract only the columns specified by col_select.
gibasa 0.2.0
- Added a NEWS.md file to track changes to the package.
- tokenize now takes a data.frame as its first argument and returns only a data.frame. The former function, which takes a character vector and returns a data.frame or a named list, was renamed gbs_tokenize.