Word Embedding Research Framework for Psychological Science.
An integrated toolbox of word embedding research that provides:
⚠️ All users should update the package to version ≥ 0.3.2. Old versions may run slowly and have some problems.
HanWuShuang (Bruce) Bao 包寒吴霜
Email: baohws@foxmail.com
Homepage: psychbruce.github.io
## Method 1: Install from CRAN
install.packages("PsychWordVec")
## Method 2: Install from GitHub
install.packages("devtools")
devtools::install_github("psychbruce/PsychWordVec", force=TRUE)
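|                  | embed                          | wordvec                      |
|------------------|--------------------------------|------------------------------|
| Basic class      | matrix                         | data.table                   |
| Row size         | vocabulary size                | vocabulary size              |
| Column size      | dimension size                 | 2 (variables: word, vec)     |
| Advantage        | faster (with matrix operation) | easier to inspect and manage |
| Function to get  | as_embed()                     | as_wordvec()                 |
| Function to load | load_embed()                   | load_wordvec()               |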
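
The two layouts compared above, embed (matrix) and wordvec (data.table), can be mimicked in base R to see how they relate. This is only an illustrative sketch: it does not call any PsychWordVec function, the numbers are made up, the object names `emb`, `wv`, and `emb2` are hypothetical, and a plain data.frame with a list column stands in for a data.table.

```r
# Toy vocabulary of 3 words with 4-dimensional vectors (made-up numbers).
emb <- matrix(
  c(0.1, 0.2, 0.3, 0.4,
    0.5, 0.1, 0.0, 0.2,
    0.3, 0.3, 0.3, 0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cat", "dog", "tree"), NULL)
)

# "embed"-like layout: one row per word, one column per dimension.
dim(emb)

# "wordvec"-like layout: two variables, `word` and a list column `vec`.
# (Assumption: a base data.frame stands in for data.table here.)
wv <- data.frame(word = rownames(emb), stringsAsFactors = FALSE)
wv$vec <- lapply(seq_len(nrow(emb)), function(i) unname(emb[i, ]))

# Going back: stack the list column into a matrix again.
emb2 <- do.call(rbind, wv$vec)
rownames(emb2) <- wv$word
```

The matrix form allows whole-vocabulary operations in a single matrix call, while the two-column form keeps each word next to its vector, which is easier to browse and filter.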
PsychWordVec
- as_embed(): from wordvec (data.table) to embed (matrix)
- as_wordvec(): from embed (matrix) to wordvec (data.table)
- load_embed(): load word embeddings data as embed (matrix)
- load_wordvec(): load word embeddings data as wordvec (data.table)
- data_transform(): transform plain text word vectors to wordvec or embed
- subset(): extract a subset of wordvec and embed
- normalize(): normalize all word vectors to the unit length 1
- get_wordvec(): extract word vectors
- sum_wordvec(): calculate the sum vector of multiple words
- plot_wordvec(): visualize word vectors
- plot_wordvec_tSNE(): 2D or 3D visualization with tSNE
- orth_procrustes(): Orthogonal Procrustes matrix alignment
- cosine_similarity(): cos_sim() or cos_dist()
- pair_similarity(): compute a similarity matrix of word pairs
- plot_similarity(): visualize similarities of word pairs
- tab_similarity(): tabulate similarities of word pairs
- most_similar(): find the TopN most similar words
- plot_network(): visualize a (partial correlation) network graph of words
- test_WEAT(): WEAT and SCWEAT with permutation test of significance
- test_RND(): RND with permutation test of significance
- dict_expand(): expand a dictionary from the most similar words
- dict_reliability(): reliability analysis and PCA of a dictionary
- tokenize(): tokenize raw text
- train_wordvec(): train static word embeddings
- text_init(): set up a Python environment for PLM
- text_model_download(): download PLMs from HuggingFace to local “.cache” folder
- text_model_remove(): remove PLMs from local “.cache” folder
- text_to_vec(): extract contextualized token and text embeddings
- text_unmask(): fill in the blank mask(s) in a query

See the documentation (help pages) for their usage and details.
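
Several of the functions listed above rest on the same arithmetic: scaling vectors to unit length, cosine similarity, and ranking words by similarity. The base R sketch below illustrates that arithmetic on a made-up three-word matrix. It is not the package's implementation, and the helper names `unit_rows`, `cos_sim_sketch`, and `top_n_sketch` are hypothetical, chosen so they do not clash with the package's own functions.

```r
# Toy embedding matrix: rows are words, columns are dimensions (made-up numbers).
emb <- rbind(
  king  = c(0.9, 0.1, 0.4),
  queen = c(0.8, 0.2, 0.5),
  apple = c(0.1, 0.9, 0.2)
)

# Scale every row to unit length 1 (the idea behind normalization).
unit_rows <- function(m) m / sqrt(rowSums(m^2))

# Cosine similarity of two vectors; cosine distance is 1 minus this value.
cos_sim_sketch <- function(v1, v2) {
  sum(v1 * v2) / (sqrt(sum(v1^2)) * sqrt(sum(v2^2)))
}

# Top-N neighbours of a word: on unit-length rows, the dot product with the
# target row equals the cosine similarity.
top_n_sketch <- function(m, word, n = 1) {
  u <- unit_rows(m)
  sims <- drop(u %*% u[word, ])
  sims <- sims[names(sims) != word]  # exclude the word itself
  head(sort(sims, decreasing = TRUE), n)
}

emb_unit <- unit_rows(emb)
rowSums(emb_unit^2)                            # squared length of each row
cos_sim_sketch(emb["king", ], emb["queen", ])  # similarity of one word pair
top_n_sketch(emb, "king", n = 1)               # nearest neighbour of "king"
```

Normalizing once up front means every later similarity lookup is a single matrix product, which is the speed advantage the matrix layout offers.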