trinker / Lexicon
A data package containing lexicons and dictionaries for text analysis
lexicon
Description
lexicon is a collection of lexical hash tables, dictionaries, and word lists. The data prefixes help to categorize the data types:
Prefix | Meaning |
---|---|
key_ | A data.frame with a lookup and return value |
hash_ | A keyed data.table hash table |
freq_ | A data.table of terms with frequencies |
profanity_ | A profane words vector |
pos_ | A part of speech vector |
pos_df_ | A part of speech data.frame |
sw_ | A stopword vector |
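As a rough illustration of the naming scheme, the sketch below maps a dataset name to its prefix using base R only. The helper `prefix_of` is hypothetical and is not part of lexicon; it only mirrors the prefix table above. Longer prefixes are listed first so that `pos_df_` is matched before `pos_`.

```r
# Prefixes from the table above, longest first so "pos_df_" wins over "pos_".
prefixes <- c("profanity_", "pos_df_", "hash_", "freq_", "key_", "pos_", "sw_")

# Hypothetical helper (not a lexicon function): return the prefix a dataset
# name starts with, or NA when the name carries none of the listed prefixes.
prefix_of <- function(name) {
  hit <- prefixes[startsWith(name, prefixes)]
  if (length(hit) == 0) NA_character_ else hit[1]
}

datasets <- c("hash_lemmas", "pos_df_pronouns", "pos_preposition",
              "sw_mallet", "cliches")
vapply(datasets, prefix_of, character(1))
```

Some data sets, such as cliches, carry no prefix, so the helper returns `NA` for them.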
Data
Data | Description |
---|---|
cliches | Common Cliches |
common_names | First Names (U.S.) |
constraining_loughran_mcdonald | Loughran-McDonald Constraining Words |
emojis_sentiment | Emoji Sentiment Data |
freq_first_names | Frequent U.S. First Names |
freq_last_names | Frequent U.S. Last Names |
function_words | Function Words |
grady_augmented | Augmented List of Grady Ward’s English Words and Mark Kantrowitz’s Names List |
hash_emojis | Emoji Description Lookup Table |
hash_emojis_identifier | Emoji Identifier Lookup Table |
hash_emoticons | Emoticons |
hash_grady_pos | Grady Ward’s Moby Parts of Speech |
hash_internet_slang | List of Internet Slang and Corresponding Meanings |
hash_lemmas | Lemmatization List |
hash_nrc_emotions | NRC Emotion Table |
hash_sentiment_emojis | Emoji Sentiment Polarity Lookup Table |
hash_sentiment_huliu | Hu Liu Polarity Lookup Table |
hash_sentiment_jockers | Jockers Sentiment Polarity Table |
hash_sentiment_jockers_rinker | Combined Jockers & Rinker Polarity Lookup Table |
hash_sentiment_loughran_mcdonald | Loughran-McDonald Polarity Table |
hash_sentiment_nrc | NRC Sentiment Polarity Table |
hash_sentiment_senticnet | Augmented SenticNet Polarity Table |
hash_sentiment_sentiword | Augmented Sentiword Polarity Table |
hash_sentiment_slangsd | SlangSD Sentiment Polarity Table |
hash_sentiment_socal_google | SO-CAL Google Polarity Table |
hash_valence_shifters | Valence Shifters |
key_contractions | Contraction Conversions |
key_corporate_social_responsibility | Nadra Pencle and Irina Malaescu’s Corporate Social Responsibility Dictionary |
key_grade | Grades Data Set |
key_rating | Ratings Data Set |
key_regressive_imagery | Colin Martindale’s English Regressive Imagery Dictionary |
key_sentiment_jockers | Jockers Sentiment Data Set |
modal_loughran_mcdonald | Loughran-McDonald Modal List |
nrc_emotions | NRC Emotions |
pos_action_verb | Action Word List |
pos_df_irregular_nouns | Irregular Nouns Word Dataframe |
pos_df_pronouns | Pronouns |
pos_interjections | Interjections |
pos_preposition | Preposition Words |
profanity_alvarez | Alejandro U. Alvarez’s List of Profane Words |
profanity_arr_bad | Stackoverflow user2592414’s List of Profane Words |
profanity_banned | bannedwordlist.com’s List of Profane Words |
profanity_racist | Titus Wormer’s List of Racist Words |
profanity_zac_anger | Zac Anger’s List of Profane Words |
sw_dolch | Leveled Dolch List of 220 Common Words |
sw_fry_100 | Fry’s 100 Most Commonly Used English Words |
sw_fry_1000 | Fry’s 1000 Most Commonly Used English Words |
sw_fry_200 | Fry’s 200 Most Commonly Used English Words |
sw_fry_25 | Fry’s 25 Most Commonly Used English Words |
sw_jockers | Matthew Jockers’s Expanded Topic Modeling Stopword List |
sw_loughran_mcdonald_long | Loughran-McDonald Long Stopword List |
sw_loughran_mcdonald_short | Loughran-McDonald Short Stopword List |
sw_lucene | Lucene Stopword List |
sw_mallet | MALLET Stopword List |
sw_python | Python Stopword List |
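The sw_ vectors can be used to filter tokens. The following is a minimal sketch, assuming that `sw_fry_25` is a plain character vector as the prefix table describes. The function `remove_stopwords` and the fallback word list are hypothetical and exist only so the example runs without the package installed.

```r
# Use lexicon's Fry 25 list when the package is installed; otherwise fall
# back to a small hypothetical stand-in so the sketch still runs.
stopwords <- if (requireNamespace("lexicon", quietly = TRUE)) {
  lexicon::sw_fry_25
} else {
  c("the", "of", "and", "a", "to", "in", "is", "you", "that", "it")
}

# Hypothetical helper (not a lexicon function): lower-case the text, split on
# anything that is not a letter or apostrophe, and drop the stopwords.
remove_stopwords <- function(text, stopwords) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  tokens[!tokens %in% tolower(stopwords)]
}

remove_stopwords("The cat sat in the hat", stopwords)
```

The tokenizer here is deliberately naive. Any other sw_ vector can be passed as the second argument.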
Installation
To download the development version of lexicon:
Download the zip ball or tar ball, decompress and run `R CMD INSTALL` on it, or use the pacman package to install the development version:
    if (!require("pacman")) install.packages("pacman")
    pacman::p_load_gh("trinker/lexicon")
Contact
You are welcome to:
- submit suggestions and bug-reports at: https://github.com/trinker/lexicon/issues
- send a pull request on: https://github.com/trinker/lexicon/
- compose a friendly e-mail to: [email protected]