spaCy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+48740%)
charformer-pytorch
Implementation of the GBST block from the Charformer paper, in PyTorch
Stars: ✭ 74 (+64.44%)
uax29
A tokenizer based on Unicode text segmentation (UAX 29), for Go
Stars: ✭ 26 (-42.22%)
youtokentome-ruby
High-performance unsupervised text tokenization for Ruby
Stars: ✭ 17 (-62.22%)
auto-data-tokenize
Identify and tokenize sensitive data automatically using Cloud DLP and Dataflow
Stars: ✭ 21 (-53.33%)
simplemma
Simple multilingual lemmatizer for Python, designed for speed and efficiency
Stars: ✭ 32 (-28.89%)
polycash
The ultimate open-source betting protocol. PolyCash is a P2P blockchain platform for wallets, asset issuance, bonds and gaming.
Stars: ✭ 24 (-46.67%)
wink-tokenizer
Multilingual tokenizer that automatically tags each token with its type
Stars: ✭ 51 (+13.33%)
spacy-server
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
Stars: ✭ 58 (+28.89%)
ling
Natural Language Processing toolkit in Go
Stars: ✭ 57 (+26.67%)
nlp-cheat-sheet-python
NLP cheat sheet: Python, spaCy, LexNLP, NLTK, tokenization, stemming, sentence detection, named entity recognition
Stars: ✭ 69 (+53.33%)
FAT
Factom Asset Tokens - Open tokenization standards on Factom
Stars: ✭ 17 (-62.22%)
xontrib-output-search
Get identifiers, paths, URLs and words from the previous command's output and use them in the next command in the xonsh shell.
Stars: ✭ 26 (-42.22%)
TweebankNLP
[LREC 2022] An off-the-shelf pre-trained Tweet NLP toolkit (NER, tokenization, lemmatization, POS tagging, dependency parsing) and the Tweebank-NER dataset
Stars: ✭ 84 (+86.67%)
lunasec
LunaSec - Dependency security scanner that automatically notifies you about vulnerabilities such as Log4Shell or node-ipc in your pull requests and builds. Protect yourself in 30 seconds with the LunaTrace GitHub App: https://github.com/marketplace/lunatrace-by-lunasec/
Stars: ✭ 1,261 (+2702.22%)
lima
The Libre Multilingual Analyzer, a Natural Language Processing (NLP) C++ toolkit.
Stars: ✭ 75 (+66.67%)
Vaaku2Vec
Language modeling and text classification for the Malayalam language using ULMFiT
Stars: ✭ 68 (+51.11%)
BasicArabicOCR
A very basic Arabic OCR tool based on the Tesseract OCR engine, written in Java.
Stars: ✭ 19 (-57.78%)
ATKSpy
A Python package that provides a SOAP interface for communicating with Microsoft ATKS
Stars: ✭ 27 (-40%)
alyahmor
Arabic inflectional morphology generator
Stars: ✭ 22 (-51.11%)
arabic-stop-words
Largest list of Arabic stop words on GitHub.
Stars: ✭ 193 (+328.89%)
ArSarcasm
This repository contains the Arabic sarcasm dataset (ArSarcasm)
Stars: ✭ 18 (-60%)
Arabic-Tashkeela-Model
A diacritization model for the Arabic language, trained on Tashkeela, the Arabic diacritization corpus available on Kaggle
Stars: ✭ 15 (-66.67%)
masader
The largest public catalogue of Arabic NLP and speech datasets. It lists more than 250 datasets, each annotated with more than 25 attributes.
Stars: ✭ 66 (+46.67%)
tajmeeaton
A collection of projects, especially open-source ones, for advancing the Arabic language and the nation. 👨💻 👨🔬👨🏫🧕
Stars: ✭ 115 (+155.56%)
farasapy
A Python implementation of the Farasa toolkit
Stars: ✭ 69 (+53.33%)
comparable-text-miner
Comparable documents miner: Arabic-English morphological analysis, text processing, n-gram feature extraction, POS tagging, dictionary translation, document alignment, corpus information, text classification, tf-idf computation, text similarity computation, HTML document cleaning
Stars: ✭ 31 (-31.11%)
arabic-tagger
AQMAR Arabic Tagger: Sequence tagger with cost-augmented structured perceptron training
Stars: ✭ 38 (-15.56%)
ar-embeddings
Sentiment Analysis for Arabic Text (tweets, reviews, and standard Arabic) using word2vec
Stars: ✭ 83 (+84.44%)
Sumrized
Automatic Text Summarization (English/Arabic).
Stars: ✭ 37 (-17.78%)
nmatheg
A simple strategy for training and fine-tuning NLP models for Arabic: specify the parameters and wait for the results. The design makes use of the different tools in our NLP pipeline.
Stars: ✭ 19 (-57.78%)