Msnoise
A Python Package for Monitoring Seismic Velocity Changes using Ambient Seismic Noise | http://www.msnoise.org
Stars: ✭ 94 (+487.5%)
Tidytext
Text mining using tidy tools ✨📄✨
Stars: ✭ 975 (+5993.75%)
leetspeek
Open and collaborative content from leet hackers!
Stars: ✭ 11 (-31.25%)
Dc Hi guides
[Data Castle algorithm competition] Premium travel service order prediction, final rank 11
Stars: ✭ 83 (+418.75%)
Tsrepr
TSrepr: R package for time series representations
Stars: ✭ 75 (+368.75%)
Nlppln
NLP pipeline software using common workflow language
Stars: ✭ 31 (+93.75%)
Bee University
Project collecting university admission cutoff scores for 2014-2018 and analyzing the data
Stars: ✭ 73 (+356.25%)
sciblox
sciblox - Easier Data Science and Machine Learning
Stars: ✭ 48 (+200%)
Ffbe
Datamining for FFBE GL
Stars: ✭ 69 (+331.25%)
Evalne
Source code for EvalNE, a Python library for evaluating Network Embedding methods.
Stars: ✭ 67 (+318.75%)
imbalanced-ensemble
Class-imbalanced / Long-tailed ensemble learning in Python. Modular, flexible, and extensible. | A modular, flexible, and easily extensible library for class-imbalanced / long-tailed machine learning
Stars: ✭ 199 (+1143.75%)
Autophrase
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Stars: ✭ 835 (+5118.75%)
Gendis
Contains an implementation (sklearn API) of the algorithm proposed in "GENDIS: GEnetic DIscovery of Shapelets" and code to reproduce all experiments.
Stars: ✭ 59 (+268.75%)
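TextClassification
Text classification of Sina News implemented with scikit-learn. The dataset contains 1 million documents in 10 categories, split 1:1 between test and training sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.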
Stars: ✭ 86 (+437.5%)
Etherscan Ml
Python Data Science and Machine Learning Library for the Ethereum and ERC-20 Blockchain
Stars: ✭ 55 (+243.75%)
Nlp In Practice
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Stars: ✭ 790 (+4837.5%)
Php Ml
PHP-ML - Machine Learning library for PHP
Stars: ✭ 7,900 (+49275%)
readability
Fast readability scores for text data
Stars: ✭ 22 (+37.5%)
Bigartm
Fast topic modeling platform
Stars: ✭ 563 (+3418.75%)
Mldm
Lecture course "Machine Learning and Data Mining" at the Faculty of Computational Mathematics and Cybernetics (CMC) of Lomonosov Moscow State University
Stars: ✭ 35 (+118.75%)
Invoice2data
Extract structured data from PDF invoices
Stars: ✭ 943 (+5793.75%)
Clevercsv
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
Stars: ✭ 887 (+5443.75%)
RecommendationEngine
Source code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"
Stars: ✭ 43 (+168.75%)
Data mining
The Ruby DataMining Gem is a small collection of several data-mining algorithms
Stars: ✭ 10 (-37.5%)
webhdfs
Node.js WebHDFS REST API client
Stars: ✭ 88 (+450%)
Open Semantic Search
Open Source research tool to search, browse, analyze and explore large document collections by Semantic Search Engine and Open Source Text Mining & Text Analytics platform (Integrates ETL for document processing, OCR for images & PDF, named entity recognition for persons, organizations & locations, metadata management by thesaurus & ontologies, search user interface & search apps for fulltext search, faceted search & knowledge graph)
Stars: ✭ 386 (+2312.5%)
dh-core
Functional data science
Stars: ✭ 123 (+668.75%)
Texthero
Text preprocessing, representation and visualization from zero to hero.
Stars: ✭ 2,407 (+14943.75%)
odinson
Odinson is a powerful and highly optimized open-source framework for rule-based information extraction. Odinson couples a simple, yet powerful pattern language that can operate over multiple representations of text, with a runtime system that operates in near real time.
Stars: ✭ 59 (+268.75%)
Biolitmap
Code for the paper "BIOLITMAP: a web-based geolocated and temporal visualization of the evolution of bioinformatics publications" in Oxford Bioinformatics.
Stars: ✭ 18 (+12.5%)
KaliIntelligenceSuite
Kali Intelligence Suite (KIS) shall aid in the fast, autonomous, central, and comprehensive collection of intelligence by executing standard penetration testing tools. The collected data is internally stored in a structured manner to allow the fast identification and visualisation of the collected information.
Stars: ✭ 58 (+262.5%)
Dataproofer
A proofreader for your data
Stars: ✭ 628 (+3825%)
Graphbrain
Language, Knowledge, Cognition
Stars: ✭ 294 (+1737.5%)
Elki
ELKI Data Mining Toolkit
Stars: ✭ 613 (+3731.25%)
ambari-hdp-docker
Dockerfiles and Docker Compose for HDP 2.6 with Blueprints
Stars: ✭ 23 (+43.75%)
Interpretable machine learning with python
Examples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
Stars: ✭ 530 (+3212.5%)
Nlp profiler
A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and a column name containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.
Stars: ✭ 181 (+1031.25%)
deduce
Deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (+150%)
Multi rake
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Stars: ✭ 162 (+912.5%)
Search
Blue Brain text mining toolbox for semantic search and structured information extraction
Stars: ✭ 26 (+62.5%)
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+1081.25%)
Tokenizers
Fast, Consistent Tokenization of Natural Language Text
Stars: ✭ 161 (+906.25%)