coolplayflink
Flink: Stateful Computations over Data Streams
Stars: ✭ 14 (-97.51%)
Mnemonic
Apache Mnemonic - A non-volatile hybrid memory storage oriented library
Stars: ✭ 91 (-83.84%)
Ignite Book Code Samples
All code samples, scripts and more in-depth examples for the book "High Performance in-memory computing with Apache Ignite". Please use the repository "the-apache-ignite-book" for Ignite version 2.6 or above.
Stars: ✭ 86 (-84.72%)
learning-spark
Tidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-95.03%)
Mlsql
The Programming Language Designed For Big Data and AI
Stars: ✭ 1,262 (+124.16%)
centurion
Kotlin Bigdata Toolkit
Stars: ✭ 320 (-43.16%)
enstop
Ensemble topic modelling with pLSA
Stars: ✭ 104 (-81.53%)
Cleanframes
Type-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-86.68%)
Cds
Data syncing in golang for ClickHouse.
Stars: ✭ 501 (-11.01%)
deduce
Deduce: de-identification method for Dutch medical text
Stars: ✭ 40 (-92.9%)
Reddit sse stream
A Server Side Event stream to deliver Reddit comments and submissions in near real-time to a client.
Stars: ✭ 39 (-93.07%)
TAKG
The official implementation of ACL 2019 paper "Topic-Aware Neural Keyphrase Generation for Social Media Language"
Stars: ✭ 127 (-77.44%)
Autocrawler
Google, Naver multiprocess image web crawler (Selenium)
Stars: ✭ 957 (+69.98%)
python-api
A Python client for Infermedica API.
Stars: ✭ 53 (-90.59%)
Panther
Detect threats with log data and improve cloud security posture
Stars: ✭ 885 (+57.19%)
Ldetool
Code generator for fast log file parsers
Stars: ✭ 273 (-51.51%)
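Bigdata Interview
🎯 🌟 [Big data interview questions] A collection of big data interview questions gathered from the web, together with my own answer summaries. Currently covers interview knowledge summaries for the Hadoop/Hive/Spark/Flink/Hbase/Kafka/Zookeeper frameworks.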
Stars: ✭ 857 (+52.22%)
hlda
Gibbs sampler for the Hierarchical Latent Dirichlet Allocation topic model
Stars: ✭ 138 (-75.49%)
10 Weeks
10 weeks of technology exploration
Stars: ✭ 22 (-96.09%)
sensim
Sentence Similarity Estimator (SenSim)
Stars: ✭ 15 (-97.34%)
Bigdataguide
Big data learning: learn big data from scratch, including study videos for each stage of learning and interview materials.
Stars: ✭ 817 (+45.12%)
columnify
Convert record-oriented data to columnar format.
Stars: ✭ 28 (-95.03%)
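Coding Now
Study notes, plus eBooks I have read, video resources, and blogs, websites, and tools I have collected and found useful. Covers the major big data components, Python machine learning and data analysis, Linux, operating systems, algorithms, networking, and more.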
Stars: ✭ 750 (+33.21%)
Cudf
cuDF - GPU DataFrame Library
Stars: ✭ 4,370 (+676.2%)
Spark Movie Lens
An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+32.33%)
tomoto-ruby
High performance topic modeling for Ruby
Stars: ✭ 49 (-91.3%)
blueprints-text
Jupyter notebooks for our O'Reilly book "Blueprints for Text Analysis Using Python"
Stars: ✭ 103 (-81.71%)
Gwu data mining
Materials for GWU DNSC 6279 and DNSC 6290.
Stars: ✭ 217 (-61.46%)
contextualLSTM
Contextual LSTM for NLP tasks like word prediction and word embedding creation for Deep Learning
Stars: ✭ 28 (-95.03%)
Qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Stars: ✭ 206 (-63.41%)
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (-66.43%)
Hdltex
HDLTex: Hierarchical Deep Learning for Text Classification
Stars: ✭ 191 (-66.07%)
textdigester
TextDigester: document summarization Java library
Stars: ✭ 23 (-95.91%)
Texthero
Text preprocessing, representation and visualization from zero to hero.
Stars: ✭ 2,407 (+327.53%)
TopicsExplorer
Explore your own text collection with a topic model – without prior knowledge.
Stars: ✭ 53 (-90.59%)
Multi rake
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Stars: ✭ 162 (-71.23%)
Bigdataie
A compilation of big data blogs, written test questions, tutorials, projects, and interview experiences.
Stars: ✭ 445 (-20.96%)
Lazynlp
Library to scrape and clean web pages to create massive datasets.
Stars: ✭ 1,985 (+252.58%)
api-python
Python client library to access Data Commons
Stars: ✭ 52 (-90.76%)
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-93.96%)
Chemdataextractor
Automatically extract chemical information from scientific documents
Stars: ✭ 152 (-73%)
Xioc
Extract indicators of compromise from text, including "escaped" ones.
Stars: ✭ 148 (-73.71%)
Lda
LDA topic modeling for node.js
Stars: ✭ 262 (-53.46%)
support-tickets-classification
This case study shows how to create a model for text analysis and classification and deploy it as a web service in Azure cloud in order to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava http://endava.com/en
Stars: ✭ 142 (-74.78%)
thrones2vec
Using Word2Vec to explore semantic similarities between the entities of "A Song of Ice and Fire" ("Game of Thrones").
Stars: ✭ 27 (-95.2%)
Wikipron
Massively multilingual pronunciation mining
Stars: ✭ 99 (-82.42%)
Pyseeta
Python API for SeetaFaceEngine (https://github.com/seetaface/SeetaFaceEngine.git)
Stars: ✭ 93 (-83.48%)
UnROOT.jl
Native Julia I/O package to work with CERN ROOT files
Stars: ✭ 52 (-90.76%)
Newsapi
A Python wrapper for News API.
Stars: ✭ 71 (-87.39%)
PubMed-Best-Match
Machine-learning based pipeline relying on LambdaMART currently used in PubMed for relevance (Best Match) searches
Stars: ✭ 36 (-93.61%)
Nlp Notebooks
A collection of notebooks for Natural Language Processing from NLP Town
Stars: ✭ 513 (-8.88%)
Paper Reading
Paper reading list in natural language processing, including dialogue systems and text generation related topics.
Stars: ✭ 508 (-9.77%)