gomrjobgomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (+143.75%)
awesome-toolscurated list of awesome tools and libraries for specific domains
Stars: ✭ 31 (+93.75%)
EasyMinerEasy association rule mining and classification on the web
Stars: ✭ 14 (-12.5%)
terasliceScalable data processing pipelines in JavaScript
Stars: ✭ 48 (+200%)
pyspark-algorithmsPySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+350%)
merkle-dbHigh-scalability analytics database built on immutable merkle-trees
Stars: ✭ 44 (+175%)
palladianPalladian is a Java-based toolkit with functionality for text processing, classification, information extraction, and data retrieval from the Web.
Stars: ✭ 32 (+100%)
JavaFrameworkSimple Java framework designed for easily developing Spring-based Java programs. Supports big data and metadata management, a common Elasticsearch query tool, and so on.
Stars: ✭ 16 (+0%)
PaperWeeklyAI📚「@MaiweiAI」Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.
Stars: ✭ 50 (+212.5%)
estrattoParsing fixed-width file content made easy
Stars: ✭ 12 (-25%)
hierarchical-clusteringA Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.
Stars: ✭ 62 (+287.5%)
metriqlThe metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+1318.75%)
beanszooDistributed Java micro-services using ZooKeeper
Stars: ✭ 12 (-25%)
orionManagement and automation platform for Stateful Distributed Systems
Stars: ✭ 77 (+381.25%)
bullet-coreBullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.
Stars: ✭ 36 (+125%)
AnswerableRecommendation system for Stack Overflow unanswered questions
Stars: ✭ 13 (-18.75%)
smart-data-lakeSmart Automation Tool for building modern Data Lakes and Data Pipelines
Stars: ✭ 79 (+393.75%)
openPDCOpen Source Phasor Data Concentrator
Stars: ✭ 109 (+581.25%)
hadoop-ansibleInstall hadoop cluster with ansible
Stars: ✭ 35 (+118.75%)
leetspeekOpen and collaborative content from leet hackers!
Stars: ✭ 11 (-31.25%)
AsclepiusOpen Price Comparison for US Hospitals
Stars: ✭ 20 (+25%)
scibloxsciblox - Easier Data Science and Machine Learning
Stars: ✭ 48 (+200%)
hive to esA small tool for syncing Hive data warehouse data to Elasticsearch
Stars: ✭ 21 (+31.25%)
woollyThe Text Mining Elixir
Stars: ✭ 48 (+200%)
dislibThe distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (+143.75%)
incubator-tezMirror of Apache Tez (Incubating)
Stars: ✭ 60 (+275%)
imbalanced-ensembleClass-imbalanced / Long-tailed ensemble learning in Python. Modular, flexible, and extensible.
Stars: ✭ 199 (+1143.75%)
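TextClassificationText classification of Sina News articles implemented with scikit-learn. The dataset contains 1,000,000 documents in 10 categories, split 1:1 between the test and training sets. The classifiers are SVM and Bayes, with Bayes as the baseline.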
Stars: ✭ 86 (+437.5%)
sugarcubeMonoidal data processes.
Stars: ✭ 32 (+100%)
readabilityFast readability scores for text data
Stars: ✭ 22 (+37.5%)
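awesome-coder-resourcesA refueling station on your programming journey! [Continuously updated... stars welcome, come back often...] [Contents: programming/learning/reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more]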
Stars: ✭ 54 (+237.5%)
Semantic-Busobject flow treatment, data transformation
Stars: ✭ 49 (+206.25%)
RecommendationEngineSource code and dataset for paper "CBMR: An optimized MapReduce for item‐based collaborative filtering recommendation algorithm with empirical analysis"
Stars: ✭ 43 (+168.75%)
webhdfsNode.js WebHDFS REST API client
Stars: ✭ 88 (+450%)
bagriXML/Document DB on top of distributed cache
Stars: ✭ 40 (+150%)
KaliIntelligenceSuiteKali Intelligence Suite (KIS) shall aid in the fast, autonomous, central, and comprehensive collection of intelligence by executing standard penetration testing tools. The collected data is internally stored in a structured manner to allow the fast identification and visualisation of the collected information.
Stars: ✭ 58 (+262.5%)
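dpkbA collection of big data material, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse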
Stars: ✭ 123 (+668.75%)
scikit-hubnessA Python package for hubness analysis and high-dimensional data mining
Stars: ✭ 41 (+156.25%)
ambari-hdp-dockerDockerfiles and Docker Compose for HDP 2.6 with Blueprints
Stars: ✭ 23 (+43.75%)
koshort(deprecated) 🐱 koshort is a Python package for Korean internet spoken language crawling and processing... or maybe Korean domestic cat.
Stars: ✭ 62 (+287.5%)
sentometricsAn integrated framework in R for textual sentiment time series aggregation and prediction
Stars: ✭ 77 (+381.25%)
text-analysisWeaving analytical stories from text data
Stars: ✭ 12 (-25%)
couchdb-pkgApache CouchDB Packaging support files
Stars: ✭ 24 (+50%)
FIW KRTFamilies In the Wild: A Kinship Recognition Toolbox.
Stars: ✭ 18 (+12.5%)
phoenixApache Phoenix / HBase Spring Boot Microservices
Stars: ✭ 23 (+43.75%)