
chucheng92 / HadoopDedup

Licence: other
🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase

Programming Languages

java


Large-scale deduplication of massive data sets, based on Hadoop and HBase

Directory layout

data - data sets

docs - documentation

src - MapReduce source code
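
For readers unfamiliar with how MapReduce can deduplicate records, the following is a minimal sketch of the general pattern, not the code in src. It assumes that a record is a single line of text and that two records are duplicates when they are equal after trimming whitespace. It simulates the map, shuffle, and reduce phases in plain Java (version 9 or later) without using the Hadoop API. The class name DedupSketch and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: an in-memory simulation of MapReduce-style
// deduplication. It does not use Hadoop or HBase classes.
public class DedupSketch {

    // Map phase: emit each trimmed, non-empty record as a key with a dummy value of 1.
    public static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            String key = line.trim();
            if (!key.isEmpty()) {
                out.add(Map.entry(key, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group values by key. A TreeMap mimics the sorted key
    // order in which a reducer receives its input.
    public static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: ignore the grouped values and emit each key exactly once.
    public static List<String> reduce(TreeMap<String, List<Integer>> grouped) {
        return new ArrayList<>(grouped.keySet());
    }

    // Run the three phases in sequence.
    public static List<String> dedup(List<String> lines) {
        return reduce(shuffle(map(lines)));
    }

    public static void main(String[] args) {
        List<String> input = List.of("b", "a", "b", " a ", "c");
        System.out.println(dedup(input)); // prints [a, b, c]
    }
}
```

Because the shuffle groups identical keys together, the reducer sees each distinct record once regardless of how often it appears in the input, which is why the reduce step needs no comparison logic of its own.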

Environment

Hadoop 1.1.2

HBase 0.94.8
