Top 164 bigdata open source projects

Exposure
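Exposure is a library for exposure (impression) tracking that makes it easy to instrument exposure events with only minimal changes to existing code. It supports RecyclerView (RV) linear, grid, and staggered (waterfall) layouts, horizontally scrolling RVs, ScrollView, and other scrolling layouts. The effective exposure area of an item is configurable.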
SparkTwitterAnalysis
An Apache Spark standalone application written in Scala using the Spark API. The project is built with Simple Build Tool (SBT).
cds
Data syncing for ClickHouse, written in Go.
meetups-archivos
Slides, code, and videos from meetups, Data Science Days, video calls, and workshops. Data Science Research is a nonprofit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving new talent opportunities through meetups, workshops, and research incubators (Semilleros) …
flink-learn
Learning Flink: Flink CEP, Flink Core, Flink SQL
hadoopoffice
HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
coolplayflink
Flink: Stateful Computations over Data Streams
learning-spark
Tidy up Spark and Hadoop tutorials.
bigdata-tech-index
Big Data Technology Index
columnify
Converts record-oriented data to a columnar format.
datacatalog-tag-manager
Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources (currently supports the CSV file format)
Notes
Learning notes | Java fundamentals, JVM, source code, big data, interview experiences
dockerfiles
Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
StreamBench
Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark
jhdf
A pure Java HDF5 library
amas
Amas is recursive acronym for “Amas, monitor alert system”.
2019 egu workshop jupyter notebooks
Short course on interactive analysis of Big Earth Data with Jupyter Notebooks
lectures-hse-spark
Scalable machine learning and big data analysis with Apache Spark
bigdata-doc
Big data study notes, a learning roadmap, and a collection of technical case studies.
TiBigData
TiDB connectors for Flink/Hive/Presto
awesome-coder-resources
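A refueling station for your programming journey! [Continuously updated... stars welcome, come back often] [Contents: programming, learning, and reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]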
Clustering4Ever
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
PersonNotes
A collection of personal notes, recording technical topics in a quick-and-dirty style. 📚☕️⌨️🎧
bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
intersect
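Thoughts on an interview question: computing the intersection of a 60-million-item data set and a 3-million-item data set within 50 MB of memory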
young-examples
Sample code for typical use cases encountered while learning Java and in real projects
twitter-archive-reader
Full featured TypeScript Twitter archive reader and browser
hayabusa
Hayabusa: Simple and Fast Full-Text Search Engine for Massive System Log Data
Spark-MLlib-Tutorial
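A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files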
workflUX
An open-source, cloud-ready web application for simplified deployment of big data workflows.