Top 164 bigdata open source projects

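Exposure is a library for exposure (impression) statistics. It makes it easy to add tracking for exposure events with only minor intrusion into existing code. It supports RecyclerView linear, grid, and staggered-grid layouts, horizontally scrolling RecyclerViews, ScrollView, and other scrolling layouts, and the effective exposure area of an item is configurable.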

✭ 51

kotlin android recyclerview bigdata exposure
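A configurable "effective exposure area" usually boils down to comparing the visible fraction of an item against a threshold. The sketch below assumes exactly that rule: an item counts as exposed once at least `threshold` of its area lies inside the viewport. `Rect`, `visible_ratio`, `is_effectively_exposed`, and the 0.5 default are hypothetical names and values for illustration, not the Exposure library's (Kotlin) API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen pixels (left/top inclusive, right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int

    def area(self) -> int:
        # Clamp negative extents to zero so empty or non-overlapping rects have area 0.
        return max(0, self.right - self.left) * max(0, self.bottom - self.top)


def visible_ratio(item: Rect, viewport: Rect) -> float:
    """Fraction (0.0 to 1.0) of the item's area that lies inside the viewport."""
    total = item.area()
    if total == 0:
        return 0.0
    overlap = Rect(
        max(item.left, viewport.left),
        max(item.top, viewport.top),
        min(item.right, viewport.right),
        min(item.bottom, viewport.bottom),
    )
    return overlap.area() / total


def is_effectively_exposed(item: Rect, viewport: Rect, threshold: float = 0.5) -> bool:
    """Hypothetical rule: exposed once at least `threshold` of the item is visible."""
    return visible_ratio(item, viewport) >= threshold
```

For a 100×100 viewport, an item spanning rows 50 to 150 is half visible, so `visible_ratio(Rect(0, 50, 100, 150), Rect(0, 0, 100, 100))` is 0.5 and the item passes the default threshold.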

Native Julia I/O package to work with CERN ROOT files

✭ 52

julia TeX analysis bigdata high-energy-physics hep particle-physics hacktoberfest hep-ex cern-root

SparkTwitterAnalysis

An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) to build the project.

✭ 29

scala shell spark apache-spark sbt twitter-api bigdata twitter-streaming-api sbt-assembly

Data syncing for ClickHouse, written in Go.

✭ 839

go Vue javascript SCSS Makefile shell clickhouse bigdata kafka-consumer

awesome-bigdata

A curated list of awesome big data frameworks, resources and other awesomeness.

✭ 11,093

data-science data awesome database data-stream bigdata series-database data-visualization data-warehouse stream-processing data-analytics awesome-list distributed-database visualize-data streaming-data

meetups-archivos

Slides, code, and videos from the meetups, Data Science Days, video calls, and workshops. Data Science Research is a nonprofit organization that seeks to disseminate and decentralize knowledge of Data Science and Artificial Intelligence in Peru, creating opportunities for new talent through MeetUps, Workshops, and Semilleros (research incubators) …

✭ 60

Jupyter Notebook HTML python r data-science machine-learning workshops big-data deep-learning analytics bigdata artificial-intelligence meetups neuronal-network meetups-archivos

Learning Flink: Flink CEP, Flink Core, Flink SQL

✭ 70

java scala FreeMarker shell sql stream bigdata flink

HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)

✭ 56

java scala shell spark hive hadoop excel bigdata office poi flink hadoop-ecosystem hadoopoffice analyze-office-documents

Flink: Stateful Computations over Data Streams

✭ 14

streaming bigdata realtime flink

A tidied-up collection of Spark and Hadoop tutorials.

✭ 28

java shell r data-science spark hadoop bigdata

bigdata-tech-index

Big Data Technology Index

✭ 24

technology bigdata index

Converts record-oriented data to a columnar format.

✭ 28

go Makefile avro bigdata parquet
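The core of a record-to-columnar conversion is pivoting rows into per-field columns. The sketch below shows only that pivot, assuming records arrive as Python dicts and that missing fields become `None`; `records_to_columns` is a hypothetical helper, not this project's Go API, and real columnar formats such as Parquet add schemas, encodings, and compression on top of this step.

```python
from typing import Any


def records_to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into column-oriented lists.

    Every column ends up with exactly one entry per record; fields missing
    from a record are filled with None.
    """
    columns: dict[str, list[Any]] = {}
    for i, record in enumerate(records):
        for key in record:
            if key not in columns:
                # First time this field appears: backfill None for earlier records.
                columns[key] = [None] * i
        for key, values in columns.items():
            values.append(record.get(key))
    return columns
```

For example, `records_to_columns([{"id": 1, "name": "a"}, {"id": 2}])` yields `{"id": [1, 2], "name": ["a", None]}`. Keeping each field contiguous is what lets columnar formats compress well and read only the columns a query needs.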

datacatalog-tag-manager

A Python package for managing Google Cloud Data Catalog tags by loading metadata from external sources; currently supports the CSV file format.

✭ 17

python Dockerfile bigdata gcp google-cloud csv-import data-governance datacatalog gcp-datacatalog

Learning notes | Java fundamentals, JVM, source code, big data, interview experiences

✭ 69

redis tcp jvm bigdata hashmap interviews reentrantlock

gan deeplearning4j

Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.

✭ 19

java Jupyter Notebook data-science machine-learning big-data spark apache-spark computer-vision deep-learning bigdata datascience generative-adversarial-network gan machinelearning deeplearning generative-adversarial-networks deeplearning4j

Learning the big data ecosystem

✭ 18

java scala elasticsearch spark hadoop storm bigdata spark-streaming mapreduce

Anovos: An Open Source Library for Scalable Feature Engineering Using Apache Spark

✭ 77

HTML Jupyter Notebook python CSS shell Makefile visualization data-science machine-learning scale bigdata pyspark feature-engineering transformation feature-recommendation

Multiple Docker container images for the main big data tools (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ...)

✭ 29

shell Dockerfile python Makefile Batchfile XSLT javascript dockerfile kafka spark cassandra hive hadoop docker-image bigdata hbase zookeeper mesos hue flink zeppelin drill

Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark

✭ 52

C++ scala CMake shell benchmark streaming performance bigdata

the-apache-ignite-book

All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Covers Apache Ignite 2.6 and above.

✭ 65

java streaming memoization sql spark hive hadoop spring-data bigdata hibernate distributed-database ignite nosql-database in-memory-database streaming-data gridgain hibernate-ogm in-memory-computations in-memory-caching

Big data collection and extraction platform

✭ 292

java HTML CSS data collection spark etl scheduler bigdata data-collection datapipeline pipline sparketl datax-web

A pure Java HDF5 library

✭ 83

java python bigdata file-format hdf5

Amas is a recursive acronym for “Amas, monitor alert system”.

✭ 77

python nodejs docker alert monitor bigdata opentsdb opentracing

SQL parsers for big data, built with ANTLR4.

✭ 135

javascript ANTLR typescript parser autocomplete sql bigdata syntax-checker spark-sql flink-sql hive-sql hive-impala

163-bigdate-note

Big data notes

✭ 38

java scala shell notes bigdata

GreyCat - Data Analytics, Temporal data, What-if, Live machine learning

✭ 104

java javascript SCSS typescript CSS HTML nodejs machine-learning node time-series machine-learning-algorithms bigdata data-analytics data-analysis graph-database distributed-database machine-learning-library temporal greycat

2019 egu workshop jupyter notebooks

Short course on interactive analysis of Big Earth Data with Jupyter Notebooks

✭ 29

Jupyter Notebook bigdata jupyter-notebook climate-data googleearthengine geospatial-analysis

lectures-hse-spark

Scalable machine learning and big data analysis with Apache Spark

✭ 20

Jupyter Notebook shell python machine-learning lectures spark bigdata mapreduce

Big data study notes, a learning roadmap, and a collection of technical case studies.

✭ 37

shell python kafka hive hadoop bigdata hdfs mapreduce flink

TiDB connectors for Flink/Hive/Presto

✭ 192

java ANTLR presto hive bigdata tikv tidb flink cdc

awesome-coder-resources

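A recharge station on the road of programming! [Continuously updated... stars welcome, and please check back often.] [Contents: programming/learning/reading resources, open source projects, interview questions, websites, books, blogs, tutorials, and more]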

✭ 54

awesome books big-data frontend backend bigdata resources interview awesome-list progamming

chatnoir-resiliparse

A robust web archive analytics toolkit

✭ 26

cython python c web bigdata extraction warc webarchive htmlparser

Clustering4Ever

C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.

✭ 126

scala big-data ai spark clustering bigdata scalability artificial-intelligence clustering-algorithm clustering-evaluation

A central collection of personal notes, recording technical notes in a quick-and-rough style. 📚☕️⌨️🎧

✭ 61

c Roff shell Makefile python Jupyter Notebook C++ mysql blog linux docker bigdata os elk cs preson-notes

bigquery-data-lineage

Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.

✭ 112

java bigquery bigdata data-catalog dataflow data-management data-governance data-lineage zetasql

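Thoughts on an interview question: finding the intersection of a 60-million-record data set and a 3-million-record data set with only 50 MB of memory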

✭ 54

javascript nodejs stream memory bigdata intersect
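The listing does not say which approach this project takes. One common technique for intersecting data sets that do not fit in memory is hash partitioning: split both inputs into buckets on disk by a stable hash, so equal items always land in the same bucket index, then intersect one bucket pair at a time while holding only one bucket of the smaller set in memory. The sketch assumes items are newline-free strings; `partitioned_intersection` and the default of 64 buckets are hypothetical choices, and in practice the bucket count would be picked so one bucket of the smaller set fits the memory budget.

```python
import os
import tempfile
import zlib
from typing import Iterable, Iterator


def _bucket_of(item: str, num_buckets: int) -> int:
    # crc32 is stable across runs, unlike Python's salted built-in hash().
    return zlib.crc32(item.encode("utf-8")) % num_buckets


def _partition(items: Iterable[str], directory: str, prefix: str, num_buckets: int) -> list[str]:
    """Stream items into num_buckets files chosen by hash; return the file paths."""
    paths = [os.path.join(directory, f"{prefix}-{b}.txt") for b in range(num_buckets)]
    files = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        for item in items:
            files[_bucket_of(item, num_buckets)].write(item + "\n")
    finally:
        for f in files:
            f.close()
    return paths


def _read_lines(path: str) -> Iterator[str]:
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def partitioned_intersection(
    large: Iterable[str], small: Iterable[str], num_buckets: int = 64
) -> Iterator[str]:
    """Yield each distinct item present in both inputs, one bucket pair at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        large_paths = _partition(large, tmp, "large", num_buckets)
        small_paths = _partition(small, tmp, "small", num_buckets)
        for large_path, small_path in zip(large_paths, small_paths):
            wanted = set(_read_lines(small_path))  # only this bucket is held in memory
            for item in _read_lines(large_path):
                if item in wanted:
                    wanted.discard(item)  # so each common item is yielded once
                    yield item
```

Both inputs are consumed as streams, so the large set never has to be loaded whole; memory use is roughly one bucket of the smaller set plus file buffers.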

Sample code for typical application scenarios from Java study and projects

✭ 21

javascript java CSS spring-boot study example bigdata annotations design-patterns

twitter-archive-reader

Full featured TypeScript Twitter archive reader and browser

✭ 43

typescript twitter big-data tweets bigdata twitter-archives

Hayabusa: Simple and Fast Full-Text Search Engine for Massive System Log Data

✭ 43

CSS HTML python TeX javascript Makefile bigdata syslog

Spark-MLlib-Tutorial

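A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files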

✭ 32

scala machine-learning spark bigdata mllib

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Examples for gauravbytes.com

✭ 57

java Jupyter Notebook typescript HTML javascript CSS docker elasticsearch spring spring-boot mongodb pandas-dataframe spring-cloud bigdata jupyter-notebook spring-mvc apache-avro elk-stack imdg

An open-source, cloud-ready web application for simplified deployment of big data workflows.

✭ 26

javascript python Common Workflow Language HTML CSS Dockerfile workflow bioinformatics bigdata web-application workflows

bigdatatutorial

✭ 34

shell spark hadoop bigdata postgresql spark-streaming greenplum spark-sql
