columnify: Convert record-oriented data to columnar format.
Stars: ✭ 28 (-91.25%)
Bigdata File Viewer: A cross-platform (Windows, macOS, Linux) desktop application for viewing common big data binary formats such as Parquet, ORC, and Avro. Supports the local file system, HDFS, AWS S3, Azure Blob Storage, etc.
Stars: ✭ 86 (-73.12%)
DaFlow: Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Stars: ✭ 24 (-92.5%)
StreamBench: Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark
Stars: ✭ 52 (-83.75%)
hadoopoffice: HadoopOffice - Analyze Office documents using the Hadoop ecosystem (Spark/Flink/Hive)
Stars: ✭ 56 (-82.5%)
gan deeplearning4j: Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-94.06%)
taller SparkR: SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)
Stars: ✭ 12 (-96.25%)
amas: Amas is a recursive acronym for “Amas, monitor alert system”.
Stars: ✭ 77 (-75.94%)
meetups-archivos: Slides, code, and videos from meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through meetups, workshops, and incubator groups (Semilleros) …
Stars: ✭ 60 (-81.25%)
parquet-extra: A collection of Apache Parquet add-on modules
Stars: ✭ 30 (-90.62%)
IMCtermite: Enables extraction of measurement data from binary files with the extension 'raw' used by the proprietary software imcFAMOS/imcSTUDIO, and facilitates its storage in open-source file formats
Stars: ✭ 20 (-93.75%)
flokkr: Documentation placeholder and utilities for all the other containers.
Stars: ✭ 30 (-90.62%)
learning-spark: Tidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-91.25%)
Notes: These are learning notes | Java basics, JVM, source code, big data, interview experiences
Stars: ✭ 69 (-78.44%)
UnROOT.jl: Native Julia I/O package to work with CERN ROOT files
Stars: ✭ 52 (-83.75%)
anovos: Anovos - An Open Source Library for Scalable Feature Engineering Using Apache Spark
Stars: ✭ 77 (-75.94%)
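v6.dooring.public: A large-screen data visualization solution that provides a visual editing engine, helping individuals and enterprises easily build their own large-screen visualization applications.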
Stars: ✭ 323 (+0.94%)
zdh web: Big data collection and extraction platform
Stars: ✭ 292 (-8.75%)
Parquet.jl: Julia implementation of Parquet columnar file format reader
Stars: ✭ 93 (-70.94%)
dt-sql-parser: SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (-57.81%)
pyorc: Python module for Apache ORC file format
Stars: ✭ 58 (-81.87%)
odbc2parquet: A command line tool to query an ODBC data source and write the result into a parquet file.
Stars: ✭ 95 (-70.31%)
parquet-usql: A custom extractor designed to read parquet for Azure Data Lake Analytics
Stars: ✭ 13 (-95.94%)
coolplayflink: Flink - Stateful Computations over Data Streams
Stars: ✭ 14 (-95.62%)
Spark: Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample Spark programs in Scala.
Stars: ✭ 55 (-82.81%)
big data: A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-89.37%)
datacatalog-tag-manager: Python package to manage Google Cloud Data Catalog tags, loading metadata from external sources; currently supports the CSV file format
Stars: ✭ 17 (-94.69%)
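Exposure: Exposure is a library that helps with exposure (impression) tracking requirements. It makes it easy to instrument exposure events with only minimal intrusion into existing code. It supports RecyclerView (RV) linear, grid, and staggered-grid (waterfall) layouts, horizontally scrolling RVs, ScrollView, and other scrolling layouts. The effective exposure area of an item is configurable.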
Stars: ✭ 51 (-84.06%)
ETL-Starter-Kit: 📁 Extract, Transform, Load (ETL) 👷 refers to a process in database usage and especially in data warehousing. This repository contains a starter kit featuring ETL-related work.
Stars: ✭ 21 (-93.44%)
SparkTwitterAnalysis: An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.
Stars: ✭ 29 (-90.94%)
dockerfiles: Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-90.94%)
meepo: Data migration across heterogeneous storage systems
Stars: ✭ 29 (-90.94%)
the-apache-ignite-book: All code samples, scripts, and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (-79.69%)
cds: Data syncing in golang for ClickHouse.
Stars: ✭ 839 (+162.19%)
jhdf: A pure Java HDF5 library
Stars: ✭ 83 (-74.06%)
bqv: The simplest tool to manage views of BigQuery.
Stars: ✭ 22 (-93.12%)
albis: Albis - High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-93.75%)
awesome-bigdata: A curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 11,093 (+3366.56%)
greycat: GreyCat - Data Analytics, Temporal data, What-if, Live machine learning
Stars: ✭ 104 (-67.5%)
wasp: WASP is a framework to build complex real-time big data applications. It relies on a kind of Kappa/Lambda architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-94.06%)
lectures-hse-spark: Scalable machine learning and big data analysis with Apache Spark
Stars: ✭ 20 (-93.75%)
parquet2: Fastest and safest Rust implementation of parquet. `unsafe` free. Integration-tested against pyarrow
Stars: ✭ 157 (-50.94%)
TiBigData: TiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (-40%)
flink-learn: Learning Flink: Flink CEP, Flink Core, Flink SQL
Stars: ✭ 70 (-78.12%)
experiments: Code examples for my blog posts
Stars: ✭ 21 (-93.44%)
graphique: GraphQL service for arrow tables and parquet data sets.
Stars: ✭ 28 (-91.25%)
vulkn: Love your Data. Love the Environment. Love VULKИ.
Stars: ✭ 43 (-86.56%)