hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-50%)
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+567.65%)
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (+26.47%)
openPDC
Open Source Phasor Data Concentrator
Stars: ✭ 109 (+220.59%)
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (+85.29%)
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (+329.41%)
FlameStream
Distributed stream processing model and its implementation
Stars: ✭ 14 (-58.82%)
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (+320.59%)
Sparkling Graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (+308.82%)
leetspeek
Open and collaborative content from leet hackers!
Stars: ✭ 11 (-67.65%)
darwin
Avro Schema Evolution made easy
Stars: ✭ 26 (-23.53%)
memex-gate
General Architecture for Text Engineering
Stars: ✭ 47 (+38.24%)
TiBigData
TiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (+464.71%)
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project will have sample programs for Spark in the Scala language.
Stars: ✭ 55 (+61.76%)
dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (+14.71%)
ngm
swissgeol.ch gives you insight into geoscientific data, above and below the surface.
Stars: ✭ 23 (-32.35%)
Tajo
Mirror of Apache Tajo
Stars: ✭ 128 (+276.47%)
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.
Stars: ✭ 115 (+238.24%)
Richdem
High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (+273.53%)
cloud
Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper
Stars: ✭ 48 (+41.18%)
jhdf
A pure Java HDF5 library
Stars: ✭ 83 (+144.12%)
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+6841.18%)
webhdfs
Node.js WebHDFS REST API client
Stars: ✭ 88 (+158.82%)
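dpkb
A collection of big data material, including distributed storage engines, distributed compute engines, data warehouse construction, and more. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse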
Stars: ✭ 123 (+261.76%)
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-50%)
opendc
Collaborative Datacenter Simulation and Exploration for Everybody
Stars: ✭ 40 (+17.65%)
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+30911.76%)
vulkn
Love your Data. Love the Environment. Love VULKИ.
Stars: ✭ 43 (+26.47%)
big-data-upf
RECSM-UPF Summer School: Social Media and Big Data Research
Stars: ✭ 21 (-38.24%)
hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (+11.76%)
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-41.18%)
couchdb-pkg
Apache CouchDB Packaging support files
Stars: ✭ 24 (-29.41%)
coolplayflink
Flink: Stateful Computations over Data Streams
Stars: ✭ 14 (-58.82%)
datasqueeze
Hadoop utility to compact small files
Stars: ✭ 18 (-47.06%)
Graph sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Stars: ✭ 99 (+191.18%)
classifai
🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (+191.18%)
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+350%)
cdp-service
CDP data platform that helps companies fully understand their customers and deliver precisely targeted, personalized marketing.
Stars: ✭ 30 (-11.76%)
Quantitative-Big-Imaging-2018
(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (+47.06%)
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (+70.59%)
room-renting
Scrape housing listings from Anjuke with Python and visualize them with AMap (Gaode Maps)
Stars: ✭ 16 (-52.94%)
subsemble
subsemble R package for ensemble learning on subsets of data
Stars: ✭ 40 (+17.65%)
sgd
An R package for large scale estimation with stochastic gradient descent
Stars: ✭ 55 (+61.76%)
TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 687 (+1920.59%)