795 open source projects that are alternatives to or similar to big_data

hadoop-etl-udfs
The Hadoop ETL UDFs are the main way to load data from Hadoop into EXASOL
Stars: ✭ 17 (-50%)
Mutual labels:  hadoop
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+567.65%)
Mutual labels:  big-data
datasphere-service
an open source dataworks platform
Stars: ✭ 20 (-41.18%)
Mutual labels:  bigdata
beekeeper
Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (+26.47%)
Mutual labels:  big-data
openPDC
Open Source Phasor Data Concentrator
Stars: ✭ 109 (+220.59%)
Mutual labels:  hadoop
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (+85.29%)
Mutual labels:  big-data
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (+329.41%)
Mutual labels:  big-data
FlameStream
Distributed stream processing model and its implementation
Stars: ✭ 14 (-58.82%)
Mutual labels:  big-data
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (+320.59%)
Mutual labels:  big-data
bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-50%)
Mutual labels:  big-data
machine-learning-course
Machine Learning Course @ Santa Clara University
Stars: ✭ 17 (-50%)
Mutual labels:  pyspark
Sparkling Graph
SparklingGraph provides an easy-to-use set of features that let you process large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (+308.82%)
Mutual labels:  big-data
leetspeek
Open and collaborative content from leet hackers!
Stars: ✭ 11 (-67.65%)
Mutual labels:  big-data
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (+302.94%)
Mutual labels:  big-data
darwin
Avro Schema Evolution made easy
Stars: ✭ 26 (-23.53%)
Mutual labels:  hadoop
Attic Apex Malhar
Mirror of Apache Apex malhar
Stars: ✭ 131 (+285.29%)
Mutual labels:  big-data
memex-gate
General Architecture for Text Engineering
Stars: ✭ 47 (+38.24%)
Mutual labels:  hadoop
TiBigData
TiDB connectors for Flink/Hive/Presto
Stars: ✭ 192 (+464.71%)
Mutual labels:  bigdata
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample programs for Spark in Scala.
Stars: ✭ 55 (+61.76%)
Mutual labels:  spark-sql
dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (+14.71%)
Mutual labels:  big-data
docker-hadoop-3
Docker file for Hadoop 3
Stars: ✭ 19 (-44.12%)
Mutual labels:  hadoop
ngm
swissgeol.ch gives you insight into geoscientific data, above and below the surface.
Stars: ✭ 23 (-32.35%)
Mutual labels:  big-data
Tajo
Mirror of Apache Tajo
Stars: ✭ 128 (+276.47%)
Mutual labels:  big-data
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+238.24%)
Mutual labels:  pyspark
hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
Stars: ✭ 16 (-52.94%)
Mutual labels:  hadoop
Richdem
High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (+273.53%)
Mutual labels:  big-data
cloud
Cloud computing: environment setup and configuration files for hadoop, hive, hue, oozie, sqoop, hbase, and zookeeper
Stars: ✭ 48 (+41.18%)
Mutual labels:  hadoop
Hazelcast Nodejs Client
Hazelcast IMDG Node.js Client
Stars: ✭ 124 (+264.71%)
Mutual labels:  big-data
jhdf
A pure Java HDF5 library
Stars: ✭ 83 (+144.12%)
Mutual labels:  bigdata
Scala Spark Tutorial
Project for James' Apache Spark with Scala course
Stars: ✭ 121 (+255.88%)
Mutual labels:  big-data
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+6841.18%)
Mutual labels:  big-data
webhdfs
Node.js WebHDFS REST API client
Stars: ✭ 88 (+158.82%)
Mutual labels:  hadoop
dpkb
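A summary of big data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse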
Stars: ✭ 123 (+261.76%)
Mutual labels:  hadoop
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-50%)
Mutual labels:  pyspark
opendc
Collaborative Datacenter Simulation and Exploration for Everybody
Stars: ✭ 40 (+17.65%)
Mutual labels:  big-data
phoenix-queryserver
Apache Phoenix Query Server
Stars: ✭ 33 (-2.94%)
Mutual labels:  big-data
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+30911.76%)
Mutual labels:  big-data
vulkn
Love your Data. Love the Environment. Love VULKИ.
Stars: ✭ 43 (+26.47%)
Mutual labels:  bigdata
big-data-upf
RECSM-UPF Summer School: Social Media and Big Data Research
Stars: ✭ 21 (-38.24%)
Mutual labels:  big-data
hadoop-crypto
Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.
Stars: ✭ 38 (+11.76%)
Mutual labels:  hadoop
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (-41.18%)
Mutual labels:  spark-sql
couchdb-pkg
Apache CouchDB Packaging support files
Stars: ✭ 24 (-29.41%)
Mutual labels:  big-data
coolplayflink
Flink: Stateful Computations over Data Streams
Stars: ✭ 14 (-58.82%)
Mutual labels:  bigdata
learn-by-examples
Real-world Spark pipeline examples
Stars: ✭ 84 (+147.06%)
Mutual labels:  pyspark
predictionio-template-ecom-recommender
PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)
Stars: ✭ 73 (+114.71%)
Mutual labels:  big-data
datasqueeze
Hadoop utility to compact small files
Stars: ✭ 18 (-47.06%)
Mutual labels:  hadoop
chatnoir-resiliparse
A robust web archive analytics toolkit
Stars: ✭ 26 (-23.53%)
Mutual labels:  bigdata
Graph sampling
Graph Sampling is a Python package containing various approaches that sample the original graph according to different sample sizes.
Stars: ✭ 99 (+191.18%)
Mutual labels:  big-data
163-bigdate-note
bigdata note
Stars: ✭ 38 (+11.76%)
Mutual labels:  bigdata
classifai
🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (+191.18%)
Mutual labels:  big-data
siembol
An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+350%)
Mutual labels:  big-data
cdp-service
CDP data platform that helps enterprises fully understand their customers and deliver precisely targeted, personalized marketing.
Stars: ✭ 30 (-11.76%)
Mutual labels:  big-data
Spark-for-data-engineers
Apache Spark for data engineers
Stars: ✭ 22 (-35.29%)
Mutual labels:  pyspark
Quantitative-Big-Imaging-2018
(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
Stars: ✭ 50 (+47.06%)
Mutual labels:  big-data
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (+70.59%)
Mutual labels:  pyspark
pulsar-user-group-loc-cn
Workspace for China local user group.
Stars: ✭ 19 (-44.12%)
Mutual labels:  bigdata
room-renting
Scrape housing listings from Anjuke with Python and visualize them with Amap (Gaode Maps)
Stars: ✭ 16 (-52.94%)
Mutual labels:  bigdata
subsemble
subsemble R package for ensemble learning on subsets of data
Stars: ✭ 40 (+17.65%)
Mutual labels:  big-data
sgd
An R package for large scale estimation with stochastic gradient descent
Stars: ✭ 55 (+61.76%)
Mutual labels:  big-data
TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
Stars: ✭ 687 (+1920.59%)
Mutual labels:  hadoop