Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample programs for Spark in the Scala language.

Stars: ✭ 55 (+61.76%)

Mutual labels: spark-sql

dislib

The distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Stars: ✭ 39 (+14.71%)

Mutual labels: big-data

docker-hadoop-3

Dockerfile for Hadoop 3

Stars: ✭ 19 (-44.12%)

Mutual labels: hadoop

ngm

swissgeol.ch gives you insight into geoscientific data, above and below the surface.

Stars: ✭ 23 (-32.35%)

Mutual labels: big-data

Tajo

Mirror of Apache Tajo

Stars: ✭ 128 (+276.47%)

Mutual labels: big-data

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used in languages other than English.

Stars: ✭ 115 (+238.24%)

Mutual labels: pyspark

hive-bigquery-storage-handler

Hive Storage Handler for interoperability between BigQuery and Apache Hive

Stars: ✭ 16 (-52.94%)

Mutual labels: hadoop

Richdem

High-performance Terrain and Hydrology Analysis

Stars: ✭ 127 (+273.53%)

Mutual labels: big-data

cloud

Cloud computing: environment setup and configuration files for Hadoop, Hive, Hue, Oozie, Sqoop, HBase, and ZooKeeper

Stars: ✭ 48 (+41.18%)

Mutual labels: hadoop

Hazelcast Nodejs Client

Hazelcast IMDG Node.js Client

Stars: ✭ 124 (+264.71%)

Mutual labels: big-data

jhdf

A pure Java HDF5 library

Stars: ✭ 83 (+144.12%)

Mutual labels: bigdata

Scala Spark Tutorial

Project for James' Apache Spark with Scala course

Stars: ✭ 121 (+255.88%)

Mutual labels: big-data

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+6841.18%)

Mutual labels: big-data

webhdfs

Node.js WebHDFS REST API client

Stars: ✭ 88 (+158.82%)

Mutual labels: hadoop

dpkb

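A roundup of big-data topics, including distributed storage engines, distributed compute engines, and data warehouse construction. Keywords: Hadoop, HBase, ES, Kudu, Hive, Presto, Spark, Flink, Kylin, ClickHouse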

Stars: ✭ 123 (+261.76%)

Mutual labels: hadoop

sparklanes

A lightweight data processing framework for Apache Spark

Stars: ✭ 17 (-50%)

Mutual labels: pyspark

opendc

Collaborative Datacenter Simulation and Exploration for Everybody

Stars: ✭ 40 (+17.65%)

Mutual labels: big-data

phoenix-queryserver

Apache Phoenix Query Server

Stars: ✭ 33 (-2.94%)

Mutual labels: big-data

Cmak

CMAK is a tool for managing Apache Kafka clusters

Stars: ✭ 10,544 (+30911.76%)

Mutual labels: big-data

vulkn

Love your Data. Love the Environment. Love VULKИ.

Stars: ✭ 43 (+26.47%)

Mutual labels: bigdata

big-data-upf

RECSM-UPF Summer School: Social Media and Big Data Research

Stars: ✭ 21 (-38.24%)

Mutual labels: big-data

hadoop-crypto

Library for per-file client-side encryption in Hadoop FileSystems such as HDFS or S3.

Stars: ✭ 38 (+11.76%)

Mutual labels: hadoop

albis

Albis: High-Performance File Format for Big Data Systems

Stars: ✭ 20 (-41.18%)

Mutual labels: spark-sql

couchdb-pkg

Apache CouchDB Packaging support files

Stars: ✭ 24 (-29.41%)

Mutual labels: big-data

coolplayflink

Flink: Stateful Computations over Data Streams

Stars: ✭ 14 (-58.82%)

Mutual labels: bigdata

learn-by-examples

Real-world Spark pipeline examples

Stars: ✭ 84 (+147.06%)

Mutual labels: pyspark

predictionio-template-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 73 (+114.71%)

Mutual labels: big-data

datasqueeze

Hadoop utility to compact small files

Stars: ✭ 18 (-47.06%)

Mutual labels: hadoop

chatnoir-resiliparse

A robust web archive analytics toolkit

Stars: ✭ 26 (-23.53%)

Mutual labels: bigdata

Graph sampling

Graph Sampling is a Python package containing various approaches for sampling the original graph at different sample sizes.

Stars: ✭ 99 (+191.18%)

Mutual labels: big-data

163-bigdate-note

Big data notes

Stars: ✭ 38 (+11.76%)

Mutual labels: bigdata

classifai

🔥 One of the most comprehensive open-source data annotation platforms.

Stars: ✭ 99 (+191.18%)

Mutual labels: big-data

siembol

An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.

Stars: ✭ 153 (+350%)

Mutual labels: big-data

cdp-service

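A CDP (customer data platform) that helps businesses fully understand their customers and deliver personalized, precisely targeted marketing.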

Stars: ✭ 30 (-11.76%)

Mutual labels: big-data

Spark-for-data-engineers

Apache Spark for data engineers

Stars: ✭ 22 (-35.29%)

Mutual labels: pyspark

Quantitative-Big-Imaging-2018

(Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018

Stars: ✭ 50 (+47.06%)

Mutual labels: big-data

soda-spark

Soda Spark is a PySpark library that helps you test the data in your Spark DataFrames.

Stars: ✭ 58 (+70.59%)

Mutual labels: pyspark

pulsar-user-group-loc-cn

Workspace for the local user group in China.

Stars: ✭ 19 (-44.12%)

Mutual labels: bigdata

room-renting

Scrapes Anjuke housing listings with Python and visualizes them with Gaode Maps (AMap).