parquet2: Fastest and safest Rust implementation of Parquet. `unsafe`-free. Integration-tested against pyarrow
Stars: ✭ 157 (-43.12%)
lens: Mirror of Apache Lens
Stars: ✭ 57 (-79.35%)
wrangler: Wrangler Transform, a DMD system for transforming Big Data
Stars: ✭ 63 (-77.17%)
logging: Generic file logger for .NET Core (FileLoggerProvider) with minimal dependencies
Stars: ✭ 109 (-60.51%)
predictionio: PredictionIO, a machine learning server for developers and ML engineers
Stars: ✭ 12,510 (+4432.61%)
HybridBackend: Efficient training of deep recommenders in the cloud
Stars: ✭ 30 (-89.13%)
clusterdock: A framework for creating Docker-based container clusters
Stars: ✭ 26 (-90.58%)
Datahub: The Metadata Platform for the Modern Data Stack
Stars: ✭ 4,232 (+1433.33%)
check-engine: Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-89.49%)
Spark: Apache Spark is a fast, in-memory data processing engine with an elegant and expressive development API that lets data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
Stars: ✭ 55 (-80.07%)
CleanArchitecture: ASP.NET Core 6 Web API Clean Architecture solution template
Stars: ✭ 312 (+13.04%)
classifai: 🔥 One of the most comprehensive open-source data annotation platforms
Stars: ✭ 99 (-64.13%)
spark-utils: Basic framework utilities to quickly start writing production-ready Apache Spark applications
Stars: ✭ 25 (-90.94%)
Equinox: .NET Event Sourcing library with CosmosDB, EventStoreDB, SqlStreamStore and integration test backends. Focused at the stream level; see https://github.com/jet/propulsion for cross-stream projections/subscriptions/reactions
Stars: ✭ 260 (-5.8%)
alluxio-py: Alluxio Python client - Access Any Data Source with Python
Stars: ✭ 18 (-93.48%)
SparkTwitterAnalysis: An Apache Spark standalone application using the Spark API in Scala. The application uses the Simple Build Tool (SBT) to build the project.
Stars: ✭ 29 (-89.49%)
connected-component: MapReduce implementation of Connected Components on Apache Spark
Stars: ✭ 68 (-75.36%)
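The entry above names a classic graph algorithm. As a rough illustration of what the MapReduce formulation computes (a plain-Python sketch, not the repo's actual Spark code), each vertex repeatedly adopts the smallest label among itself and its neighbours until nothing changes; the map step emits labels to neighbours and the reduce step keeps the minimum:

```python
def connected_components(edges):
    """Label-propagation sketch: iterate to the same fixpoint the
    MapReduce/Spark version reaches, where every vertex ends up
    labelled with the smallest vertex id in its component."""
    adj, label = {}, {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        label[u], label[v] = u, v
    changed = True
    while changed:
        changed = False
        for u, neighbours in adj.items():
            # "map": collect candidate labels; "reduce": keep the minimum
            best = min([label[u]] + [label[v] for v in neighbours])
            if best < label[u]:
                label[u] = best
                changed = True
    return label
```

On Spark the same fixpoint is reached with a few shuffle rounds instead of a sequential sweep, which is what makes it viable for graphs that do not fit on one machine.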
HAL-9000: Automatically set up a productive development environment with Ansible on macOS
Stars: ✭ 72 (-73.91%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libraries for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing, built to run natively on Kubernetes and Docker, with Python, R, Scala, Java, C#, Go, Julia, C++ etc.
Stars: ✭ 95 (-65.58%)
MLBD: Materials for the "Machine Learning on Big Data" course
Stars: ✭ 20 (-92.75%)
ByteSlice: "Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-91.3%)
Fusillade: An opinionated HTTP library for Mobile Development
Stars: ✭ 269 (-2.54%)
Roapi: Create full-fledged APIs for static datasets without writing a single line of code
Stars: ✭ 253 (-8.33%)
bandar-log: Monitoring tool to measure the flow throughput of data sources and processing components in data ingestion and ETL pipelines
Stars: ✭ 20 (-92.75%)
centurion: Kotlin big data toolkit
Stars: ✭ 320 (+15.94%)
Big-Data-Demo: Data visualization project built with Vue, three.js and ECharts, including 3D model import and interaction, 3D model annotation, and more
Stars: ✭ 146 (-47.1%)
Parquet.jl: Julia implementation of a reader for the Parquet columnar file format
Stars: ✭ 93 (-66.3%)
talaria: TalariaDB is a distributed, highly available, low-latency time-series database for Presto
Stars: ✭ 148 (-46.38%)
meepo: Data migration across heterogeneous storage systems
Stars: ✭ 29 (-89.49%)
meetups-archivos: Slides, code and videos from meetups, Data Science Days, video calls and workshops. Data Science Research is a non-profit organization that seeks to spread and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through meetups, workshops and seed groups …
Stars: ✭ 60 (-78.26%)
xcast: A High-Performance Data Science Toolkit for the Earth Sciences
Stars: ✭ 28 (-89.86%)
experiments: Code examples for my blog posts
Stars: ✭ 21 (-92.39%)
hyperdrive: Extensible streaming ingestion pipeline on top of Apache Spark
Stars: ✭ 31 (-88.77%)
Spark Jupyter Aws: A guide on how to set up Jupyter with PySpark painlessly on AWS EC2 clusters, with S3 I/O support
Stars: ✭ 259 (-6.16%)
bigtable: TypeScript Bigtable client with 🔋🔋 included
Stars: ✭ 13 (-95.29%)
hotmap: WebGL Heatmap Viewer for Big Data and Bioinformatics
Stars: ✭ 13 (-95.29%)
jupyterlab-sparkmonitor: JupyterLab extension for monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (-71.74%)
arrow-datafusion: Apache Arrow DataFusion SQL Query Engine
Stars: ✭ 2,360 (+755.07%)
pypar: Efficient and scalable parallelism using the Message Passing Interface (MPI) for big data and computationally intensive problems
Stars: ✭ 66 (-76.09%)
LoL-Match-Prediction: Win-probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-87.68%)
SGDLibrary: MATLAB/Octave library for stochastic optimization algorithms (version 1.0.20)
Stars: ✭ 165 (-40.22%)
dbd: A database prototyping tool that lets data analysts and engineers quickly load and transform data in SQL databases
Stars: ✭ 30 (-89.13%)
egis: Egis, a handy Ruby interface for AWS Athena
Stars: ✭ 38 (-86.23%)
wasp: WASP is a framework for building complex real-time big data applications. It relies on a Kappa/Lambda-style architecture, mainly leveraging Kafka and Spark. If you need to ingest huge amounts of heterogeneous data and analyze them through complex pipelines, this is the framework for you.
Stars: ✭ 19 (-93.12%)
pytorch kmeans: Implementation of the k-means algorithm in PyTorch that works for large datasets
Stars: ✭ 38 (-86.23%)
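The algorithm behind the entry above is Lloyd's k-means. As a minimal plain-Python sketch of what the repo implements on PyTorch tensors (this is illustrative only and not the repo's actual API; deterministic initialisation from the first k points keeps it reproducible, where real implementations typically use random or k-means++ initialisation):

```python
def kmeans(points, k, iters=20):
    """Lloyd's k-means: alternate an assignment step and an update
    step until the centers settle (here: a fixed iteration budget)."""
    centers = [tuple(p) for p in points[:k]]  # deterministic init for the sketch
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # update step: move each center to the mean of its cluster
        for j, members in enumerate(clusters):
            if members:
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers
```

Running the same two steps on GPU tensors, with distances computed as one batched matrix operation, is what makes the PyTorch version practical for large datasets.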
cloudberry: Big Data Visualization
Stars: ✭ 89 (-67.75%)
incubator-liminal: Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language for building ML workflows on top of Apache Airflow.
Stars: ✭ 117 (-57.61%)