Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.

Stars: ✭ 54 (-53.04%)

Mutual labels: big-data

net.jgp.books.spark.ch07

Spark in Action, 2nd edition - chapter 7 - Ingestion from files

Stars: ✭ 13 (-88.7%)

Mutual labels: apache-spark

parquet-dotnet

🐬 Apache Parquet for modern .NET

Stars: ✭ 199 (+73.04%)

Mutual labels: apache-spark

IoT-system-PLC-data-to-InfluxDB

This project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. I used InfluxDB as the data store, so the application follows the Big Data paradigm.

Stars: ✭ 26 (-77.39%)

Mutual labels: big-data

bftkv

A distributed key-value store that is tolerant to Byzantine faults.

Stars: ✭ 27 (-76.52%)

Mutual labels: big-data

DaFlow

An Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.

Stars: ✭ 24 (-79.13%)

Mutual labels: apache-spark

pyspark-for-data-processing

Code for my presentation: Using PySpark to Process Boat Loads of Data

Stars: ✭ 20 (-82.61%)

Mutual labels: pyspark

spark-root

Apache Spark Data Source for ROOT File Format

Stars: ✭ 28 (-75.65%)

Mutual labels: big-data

pulsar-adapters

Apache Pulsar Adapters

Stars: ✭ 18 (-84.35%)

Mutual labels: apache-spark

MLBD

Materials for "Machine Learning on Big Data" course

Stars: ✭ 20 (-82.61%)

Mutual labels: big-data

dxram

A distributed in-memory key-value storage for billions of small objects.

Stars: ✭ 25 (-78.26%)

Mutual labels: big-data

nebula

A distributed, fast open-source graph database featuring horizontal scalability and high availability

Stars: ✭ 8,196 (+7026.96%)

Mutual labels: big-data

ByteSlice

"Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)

Stars: ✭ 24 (-79.13%)

Mutual labels: big-data

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20 hours on one machine.

Stars: ✭ 1,173 (+920%)

Mutual labels: big-data

Real Time Social Media Mining

DevOps pipeline for Real Time Social/Web Mining

Stars: ✭ 22 (-80.87%)

Mutual labels: big-data

falcon

Mirror of Apache Falcon

Stars: ✭ 95 (-17.39%)

Mutual labels: big-data

PysparkCheatsheet

PySpark Cheatsheet

Stars: ✭ 25 (-78.26%)

Mutual labels: apache-spark

Big-Data-Demo

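A data visualization showcase project built with Vue, three.js and echarts, with features including 3D model import and interaction and 3D model annotation.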

Stars: ✭ 146 (+26.96%)

Mutual labels: big-data

GDLibrary

Matlab library for gradient descent algorithms: Version 1.0.1

Stars: ✭ 50 (-56.52%)

Mutual labels: big-data

airavata-django-portal

Mirror of Apache Airavata Django Portal

Stars: ✭ 20 (-82.61%)

Mutual labels: big-data

Movies-Analytics-in-Spark-and-Scala

Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.

Stars: ✭ 47 (-59.13%)

Mutual labels: big-data

lcbo-api

A crawler and API server for Liquor Control Board of Ontario retail data

Stars: ✭ 152 (+32.17%)

Mutual labels: big-data

jobAnalytics and search

The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.

Stars: ✭ 25 (-78.26%)

Mutual labels: pyspark

talaria

TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Stars: ✭ 148 (+28.7%)

Mutual labels: big-data

oshinko-s2i

S2I images and utilities for Spark application builders on OpenShift.

Stars: ✭ 16 (-86.09%)

Mutual labels: pyspark

meetups-archivos

Slides, code and videos from the meetups, data science days, video calls and workshops. Data Science Research is a non-profit organization that seeks to spread, decentralize and disseminate knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops and Semilleros (research incubators) …

Stars: ✭ 60 (-47.83%)

Mutual labels: big-data

anovos

Anovos - An Open Source Library for Scalable Feature Engineering Using Apache Spark

Stars: ✭ 77 (-33.04%)

Mutual labels: pyspark

automile-php

Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.

Stars: ✭ 28 (-75.65%)

Mutual labels: big-data

dlsa

Distributed least squares approximation (dlsa) implemented with Apache Spark

Stars: ✭ 25 (-78.26%)

Mutual labels: pyspark

machine-learning-course

Machine Learning Course @ Santa Clara University

Stars: ✭ 17 (-85.22%)

Mutual labels: pyspark

kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data into data sc…

Stars: ✭ 474 (+312.17%)

Mutual labels: pyspark

couchdb-mango

Mirror of Apache CouchDB Mango

Stars: ✭ 34 (-70.43%)

Mutual labels: big-data

xcast

A High-Performance Data Science Toolkit for the Earth Sciences

Stars: ✭ 28 (-75.65%)

Mutual labels: big-data

FlameStream

Distributed stream processing model and its implementation

Stars: ✭ 14 (-87.83%)

Mutual labels: big-data

lubeck

High-level linear algebra library for Dlang

Stars: ✭ 57 (-50.43%)

Mutual labels: big-data

bigquery-kafka-connect

☁️ Node.js Kafka Connect connector for Google BigQuery

Stars: ✭ 17 (-85.22%)

Mutual labels: big-data

flask-spark-docker

Just a boilerplate for PySpark and Flask

Stars: ✭ 32 (-72.17%)

Mutual labels: pyspark

ngm

swissgeol.ch gives you insight into geoscientific data - above and below the surface.

Stars: ✭ 23 (-80%)

Mutual labels: big-data

wrangler

Wrangler Transform: A DMD system for transforming Big Data

Stars: ✭ 63 (-45.22%)

Mutual labels: big-data

hyperdrive

Extensible streaming ingestion pipeline on top of Apache Spark

Stars: ✭ 31 (-73.04%)

Mutual labels: apache-spark

nifi

Deploy a secured, clustered, auto-scaling NiFi service in AWS.

Stars: ✭ 37 (-67.83%)

Mutual labels: big-data

automile-net

Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.

Stars: ✭ 24 (-79.13%)

Mutual labels: big-data

big-data-upf

RECSM-UPF Summer School: Social Media and Big Data Research

Stars: ✭ 21 (-81.74%)

Mutual labels: big-data

iis

Information Inference Service of the OpenAIRE system

Stars: ✭ 16 (-86.09%)

Mutual labels: big-data

spark-transformers

Spark-Transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.

Stars: ✭ 39 (-66.09%)

Mutual labels: apache-spark

couchdb-couch-plugins

Mirror of Apache CouchDB

Stars: ✭ 14 (-87.83%)

Mutual labels: big-data

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary length. Can be used with languages other than English.

Stars: ✭ 115 (+0%)

Mutual labels: pyspark

OSCI

Open Source Contributor Index

Stars: ✭ 107 (-6.96%)

Mutual labels: pyspark

predictionio-template-ecom-recommender

PredictionIO E-Commerce Recommendation Engine Template (Scala-based parallelized engine)

Stars: ✭ 73 (-36.52%)

Mutual labels: big-data

arrow-datafusion

Apache Arrow DataFusion SQL Query Engine

Stars: ✭ 2,360 (+1952.17%)

Mutual labels: big-data

61-120 of 536 similar projects
