img2dataset: Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (+1217.98%)
Mutual labels: big-data
spark-records: Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-24.72%)
Mutual labels: big-data
siembol: An open-source, real-time Security Information & Event Management tool based on big data technologies, providing a scalable, advanced security analytics framework.
Stars: ✭ 153 (+71.91%)
Mutual labels: big-data
dxram: A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-71.91%)
Mutual labels: big-data
RemoteShuffleService: Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+194.38%)
Mutual labels: big-data
CS Book: 🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"
Stars: ✭ 40 (-55.06%)
Mutual labels: big-data
GDLibrary: MATLAB library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-43.82%)
Mutual labels: big-data
nebula: A distributed block-based data storage and compute engine
Stars: ✭ 127 (+42.7%)
Mutual labels: big-data
scarf: Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
Stars: ✭ 54 (-39.33%)
Mutual labels: big-data
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-83.15%)
Mutual labels: big-data
spark-root: Apache Spark Data Source for ROOT File Format
Stars: ✭ 28 (-68.54%)
Mutual labels: big-data
IoT-system-PLC-data-to-InfluxDB: This project aims to provide free software to fetch data from PLCs (Siemens S7-300/400/1200/1500) and store it. The stack used is completely open source. InfluxDB serves as the data storage, so the application follows the Big Data paradigm.
Stars: ✭ 26 (-70.79%)
Mutual labels: big-data
azure-big-data-starter: A boilerplate project for Azure Big Data PaaS services
Stars: ✭ 13 (-85.39%)
Mutual labels: big-data
nebula: A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+9108.99%)
Mutual labels: big-data
sparkucx: A high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing the UCX communication layer
Stars: ✭ 32 (-64.04%)
Mutual labels: big-data
beam-site: Apache Beam Site
Stars: ✭ 28 (-68.54%)
Mutual labels: big-data
incubator-liminal: Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (+31.46%)
Mutual labels: big-data
beekeeper: Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-51.69%)
Mutual labels: big-data
airavata-php-gateway: Mirror of Apache Airavata PHP Gateway
Stars: ✭ 15 (-83.15%)
Mutual labels: big-data