CS Book: 🔥 Latest computer science e-books. Provides downloads of the latest technical e-books. "All I want is to out-compete every one of you, or be out-competed by you!"
spark-records: Bulletproof Apache Spark jobs with fast root cause analysis of failures.
scarf: Toolkit for highly memory-efficient analysis of single-cell RNA-Seq, scATAC-Seq and CITE-Seq data. Analyze atlas-scale datasets with millions of cells on a laptop.
RemoteShuffleService: Celeborn provides an elastic and high-performance service for shuffle and spilled data.
IoT-system-PLC-data-to-InfluxDB: This project aims to provide free software that fetches data from PLCs (Siemens S7-300/400/1200/1500) and stores it. The stack used is completely open source. I used InfluxDB as the data store, so the application follows the Big Data paradigm.
spark-root: Apache Spark Data Source for ROOT File Format
dxram: A distributed in-memory key-value storage for billions of small objects.
nebula: A distributed, fast open-source graph database featuring horizontal scalability and high availability
img2dataset: Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
GDLibrary: Matlab library for gradient descent algorithms: Version 1.0.1
lcbo-api: A crawler and API server for Liquor Control Board of Ontario retail data
datalake-etl-pipeline: Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake: SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
gan deeplearning4j: Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
automile-php: Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
FlameStream: Distributed stream processing model and its implementation
lubeck: High level linear algebra library for Dlang
ngm: swissgeol.ch gives you insight into geoscientific data - above and below the surface.
nifi: Deploy a secured, clustered, auto-scaling NiFi service in AWS.
automile-net: Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
big-data-upf: RECSM-UPF Summer School: Social Media and Big Data Research
iis: Information Inference Service of the OpenAIRE system
FIW KRT: Families In the Wild: A Kinship Recognition Toolbox.
shifting: A privacy-focused list of alternatives to mainstream services to help the competition.
yildiz: 🦄🌟 Graph Database layer on top of Google Bigtable
data-viz-utils: Functions for easily making publication-quality figures with matplotlib.
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
lidbox: End-to-end spoken language identification out of the box.
awesome-tools: Curated list of awesome tools and libraries for specific domains
merkle-db: High-scalability analytics database built on immutable merkle-trees
metriql: The metrics layer for your data. Join us at https://metriql.com/slack
leetspeek: Open and collaborative content from leet hackers!
dislib: The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Quantitative-Big-Imaging-2018: (Latest semester at https://github.com/kmader/Quantitative-Big-Imaging-2019) The material for the Quantitative Big Imaging course at ETHZ for the Spring Semester 2018
sgd: An R package for large scale estimation with stochastic gradient descent
mmtf-spark: Methods for the parallel and distributed analysis and mining of the Protein Data Bank using MMTF and Apache Spark.
Clustering4Ever: C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
scikit-learn-intelex: Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
bullet-core: Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.