Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-20.96%)
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (+104.78%)
wrangler
Wrangler Transform: A DMD system for transforming Big Data
Stars: ✭ 63 (-76.84%)
Thrill
Thrill - An EXPERIMENTAL Algorithmic Distributed Big Data Batch Processing Framework in C++
Stars: ✭ 528 (+94.12%)
Calcite
Apache Calcite
Stars: ✭ 2,816 (+935.29%)
Beam
Apache Beam is a unified programming model for Batch and Streaming
Stars: ✭ 5,149 (+1793.01%)
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-90.81%)
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+86.4%)
Couchdb Docker
Semi-official Apache CouchDB Docker images
Stars: ✭ 194 (-28.68%)
Stream Framework
Stream Framework is a Python library that lets you build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+1582.35%)
bandar-log
Monitoring tool to measure flow throughput of data sources and processing components that are part of Data Ingestion and ETL pipelines.
Stars: ✭ 20 (-92.65%)
Redislite
Redis in a Python module.
Stars: ✭ 464 (+70.59%)
Courses
Quizzes & assignments from Coursera
Stars: ✭ 454 (+66.91%)
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (+331.25%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+8005.88%)
Presto Go Client
A Presto client for the Go programming language.
Stars: ✭ 183 (-32.72%)
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+56.62%)
predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,510 (+4499.26%)
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+52.21%)
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-34.93%)
Cogcomp Nlp
CogComp's Natural Language Processing libraries and Demos:
Stars: ✭ 410 (+50.74%)
GDLibrary
Matlab library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-81.62%)
Decentralized Internet
An SDK/library for decentralized web and distributed computing projects
Stars: ✭ 406 (+49.26%)
Keyvi
Keyvi - a key value index that powers Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (-37.13%)
Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (+43.01%)
alluxio-py
Alluxio Python client - Access Any Data Source with Python
Stars: ✭ 18 (-93.38%)
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+1301.84%)
Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (-38.6%)
Halodb
A fast, log structured key-value store.
Stars: ✭ 370 (+36.03%)
lcbo-api
A crawler and API server for Liquor Control Board of Ontario retail data
Stars: ✭ 152 (-44.12%)
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+33.09%)
Fluo
Apache Fluo
Stars: ✭ 159 (-41.54%)
Bigtop
Mirror of Apache Bigtop
Stars: ✭ 356 (+30.88%)
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-89.34%)
Devops Roadmap
DevOps methodology & roadmap for a devops developer in 2019. Interesting books to learn new technologies.
Stars: ✭ 349 (+28.31%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-44.12%)
Stroom
Stroom is a highly scalable data storage, processing and analysis platform.
Stars: ✭ 344 (+26.47%)
gan deeplearning4j
Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-93.01%)
Ozone
Scalable, redundant, and distributed object store for Apache Hadoop
Stars: ✭ 330 (+21.32%)
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-43.75%)
Uproot3
ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+14.71%)
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-44.85%)
Mist
Serverless proxy for Spark cluster
Stars: ✭ 309 (+13.6%)
FlameStream
Distributed stream processing model and its implementation
Stars: ✭ 14 (-94.85%)
Helix
Mirror of Apache Helix
Stars: ✭ 304 (+11.76%)
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (-46.32%)
Cloudbreak
A tool for provisioning and managing Apache Hadoop clusters in the cloud. Cloudbreak, as part of the Hortonworks Data Platform, makes it easy to provision, configure and elastically grow HDP clusters on cloud infrastructure. Cloudbreak can be used to provision Hadoop across cloud infrastructure providers including AWS, Azure, GCP and OpenStack.
Stars: ✭ 301 (+10.66%)
classifai
🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (-63.6%)
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (-47.43%)
Datahub
The Metadata Platform for the Modern Data Stack
Stars: ✭ 4,232 (+1455.88%)
bigstatsr
R package for statistical tools with big matrices stored on disk.
Stars: ✭ 139 (-48.9%)
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-94.85%)
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-66.54%)
bftkv
A distributed key-value storage that tolerates Byzantine faults.
Stars: ✭ 27 (-90.07%)
nebula
A distributed block-based data storage and compute engine
Stars: ✭ 127 (-53.31%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-77.21%)