Gaffer: A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1203.17%)
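The sketch below is a rough illustration of "aggregation of properties". It is written in Python and does not use Gaffer's Java API. It merges edges that share the same source, destination and group. The `count` and `last_seen` properties are hypothetical, and so is the choice of sum and max as their aggregators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    destination: str
    group: str
    count: int       # hypothetical property, aggregated by summing
    last_seen: int   # hypothetical timestamp property, aggregated by max


def aggregate(edges):
    """Merge edges that share (source, destination, group).

    The aggregation rules (sum for count, max for last_seen) are
    assumptions chosen for illustration.
    """
    merged = {}
    for e in edges:
        key = (e.source, e.destination, e.group)
        prev = merged.get(key)
        if prev is None:
            merged[key] = e
        else:
            merged[key] = Edge(e.source, e.destination, e.group,
                               prev.count + e.count,
                               max(prev.last_seen, e.last_seen))
    return list(merged.values())


edges = [
    Edge("a", "b", "calls", 1, 10),
    Edge("a", "b", "calls", 2, 15),
    Edge("b", "c", "calls", 1, 12),
]
result = aggregate(edges)
```

Aggregating on write like this keeps one element per key, so queries do not have to combine duplicates later.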
Beam: Apache Beam is a unified programming model for batch and streaming data processing
Stars: ✭ 5,149 (+3986.51%)
Usql: U-SQL Examples and Issue Tracking
Stars: ✭ 221 (+75.4%)
lens: Mirror of Apache Lens
Stars: ✭ 57 (-54.76%)
Reef: Mirror of Apache REEF
Stars: ✭ 92 (-26.98%)
Awkward 0.x: Manipulate arrays of complex data structures as easily as NumPy.
Stars: ✭ 216 (+71.43%)
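A variable-length ("jagged") array can be stored as one flat content buffer plus an offsets array that marks where each inner list begins and ends. This is the kind of layout Awkward Array builds on. The pure-Python sketch below models only that idea and does not use the awkward library. The helper names `from_lists`, `get` and `sums` are hypothetical.

```python
from itertools import accumulate


def from_lists(lists):
    """Flatten nested lists into (offsets, content)."""
    content = [x for sub in lists for x in sub]
    # offsets[i]:offsets[i + 1] is the slice of content holding list i
    offsets = [0] + list(accumulate(len(sub) for sub in lists))
    return offsets, content


def get(offsets, content, i):
    """Return the i-th inner list by slicing the flat content."""
    return content[offsets[i]:offsets[i + 1]]


def sums(offsets, content):
    """Per-list sums computed directly from the flat buffer."""
    return [sum(content[offsets[i]:offsets[i + 1]])
            for i in range(len(offsets) - 1)]


offsets, content = from_lists([[1, 2, 3], [], [4, 5]])
```

Empty lists cost nothing beyond a repeated offset, and the content stays contiguous in memory.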
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting the Bitcoin price using time-series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (-27.78%)
scikit-learn-intelex: Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Stars: ✭ 887 (+603.97%)
Parquet Mr: Apache Parquet
Stars: ✭ 1,278 (+914.29%)
Helicalinsight: Helical Insight describes itself as the world’s first open-source business intelligence framework, which helps you make sense of your data and make well-informed decisions.
Stars: ✭ 214 (+69.84%)
Onlinestats.jl: Single-pass algorithms for statistics
Stars: ✭ 507 (+302.38%)
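The package above is written in Julia. A single-pass statistic can be illustrated with Welford's algorithm for the running mean and variance, sketched here in Python. Welford's algorithm is chosen only as a representative example and is not necessarily how the package implements its statistics. The class name `RunningStats` is hypothetical.

```python
import math


class RunningStats:
    """Welford's single-pass mean and variance: each value is seen once."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # uses the updated mean, which keeps the update numerically stable
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance (n - 1 denominator); NaN for fewer than 2 values."""
        return self.m2 / (self.n - 1) if self.n > 1 else math.nan


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

Memory use is constant no matter how many values stream through, which is the point of a single-pass algorithm.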
hyper-engine: Python library for Bayesian hyperparameter optimization
Stars: ✭ 80 (-36.51%)
Panoptes: A Global Scale Network Telemetry Ecosystem
Stars: ✭ 80 (-36.51%)
Magellan: Geospatial data analytics on Spark
Stars: ✭ 507 (+302.38%)
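Many geospatial queries rest on a point-in-polygon test. The plain-Python sketch below uses the even-odd ray-casting rule. It illustrates the geometry only; it is not Magellan's Spark API, and the function name is hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count polygon edges crossed by a ray going right.

    polygon is a list of (x, y) vertices. Points lying exactly on an
    edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
```

An odd number of crossings means the point is inside. A distributed engine would typically prefilter candidates with bounding boxes before running this exact test.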
falcon: Mirror of Apache Falcon
Stars: ✭ 95 (-24.6%)
Iotdb: Apache IoTDB
Stars: ✭ 1,221 (+869.05%)
Pgm Index: 🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (+296.03%)
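A learned index of this kind approximates the key-to-position mapping of a sorted array with linear segments. Each segment's prediction error is bounded by a parameter ε, so a lookup only needs to search a window of about 2ε positions around the prediction.

The Python sketch below illustrates that idea with two simplifications:
- It segments the keys with a greedy "shrinking cone" heuristic rather than the optimal piecewise linear approximation the PGM-index computes.
- It uses a single flat level rather than a recursive structure.

It assumes strictly increasing keys. The class name `LearnedIndex` is hypothetical.

```python
import bisect
import math


class LearnedIndex:
    """One-level learned index over sorted, distinct keys (illustrative)."""

    def __init__(self, keys, eps=2):
        self.keys = keys
        self.eps = eps
        self.segments = []  # (first_key, first_pos, slope)
        start = 0
        lo, hi = -math.inf, math.inf  # feasible slope range for the segment
        for i in range(1, len(keys) + 1):
            if i < len(keys):
                dk = keys[i] - keys[start]
                s_lo = (i - eps - start) / dk
                s_hi = (i + eps - start) / dk
                if max(lo, s_lo) <= min(hi, s_hi):
                    # Point i still fits within +/- eps: narrow the cone.
                    lo, hi = max(lo, s_lo), min(hi, s_hi)
                    continue
            # Close the current segment; a one-point segment gets slope 0.
            slope = 0.0 if lo == -math.inf else (lo + hi) / 2
            self.segments.append((keys[start], start, slope))
            start, lo, hi = i, -math.inf, math.inf
        self.first_keys = [s[0] for s in self.segments]

    def find(self, key):
        """Return the position of key, or None if it is absent."""
        if not self.keys or key < self.keys[0]:
            return None
        seg = bisect.bisect_right(self.first_keys, key) - 1
        k0, p0, slope = self.segments[seg]
        guess = int(round(p0 + slope * (key - k0)))
        # Only search the error window around the predicted position.
        left = max(0, guess - self.eps)
        right = min(len(self.keys), guess + self.eps + 1)
        pos = bisect.bisect_left(self.keys, key, left, right)
        if pos < len(self.keys) and self.keys[pos] == key:
            return pos
        return None


keys = [3 * i + (i % 3) for i in range(200)]
index = LearnedIndex(keys, eps=2)
```

Because these keys are close to linear in their position, a single segment covers all 200 of them. The index stores three numbers per segment instead of one entry per key.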
wrangler: Wrangler Transform, a DMD system for transforming big data
Stars: ✭ 63 (-50%)
scSeqR: This package has migrated to https://github.com/rezakj/iCellR; please use iCellR instead of scSeqR for more functionality and updates.
Stars: ✭ 16 (-87.3%)
Tajo: Mirror of Apache Tajo
Stars: ✭ 128 (+1.59%)
Stream Framework: Stream Framework is a Python library that lets you build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream Framework also provide a cloud service for feed technology.
Stars: ✭ 4,576 (+3531.75%)
check-engine: Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-76.98%)
Labs: Research on distributed systems
Stars: ✭ 73 (-42.06%)
classifai: 🔥 One of the most comprehensive open-source data annotation platforms.
Stars: ✭ 99 (-21.43%)
Data Science Live Book: An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (+53.17%)
storm-ml: An online learning algorithm library for Storm
Stars: ✭ 18 (-85.71%)
ByteSlice: "Byteslice: Pushing the envelop of main memory data processing with a new storage layout" (SIGMOD'15)
Stars: ✭ 24 (-80.95%)
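The storage layout named in this title splits each fixed-width column code into its bytes. The j-th byte of every value is kept in its own contiguous array. A comparison predicate can then scan the most significant byte first and stop early once every value's outcome is decided.

The paper evaluates predicates with SIMD over packed byte slices. The Python sketch below models only the layout and the early-stopping logic for a `value < constant` predicate, assuming 32-bit unsigned codes. The function names are hypothetical.

```python
def byte_slice(values, width=4):
    """Split unsigned ints into `width` byte arrays, most significant first."""
    return [
        bytes((v >> (8 * (width - 1 - j))) & 0xFF for v in values)
        for j in range(width)
    ]


def less_than(slices, constant, width=4):
    """Evaluate value < constant over byte slices with early stopping.

    Each value is compared byte by byte from the most significant byte;
    the first byte that differs from the constant's decides the result.
    Returns (result list, number of byte slices actually scanned).
    """
    n = len(slices[0]) if slices else 0
    result = [False] * n
    undecided = set(range(n))
    scanned = 0
    for j in range(width):
        if not undecided:
            break  # every value is decided: skip the remaining slices
        scanned += 1
        c = (constant >> (8 * (width - 1 - j))) & 0xFF
        column = slices[j]
        for i in list(undecided):
            if column[i] < c:
                result[i] = True
                undecided.discard(i)
            elif column[i] > c:
                undecided.discard(i)
    # Values equal to the constant in every byte stay False (not less than).
    return result, scanned


values = [5, 300, 70000, 16777216]
mask, scanned = less_than(byte_slice(values), 1000)
```

In this example, all four values are decided after three of the four byte slices, so the least significant slice is never read.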
Appdocs: Application Performance Optimization Summary
Stars: ✭ 1,169 (+827.78%)
Gun: An open source cybersecurity protocol for syncing decentralized graph data.
Stars: ✭ 15,172 (+11941.27%)
Fit Sne: Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (+284.92%)
Carbondata: Mirror of Apache CarbonData
Stars: ✭ 1,158 (+819.05%)
Azuredatalake: Samples and Docs for Azure Data Lake Store and Analytics
Stars: ✭ 128 (+1.59%)
Redislite: Redis in a Python module.
Stars: ✭ 464 (+268.25%)
LoL-Match-Prediction: Win probability predictions for League of Legends matches using neural networks
Stars: ✭ 34 (-73.02%)
Flink Shaded: Apache Flink shaded artifacts repository
Stars: ✭ 67 (-46.83%)
Flume: Mirror of Apache Flume
Stars: ✭ 2,200 (+1646.03%)
incubator-liminal: Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Stars: ✭ 117 (-7.14%)
Cloud Volume: Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-50%)
beekeeper: Service for automatically managing and cleaning up unreferenced data
Stars: ✭ 43 (-65.87%)
optimus: 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+972.22%)
MAL-Map: Cluster and visualize relationships between anime on MyAnimeList
Stars: ✭ 201 (+59.52%)
dmmclust: dmmclust is a package for clustering short texts, based on Yin and Wang (2014)
Stars: ✭ 23 (-81.75%)
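Yin and Wang (2014) cluster short texts by Gibbs sampling a Dirichlet Multinomial Mixture, an algorithm known as GSDMM. Each document is repeatedly reassigned to a cluster, with a probability that favors two things:
- already-popular clusters, controlled by α;
- clusters whose vocabulary overlaps the document, controlled by β.

The Python sketch below implements that sampling rule as described in the paper. It is not dmmclust's API, and the function name and default values are hypothetical.

```python
import math
import random
from collections import Counter


def gsdmm(docs, k=8, alpha=0.1, beta=0.1, iterations=15, seed=0):
    """Cluster tokenized short texts with collapsed Gibbs sampling (sketch).

    docs is a list of token lists; returns one cluster label per document.
    """
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    n_docs = len(docs)
    m = [0] * k                         # documents per cluster
    n = [0] * k                         # tokens per cluster
    nw = [Counter() for _ in range(k)]  # word counts per cluster
    counts = [Counter(d) for d in docs]
    labels = []

    def move(i, z, sign):
        """Add (sign=+1) or remove (sign=-1) document i from cluster z."""
        m[z] += sign
        n[z] += sign * len(docs[i])
        for w, c in counts[i].items():
            nw[z][w] += sign * c

    for i in range(n_docs):             # random initial assignment
        labels.append(rng.randrange(k))
        move(i, labels[i], +1)

    for _ in range(iterations):
        for i in range(n_docs):
            move(i, labels[i], -1)      # exclude document i from the counts
            log_p = []
            for z in range(k):
                # cluster popularity term (denominator is constant in z)
                lp = math.log(m[z] + alpha) - math.log(n_docs - 1 + k * alpha)
                # word overlap term: numerator over the document's words
                for w, c in counts[i].items():
                    for j in range(c):
                        lp += math.log(nw[z][w] + beta + j)
                for j in range(len(docs[i])):
                    lp -= math.log(n[z] + vocab_size * beta + j)
                log_p.append(lp)
            top = max(log_p)            # shift before exp for stability
            weights = [math.exp(lp - top) for lp in log_p]
            labels[i] = rng.choices(range(k), weights=weights)[0]
            move(i, labels[i], +1)
    return labels


docs = [["cheap", "flight", "deal"], ["flight", "deal", "today"],
        ["football", "score"], ["football", "match", "score"]]
labels = gsdmm(docs, k=4)
```

Working in log space avoids underflow when the products over words get long. Setting `k` above the expected number of topics lets empty clusters absorb the surplus, which is how the paper's approach infers the cluster count.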
Feast: Feature Store for Machine Learning
Stars: ✭ 2,576 (+1944.44%)
Courses: Quizzes and assignments from Coursera courses
Stars: ✭ 454 (+260.32%)
Conjure Up: Deploying complex solutions, magically.
Stars: ✭ 454 (+260.32%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (+1.59%)
Data Science Ipython Notebooks: Data science Python notebooks covering deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+17398.41%)
Clickhouse: ClickHouse® is a free analytics DBMS for big data
Stars: ✭ 21,089 (+16637.3%)