spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-48.06%)
Magellan
Geo Spatial Data Analytics on Spark
Stars: ✭ 507 (+293.02%)
RemoteShuffleService
Celeborn provides an elastic and high-performance service for shuffle and spilled data.
Stars: ✭ 262 (+103.1%)
Stream Framework
Stream Framework is a Python library that lets you build news feeds, activity streams, and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+3447.29%)
img2dataset
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Stars: ✭ 1,173 (+809.3%)
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-80.62%)
Redislite
Redis in a Python module.
Stars: ✭ 464 (+259.69%)
GDLibrary
MATLAB library for gradient descent algorithms: Version 1.0.1
Stars: ✭ 50 (-61.24%)
Nabhash
An extremely fast, non-crypto-safe, AES-based hash algorithm for Big Data
Stars: ✭ 62 (-51.94%)
lcbo-api
A crawler and API server for Liquor Control Board of Ontario retail data
Stars: ✭ 152 (+17.83%)
Courses
Quizzes and assignments from Coursera
Stars: ✭ 454 (+251.94%)
gan deeplearning4j
Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-85.27%)
Kudu
Mirror of Apache Kudu
Stars: ✭ 1,360 (+954.26%)
FlameStream
Distributed stream processing model and its implementation
Stars: ✭ 14 (-89.15%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+16991.47%)
ngm
swissgeol.ch gives you insight into geoscientific data, above and below the surface.
Stars: ✭ 23 (-82.17%)
Attic Lens
Mirror of Apache Lens
Stars: ✭ 58 (-55.04%)
automile-net
Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 24 (-81.4%)
Cortx
CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+230.23%)
iis
Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-87.6%)
Amazon S3 Find And Forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)
Stars: ✭ 115 (-10.85%)
FIW KRT
Families In the Wild: A Kinship Recognition Toolbox.
Stars: ✭ 18 (-86.05%)
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+220.93%)
shifting
A privacy-focused list of alternatives to mainstream services to help the competition.
Stars: ✭ 31 (-75.97%)
HadoopDedup
🍉 Large-scale deduplication of massive data sets, based on Hadoop and HBase
Stars: ✭ 27 (-79.07%)
Cogcomp Nlp
CogComp's Natural Language Processing libraries and Demos:
Stars: ✭ 410 (+217.83%)
Logisland
Scalable stream processing platform for advanced realtime analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of ready-to-use processors, data sources, and sinks is available.
Stars: ✭ 97 (-24.81%)
Decentralized Internet
An SDK/library for decentralized web and distributed computing projects
Stars: ✭ 406 (+214.73%)
Lifion Kinesis
A native Node.js producer and consumer library for Amazon Kinesis Data Streams
Stars: ✭ 54 (-58.14%)
merkle-db
High-scalability analytics database built on immutable Merkle trees
Stars: ✭ 44 (-65.89%)
Orc
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads
Stars: ✭ 389 (+201.55%)
metriql
The metrics layer for your data. Join us at https://metriql.com/slack
Stars: ✭ 227 (+75.97%)
Mobydq
🐳 Tool to automate data quality checks on data pipelines
Stars: ✭ 123 (-4.65%)
dislib
The Distributed Computing library for Python, implemented using the PyCOMPSs programming model for HPC.
Stars: ✭ 39 (-69.77%)
Bigdl
Building Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+2855.81%)
Oodt
Mirror of Apache OODT
Stars: ✭ 52 (-59.69%)
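cdp-service
CDP data platform that helps enterprises fully understand their customers and deliver precisely targeted, individually personalized marketing.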
Stars: ✭ 30 (-76.74%)
Halodb
A fast, log-structured key-value store.
Stars: ✭ 370 (+186.82%)
sgd
An R package for large scale estimation with stochastic gradient descent
Stars: ✭ 55 (-57.36%)
Spark Py Notebooks
Apache Spark & Python (PySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+937.21%)
ytpriv
YT metadata exporter
Stars: ✭ 28 (-78.29%)
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+180.62%)
scikit-learn-intelex
Intel(R) Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
Stars: ✭ 887 (+587.6%)
Trck
Query engine for TrailDB
Stars: ✭ 48 (-62.79%)
Bigtop
Mirror of Apache Bigtop
Stars: ✭ 356 (+175.97%)
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+1172.87%)
Tajo
Mirror of Apache Tajo
Stars: ✭ 128 (-0.78%)
Richdem
High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (-1.55%)
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+8073.64%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-51.94%)
Fit Sne
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (+275.97%)
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+6253.49%)