Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-21.87%)
Streamx
kafka-connect-s3: Ingest data from Kafka to object stores (S3)
Stars: ✭ 96 (-50%)
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+755.21%)
Treeviz
Tree diagrams with JavaScript 🌲 📈
Stars: ✭ 95 (-50.52%)
Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (-13.02%)
Tajo
Mirror of Apache Tajo
Stars: ✭ 128 (-33.33%)
100daysofmlcode
My journey to learn and grow in the domain of Machine Learning and Artificial Intelligence by performing the #100DaysofMLCode Challenge.
Stars: ✭ 146 (-23.96%)
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+1241.67%)
Uproot4
ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-58.33%)
Bigdata Playground
A complete example of a big data application using: Kubernetes (kops/AWS), Apache Spark SQL/Streaming/MLlib, Apache Flink, Scala, Python, Apache Kafka, Apache HBase, Apache Parquet, Apache Avro, Apache Storm, Twitter API, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-7.81%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-58.85%)
Richdem
High-performance Terrain and Hydrology Analysis
Stars: ✭ 127 (-33.85%)
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (-25.52%)
Labs
Research on distributed systems
Stars: ✭ 73 (-61.98%)
Fluo
Apache Fluo
Stars: ✭ 159 (-17.19%)
Appdocs
Application Performance Optimization Summary
Stars: ✭ 1,169 (+508.85%)
Carbondata
Mirror of Apache CarbonData
Stars: ✭ 1,158 (+503.13%)
Flink Shaded
Apache Flink shaded artifacts repository
Stars: ✭ 67 (-65.1%)
Hdfs Shell
HDFS Shell is an HDFS manipulation tool for working with the functions integrated in Hadoop DFS
Stars: ✭ 117 (-39.06%)
Cloud Volume
Read and write Neuroglancer datasets programmatically.
Stars: ✭ 63 (-67.19%)
Presto Go Client
A Presto client for the Go programming language.
Stars: ✭ 183 (-4.69%)
Warp
Convert and analyze large data sets at light speed, on Mac and iOS.
Stars: ✭ 62 (-67.71%)
Cmak
CMAK is a tool for managing Apache Kafka clusters
Stars: ✭ 10,544 (+5391.67%)
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-69.27%)
Eel Sdk
Big Data Toolkit for the JVM
Stars: ✭ 140 (-27.08%)
Ymcache
YMCache is a lightweight object caching solution for iOS and Mac OS X that is designed for highly parallel access scenarios.
Stars: ✭ 58 (-69.79%)
Asakusafw
Asakusa Framework
Stars: ✭ 114 (-40.62%)
Kibble 1
Apache Kibble: a tool to collect, aggregate and visualize data about any software project
Stars: ✭ 54 (-71.87%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-20.83%)
Macro ml
Course website on Macroeconomic Analysis with Machine Learning and Big Data
Stars: ✭ 53 (-72.4%)
Pythondata
Repository for code published on pythondata.com
Stars: ✭ 113 (-41.15%)
Datumbox Framework
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+453.65%)
Sparkling Graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-27.6%)
Traildb
TrailDB is an efficient tool for storing and querying series of events
Stars: ✭ 1,029 (+435.94%)
Genie
Distributed Big Data Orchestration Service
Stars: ✭ 1,544 (+704.17%)
Keyvi
Keyvi is a key-value index that powers the Cliqz search engine. It is an in-memory FST-based data structure highly optimized for size and lookup performance.
Stars: ✭ 171 (-10.94%)
Egads
A Java package to automatically detect anomalies in large-scale time-series data
Stars: ✭ 997 (+419.27%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-43.23%)
Esper Tv
Esper instance for TV news analysis
Stars: ✭ 37 (-80.73%)
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball: Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-44.27%)
Qcportal
A client interface to the QCArchive Project (read-only image of QCFractal)
Stars: ✭ 29 (-84.9%)
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-20.31%)
Maha
A framework for rapid reporting API development, with out-of-the-box support for high-cardinality dimension lookups with Druid.
Stars: ✭ 101 (-47.4%)
Gun
An open source cybersecurity protocol for syncing decentralized graph data.
Stars: ✭ 15,172 (+7802.08%)
Flume
Mirror of Apache Flume
Stars: ✭ 2,200 (+1045.83%)
Attic Predictionio
PredictionIO, a machine learning server for developers and ML engineers.
Stars: ✭ 12,522 (+6421.88%)
Fili
Easily make RESTful web services for time series reporting with Big Data analytics engines like Druid and SQL databases.
Stars: ✭ 151 (-21.35%)