MLBD: Materials for "Machine Learning on Big Data" course
Stars: ✭ 20 (-41.18%)
Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+2800%)
Cascading: Cascading is a feature-rich API for defining and executing complex and fault-tolerant data processing flows locally or on a cluster. See https://github.com/Cascading/cascading for the release repository.
Stars: ✭ 318 (+835.29%)
Spline: Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+800%)
Mobius: C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (+2632.35%)
Data Algorithms Book: MapReduce, Spark, Java, and Scala for Data Algorithms Book
Stars: ✭ 949 (+2691.18%)
Aws Etl Orchestrator: A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Stars: ✭ 245 (+620.59%)
Repository: Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+170.59%)
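Bigdataguide: Big data learning guide for studying big data from scratch, including learning videos and interview materials for each stage of study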
Stars: ✭ 817 (+2302.94%)
meetups-archivos: Slides, code, and videos from the meetups, data science days, video calls, and workshops. Data Science Research is a non-profit organization that seeks to disseminate and decentralize knowledge of Data Science and Artificial Intelligence in Peru, giving opportunities to new talent through MeetUps, Workshops and Semilleros …
Stars: ✭ 60 (+76.47%)
Hadoopcryptoledger: Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+270.59%)
Shifu: An end-to-end machine learning and data mining framework on Hadoop
Stars: ✭ 207 (+508.82%)
Bigslice: A serverless cluster computing system for the Go programming language
Stars: ✭ 469 (+1279.41%)
Eel Sdk: Big Data Toolkit for the JVM
Stars: ✭ 140 (+311.76%)
Calcite: Apache Calcite
Stars: ✭ 2,816 (+8182.35%)
big-data-lite: Samples for the Oracle Big Data Lite VM
Stars: ✭ 41 (+20.59%)
learning-spark: Tidy up Spark and Hadoop tutorials.
Stars: ✭ 28 (-17.65%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+8426.47%)
Tez: Apache Tez
Stars: ✭ 313 (+820.59%)
Uproot3: ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+817.65%)
Ignite: Apache Ignite
Stars: ✭ 4,027 (+11744.12%)
Cortx: CORTX Community Object Storage is 100% open source object storage uniquely optimized for mass capacity storage devices.
Stars: ✭ 426 (+1152.94%)
Spark Movie Lens: An on-line movie recommender using Spark, Python Flask, and the MovieLens dataset
Stars: ✭ 745 (+2091.18%)
Moosefs: MooseFS – Open Source, Petabyte, Fault-Tolerant, Highly Performing, Scalable Network Distributed File System (Software-Defined Storage)
Stars: ✭ 1,025 (+2914.71%)
Pyspark Setup Demo: Demo of PySpark and Jupyter Notebook with the Jupyter Docker Stacks
Stars: ✭ 24 (-29.41%)
Spark R Notebooks: R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+220.59%)
Tennis Crystal Ball: Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (+214.71%)
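awesome-coder-resources: A refueling station on the programming road! [Continuously updated; stars welcome, come back often.] [Contents: programming/learning/reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more]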
Stars: ✭ 54 (+58.82%)
learning-hadoop-and-spark: Companion to Learning Hadoop and Learning Spark courses on LinkedIn Learning
Stars: ✭ 146 (+329.41%)
Griffon Vm: Griffon Data Science Virtual Machine
Stars: ✭ 128 (+276.47%)
Hdfs Shell: HDFS Shell is an HDFS manipulation tool to work with functions integrated in Hadoop DFS
Stars: ✭ 117 (+244.12%)
dt-sql-parser: SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+297.06%)
Presto: The official home of the Presto distributed SQL query engine for big data
Stars: ✭ 12,957 (+38008.82%)
lectures-hse-spark: Scalable machine learning and big data analysis with Apache Spark
Stars: ✭ 20 (-41.18%)
iis: Information Inference Service of the OpenAIRE system
Stars: ✭ 16 (-52.94%)
SynapseML: Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+9767.65%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+535.29%)
rastercube: rastercube is a Python library for big data analysis of georeferenced time series data (e.g. MODIS NDVI)
Stars: ✭ 15 (-55.88%)
Bitcoin Value Predictor: [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Stars: ✭ 91 (+167.65%)
Clustering4Ever: C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+270.59%)
gomrjob: gomrjob - a Go Framework for Hadoop Map Reduce Jobs
Stars: ✭ 39 (+14.71%)
HadoopDedup: 🍉 Large-scale deduplication of massive data based on Hadoop and HBase
Stars: ✭ 27 (-20.59%)
dockerfiles: Multi docker container images for main Big Data Tools. (Hadoop, Spark, Kafka, HBase, Cassandra, Zookeeper, Zeppelin, Drill, Flink, Hive, Hue, Mesos, ... )
Stars: ✭ 29 (-14.71%)
anovos: Anovos - An Open Source Library for Scalable Feature Engineering Using Apache Spark
Stars: ✭ 77 (+126.47%)
the-apache-ignite-book: All code samples, scripts and more in-depth examples for The Apache Ignite Book. Includes Apache Ignite 2.6 or above
Stars: ✭ 65 (+91.18%)
optimus: 🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+3873.53%)
Tdigest: t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Stars: ✭ 274 (+705.88%)
Uproot4: ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (+135.29%)
gan deeplearning4j: Automatic feature engineering using Generative Adversarial Networks with Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-44.12%)
pyspark-ML-in-Colab: PySpark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-5.88%)