Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1637.66%)
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-55.84%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1180.52%)
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+1654.55%)
AutoTS
Automated Time Series Forecasting
Stars: ✭ 665 (+763.64%)
PersonNotes
A personal notes hub: technical notes recorded in a quick, rough-and-ready style .. 📚☕️⌨️🎧
Stars: ✭ 61 (-20.78%)
Temps
λ A self-hostable serverless function runtime. Inspired by Zeit Now.
Stars: ✭ 15 (-80.52%)
EvolutionaryForest
An open source python library for automated feature engineering based on Genetic Programming
Stars: ✭ 56 (-27.27%)
msda
Library for multi-dimensional, multi-sensor, uni/multivariate time series data analysis, unsupervised feature selection, unsupervised deep anomaly detection, and prototype of explainable AI for anomaly detector
Stars: ✭ 80 (+3.9%)
kaggle-berlin
Material of the Kaggle Berlin meetup group!
Stars: ✭ 36 (-53.25%)
zdh web
Big data collection and extraction platform
Stars: ✭ 292 (+279.22%)
soda-spark
Soda Spark is a PySpark library that helps you with testing your data in Spark Dataframes
Stars: ✭ 58 (-24.68%)
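intersect
Thoughts on an interview question: computing the intersection of 60 million data packets and 3 million data packets in an environment with 50 MB of memory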
Stars: ✭ 54 (-29.87%)
StreamBench
Measuring the performance of popular streaming engines with Yahoo's Streaming Benchmark
Stars: ✭ 52 (-32.47%)
clink
Clink is a library that provides APIs and infrastructure to facilitate the development of parallelizable feature engineering operators that can be used in both C++ and Java runtime.
Stars: ✭ 24 (-68.83%)
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-6.49%)
feng
feng - feature engineering for machine-learning champions
Stars: ✭ 27 (-64.94%)
hayabusa
Hayabusa: Simple and Fast Full-Text Search Engine for Massive System Log Data
Stars: ✭ 43 (-44.16%)
jhdf
A pure Java HDF5 library
Stars: ✭ 83 (+7.79%)
dot
distributed data sync with operational transformation/transforms
Stars: ✭ 73 (-5.19%)
jgit-spark-connector
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Stars: ✭ 71 (-7.79%)
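awesome-coder-resources
A refueling station on the programming journey! [Continuously updated... stars welcome, come back often] [Contents: programming/learning/reading resources, open-source projects, interview questions, websites, books, blogs, tutorials, and more]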
Stars: ✭ 54 (-29.87%)
dt-sql-parser
SQL Parsers for BigData, built with antlr4.
Stars: ✭ 135 (+75.32%)
einet
Uncertainty and causal emergence in complex networks
Stars: ✭ 77 (+0%)
gintonic
A declarative transformation language for GraphQL 🍸
Stars: ✭ 27 (-64.94%)
stargan2
StarGAN2 for practice
Stars: ✭ 89 (+15.58%)
hedgedhttp
Hedged HTTP client which helps to reduce tail latency at scale.
Stars: ✭ 103 (+33.77%)
greycat
GreyCat - Data Analytics, Temporal data, What-if, Live machine learning
Stars: ✭ 104 (+35.06%)
Clustering4Ever
C4E, a JVM friendly library written in Scala for both local and distributed (Spark) Clustering.
Stars: ✭ 126 (+63.64%)
bigquery-data-lineage
Reference implementation for real-time Data Lineage tracking for BigQuery using Audit Logs, ZetaSQL and Dataflow.
Stars: ✭ 112 (+45.45%)
PubMed-Best-Match
Machine-learning based pipeline relying on LambdaMART currently used in PubMed for relevance (Best Match) searches
Stars: ✭ 36 (-53.25%)
ASV
[CVPR16] Accumulated Stability Voting: A Robust Descriptor from Descriptors of Multiple Scales
Stars: ✭ 26 (-66.23%)
go-hx711
Golang HX711 interface using periph.io driver
Stars: ✭ 15 (-80.52%)
GEAN
This toolkit deals with GEnomic sequence and genome structure ANnotation files between inbreeding lines and species.
Stars: ✭ 36 (-53.25%)
exemplary-ml-pipeline
Exemplary, annotated machine learning pipeline for any tabular data problem.
Stars: ✭ 23 (-70.13%)
scale
📦 Toolkit for mapping abstract data into visual representation.
Stars: ✭ 53 (-31.17%)
FIFA-2019-Analysis
This is a project based on the FIFA World Cup 2019 that analyzes the performance and efficiency of teams, players, countries, and related topics using data analysis and data visualization
Stars: ✭ 28 (-63.64%)
the-apache-ignite-book
All code samples, scripts and more in-depth examples for The Apache Ignite Book. Include Apache Ignite 2.6 or above
Stars: ✭ 65 (-15.58%)
nitroml
NitroML is a modular, portable, and scalable model-quality benchmarking framework for Machine Learning and Automated Machine Learning (AutoML) pipelines.
Stars: ✭ 40 (-48.05%)
skrobot
skrobot is a Python module for designing, running and tracking Machine Learning experiments / tasks. It is built on top of the scikit-learn framework.
Stars: ✭ 22 (-71.43%)
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-63.64%)
amas
Amas is a recursive acronym for “Amas, monitor alert system”.
Stars: ✭ 77 (+0%)
lectures-hse-spark
Scalable machine learning and big data analysis with Apache Spark
Stars: ✭ 20 (-74.03%)
BetterDummy
Unlock your displays on your Mac! Smooth scaling, HiDPI unlock, XDR/HDR extra brightness upscale, DDC, brightness and dimming, dummy displays, PIP and lots more!
Stars: ✭ 9,601 (+12368.83%)
pyspark-cassandra
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible w/ Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Stars: ✭ 70 (-9.09%)