Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (-79.12%)
Bigdata PlaygroundA complete example of a big data application using : Kubernetes (kops/aws), Apache Spark SQL/Streaming/MLib, Apache Flink, Scala, Python, Apache Kafka, Apache Hbase, Apache Parquet, Apache Avro, Apache Storm, Twitter Api, MongoDB, NodeJS, Angular, GraphQL
Stars: ✭ 177 (-71.13%)
DaFlowApache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-96.08%)
learning-hadoop-and-sparkCompanion to Learning Hadoop and Learning Spark courses on Linked In Learning
Stars: ✭ 146 (-76.18%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-81.89%)
Pulsar SparkWhen Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-91.03%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+822.68%)
Scalable Data ScienceScalable Data Science, course sets in big data Using Apache Spark over databricks and their mathematical, statistical and computational foundations using SageMath.
Stars: ✭ 142 (-76.84%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-75.53%)
datalake-etl-pipelineSimplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-93.64%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+647.31%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-97.88%)
SparkrdmaRDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-64.93%)
sparkucxA high-performance, scalable and efficient ShuffleManager plugin for Apache Spark, utilizing UCX communication layer
Stars: ✭ 32 (-94.78%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-32.63%)
PysparklingA pure Python implementation of Apache Spark's RDD and DStream interfaces.
Stars: ✭ 231 (-62.32%)
Spark NotebookInteractive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+402.61%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+3496.74%)
Hadoop study定期更新Hadoop生态圈中常用大数据组件文档 重心依次为: Flink Solr Sparksql ES Scala Kafka Hbase/phoenix Redis Kerberos (项目包含hadoop思维导图 印象笔记 Scala版本简单demo 常用工具类 去敏后的train code 持续更新!!!)
Stars: ✭ 567 (-7.5%)
Interpretable machine learning with pythonExamples of techniques for training interpretable ML models, explaining ML models, and debugging ML models for accuracy, discrimination, and security.
Stars: ✭ 530 (-13.54%)
RumaleRumale is a machine learning library in Ruby
Stars: ✭ 526 (-14.19%)
Imbalanced LearnA Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
Stars: ✭ 5,617 (+816.31%)
Course V3The 3rd edition of course.fast.ai
Stars: ✭ 4,785 (+680.59%)
Feature SelectionFeatures selector based on the self selected-algorithm, loss function and validation method
Stars: ✭ 534 (-12.89%)
Data Science Your WayWays of doing Data Science Engineering and Machine Learning in R and Python
Stars: ✭ 530 (-13.54%)
Awesome Ai UsecasesA list of awesome and proven Artificial Intelligence use cases and applications
Stars: ✭ 587 (-4.24%)
Lets PlotAn open-source plotting library for statistical data.
Stars: ✭ 531 (-13.38%)
AlphapyAutomated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost
Stars: ✭ 564 (-7.99%)
Moderndive bookStatistical Inference via Data Science: A ModernDive into R and the Tidyverse
Stars: ✭ 527 (-14.03%)
SmileStatistical Machine Intelligence & Learning Engine
Stars: ✭ 5,412 (+782.87%)
PachydermReproducible Data Science at Scale!
Stars: ✭ 5,305 (+765.42%)
DapyEasy-to-use data analysis / manipulation framework for humans
Stars: ✭ 523 (-14.68%)
Disk.frameFast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
Stars: ✭ 517 (-15.66%)
GlueLinked Data Visualizations Across Multiple Files
Stars: ✭ 518 (-15.5%)
Vehicle counting tensorflow🚘 "MORE THAN VEHICLE COUNTING!" This project provides prediction for speed, color and size of the vehicles with TensorFlow Object Counting API.
Stars: ✭ 582 (-5.06%)
Data Science PortfolioPortfolio of data science projects completed by me for academic, self learning, and hobby purposes.
Stars: ✭ 559 (-8.81%)
Facebook data analyzerAnalyze facebook copy of your data with ruby language. Download zip file from facebook and get info about friends ranking by message, vocabulary, contacts, friends added statistics and more
Stars: ✭ 515 (-15.99%)
Knowledge RepoA next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (+708.48%)
NipypeWorkflows and interfaces for neuroimaging packages
Stars: ✭ 557 (-9.14%)
HeamyA set of useful tools for competitive data science.
Stars: ✭ 511 (-16.64%)
Spacy Stanza💥 Use the latest Stanza (StanfordNLP) research models directly in spaCy
Stars: ✭ 508 (-17.13%)
Book sampleanother book on data science
Stars: ✭ 611 (-0.33%)
DatasheetsRead data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (-3.26%)
Data Science CompetitionsGoal of this repo is to provide the solutions of all Data Science Competitions(Kaggle, Data Hack, Machine Hack, Driven Data etc...).
Stars: ✭ 572 (-6.69%)
AtmAuto Tune Models - A multi-tenant, multi-data system for automated machine learning (model selection and tuning).
Stars: ✭ 504 (-17.78%)
EdwardA probabilistic programming language in TensorFlow. Deep generative models, variational inference.
Stars: ✭ 4,674 (+662.48%)
SnorkelA system for quickly generating training data with weak supervision
Stars: ✭ 4,953 (+707.99%)
AlluxioAlluxio, data orchestration for analytics and machine learning in the cloud
Stars: ✭ 5,379 (+777.49%)
Awesome RA curated list of awesome R packages, frameworks and software.
Stars: ✭ 4,858 (+692.5%)
Dataframe GoDataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (-20.55%)
Solid🎯 A comprehensive gradient-free optimization framework written in Python
Stars: ✭ 546 (-10.93%)
Bigdata💎🔥大数据学习笔记
Stars: ✭ 488 (-20.39%)