Data Science Live BookAn open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (-96.36%)
TrinoOfficial repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (-13.65%)
nebulaA distributed block-based data storage and compute engine
Stars: ✭ 127 (-97.61%)
awesome-AI-kubernetes❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-98.21%)
SupersetApache Superset is a Data Visualization and Data Exploration Platform
Stars: ✭ 42,634 (+703.66%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-98.51%)
Tennis Crystal BallUltimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-97.98%)
CoursesQuiz & Assignment of Coursera
Stars: ✭ 454 (-91.44%)
Data Science CareerCareer Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (-88.12%)
Model Describermodel-describer : Making machine learning interpretable to humans
Stars: ✭ 22 (-99.59%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-74.78%)
Countly Sdk CordovaCountly Product Analytics SDK for Cordova, Icenium and Phonegap
Stars: ✭ 69 (-98.7%)
Spark R Notebooks R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-97.95%)
DataflowjavasdkGoogle Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (-83.9%)
Pythondatarepo for code published on pythondata.com
Stars: ✭ 113 (-97.87%)
DatasciencevmTools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-97.12%)
golearn🔥 Golang basics and actual-combat (including: crawler, distributed-systems, data-analysis, redis, etcd, raft, crontab-task)
Stars: ✭ 36 (-99.32%)
hotmapWebGL Heatmap Viewer for Big Data and Bioinformatics
Stars: ✭ 13 (-99.75%)
covid-19COVID-19 World is yet another Project to build a Dashboard like app to showcase the data related to the COVID-19(Corona Virus).
Stars: ✭ 28 (-99.47%)
Service FabricService Fabric is a distributed systems platform for packaging, deploying, and managing stateless and stateful distributed applications and containers at large scale.
Stars: ✭ 2,874 (-45.82%)
leaflet heatmap简单的可视化湖州通话数据 假设数据量很大,没法用浏览器直接绘制热力图,把绘制热力图这一步骤放到线下计算分析。使用Apache Spark并行计算数据之后,再使用Apache Spark绘制热力图,然后用leafletjs加载OpenStreetMap图层和热力图图层,以达到良好的交互效果。现在使用Apache Spark实现绘制,可能是Apache Spark不擅长这方面的计算或者是我没有设计好算法,并行计算的速度比不上单机计算。Apache Spark绘制热力图和计算代码在这 https://github.com/yuanzhaokang/ParallelizeHeatmap.git .
Stars: ✭ 13 (-99.75%)
XlearnHigh performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface.
Stars: ✭ 2,968 (-44.05%)
Knowledge RepoA next-generation curated knowledge sharing platform for data scientists and other technical professions.
Stars: ✭ 4,956 (-6.58%)
DapyEasy-to-use data analysis / manipulation framework for humans
Stars: ✭ 523 (-90.14%)
SealionThe first machine learning framework that encourages learning ML concepts instead of memorizing class functions.
Stars: ✭ 278 (-94.76%)
Oie ResourcesA curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-94.67%)
nebulaA distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+54.5%)
twitter-analytics-wrapperA simple Python wrapper to download tweets data from the Twitter Analytics platform. Particularly interesting for the impressions metrics that are unavailable on current Twitter API. Also works for the videos data.
Stars: ✭ 44 (-99.17%)
growthbookOpen Source Feature Flagging and A/B Testing Platform
Stars: ✭ 2,342 (-55.85%)
Data Science HacksData Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-94.85%)
Awesome Datascience📝 An awesome Data Science repository to learn and apply for real world problems.
Stars: ✭ 17,520 (+230.25%)
Pydataroadopen source for wechat-official-account (ID: PyDataLab)
Stars: ✭ 302 (-94.31%)
Knowage ServerKnowage is the professional open source suite for modern business analytics over traditional sources and big data systems.
Stars: ✭ 276 (-94.8%)
DagsterAn orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (-22.73%)
KoalasKoalas: pandas API on Apache Spark
Stars: ✭ 3,044 (-42.62%)
DeltaAn open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (-26.43%)
UrsUniversal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Stars: ✭ 275 (-94.82%)
CrateCrateDB is a distributed SQL database that makes it simple to store and analyze
massive amounts of data in real-time.
Stars: ✭ 3,254 (-38.66%)
Ai Learn人工智能学习路线图,整理近200个实战案例与项目,免费提供配套教材,零基础入门,就业实战!包括:Python,数学,机器学习,数据分析,深度学习,计算机视觉,自然语言处理,PyTorch tensorflow machine-learning,deep-learning data-analysis data-mining mathematics data-science artificial-intelligence python tensorflow tensorflow2 caffe keras pytorch algorithm numpy pandas matplotlib seaborn nlp cv等热门领域
Stars: ✭ 4,387 (-17.3%)
AkshareAKShare is an elegant and simple financial data interface library for Python, built for human beings! 开源财经数据接口库
Stars: ✭ 4,334 (-18.3%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (-93.18%)
Quantitative NotebooksEducational notebooks on quantitative finance, algorithmic trading, financial modelling and investment strategy
Stars: ✭ 356 (-93.29%)
Pandas SummaryAn extension to pandas dataframes describe function.
Stars: ✭ 361 (-93.2%)
DataexplorerAutomate Data Exploration and Treatment
Stars: ✭ 362 (-93.18%)
Scikit Mobilityscikit-mobility: mobility analysis in Python
Stars: ✭ 339 (-93.61%)
ArticlesA repository for the source code, notebooks, data, files, and other assets used in the data science and machine learning articles on LearnDataSci
Stars: ✭ 350 (-93.4%)
Data ScienceCollection of useful data science topics along with code and articles
Stars: ✭ 315 (-94.06%)
DatacleanerThe premier open source Data Quality solution
Stars: ✭ 391 (-92.63%)
Stats Maths With PythonGeneral statistics, mathematical programming, and numerical/scientific computing scripts and notebooks in Python
Stars: ✭ 381 (-92.82%)
RumaleRumale is a machine learning library in Ruby
Stars: ✭ 526 (-90.08%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-92.21%)
PrettypandasA Pandas Styler class for making beautiful tables
Stars: ✭ 376 (-92.91%)