Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+180.95%)
Cape PythonCollaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-14.97%)
KoalasKoalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+1970.75%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-26.53%)
Gspread PandasA package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (+53.74%)
PdpipeEasy pipelines for pandas DataFrames.
Stars: ✭ 590 (+301.36%)
Data Science HacksData Science Hacks is a collection of tips and tricks to help you become a better data scientist, aimed at all levels from beginner to advanced. It covers Python, Jupyter Notebook, pandas hacks and so on.
Stars: ✭ 273 (+85.71%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+14898.64%)
DatasheetsRead data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (+303.4%)
PycmMulti-class confusion matrix library in Python
Stars: ✭ 1,076 (+631.97%)
Rumble⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-60.54%)
MachinelearningcourseA collection of notebooks from my Machine Learning class, written in Python 3
Stars: ✭ 35 (-76.19%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+570.75%)
Pulsar SparkWhen Apache Pulsar meets Apache Spark
Stars: ✭ 55 (-62.59%)
PixiedustPython Helper library for Jupyter Notebooks
Stars: ✭ 998 (+578.91%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-56.46%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-55.78%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-46.26%)
OpenrefineOpenRefine is a free, open source power tool for working with messy data and improving it
Stars: ✭ 8,531 (+5703.4%)
GraphiaA visualisation tool for the creation and analysis of graphs
Stars: ✭ 67 (-54.42%)
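GopupData interfaces: Baidu, Google, Toutiao and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, "Qianlima" (fast-growing) and unicorn companies, Xinwen Lianbo news transcripts, film and TV box office data, university lists, epidemic data…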
Stars: ✭ 1,229 (+736.05%)
Pymc Example ProjectExample PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.
Stars: ✭ 90 (-38.78%)
Sigmoidal aiPython, Data Science, Machine Learning and Deep Learning tutorials - Sigmoidal
Stars: ✭ 103 (-29.93%)
Awesome BigdataA curated list of awesome big data frameworks, resources and other awesomeness.
Stars: ✭ 10,478 (+7027.89%)
Data Forge JsJavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 139 (-5.44%)
Mlcourse.aiOpen Machine Learning Course
Stars: ✭ 7,963 (+5317.01%)
Data Forge TsThe JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (+557.82%)
Data PolygamyData Polygamy is a topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.
Stars: ✭ 39 (-73.47%)
Python for mlA brief introduction to Python for machine learning
Stars: ✭ 29 (-80.27%)
SkootA package for data science practitioners. This library implements a number of helpful, common data transformations with a scikit-learn friendly interface in an effort to expedite the modeling process.
Stars: ✭ 50 (-65.99%)
Ds and ml projectsData Science & Machine Learning projects and tutorials in python from beginner to advanced level.
Stars: ✭ 56 (-61.9%)
Crime AnalysisAssociation Rule Mining from Spatial Data for Crime Analysis
Stars: ✭ 20 (-86.39%)
Data science blogsA repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-5.44%)
SeabornStatistical data visualization in Python
Stars: ✭ 9,007 (+6027.21%)
DatacomparerdataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
Stars: ✭ 58 (-60.54%)
MagicboxA platform that uses real-time data to inform life-saving humanitarian responses to emergency situations
Stars: ✭ 73 (-50.34%)
Locopylocopy: Loading/Unloading to Redshift and Snowflake using Python.
Stars: ✭ 73 (-50.34%)
FlyteAccelerate your ML and Data workflows to production. Flyte is a production-grade orchestration system for your Data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome and others, and is truly open source.
Stars: ✭ 1,242 (+744.9%)
Pandas ProfilingCreate HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+5565.99%)
CodesearchnetDatasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+837.41%)
SspipeSimple Smart Pipe: python productivity-tool for rapid data manipulation
Stars: ✭ 96 (-34.69%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+810.2%)
Python BigdataData science and Big Data with Python
Stars: ✭ 112 (-23.81%)
SweetvizVisualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+1159.18%)
Seaborn TutorialThis repository is my attempt to help Data Science aspirants gain the Data Visualization skills they need to progress in their careers. It includes all the plot types offered by Seaborn, applied to random datasets.
Stars: ✭ 114 (-22.45%)
Dat8General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+931.29%)
Ml PyxisTool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.
Stars: ✭ 93 (-36.73%)
Just Dashboard📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+927.89%)
D6t PythonAccelerate data science
Stars: ✭ 118 (-19.73%)
Dbg PdsDeutsche Boerse's Financial Trading Public Data Set
Stars: ✭ 124 (-15.65%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-17.01%)
Rightmove webscraper.pyPython class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-14.97%)
Pandas VideosJupyter notebook and datasets from the pandas Q&A video series
Stars: ✭ 1,716 (+1067.35%)
Aws Data WranglerPandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (+1522.45%)