GeniA Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-95.01%)
Data Science Ipython NotebooksData science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+624.31%)
Danfojsdanfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Stars: ✭ 1,304 (-57.16%)
H2o 3H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+85.81%)
DatacompyPandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-95.17%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-96.35%)
DataframeC++ DataFrame for statistical, Financial, and ML analysis -- in modern C++ using native types, continuous memory storage, and no pointers are involved
Stars: ✭ 828 (-72.8%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-95.07%)
DatasheetsRead data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (-80.52%)
BoltzmanncleanFill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-99.24%)
Dataframe GoDataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (-84%)
PandasvaultAdvanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).
Stars: ✭ 316 (-89.62%)
PyjanitorClean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 647 (-78.75%)
FoxcrossAsyncIO serving for data science models
Stars: ✭ 18 (-99.41%)
pyjanitorClean APIs for data cleaning. Python implementation of R package Janitor
Stars: ✭ 970 (-68.13%)
ElandPython Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (-92.28%)
Cape PythonCollaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-95.89%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-97.86%)
PdpipeEasy pipelines for pandas DataFrames.
Stars: ✭ 590 (-80.62%)
SetlA simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-97.4%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-56.04%)
Data AcceleratorData Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-91.89%)
HyperspaceAn open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (-91.92%)
FeastFeature Store for Machine Learning
Stars: ✭ 2,576 (-15.37%)
PandahousePandas interface for Clickhouse database
Stars: ✭ 126 (-95.86%)
GafferA large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-46.06%)
Aws Data WranglerPandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Stars: ✭ 2,385 (-21.65%)
Rightmove webscraper.pyPython class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-95.89%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-95.99%)
Griffon VmGriffon Data Science Virtual Machine
Stars: ✭ 128 (-95.8%)
Dtale DesktopBuild a data visualization dashboard with simple snippets of python code
Stars: ✭ 128 (-95.8%)
PandasschemaA validation library for Pandas data frames using user-friendly schemas
Stars: ✭ 135 (-95.57%)
Pandas VideosJupyter notebook and datasets from the pandas Q&A video series
Stars: ✭ 1,716 (-43.63%)
Sparkling GraphSparklingGraph provides easy to use set of features that will give you ability to proces large scala graphs using Spark and GraphX.
Stars: ✭ 139 (-95.43%)
StumpySTUMPY is a powerful and scalable Python library for modern time series analysis
Stars: ✭ 2,019 (-33.67%)
DeepgraphAnalyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (-92.38%)
Machine Learning With PythonPractice and tutorial-style notebooks covering wide variety of machine learning techniques
Stars: ✭ 2,197 (-27.83%)
DatasciencevmTools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-94.97%)
Py QuantmodPowerful financial charting library based on R's Quantmod | http://py-quantmod.readthedocs.io/en/latest/
Stars: ✭ 155 (-94.91%)
IbisA pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (-46.45%)
AcceleratorThe Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-95.5%)
Benchm MlA minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (-39.72%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-94.97%)
Orange3🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+3.55%)
LearnpythonforresearchThis repository provides everything you need to get started with Python for (social science) research.
Stars: ✭ 163 (-94.65%)
Pandas DatareaderExtract data from a wide range of Internet sources into a pandas DataFrame.
Stars: ✭ 2,183 (-28.29%)
PantheraData-frames & arrays on Clojure
Stars: ✭ 168 (-94.48%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-94.81%)
GeopysparkGeoTrellis for PySpark
Stars: ✭ 167 (-94.51%)
TablesawJava dataframe and visualization library
Stars: ✭ 2,785 (-8.51%)
MydatascienceportfolioApplying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (-92.54%)
MarsMars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (-24.18%)
Andrew Ng NotesThis is Andrew NG Coursera Handwritten Notes.
Stars: ✭ 180 (-94.09%)
DtaleVisualizer for pandas data structures
Stars: ✭ 2,864 (-5.91%)