
1,942 open-source projects that are alternatives to or similar to Koalas

Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-95.01%)
Mutual labels:  dataframe, data-science, spark, big-data
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+624.31%)
Mutual labels:  data-science, spark, pandas, big-data
Danfojs
danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.
Stars: ✭ 1,304 (-57.16%)
Mutual labels:  dataframe, data-science, pandas
H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+85.81%)
Mutual labels:  data-science, spark, big-data
Datacompy
Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-95.17%)
Mutual labels:  data-science, spark, pandas
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-96.35%)
Mutual labels:  big-data, spark, dataframe
Dataframe
C++ DataFrame for statistical, financial, and ML analysis, written in modern C++ using native types and contiguous memory storage, with no pointers involved
Stars: ✭ 828 (-72.8%)
Mutual labels:  dataframe, data-science, pandas
Spark With Python
Fundamentals of Spark with Python (using PySpark), with code examples
Stars: ✭ 150 (-95.07%)
Mutual labels:  dataframe, spark, big-data
Datasheets
Read data from, write data to, and modify the formatting of Google Sheets
Stars: ✭ 593 (-80.52%)
Mutual labels:  dataframe, data-science, pandas
Boltzmannclean
Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines
Stars: ✭ 23 (-99.24%)
Mutual labels:  dataframe, data-science, pandas
Dataframe Go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (-84%)
Mutual labels:  dataframe, data-science, pandas
Pandasvault
Advanced Pandas Vault: utilities, functions and snippets (by @firmai).
Stars: ✭ 316 (-89.62%)
Mutual labels:  dataframe, data-science, pandas
Pyjanitor
Clean APIs for data cleaning. A Python implementation of the R package janitor.
Stars: ✭ 647 (-78.75%)
Mutual labels:  dataframe, pandas, pydata
Foxcross
AsyncIO serving for data science models
Stars: ✭ 18 (-99.41%)
Mutual labels:  dataframe, data-science, pandas
pyjanitor
Clean APIs for data cleaning. A Python implementation of the R package janitor.
Stars: ✭ 970 (-68.13%)
Mutual labels:  pydata, pandas, dataframe
Eland
Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch
Stars: ✭ 235 (-92.28%)
Mutual labels:  dataframe, pandas, big-data
Cape Python
Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (-95.89%)
Mutual labels:  data-science, spark, pandas
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-97.86%)
Mutual labels:  data-science, spark, big-data
Pdpipe
Easy pipelines for pandas DataFrames.
Stars: ✭ 590 (-80.62%)
Mutual labels:  dataframe, data-science, pandas
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-97.4%)
Mutual labels:  data-science, spark, big-data
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (-56.04%)
Mutual labels:  data-science, spark, big-data
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to big-data streaming. It offers a rich, easy-to-use experience for creating, editing and managing Spark jobs on Azure HDInsight or Databricks while exposing the full power of the Spark engine.
Stars: ✭ 247 (-91.89%)
Mutual labels:  spark, big-data
Data Science For Marketing Analytics
Achieve your marketing goals with the data analytics power of Python
Stars: ✭ 127 (-95.83%)
Mutual labels:  data-science, pandas
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (-91.92%)
Mutual labels:  spark, big-data
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (-15.37%)
Mutual labels:  spark, big-data
Pandahouse
Pandas interface for the ClickHouse database
Stars: ✭ 126 (-95.86%)
Mutual labels:  dataframe, pandas
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (-46.06%)
Mutual labels:  spark, big-data
Datacamp Python Data Science Track
All the slides, accompanying code and exercises, stored in this repo. 🎈
Stars: ✭ 250 (-91.79%)
Mutual labels:  data-science, pandas
Aws Data Wrangler
Pandas on AWS: easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server and S3 (Parquet, CSV, JSON and Excel).
Stars: ✭ 2,385 (-21.65%)
Mutual labels:  data-science, pandas
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object
Stars: ✭ 125 (-95.89%)
Mutual labels:  data-science, pandas
Spark Alchemy
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-95.99%)
Mutual labels:  data-science, spark
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-95.8%)
Mutual labels:  data-science, big-data
Dtale Desktop
Build a data visualization dashboard with simple snippets of python code
Stars: ✭ 128 (-95.8%)
Mutual labels:  data-science, pandas
Pandasschema
A validation library for Pandas data frames using user-friendly schemas
Stars: ✭ 135 (-95.57%)
Mutual labels:  data-science, pandas
Pandas Videos
Jupyter notebook and datasets from the pandas Q&A video series
Stars: ✭ 1,716 (-43.63%)
Mutual labels:  data-science, pandas
Python Cheat Sheet
Python cheat sheet covering NumPy and Matplotlib
Stars: ✭ 1,739 (-42.87%)
Mutual labels:  data-science, pandas
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (-95.5%)
Mutual labels:  spark, big-data
Sparkling Graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs with Spark and GraphX.
Stars: ✭ 139 (-95.43%)
Mutual labels:  spark, big-data
Stumpy
STUMPY is a powerful and scalable Python library for modern time series analysis
Stars: ✭ 2,019 (-33.67%)
Mutual labels:  data-science, pydata
Deepgraph
Analyze Data with Pandas-based Networks. Documentation:
Stars: ✭ 232 (-92.38%)
Mutual labels:  data-science, pandas
Machine Learning With Python
Practice and tutorial-style notebooks covering a wide variety of machine learning techniques
Stars: ✭ 2,197 (-27.83%)
Mutual labels:  data-science, pandas
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-94.97%)
Mutual labels:  data-science, big-data
Py Quantmod
Powerful financial charting library based on R's Quantmod | http://py-quantmod.readthedocs.io/en/latest/
Stars: ✭ 155 (-94.91%)
Mutual labels:  data-science, pandas
Ibis
A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (-46.45%)
Mutual labels:  pandas, spark
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-95.5%)
Mutual labels:  data-science, big-data
Benchm Ml
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Stars: ✭ 1,835 (-39.72%)
Mutual labels:  data-science, spark
Spark.jl
Julia binding for Apache Spark
Stars: ✭ 153 (-94.97%)
Mutual labels:  spark, big-data
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+3.55%)
Mutual labels:  data-science, pandas
Learnpythonforresearch
This repository provides everything you need to get started with Python for (social science) research.
Stars: ✭ 163 (-94.65%)
Mutual labels:  data-science, pandas
Scalable Data Science Platform
Content for architecting a data science platform for products using Luigi, Spark & Flask.
Stars: ✭ 158 (-94.81%)
Mutual labels:  data-science, spark
Pandas Datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
Stars: ✭ 2,183 (-28.29%)
Mutual labels:  pandas, pydata
Panthera
Data-frames & arrays on Clojure
Stars: ✭ 168 (-94.48%)
Mutual labels:  dataframe, pandas
Handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-94.81%)
Mutual labels:  spark, pandas
Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (-94.51%)
Mutual labels:  spark, big-data
Tablesaw
Java dataframe and visualization library
Stars: ✭ 2,785 (-8.51%)
Mutual labels:  dataframe, data-science
Mydatascienceportfolio
Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (-92.54%)
Mutual labels:  data-science, spark
Mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Stars: ✭ 2,308 (-24.18%)
Mutual labels:  dataframe, pandas
Andrew Ng Notes
Andrew Ng's handwritten notes for his Coursera course.
Stars: ✭ 180 (-94.09%)
Mutual labels:  data-science, pandas
Dtale
Visualizer for pandas data structures
Stars: ✭ 2,864 (-5.91%)
Mutual labels:  data-science, pandas
Ditching Excel For Python
Functionalities in Excel translated to Python
Stars: ✭ 172 (-94.35%)
Mutual labels:  dataframe, pandas