
1,942 open-source projects that are alternatives to or similar to Koalas

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (-95.01%)

Mutual labels: dataframe, data-science, spark, big-data

Data science Python notebooks: deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command-line tools.

Stars: ✭ 22,048 (+624.31%)

Mutual labels: data-science, spark, pandas, big-data

Danfojs

danfo.js is an open-source JavaScript library providing high-performance, intuitive, and easy-to-use data structures for manipulating and processing structured data.

Stars: ✭ 1,304 (-57.16%)

Mutual labels: dataframe, data-science, pandas

H2o 3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Stars: ✭ 5,656 (+85.81%)

Mutual labels: data-science, spark, big-data

Datacompy

Pandas and Spark DataFrame comparison for humans

Stars: ✭ 147 (-95.17%)

Mutual labels: data-science, spark, pandas

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-96.35%)

Mutual labels: big-data, spark, dataframe

Dataframe

C++ DataFrame for statistical, financial, and ML analysis, written in modern C++ using native types and contiguous memory storage, with no pointers involved

Stars: ✭ 828 (-72.8%)

Mutual labels: dataframe, data-science, pandas

Spark With Python

Fundamentals of Spark with Python (using PySpark), with code examples

Stars: ✭ 150 (-95.07%)

Mutual labels: dataframe, spark, big-data

Datasheets

Read data from, write data to, and modify the formatting of Google Sheets

Stars: ✭ 593 (-80.52%)

Mutual labels: dataframe, data-science, pandas

Boltzmannclean

Fill missing values in Pandas DataFrames using Restricted Boltzmann Machines

Stars: ✭ 23 (-99.24%)

Mutual labels: dataframe, data-science, pandas

Dataframe Go

DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration

Stars: ✭ 487 (-84%)

Mutual labels: dataframe, data-science, pandas

Pandasvault

Advanced Pandas Vault — Utilities, Functions and Snippets (by @firmai).

Stars: ✭ 316 (-89.62%)

Mutual labels: dataframe, data-science, pandas

Pyjanitor

Clean APIs for data cleaning. A Python implementation of the R package janitor

Stars: ✭ 647 (-78.75%)

Mutual labels: dataframe, pandas, pydata

Foxcross

AsyncIO serving for data science models

Stars: ✭ 18 (-99.41%)

Mutual labels: dataframe, data-science, pandas

pyjanitor

Clean APIs for data cleaning. A Python implementation of the R package janitor

Stars: ✭ 970 (-68.13%)

Mutual labels: pydata, pandas, dataframe

Eland

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Stars: ✭ 235 (-92.28%)

Mutual labels: dataframe, pandas, big-data

Cape Python

Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark

Stars: ✭ 125 (-95.89%)

Mutual labels: data-science, spark, pandas

Rsparkling

RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)

Stars: ✭ 65 (-97.86%)

Mutual labels: data-science, spark, big-data

Pdpipe

Easy pipelines for pandas DataFrames.

Stars: ✭ 590 (-80.62%)

Mutual labels: dataframe, data-science, pandas

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-97.4%)

Mutual labels: data-science, spark, big-data

Spark Py Notebooks

Apache Spark & Python (PySpark) tutorials for big data analysis and machine learning as IPython/Jupyter notebooks

Stars: ✭ 1,338 (-56.04%)

Mutual labels: data-science, spark, big-data

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to big data streaming. It offers a rich, easy-to-use experience for creating, editing, and managing Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (-91.89%)

Mutual labels: spark, big-data

Data Science For Marketing Analytics

Achieve your marketing goals with the data analytics power of Python

Stars: ✭ 127 (-95.83%)

Mutual labels: data-science, pandas

Hyperspace

An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.

Stars: ✭ 246 (-91.92%)

Mutual labels: spark, big-data

Feast

Feature Store for Machine Learning

Stars: ✭ 2,576 (-15.37%)

Mutual labels: spark, big-data

Pandahouse

Pandas interface for Clickhouse database

Stars: ✭ 126 (-95.86%)

Mutual labels: dataframe, pandas

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (-46.06%)

Mutual labels: spark, big-data

Datacamp Python Data Science Track

All the slides, accompanying code, and exercises, stored in this repo. 🎈

Stars: ✭ 250 (-91.79%)

Mutual labels: data-science, pandas

Aws Data Wrangler

Pandas on AWS: easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatch Logs, DynamoDB, EMR, Secrets Manager, PostgreSQL, MySQL, SQL Server, and S3 (Parquet, CSV, JSON, and Excel).

Stars: ✭ 2,385 (-21.65%)

Mutual labels: data-science, pandas

Rightmove webscraper.py

Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object

Stars: ✭ 125 (-95.89%)

Mutual labels: data-science, pandas

Spark Alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

Stars: ✭ 122 (-95.99%)

Mutual labels: data-science, spark

Griffon Vm

Griffon Data Science Virtual Machine

Stars: ✭ 128 (-95.8%)

Mutual labels: data-science, big-data

Dtale Desktop

Build a data visualization dashboard with simple snippets of Python code

Stars: ✭ 128 (-95.8%)

Mutual labels: data-science, pandas

Pandasschema

A validation library for Pandas data frames using user-friendly schemas

Stars: ✭ 135 (-95.57%)

Mutual labels: data-science, pandas

Pandas Videos

Jupyter notebook and datasets from the pandas Q&A video series

Stars: ✭ 1,716 (-43.63%)

Mutual labels: data-science, pandas

Python Cheat Sheet

Python cheat sheet: NumPy, Matplotlib

Stars: ✭ 1,739 (-42.87%)

Mutual labels: data-science, pandas

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-95.5%)

Mutual labels: spark, big-data

Sparkling Graph

SparklingGraph provides an easy-to-use set of features that gives you the ability to process large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (-95.43%)

Mutual labels: spark, big-data

Stumpy

STUMPY is a powerful and scalable Python library for modern time series analysis

Stars: ✭ 2,019 (-33.67%)

Mutual labels: data-science, pydata

Deepgraph

Analyze Data with Pandas-based Networks. Documentation:

Stars: ✭ 232 (-92.38%)

Mutual labels: data-science, pandas

Machine Learning With Python

Practice and tutorial-style notebooks covering a wide variety of machine learning techniques

Stars: ✭ 2,197 (-27.83%)

Mutual labels: data-science, pandas

Datasciencevm

Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)

Stars: ✭ 153 (-94.97%)

Mutual labels: data-science, big-data

Py Quantmod

Powerful financial charting library based on R's Quantmod | http://py-quantmod.readthedocs.io/en/latest/

Stars: ✭ 155 (-94.91%)

Mutual labels: data-science, pandas

Ibis

A pandas-like deferred expression system, with first-class SQL support

Stars: ✭ 1,630 (-46.45%)

Mutual labels: pandas, spark

Accelerator

The Accelerator is a tool for fast and reproducible processing of large amounts of data.

Stars: ✭ 137 (-95.5%)

Mutual labels: data-science, big-data

Benchm Ml

A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).

Stars: ✭ 1,835 (-39.72%)

Mutual labels: data-science, spark

Spark.jl

Julia binding for Apache Spark

Stars: ✭ 153 (-94.97%)

Mutual labels: spark, big-data

Orange3

🍊 📊 💡 Orange: Interactive data analysis

Stars: ✭ 3,152 (+3.55%)

Mutual labels: data-science, pandas

Learnpythonforresearch

This repository provides everything you need to get started with Python for (social science) research.