optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+1025.83%)
datatile
A library for managing, validating, summarizing, and visualizing data.
Stars: ✭ 419 (+249.17%)
allie
🤖 A machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers).
Stars: ✭ 93 (-22.5%)
foofah
Foofah: programming-by-example data transformation program synthesizer
Stars: ✭ 24 (-80%)
reskit
A library for creating and curating reproducible pipelines for scientific and industrial machine learning
Stars: ✭ 27 (-77.5%)
covid-19-data-cleanup
Scripts to clean up data from https://github.com/CSSEGISandData/COVID-19
Stars: ✭ 25 (-79.17%)
xarray-beam
Distributed Xarray with Apache Beam
Stars: ✭ 83 (-30.83%)
geodaData
Data package for accessing GeoDa datasets using R
Stars: ✭ 15 (-87.5%)
big-data-exploration
[Archive] Intern project - Big Data Exploration using MongoDB - This Repository is NOT a supported MongoDB product
Stars: ✭ 43 (-64.17%)
fedora-prime
Simple program to switch between Intel and NVIDIA GPUs
Stars: ✭ 24 (-80%)
dagpi
Dagpi is a powerful and fast API that does image manipulation and serves datasets. It is written in Rust and Python, and suits Discord bots, social media apps, camera apps and more.
Stars: ✭ 25 (-79.17%)
DiscEval
Discourse Based Evaluation of Language Understanding
Stars: ✭ 18 (-85%)
humanflow2
Official repository of Learning Multi-Human Optical Flow (IJCV 2019)
Stars: ✭ 37 (-69.17%)
kaggledatasets
Collection of Kaggle Datasets ready to use for Everyone (Looking for contributors)
Stars: ✭ 44 (-63.33%)
qhub
🪴 Nebari - your open source data science platform
Stars: ✭ 175 (+45.83%)
Cleaner.jl
A toolbox of simple solutions for common data cleaning problems.
Stars: ✭ 21 (-82.5%)
industrial-ml-datasets
A curated list of publicly available datasets for machine learning research in the area of manufacturing
Stars: ✭ 45 (-62.5%)
FIFA-2019-Analysis
This project, based on the FIFA World Cup 2019, analyzes the performance and efficiency of teams, players, countries and related aspects using data analysis and data visualization.
Stars: ✭ 28 (-76.67%)
morghulis
No description or website provided.
Stars: ✭ 18 (-85%)
coiled-resources
Notebooks that support blog posts and tech talks on Dask / Coiled.
Stars: ✭ 33 (-72.5%)
rs datasets
Tool for automatically downloading recommendation systems datasets
Stars: ✭ 22 (-81.67%)
biomechanics dataset
Information on publicly available data sets for biomechanics.
Stars: ✭ 31 (-74.17%)
awesome-dynamic-graphs
A collection of resources on dynamic/streaming/temporal/evolving graph processing systems, databases, data structures, datasets, and related academic and industrial work
Stars: ✭ 89 (-25.83%)
dask-sql
Distributed SQL Engine in Python using Dask
Stars: ✭ 271 (+125.83%)
torchgeo
TorchGeo: datasets, samplers, transforms, and pre-trained models for geospatial data
Stars: ✭ 1,125 (+837.5%)
dh-core
Functional data science
Stars: ✭ 123 (+2.5%)
errorlocate
Find and replace erroneous fields in data using validation rules
Stars: ✭ 19 (-84.17%)
metadat
Meta-analytic datasets for R
Stars: ✭ 21 (-82.5%)
data-profiling
A set of scripts to pull metadata and data profiling metrics from relational database systems
Stars: ✭ 57 (-52.5%)
exemplary-ml-pipeline
Exemplary, annotated machine learning pipeline for any tabular data problem.
Stars: ✭ 23 (-80.83%)
delitos-caba
🚓 Crime dataset for the City of Buenos Aires, Argentina
Stars: ✭ 44 (-63.33%)
cifair
A duplicate-free variant of the CIFAR test set.
Stars: ✭ 13 (-89.17%)
prefect-saturn
Python client for using Prefect Cloud with Saturn Cloud
Stars: ✭ 15 (-87.5%)
git-rdm
A research data management plugin for the Git version control system.
Stars: ✭ 34 (-71.67%)
machine-learning-data-pipeline
Pipeline module for parallel real-time data processing, for developing machine learning models and running them in production.
Stars: ✭ 22 (-81.67%)
scrapeOP
A Python package for scraping oddsportal.com
Stars: ✭ 99 (-17.5%)
thermostat
Collection of NLP model explanations and accompanying analysis tools
Stars: ✭ 126 (+5%)
auctus
Dataset search engine, discovering data from a variety of sources, profiling it, and allowing advanced queries on the index
Stars: ✭ 34 (-71.67%)
Spatio-Temporal-papers
A collection of recent research in areas such as new infrastructure and urban computing, including white papers, academic papers, AI labs and datasets.
Stars: ✭ 180 (+50%)
daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiments.
Stars: ✭ 25 (-79.17%)
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-76.67%)
awesome-mobile-robotics
Useful links to content related to AI, Computer Vision, and Robotics.
Stars: ✭ 243 (+102.5%)
bugrepo
A collection of publicly available bug reports
Stars: ✭ 93 (-22.5%)
enmSdm
Faster, better, smarter ecological niche modeling and species distribution modeling
Stars: ✭ 39 (-67.5%)
mlx
Machine Learning eXchange (MLX). Data and AI Assets Catalog and Execution Engine
Stars: ✭ 132 (+10%)
NLP PEMDC
NLP Pretrained Embeddings, Models and Datasets Collections (NLP_PEMDC). The collection will keep updating.
Stars: ✭ 58 (-51.67%)