machine-learning-data-pipeline
Pipeline module for parallel real-time data processing, for machine learning model development and production.
Stars: ✭ 22 (-59.26%)
Sparta
Real Time Analytics and Data Pipelines based on Spark Streaming
Stars: ✭ 513 (+850%)
Pandera
A light-weight, flexible, and expressive pandas data validation library
Stars: ✭ 506 (+837.04%)
Repository
A personal study knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+70.37%)
Cape Python
Collaborate on privacy-preserving policy for data science projects in Pandas and Apache Spark
Stars: ✭ 125 (+131.48%)
foofah
Foofah: programming-by-example data transformation program synthesizer
Stars: ✭ 24 (-55.56%)
Cboard
An easy to use, self-service open BI reporting and BI dashboard platform.
Stars: ✭ 2,795 (+5075.93%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+40729.63%)
Optimus
🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+1725.93%)
Expand
DevExpress XAF extension framework. linkedin.expandframework.com, youtube.expandframework.com and twitter @expandframework, or simply Star/watch this repository and get notified from GitHub
Stars: ✭ 158 (+192.59%)
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+66.67%)
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.
Stars: ✭ 612 (+1033.33%)
Data-Science-101
Notes and tutorials on how to use python, pandas, seaborn, numpy, matplotlib, scipy for data science.
Stars: ✭ 19 (-64.81%)
Mining
Business Intelligence (BI) in Python, OLAP
Stars: ✭ 1,128 (+1988.89%)
OLAP-cube
An OLAP cube is a hypercube of data
Stars: ✭ 23 (-57.41%)
Zat
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (+461.11%)
Guitar
A Simple and Efficient Distributed Multidimensional BI Analysis Engine.
Stars: ✭ 86 (+59.26%)
Data Forge Ts
The JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 967 (+1690.74%)
Luigi Warehouse
A luigi powered analytics / warehouse stack
Stars: ✭ 72 (+33.33%)
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+2112.96%)
whyqd
data wrangling simplicity, complete audit transparency, and at speed
Stars: ✭ 16 (-70.37%)
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+2401.85%)
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-68.52%)
Market-Mix-Modeling
Market Mix Modelling for an eCommerce firm to estimate the impact of various marketing levers on sales
Stars: ✭ 31 (-42.59%)
spark-druid-olap
Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (+429.63%)
Transmogrifai
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+3759.26%)
Handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+192.59%)
Machine Learning Workflow With Python
A comprehensive set of ML techniques with Python: define the problem, specify inputs and outputs, data collection, exploratory data analysis, data preprocessing, model design, training, evaluation
Stars: ✭ 157 (+190.74%)
xplore
A Python package built for data scientists/analysts and AI/ML engineers for exploring the features of a dataset in a minimal number of lines of code, for quick analysis before data wrangling and feature extraction.
Stars: ✭ 21 (-61.11%)
veridical-flow
Making it easier to build stable, trustworthy data-science pipelines.
Stars: ✭ 28 (-48.15%)
Retentioneering Tools
Retentioneering: product analytics, data-driven customer journey map optimization, marketing analytics, web analytics, transaction analytics, graph visualization, and behavioral segmentation with customer segments in Python. Opensource analytics, predictive analytics over clickstream, sentiment analysis, AB tests, machine learning, and Monte Carlo Markov Chain simulations, extending Pandas, Networkx and sklearn.
Stars: ✭ 291 (+438.89%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+5537.04%)
Spark Druid Olap
Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 282 (+422.22%)
Ibis
A pandas-like deferred expression system, with first-class SQL support
Stars: ✭ 1,630 (+2918.52%)
Redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+37209.26%)
Data Forge Js
JavaScript data transformation and analysis toolkit inspired by Pandas and LINQ.
Stars: ✭ 139 (+157.41%)
visions
Type System for Data Analysis in Python
Stars: ✭ 136 (+151.85%)
Pulsar Spark
When Apache Pulsar meets Apache Spark
Stars: ✭ 55 (+1.85%)
SumStatsRehab
GWAS summary statistics files QC tool
Stars: ✭ 19 (-64.81%)
Datacompy
Pandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (+172.22%)
pandas-workshop
An introductory workshop on pandas with notebooks and exercises for following along.
Stars: ✭ 161 (+198.15%)
blog
blog entries
Stars: ✭ 39 (-27.78%)
mimir
Data-ish exploration through SQL+Uncertainty
Stars: ✭ 26 (-51.85%)
my curd
An ultra-lightweight rapid-development scaffold and workflow platform.
Stars: ✭ 38 (-29.63%)
SparkV
🤖⚡ | The most POWERFUL multipurpose chat/meme bot that will boost the activity in your server.
Stars: ✭ 24 (-55.56%)
ibis
IBIS is a workflow creation-engine that abstracts the Hadoop internals of ingesting RDBMS data.
Stars: ✭ 48 (-11.11%)
zen-do-r
A book about programming for non-programmers.
Stars: ✭ 24 (-55.56%)
baleen3
Baleen 3 is a data processing tool based on the Annot8 framework
Stars: ✭ 15 (-72.22%)
action-sync-node-meta
GitHub Action that syncs package.json with the repository metadata.
Stars: ✭ 25 (-53.7%)
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-74.07%)
tukio
Tukio is an event based workflow generator library
Stars: ✭ 27 (-50%)
Covid19Tracker
A Robinhood style COVID-19 🦠 Android tracking app for the US. Open source and built with Kotlin.
Stars: ✭ 65 (+20.37%)
Tesseract
A set of libraries for rapidly developing Pipeline driven micro/macroservices.
Stars: ✭ 20 (-62.96%)