Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (+25.28%)
Dataspherestudio
DataSphereStudio is a one-stop data application development and management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+88.78%)
Hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-82.94%)
Scriptis
Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+9.95%)
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-76.3%)
Cc Pyspark
Process Common Crawl data with Python and Spark
Stars: ✭ 147 (-76.78%)
Linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+266.98%)
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (-68.4%)
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+297.79%)
Every Single Day I Tldr
A daily digest of the articles or videos I've found interesting, that I want to share with you.
Stars: ✭ 249 (-60.66%)
Pixiedust
Python Helper library for Jupyter Notebooks
Stars: ✭ 998 (+57.66%)
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+36.49%)
etl manager
A Python package to create a database on the platform using our moj data warehousing framework
Stars: ✭ 14 (-97.79%)
Etl with python
ETL with Python - Taught at DWH course 2017 (TAU)
Stars: ✭ 68 (-89.26%)
Benthos
Fancy stream processing made operationally mundane
Stars: ✭ 3,705 (+485.31%)
Sk Dist
Distributed scikit-learn meta-estimators in PySpark
Stars: ✭ 260 (-58.93%)
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+386.73%)
Dataform
Dataform is a framework for managing SQL based data operations in BigQuery, Snowflake, and Redshift
Stars: ✭ 342 (-45.97%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+138.7%)
Python Bigdata
Data science and Big Data with Python
Stars: ✭ 112 (-82.31%)
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-78.36%)
Pipelinex
PipelineX: Python package to build ML pipelines for experimentation with Kedro, MLflow, and more
Stars: ✭ 127 (-79.94%)
Wedatasphere
WeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (-41.23%)
Metorikku
A simplified, lightweight ETL Framework based on Apache Spark
Stars: ✭ 361 (-42.97%)
Datacleaner
The premier open source Data Quality solution
Stars: ✭ 391 (-38.23%)
Elastic
R client for the Elasticsearch HTTP API
Stars: ✭ 227 (-64.14%)
Gspread Pandas
A package to easily open an instance of a Google spreadsheet and interact with worksheets through Pandas DataFrames.
Stars: ✭ 226 (-64.3%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+380.88%)
Cql
Categorical Query Language IDE
Stars: ✭ 196 (-69.04%)
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (-90.84%)
etl
[READ-ONLY] PHP - ETL (Extract Transform Load) data processing library
Stars: ✭ 279 (-55.92%)
polygon-etl
ETL (extract, transform and load) tools for ingesting Polygon blockchain data to Google BigQuery and Pub/Sub
Stars: ✭ 53 (-91.63%)
Soda Sql
Metric collection, data testing and monitoring for SQL-accessible data
Stars: ✭ 173 (-72.67%)
versatile-data-kit
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.
Stars: ✭ 144 (-77.25%)
hamilton
A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, NumPy matrices, Python objects, ML models, etc.
Stars: ✭ 612 (-3.32%)
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-97.31%)
python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-96.05%)
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-95.89%)
kafka-compose
🎼 Docker Compose files for various Kafka stacks
Stars: ✭ 32 (-94.94%)
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-96.05%)
beneath
Beneath is a serverless real-time data platform ⚡️
Stars: ✭ 65 (-89.73%)
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-34.76%)
Datavec
ETL Library for Machine Learning - data pipelines, data munging and wrangling
Stars: ✭ 272 (-57.03%)
Dagster
An orchestration platform for the development, production, and observation of data assets.
Stars: ✭ 4,099 (+547.55%)
Auptimizer
An automatic ML model optimization tool.
Stars: ✭ 166 (-73.78%)
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+288.47%)
Learn Something Every Day
📝 A compilation of everything that I learn; Computer Science, Software Development, Engineering, Math, and Coding in General.
Stars: ✭ 362 (-42.81%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-35.86%)
Heamy
A set of useful tools for competitive data science.
Stars: ✭ 511 (-19.27%)
Pygam
[HELP REQUESTED] Generalized Additive Models in Python
Stars: ✭ 569 (-10.11%)
Cdap
An open source framework for building data analytic applications.
Stars: ✭ 509 (-19.59%)