Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+2036.84%)
Mutual labels: pyspark, parquet
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+4921.05%)
Mutual labels: pandas-dataframe, pyspark
Petastorm
The Petastorm library enables single-machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet format. It supports ML frameworks such as TensorFlow, PyTorch, and PySpark and can be used from pure Python code.
Stars: ✭ 1,108 (+5731.58%)
Mutual labels: pyspark, parquet
Skale
High performance distributed data processing engine
Stars: ✭ 390 (+1952.63%)
Mutual labels: parquet, azure-storage
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+278.95%)
Mutual labels: pyspark, graphframes
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (+78.95%)
Mutual labels: pyspark, spark-sql
data-analysis-using-python
Data Analysis Using Python: A Beginner’s Guide Featuring NYC Open Data
Stars: ✭ 81 (+326.32%)
Mutual labels: pandas-dataframe, matplotlib
CosmicClone
Cosmic Clone is a utility that can back up, clone, and restore an Azure Cosmos database collection. It can also anonymize Cosmos documents to help hide personally identifiable data.
Stars: ✭ 113 (+494.74%)
Mutual labels: azure-storage, cosmos-db
Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that let data workers efficiently execute streaming, machine learning, or SQL workloads requiring fast iterative access to datasets. This project contains sample Spark programs written in Scala.
Stars: ✭ 55 (+189.47%)
Mutual labels: parquet, spark-sql
Azure-Certification-DP-200
Road to Azure Data Engineer Part-I: DP-200 - Implementing an Azure Data Solution
Stars: ✭ 54 (+184.21%)
Mutual labels: azure-storage, azure-databricks
Goofys
A high-performance, POSIX-ish Amazon S3 file system written in Go
Stars: ✭ 3,932 (+20594.74%)
Mutual labels: azure-storage, azure-data-lake
Azure-Databricks-NYC-Taxi-Workshop
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Stars: ✭ 71 (+273.68%)
Mutual labels: pyspark, azure-databricks
AzureStor
R interface to Azure storage accounts
Stars: ✭ 51 (+168.42%)
Mutual labels: azure-storage, azure-data-lake
Py
Repository of sample Python programs for learning Python
Stars: ✭ 4,154 (+21763.16%)
Mutual labels: pandas-dataframe, jupyter-notebooks
albis
Albis: High-Performance File Format for Big Data Systems
Stars: ✭ 20 (+5.26%)
Mutual labels: parquet, spark-sql
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Provides a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Stars: ✭ 39 (+105.26%)
Mutual labels: pyspark, spark-sql
SimpleSQLite
SimpleSQLite is a Python library that simplifies SQLite database operations: table creation, data insertion, and retrieving data in other data formats. It also offers simple ORM functionality for SQLite.
Stars: ✭ 116 (+510.53%)
Mutual labels: pandas-dataframe
spark-vcf
Spark VCF data source implementation for DataFrames
Stars: ✭ 15 (-21.05%)
Mutual labels: spark-sql