python mozetl
ETL jobs for Firefox Telemetry
Stars: ✭ 25 (-64.79%)
isarn-sketches-spark
Routines and data structures for using isarn-sketches idiomatically in Apache Spark
Stars: ✭ 28 (-60.56%)
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English
Stars: ✭ 115 (+61.97%)
aml-registermodel
GitHub Action that allows you to register models to your Azure Machine Learning Workspace.
Stars: ✭ 14 (-80.28%)
pyspark-ML-in-Colab
PySpark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-54.93%)
workshop-spark
Code for Spark workshops with a Docker-based development environment
Stars: ✭ 27 (-61.97%)
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+204.23%)
anovos
Anovos - An Open Source Library for Scalable feature engineering Using Apache-Spark
Stars: ✭ 77 (+8.45%)
aml-deploy
GitHub Action that allows you to deploy machine learning models in Azure Machine Learning.
Stars: ✭ 37 (-47.89%)
OSCI
Open Source Contributor Index
Stars: ✭ 107 (+50.7%)
kuwala
Kuwala is the no-code data platform for BI analysts and engineers enabling you to build powerful analytics workflows. We are set out to bring state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations together in one intuitive interface built with React Flow. In addition we provide third-party data into data sc…
Stars: ✭ 474 (+567.61%)
aml-keras-image-recognition
A sample Azure Machine Learning project for Transfer Learning-based custom image recognition by utilizing Keras.
Stars: ✭ 14 (-80.28%)
jgit-spark-connector
jgit-spark-connector is a library for running scalable data retrieval pipelines that process any number of Git repositories for source code analysis.
Stars: ✭ 71 (+0%)
az-ml-batch-score
Deploying a Batch Scoring Pipeline for Python Models
Stars: ✭ 17 (-76.06%)
spark3D
Spark extension for processing large-scale 3D data sets: Astrophysics, High Energy Physics, Meteorology, …
Stars: ✭ 23 (-67.61%)
ceja
PySpark phonetic and string matching algorithms
Stars: ✭ 24 (-66.2%)
Morphl Community Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Stars: ✭ 253 (+256.34%)
jobAnalytics and search
The JobAnalytics system consumes data from multiple sources and provides valuable information to both job hunters and recruiters.
Stars: ✭ 25 (-64.79%)
Spark Practice
Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+181.69%)
datalake-etl-pipeline
Simplified ETL process in Hadoop using Apache Spark. Has complete ETL pipeline for datalake. SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations
Stars: ✭ 39 (-45.07%)
SynapseML
Simple and Distributed Machine Learning
Stars: ✭ 3,355 (+4625.35%)
SeeingAI-Currency-Detection
This repository contains the code for the blogpost: How to Develop a Currency Detection Model using Azure Machine Learning
Stars: ✭ 39 (-45.07%)
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-33.8%)
jupyterlab-sparkmonitor
JupyterLab extension that enables monitoring launched Apache Spark jobs from within a notebook
Stars: ✭ 78 (+9.86%)
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+1.41%)
dlsa
Distributed least squares approximation (dlsa) implemented with Apache Spark
Stars: ✭ 25 (-64.79%)
soda-spark
Soda Spark is a PySpark library that helps you test your data in Spark DataFrames
Stars: ✭ 58 (-18.31%)
aml-workspace
GitHub Action that allows you to create or connect to your Azure Machine Learning Workspace.
Stars: ✭ 22 (-69.01%)
AI-on-Microsoft-Azure
Microsoft is building and creating the Polish Digital Valley. As part of this initiative, we have taken on the challenge of building cloud skills among 150,000 people in Poland. One element of this initiative is a dedicated course in the engineering and master's degree programs at the Warsaw University of Technology, devoted to cloud computing and artificial intelligence.
Stars: ✭ 11 (-84.51%)
pyspark-cheatsheet
PySpark Cheat Sheet - example code to help you learn PySpark and develop apps faster
Stars: ✭ 115 (+61.97%)
pyspark-cassandra
pyspark-cassandra is a Python port of the awesome @datastax Spark Cassandra connector. Compatible with Spark 2.0, 2.1, 2.2, 2.3 and 2.4
Stars: ✭ 70 (-1.41%)
gallery
BentoML Example Projects 🎨
Stars: ✭ 120 (+69.01%)
optimus
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Stars: ✭ 1,351 (+1802.82%)
sparklanes
A lightweight data processing framework for Apache Spark
Stars: ✭ 17 (-76.06%)
aml-compute
GitHub Action that allows you to attach, create and scale Azure Machine Learning compute resources.
Stars: ✭ 19 (-73.24%)
Quinn
PySpark methods to enhance developer productivity 📣 👯 🎉
Stars: ✭ 217 (+205.63%)
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+3983.1%)
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+3446.48%)
Sparkora
Powerful, rapid, automatic EDA and feature engineering library with a very easy-to-use API 🌟
Stars: ✭ 51 (-28.17%)
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-63.38%)
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-52.11%)
lineage
Generate beautiful documentation for your data pipelines in markdown format
Stars: ✭ 16 (-77.46%)
check-engine
Data validation library for PySpark 3.0.0
Stars: ✭ 29 (-59.15%)
oshinko-s2i
This is a place to put s2i images and utilities for Spark application builders for OpenShift
Stars: ✭ 16 (-77.46%)