splink: Implementation of the Fellegi-Sunter canonical model of record linkage in Apache Spark, including the EM algorithm to estimate parameters
Stars: ✭ 181 (+235.19%)
Engezny: Engezny is a Python package that quickly generates all possible charts from your DataFrame and saves them for you. Currently, Engezny supports only uni-parameter visualizations using pie, bar, and barh charts.
Stars: ✭ 25 (-53.7%)
bigdata-fun: A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-74.07%)
visualize-data-with-python: A Jupyter notebook using some standard techniques for data science and data engineering to analyze data for the 2017 flooding in Houston, TX.
Stars: ✭ 60 (+11.11%)
web-dashboard-demo: This application contains the DevExpress Dashboard Component for Angular. The client side is hosted on GitHub Pages and gets data from the server side, which is hosted on DevExpress.com.
Stars: ✭ 65 (+20.37%)
tellery: Tellery lets you build metrics using SQL and bring them to your team. As easy as using a document. As powerful as a data modeling tool.
Stars: ✭ 219 (+305.56%)
pre-commit-dbt: 🎣 List of `pre-commit` hooks to ensure the quality of your `dbt` projects.
Stars: ✭ 149 (+175.93%)
openverse-catalog: Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (-50%)
machine-learning-capstone-project: This is the final project for the Udacity Machine Learning Nanodegree: predicting article retweets and likes from the title using machine learning.
Stars: ✭ 28 (-48.15%)
sentry-spark: Apache Spark Sentry integration
Stars: ✭ 14 (-74.07%)
dominance-analysis: This package can be used for dominance analysis or Shapley value regression to find the relative importance of predictors in a given dataset. The library can also be used for key driver analysis or marginal resource allocation models.
Stars: ✭ 111 (+105.56%)
alfred-packagist: Alfred workflow to search for PHP packages on Packagist
Stars: ✭ 21 (-61.11%)
tukio: Tukio is an event-based workflow generator library
Stars: ✭ 27 (-50%)
mune: Simple stock price analytics
Stars: ✭ 14 (-74.07%)
carry: Python ETL (Extract-Transform-Load) tool / data migration tool
Stars: ✭ 115 (+112.96%)
pandas-stubs: Pandas type stubs. Helps you type-check your code.
Stars: ✭ 84 (+55.56%)
ECG analysis: No description or website provided.
Stars: ✭ 32 (-40.74%)
pypar: Efficient and scalable parallelism using the Message Passing Interface (MPI) to handle big data and computationally intensive problems.
Stars: ✭ 66 (+22.22%)
kobe-every-shot-ever: A Los Angeles Times analysis of every shot in Kobe Bryant's NBA career
Stars: ✭ 66 (+22.22%)
support-tickets-classification: This case study shows how to create a model for text analysis and classification and deploy it as a web service in the Azure cloud to automatically classify support tickets. This project is a proof of concept made by Microsoft (Commercial Software Engineering team) in collaboration with Endava (http://endava.com/en).
Stars: ✭ 142 (+162.96%)
es pandas: Read, write, and update large-scale pandas DataFrames with Elasticsearch
Stars: ✭ 34 (-37.04%)
flock: A low-cost streaming query engine on FaaS platforms
Stars: ✭ 232 (+329.63%)
open-data-anonimizer: Python data anonymization and masking library for data science tasks
Stars: ✭ 36 (-33.33%)
dashinator: Dashinator, the daringly delightful dashboard. A replacement for Dashing.
Stars: ✭ 56 (+3.7%)
xstate-viz: Visualizer for XState machines
Stars: ✭ 274 (+407.41%)
CaseManagement: CMMN engine implementation in .NET Core
Stars: ✭ 16 (-70.37%)
anesthetic: Nested sampling post-processing and plotting
Stars: ✭ 34 (-37.04%)
parallel-corpora-tools: Tools for filtering and cleaning parallel and monolingual corpora for machine translation and other natural language processing tasks.
Stars: ✭ 35 (-35.19%)
Dominando-Pandas: This repository is dedicated to the process of learning the Pandas library.
Stars: ✭ 22 (-59.26%)
fer: Facial Expression Recognition
Stars: ✭ 32 (-40.74%)
gan tensorflow: Automatic feature engineering using Generative Adversarial Networks in TensorFlow.
Stars: ✭ 48 (-11.11%)
rec-core: Data pipelining service
Stars: ✭ 19 (-64.81%)
Python-Matematica: Exploring fundamental aspects of mathematics with Python and Jupyter
Stars: ✭ 41 (-24.07%)
Chatistics: A WhatsApp chat analyzer with statistics.
Stars: ✭ 32 (-40.74%)
tsioc: AOP, IoC container, Boot framework, unit testing framework, activities workflow framework.
Stars: ✭ 15 (-72.22%)
skutil: NOTE: skutil is now deprecated. See its sister project: https://github.com/tgsmith61591/skoot. Original description: A set of scikit-learn and h2o extension classes (as well as caret classes for Python). See more here: https://tgsmith61591.github.io/skutil
Stars: ✭ 29 (-46.3%)
Arch-Data-Science: Arch Linux PKGBUILDs for data science, machine learning, deep learning, NLP, and computer vision
Stars: ✭ 92 (+70.37%)
automile-php: Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 28 (-48.15%)
featurewiz: Use advanced feature engineering strategies and select the best features from your dataset with a single line of code.
Stars: ✭ 229 (+324.07%)
datart: Datart is a next-generation open data visualization platform
Stars: ✭ 1,042 (+1829.63%)
mindware: An efficient open-source AutoML system for automating the machine learning lifecycle, including feature engineering, neural architecture search, and hyperparameter tuning.
Stars: ✭ 34 (-37.04%)
Data-Analyst-Nanodegree: This repo consists of the projects I completed as part of Udacity's Data Analyst Nanodegree curriculum.
Stars: ✭ 13 (-75.93%)
pulserl: Apache Pulsar client library for Erlang/Elixir
Stars: ✭ 15 (-72.22%)
exemplary-ml-pipeline: Exemplary, annotated machine learning pipeline for any tabular data problem.
Stars: ✭ 23 (-57.41%)
spark-acid: ACID data source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+68.52%)
automile-net: Automile offers a simple, smart, cutting-edge telematics solution for businesses to track and manage their business vehicles.
Stars: ✭ 24 (-55.56%)
pantab: Read/write pandas DataFrames with Tableau Hyper extracts
Stars: ✭ 64 (+18.52%)
iSkyLIMS: An open-source LIMS (Laboratory Information Management System) for next-generation sequencing sample management, statistics and reports, and bioinformatics analysis service management.
Stars: ✭ 33 (-38.89%)
five-minute-midas: Predicting profitable day trading positions using decision tree classifiers. scikit-learn | Flask | SQLite3 | pandas | MLflow | Heroku | Streamlit
Stars: ✭ 41 (-24.07%)
stargate: An Apache Pulsar client written in Elixir
Stars: ✭ 33 (-38.89%)
release-notify-action: GitHub Action that sends e-mails with release notes when releases are created
Stars: ✭ 64 (+18.52%)
jekyll-deploy-action: 🪂 A GitHub Action to conveniently deploy Jekyll sites to GitHub Pages.
Stars: ✭ 162 (+200%)
ydata-quality: Data quality assessment with one line of code
Stars: ✭ 311 (+475.93%)
klar-EDA: A Python library for automated exploratory data analysis
Stars: ✭ 15 (-72.22%)