Repo 2019BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Stars: ✭ 133 (-19.88%)
Isolation ForestA Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-16.27%)
incubator-linkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+1381.33%)
spark-extensionA library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-84.94%)
HandysparkHandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (-4.82%)
Optimus🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+493.98%)
MmlsparkSimple and Distributed Machine Learning
Stars: ✭ 2,899 (+1646.39%)
GimelBig Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+30.12%)
Sparkling TitanicTraining models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-92.77%)
W2vWord2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-61.45%)
Spark With PythonFundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-9.64%)
ODSC India 2018My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-84.34%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+144.58%)
Spark Py NotebooksApache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+706.02%)
Spark NlpState of the Art Natural Language Processing
Stars: ✭ 2,518 (+1416.87%)
kafka-compose🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-80.72%)
OpenubaA robust, and flexible open source User & Entity Behavior Analytics (UEBA) framework used for Security Analytics. Developed with luv by Data Scientists & Security Analysts from the Cyber Security Industry. [PRE-ALPHA]
Stars: ✭ 127 (-23.49%)
Pyspark Example ProjectExample project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+281.33%)
ScriptisScriptis is for interactive data analysis with script development(SQL, Pyspark, HiveQL), task submission(Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+319.28%)
Pysparkgeoanalysis🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-62.05%)
Live log analyzer sparkSpark Application for analysis of Apache Access logs and detect anamolies! Along with Medium Article.
Stars: ✭ 14 (-91.57%)
Pyspark Cheatsheet🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (-34.94%)
Cc PysparkProcess Common Crawl data with Python and Spark
Stars: ✭ 147 (-11.45%)
HnswlibJava library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (-34.94%)
Spark PracticeApache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+20.48%)
basinBasin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser
Stars: ✭ 25 (-84.94%)
SparkmagicJupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+474.7%)
autThe Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-33.13%)
LinkisLinkis helps easily connect to various back-end computation/storage engines(Spark, Python, TiDB...), exposes various interfaces(REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+1299.4%)
DatacompyPandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-11.45%)
Technology Talk汇总java生态圈常用技术框架、开源中间件,系统架构、数据库、大公司架构案例、常用三方类库、项目管理、线上问题排查、个人成长、思考等知识
Stars: ✭ 12,136 (+7210.84%)
Vue Info CardSimple and beautiful card component with an elegant spark line, for VueJS.
Stars: ✭ 159 (-4.22%)
SparkmonitorMonitor Apache Spark from Jupyter Notebook
Stars: ✭ 154 (-7.23%)
Nd4jFast, Scientific and Numerical Computing for the JVM (NDArrays)
Stars: ✭ 1,742 (+949.4%)
StumpySTUMPY is a powerful and scalable Python library for modern time series analysis
Stars: ✭ 2,019 (+1116.27%)
QuillCompile-time Language Integrated Queries for Scala
Stars: ✭ 1,998 (+1103.61%)
Spark AuthorizerA Spark SQL extension which provides SQL Standard Authorization for Apache Spark
Stars: ✭ 141 (-15.06%)
RasterframesGeospatial Raster support for Spark DataFrames
Stars: ✭ 142 (-14.46%)
Big WhaleSpark、Flink等离线任务的调度以及实时任务的监控
Stars: ✭ 163 (-1.81%)
Deep SvddRepository for the Deep One-Class Classification ICML 2018 paper
Stars: ✭ 159 (-4.22%)
Spark.jlJulia binding for Apache Spark
Stars: ✭ 153 (-7.83%)
MatrixprofileA Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Stars: ✭ 141 (-15.06%)
PyoddsAn End-to-end Outlier Detection System
Stars: ✭ 141 (-15.06%)
Kitnet PyKitNET is a lightweight online anomaly detection algorithm, which uses an ensemble of autoencoders.
Stars: ✭ 152 (-8.43%)
Data science blogsA repository to keep track of all the code that I end up writing for my blog posts.
Stars: ✭ 139 (-16.27%)
Azure Event Hubs SparkEnabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-15.66%)
RemixautomlR package for automation of machine learning, forecasting, feature engineering, model evaluation, model interpretation, data generation, and recommenders.
Stars: ✭ 159 (-4.22%)
Deep Sad PytorchA PyTorch implementation of Deep SAD, a deep Semi-supervised Anomaly Detection method.
Stars: ✭ 152 (-8.43%)
Sparkling GraphSparklingGraph provides easy to use set of features that will give you ability to proces large scala graphs using Spark and GraphX.
Stars: ✭ 139 (-16.27%)