Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+2050.93%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-76.85%)
incubator-linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+2176.85%)
ODSC India 2018: My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-75.93%)
spark-extension: A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-76.85%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+38.89%)
Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+812.96%)
Cc Pyspark: Process Common Crawl data with Python and Spark
Stars: ✭ 147 (+36.11%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+2231.48%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+2584.26%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+486.11%)
Pyspark Cheatsheet: 🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (+0%)
Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+1138.89%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+783.33%)
Repository: A personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (-14.81%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+100%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+2.78%)
Sparkling Titanic: Training models with Apache Spark, PySpark for Titanic Kaggle competition
Stars: ✭ 12 (-88.89%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-87.04%)
W2v: Word2Vec models with Twitter data using Spark. Blog:
Stars: ✭ 64 (-40.74%)
Handyspark: HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+46.3%)
Scriptis: Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+544.44%)
Spark Practice: Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+85.19%)
kafka-compose: 🎼 Docker compose files for various kafka stacks
Stars: ✭ 32 (-70.37%)
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+275.93%)
Pysparkgeoanalysis: 🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (-41.67%)
Earcut: The fastest and smallest JavaScript polygon triangulation library for your WebGL apps
Stars: ✭ 1,359 (+1158.33%)
Advisor: Open-source implementation of Google Vizier for hyperparameter tuning
Stars: ✭ 1,359 (+1158.33%)
Fracture: Generative algorithm
Stars: ✭ 99 (-8.33%)
Go Algorithms: Algorithms and data structures for golang
Stars: ✭ 1,529 (+1315.74%)
Quadsort: Quadsort is a stable adaptive merge sort which is faster than quicksort.
Stars: ✭ 1,385 (+1182.41%)
Algorithms: Algorithms and data structures implemented in JavaScript, with explanations and pointers for further reading
Stars: ✭ 99 (-8.33%)
Almond: A Scala kernel for Jupyter
Stars: ✭ 1,354 (+1153.7%)
Fast methods: N-Dimensional Fast Methods: Fast Marching, Fast Sweeping, Group Marching, Fast Iterative, etc.
Stars: ✭ 102 (-5.56%)
Pyspark Stubs: Apache (Py)Spark type annotations (stub files).
Stars: ✭ 98 (-9.26%)
Deep Reinforcement Learning With Pytorch: PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and more.
Stars: ✭ 1,345 (+1145.37%)
Any Angle Pathfinding: A collection of algorithms used for any-angle pathfinding with visualisations.
Stars: ✭ 107 (-0.93%)
Codelibrary: 💎 Collection of algorithms and data structures
Stars: ✭ 1,585 (+1367.59%)
Mystl: A simplified STL implemented in C++11
Stars: ✭ 97 (-10.19%)
Logisland: Scalable stream processing platform for advanced real-time analytics on top of Kafka and Spark. LogIsland also supports MQTT and Kafka Streams (Flink is on the roadmap). The platform does complex event processing and is suitable for time series analysis. A large set of valuable ready-to-use processors, data sources and sinks are available.
Stars: ✭ 97 (-10.19%)
Schemer: Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-10.19%)
Delaunator: An incredibly fast JavaScript library for Delaunay triangulation of 2D points
Stars: ✭ 1,641 (+1419.44%)
Boswatch: Python script to process input data from rtl_fm and multimon-NG, with support for multiple plugins
Stars: ✭ 101 (-6.48%)
Scalacaster: Purely Functional Algorithms and Data Structures in Scala
Stars: ✭ 1,342 (+1142.59%)