W2v: Word2Vec models with Twitter data using Spark.
Stars: ✭ 64 (+100%)
Devops Python Tools: 80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+1168.75%)
Optimus: 🚚 Agile Data Preparation Workflows made easy with dask, cudf, dask_cudf and pyspark
Stars: ✭ 986 (+2981.25%)
Pysparkgeoanalysis: 🌐 Interactive Workshop on GeoAnalysis using PySpark
Stars: ✭ 63 (+96.88%)
Spark Practice: Apache Spark (PySpark) Practice on Real Data
Stars: ✭ 200 (+525%)
aut: The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (+246.88%)
Kafka Storm Starter: Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (+2175%)
Linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+7159.38%)
Spark Py Notebooks: Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+4081.25%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+575%)
incubator-linkis: Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+7584.38%)
spark-extension: A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-21.87%)
ODSC India 2018: My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (-18.75%)
Pyspark Example Project: Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (+1878.13%)
Sparkmagic: Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (+2881.25%)
confluent-spark-avro: Spark UDFs to deserialize Avro messages with schemas stored in Schema Registry.
Stars: ✭ 18 (-43.75%)
Pyspark Cheatsheet: 🐍 Quick reference guide to common patterns & functions in PySpark.
Stars: ✭ 108 (+237.5%)
Handyspark: HandySpark - bringing pandas-like capabilities to Spark dataframes
Stars: ✭ 158 (+393.75%)
Cc Pyspark: Process Common Crawl data with Python and Spark
Stars: ✭ 147 (+359.38%)
Spark With Python: Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+368.75%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+8959.38%)
Mongo Kafka: MongoDB Kafka Connector
Stars: ✭ 166 (+418.75%)
Kafka Connect Mongodb: **Unofficial / Community** Kafka Connect MongoDB Sink Connector - Find the official MongoDB Kafka Connector here: https://www.mongodb.com/kafka-connector
Stars: ✭ 137 (+328.13%)
Iceberg: Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+1128.13%)
kafka-scala-examples: Examples of Avro, Kafka, Schema Registry, Kafka Streams, Interactive Queries, KSQL, Kafka Connect in Scala
Stars: ✭ 53 (+65.63%)
Scriptis: Scriptis is for interactive data analysis with script development (SQL, Pyspark, HiveQL), task submission (Spark, Hive), UDF, function, resource management and intelligent diagnosis.
Stars: ✭ 696 (+2075%)
basin: Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-21.87%)
Live log analyzer spark: Spark application for analyzing Apache access logs and detecting anomalies, with an accompanying Medium article.
Stars: ✭ 14 (-56.25%)
Sparkling Titanic: Training models with Apache Spark (PySpark) for the Titanic Kaggle competition
Stars: ✭ 12 (-62.5%)
Rumble: ⛈️ Rumble 1.11.0 "Banyan Tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (+81.25%)
Abris: Avro SerDe for Apache Spark structured APIs.
Stars: ✭ 130 (+306.25%)
Hnswlib: Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Stars: ✭ 108 (+237.5%)
Schemer: Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (+203.13%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+7768.75%)
Kafka Connect Twitter: Kafka Connect connector to stream data in real time from Twitter.
Stars: ✭ 94 (+193.75%)
avrora: A convenient Elixir library to work with Avro schemas and Confluent® Schema Registry
Stars: ✭ 59 (+84.38%)
shut-up-bird: 🐦 Put your tweets/likes in an EPUB and delete them like a boss
Stars: ✭ 22 (-31.25%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+196.88%)
spark-druid-olap: Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (+793.75%)
spark-acid: ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (+184.38%)
cakephp-social-share: CakePHP link generator for sharing content on social networks
Stars: ✭ 30 (-6.25%)
Insulator: A client UI to inspect Kafka topics, consume, produce and much more
Stars: ✭ 53 (+65.63%)
crabber: A Twitter clone written in Python + Flask with extended features and a focus on inclusivity.
Stars: ✭ 42 (+31.25%)
bird-elephant: PHP client library for Twitter API v2 endpoints.
Stars: ✭ 28 (-12.5%)
sentry-spark: Apache Spark Sentry Integration
Stars: ✭ 14 (-56.25%)