Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (-61.07%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (-61.73%)
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-96.8%)
Dblink
Distributed Bayesian Entity Resolution in Apache Spark
Stars: ✭ 38 (-96.42%)
DataEngineering
This repo contains commands that data engineers use in day-to-day work.
Stars: ✭ 47 (-95.57%)
Mobius
C# and F# language binding and extensions to Apache Spark
Stars: ✭ 929 (-12.44%)
spark-transformers
Spark-Transformers: Library for exporting Apache Spark MLlib models to use them in any Java application with no other dependencies.
Stars: ✭ 39 (-96.32%)
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (-68.71%)
pyspark-ML-in-Colab
PySpark in Google Colab: A simple machine learning (Linear Regression) model
Stars: ✭ 32 (-96.98%)
Coolplayspark
Coolplay Spark: Spark source code analysis, Spark libraries, and more
Stars: ✭ 3,318 (+212.72%)
Mist
Serverless proxy for Spark cluster
Stars: ✭ 309 (-70.88%)
spark
Apache Spark enhanced with native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end is now on https://github.com/apache/spark/
Stars: ✭ 609 (-42.6%)
Morpheus
Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.
Stars: ✭ 303 (-71.44%)
oshinko-s2i
This is a place to put s2i images and utilities for Spark application builders for OpenShift
Stars: ✭ 16 (-98.49%)
Goodreads etl pipeline
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
Stars: ✭ 793 (-25.26%)
SparkTwitterAnalysis
An Apache Spark standalone application using the Spark API in Scala. The application uses Simple Build Tool (SBT) for building the project.
Stars: ✭ 29 (-97.27%)
Sparkflow
Easy-to-use library to bring TensorFlow to Apache Spark
Stars: ✭ 282 (-73.42%)
Spark Sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
Stars: ✭ 1,055 (-0.57%)
Datahacksummit 2017
Apache Zeppelin notebooks for Recommendation Engines using Keras and Machine Learning on Apache Spark
Stars: ✭ 30 (-97.17%)
Dist Keras
Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
Stars: ✭ 613 (-42.22%)
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Stars: ✭ 25 (-97.64%)
gan deeplearning4j
Automatic feature engineering with Generative Adversarial Networks, using Deeplearning4j and Apache Spark.
Stars: ✭ 19 (-98.21%)
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Size of discovered phrases can be arbitrary. Can be used in languages other than English.
Stars: ✭ 115 (-89.16%)
Tdigest
t-Digest data structure in Python. Useful for percentiles and quantiles, including distributed environments like PySpark
Stars: ✭ 274 (-74.18%)
cloud-integration
Spark cloud integration: tests, cloud committers and more
Stars: ✭ 20 (-98.11%)
Kafka Storm Starter
Code examples that show how to integrate Apache Kafka 0.8+ with Apache Storm 0.9+ and Apache Spark Streaming 1.1+, while using Apache Avro as the data serialization format.
Stars: ✭ 728 (-31.39%)
spark-records
Bulletproof Apache Spark jobs with fast root cause analysis of failures.
Stars: ✭ 67 (-93.69%)
Spark Flamegraph
Easy CPU Profiling for Apache Spark applications
Stars: ✭ 30 (-97.17%)
BigCLAM-ApacheSpark
Overlapping community detection in Large-Scale Networks using the BigCLAM model, built on Apache Spark
Stars: ✭ 40 (-96.23%)
basin
Basin is a visual programming editor for building Spark and PySpark pipelines. Easily build, debug, and deploy complex ETL pipelines from your browser.
Stars: ✭ 25 (-97.64%)
Pyspark Example Project
Example project implementing best practices for PySpark ETL jobs and applications.
Stars: ✭ 633 (-40.34%)
Spark Tda
SparkTDA is a package for Apache Spark providing Topological Data Analysis functionality.
Stars: ✭ 45 (-95.76%)
anovos
Anovos - An Open Source Library for Scalable Feature Engineering Using Apache Spark
Stars: ✭ 77 (-92.74%)
Flintrock
A command-line tool for launching Apache Spark clusters.
Stars: ✭ 568 (-46.47%)
OSCI
Open Source Contributor Index
Stars: ✭ 107 (-89.92%)
pyspark-algorithms
PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (-93.21%)
geospark
Bring sf to Spark in production
Stars: ✭ 53 (-95%)
spark-streaming-visualize
Simple demonstration of how to build a complex real-time machine learning visualization tool.
Stars: ✭ 16 (-98.49%)
SANSA-Stack
Big Data RDF Processing and Analytics Stack built on Apache Spark and Apache Jena: http://sansa-stack.github.io/SANSA-Stack/
Stars: ✭ 130 (-87.75%)
Sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
Stars: ✭ 954 (-10.08%)
incubator-linkis
Linkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...) and exposes various interfaces (REST, JDBC, Java ...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,459 (+131.76%)
osm-parquetizer
A converter from OSM PBF files to Parquet files
Stars: ✭ 71 (-93.31%)
sparklygraphs
Old repo for the R interface for GraphFrames
Stars: ✭ 13 (-98.77%)
Openscoring
REST web service for true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
Stars: ✭ 536 (-49.48%)