MleapMLeap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+573.22%)
Agile data code 2Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+125.68%)
AztkAZTK powered by Azure Batch: On-demand, Dockerized, Spark Jobs on Azure
Stars: ✭ 152 (-16.94%)
MarmarayGeneric Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+126.23%)
Spark GbtlrHybrid model of Gradient Boosting Trees and Logistic Regression (GBDT+LR) on Spark
Stars: ✭ 81 (-55.74%)
Devops Python Tools80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+121.86%)
DigitrecognizerJava Convolutional Neural Network example for Hand Writing Digit Recognition
Stars: ✭ 23 (-87.43%)
Uproot4ROOT I/O in pure Python and NumPy.
Stars: ✭ 80 (-56.28%)
RedashMake Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Stars: ✭ 20,147 (+10909.29%)
XsqlUnified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (-3.83%)
BigdlBuilding Large-Scale AI Applications for Distributed Big Data
Stars: ✭ 3,813 (+1983.61%)
Docker Spark🚢 Docker image for Apache Spark
Stars: ✭ 78 (-57.38%)
WedatasphereWeDataSphere is a financial-grade, one-stop open-source suite for big data platforms. The source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+103.28%)
Scala SamplesPieces of Scala code that explain Scala syntax and related topics, such as what you can do with it
Stars: ✭ 125 (-31.69%)
AthenacliAthenaCLI is a CLI tool for AWS Athena service that can do auto-completion and syntax highlighting.
Stars: ✭ 151 (-17.49%)
KyloKylo is a data lake management software platform and framework for enabling scalable enterprise-class data lakes on big data technologies such as Teradata, Apache Spark and/or Hadoop. Kylo is licensed under Apache 2.0. Contributed by Teradata Inc.
Stars: ✭ 916 (+400.55%)
SparklerSpark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+97.81%)
Ds CheatsheetsList of Data Science Cheatsheets to rule the world
Stars: ✭ 9,452 (+5065.03%)
SparkstreamingReal-time log analysis and statistics with Spark Streaming + Flume + Kafka + HBase + Hadoop + Zookeeper; data visualization with SpringBoot + Echarts
Stars: ✭ 349 (+90.71%)
Spark AlchemyCollection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
Stars: ✭ 122 (-33.33%)
OapOptimized Analytics Package for Spark* Platform
Stars: ✭ 343 (+87.43%)
JavainterviewThe most complete collection of Java technical knowledge points, plus Java source code analysis. A contribution to open source.
Stars: ✭ 154 (-15.85%)
10 Weeks10 weeks of technology exploration
Stars: ✭ 22 (-87.98%)
ScalnetA Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Stars: ✭ 342 (+86.89%)
LinkisLinkis helps easily connect to various back-end computation/storage engines (Spark, Python, TiDB...), exposes various interfaces (REST, JDBC, Java...), with multi-tenancy, high performance, and resource control.
Stars: ✭ 2,323 (+1169.4%)
Api.rssRSS as RESTful. This service allows you to transform an RSS feed into an awesome API.
Stars: ✭ 340 (+85.79%)
WirbelsturmWirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+81.42%)
ZparkioBoilerplate framework to use Spark and ZIO together.
Stars: ✭ 121 (-33.88%)
SparklintA tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (+72.68%)
Kamu CliNext generation tool for decentralized exchange and transformation of semi-structured data
Stars: ✭ 69 (-62.3%)
AvroApache Avro is a data serialization system.
Stars: ✭ 2,005 (+995.63%)
Uproot3ROOT I/O in pure Python and NumPy.
Stars: ✭ 312 (+70.49%)
CrayonSimple framework agnostic UI router for SPAs
Stars: ✭ 310 (+69.4%)
Awesome AdaA curated list of awesome resources related to the Ada and SPARK programming languages
Stars: ✭ 299 (+63.39%)
TransmogrifaiTransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+1038.8%)
Spark Hbase ConnectorConnect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (+63.39%)
KontextfreiWriting application logic for Spark jobs that can be unit-tested without a SparkContext
Stars: ✭ 67 (-63.39%)
Spark Druid OlapSparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 282 (+54.1%)
Kinesis SqlKinesis Connector for Structured Streaming
Stars: ✭ 120 (-34.43%)
Janusgraph.cnChinese community for the distributed graph database JanusGraph; everything about JanusGraph
Stars: ✭ 273 (+49.18%)
RsparklingRSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-64.48%)
ArvadosAn open source platform for managing and analyzing biomedical big data
Stars: ✭ 274 (+49.73%)
DatacompyPandas and Spark DataFrame comparison for humans
Stars: ✭ 147 (-19.67%)
Spark FfmFFM (Field-aware Factorization Machine) on Spark
Stars: ✭ 101 (-44.81%)
TwitworkMonitor Twitter stream
Stars: ✭ 133 (-27.32%)
Sparkling WaterSparkling Water provides H2O functionality inside a Spark cluster
Stars: ✭ 887 (+384.7%)