Ammonite Spark
Run Spark calculations from Ammonite
Stars: ✭ 88 (-64.66%)
Dji Firmware Tools
Tools for handling firmware of DJI products, with a focus on quadcopters.
Stars: ✭ 424 (+70.28%)
Kraps Rpc
An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (-29.72%)
Featran
A Scala feature transformation library for data science and machine learning
Stars: ✭ 420 (+68.67%)
Azure Event Hubs Spark
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
Stars: ✭ 140 (-43.78%)
Agile data code 2
Code for Agile Data Science 2.0, O'Reilly 2017, Second Edition
Stars: ✭ 413 (+65.86%)
Cuesheet
A framework for writing Spark 2.x applications in a pretty way
Stars: ✭ 86 (-65.46%)
Marmaray
Generic Data Ingestion & Dispersal Library for Hadoop
Stars: ✭ 414 (+66.27%)
Dpark
Python clone of Spark, a MapReduce-like framework in Python
Stars: ✭ 2,668 (+971.49%)
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+63.05%)
Hops Examples
Examples for Deep Learning/Feature Store/Spark/Flink/Hive/Kafka jobs and Jupyter notebooks on Hops
Stars: ✭ 84 (-66.27%)
Bach
Bach Testing Framework
Stars: ✭ 392 (+57.43%)
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (-67.07%)
Docker practice
Learn and understand Docker technologies, with real DevOps practice!
Stars: ✭ 19,768 (+7838.96%)
Spark Nlp
State of the Art Natural Language Processing
Stars: ✭ 2,518 (+911.24%)
Tensorflowonspark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
Stars: ✭ 3,748 (+1405.22%)
Mleap
MLeap: Deploy ML Pipelines to Production
Stars: ✭ 1,232 (+394.78%)
Wedatasphere
WeDataSphere is a financial level one-stop open-source suitcase for big data platforms. Currently the source code of Scriptis and Linkis has already been released to the open-source community. WeDataSphere, Big Data Made Easy!
Stars: ✭ 372 (+49.4%)
Isolation Forest
A Spark/Scala implementation of the isolation forest unsupervised outlier detection algorithm.
Stars: ✭ 139 (-44.18%)
Sparkmeasure
This is the development repository of SparkMeasure, a tool for performance troubleshooting of Apache Spark workloads. It simplifies the collection and analysis of Spark task metrics data.
Stars: ✭ 368 (+47.79%)
Ktf
Kernel Test Framework - a unit test framework for the Linux kernel
Stars: ✭ 81 (-67.47%)
Kyuubi
Kyuubi is a unified multi-tenant JDBC interface for large-scale data processing and analytics, built on top of Apache Spark
Stars: ✭ 363 (+45.78%)
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-13.65%)
Sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Stars: ✭ 362 (+45.38%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-68.27%)
Ward
A modern Python test framework designed to help you find and fix flaws faster.
Stars: ✭ 350 (+40.56%)
Sparklens
Qubole Sparklens tool for performance tuning Apache Spark
Stars: ✭ 345 (+38.55%)
Home
ApacheCN open-source organization: announcements, introduction, members, activities, and contact channels
Stars: ✭ 1,199 (+381.53%)
Iql
An ad hoc query service based on the Spark SQL engine.
Stars: ✭ 341 (+36.95%)
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for Keras, TensorFlow, and ONNX/PyTorch, a modular and tiny C++ library for running math code and a Java-based math library on top of the core C++ library. Also includes SameDiff: a PyTorch/TensorFlow-like library for running deep learni…
Stars: ✭ 12,277 (+4830.52%)
Wirbelsturm
Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
Stars: ✭ 332 (+33.33%)
Cleanframes
Type-class based data cleansing library for Apache Spark SQL
Stars: ✭ 75 (-69.88%)
Utest.h
🧪 single header unit testing framework for C and C++
Stars: ✭ 315 (+26.51%)
Horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Stars: ✭ 11,943 (+4696.39%)
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+379.92%)
Learningsparkv2
This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Stars: ✭ 307 (+23.29%)
Delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
Stars: ✭ 3,903 (+1467.47%)
Lpa Detector
Optimize and improve the label propagation algorithm
Stars: ✭ 75 (-69.88%)
Spline
Data Lineage Tracking And Visualization Solution
Stars: ✭ 306 (+22.89%)
Tf
✔️ tf is a microframework for parameterized testing of functions and HTTP in Go.
Stars: ✭ 133 (-46.59%)
Awesome Ada
A curated list of awesome resources related to the Ada and SPARK programming languages
Stars: ✭ 299 (+20.08%)
Labs
Research on distributed systems
Stars: ✭ 73 (-70.68%)
Spark Hbase Connector
Connect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (+20.08%)
Luigi Warehouse
A Luigi-powered analytics / warehouse stack
Stars: ✭ 72 (-71.08%)
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-0.8%)
Neo4j Spark Connector
Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (-1.61%)
Recheck Web
recheck for web apps – change comparison tool with local Golden Masters, Git-like ignore syntax and "Unbreakable Selenium" tests.
Stars: ✭ 224 (-10.04%)
Azuredatabricksbestpractices
Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
Stars: ✭ 186 (-25.3%)
Spark Tsne
Distributed t-SNE via Apache Spark
Stars: ✭ 151 (-39.36%)
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-55.82%)