hammerlab / spark-util

Licence: Apache-2.0 License
low-level helpers for Apache Spark libraries and tests

Programming Languages

scala
5932 projects

Projects that are alternatives of or similar to spark-util

Hadoopcryptoledger
Hadoop Crypto Ledger - Analyzing CryptoLedgers, such as Bitcoin Blockchain, on Big Data platforms, such as Hadoop/Spark/Flink/Hive
Stars: ✭ 126 (+687.5%)
Mutual labels:  spark, hadoop
fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (+25%)
Mutual labels:  spark, hadoop
Airflow Pipeline
An Airflow docker image preconfigured to work well with Spark and Hadoop/EMR
Stars: ✭ 128 (+700%)
Mutual labels:  spark, hadoop
Bigdata Notebook
Stars: ✭ 100 (+525%)
Mutual labels:  spark, hadoop
Deeplearning4j
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Stars: ✭ 12,277 (+76631.25%)
Mutual labels:  spark, hadoop
Waterdrop
Production Ready Data Integration Product, documentation:
Stars: ✭ 1,856 (+11500%)
Mutual labels:  spark, hadoop
Aliyun Emapreduce Datasources
Extended datasource support for Spark/Hadoop on Aliyun E-MapReduce.
Stars: ✭ 132 (+725%)
Mutual labels:  spark, hadoop
Docker Spark
🚢 Docker image for Apache Spark
Stars: ✭ 78 (+387.5%)
Mutual labels:  spark, hadoop
Big Whale
Scheduling of offline tasks (Spark, Flink, etc.) and monitoring of real-time tasks
Stars: ✭ 163 (+918.75%)
Mutual labels:  spark, hadoop
Bigdata docker
Big Data Ecosystem Docker
Stars: ✭ 161 (+906.25%)
Mutual labels:  spark, hadoop
Bigdata Notes
Getting-started guide to big data ⭐
Stars: ✭ 10,991 (+68593.75%)
Mutual labels:  spark, hadoop
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (+1243.75%)
Mutual labels:  spark, hadoop
Repository
Personal learning knowledge base covering data warehouse modeling, real-time computing, big data, Java, algorithms, and more.
Stars: ✭ 92 (+475%)
Mutual labels:  spark, hadoop
Xlearning Xdml
extremely distributed machine learning
Stars: ✭ 113 (+606.25%)
Mutual labels:  spark, hadoop
Hadoop cookbook
Cookbook to install Hadoop 2.0+ using Chef
Stars: ✭ 82 (+412.5%)
Mutual labels:  spark, hadoop
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+10162.5%)
Mutual labels:  spark, hadoop
Apache Spark Hands On
Educational notes and hands-on problems with solutions for the Hadoop ecosystem
Stars: ✭ 74 (+362.5%)
Mutual labels:  spark, hadoop
Dataspherestudio
DataSphereStudio is a one-stop data application development & management portal, covering scenarios including data exchange, desensitization/cleansing, analysis/mining, quality measurement, visualization, and task scheduling.
Stars: ✭ 1,195 (+7368.75%)
Mutual labels:  spark, hadoop
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (+837.5%)
Mutual labels:  spark, hadoop
Javaorbigdata Interview
Collection of interview knowledge points for Java developers and big data developers
Stars: ✭ 203 (+1168.75%)
Mutual labels:  spark, hadoop

spark-util

Spark, Hadoop, and Kryo utilities

Kryo registration

Classes that implement the Registrar interface can use various shorthands for registering classes with Kryo.

Adapted from RegistrationTest:

register(
  cls[A],                  // comes with an AlsoRegister that loops in other classes
  arr[Foo],                // register a class and an Array of that class
  cls[B] -> BSerializer(), // use a custom Serializer
  CDRegistrar              // register all of another Registrar's registrations
)
  • Custom Serializers and AlsoRegisters are picked up implicitly if not provided explicitly.
  • AlsoRegisters are recursive, which makes it much easier to account for what is registered and why, and helps ensure that needed registrations aren't overlooked.
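One way to picture the recursive behavior of AlsoRegisters is as a transitive closure over "registering X also requires Y" edges. The following is a minimal plain-Scala sketch of that idea only; it is not spark-util's implementation, and the names AlsoRegisterSketch, closure, and also are hypothetical. It uses no Kryo calls: it just computes which classes would end up registered.

```scala
object AlsoRegisterSketch {
  /**
   * Hypothetical model, not the library's API.
   * Collects every class reachable from `roots` by repeatedly following the
   * `also` edges. Each class appears once, in first-visit order, so cyclic
   * "also register" declarations still terminate.
   */
  def closure(
    roots: Seq[Class[_]],
    also: Map[Class[_], Seq[Class[_]]]
  ): Vector[Class[_]] = {
    @annotation.tailrec
    def loop(pending: List[Class[_]], seen: Vector[Class[_]]): Vector[Class[_]] =
      pending match {
        case Nil                           => seen
        case c :: rest if seen.contains(c) => loop(rest, seen) // already handled
        case c :: rest                     =>
          // Visit c, then the classes it pulls in, then the remaining work.
          loop(also.getOrElse(c, Nil).toList ++ rest, seen :+ c)
      }
    loop(roots.toList, Vector.empty)
  }
}
```

In this model, registering a single root class yields that class plus everything it transitively declares, which is the accountability property described above: each class states its own requirements once, next to its definition.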

Configuration/Context wrappers

  • Configuration: serializable Hadoop-Configuration wrapper
  • Context: SparkContext wrapper that is also a Hadoop Configuration, for unification of "global configuration access" patterns
  • Conf: load a SparkConf with settings from file(s) specified in the SPARK_PROPERTIES_FILES environment variable
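The Conf behavior can be approximated with the standard library alone. The sketch below is an illustration under stated assumptions, not the library's code: it assumes SPARK_PROPERTIES_FILES holds a comma-separated list of paths and that later files override earlier ones, and the names PropertiesFilesSketch, paths, load, and fromEnv are hypothetical. It returns a plain Map rather than a SparkConf so that it runs without Spark on the classpath.

```scala
import java.io.FileInputStream
import java.util.Properties

object PropertiesFilesSketch {
  /** Split a comma-separated list of paths (the separator is an assumption). */
  def paths(value: String): List[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toList

  /** Load each properties file in order; keys in later files override earlier ones. */
  def load(files: Seq[String]): Map[String, String] =
    files.foldLeft(Map.empty[String, String]) { (acc, path) =>
      val props = new Properties()
      val in = new FileInputStream(path)
      try props.load(in) finally in.close()
      // Copy the entries over without relying on Scala/Java collection converters.
      val names = props.stringPropertyNames().toArray(new Array[String](0))
      names.foldLeft(acc)((m, k) => m + (k -> props.getProperty(k)))
    }

  /** Read the file list from the environment; yields an empty map if the variable is unset. */
  def fromEnv(env: Map[String, String] = sys.env): Map[String, String] =
    load(env.get("SPARK_PROPERTIES_FILES").map(paths).getOrElse(Nil))
}
```

With Spark available, the resulting map could be applied with `new SparkConf().setAll(PropertiesFilesSketch.fromEnv())`; whether the library merges files in exactly this order is not stated here.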

Spark Configuration

Misc
