
Intel-bigdata / Oap

Licence: apache-2.0
Optimized Analytics Package for Spark* Platform

Programming language: Scala

Projects that are alternatives of or similar to Oap

Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+378.72%)
Mutual labels:  spark, parquet
Devops Python Tools
80+ DevOps & Data CLI Tools - AWS, GCP, GCF Python Cloud Function, Log Anonymizer, Spark, Hadoop, HBase, Hive, Impala, Linux, Docker, Spark Data Converters & Validators (Avro/Parquet/JSON/CSV/INI/XML/YAML), Travis CI, AWS CloudFormation, Elasticsearch, Solr etc.
Stars: ✭ 406 (+18.37%)
Mutual labels:  spark, parquet
Iceberg
Iceberg is a table format for large, slow-moving tabular data
Stars: ✭ 393 (+14.58%)
Mutual labels:  spark, parquet
Pucket
Bucketing and partitioning system for Parquet
Stars: ✭ 29 (-91.55%)
Mutual labels:  spark, parquet
Schemer
Schema registry for CSV, TSV, JSON, AVRO and Parquet schema. Supports schema inference and GraphQL API.
Stars: ✭ 97 (-71.72%)
Mutual labels:  spark, parquet
Rumble
⛈️ Rumble 1.11.0 "Banyan Tree"🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
Stars: ✭ 58 (-83.09%)
Mutual labels:  spark, parquet
Parquet Generator
Parquet file generator
Stars: ✭ 16 (-95.34%)
Mutual labels:  spark, parquet
Parquet Index
Spark SQL index for Parquet tables
Stars: ✭ 109 (-68.22%)
Mutual labels:  spark, parquet
experiments
Code examples for my blog posts
Stars: ✭ 21 (-93.88%)
Mutual labels:  spark, parquet
Awesome Ada
A curated list of awesome resources related to the Ada and SPARK programming language
Stars: ✭ 299 (-12.83%)
Mutual labels:  spark
Cook
Fair job scheduler on Kubernetes and Mesos for batch workloads and Spark
Stars: ✭ 314 (-8.45%)
Mutual labels:  spark
Elasticluster
Create clusters of VMs on the cloud and configure them with Ansible.
Stars: ✭ 298 (-13.12%)
Mutual labels:  spark
Zat
Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark
Stars: ✭ 303 (-11.66%)
Mutual labels:  spark
Sparklint
A tool for monitoring and tuning Spark jobs for efficiency.
Stars: ✭ 316 (-7.87%)
Mutual labels:  spark
Elasticsearch loader
A tool for batch loading data files (json, parquet, csv, tsv) into ElasticSearch
Stars: ✭ 300 (-12.54%)
Mutual labels:  parquet
Parquet Cpp
Apache Parquet
Stars: ✭ 339 (-1.17%)
Mutual labels:  parquet
Spark Hbase Connector
Connect Spark to HBase for reading and writing data with ease
Stars: ✭ 299 (-12.83%)
Mutual labels:  spark
Spark Notebook
Interactive and Reactive Data Science using Scala and Spark.
Stars: ✭ 3,081 (+798.25%)
Mutual labels:  spark
Scalnet
A Scala wrapper for Deeplearning4j, inspired by Keras. Scala + DL + Spark + GPUs
Stars: ✭ 342 (-0.29%)
Mutual labels:  spark
Ytk Learn
Ytk-learn is a distributed machine learning library that implements most of the popular machine learning algorithms (GBDT, GBRT, Mixture Logistic Regression, Gradient Boosting Soft Tree, Factorization Machines, Field-aware Factorization Machines, Logistic Regression, Softmax).
Stars: ✭ 337 (-1.75%)
Mutual labels:  spark

Optimized Analytics Package for Spark* Platform (OAP)

* LEGAL NOTICE: Your use of this software and any required dependent software (the "Software Package") is subject to the terms and conditions of the software license agreements for the Software Package, which may also include notices, disclaimers, or license terms for third party or open source software included in or with the Software Package, and your use indicates your acceptance of all such terms. Please refer to the "TPP.txt" or other similarly-named text file included with the Software Package for additional details.
* Optimized Analytics Package for Spark* Platform is under Apache 2.0 (https://www.apache.org/licenses/LICENSE-2.0).

OAP is a project that optimizes Spark by providing optimized implementations of packages for several areas, including cache, shuffle, a native SQL engine, and MLlib. This version of OAP contains optimized implementations of SQL Index and Data Source Cache (supporting DRAM and PMem), RDD Cache PMem Extension, Shuffle Remote PMem Extension, Remote Shuffle, Intel MLlib, Unified Arrow Data Source, and Native SQL Engine.

Installation Guide

Please follow the link below for the guide to compiling and installing OAP on your system.

User Guide

Please refer to the corresponding documents below for instructions on using each feature.

Developer Guide

Please follow the link below for the developer guide.

*Other names and brands may be claimed as the property of others.