apache / Orc

Licence: apache-2.0
Apache ORC - the smallest, fastest columnar storage for Hadoop workloads

Programming Languages

java
68154 projects - #9 most used programming language
cplusplus
227 projects

Projects that are alternatives of or similar to Orc

Ignite
Apache Ignite
Stars: ✭ 4,027 (+935.22%)
Mutual labels:  big-data, hadoop, database
Metamodel
Mirror of Apache Metamodel
Stars: ✭ 143 (-63.24%)
Mutual labels:  big-data, database, library
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+1077.63%)
Mutual labels:  big-data, hadoop, database
Hive
Apache Hive
Stars: ✭ 4,031 (+936.25%)
Mutual labels:  big-data, hadoop, database
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-67.1%)
Mutual labels:  big-data, hadoop, database
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-61.44%)
Mutual labels:  big-data, hadoop, database
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-71.47%)
Mutual labels:  big-data, hadoop
bigdata-fun
A complete (distributed) BigData stack, running in containers
Stars: ✭ 14 (-96.4%)
Mutual labels:  big-data, hadoop
Jackrabbit
Mirror of Apache Jackrabbit
Stars: ✭ 273 (-29.82%)
Mutual labels:  database, library
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (+736.5%)
Mutual labels:  big-data, database
big-data-lite
Samples to the Oracle Big Data Lite VM
Stars: ✭ 41 (-89.46%)
Mutual labels:  big-data, hadoop
Immortaldb
🔩 A relentless key-value store for the browser.
Stars: ✭ 2,962 (+661.44%)
Mutual labels:  database, library
Couchdb Fauxton
Apache CouchDB
Stars: ✭ 295 (-24.16%)
Mutual labels:  big-data, database
leaflet heatmap
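A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation. The data is first processed in parallel with Apache Spark, the heatmap is then rendered with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. Rendering currently runs on Apache Spark, but the parallel version is slower than a single-machine computation, either because Apache Spark is not well suited to this kind of work or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .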
Stars: ✭ 13 (-96.66%)
Mutual labels:  big-data, hadoop
hadoop-data-ingestion-tool
OLAP and ETL of Big Data
Stars: ✭ 17 (-95.63%)
Mutual labels:  big-data, hadoop
big data
A collection of tutorials on Hadoop, MapReduce, Spark, Docker
Stars: ✭ 34 (-91.26%)
Mutual labels:  big-data, hadoop
Gokv
Simple key-value store abstraction and implementations for Go (Redis, Consul, etcd, bbolt, BadgerDB, LevelDB, Memcached, DynamoDB, S3, PostgreSQL, MongoDB, CockroachDB and many more)
Stars: ✭ 314 (-19.28%)
Mutual labels:  database, library
Tez
Apache Tez
Stars: ✭ 313 (-19.54%)
Mutual labels:  big-data, hadoop
Jackrabbit Oak
Mirror of Apache Jackrabbit Oak
Stars: ✭ 321 (-17.48%)
Mutual labels:  database, library
Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and Analytics on a million movies using Spark and Scala.
Stars: ✭ 47 (-87.92%)
Mutual labels:  big-data, hadoop

Apache ORC

ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. It is optimized for large streaming reads, but with integrated support for finding required rows quickly. Storing data in a columnar format lets the reader read, decompress, and process only the values that are required for the current query. Because ORC files are type-aware, the writer chooses the most appropriate encoding for each type and builds an internal index as the file is written. Predicate pushdown uses those indexes to determine which stripes in a file need to be read for a particular query, and the row indexes can narrow the search to a particular set of 10,000 rows. ORC supports the complete set of types in Hive, including the complex types: structs, lists, maps, and unions.
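
The idea behind index-based skipping can be illustrated with a small, self-contained sketch. This is a toy model, not the ORC library API: the class name RowGroupSkipSketch, the GroupStats type, and both methods are hypothetical, and the sketch assumes a single long column, an equality predicate, and a fixed group size of 10,000 rows. The "writer" records a minimum and maximum per group of rows, and the "reader" keeps only the groups whose range could contain the requested value.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of min/max row-group skipping. Hypothetical names; NOT the ORC API. */
final class RowGroupSkipSketch {
    // Assumed group size, mirroring the 10,000-row granularity described above.
    static final int ROWS_PER_GROUP = 10_000;

    /** Min/max statistics for one group of consecutive rows. */
    static final class GroupStats {
        final int firstRow;
        final int rowCount;
        final long min;
        final long max;

        GroupStats(int firstRow, int rowCount, long min, long max) {
            this.firstRow = firstRow;
            this.rowCount = rowCount;
            this.min = min;
            this.max = max;
        }
    }

    /** Writer side: record the min and max of each group while scanning the column. */
    static List<GroupStats> buildIndex(long[] column) {
        List<GroupStats> index = new ArrayList<>();
        for (int start = 0; start < column.length; start += ROWS_PER_GROUP) {
            int end = Math.min(start + ROWS_PER_GROUP, column.length);
            long min = column[start];
            long max = column[start];
            for (int i = start + 1; i < end; i++) {
                min = Math.min(min, column[i]);
                max = Math.max(max, column[i]);
            }
            index.add(new GroupStats(start, end - start, min, max));
        }
        return index;
    }

    /** Reader side, for "value == target": keep only groups whose range may contain it. */
    static List<GroupStats> groupsToRead(List<GroupStats> index, long target) {
        List<GroupStats> hits = new ArrayList<>();
        for (GroupStats g : index) {
            if (g.min <= target && target <= g.max) {
                hits.add(g);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        long[] column = new long[35_000];
        for (int i = 0; i < column.length; i++) {
            column[i] = i; // sorted data, so group ranges do not overlap
        }
        List<GroupStats> index = buildIndex(column);
        List<GroupStats> hits = groupsToRead(index, 12_345L);
        // prints "groups: 4, to read: 1, first row of hit: 10000"
        System.out.println("groups: " + index.size() + ", to read: " + hits.size()
                + ", first row of hit: " + hits.get(0).firstRow);
    }
}
```

With sorted data, as in the example, only one of the four groups has to be read. With unsorted data the ranges overlap and fewer groups can be skipped, so the benefit of this kind of index depends on how the values are laid out in the file.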

ORC File Library

This project includes both a Java library and a C++ library for reading and writing the Optimized Row Columnar (ORC) file format. The C++ and Java libraries are completely independent of each other, and each reads all versions of ORC files. The C++ library, however, currently writes only the original (Hive 0.11) version of ORC files; its writer will be extended in the future.

Bug tracking: Apache Jira

The subdirectories are:

  • c++ - the C++ reader and writer
  • cmake_modules - the CMake modules
  • docker - Docker scripts to build and test on various Linux distributions
  • examples - various ORC example files that are used to test compatibility
  • java - the Java reader and writer
  • proto - the protocol buffer definition for the ORC metadata
  • site - the website and documentation
  • snap - the script to build snaps of the ORC tools
  • tools - the C++ tools for reading and inspecting ORC files

Building

  • Install Java 1.8 or higher
  • Install Maven 3 or higher
  • Install CMake

To build a release version with debug information:

% mkdir build
% cd build
% cmake ..
% make package
% make test-out

To build a debug version:

% mkdir build
% cd build
% cmake .. -DCMAKE_BUILD_TYPE=DEBUG
% make package
% make test-out

To build a release version without debug information:

% mkdir build
% cd build
% cmake .. -DCMAKE_BUILD_TYPE=RELEASE
% make package
% make test-out

To build only the Java library:

% cd java
% mvn package

To build only the C++ library:

% mkdir build
% cd build
% cmake .. -DBUILD_JAVA=OFF
% make package
% make test-out
