amplab / Succinct

Licence: apache-2.0
Enabling queries on compressed data.

Programming Languages

java
68154 projects - #9 most used programming language
scala
5932 projects

Projects that are alternatives to, or similar to, Succinct

Geopyspark
GeoTrellis for PySpark
Stars: ✭ 167 (-35.02%)
Mutual labels:  spark, big-data
Hyperspace
An open source indexing subsystem that brings index-based query acceleration to Apache Spark™ and big data workloads.
Stars: ✭ 246 (-4.28%)
Mutual labels:  spark, big-data
Mmlspark
Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+1028.02%)
Mutual labels:  spark, big-data
Spark With Python
Fundamentals of Spark with Python (using PySpark), code examples
Stars: ✭ 150 (-41.63%)
Mutual labels:  spark, big-data
spark-acid
ACID Data Source for Apache Spark based on Hive ACID
Stars: ✭ 91 (-64.59%)
Mutual labels:  big-data, spark
Spark.jl
Julia binding for Apache Spark
Stars: ✭ 153 (-40.47%)
Mutual labels:  spark, big-data
Gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (-15.95%)
Mutual labels:  spark, big-data
Feast
Feature Store for Machine Learning
Stars: ✭ 2,576 (+902.33%)
Mutual labels:  spark, big-data
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (-63.04%)
Mutual labels:  big-data, spark
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+1084.44%)
Mutual labels:  spark, big-data
Sparkling Graph
SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.
Stars: ✭ 139 (-45.91%)
Mutual labels:  spark, big-data
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Stars: ✭ 111 (-56.81%)
Mutual labels:  big-data, spark
Spark On Lambda
Apache Spark on AWS Lambda
Stars: ✭ 137 (-46.69%)
Mutual labels:  spark, big-data
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-40.86%)
Mutual labels:  spark, big-data
Gaffer
A large-scale entity and relation database supporting aggregation of properties
Stars: ✭ 1,642 (+538.91%)
Mutual labels:  spark, big-data
Sparkrdma
RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark
Stars: ✭ 215 (-16.34%)
Mutual labels:  spark, big-data
Bigdata Notes
Getting-started guide to big data ⭐
Stars: ✭ 10,991 (+4176.65%)
Mutual labels:  spark, big-data
Bigdataclass
Two-day workshop that covers how to use R to interact with databases and Spark
Stars: ✭ 110 (-57.2%)
Mutual labels:  spark, big-data
Data Accelerator
Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (-3.89%)
Mutual labels:  spark, big-data
leaflet heatmap
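A simple visualization of Huzhou call data. Assuming the data volume is too large for the browser to draw a heatmap directly, the heatmap rendering step is moved to offline computation. Apache Spark processes the data in parallel and then renders the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer to give good interactivity. The rendering is currently done with Apache Spark, but either Apache Spark is not well suited to this kind of computation or I have not designed the algorithm well, because the parallel computation is slower than the single-machine computation. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .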
Stars: ✭ 13 (-94.94%)
Mutual labels:  big-data, spark

Succinct

Succinct is a data store that enables queries directly on a compressed representation of data. This repository maintains the Java implementations of Succinct's core algorithms, and applications that exploit them, such as an Apache Spark binding for Succinct.

Building Succinct

Succinct is built using Apache Maven. To build Succinct and its component modules, run:

mvn clean package

Succinct-Core

The Succinct-Core module contains the Java implementation of Succinct's core algorithms. A more detailed description of the core module is available here.
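
To give a rough feel for the kind of substring queries such a store can answer without scanning the input, here is a minimal sketch built on a plain, uncompressed suffix array. It is not Succinct-Core code and performs no compression. The class name SuffixArraySketch and the methods count, search and extract are hypothetical names chosen for this illustration, on the assumption that counting, locating and extracting substrings are representative queries.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: NOT part of Succinct-Core and NOT compressed.
// It indexes a byte buffer with a plain suffix array and answers substring queries.
final class SuffixArraySketch {
    private final byte[] data;
    private final Integer[] suffixes; // suffix start offsets, in lexicographic order

    SuffixArraySketch(byte[] data) {
        this.data = data;
        this.suffixes = new Integer[data.length];
        for (int i = 0; i < data.length; i++) {
            suffixes[i] = i;
        }
        // Naive comparison sort of all suffixes; adequate for a small example.
        Arrays.sort(suffixes, (a, b) -> compareSuffixes(a, b));
    }

    // Lexicographic comparison of two suffixes; a shorter suffix sorts first on a tie.
    private int compareSuffixes(int a, int b) {
        while (a < data.length && b < data.length) {
            int diff = (data[a] & 0xFF) - (data[b] & 0xFF);
            if (diff != 0) {
                return diff;
            }
            a++;
            b++;
        }
        return Integer.compare(data.length - a, data.length - b);
    }

    // Compares only the first pattern.length bytes of the suffix with the pattern.
    private int comparePrefix(int offset, byte[] pattern) {
        for (int i = 0; i < pattern.length; i++) {
            if (offset + i >= data.length) {
                return -1; // suffix ended first, so it sorts before the pattern
            }
            int diff = (data[offset + i] & 0xFF) - (pattern[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // Binary search for the first suffix that is >= pattern (or > pattern if strict).
    private int bound(byte[] pattern, boolean strict) {
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = comparePrefix(suffixes[mid], pattern);
            if (c < 0 || (strict && c == 0)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Number of occurrences of query in the indexed data.
    int count(String query) {
        byte[] p = query.getBytes(StandardCharsets.UTF_8);
        return bound(p, true) - bound(p, false);
    }

    // Offsets of all occurrences of query, in ascending order.
    List<Integer> search(String query) {
        byte[] p = query.getBytes(StandardCharsets.UTF_8);
        int from = bound(p, false);
        int to = bound(p, true);
        List<Integer> offsets = new ArrayList<>();
        for (int i = from; i < to; i++) {
            offsets.add(suffixes[i]);
        }
        Collections.sort(offsets);
        return offsets;
    }

    // Up to length bytes of the original data starting at offset.
    String extract(int offset, int length) {
        int end = Math.min(data.length, offset + length);
        return new String(data, offset, end - offset, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SuffixArraySketch index =
                new SuffixArraySketch("banana".getBytes(StandardCharsets.UTF_8));
        System.out.println(index.count("ana"));  // 2
        System.out.println(index.search("ana")); // [1, 3]
        System.out.println(index.extract(2, 3)); // nan
    }
}
```

Each query costs a binary search over the sorted suffixes rather than a scan of the data. The sketch keeps both the raw bytes and a full offset array in memory, which is exactly the overhead a compressed representation is meant to avoid.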

Dependency Information

Apache Maven

To build your application with Succinct-Core, you can link against this library using Maven by adding the following dependency information to your pom.xml file:

<dependency>
    <groupId>amplab</groupId>
    <artifactId>succinct-core</artifactId>
    <version>0.1.8</version>
</dependency>

Succinct on Apache Spark

We provide Apache Spark and Apache Spark SQL interfaces for Succinct. They expose SuccinctRDD, a compressed, queryable RDD for manipulating unstructured data, and SuccinctKVRDD for querying semi-structured data (key-value pairs, text and JSON documents, etc.). We also expose Succinct as a DataSource in Apache Spark SQL as an experimental feature. More details on the integration with Apache Spark can be found here.

Dependency Information

Apache Maven

To build your application to run with Succinct on Apache Spark, you can link against this library using Apache Maven by adding the following dependency information to your pom.xml file:

<dependency>
    <groupId>amplab</groupId>
    <artifactId>succinct-spark</artifactId>
    <version>0.1.8</version>
</dependency>

SBT and Spark-Packages

Add the dependency to your SBT project by adding the following to build.sbt (see the Spark Packages listing for spark-submit and Maven instructions):

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "amplab" % "succinct" % "0.1.8"

The succinct-spark jar file can also be added to a Spark shell using the --jars command-line option. For example, to include it when starting the Spark shell:

$ bin/spark-shell --jars succinct-0.1.8.jar