
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc

Stars: ✭ 95 (-63.04%)

Mutual labels: big-data, spark

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+1084.44%)

Mutual labels: spark, big-data

Sparkling Graph

SparklingGraph provides an easy-to-use set of features for processing large-scale graphs using Spark and GraphX.

Stars: ✭ 139 (-45.91%)

Mutual labels: spark, big-data

aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

Stars: ✭ 111 (-56.81%)

Mutual labels: big-data, spark

Spark On Lambda

Apache Spark on AWS Lambda

Stars: ✭ 137 (-46.69%)

Mutual labels: spark, big-data

Geni

A Clojure dataframe library that runs on Spark

Stars: ✭ 152 (-40.86%)

Mutual labels: spark, big-data

Gaffer

A large-scale entity and relation database supporting aggregation of properties

Stars: ✭ 1,642 (+538.91%)

Mutual labels: spark, big-data

Sparkrdma

RDMA accelerated, high-performance, scalable and efficient ShuffleManager plugin for Apache Spark

Stars: ✭ 215 (-16.34%)

Mutual labels: spark, big-data

Bigdata Notes

An introductory guide to big data ⭐

Stars: ✭ 10,991 (+4176.65%)

Mutual labels: spark, big-data

Bigdataclass

Two-day workshop that covers how to use R to interact with databases and Spark

Stars: ✭ 110 (-57.2%)

Mutual labels: spark, big-data

Data Accelerator

Data Accelerator for Apache Spark simplifies onboarding to Streaming of Big Data. It offers a rich, easy to use experience to help with creation, editing and management of Spark jobs on Azure HDInsights or Databricks while enabling the full power of the Spark engine.

Stars: ✭ 247 (-3.89%)

Mutual labels: spark, big-data

leaflet heatmap

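A simple visualization of Huzhou call data. Assuming the data volume is too large to draw a heatmap directly in the browser, the heatmap rendering step is moved to offline computation. The data is first processed in parallel with Apache Spark, the heatmap is then rendered with Apache Spark, and leafletjs loads the OpenStreetMap layer and the heatmap layer to give good interactivity. In the current Apache Spark implementation, the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .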

Stars: ✭ 13 (-94.94%)

Mutual labels: big-data, spark


Succinct

Succinct is a data store that enables queries directly on a compressed representation of data. This repository maintains the Java implementations of Succinct's core algorithms, and applications that exploit them, such as an Apache Spark binding for Succinct.
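To give a feel for the kind of query such a store answers, here is a minimal sketch of substring counting and searching over a suffix array. It is an illustration only: it uses a plain, uncompressed suffix array with naive construction, whereas Succinct's own structures are compressed and are not reproduced here. The class and method names (SuffixArraySketch, count, search) are hypothetical and are not Succinct's API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustrative only: a plain, uncompressed suffix array (not Succinct's API). */
public class SuffixArraySketch {
    private final byte[] data;
    private final Integer[] suffixes; // suffix start offsets in lexicographic order

    public SuffixArraySketch(String text) {
        this.data = text.getBytes(StandardCharsets.UTF_8);
        this.suffixes = new Integer[data.length];
        for (int i = 0; i < data.length; i++) {
            suffixes[i] = i;
        }
        // Naive construction: sort the offsets by the suffixes they start.
        Arrays.sort(suffixes, (a, b) -> compareSuffixes(a, b));
    }

    private int compareSuffixes(int a, int b) {
        while (a < data.length && b < data.length) {
            int diff = (data[a] & 0xff) - (data[b] & 0xff);
            if (diff != 0) {
                return diff;
            }
            a++;
            b++;
        }
        // The suffix that ran out first is the shorter one and sorts first.
        return Integer.compare(data.length - a, data.length - b);
    }

    /** Negative if the suffix sorts before the query, 0 if it starts with it, positive otherwise. */
    private int compareToQuery(int start, byte[] query) {
        for (int i = 0; i < query.length; i++) {
            if (start + i >= data.length) {
                return -1; // the suffix is a proper prefix of the query
            }
            int diff = (data[start + i] & 0xff) - (query[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    /** Binary search for the first suffix index that is >= (or, if upper, >) the query. */
    private int bound(byte[] query, boolean upper) {
        int lo = 0;
        int hi = suffixes.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int c = compareToQuery(suffixes[mid], query);
            if (c < 0 || (upper && c == 0)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Number of occurrences of the query in the text. */
    public int count(String query) {
        byte[] q = query.getBytes(StandardCharsets.UTF_8);
        return bound(q, true) - bound(q, false);
    }

    /** Byte offsets of every occurrence of the query, in ascending order. */
    public List<Integer> search(String query) {
        byte[] q = query.getBytes(StandardCharsets.UTF_8);
        int from = bound(q, false);
        int to = bound(q, true);
        List<Integer> offsets = new ArrayList<>(Arrays.asList(suffixes).subList(from, to));
        Collections.sort(offsets);
        return offsets;
    }
}
```

For example, new SuffixArraySketch("banana").search("ana") finds the occurrences at offsets 1 and 3 with two binary searches over the sorted suffixes rather than a scan of the text.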

Building Succinct

Succinct is built using Apache Maven. To build Succinct and its component modules, run:

mvn clean package

Succinct-Core

The Succinct-Core module contains the Java implementation of Succinct's core algorithms. A more detailed description of the core module is available here.

Dependency Information

Apache Maven

To build your application with Succinct-Core, you can link against this library using Maven by adding the following dependency information to your pom.xml file:

<dependency>
    <groupId>amplab</groupId>
    <artifactId>succinct-core</artifactId>
    <version>0.1.8</version>
</dependency>

Succinct on Apache Spark

We provide Apache Spark and Apache Spark SQL interfaces for Succinct. They expose SuccinctRDD, a compressed, queryable RDD for manipulating unstructured data, and SuccinctKVRDD for querying semi-structured data (key-value pairs, text and JSON documents, etc.). We also expose Succinct as a DataSource in Apache Spark SQL as an experimental feature. More details on the integration with Apache Spark can be found here.

Dependency Information

Apache Maven

To build your application to run with Succinct on Apache Spark, you can link against this library using Apache Maven by adding the following dependency information to your pom.xml file:

<dependency>
    <groupId>amplab</groupId>
    <artifactId>succinct-spark</artifactId>
    <version>0.1.8</version>
</dependency>

SBT and Spark-Packages

Add the dependency to your SBT project by adding the following to build.sbt (see the Spark Packages listing for spark-submit and Maven instructions):

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "amplab" % "succinct" % "0.1.8"

The succinct-spark jar file can also be added to a Spark shell using the --jars command-line option. For example, to include it when starting the Spark shell:

$ bin/spark-shell --jars succinct-0.1.8.jar


Stars: ✭ 257
