InsightEdge

Documentation: User Guide
Community: Slack Channel, StackOverflow tag, Email
Contributing: Contribution Guide
Issue Tracker: Jira
License: Apache 2.0

InsightEdge is a Spark distribution on top of an in-memory Data Grid. It provides a single platform for analytical and transactional workloads.

Features

  • Exposes Data Grid as Spark RDDs
  • Saves Spark RDDs to Data Grid
  • Full DataFrames and Dataset API support with persistence
  • Geospatial API for RDDs and DataFrames, with geospatial indexes
  • Transparent integration with SparkContext using Scala implicits
  • Data Grid side filtering with the ability to apply indexes
  • Running SQL queries in Spark over Data Grid
  • Data locality between Spark and Data Grid nodes
  • Storing MLlib models in Data Grid
  • Continuously saving Spark Streaming computation to Data Grid
  • Off-Heap persistence
  • Interactive Web Notebook
  • Python support

Building InsightEdge

InsightEdge is built using Apache Maven.

First, compile and install InsightEdge Core libraries:

# without unit tests
mvn clean install -DskipTests=true

# with unit tests
mvn clean install

To build the InsightEdge zip distribution, you need the following binary dependencies:

  • insightedge-datagrid 12.3.0: download a copy of the XAP 12.x Open Source Edition
  • insightedge-examples: use the same branch as in this repo; build instructions are in that repository's README
  • insightedge-zeppelin: use the same branch as in this repo, run ./dev/change_scala_version.sh 2.11, and then build with mvn clean install -DskipTests -P spark-2.1 -P scala-2.11 -P build-distr -Dspark.version=2.1.1
  • Apache Spark 2.3.0: download zip
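
The insightedge-zeppelin step in the list above packs two commands into one line. A minimal sketch of that step as a script follows, assuming it is run from the root of an insightedge-zeppelin checkout. The run helper, the build_zeppelin function, and the DRY_RUN variable are illustrative names introduced here and are not part of the project.

```shell
#!/bin/sh
# Sketch of the insightedge-zeppelin build step.
# Assumption: the current directory is the root of the insightedge-zeppelin checkout.
# run, build_zeppelin and DRY_RUN are hypothetical helpers for this example only.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Switch the sources to Scala 2.11, then build; stop if the first step fails.
build_zeppelin() {
    run ./dev/change_scala_version.sh 2.11 &&
    run mvn clean install -DskipTests -P spark-2.1 -P scala-2.11 -P build-distr -Dspark.version=2.1.1
}

# Preview the two commands without running them.
DRY_RUN=1 build_zeppelin
```

Calling build_zeppelin without DRY_RUN=1 runs the two commands for real.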

Package InsightEdge distribution:

mvn clean package -P package-open -DskipTests=true -Ddist.spark=<path to spark.tgz> -Ddist.xap=file:///<path to xap.zip> -Ddist.zeppelin=<path to zeppelin.tar.gz> -Ddist.examples.target=<path to examples target>
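
The command above takes four locations, and only dist.xap is given as a file:/// URL. As a rough sketch, the command line can be assembled from plain paths as shown below. The package_cmd function and the /tmp paths are placeholders made up for this example, and the sketch assumes absolute paths without spaces.

```shell
#!/bin/sh
# Sketch: assemble the packaging command from four artifact locations.
# package_cmd and the example paths are hypothetical; only the mvn flags
# come from the README. Absolute paths without spaces are assumed.

# Print the mvn command line for the given artifact paths.
package_cmd() {
    spark_tgz=$1
    xap_zip=$2
    zeppelin_tgz=$3
    examples_target=$4
    # Drop the leading slash of the XAP path so the result is
    # file:///tmp/xap.zip rather than file:////tmp/xap.zip.
    printf 'mvn clean package -P package-open -DskipTests=true -Ddist.spark=%s -Ddist.xap=file:///%s -Ddist.zeppelin=%s -Ddist.examples.target=%s\n' \
        "$spark_tgz" "${xap_zip#/}" "$zeppelin_tgz" "$examples_target"
}

# Print the command for placeholder paths; replace them with real locations.
package_cmd /tmp/spark.tgz /tmp/xap.zip /tmp/zeppelin.tar.gz /tmp/examples/target
```

The function only prints the command, so the result can be inspected before it is pasted into a terminal.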

The archive is generated under the insightedge-packager/target/open directory. The unpacked archive content is under insightedge-packager/target/contents-community.

To run integration tests, refer to the wiki page.

Quick Start

Build the project, then start InsightEdge in demo mode with:

cd insightedge-packager/target/contents-community
./bin/insightedge -demo

This starts Zeppelin at http://127.0.0.1:9090 with the InsightEdge tutorial and example notebooks that you can experiment with. The full documentation is available on the website.
