zouzias / spark-lucenerdd-examples

Licence: Apache-2.0
Examples of spark-lucenerdd

spark-lucenerdd-examples

Examples of spark-lucenerdd.

Datasets and Entity Linkage

The following pairs of datasets are used to demonstrate the accuracy of the record linkage methods. The goal is to show how easy the spark-lucenerdd library is to use; no tuning or optimization is attempted.

| Dataset | Domain | Attributes | Accuracy (top-1) | References |
|---|---|---|---|---|
| DBLP vs ACM article | Bibliographic | title, authors, venue, year | 0.98 | Benchmark datasets for entity resolution |
| DBLP vs Scholar article | Bibliographic | title, authors, venue, year | 0.953 | Benchmark datasets for entity resolution |
| Amazon vs Google products | E-commerce | name, description, manufacturer, price | 0.58 | Benchmark datasets for entity resolution |
| Abt vs Buy products | E-commerce | name, description, manufacturer, price | 0.64 | Benchmark datasets for entity resolution |

The accuracy reported above is obtained by taking the first result of the top-K result list as the linked entity.
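The top-1 rule can be illustrated with plain Scala collections. This is a minimal sketch rather than code from the examples: the object and method names are hypothetical, and it assumes that accuracy is measured over the ground-truth pairs, with a query that returns no results counted as a miss.

```scala
object Top1Accuracy {
  // results: for each query id, the ranked candidate ids (best match first).
  // truth:   the known correct match for each query id (assumed ground truth).
  def top1Accuracy(results: Map[String, Seq[String]],
                   truth: Map[String, String]): Double = {
    if (truth.isEmpty) 0.0
    else {
      // A query is correct only if the first candidate equals the expected id.
      val correct = truth.count { case (queryId, expected) =>
        results.get(queryId).flatMap(_.headOption).contains(expected)
      }
      correct.toDouble / truth.size
    }
  }
}
```

Candidates beyond the first position are ignored, so a correct match ranked second still counts as an error under this measure.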

All datasets are available in Spark-friendly Parquet format here; the original datasets are available here.

Spatial linkage between countries and capitals

This example loads all countries from a Parquet file containing the fields "name" and "shape" (shape is mostly polygons in WKT):

val allCountries = spark.read.parquet("data/spatial/countries-poly.parquet")

Then it loads all capitals from a Parquet file containing the fields "name" and "shape" (shape is mostly points in WKT):

val capitals = spark.read.parquet("data/spatial/capitals.parquet")

A ShapeLuceneRDD instance is created on the countries and a linkageByRadius is performed on the capitals. The output is presented in the logs.
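A radius-based linkage needs a coordinate pair for each capital, so the WKT point string has to be turned into numbers at some stage. The sketch below shows one way to do that in plain Scala. It assumes the "shape" values look like `POINT (2.35 48.85)`; the helper `parseWktPoint` is hypothetical and is not part of spark-lucenerdd.

```scala
object WktPoint {
  // A signed decimal number, optionally with an exponent.
  private val Num = """[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?"""

  // Matches a 2D WKT point such as "POINT (2.35 48.85)"; case-insensitive.
  private val PointPattern =
    ("""(?i)^\s*POINT\s*\(\s*(""" + Num + """)\s+(""" + Num + """)\s*\)\s*$""").r

  // Returns Some((x, y)) for a 2D WKT point, None for anything else
  // (polygons, malformed strings, 3D points).
  def parseWktPoint(wkt: String): Option[(Double, Double)] = wkt match {
    case PointPattern(x, y) => Some((x.toDouble, y.toDouble))
    case _                  => None
  }
}
```

Returning an Option keeps non-point shapes (the source notes the column is only "mostly" points) from failing the whole job; the caller decides whether to drop or log them.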

Development

Usage (spark-submit)

Install Java and SBT, then clone and build the project:

git clone https://github.com/zouzias/spark-lucenerdd-examples.git
cd spark-lucenerdd-examples
sbt compile assembly

Download and extract Apache Spark under your home directory, update the spark-submit.sh script accordingly, and run

./spark-linkage-*.sh

to run the record linkage examples and ./spark-search-capitalts.sh to run a search example.

Usage (docker)

Set up Docker and, assuming that you have a Docker machine named default, type

./startZeppelin.sh

to start an Apache Zeppelin instance with preloaded notebooks.
