
hibayesian / spark-word2vec

License: Apache-2.0
A parallel implementation of word2vec based on Spark


Projects that are alternatives of or similar to spark-word2vec

fastdata-cluster
Fast Data Cluster (Apache Cassandra, Kafka, Spark, Flink, YARN and HDFS with Vagrant and VirtualBox)
Stars: ✭ 20 (-16.67%)
Mutual labels:  spark
spark-druid-olap
Sparkline BI Accelerator provides fast ad-hoc query capability over Logical Cubes. This has been folded into our SNAP Platform (http://bit.ly/2oBJSpP), an integrated BI platform on Apache Spark.
Stars: ✭ 286 (+1091.67%)
Mutual labels:  spark
yuzhouwan
Code Library for My Blog
Stars: ✭ 39 (+62.5%)
Mutual labels:  spark
Spark-Ar
Resources for Spark AR
Stars: ✭ 43 (+79.17%)
Mutual labels:  spark
ODSC India 2018
My presentation at ODSC India 2018 about Deep Learning with Apache Spark
Stars: ✭ 26 (+8.33%)
Mutual labels:  spark
data processing course
Some class materials for a data processing course using PySpark
Stars: ✭ 50 (+108.33%)
Mutual labels:  spark
spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (+112.5%)
Mutual labels:  spark
shamash
Autoscaling for Google Cloud Dataproc
Stars: ✭ 31 (+29.17%)
Mutual labels:  spark
reach
Load embeddings and featurize your sentences.
Stars: ✭ 17 (-29.17%)
Mutual labels:  word2vec
spark-gradle-template
Apache Spark in your IDE with gradle
Stars: ✭ 39 (+62.5%)
Mutual labels:  spark
sparkar-volts
An extensive non-reactive Typescript framework that eases the development experience in Spark AR
Stars: ✭ 15 (-37.5%)
Mutual labels:  spark
swordfish
Open-source distributed workflow scheduling tool that also supports streaming tasks.
Stars: ✭ 35 (+45.83%)
Mutual labels:  spark
openverse-catalog
Identifies and collects data on CC-licensed content across web crawl data and public APIs.
Stars: ✭ 27 (+12.5%)
Mutual labels:  spark
experiments
Code examples for my blog posts
Stars: ✭ 21 (-12.5%)
Mutual labels:  spark
Search Ads Web Service
Online search advertisement platform & Realtime Campaign Monitoring [Maybe Deprecated]
Stars: ✭ 30 (+25%)
Mutual labels:  spark
splink
Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including EM algorithm to estimate parameters
Stars: ✭ 181 (+654.17%)
Mutual labels:  spark
awesome-AI-kubernetes
❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be Native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++ etc
Stars: ✭ 95 (+295.83%)
Mutual labels:  spark
spark-kubernetes
spark on kubernetes
Stars: ✭ 80 (+233.33%)
Mutual labels:  spark
sent2vec
How to encode sentences in a high-dimensional vector space, a.k.a., sentence embedding.
Stars: ✭ 99 (+312.5%)
Mutual labels:  word2vec
spark-util
low-level helpers for Apache Spark libraries and tests
Stars: ✭ 16 (-33.33%)
Mutual labels:  spark

Spark-Word2Vec

Spark-Word2Vec creates vector representations of words in a text corpus. It is based on the implementation of word2vec in Spark MLlib. Several optimization techniques are used to make the algorithm more scalable and accurate.

Highlights

  • Two models, CBOW and Skip-gram, are supported in our implementation.
  • Both hierarchical softmax and negative sampling are supported for training the model.
  • The sub-sampling trick can be used to achieve both faster training and significantly better representations of uncommon words.
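As a sketch of how these training options might be selected, the snippet below configures the estimator; the setter names for CBOW, negative sampling, and sub-sampling are assumptions for illustration and may differ from this project's actual API (check the source for the exact parameter names):

```scala
import org.apache.spark.ml.feature.Word2Vec

// Hypothetical configuration sketch. setVectorSize/setMinCount/setWindowSize
// follow the MLlib estimator; the commented setters below are assumed names
// for this project's extensions and may not match its real API.
val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(100)
  .setMinCount(5)
  .setWindowSize(5)
  // .setCBOW(1)        // assumed: use CBOW instead of Skip-gram
  // .setNegative(5)    // assumed: negative sampling with 5 noise words
  // .setSample(1e-3)   // assumed: sub-sampling threshold for frequent words
```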

Examples

Scala API

import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession
  .builder
  .appName("Word2Vec example")
  .master("local[*]")
  .getOrCreate()

// Input data: each row is a bag of words from a sentence or document.
val documentDF = spark.createDataFrame(Seq(
  "Hi I heard about Spark".split(" "),
  "I wish Java could use case classes".split(" "),
  "Logistic regression models are neat".split(" ")
).map(Tuple1.apply)).toDF("text")

// Learn a mapping from words to vectors.
val word2Vec = new Word2Vec()
  .setInputCol("text")
  .setOutputCol("result")
  .setVectorSize(3)
  .setMinCount(0)
val model = word2Vec.fit(documentDF)

val result = model.transform(documentDF)
result.collect().foreach { case Row(text: Seq[_], features: Vector) =>
  println(s"Text: [${text.mkString(", ")}] => \nVector: $features\n")
}

spark.stop()

Requirements

Spark-Word2Vec is built against Spark 2.1.1.
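For reference, a minimal build.sbt for a project that depends on Spark 2.1.1 might look like the following; the Scala version is an assumption based on what Spark 2.1.1 distributions were typically built with:

```scala
// build.sbt — versions here are assumptions matching the Spark 2.1.1 era
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.1.1" % "provided",
  "org.apache.spark" %% "spark-mllib" % "2.1.1" % "provided"
)
```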

Build From Source

sbt package
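The packaged jar can then be supplied to an application with spark-submit; the artifact path and class name below are illustrative only and depend on your Scala version and the project's version setting:

```shell
# Illustrative invocation — jar name and main class are assumptions.
spark-submit \
  --master "local[*]" \
  --jars target/scala-2.11/spark-word2vec_2.11-1.0.jar \
  --class com.example.Word2VecApp \
  your-app.jar
```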

License

Spark-Word2Vec is available under the Apache License 2.0.

Contact & Feedback

If you encounter bugs, feel free to submit an issue or a pull request. You can also reach out by email:
