
wadhwasahil / Relation_extraction

Relation Extraction using Deep learning(CNN)


Relation_Extraction

Relation Classification via Convolutional Deep Neural Network

The code is an implementation of the paper http://www.aclweb.org/anthology/C14-1220 using tensorflow.

## Algorithm

  • I largely followed the technique from the paper above, tweaking only a few parameters such as the dimensions of the word vectors and position vectors, the optimization function, and so on.
  • The basic architecture is a convolution layer, a max-pool layer, and a final softmax layer. Convolution and max-pool layers can always be added or removed between the input layer and the final softmax layer; I used only one of each.
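The conv → max-pool → softmax pipeline described above can be sketched in plain NumPy. All dimensions and weights below are toy assumptions for illustration, not the repo's actual sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions; the paper and this repo use their own sizes).
seq_len, word_dim, pos_dim = 10, 8, 4          # sentence length, word/position embedding sizes
emb_dim = word_dim + 2 * pos_dim               # word vector + two position vectors per token
filter_width, num_filters, num_classes = 3, 6, 2

# One sentence: each token is its word embedding concatenated with position
# embeddings relative to the two entities (as in Zeng et al., 2014).
x = rng.normal(size=(seq_len, emb_dim))

# Convolution layer: slide a window of filter_width tokens over the sentence.
W = rng.normal(size=(num_filters, filter_width * emb_dim))
conv = np.array([
    np.tanh(W @ x[i:i + filter_width].ravel())
    for i in range(seq_len - filter_width + 1)
])                                             # shape: (seq_len - filter_width + 1, num_filters)

# Max pooling over time: keep one value per filter.
pooled = conv.max(axis=0)                      # shape: (num_filters,)

# Final softmax layer over the relation classes.
W_out = rng.normal(size=(num_classes, num_filters))
logits = W_out @ pooled
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs)                                   # a probability per relation class
```

The max-pool step is what makes the model length-independent: however long the sentence, each filter contributes exactly one feature to the softmax layer.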

## Files

  • text_cnn.py - A class implementing the model architecture. It accepts the input and contains all the layers, such as conv2d (the convolution layer) and max_pool, which process the input vector and produce predictions for each class.
  • data_helpers.py - A generic script with helpers for generating batches, loading the training data, etc.
  • train.py - Builds the input vectors from the training data, trains the model, and saves it to disk.
  • temp.py - PySpark code that fetches rows from an HBase table and predicts the class of each row using the trained model.
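The batch-generation helper in data_helpers.py might look roughly like this; the function name and signature are assumptions for illustration:

```python
import numpy as np

def batch_iter(data, batch_size, num_epochs, shuffle=True, seed=0):
    """Yield mini-batches of `data` for `num_epochs` epochs,
    reshuffling at the start of each epoch when `shuffle` is set."""
    data = np.asarray(data, dtype=object)
    rng = np.random.default_rng(seed)
    n = len(data)
    batches_per_epoch = (n + batch_size - 1) // batch_size  # ceil division
    for _ in range(num_epochs):
        idx = rng.permutation(n) if shuffle else np.arange(n)
        for b in range(batches_per_epoch):
            yield data[idx[b * batch_size:(b + 1) * batch_size]]

# Usage: 10 examples, batch size 4 -> batches of 4, 4, and 2 examples.
batches = list(batch_iter(list(range(10)), batch_size=4, num_epochs=1))
print([len(b) for b in batches])  # → [4, 4, 2]
```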

## Challenges

  • My training data is only around 7K rows, so the accuracy on the test set is around 70.34%. As the training set grows, the model should perform much better.
  • My dataset consists of inter-sentential entities linked by a cause-effect relationship; however, the model can be extended to an n-class problem.

## TODO

  • Use RNNs, perhaps LSTMs, for training.
  • Fine-tune the model.

### PySpark

PySpark can now be used with TensorFlow for online training and testing. In my case I am using PySpark for online testing.
