NC0DER / GraphOfDocs

License: Apache-2.0
GraphOfDocs: Representing multiple documents as a single graph

Graph-of-docs Text Representation

This repository hosts the code for the two papers listed in the Citation section.

Datasets

Available in this link

Test Results

Edit GraphOfDocs/config_experiments.py to set up the experiments, then run experiments.py.

Installation

Prerequisites:

  • Windows 10 64-bit or Debian-based Linux 64-bit.
  • Python 3 (min. version 3.6) and pip3 (plus the py launcher on Windows).
  • A working Neo4j database (min. version 3.5.12).
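
The Python-side prerequisites above can be checked with a short script. This is only a sketch: the helper names `python_ok` and `pip_available` are made up for illustration and are not part of the project, and the script does not check the Neo4j installation.

```python
import shutil
import sys

MIN_PYTHON = (3, 6)  # minimum Python version stated in the prerequisites


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


def pip_available():
    """Return True if a pip3 executable is found on the PATH."""
    return shutil.which("pip3") is not None


if __name__ == "__main__":
    print("Python >= 3.6:", python_ok())
    print("pip3 on PATH:", pip_available())
```

If either line prints False, install or update the missing component before continuing with the steps below.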

Windows 10

Download the project as a ZIP archive from the GitHub repository page, unzip it,
then open a cmd terminal in the unzipped folder and run pip3 install -r requirements.txt.
This command installs the necessary Python libraries* to run the project.

Debian-based Linux

We ran the following commands to install Python and git,
clone the project to a local folder, and install the necessary Python libraries*.

sudo apt install python3.6
sudo apt install git-all
git clone https://github.com/NC0DER/GraphOfDocs
cd GraphOfDocs
pip3 install -r requirements.txt

* Optionally, you could create a virtual environment first
  to isolate the libraries from your Python user install.
  However, the setup script doesn't downgrade existing libraries,
  so there is no risk of affecting your local user install.
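
For the optional virtual environment mentioned in the note above, one possible approach uses Python's standard `venv` module. This is a sketch under assumptions: the environment name `.venv` and the helpers `venv_python` and `install_in_venv` are hypothetical and do not come from the project.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir, windows):
    """Return the path of the Python interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if windows:
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_in_venv(env_dir=".venv", requirements="requirements.txt"):
    """Create a virtual environment and install the requirements into it."""
    # with_pip=True bootstraps pip inside the new environment.
    venv.create(env_dir, with_pip=True)
    python = venv_python(env_dir, windows=(os.name == "nt"))
    # Run pip through the environment's own interpreter so that the
    # libraries are installed there and not in the user install.
    subprocess.run(
        [str(python), "-m", "pip", "install", "-r", requirements],
        check=True,
    )
```

Calling `install_in_venv()` from the project folder creates the environment and installs the libraries into it. The project scripts then need to be run with the interpreter returned by `venv_python`.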

Database Setup (Windows / Linux)

Create a new database from the Neo4j Desktop app, using version 3.5.12 or later.
Update your memory settings to match the values, and install the extra plugins, depicted in the image.

image2

Hint: if you use a dedicated server that only runs Neo4j, you could increase these values accordingly, as specified in the comments of these parameters.

Run the GraphOfDocs.py script, which creates thousands of nodes and millions of relationships in the database.
Once it finishes, the database is initialized and ready for use.
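
To confirm that the script populated the database, you could count the nodes and relationships. The following is a minimal sketch that assumes the official `neo4j` Python driver is installed. The bolt URI and the credentials are placeholders, and `count_graph` is a hypothetical helper, not part of the project.

```python
NODE_QUERY = "MATCH (n) RETURN count(n) AS total"
REL_QUERY = "MATCH ()-[r]->() RETURN count(r) AS total"


def count_graph(session):
    """Return (node_count, relationship_count) for the connected database."""
    nodes = session.run(NODE_QUERY).single()["total"]
    rels = session.run(REL_QUERY).single()["total"]
    return nodes, rels


def main():
    # Imported here so count_graph can be reused without the driver installed.
    from neo4j import GraphDatabase

    # Placeholder connection details (assumptions); replace with your own.
    driver = GraphDatabase.driver(
        "bolt://localhost:7687", auth=("neo4j", "password")
    )
    with driver.session() as session:
        nodes, rels = count_graph(session)
    driver.close()
    print(f"{nodes} nodes, {rels} relationships")
```

Calling `main()` after GraphOfDocs.py finishes should report counts in the range described above. Counts of zero suggest that the script did not run against this database.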

Running the app

You could use the Neo4j Browser to run your queries. For large queries, you could use the custom visualization tool
visualize.html, which is located in the GraphOfDocs subdirectory.

Citation

On a novel representation of multiple textual documents in a single graph (KES-IDT 2020) paper:

Giarelis N., Kanakaris N., Karacapilidis N. (2020) On a Novel Representation of Multiple Textual Documents in a Single Graph. In: Czarnowski I., Howlett R., Jain L. (eds) Intelligent Decision Technologies. IDT 2020. Smart Innovation, Systems and Technologies, vol 193. Springer, Singapore
@InProceedings{10.1007/978-981-15-5925-9_9,
author="Giarelis, Nikolaos
and Kanakaris, Nikos
and Karacapilidis, Nikos",
editor="Czarnowski, Ireneusz
and Howlett, Robert J.
and Jain, Lakhmi C.",
title="On a Novel Representation of Multiple Textual Documents in a Single Graph",
booktitle="Intelligent Decision Technologies",
year="2020",
publisher="Springer Singapore",
address="Singapore",
pages="105--115",
abstract="This paper introduces a novel approach to represent multiple documents as a single graph, namely, the graph-of-docs model, together with an associated novel algorithm for text categorization. The proposed approach enables the investigation of the importance of a term into a whole corpus of documents and supports the inclusion of relationship edges between documents, thus enabling the calculation of important metrics as far as documents are concerned. Compared to well-tried existing solutions, our initial experimentations demonstrate a significant improvement of the accuracy of the text categorization process. For the experimentations reported in this paper, we used a well-known dataset containing about 19,000 documents organized in various subjects.",
isbn="978-981-15-5925-9"
}

An innovative graph-based approach to advance feature selection from multiple textual documents (AIAI 2020) paper:

Giarelis N., Kanakaris N., Karacapilidis N. (2020) An Innovative Graph-Based Approach to Advance Feature Selection from Multiple Textual Documents. In: Maglogiannis I., Iliadis L., Pimenidis E. (eds) Artificial Intelligence Applications and Innovations. AIAI 2020. IFIP Advances in Information and Communication Technology, vol 583. Springer, Cham
@InProceedings{10.1007/978-3-030-49161-1_9,
author="Giarelis, Nikolaos
and Kanakaris, Nikos
and Karacapilidis, Nikos",
editor="Maglogiannis, Ilias
and Iliadis, Lazaros
and Pimenidis, Elias",
title="An Innovative Graph-Based Approach to Advance Feature Selection from Multiple Textual Documents",
booktitle="Artificial Intelligence Applications and Innovations",
year="2020",
publisher="Springer International Publishing",
address="Cham",
pages="96--106",
abstract="This paper introduces a novel graph-based approach to select features from multiple textual documents. The proposed solution enables the investigation of the importance of a term into a whole corpus of documents by utilizing contemporary graph theory methods, such as community detection algorithms and node centrality measures. Compared to well-tried existing solutions, evaluation results show that the proposed approach increases the accuracy of most text classifiers employed and decreases the number of features required to achieve `state-of-the-art' accuracy. Well-known datasets used for the experimentations reported in this paper include 20Newsgroups, LingSpam, Amazon Reviews and Reuters.",
isbn="978-3-030-49161-1"
}
