harrisonpim / bookworm

Licence: MIT license
📚 social networks from novels

Bookworm 📚

Most novels are, in some way, a description of a social network. Bookworm ingests novels, builds a solid version of their implicit character network and spits out an intuitively understandable and deeply analysable graph.

Navigation

  • bookworm for the code itself.
  • notebooks for example usage, interwoven with a description of how the thing actually works, in Jupyter notebook form. Start here.
  • data for a description of how to get hold of data so that you can run bookworm yourself.

Usage

Command Line Usage

The bookworm('path/to/book.txt') function wraps the steps described under Detailed API Usage into one simple command, so the entire analysis can be run from the command line:

python run_bookworm.py --path 'path/to/book.txt'
  • Add --d3 to format the output for interpretation by the d3.js force-directed graph
  • Add --threshold n where n is an integer to specify the minimum character interaction strength to be included in the output (default 2)
  • Add --output_file 'path/to/file' to specify where the .json or .csv should be left
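
The flags above could be parsed with something like the following. This is only a sketch built from the descriptions in this README: build_parser is a hypothetical name, and the real run_bookworm.py may define its arguments differently.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags described in this README;
    # the actual run_bookworm.py may not look like this.
    parser = argparse.ArgumentParser(
        description="Build a character network from a novel")
    parser.add_argument("--path", required=True,
                        help="path to the book's .txt file")
    parser.add_argument("--d3", action="store_true",
                        help="format output for a d3.js force-directed graph")
    parser.add_argument("--threshold", type=int, default=2,
                        help="minimum interaction strength to include")
    parser.add_argument("--output_file", default=None,
                        help="where to leave the .json or .csv")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is
# easy to try in a notebook.
args = build_parser().parse_args(
    ["--path", "path/to/book.txt", "--threshold", "3"])
print(args.path, args.d3, args.threshold, args.output_file)
```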

Detailed API Usage

Start by loading in a book

book = load_book('path/to/book.txt')

Split the book into individual sentences, sequences of n words, or sequences of n characters by running, respectively:

sequences = get_sentence_sequences(book)
sequences = get_word_sequences(book, n=50)
sequences = get_character_sequences(book, n=200)
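
To give a feel for what these three splits mean, here is a rough standalone sketch. It is not bookworm's implementation: the function names are hypothetical, the chunks are assumed to be non-overlapping, and the sentence splitter is a naive regular expression.

```python
import re


def sentence_sequences(text):
    # Naive split: break on whitespace that follows '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [part.strip() for part in parts if part.strip()]


def word_sequences(text, n=50):
    # Consecutive, non-overlapping chunks of n whitespace-separated words.
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


def character_sequences(text, n=200):
    # Consecutive, non-overlapping chunks of n characters.
    return [text[i:i + n] for i in range(0, len(text), n)]


sample = "Alice met Bob. Bob waved! Did Carol see?"
print(sentence_sequences(sample))
print(word_sequences(sample, n=3))
print(character_sequences("abcdefg", n=3))
```

Shorter sequences give a stricter notion of two characters "meeting"; longer ones pick up looser associations.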

Manually input a list of character names, or automatically extract a list of 'plausible' character names, by using, respectively:

characters = load_characters('path/to/character_list.csv')
characters = extract_character_names(book)
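
One naive way to think about 'plausible' names is to count capitalised words that are not at the start of a sentence. The sketch below illustrates that idea only; it is an assumption for illustration, not how extract_character_names works, and plausible_names and min_count are made-up names.

```python
import re
from collections import Counter


def plausible_names(text, min_count=2):
    # Illustrative heuristic only: count capitalised words that do not
    # open a sentence, then keep those seen at least min_count times.
    candidates = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        # Skip the first word, which is capitalised regardless.
        for word in words[1:]:
            if word[0].isupper() and word != "I":
                candidates[word] += 1
    return [name for name, count in candidates.most_common()
            if count >= min_count]


sample = ("Then Alice met Bob. Later Bob greeted Alice. "
          "Soon Bob left for London.")
print(plausible_names(sample))
```

A heuristic like this will also pick up place names and other proper nouns, which is one reason a hand-written character list can be the better option.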

Find instances of each character in each sequence with find_connections(), count their co-occurrences with calculate_cooccurence(), and transform the result into a more easily interpretable format using get_interaction_df():

df = find_connections(sequences, characters)
cooccurence = calculate_cooccurence(df)
interaction_df = get_interaction_df(cooccurence, characters)
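
The idea behind these three steps can be sketched in plain Python. This is not bookworm's code: the helper names are hypothetical, the 'value' field is an assumed name for the interaction strength, and matching is a simple case-sensitive substring test.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(sequences, characters):
    # For every sequence, find which characters appear in it, then add one
    # to the count of every pair of characters that appear together.
    counts = Counter()
    for sequence in sequences:
        present = sorted(name for name in characters if name in sequence)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


def interaction_rows(counts, threshold=2):
    # Keep pairs whose strength is at least the threshold, as
    # source/target rows ('value' is an assumed column name).
    return [
        {"source": a, "target": b, "value": n}
        for (a, b), n in sorted(counts.items())
        if n >= threshold
    ]


sequences = ["Alice met Bob", "Bob and Alice spoke",
             "Carol saw Bob", "Alice alone"]
characters = ["Alice", "Bob", "Carol"]
counts = count_cooccurrences(sequences, characters)
print(interaction_rows(counts, threshold=2))
```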

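The resulting dataframe can easily be transformed into a networkx graph using from_pandas_edgelist (called from_pandas_dataframe in networkx versions before 2.0):

import networkx as nx

graph = nx.from_pandas_edgelist(interaction_df,
                                source='source',
                                target='target')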

From there, all sorts of interesting analysis can be done. See the project's associated Jupyter notebooks and the networkx documentation for more details.

Slides

I presented a bunch of this stuff at
