
emdaniels / character-extraction

Licence: other
Extracts character names from a text file and performs analysis of text sentences containing the names.

Character Extraction

This program extracts the names of fictional characters from a novel, then analyzes the sentences in which each character appears or is referenced to build a profile of data specific to that character. It was written with the 32-bit version of Python 2.7, using the Natural Language Toolkit (NLTK) 2.0.4 and Pattern 2.6 libraries.
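To make the idea of "names, then sentences per name" concrete, here is a minimal sketch of that two-step flow. It is not the program's actual code. It targets Python 3, uses only the standard library, and swaps the NLTK and Pattern tagging for a crude heuristic: a capitalized word that is not the first word of a sentence and occurs at least `min_count` times is treated as a name. The function names, the `min_count` parameter, and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

WORD = re.compile(r"[A-Za-z']+")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_names(sentences, min_count=2):
    """Heuristic stand-in for a tagger: count capitalized words that are
    not the first word of their sentence and keep the frequent ones."""
    counts = Counter()
    for sentence in sentences:
        for word in WORD.findall(sentence)[1:]:
            if len(word) > 1 and word[0].isupper():
                counts[word] += 1
    return {name for name, n in counts.items() if n >= min_count}


def build_profiles(text, min_count=2):
    """Map each candidate name to the sentences that mention it."""
    sentences = split_sentences(text)
    names = candidate_names(sentences, min_count)
    profiles = defaultdict(list)
    for sentence in sentences:
        mentioned = names & set(WORD.findall(sentence))
        for name in mentioned:
            profiles[name].append(sentence)
    return dict(profiles)


# Hypothetical sample text, not a quotation from the novel.
sample = (
    "The boy was called Oliver. Nobody had fed Oliver that day. "
    "Then Fagin smiled at the boy. The old man Fagin kept a school."
)
profiles = build_profiles(sample)
for name in sorted(profiles):
    print(name, len(profiles[name]))
```

The heuristic has obvious limits: abbreviations such as "Mr." end a sentence early, and place names are counted alongside people. A part-of-speech tagger or named entity recognizer, as provided by the libraries the program uses, is the more robust choice.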

To analyze a different book, add it as a text file to the same directory as the program, change the text file name on line 25 of the program file, and rerun the program. Alternatively, keep the book in a different directory and reference its file path instead.
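The two options above (a bare file name next to the program, or a path to another directory) can be sketched as a small loader. This is an illustration under stated assumptions, not the program's code: it is written for Python 3, the helper names `resolve_book_path` and `load_book` are hypothetical, and UTF-8 is an assumed encoding for the book file.

```python
import os


def resolve_book_path(book, program_dir):
    """Absolute paths are used as given; anything else is looked up
    relative to the directory that holds the program."""
    if os.path.isabs(book):
        return book
    return os.path.join(program_dir, book)


def load_book(path, encoding="utf-8"):
    """Read the whole book into one string (encoding is an assumption)."""
    with open(path, encoding=encoding) as handle:
        return handle.read()
```

A caller would pass the program's own directory, for example `os.path.dirname(os.path.abspath(__file__))`, as `program_dir`, together with the book name that would otherwise be edited in the source.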

References

Oliver Twist

This and all associated files of various formats will be found in: http://www.gutenberg.org/7/3/730/

Produced by Peggy Gaugy and Leigh Little. HTML version by Al Haines. This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at www.gutenberg.net

NLTK

Bird, Steven, Edward Loper and Ewan Klein (2009). Natural Language Processing with Python. O'Reilly Media Inc.

NLTK -- the Natural Language Toolkit -- is a suite of open source Python modules, data sets and tutorials supporting research and development in Natural Language Processing.

NLTK source code is distributed under the Apache 2.0 License. NLTK documentation is distributed under the Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States license. NLTK corpora are provided under the terms given in the README file for each corpus; all are redistributable, and available for non-commercial use. NLTK may be freely redistributed, subject to the provisions of these licenses.

https://github.com/nltk/nltk/blob/develop/LICENSE.txt

Pattern

De Smedt, T., Daelemans, W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2031–2035.

Pattern is a web mining module for Python. It has tools for data mining (web services for Google, Twitter and Wikipedia, web crawler, HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, classification using KNN, SVM, Perceptron) and network analysis (graph centrality and visualization). It is well documented and bundled with 50+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern.

https://github.com/clips/pattern/blob/master/README.txt

License

Copyright 2014-2015 Emily Daniels

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
