
sudhamstarun / Understanding Financial Reports Using Natural Language Processing

Licence: MIT
Investigate how mutual funds leverage credit derivatives by studying their routine filings to the SEC using NLP techniques 📈🤑

Projects that are alternatives of or similar to Understanding Financial Reports Using Natural Language Processing

Awesome Hungarian Nlp
A curated list of NLP resources for Hungarian
Stars: ✭ 121 (+236.11%)
Mutual labels:  natural-language-processing, named-entity-recognition, information-extraction
Nested Ner Tacl2020 Transformers
Implementation of Nested Named Entity Recognition using BERT
Stars: ✭ 76 (+111.11%)
Mutual labels:  natural-language-processing, named-entity-recognition, information-extraction
Nlp Progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Stars: ✭ 19,518 (+54116.67%)
Mutual labels:  natural-language-processing, named-entity-recognition
Transformers Tutorials
GitHub repo with tutorials to fine-tune transformers for different NLP tasks
Stars: ✭ 384 (+966.67%)
Mutual labels:  natural-language-processing, named-entity-recognition
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+1177.78%)
Mutual labels:  natural-language-processing, named-entity-recognition
Vncorenlp
A Vietnamese natural language processing toolkit (NAACL 2018)
Stars: ✭ 354 (+883.33%)
Mutual labels:  natural-language-processing, named-entity-recognition
Spacy Streamlit
👑 spaCy building blocks and visualizers for Streamlit apps
Stars: ✭ 360 (+900%)
Mutual labels:  natural-language-processing, named-entity-recognition
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+60950%)
Mutual labels:  natural-language-processing, named-entity-recognition
Medacy
🏥 Medical Text Mining and Information Extraction with spaCy
Stars: ✭ 287 (+697.22%)
Mutual labels:  natural-language-processing, information-extraction
Hanlp
Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labelling, coreference resolution, style transfer, semantic similarity, new word discovery, keyphrase extraction, automatic summarisation, text classification and clustering, pinyin and simplified-traditional conversion, natural language processing
Stars: ✭ 24,626 (+68305.56%)
Mutual labels:  natural-language-processing, named-entity-recognition
Ner Lstm
Named Entity Recognition using multilayered bidirectional LSTM
Stars: ✭ 532 (+1377.78%)
Mutual labels:  natural-language-processing, named-entity-recognition
Stanza
Official Stanford NLP Python Library for Many Human Languages
Stars: ✭ 5,887 (+16252.78%)
Mutual labels:  natural-language-processing, named-entity-recognition
Snips Nlu
Snips Python library to extract meaning from text
Stars: ✭ 3,583 (+9852.78%)
Mutual labels:  named-entity-recognition, information-extraction
Gcn Over Pruned Trees
Graph Convolution over Pruned Dependency Trees Improves Relation Extraction (authors' PyTorch implementation)
Stars: ✭ 312 (+766.67%)
Mutual labels:  natural-language-processing, information-extraction
Usc Ds Relationextraction
Distantly Supervised Relation Extraction
Stars: ✭ 378 (+950%)
Mutual labels:  natural-language-processing, information-extraction
Ner
Named Entity Recognition
Stars: ✭ 288 (+700%)
Mutual labels:  natural-language-processing, named-entity-recognition
Neuronlp2
Deep neural models for core NLP tasks (Pytorch version)
Stars: ✭ 397 (+1002.78%)
Mutual labels:  natural-language-processing, named-entity-recognition
Named Entity Recognition
Named entity recognition with a recurrent neural network (RNN) in TensorFlow
Stars: ✭ 20 (-44.44%)
Mutual labels:  natural-language-processing, named-entity-recognition
Chatbot ner
chatbot_ner: Named Entity Recognition for chatbots.
Stars: ✭ 273 (+658.33%)
Mutual labels:  natural-language-processing, named-entity-recognition
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+686.11%)
Mutual labels:  natural-language-processing, information-extraction

Understanding Financial Reports using Natural Language Processing

This project serves as my undergraduate Computer Science thesis in Natural Language Processing.

Background

This project investigates how mutual funds leverage credit derivatives by studying their routine filings to the U.S. Securities and Exchange Commission (SEC). Credit derivatives are used to transfer the credit risk of an underlying entity from one party to another without transferring the underlying entity itself.

Instead of studying all credit derivatives, we focus on the Credit Default Swap (CDS), one of the most popular credit derivatives and one widely considered a culprit of the 2007-2008 financial crisis. A credit default swap is a particular type of swap designed to transfer the credit exposure of fixed-income products between two or more parties. In a credit default swap, the buyer of the swap makes payments to the swap's seller up until the maturity date of the contract. In return, the seller agrees that, in the event that the debt issuer defaults or experiences another credit event, the seller will pay the buyer the security's premium as well as all interest payments that would have been paid between that time and the security's maturity date.
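To make these cash flows concrete, here is a toy Python sketch that simply enumerates the quarterly premium payments the buyer owes and the contingent payout the seller owes on default. All figures (notional, spread, default quarter, recovery rate) are hypothetical and chosen only for illustration.

# Toy illustration of CDS cash flows (all figures hypothetical).
notional = 10_000_000      # protected notional in USD
spread = 0.01              # 100 bps annual premium
years = 5                  # contract maturity
default_quarter = 14       # assume a credit event in quarter 14
recovery_rate = 0.4        # assumed recovery on the reference obligation

quarterly_premium = notional * spread / 4

for q in range(1, years * 4 + 1):
    if q < default_quarter:
        print(f"Q{q}: buyer pays seller {quarterly_premium:,.0f}")
    else:
        # On the credit event, the seller covers the loss given default.
        payout = notional * (1 - recovery_rate)
        print(f"Q{q}: credit event -- seller pays buyer {payout:,.0f}")
        break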

CDS contracts are traded over the counter, so little public information about their trading activity is available to outside investors. Yet such information is valuable. The CDS is designed as a hedging tool that buyers use to protect themselves from potential default events of the reference entity, but it is also used for speculation and liquidity management, especially during a crisis.

Before the SEC mandated more frequent and detailed fund holdings reporting at the end of 2016, mutual funds filed these forms in inconsistent formats, which made it extremely difficult to extract information from the reports for further analysis. Previous studies have explored how mutual funds make use of CDS (Adam and Guttler, 2015; Jiang and Zhu, 2016), but they examined only a fraction of institutions over a short period of time. In this project, we aim to extract as much CDS-related information as possible from all the filings available to date to enable more thorough downstream analysis. Because this information appears not only in charts but also in words, Natural Language Processing (NLP) is the key.

Tools Used

  1. The core of this project can be recognised as a Named Entity Recognition task, so we implemented a BiLSTM-CRF model and a CRF model to conduct sequence labelling on unstructured data (see the sketch after this list). The implementation is still in progress and can be found here: https://github.com/sudhamstarun/AwesomeNER
  2. A RESTful web application was developed to serve as a Credit Default Swap search engine, making it easy for researchers and analysts to access all historical mentions of credit default swaps by simply searching for a counterparty or reference entity: https://github.com/sudhamstarun/Credit-Default-Swap-Search-Engine
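A minimal sketch of the CRF variant of this sequence-labelling setup, assuming the sklearn-crfsuite package is installed; the example sentence, BIO tags, and feature choices below are illustrative and not the project's actual feature set.

import sklearn_crfsuite

def token_features(sentence, i):
    # Simple per-token features: the word itself, casing, and neighbours.
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "word.isupper": word.isupper(),
        "word.istitle": word.istitle(),
        "prev.lower": sentence[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sentence[i + 1].lower() if i < len(sentence) - 1 else "<EOS>",
    }

# One hand-labelled sentence in BIO format (hypothetical example).
sentence = ["Fund", "sold", "protection", "to", "Morgan", "Stanley", "."]
labels   = ["O", "O", "O", "O", "B-COUNTERPARTY", "I-COUNTERPARTY", "O"]

X = [[token_features(sentence, i) for i in range(len(sentence))]]
y = [labels]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=100)
crf.fit(X, y)
print(crf.predict(X))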

Basic Folder Structure

  1. The Data Crawling folder contains the web crawling scripts, written in Python, that extract the N-CSR, N-CSRS, and N-Q reports from the SEC website (a sketch of this step follows the list below).
  2. The Data Preprocessing folder contains two further folders dedicated to:
    1. Restructuring Scripts: These scripts restructure the data extracted from the SEC website (148 GB) into its current folder hierarchy, shown in the image below. A noteworthy script is:
      1. restructure.sh: This script restructures the initial folder layout into three different folders for N-CSR, N-CSRS, and N-Q filings.
    2. Sentence Extraction: Python scripts that parse the HTML tags present in the reports and perform other tasks such as removing stop words and extracting the sentences that contain unstructured CDS information.
  3. Rule-Based Extraction: This folder contains the Python-based rule-based framework that extracts the tables containing CDS information and saves them in .csv format. This makes it extremely easy to convert reports from .NET format to .csv, simplifying visualisation and analysis of the data.
  4. Finally, the website folder contains the code for the landing page created for course requirements.
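A minimal sketch of the crawling step, assuming the SEC's public quarterly EDGAR form index; the form types match the filings named above, but the year, quarter, and output handling are illustrative (EDGAR also expects a descriptive User-Agent header, so the contact address here is a placeholder).

import requests

# Quarterly EDGAR form index; year and quarter are illustrative.
INDEX_URL = "https://www.sec.gov/Archives/edgar/full-index/2016/QTR1/form.idx"
HEADERS = {"User-Agent": "research project [email protected]"}  # hypothetical contact

FORM_TYPES = ("N-CSR", "N-CSRS", "N-Q")

index = requests.get(INDEX_URL, headers=HEADERS).text
for line in index.splitlines():
    # Each entry line begins with the form type; note the "N-CSR" prefix
    # also matches amendments such as N-CSR/A, which a real crawler
    # would filter explicitly.
    if line.startswith(FORM_TYPES):
        path = line.split()[-1]  # the last column is the filing path
        print("https://www.sec.gov/Archives/" + path)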

Installation and Demo

  1. Before running any of the scripts, make sure you set up a virtual environment and activate it.
  2. Then install all the necessary Python dependencies with:
pip3 install -r requirements.txt
  3. To run the sentence extraction script (sketched after this section), simply run:
python3 sentenceExtraction.py [name of the .txt or .html file]
  4. To run the HTML tag parsing script, run:
python3 HTML_Parser.py [name of the .txt or .html file]
  5. Finally, to run the table extractor script, run the following command:
python3 parserExtractor.py [name of the .txt or .html file]

The output of the table extractor script will be saved in the sample output folder.
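A minimal sketch of the kind of HTML stripping and CDS-sentence filtering the preprocessing scripts perform, assuming BeautifulSoup and NLTK are installed; the keyword pattern and tokenisation choices are illustrative rather than the project's exact logic.

import re
import sys
from bs4 import BeautifulSoup
from nltk.tokenize import sent_tokenize  # requires nltk.download("punkt")

# Illustrative pattern for CDS mentions in filing text.
KEYWORDS = re.compile(r"credit default swap|\bCDS\b", re.IGNORECASE)

def extract_cds_sentences(path):
    # Strip HTML tags from the filing, then keep sentences mentioning CDS.
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = BeautifulSoup(f.read(), "html.parser").get_text(" ")
    return [s for s in sent_tokenize(text) if KEYWORDS.search(s)]

if __name__ == "__main__":
    for sentence in extract_cds_sentences(sys.argv[1]):
        print(sentence)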

Authors:

Tarun Sudhams and Varun Vamsi
