dwallach1 / Stocker

Financial Web Scraper & Sentiment Classifier

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Stocker

Introduction Datascience Python Book
Introduction to Data Science: A Python Approach to Concepts, Techniques and Applications
Stars: ✭ 275 (+216.09%)
Mutual labels:  data-science, sentiment-analysis
Thesemicolon
This repository contains Ipython notebooks and datasets for the data analytics youtube tutorials on The Semicolon.
Stars: ✭ 345 (+296.55%)
Mutual labels:  data-science, sentiment-analysis
Akshare
AKShare is an elegant and simple financial data interface library for Python, built for human beings! (An open-source financial data interface library.)
Stars: ✭ 4,334 (+4881.61%)
Mutual labels:  data-science, finance
Dash.jl
Dash for Julia - A Julia interface to the Dash ecosystem for creating analytic web applications in Julia. No JavaScript required.
Stars: ✭ 248 (+185.06%)
Mutual labels:  data-science, finance
Awesome Streamlit
The purpose of this project is to share knowledge on how awesome Streamlit is and can be
Stars: ✭ 769 (+783.91%)
Mutual labels:  data-science, finance
Darwinexlabs
Datasets, tools and more from Darwinex Labs - Prop Investing Arm & Quant Team @ Darwinex
Stars: ✭ 248 (+185.06%)
Mutual labels:  data-science, sentiment-analysis
Deltapy
DeltaPy - Tabular Data Augmentation (by @firmai)
Stars: ✭ 344 (+295.4%)
Mutual labels:  data-science, finance
Reddit Hyped Stocks
A web application to explore currently hyped stocks on Reddit
Stars: ✭ 173 (+98.85%)
Mutual labels:  data-science, finance
Awesome Twitter Data
A list of Twitter datasets and related resources.
Stars: ✭ 533 (+512.64%)
Mutual labels:  data-science, sentiment-analysis
Introneuralnetworks
Introducing neural networks to predict stock prices
Stars: ✭ 486 (+458.62%)
Mutual labels:  data-science, finance
Tweetfeels
Real-time sentiment analysis in Python using twitter's streaming api
Stars: ✭ 249 (+186.21%)
Mutual labels:  data-science, sentiment-analysis
Machine Learning From Scratch
Succinct Machine Learning algorithm implementations from scratch in Python, solving real-world problems (Notebooks and Book). Examples of Logistic Regression, Linear Regression, Decision Trees, K-means clustering, Sentiment Analysis, Recommender Systems, Neural Networks and Reinforcement Learning.
Stars: ✭ 42 (-51.72%)
Mutual labels:  data-science, sentiment-analysis
Dash
Analytical Web Apps for Python, R, Julia, and Jupyter. No JavaScript Required.
Stars: ✭ 15,592 (+17821.84%)
Mutual labels:  data-science, finance
Chinese financial sentiment dictionary
A Chinese financial sentiment word dictionary
Stars: ✭ 67 (-22.99%)
Mutual labels:  finance, sentiment-analysis
Tutorials
AI-related tutorials. Access any of them for free → https://towardsai.net/editorial
Stars: ✭ 204 (+134.48%)
Mutual labels:  data-science, sentiment-analysis
Machine Learning For Trading
Code for Machine Learning for Algorithmic Trading, 2nd edition.
Stars: ✭ 4,979 (+5622.99%)
Mutual labels:  data-science, finance
Learnpythonforresearch
This repository provides everything you need to get started with Python for (social science) research.
Stars: ✭ 163 (+87.36%)
Mutual labels:  data-science, finance
Finance
Here you can find all the quantitative finance algorithms that I've worked on and refined over the past year!
Stars: ✭ 194 (+122.99%)
Mutual labels:  data-science, finance
Pandapy
PandaPy has the speed of NumPy and the usability of Pandas 10x to 50x faster (by @firmai)
Stars: ✭ 474 (+444.83%)
Mutual labels:  data-science, finance
Python Training
Python training for business analysts and traders
Stars: ✭ 972 (+1017.24%)
Mutual labels:  data-science, finance

Stocker

Tesla's stock jumped 2.5% after Tencent said it amassed a 5% stake in the electric car maker. Ocwen jumped 12% premarket after disclosing it reached a deal with New York regulators that will end third-party monitoring of its business within the next three weeks. In addition, restrictions on buying mortgage-servicing rights may get eased. Cara Therapeutics's shares surged 16% premarket, after the biotech company reported positive results in a trial of a treatment for uremic pruritus.

This project, Stocker, is at its core a financial data scraper. The Python package is intended to generate Google queries, fetch recent articles, and parse them for information. All Stocker needs is a list of stock tickers and a list of sources (which correspond to domain names).

After Google and Yahoo shut down their free daily stock price APIs, I decided to create a Twitter bot, StockerBot, that uses the Twitter platform to gather information about the stocks on a user's watchlist.

In the current version I have moved away from the original implementation: the code was not modularized, and the project ran into roadblocks because I lacked access to the historical financial data needed to build a usable financial sentiment classifier. Furthermore, the open-source classifiers I found online performed far too poorly to provide any real insight.

I have therefore shifted the direction of this project toward a brand tracker, which I describe in more depth here.

If you want to use the older version, you can still do so by following the directions here.

V2.0

V1.0

Once you clone this repo, you will need to check out the proper branch:

git checkout -b v1.0 origin/v1.0

Example Usage

from stocker import Stocker

tickers = ['AAPL', 'GOOG', 'GPRO', 'TSLA']                 # or define your own set of stock tickers
sources = ['bloomberg', 'seekingalpha', 'reuters']         # and define your own set of sources
csv_path = '../data/examples.csv'                          # where to write the gathered information
json_path = '../data/links.json'                           # where to write parsed links (for skipping duplicates)
stocker = Stocker(tickers, sources, csv_path, json_path)   # initialize stocker
flags = {'date_checker': True, 'classification': True, 'magnitude': True}
stocker.stock(flags=flags)							# start the stocker

Stocker creates queries on its own based on the stock tickers and sources provided; however, you can also define your own queries.

from stocker import querify
# define our own queries
query_strings = ['under armour most recent articles', 'nikes recent stockholders meeting news']
ticker = 'SNAP'
stocker.queries = [querify(ticker, None, string) for string in query_strings]
stocker.stock(query=False, curious=True)

# or if you want to make sure they are from a certain source
source = 'bloomberg'
stocker.queries = [querify(ticker, source, string) for string in query_strings]
stocker.stock(query=False)

This package also has built-in functions for getting popular stock tickers, as well as the list of sources that Stocker has been verified and tested to work with:

from stocker import SNP_500, NYSE_Top100, NASDAQ_Top100, valid_sources

# get stock tickers
nyse, nasdaq = [], []
while not nyse: nyse = NYSE_Top100()       # poll until a list comes back, because the site sometimes returns None
while not nasdaq: nasdaq = NASDAQ_Top100()
snp500 = SNP_500()

tickers = nyse + nasdaq + snp500 # list of 700 stock tickers
sources = valid_sources() # use all specialized sources

If you want to use Stocker simply as a means to query Google and get back a list of results, you can do so.

from stocker import googler

# googler takes in a string (what you would type into the search bar) and returns a list of URLs generated from the query
# if an error occurs, googler returns None

results = googler('What is there to do in Berkeley?')

Modules

  • stocker.py : manages the overall process: generating queries, telling webparser which links to parse, handling user flags, and writing the data to disk.
  • webparser.py : does the dirty work of parsing articles and storing all of the information in a WebNode.
  • finsent.py : after using stocker and webparser to generate a CSV file of data, finsent can be used to create and train a sentiment analysis classifier (a minimal sketch follows this list).
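
For example, a sentiment classifier could be trained from the generated CSV along the following lines. This is a hypothetical sketch using scikit-learn rather than the actual finsent API, and the column names article and classification are assumptions based on the data layout described under Storing Query Data below.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# load the CSV produced by stocker/webparser
df = pd.read_csv('../data/examples.csv')

# drop rows where no price change could be found (classification == -1000)
df = df[df['classification'] != -1000]

# turn article text into TF-IDF features and fit a simple classifier
X_train, X_test, y_train, y_test = train_test_split(df['article'], df['classification'], test_size=0.2)
vectorizer = TfidfVectorizer(stop_words='english')
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)
print('accuracy:', clf.score(vectorizer.transform(X_test), y_test))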

Dependencies

Function Parameters

args for Stocker's stock method

  • gui=True : when set to true, uses tqdm to show a progress bar as well as the current ticker & source being parsed

  • csv=True : tells Stocker whether it should write the output to a CSV file

  • json=True : tells Stocker whether it should write the newly parsed links to a JSON file (to avoid duplicates)

  • flags={} : a dictionary of kwargs to be used when parsing articles (these are combined in the example after the list below)

    kwargs recognized by the Webparser

    • ticker=None : used to find information associated with the stock ticker
    • min_length=30 : the minimum number of words each article must have; only enforced if length_checker=True
    • curious=False : if set to true, Stocker won't ensure that the link it is parsing comes from the source field of the query
    • industry=False : looks up and stores the industry associated with the ticker
    • sector=False : looks up and stores the sector associated with the ticker
    • date_checker=False : forces each article to have a date; if an article has no date and date_checker is set to true, None is returned for it
    • length_checker=False : forces each article to have a word count of at least min_length
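
Putting these together, a call to stock might look like the following. This is a minimal sketch, not a prescribed configuration: the values below are just the documented parameters with a few of the defaults overridden, and it assumes a stocker instance created as in the Example Usage above.

flags = {
    'min_length': 30,        # minimum word count per article, enforced only because length_checker is on
    'length_checker': True,  # skip articles shorter than min_length
    'date_checker': True,    # skip articles that have no date
    'industry': True,        # also look up and store the ticker's industry
    'sector': True,          # also look up and store the ticker's sector
    'curious': False,        # still require links to come from the query's source field
}
stocker.stock(gui=True, csv=True, json=True, flags=flags)   # progress bar on, write both CSV and JSON output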

Storing Query Data

To store all of the information, a new WebNode is generated every time a URL is parsed. After all of the URLs for a given query are parsed, the batch of WebNodes is written to the CSV file and the parsed URLs are stored in the JSON file under the stock's ticker (uppercase). Along with the WebNode fields, each row also includes the stock's ticker and a classification. The WebNode's attributes are based on a dictionary formed from the flags set in the function call, and the attributes set on the WebNode become the headers of the CSV file. If you try to combine WebNodes with different attributes in one CSV file, the program will throw an error, because everything is based on dictionary operations.

The classification is based on the price fluctuation of the associated ticker over a given time interval (default 10 minutes). If the stock's price increased, the classification is +1; if it decreased, -1; if it was neutral, 0. I am using Google's API for the stock prices, and it only offers free minute-interval data for the most recent 14 weekdays. For this reason, if no associated stock price change could be found (because the article was published before the past 14 weekdays), I assign the classification a value of -1000 to indicate 'not found'.
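
As an illustration, the classification rule described above reduces to something like the following. This is a standalone sketch, not Stocker's internal code; the function and constant names are made up for clarity.

NOT_FOUND = -1000   # price data unavailable (e.g. the article is older than the most recent 14 weekdays)

def classify(price_before, price_after):
    """Label an article by the stock's price move over the chosen interval (default 10 minutes)."""
    if price_before is None or price_after is None:
        return NOT_FOUND    # no associated price change could be found
    if price_after > price_before:
        return 1            # price increased
    if price_after < price_before:
        return -1           # price decreased
    return 0                # price unchanged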
