cristianpjensen / stock-market-prediction-via-google-trends

Licence: MIT license
Attempt to predict future stock prices based on Google Trends data.

About

The data used is downloaded from Google Trends. The concept for this project came from the paper "Quantifying Trading Behavior in Financial Markets Using Google Trends" by Tobias Preis, Helen Susannah Moat, and H. Eugene Stanley. That research found that the search volume for certain (financial) terms is linked to the price of the Dow Jones Industrial Average, and can in most cases predict a dip in the market. The purpose of this project is to combine this research with machine learning.

Results

Two machine learning algorithms were explored for this project: XGBoost and MLPClassifier. MLPClassifier clearly performed better than XGBoost. The best annual return XGBoost achieved is 44.2%. In contrast, MLPClassifier's best model achieved an annual return of 91.3% between 2008 and the present. A big contributor to these very high annual returns was the coronavirus: it caused the stock market to crash, which could be a major source of profits for these algorithms.

MLPClassifier

MLPClassifier performed very well on the test data. The algorithm was very good at recognising that the small changes in the market between crashes are impossible to predict. Thus, for the most part, it followed a buy-and-hold strategy, but during a stock market crash (like the one caused by the coronavirus) or other, slightly bigger, changes, it performed well, as can be seen in figure 8.

Comparison of the MLPClassifier, 10,000 random simulations and a buy-and-hold strategy

Figure 8. Comparison of the mean plus and minus 1 standard deviation of 10,000 random simulations, the MLPClassifier algorithm and a buy-and-hold strategy.
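
The random baseline in the figure can be sketched roughly as follows. This is a minimal illustration and not the project's code: the function names are hypothetical, and it assumes that a "random simulation" picks a long (+1) or short (-1) position at random each day, which the README does not specify.

```python
import random
import statistics


def equity_curve(daily_returns, positions):
    """Compound daily returns, given a position (+1 long, -1 short) per day."""
    value, curve = 1.0, []
    for daily_return, position in zip(daily_returns, positions):
        value *= 1.0 + position * daily_return
        curve.append(value)
    return curve


def random_band(daily_returns, n_simulations=10_000, seed=0):
    """Per-day mean minus/plus one standard deviation over random strategies."""
    rng = random.Random(seed)
    curves = [
        equity_curve(daily_returns, [rng.choice((1, -1)) for _ in daily_returns])
        for _ in range(n_simulations)
    ]
    # zip(*curves) groups the simulated portfolio values by day.
    mean = [statistics.mean(day) for day in zip(*curves)]
    std = [statistics.stdev(day) for day in zip(*curves)]
    lower = [m - s for m, s in zip(mean, std)]
    upper = [m + s for m, s in zip(mean, std)]
    return lower, mean, upper


# Toy daily returns, made up for illustration only.
toy_returns = [0.01, -0.02, 0.005, 0.03, -0.01]
lower, mean, upper = random_band(toy_returns, n_simulations=1_000)
buy_and_hold = equity_curve(toy_returns, [1] * len(toy_returns))
```

A model's equity curve can then be plotted against `lower`, `mean`, `upper` and `buy_and_hold` to see whether it leaves the band that random trading would produce.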

XGBoost

XGBoost did not have the insight that MLPClassifier did. It tried to predict the small changes, and ultimately failed at this. However, XGBoost was still able to predict the stock market crash caused by the coronavirus, which is why it still had such a large annual return (44.2%).

Comparison of the MLPClassifier, XGBoost, 10,000 random simulations and a buy-and-hold strategy

Figure 9. Comparison of the mean plus and minus 1 standard deviation of 10,000 random simulations, the MLPClassifier algorithm, the XGBoost algorithm and a buy-and-hold strategy.

Data

Data Collection

Two datasets were needed for this project: the daily Google Trends data for a specific keyword and the daily stock price data for a specific ticker. To collect the daily Google Trends data, you have to download the data in all 6-month increments, in all 5-year increments, and for the whole 2004 to present range, within the 2004 to 2020 timespan. All this data is eventually adjusted so that the values are relative to each other across the whole timespan, instead of only within their respective increments. To collect the daily stock price data for the ticker you want to predict, download it from a website like Yahoo Finance, which offers the historical data of any ticker.
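
To give an idea of how many downloads this involves, the increments can be enumerated as below. This is only a sketch: the helper names are hypothetical, and aligning the windows to 1 January 2004 is an assumption, not something the project prescribes.

```python
from datetime import date, timedelta


def add_months(first_of_month, months):
    """Return the first day of the month `months` months after the given one."""
    index = first_of_month.year * 12 + (first_of_month.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def windows(start, end, months):
    """Split [start, end] into consecutive windows of `months` months each."""
    result = []
    current = start
    while current <= end:
        following = add_months(current, months)
        # A window ends the day before the next one starts, capped at `end`.
        result.append((current, min(following - timedelta(days=1), end)))
        current = following
    return result


# 6-month windows (daily data) and 5-year windows (weekly data) for 2004 to 2020.
daily_windows = windows(date(2004, 1, 1), date(2020, 12, 31), 6)
weekly_windows = windows(date(2004, 1, 1), date(2020, 12, 31), 60)
```

Under these assumptions there are 34 six-month windows and 4 five-year windows per keyword, plus one download for the full monthly range.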

Data Visualisation

Correlation

To show that there is indeed a correlation between Google Trends data (e.g. 'debt') and stock prices (e.g. Dow Jones Industrial Average), I plotted the DJIA stock price with indicators of peaks in the search volume for "stock market". As you can see, before a major stock market crash, there are usually some peaks to be observed. There are also some peaks in the middle of a crash, but the peaks before the crash are quite indicative.

DJIA stock price data with peak-indicators of 'stock market'.

Figure 1. The stock price of the DJIA, with red dots where a peak in search volume for "stock market" has been observed. The graph shows that erratic movement in search volume precedes a major stock market crash.
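
The README does not state how the peaks were identified. One simple way to mark them, shown here purely as an assumption, is to flag days on which the search volume exceeds the mean of the preceding window by a number of standard deviations. The function name, `window` and `threshold` are hypothetical.

```python
import statistics


def find_search_peaks(volume, window=30, threshold=2.0):
    """Return indices where volume exceeds the trailing mean by `threshold` stds."""
    peaks = []
    for i in range(window, len(volume)):
        history = volume[i - window:i]  # the `window` days before day i
        mean = statistics.mean(history)
        std = statistics.pstdev(history)
        if std > 0 and volume[i] > mean + threshold * std:
            peaks.append(i)
    return peaks


# Toy search volume with one obvious spike at index 10.
toy_volume = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 40, 10]
peak_indices = find_search_peaks(toy_volume, window=10)
```

The returned indices can be used to draw the red dots on top of the price series.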

Adjusted

After all adjustments, the daily data is relative across the entire timespan instead of only within each increment. Visually, it looks as follows:

Adjusted daily data over entire timespan.

Figure 2. A graph in which the adjusted daily data is visualised.

Restrictions

All data on Google Trends is relative (0 to 100) within one timeframe. You can only get daily data in 6-month increments and weekly data in 5-year increments; for the entire available timespan, only monthly data is provided. Aggregating all the data needed for this project was therefore quite a challenge, and because of these restrictions the result is not completely accurate. However, the method I used was the only way to get daily data over the entire available timespan (which is crucial for this project).

Method

To get all the data relative to each other, instead of only within its 6-month increment, I had to merge the increments based on the weekly data. However, the weekly data is only available in 5-year increments, so I had to merge these 5-year increments based on the monthly data, which is available for the whole timespan needed for this project. To merge all the 6-month and 5-year increments, I computed the percentage change of each data point within its respective increment. Afterwards, I took one data point per increment from the data with the next-lower frequency (weekly for the daily increments, monthly for the weekly increments) and computed the remaining points by applying the percentage changes to that data point.
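
The rescaling step can be sketched as follows. This is an illustration under stated assumptions, not the code in `make_dataset.py`: the function names are hypothetical, and the sketch assumes that no value in an increment is 0 (Google Trends can report 0, which would need separate handling because the percentage change is undefined there).

```python
def pct_changes(values):
    """Fractional change from each data point to the next within one increment."""
    return [(curr - prev) / prev for prev, curr in zip(values, values[1:])]


def rebuild_from_anchor(values, anchor_index, anchor_value):
    """Rebuild an increment so that it passes through one known anchor value.

    `anchor_value` is the value taken from the lower-frequency data for the
    point at `anchor_index`; all other points are derived from it by applying
    the percentage changes of the original increment.
    """
    changes = pct_changes(values)
    rebuilt = [0.0] * len(values)
    rebuilt[anchor_index] = anchor_value
    # Walk forward from the anchor, applying each change.
    for i in range(anchor_index + 1, len(values)):
        rebuilt[i] = rebuilt[i - 1] * (1.0 + changes[i - 1])
    # Walk backward from the anchor, undoing each change.
    for i in range(anchor_index - 1, -1, -1):
        rebuilt[i] = rebuilt[i + 1] / (1.0 + changes[i])
    return rebuilt


# Daily values relative within their own increment, rescaled so that the
# second day matches a (made-up) value of 40 from the weekly series.
adjusted = rebuild_from_anchor([50, 100, 80], anchor_index=1, anchor_value=40)
```

When no value is 0, this is equivalent to multiplying the whole increment by `anchor_value / values[anchor_index]`, so the shape within each increment is preserved and only its level changes.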

Example

An example of the search term 'debt' ('debt' is the best search term to predict market change, according to the research mentioned earlier) in the timespan 2007 to 2009:

Before adjustments

Before adjustments of example.

Figure 3. A graph where the unadjusted relative daily data is visualised. The black vertical lines indicate the edges of the 6-month increments.

After adjustments

After adjustments of example.

Figure 4. A graph where the adjusted relative daily data is visualised. The graph follows the actual weekly data much better.

Weekly

Actual weekly data.

Figure 5. The actual weekly data.

Features

To get better results, features had to be engineered from the raw data. Features used include:

After these features are computed, each of them is shifted by 3 through 10 days. This is because Google Trends data only becomes available three days after the fact, and the target may correlate well with data shifted even further. This results in 272 features. The 50 features that correlate best with the target (according to the Pearson correlation coefficient) are used to train the model and predict the direction of the Dow Jones Industrial Average.
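
The shifting and selection steps could look roughly like this. It is a sketch with hypothetical function names, and it assumes that "top correlating" means the largest absolute Pearson coefficient. Shifting by 3 through 10 days gives 8 copies of every base feature, so 272 features would correspond to 34 base features if all of them are shifted the same way.

```python
import math


def shift(series, days):
    """Delay a series by `days`, so the value at t is the value from t - days."""
    return ([None] * days + list(series))[:len(series)]


def pearson(xs, ys):
    """Pearson correlation over the positions where both series have a value."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    if len(pairs) < 2:
        return 0.0
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def build_shifted_features(features, min_shift=3, max_shift=10):
    """Create one shifted copy of every feature for each shift in the range."""
    shifted = {}
    for name, series in features.items():
        for days in range(min_shift, max_shift + 1):
            shifted[f"{name}_shift{days}"] = shift(series, days)
    return shifted


def top_correlated(shifted, target, k=50):
    """Names of the k features with the largest absolute correlation to target."""
    scores = {name: abs(pearson(series, target)) for name, series in shifted.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, `top_correlated(build_shifted_features(features), target, k=50)` returns the names of the columns that would be passed to the model.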

Simple Moving Average Delta

SMA delta.

Figure 6. When this feature becomes more volatile, the close price follows. This is a good indicator for a machine learning algorithm. It can also be seen that the close price percentage change loosely follows the line of the feature.
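
The README does not define the SMA delta precisely. One plausible reading, used here only as an assumption, is the day-to-day change of the simple moving average. The function names and the window length are hypothetical.

```python
def sma(values, window):
    """Simple moving average; None until a full window of values is available."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / window if i >= window - 1 else None)
    return averages


def sma_delta(values, window):
    """Change of the simple moving average from one day to the next."""
    averages = sma(values, window)
    deltas = [None]  # the first day has no previous average
    for prev, curr in zip(averages, averages[1:]):
        deltas.append(None if prev is None else curr - prev)
    return deltas


toy_delta = sma_delta([1, 2, 3, 4, 5], window=2)
```

A larger absolute delta means the average is moving faster, which matches the caption's observation about the feature becoming more volatile.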

Bollinger Bands

Bollinger bands.

Figure 7. When the 20-day simple moving average crosses the upper Bollinger band, the close price becomes more volatile. The stock close percentage change also loosely follows the lower Bollinger band.
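
For reference, Bollinger Bands are conventionally a simple moving average with a band a fixed number of standard deviations above and below it. The sketch below uses the common defaults of a 20-day window and 2 standard deviations; whether the project uses exactly these parameters is an assumption, and the function name is hypothetical.

```python
import statistics


def bollinger_bands(values, window=20, num_std=2.0):
    """Return (lower, middle, upper) per day; None until a full window exists."""
    bands = []
    for i in range(len(values)):
        if i < window - 1:
            bands.append(None)
            continue
        recent = values[i - window + 1:i + 1]
        middle = statistics.fmean(recent)
        spread = num_std * statistics.pstdev(recent)
        bands.append((middle - spread, middle, middle + spread))
    return bands


toy_bands = bollinger_bands([1, 2, 3, 4, 5], window=3)
```

The distance between the upper and lower band widens when the series becomes more volatile, which is the behaviour the figure refers to.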

Project Organisation

    ├── LICENSE
    ├── Makefile           <- Makefile with commands like `make data` or `make train`
    ├── README.md          <- The top-level README for developers using this project.
    ├── data
    │   ├── processed      <- The final, canonical data sets for modeling.
    │   └── raw            <- The original, immutable data dump.
    │
    ├── docs               <- A default Sphinx project; see sphinx-doc.org for details
    │
    ├── models             <- Trained and serialized models, model predictions, or model summaries
    │
    ├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
    │                         the creator's initials, and a short `-` delimited description, e.g.
    │                         `1.0-jqp-initial-data-exploration`.
    │
    ├── references         <- Data dictionaries, manuals, and all other explanatory materials.
    │
    ├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
    │   └── figures        <- Generated graphics and figures to be used in reporting
    │
    ├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
    │                         generated with `pip freeze > requirements.txt`
    │
    ├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
    └── src                <- Source code for use in this project.
        ├── __init__.py    <- Makes src a Python module
        │
        ├── data           <- Scripts to download or generate data
        │   └── make_dataset.py
        │
        └── features       <- Scripts to turn raw data into features for modeling
            └── build_features.py

License

MIT License

Copyright (c) 2020 Cristian Perez Jensen
