Stock Analysis v1

Date: Apr 2024 - May 2024
Category: Data Science, Financial Analysis, Sentiment Analysis
Technologies: Python, yfinance, GDELT Event Database, Pandas, CSV

This Python project uses sentiment analysis to investigate the correlation between global events and stock market performance. It is a four-stage pipeline: it collects and processes 20 years of historical stock data; collects daily global events from GDELT over the same period; analyzes event headlines to assign sentiment scores to individual words based on concurrent stock price changes; and finally offers tools to explore these scores and predict potential stock movements from the current day's news.

Features

Collects 20 years of daily historical stock data via Yahoo Finance for a defined list of tickers.
Calculates daily net percentage gain/loss for each stock.
Retrieves the top 10 global events daily from the GDELT Event Database for the past 20 years.
Extracts potential headlines from event URLs (post-April 2013 data).
Correlates words from event headlines with daily stock performance.
Assigns a sentiment score to each word based on its association with stock price changes.
Generates a stock-specific sentiment lexicon, mapping words to their impact scores.
Identifies the stocks with the largest divergence between their positive-word and negative-word sentiment scores.
Determines the overall "best" (most positive) and "worst" (most negative) sentiment words across all stocks.
Predicts potential daily stock movements by analyzing current event headlines against the generated sentiment lexicon.
Modular design using four distinct Python scripts executed sequentially.
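
The daily net percentage gain/loss listed above can be sketched in a few lines. This is a minimal illustration, not the project's code: the function name `daily_pct_change` is hypothetical, and it assumes the change is measured between consecutive closing prices, which the project description does not specify.

```python
def daily_pct_change(closes):
    """Return the percentage change between each pair of consecutive closes.

    Assumption: close-to-close change. The project may instead use
    open-to-close or adjusted prices.
    """
    changes = []
    for prev, curr in zip(closes, closes[1:]):
        changes.append((curr - prev) / prev * 100.0)
    return changes


# Illustrative prices only, not real market data.
example_closes = [100.0, 125.0, 100.0]
example_changes = daily_pct_change(example_closes)
print(example_changes)
```

With the data in a Pandas Series, `Series.pct_change()` computes the same ratio as a fraction rather than a percentage.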

Technical Details

The project runs as a sequence of four Python scripts. collect_stock_data.py uses the yfinance library to download two decades of daily stock data for the tickers listed in "Stock Information.csv", then calculates daily percentage changes. collect_event_data.py fetches the top 10 events per day from the GDELT database over the same period and attempts to extract headlines from the URLs in newer entries. stock_and_event_analysis.py merges these datasets, associating the words in each day's event headlines with that day's percentage change for each stock, and so generates a sentiment score for each word relative to each stock. The output records how each word has historically correlated with increases or decreases in each stock. Finally, data_analysis.py lets users query this sentiment data: to find stocks with high word-sentiment divergence, to identify the strongest positive and negative words across all stocks, or to analyze the current day's GDELT events and predict stock performance from the learned sentiment scores.
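
The word-scoring and prediction steps described above might look roughly like the sketch below. The scoring rule is an assumption: the description does not give the formula, so this version scores a word as the mean percentage change on the days it appeared, and predicts by averaging the scores of known words in new headlines. The names `tokenize`, `build_lexicon`, and `predict` are hypothetical, and the sample headlines and percentages are invented for illustration.

```python
import re
from collections import defaultdict


def tokenize(headline):
    """Lowercase a headline and split it into alphabetic words."""
    return re.findall(r"[a-z']+", headline.lower())


def build_lexicon(headlines_by_date, pct_change_by_date):
    """Map each word to the mean percentage change on days it appeared.

    Assumption: a simple per-word average, not the project's actual rule.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for date, headlines in headlines_by_date.items():
        if date not in pct_change_by_date:
            continue  # no trading data for this date (weekend or holiday)
        change = pct_change_by_date[date]
        # Count each word once per day so repeated headlines do not dominate.
        words = {word for h in headlines for word in tokenize(h)}
        for word in words:
            totals[word] += change
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}


def predict(headlines, lexicon):
    """Average the scores of known words; return 0.0 if none are known."""
    scores = [lexicon[w] for h in headlines for w in tokenize(h) if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


# Invented sample data for one stock.
headlines_by_date = {
    "2024-04-01": ["Trade talks resume"],
    "2024-04-02": ["Trade dispute escalates"],
    "2024-04-06": ["Markets closed"],  # no price data, so it is skipped
}
pct_change_by_date = {"2024-04-01": 2.0, "2024-04-02": -1.0}

lexicon = build_lexicon(headlines_by_date, pct_change_by_date)
prediction = predict(["Talks resume after dispute"], lexicon)
print(lexicon["trade"], prediction)
```

A positive prediction here means only that the day's words have tended to coincide with gains in the past. Averaging by day treats common and rare words alike, so a real lexicon would probably need a minimum occurrence count before a word's score is trusted.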
