Menu

Stock Analysis v1

Date: Apr 2024 - May 2024 Category: Data Science, Financial Analysis, Sentiment Analysis Technologies: Python, yfinance, GDELT Event Database, Pandas, CSV

This Python-based project performs sentiment analysis to investigate the correlation between global events and stock market performance. It comprises a four-stage pipeline: collecting and processing 20 years of historical stock data and daily global events (from GDELT); analyzing event headlines to assign sentiment scores to individual words based on concurrent stock price changes; and finally, offering tools to analyze these sentiment scores and predict potential stock movements based on current day's news.

Features

  • Collects 20 years of daily historical stock data via Yahoo Finance for a defined list of tickers.
  • Calculates daily net percentage gain/loss for each stock.
  • Retrieves the top 10 global events daily from the GDELT Event Database for the past 20 years.
  • Extracts potential headlines from event URLs (post-April 2013 data).
  • Correlates words from event headlines with daily stock performance.
  • Assigns a sentiment score to each word based on its association with stock price changes.
  • Generates a stock-specific sentiment lexicon, mapping words to their impact scores.
  • Identifies stocks exhibiting the largest sentiment score variance between positive and negative words.
  • Determines the overall "best" (most positive) and "worst" (most negative) sentiment words across all stocks.
  • Predicts potential daily stock movements by analyzing current event headlines against the generated sentiment lexicon.
  • Modular design using four distinct Python scripts executed sequentially.

Gallery

Technical Details

The project operates through a sequence of four Python scripts. collect_stock_data.py uses the yfinance library to download two decades of daily stock data for tickers specified in "Stock Information.csv," then calculates daily percentage changes. collect_event_data.py fetches daily top 10 events from the GDELT database over the same period, attempting to extract headlines from URLs in newer entries. stock_and_event_analysis.py merges these datasets, associating words from event headlines with the daily percentage change of each stock, thereby generating a sentiment score for each word relative to each stock. This output details how words historically correlated with stock increases or decreases. Finally, data_analysis.py allows users to query this sentiment data: to find stocks with high word-sentiment divergence, identify globally strong positive/negative words, or analyze current day's GDELT events to predict stock performance based on the learned sentiment scores.