How might we determine the impact of COVID-19 on humanitarian discourse in global media?
We used Natural Language Processing (NLP) to analyse humanitarian news articles from Euro-Atlantic Countries, Gulf Donors, and New Global Media Players from December 2019 to August 2020. This analysis revealed insights into the directionality of humanitarian aid, key topics, and national approaches to pandemic management.
MY ROLE
Researcher
Data Analyst
Designer
Developer
TOOLS & METHODS
Python for NLP
Topic Modelling
TF-IDF Analysis
Highcharts.js
AmCharts
TEAM
Dr. Maria Wolters
Tashfeen Ahmed
Tamara Lottering
Xiaohang Xu
Minjia Zhao
Jin Mu
DURATION
3 months
Methodology
Our analysis began with data cleaning and preprocessing, including normalisation, stemming, and stopword removal. We used trigrams for collocation analysis, which proved more effective than bigrams at revealing how terms were used in context.
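To make the preprocessing and trigram steps concrete, here is a minimal sketch using only the Python standard library. It is not the project's actual pipeline: the stopword list, the sample sentences, and the helper names `preprocess` and `ngram_counts` are all illustrative assumptions. Stemming is left out for brevity, and trigrams are ranked by raw frequency rather than by a statistical association measure.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "is", "on"}

def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a list of tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Made-up example sentences, not taken from the project corpus.
docs = [
    "The World Health Organization called for humanitarian aid.",
    "Aid agencies and the World Health Organization warned of shortages.",
]

trigrams = Counter()
for doc in docs:
    trigrams.update(ngram_counts(preprocess(doc), 3))

# Show the most frequent trigrams across the sample documents.
for gram, count in trigrams.most_common(3):
    print(" ".join(gram), count)
```

Swapping `3` for `2` in the `ngram_counts` call gives the bigram equivalent, which makes it easy to compare how much context each window size keeps.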
Analysis & Visualisation
We employed LSA and LDA topic modelling alongside TF-IDF analysis to understand topic evolution over time. The exploratory data analysis revealed interesting patterns in news discourse subjectivity, particularly between US and UK sources from 2010 to 2020. One of my favourite ways of finding patterns in the data was through concordance, which is a list of all occurrences of a phrase in a text with its immediate context.
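A concordance is simple to sketch in plain Python. The example below is an illustration, not the code used in the project: the `concordance` function, its `width` parameter, and the sample text are assumptions made for demonstration. It tokenises on word characters and lets context run across sentence boundaries.

```python
import re

def concordance(text, phrase, width=3):
    """Return (left, keyword, right) for each occurrence of `phrase`,
    with up to `width` tokens of context on either side."""
    tokens = re.findall(r"\w+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + n:i + n + width])
            hits.append((left, " ".join(target), right))
    return hits

# Made-up sample text, not taken from the project corpus.
sample = ("Donors pledged humanitarian aid to the region. "
          "Critics said humanitarian aid arrived too late.")

# Print each hit with the keyword aligned in a central column.
for left, keyword, right in concordance(sample, "humanitarian aid"):
    print(f"{left:>30} [{keyword}] {right}")
```

Aligning the keyword in a central column makes recurring neighbours on either side easy to spot when scanning many hits.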
At the time of this project, GPT models were not yet widely used for text analysis, so we relied on traditional NLP techniques for data analysis and visualisation.
Results
Using Highcharts.js and AmCharts, we created interactive visualisations that effectively communicate key insights from the data. The project website features a comprehensive timeline of news articles alongside dynamic line and bar graphs, providing an intuitive interface for exploring media discourse patterns during the pandemic.