Fairness computation and multitask learning on the Ego4D dataset
March 22, 2024
| Keywords: | machine learning, deep learning, fairness, trustworthy AI, multitask learning, computer vision |
| Prerequisites: | Deep Learning, Statistics, Trustworthy and Explainable AI |
| Difficulty: | Medium (M.Sc.). Hard (B.Sc.) |
| Group work (only for B.Sc.): | possible |
Abstract
Ego4D is a large-scale dataset comprising more than 3,000 hours of egocentric (first-person) video of daily activities, captured with wearable cameras.
Visualization of the dataset, from the Ego4D dataset paper.
1. The dataset has annotations for several tasks, which makes it possible to train a single model on multiple tasks and predict several target variables at once. Multitask learning is an interesting research subject in its own right, and it also opens the door to further analyses, such as obtaining local explanations of the predictions by fusing explanations obtained on multiple tasks, or comparing the uncertainty profile of predictions from single-task and multitask models.
2. Although the authors of the dataset made an effort to ensure unbiasedness and demographic representativeness, they still note concerns about bias in the data, which might cause a model to produce unexpected predictions for specific demographic groups. Training machine learning models on this dataset and testing them on popular fairness benchmarks, such as FACET, could reveal bias in the Ego4D dataset.
Topics 1 and 2 can be combined in a single project (suggested for the M.Sc. level).
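To make the multitask setting concrete, here is a minimal sketch of hard parameter sharing: one shared weight feeds two task-specific heads, and training minimizes a fixed-weight sum of the per-task losses. Everything in it is an assumption made for illustration (the toy data, the scalar "network", the names `multitask_loss` and `train`, the task weights); it is not tied to Ego4D or to any particular library.

```python
# Toy multitask model with hard parameter sharing (all names are illustrative).
# Shared layer: h = w * x; task heads: pred_a = u * h and pred_b = v * h.

XS = [-1.0, -0.5, 0.0, 0.5, 1.0]
YA = [2.0 * x for x in XS]    # synthetic targets for task A
YB = [-3.0 * x for x in XS]   # synthetic targets for task B


def multitask_loss(w, u, v, weight_a=1.0, weight_b=1.0):
    """Weighted sum of the two per-task mean squared errors."""
    n = len(XS)
    loss_a = sum((u * w * x - ya) ** 2 for x, ya in zip(XS, YA)) / n
    loss_b = sum((v * w * x - yb) ** 2 for x, yb in zip(XS, YB)) / n
    return weight_a * loss_a + weight_b * loss_b


def train(steps=500, lr=0.05, weight_a=1.0, weight_b=1.0):
    """Plain gradient descent on the combined loss, with hand-written gradients."""
    w, u, v = 1.0, 0.1, 0.1
    history = [multitask_loss(w, u, v, weight_a, weight_b)]
    n = len(XS)
    for _ in range(steps):
        grad_w = grad_u = grad_v = 0.0
        for x, ya, yb in zip(XS, YA, YB):
            h = w * x
            err_a = u * h - ya
            err_b = v * h - yb
            grad_u += weight_a * 2.0 * err_a * h / n
            grad_v += weight_b * 2.0 * err_b * h / n
            # The shared weight receives gradient from both tasks.
            grad_w += (weight_a * 2.0 * err_a * u + weight_b * 2.0 * err_b * v) * x / n
        w, u, v = w - lr * grad_w, u - lr * grad_u, v - lr * grad_v
        history.append(multitask_loss(w, u, v, weight_a, weight_b))
    return (w, u, v), history


params, history = train()
print(f"combined loss: {history[0]:.3f} -> {history[-1]:.6f}")
```

The fixed weights `weight_a` and `weight_b` are the simplest possible choice. Methods such as Nash-MTL, listed under the relevant literature, instead derive the combination from the per-task gradients at every step.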
Notice: I am currently in the process of obtaining the dataset. Once I have access, I will provide additional information, e.g., on the labels.
Required work
- Literature review on selected topic(s)
- Select models and supporting datasets (e.g., for fairness computation)
- Run the training (NB: this can be very compute-intensive, depending on the data!)
- Analyze the data according to selected methods
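For the fairness topic, the analysis step could start from a simple group-wise metric. The sketch below computes recall separately for each demographic group and reports the largest gap between groups. The choice of recall as the metric, the function names `per_group_recall` and `max_recall_gap`, and the toy labels are all assumptions made for illustration; the metric actually used should follow the selected benchmark and literature.

```python
from collections import defaultdict


def per_group_recall(y_true, y_pred, groups, positive=1):
    """Recall of the positive class, computed separately for each group."""
    hits = defaultdict(int)
    positives = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            positives[group] += 1
            if pred == positive:
                hits[group] += 1
    # Groups without any positive example are left out (recall is undefined).
    return {group: hits[group] / positives[group] for group in positives}


def max_recall_gap(recalls):
    """Largest difference in recall between any two groups."""
    values = list(recalls.values())
    return max(values) - min(values)


# Hypothetical labels, predictions, and group memberships.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
groups = ["A"] * 5 + ["B"] * 5

recalls = per_group_recall(y_true, y_pred, groups)
gap = max_recall_gap(recalls)
print(recalls, gap)
```

A gap close to zero means the model finds positive examples about equally well in every group; a large gap flags a group on which the model underperforms.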
Relevant literature
- Paper on Ego4D dataset: Grauman et al. Ego4D: Around the World in 3,000 Hours of Egocentric Video. 2021.
- Survey on fairness: Mehrabi et al. A Survey on Bias and Fairness in Machine Learning. ACM Computing Surveys. 2021.
- Nash-MTL, a state-of-the-art method for multitask learning: Navon et al. Multi-Task Learning as a Bargaining Game. 2022.
- Recently published dataset for fairness assessment: Gustafson et al. FACET: Benchmarking fairness of vision models. 2023.