Health Data Challenge 2019: Feedback

March 13, 2020

We organized the second edition of the Health Data Challenge, on matrix factorization and deconvolution methods to quantify tumor heterogeneity in cancer research, which took place from November 25 to 29, 2019.
The aim of the HADACA program is to provide (i) analytical frameworks to bridge the gap between large datasets and personalized medicine in disease treatment and (ii) innovative pedagogical methods to train students and health professionals in big data analysis for health science. Integrating large amounts of data from different sources, and using this knowledge to better characterize the specificities of each individual, will provide significant opportunities to improve disease diagnosis and to adapt patients’ treatment and care accordingly.
Successful treatment of cancer remains a challenge, partly because cancer composition varies widely across the patient population. Accounting for this heterogeneity is difficult: clinical evaluation of tumor heterogeneity often requires the expertise of anatomical pathologists and radiologists. The HADACA Cancer Heterogeneity challenge is an online data challenge dedicated to quantifying intra-tumor heterogeneity by applying appropriate statistical methods to cancer omics data. In particular, it focuses on estimating cell types and their proportions in biological samples (in silico simulations) from averaged DNA methylation and gene expression data.
The second edition of the HADACA Health Data Challenge, "Matrix factorization and deconvolution methods to quantify tumor heterogeneity in cancer research", ran from November 25 to 29, 2019 and gathered 31 international participants with backgrounds in bioinformatics, computer science, biology and medicine.

The participants used, and in part further developed, advanced statistical methods to quantify tumor heterogeneity in real data from cancer research. The goal was to explore various statistical methods for source separation/deconvolution analysis, such as Non-negative Matrix Factorization, Surrogate Variable Analysis, Principal Component Analysis and Latent Factor Models.
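
To give a feel for what deconvolution by Non-negative Matrix Factorization means in this setting, here is a minimal sketch in Python. It is not the method used by the organizers or by any team: the function name `nmf_deconvolve`, the simulated matrices and all parameter values are assumptions chosen for illustration. It assumes that an observed matrix D (features x samples) is approximately the product of cell-type profiles T (features x cell types) and mixing weights A (cell types x samples), and it uses the standard multiplicative update rules for the squared-error objective.

```python
import numpy as np

def nmf_deconvolve(D, k, n_iter=500, seed=0, eps=1e-9):
    """Hypothetical helper: factor D (features x samples) as T @ A, all non-negative."""
    rng = np.random.default_rng(seed)
    n_features, n_samples = D.shape
    T = rng.random((n_features, k)) + eps   # candidate cell-type profiles
    A = rng.random((k, n_samples)) + eps    # candidate mixing weights
    for _ in range(n_iter):
        # Multiplicative updates for the squared (Frobenius) reconstruction error
        A *= (T.T @ D) / (T.T @ T @ A + eps)
        T *= (D @ A.T) / (T @ A @ A.T + eps)
    # Rescale each sample's weights to sum to 1 so they read as proportions
    proportions = A / A.sum(axis=0, keepdims=True)
    return T, A, proportions

# Simulated example (illustrative values only, not challenge data):
# 200 features, 3 cell types, 20 mixed samples.
rng = np.random.default_rng(42)
T_true = rng.random((200, 3))                    # simulated reference profiles
A_true = rng.dirichlet(np.ones(3), size=20).T    # simulated proportions, columns sum to 1
D = T_true @ A_true                              # averaged signal per sample

T_est, A_est, proportions = nmf_deconvolve(D, k=3)
rel_err = np.linalg.norm(D - T_est @ A_est) / np.linalg.norm(D)
print("relative reconstruction error:", rel_err)
print("estimated proportions, first sample:", proportions[:, 0])
```

The estimated components come back in arbitrary order and scale, so comparing them with known proportions requires first matching each estimated cell type to a reference one. A low reconstruction error alone does not guarantee that the recovered proportions are correct.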

The program and the challenges were designed to encourage cooperation: participants were placed in mixed groups of 4 to 6 for two consecutive challenges. Before the group work began, all participants, teachers and organizers gathered for an introductory session of lectures on the medical background and the methods used in the field.

A majority of the HADACA participants took part in interviews or group discussions about their participation in the challenge. The following three questions were discussed:
a) Why did you join the HADACA challenge?
b) What role does the challenge/competition part have in the activity?
c) Were your expectations of the learning outcomes fulfilled?
Key points from the discussions:
•    The methods that participants expected to learn, and did learn, were a central reason for participating.
•    Not everyone likes the challenge/competition format. A majority nevertheless see many advantages in the stress and demands of the workshop (“It is highly motivating and fun”). Several groups cited the “friendly environment” and the willingness of participants and organizers to share knowledge as very positive factors.
•    The learning outcomes met expectations, especially for applying methods, but less so for developing existing methods (“too little time and too little experience”). Some participants commented that their appreciation of soft skills, such as communication and networking, had increased (“the role of being able to discuss with team members with different background really is crucial for obtaining results”).

The HADACA challenge can be used as an example of:
- a carefully prepared data set for applying or further developing existing bioinformatic methods to analyse complex biological data relevant to health
- preparation for a shift in research working methods, combining real-world, clinically highly relevant data with cross-disciplinary teams aiming to solve a scientific problem

Published on September 14, 2020