Track: Track 1

Statistical Machine Learning I

Chairs: Matthias Schmid and Thomas Welchowski

Interpretable Machine Learning
Bernd Bischl
Ludwig-Maximilians-Universität München

Adapting Variational Autoencoders for Realistic Synthetic Data with Skewed and Bimodal Distributions
Kiana Farhadyar, Harald Binder
Faculty of Medicine and Medical Center – University of Freiburg, Germany

Background: Passing synthetic data to other researchers instead of the original data is an option when data protection restrictions apply. Such data should preserve the statistical relationships between the variables while protecting privacy. In recent years, deep generative models have enabled significant progress in synthetic data generation; in particular, variational autoencoders (VAEs) are a popular class of such models. Standard VAEs are typically built around a latent space with a Gaussian distribution, which is a key challenge when they encounter more complex data distributions such as bimodal or skewed data.

Methods: In this work, we propose a novel method for synthetic data generation that also handles bimodal and skewed data while keeping the overall VAE framework. Moreover, the method can generate synthetic data for datasets consisting of both continuous and binary variables. We apply two transformations to convert the data into a form that is more compatible with VAEs. First, we use Box-Cox transformations to bring skewed distributions closer to symmetry. Then, to deal with potentially bimodal data, we employ a power function sgn(x)|x|^p that transforms the data so that its peaks move closer together and its tails become lighter. For the evaluation, we use a simulation design based on a large breast cancer study, and the International Stroke Trial (IST) dataset as a real-data example.
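The two pre-transformations can be sketched as follows (a minimal stand-alone illustration, not the authors' implementation; function names are hypothetical):

```python
import math

def box_cox(x, lam):
    # Box-Cox transform for x > 0: maps a right-skewed variable
    # closer to symmetry (lam < 1 compresses the right tail).
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def signed_power(x, p):
    # sgn(x) * |x|**p: for 0 < p < 1 and a centered bimodal variable,
    # this pulls the two peaks closer together and lightens the tails.
    return math.copysign(abs(x) ** p, x)
```

In such a pipeline the transformations are applied before training the VAE and inverted after sampling, so that the synthetic data live on the original scale.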

Results: We show that the pre-transformations considerably improve the utility of synthetic data for skewed and bimodal distributions. We compare our approach with standard VAEs, a VAE with an autoregressive implicit quantile network (AIQN), and generative adversarial networks (GANs). Ours is the only method that reproduces bimodality; the other methods typically generate unimodal distributions. For skewed data, those methods reduce the skewness of the synthetic data and push it towards a symmetric distribution, while our method yields skewness similar to the original data and honors the original value range better.

Conclusion: We developed a simple method that adapts VAEs via pre-transformations to handle skewed and bimodal data. Owing to its simplicity, it can be combined with many extensions of VAEs. Thus, it becomes feasible to generate high-quality synthetic clinical data for research under data protection constraints.

Statistical power for cell identity detection in deep generative models
Martin Treppner1,2, Harald Binder1,2
1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg, Germany; 2Freiburg Center of Data Analysis and Modelling, Mathematical Institute – Faculty of Mathematics and Physics, University of Freiburg, Germany

One of the most common applications of single-cell RNA-sequencing experiments is to discover groups of cells with a similar expression profile in an attempt to define cell identities. The similarity of these expression profiles is typically examined in a low-dimensional latent space, which can be learned by deep generative models such as variational autoencoders (VAEs). However, the quality of representations in VAEs varies greatly depending on the number of cells under study, which is also reflected in the assignment to specific cell identities. We propose a strategy to answer what number of cells is needed so that a pre-specified percentage of the cells in the latent space is well represented.

We train VAEs on varying numbers of cells and evaluate the quality of the learned representations via the estimated log-likelihood lower bound of each cell. The distribution of these log-likelihood values is then compared to a permutation-based distribution of log-likelihoods. We generate the permutation-based distribution by randomly drawing a small subset of cells before training the VAE and permuting each gene's expression values among these randomly drawn cells. By doing so, we ensure that the overall structure of the latent representation is preserved while, at the same time, obtaining a null distribution for the log-likelihoods. We then compare log-likelihood distributions for different numbers of cells. We also harness the generative properties of VAEs by artificially increasing the number of samples in small datasets: we generate synthetic data and combine them with the original pilot datasets.
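The permutation step can be sketched as follows (a simplified, self-contained version; the actual implementation details may differ):

```python
import random

def permuted_null_data(expr, subset_size, seed=1):
    # expr: list of cells, each a list of per-gene expression values.
    # Draw `subset_size` cells at random and independently permute each
    # gene's values among them. The rest of the data is untouched, so the
    # global structure is preserved, while the permuted cells carry no
    # real signal -- their log-likelihoods under the trained VAE then
    # form a null distribution.
    rng = random.Random(seed)
    idx = rng.sample(range(len(expr)), subset_size)
    out = [row[:] for row in expr]
    for g in range(len(expr[0])):
        vals = [expr[i][g] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i][g] = v
    return out, idx
```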

We demonstrate performance on varying sizes of subsamples of the Tabula Muris scRNA-seq dataset from the brain of seven mice processed with the SMART-Seq2 protocol. We show that our approach can be used to plan cell numbers for single-cell RNA-seq experiments, which might improve the reliability of downstream analyses such as cell identity detection and inference of developmental trajectories.

Individualizing deep dynamic models for psychological resilience data
Göran Köber1,2, Shakoor Pooseh2,3, Haakon Engen4, Andrea Chmitorz5,6,7, Miriam Kampa5,8,9, Anita Schick4,10, Alexandra Sebastian6, Oliver Tüscher5,6, Michèle Wessa5,11, Kenneth S.L. Yuen4,5, Henrik Walter12,13, Raffael Kalisch4,5, Jens Timmer2,3,14, Harald Binder1,2
1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany; 2Freiburg Center of Data Analysis and Modelling (FDM), University of Freiburg, Freiburg, 79104, Germany; 3Institute of Physics, University of Freiburg, 79104, Germany; 4Neuroimaging Center (NIC), Focus Program Translational Neuroscience (FTN), Johannes Gutenberg University Medical Center, Mainz, 55131, Germany; 5Leibniz Institute for Resilience Research (LIR), Mainz, 55122, Germany; 6Department of Psychiatry and Psychotherapy, Johannes Gutenberg University Medical Center, Mainz, 55131, Germany; 7Faculty of Social Work, Health and Nursing, University of Applied Sciences Esslingen, Esslingen, 73728, Germany; 8Department of Clinical Psychology, University of Siegen, 57076, Germany; 9Bender Institute of Neuroimaging (BION), Department of Psychology, Justus Liebig University, Gießen, 35394, Germany; 10Department of Public Mental Health, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University, Germany; 11Department of Clinical Psychology and Neuropsychology, Institute of Psychology, Johannes Gutenberg University, Mainz, 55131, Germany; 12Research Division of Mind and Brain, Charité–Universitätsmedizin Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Germany; 13Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Germany; 14CIBSS—Centre for Integrative Biological Signaling Studies, University of Freiburg, 79104, Germany

Deep learning approaches can uncover complex patterns in data. In particular, variational autoencoders (VAEs) achieve this by a non-linear mapping of data into a low-dimensional latent space. Motivated by an application to psychological resilience in the Mainz Resilience Project (MARP), which features intermittent longitudinal measurements of stressors and mental health, we propose an approach for individualized, dynamic modeling in this latent space. Specifically, we utilize ordinary differential equations (ODEs) and develop a novel technique for obtaining person-specific ODE parameters even in settings with a rather small number of individuals and observations, incomplete data, and a differing number of observations per individual. This technique allows us to subsequently investigate individual reactions to stimuli, such as the mental health impact of stressors. A potentially large number of baseline characteristics can then be linked to this individual response by regularized regression, e.g., for identifying resilience factors. Thus, our new method provides a way of connecting different kinds of complex longitudinal and baseline measures via individualized, dynamic models. The promising results obtained in the exemplary resilience application indicate that our proposal for dynamic deep learning might also be more generally useful for other application domains.

Statistical Software Development

Chairs: Fabian Scheipl and Gernot Wassmer

A web application to determine statistically optimal designs for dose-response trials, especially with interactions
Tim Holland-Letz, Annette Kopp-Schneider
German Cancer Research Center DKFZ, Germany

Statistical optimal design theory is well developed but almost never used in practical applications in fields such as toxicology. For the area of dose-response trials, we therefore present an R Shiny-based web application that calculates D-optimal designs for the most commonly fitted dose-response functions, namely the log-logistic and Weibull functions. The application also generates a graphical representation of the design space (a "design heatmap"). Furthermore, it allows checking the efficiencies of user-specified designs. In addition, uncertainty regarding the assumptions about the true parameters can be incorporated in the form of average optimal designs. Thus, the user can find a design that is a compromise between rigid optimality and more practical designs that also incorporate specific preferences and technical requirements.
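As an illustration of the underlying criterion (a rough sketch, not the app's code, and for a simplified two-parameter log-logistic model with fixed asymptotes), the D-criterion of a candidate design is the determinant of its Fisher information matrix:

```python
def log_logistic(d, theta):
    # two-parameter log-logistic response (upper limit 1, lower limit 0);
    # theta = (ed50, slope)
    ed50, b = theta
    return 1.0 / (1.0 + (d / ed50) ** b)

def d_criterion(doses, weights, theta, eps=1e-6):
    # det of M(xi) = sum_i w_i * g(d_i) g(d_i)^T, where g is the parameter
    # gradient of the model at dose d_i (central finite differences,
    # homoscedastic normal errors assumed). A D-optimal design maximizes
    # this determinant; here M is 2x2 for the two-parameter model.
    def grad(d):
        g = []
        for j in range(len(theta)):
            tp, tm = list(theta), list(theta)
            tp[j] += eps
            tm[j] -= eps
            g.append((log_logistic(d, tp) - log_logistic(d, tm)) / (2 * eps))
        return g
    m = [[0.0, 0.0], [0.0, 0.0]]
    for d, w in zip(doses, weights):
        g = grad(d)
        for a in range(2):
            for b in range(2):
                m[a][b] += w * g[a] * g[b]
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]
```

A design concentrated on a single dose yields a singular information matrix (determinant zero), which is why D-optimal designs spread mass over several doses.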

Finally, the app can also be used to compute designs for interaction trials of two substances combined in a ray design, including an a priori estimate for the parameters of the combination expected under the (Loewe) additivity assumption.

Distributed Computation of the AUROC-GLM Confidence Intervals Using DataSHIELD
Daniel Schalk1, Stefan Buchka2, Ulrich Mansmann2, Verena Hoffmann2
1Department of Statistics, LMU Munich; 2The Institute for Medical Information Processing, Biometry, and Epidemiology, LMU Munich

Distributed calculation protects data privacy without ruling out complex statistical analyses. Individual data stay in local databases, invisible to the analyst, who only receives aggregated results. We present a distributed algorithm that calculates a ROC curve and its AUC estimate with a confidence interval, to evaluate a therapeutic decision rule. It will be embedded in the DataSHIELD framework [1].

Our starting point is the ROC-GLM approach by Pepe [2]. The additivity of the Fisher information matrix, of the score vector, and of the confidence interval proposed by DeLong et al. [3] allows intermediate results to be aggregated, making it possible to design a distributed algorithm that calculates estimates of the ROC-GLM, its AUC, and the CI.
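The aggregation exploited here can be sketched generically (a hypothetical helper, not the DataSHIELD implementation): each site returns only its local score vector and Fisher information matrix, and the analyst sums them element-wise before a Fisher-scoring update:

```python
def pool_sites(site_stats):
    # site_stats: list of (score_vector, information_matrix) pairs,
    # one per site. Both quantities are additive across sites, so the
    # analyst can pool them without ever seeing individual-level data.
    p = len(site_stats[0][0])
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for s, m in site_stats:
        for j in range(p):
            score[j] += s[j]
            for k in range(p):
                info[j][k] += m[j][k]
    return score, info
```

A Fisher-scoring step can then update the ROC-GLM coefficients using only these pooled aggregates.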

We simulate scores and labels (responses) to create AUC values within the range [0.5, 1]. The sizes of individual studies are uniformly distributed on [100, 2500], while the percentage of treatment response covers [0.2, 0.8]. Per scenario, 10,000 studies are produced. For each study, the AUC is calculated in a non-distributed empirical setting as well as in a distributed one. The difference in AUC between the two approaches is independent of the number of distributed components and lies within the range [-0.019, 0.013]. The boundaries of bootstrapped CIs in the non-distributed empirical setting are close to those of the distributed approach with the DeLong CI: the differences in the lower boundary range over [-0.015, 0.03], and the deviations in the upper boundary over [-0.012, 0.026].

The distributed algorithm allows anonymous multicentric validation of the discrimination of a classification rule. A specific application is the audit use case within the MII consortium DIFUTURE. The multicentric prospective ProVAL-MS study (DRKS: 00014034) on patients with newly diagnosed relapsing-remitting multiple sclerosis provides the data for a privacy-protected validation of a treatment decision score (also developed by DIFUTURE) regarding discrimination between good and insufficient treatment response. The simulation results demonstrate that our algorithm is suitable for the planned validation. The algorithm is implemented in R for use within DataSHIELD and will be made publicly available.

[1] Gaye, A., et al. (2014). DataSHIELD: taking the analysis to the data, not the data to the analysis. International Journal of Epidemiology, 43(6), 1929–1944.

[2] Pepe, M. S. (2003). The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press.

[3] DeLong, E. R., DeLong, D. M., and Clarke-Pearson, D. L. (1988). Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics, 44(3), 837–845.

Interactive review of safety data during a data monitoring committee using R-Shiny
Tobias Mütze1, Bo Wang2, Douglas Robinson2
1Statistical Methodology, Novartis Pharma AG, Switzerland; 2Scientific Computing and Consulting, Novartis Pharma AG, Switzerland

In clinical trials, it is common that the safety of patients is monitored by a data monitoring committee (DMC) that operates independently of the clinical trial teams. After each review of the accumulating trial data, it is the DMC's responsibility to decide whether to continue or stop the trial. The data are generally presented to DMCs in a static report through tables, listings, and sometimes figures. In this presentation, we share our experiences with supplementing the safety data review with an interactive R Shiny app. We first present the layout and content of the app. Then, we outline the advantages of reviewing (safety) data by means of an interactive app compared to the standard review of a DMC report, namely, extensive use of graphical illustrations in addition to tables, the ability to quickly change the level of detail, and the ability to switch between study-level and subject-level data. We argue that this leads to a robust collaborative discussion and a more complete understanding of the data. Finally, we discuss the qualification process of an R Shiny app and outline how the learnings may be applied to enhance standard DMC reports.



An R package for an integrated evaluation of statistical approaches to cancer incidence projection
Maximilian Knoll1,2,3,4, Jennifer Furkel1,2,3,4, Jürgen Debus1,3,4, Amir Abdollahi1,3,4, André Karch5, Christian Stock6,7
1Department of Radiation Oncology, Heidelberg University Hospital, Heidelberg, Germany; 2Faculty of Biosciences, Heidelberg University, Heidelberg, Germany; 3Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany; 4German Cancer Consortium (DKTK) Core Center Heidelberg, Heidelberg, Germany; 5Institute of Epidemiology and Social Medicine, University of Muenster, Muenster, Germany.; 6Institute of Medical Biometry and Informatics (IMBI), University of Heidelberg, Heidelberg, Germany; 7Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany

Background: Projecting future cancer incidence is an important task in cancer epidemiology, and the results are also of interest for biomedical research and public health policy. Age-period-cohort (APC) models, usually based on long-term cancer registry data (>20 years), are established for such projections. In many countries (including Germany), however, nationwide long-term data are not yet available, and it is unclear which statistical approach should be recommended for projections based on rather short-term data.

Methods: To enable a comparative analysis of the performance of statistical approaches to cancer incidence projection, we developed an R package (incAnalysis), supporting in particular Bayesian models fitted by integrated nested Laplace approximations (INLA). Its use is demonstrated by an extensive empirical evaluation of the operating characteristics (bias, coverage, and precision) of potentially applicable models of differing complexity. Observed long-term data from three cancer registries (SEER-9, NORDCAN, Saarland) were used for benchmarking.

Results: Overall, coverage was high (mostly >90%) for Bayesian APC (BAPC) models, whereas less complex models showed differences in coverage depending on the projection period. Intercept-only models yielded coverage values below 20%. Bias increased and precision decreased for longer projection periods (>15 years) for all but the intercept-only models. Precision was lowest for complex models such as BAPC models, generalized additive models with multivariate smoothers, and generalized linear models with age × period interaction effects.

Conclusion: The incAnalysis R package allows a straightforward comparison of cancer incidence rate projection approaches. Further detailed and targeted investigations into model performance in addition to the presented empirical results are recommended to derive guidance on appropriate statistical projection methods in a given setting.

Using Differentiable Programming for Flexible Statistical Modeling
Maren Hackenberg1, Marlon Grodd1, Clemens Kreutz1, Martina Fischer2, Janina Esins2, Linus Grabenhenrich2, Christian Karagiannidis3, Harald Binder1
1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Germany; 2Robert Koch Institute, Berlin, Germany; 3Department of Pneumology and Critical Care Medicine, Cologne-Merheim Hospital, ARDS and ECMO Center, Kliniken der Stadt Köln, Witten/Herdecke University Hospital, Cologne, Germany

Differentiable programming has recently received much interest as a paradigm that facilitates taking gradients of computer programs. While the corresponding flexible gradient-based optimization approaches have so far been used predominantly for deep learning, or for enriching the latter with modeling components, we want to demonstrate that they can also be useful for statistical modeling per se, e.g., for quick prototyping when classical maximum likelihood approaches are challenging or infeasible.

In an application from a COVID-19 setting, we utilize differentiable programming to quickly build and optimize a flexible prediction model adapted to the data quality challenges at hand. Specifically, we develop a regression model, inspired by delay differential equations, that can bridge temporal gaps in the observations of the central German registry of COVID-19 intensive care cases for predicting future demand. With this exemplary modeling challenge, we illustrate how differentiable programming enables simple gradient-based optimization of the model via automatic differentiation. This allowed us to quickly prototype, under time pressure, a model that outperforms simpler benchmark models.
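The core idea, obtaining model gradients by automatic differentiation rather than deriving them by hand, can be illustrated with a toy forward-mode autodiff class and a one-parameter regression (a deliberately minimal sketch; the actual registry model is far richer):

```python
class Dual:
    # forward-mode automatic differentiation: carry (value, derivative)
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, o):
        return o if isinstance(o, Dual) else Dual(o)
    def __add__(self, o):
        o = self._lift(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __sub__(self, o):
        o = self._lift(o)
        return Dual(self.val - o.val, self.der - o.der)
    def __mul__(self, o):
        o = self._lift(o)
        return Dual(self.val * o.val,
                    self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def grad(f, x):
    # derivative of f at x, obtained by running f on a dual number
    return f(Dual(x, 1.0)).der

# toy example: fit y ~ theta * x by gradient descent on squared error
xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.0]

def loss(theta):
    return sum((theta * x - y) * (theta * x - y)
               for x, y in zip(xs, ys))

theta = 0.0
for _ in range(200):
    theta -= 0.01 * grad(loss, theta)
```

In practice, the same principle is supplied by autodiff frameworks (e.g., JAX or Julia's Zygote), which scale it to models with differential-equation components.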

We thus exemplify the potential of differentiable programming outside deep learning applications, providing more options for flexible applied statistical modeling.

Opening Session / Keynote: Machine Learning in Biometry

Chairs: Werner Brannath and Katja Ickstadt

Speakers: Andreas Faldum (Conference President), Frank Müller (Dean of the Medical Faculty), Werner Brannath (President of the IBS-DR), Markus Lewe (Mayor of the City of Münster) || Keynote speaker: Chris Holmes

Title: Machine Learning in Biometrics
Chris Holmes

Machine learning (ML) and artificial intelligence (AI) have had a major impact across many disciplines, including biometrics. In the first half of this talk we will review some of the characteristics of ML that make for successful applications, as well as those features that present challenges, in particular around robustness and reproducibility. Relatively speaking, ML is mainly concerned with prediction, while the majority of biometric analyses are focussed on inference. In the second half of the talk we will review the prediction-inference dichotomy and explore, from a Bayesian perspective, the theoretical foundations of how modern ML predictive models can be utilised for inference.

Public lecture

Chairs: Werner Brannath and Andreas Faldum

From 17:00: dial-in to the Zoom room possible
18:00 – 19:30: Lecture (in German)

PD Dr. Benjamin Hofner, Head of the Section Biostatistics at the Paul-Ehrlich-Institut Langen, Federal Institute for Vaccines and Biomedicines; Adjunct Lecturer at the University of Erlangen-Nuremberg.

Topic: „Statistik in Zeiten von Corona – Der komplexe Weg zur Zulassung eines Impfstoffes“ (Statistics in times of Corona: the complex path to the approval of a vaccine)

Statistik in Zeiten von Corona – Der komplexe Weg zur Zulassung eines Impfstoffes

Benjamin Hofner, Paul-Ehrlich-Institut, Bundesinstitut für Impfstoffe und biomedizinische Arzneimittel, Langen.

Rapidly developing an effective and safe vaccine during a pandemic is an enormous challenge. As could be followed in the main news programmes over the past year, this challenge lies not only in developing the vaccine in the laboratory but also in the subsequent testing and approval of the vaccine.

This talk looks at the clinical development, in particular the final and decisive phase 3 trial. As examples, it discusses the trials of the two already approved vaccines from BioNTech/Pfizer and Moderna. It highlights the central role of statistical aspects (e.g., study design, sample size planning, study population, endpoints) both in the planning and in the analysis of the trials. Statistical concepts and their necessity in vaccine trials are explained and motivated in a way that is also accessible to interested lay audiences. In addition, the regulatory processes leading to the approval of the vaccines are outlined.

Time: Sunday, 14 March 2021, 18:00 – 19:30

Short CV

PD Dr Benjamin Hofner is Head of the Section Biostatistics at the Paul-Ehrlich-Institut, the German Federal Institute for Vaccines and Biomedicines. He provides input to the Biostatistics Working Party (BSWP) of the European Medicines Agency (EMA) and is a member of several BSWP task forces, including a group working on guidance for studies on the treatment and prevention of COVID-19.

Dr Hofner graduated in Statistics from the LMU Munich in 2008 and obtained his PhD in Statistics there in 2011 for his work on statistical approaches to machine learning. In 2018 he received his venia legendi ("Privatdozent") in Biostatistics from the University of Erlangen-Nuremberg.

Dr Hofner’s current research interests mainly focus on innovative clinical trial designs and other statistical issues in the field of “regulatory biostatistics”. He is Task Lead in the EU-funded IMI project EU-PEARL on patient-centric platform trials and Work Package Lead in the IMI project COMBINE on anti-microbial resistance. Besides his duties at the Paul-Ehrlich-Institute, he is Adjunct Lecturer for Biostatistics at the medical school of the University Erlangen-Nuremberg.

An Introduction to Causal Inference and Target Trials

Login details (Zoom session ID and password) will be sent to registered participants by email by March 10th.
If you are registered and have not received this email by March 10th, please contact us at

Tutorium “An Introduction to Causal Inference and Target Trials” (full-day), Sonja Swanson

– Basic concepts and assumptions of causal inference, using counterfactual or potential outcomes
– Key sources of bias: e.g., confounding, selection, and information bias
– Describing a target trial: specifying the key protocol elements of an ideal randomized trial, including eligibility criteria, treatment strategies, treatment assignment, follow-up period, outcome, causal contrast, and statistical analysis
– Emulating a target trial: designing and analyzing observational data to estimate causal effects, including the use of g-methods


Short CV
Sonja Swanson is an Assistant Professor in the Department of Epidemiology at Erasmus MC and holds an adjunct affiliation with the Department of Epidemiology at the Harvard T. H. Chan School of Public Health. She was recently invited to join the editorial team of Epidemiology. Her methodological research focuses on improving the use and transparency of methods for estimating causal effects in epidemiology; this work spans applications in observational studies and pragmatic randomized trials. She has made important contributions to instrumental variable methods, e.g., in the context of Mendelian randomization, as well as to target trials, effect heterogeneity, and causal mediation. Her substantive research primarily focuses on neuropsychiatric disorders and related health outcomes, including the use of appropriate methods to study potential prevention and treatment strategies.