Track: Track 1

IBS-DR Members' Meeting

Chairs: IBS-DR Executive Board


Agenda of the 2021 Members' Meeting

Item 1: Adoption of the agenda (Brannath)
Item 2: Approval of the minutes of the members' meeting of 9 September 2020 (Scharpenberg)
Item 3: Report of the President (Brannath)
Item 4: Early career awards (Brannath)
Item 5: Reports from the international bodies (Bretz, Ickstadt, Kieser, Kübler, Pigeot, Ziegler)
Item 6: Report of the Secretary (Scharpenberg)
Item 7: Report from the administrative office (Scharpenberg)
Item 8: Report of the Treasurer (Knapp)
Item 9: Report of the auditors (Dierig, Tuğ)
Item 10: Resolutions on reserves and membership fees for 2022 (Knapp)
Item 11: Reports from the working groups (Asendorf)
Item 12: Summer schools and continuing education (Brannath)
Item 13: Future colloquia (Brannath)
Item 14: Biometrical Journal (Bathke, Schmid)
Item 15: Report of the election officer on the advisory board election (Gerß)
Item 16: Miscellaneous (Brannath)

Statistics in Practice II

Chairs: Theresa Keller and Thomas Schmelter


Education for Statistics in Practice: Development and evaluation of prediction models: pitfalls and solutions
Ben Van Calster1, Maarten van Smeden2
1Department of Development and Regeneration, University of Leuven, Leuven, Belgium; 2Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, Netherlands

With fast developments in medical statistics, machine learning, and artificial intelligence, the current opportunities for making accurate predictions about the future seem nearly endless. In this lecture we will share experiences from a medical prediction perspective, where prediction modelling has a long history and models have been implemented in patient care with varying success. We will focus on best practices for the development, evaluation, and presentation of prediction models, highlight common pitfalls, present solutions that help avoid poor prediction modelling, and discuss methodological challenges for the future.


EXTENDED ABSTRACT

Prediction models are developed throughout science. In this session the focus will be on applications in the medical domain, where prediction models have a long history commonly serving either a diagnostic or prognostic purpose. The ultimate goal of such models is to assist in medical decision making by providing accurate predictions for future individuals.

As we anticipate that participants in this session are already well versed in fitting statistical models to data, the focus will be on common pitfalls when developing statistical (learning) and machine learning models with a prediction aim. Our goal is that participants gain knowledge about the pitfalls of prediction modeling and become more familiar with methods that provide solutions to these pitfalls.

The sessions will be arranged in sections of 20 to 30 minutes each. The following topics will be covered.

State of the medical prediction modeling art

This section begins with a brief introduction to the history of prediction modeling in medical research. Positive examples will be highlighted, and we will draw on the extensive systematic review literature on clinical prediction models. Recent experiences with a living systematic review of COVID-19 related prediction models will be discussed.

Just another prediction model

For most health conditions, prediction models already exist. How does one prevent a prediction modeling project from ending up on the large pile of failed and unused models? Using the PROGRESS framework, we discuss various prediction modeling goals. Some good modeling practices, and the harm done by commonly applied modeling methods, are illustrated. Finally, we highlight recent developments in formalizing prediction goals (predictimands).

Methods against overfitting

Overfitting is arguably the biggest enemy of prediction modeling. There is a large literature on shrinkage estimators that aim at preventing overfitting. In this section we will reflect on the history of shrinkage methods (e.g. Stein's estimator and the Le Cessie-Van Houwelingen heuristic shrinkage factor) and on more recent developments (e.g. lasso and ridge regression variants). The advantages and limitations will be discussed.
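As a concrete illustration of the heuristic shrinkage idea, the sketch below (our own minimal example, not material from the presenters; data and variable names are made up) computes the heuristic shrinkage factor from the likelihood-ratio chi-square of a fitted logistic regression, shrinks the coefficients, and re-estimates the intercept:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.5 * X[:, 1]))))

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
lr_chi2 = 2 * (fit.llf - fit.llnull)  # likelihood-ratio chi-square of the model
s = (lr_chi2 - p) / lr_chi2           # heuristic shrinkage factor

beta_shrunk = s * fit.params[1:]      # shrink all coefficients except the intercept
# re-estimate the intercept with the shrunken linear predictor held fixed (offset)
offset = X @ beta_shrunk
intercept = sm.Logit(y, np.ones((n, 1)), offset=offset).fit(disp=0).params[0]
print(f"shrinkage factor s = {s:.3f}")
```

The factor moves toward 1 as the model chi-square grows relative to the number of predictors, so well-supported models are shrunken only slightly.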

Methods for deciding on appropriate sample size

Rules of thumb have dominated the discussions on sample size for prediction models for decades (e.g. the need for at least 10 events for every predictor considered). The history and limitations of these rules of thumb will be shown. Recently developed sample size criteria for prediction model development and validation will be presented.
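To make one of these newer criteria concrete, here is a minimal sketch (our own, assuming the criteria of Riley and colleagues are meant; in practice software such as the pmsampsize R package implements the full set of criteria) of the criterion that chooses the development sample size so that the expected uniform shrinkage factor is at least 0.9:

```python
import math

def n_for_shrinkage(p, r2_cs, S=0.9):
    """Minimum sample size so that the expected uniform shrinkage factor
    is at least S, for a binary-outcome model.
    p: number of candidate predictor parameters,
    r2_cs: anticipated Cox-Snell R-squared of the model."""
    return math.ceil(p / ((S - 1) * math.log(1 - r2_cs / S)))

# Example: 10 candidate parameters, anticipated Cox-Snell R^2 of 0.15
print(n_for_shrinkage(p=10, r2_cs=0.15))  # -> 549 participants

# For contrast, the 10-events-per-variable rule of thumb with an outcome
# prevalence of 0.3 would suggest 10 * 10 / 0.3 ≈ 334 participants.
```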

Model performance and validation

Validation of prediction models goes beyond the evaluation of model coefficients and goodness-of-fit tests. Prediction models should give higher risk estimates for events than for non-events (discrimination). Because predictions may be used to support clinical decisions, the estimated risks should also be accurate (calibration). We will describe the various levels at which a model can be calibrated. Further, the performance of the model in classifying patients into low vs. high risk groups to support decision making can be evaluated. We discuss decision curve analysis, the most well-known tool for utility validation. The link between calibration and utility is explained.
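The sketch below (an illustrative stand-in, not the presenters' code) computes the three quantities just mentioned for a vector of predicted risks: the c-statistic for discrimination, the calibration slope, and the net benefit at a chosen risk threshold as used in decision curve analysis:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def validate(y, risk, threshold=0.2):
    """y: binary outcomes (0/1 array), risk: predicted risks strictly in (0, 1)."""
    c_stat = roc_auc_score(y, risk)               # discrimination (c-statistic)
    logit = np.log(risk / (1 - risk))
    # calibration slope: logistic regression of the outcome on the logit of risk
    slope = sm.Logit(y, sm.add_constant(logit)).fit(disp=0).params[1]
    treat = risk >= threshold                     # classify as high risk at this threshold
    tp = np.mean(treat & (y == 1))
    fp = np.mean(treat & (y == 0))
    nb = tp - fp * threshold / (1 - threshold)    # net benefit at this threshold
    return c_stat, slope, nb
```

Evaluating the net benefit over a grid of thresholds, rather than one fixed value, yields the decision curve.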

Heterogeneity over time and place: there is no such thing as a validated model

We discuss the different levels of validation (apparent, internal, and external) and what they can tell us. However, it is increasingly recognized that performance should be expected to be heterogeneous across settings and hospitals. This can be taken into account at many levels: we may focus on obtaining clustered data sets (e.g. multicenter data, IPD) for model development and validation, internal-external cross-validation can be used during model development, and cluster-specific performance can be meta-analyzed at validation. If the data allow, meta-regression can be used to gain insight into performance heterogeneity. Model updating can be used to adapt a model to a new setting. In addition, populations tend to change over time, which calls for continuous updating strategies.
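A minimal sketch of the internal-external cross-validation loop mentioned above (our own illustration; the logistic model and the AUC metric are placeholder choices):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def iecv(X, y, center):
    """X: covariates, y: binary outcome, center: cluster label per row.
    Each center is left out in turn; the model is developed on the
    remaining centers and evaluated on the held-out center."""
    aucs = {}
    for c in np.unique(center):
        train, test = center != c, center == c
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        aucs[c] = roc_auc_score(y[test], model.predict_proba(X[test])[:, 1])
    # feed these center-specific AUCs into a random-effects meta-analysis
    return aucs
```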

Applied example

We will describe the development and validation of the ADNEX model for diagnosing ovarian cancer, covering its development, validation, target population, meta-regression, validation studies, model updating, and implementation in ultrasound machines.

Future perspective: machine learning and AI

Flexible machine learning algorithms have been around for a while, but recently we have observed a strong increase in their use. We discuss challenges for these methods, such as data hunger, the risk of automation, the increasing complexity of model building, the no-free-lunch idea, and the winner's curse.

Statistics in Practice I

Chairs: Theresa Keller and Willi Sauerbrei


Education for Statistics in Practice: Development and evaluation of prediction models: pitfalls and solutions
Ben Van Calster1, Maarten van Smeden2
1Department of Development and Regeneration, University of Leuven, Leuven, Belgium; 2Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, Netherlands

Abstract and extended abstract as given under Statistics in Practice II above.

Panel Discussion: Drug Development beyond Traditional Paths

Chairs: Cornelia-Ursula Kunz and Kaspar Rufibach


Academia-industry collaborations in biostatistics – It is not about the whether, just about the how
Lisa Hampson1, Frank Fleischer2
1Advanced Methodology & Data Science, Novartis Pharma AG, Switzerland; 2Biostatistics & Data Sciences, Boehringer Ingelheim Pharma, Germany

Methodological collaborations between academia and the pharmaceutical industry can have several benefits for both parties. In addition to the development and application of new statistical methods, there is also the education and recruitment of the next generation of biostatisticians and data scientists. In this presentation, we begin by reflecting on the key components (and maybe some pitfalls) of an academia-industry collaboration. We consider the different models that these collaborations can follow, ranging from co-supervision of student projects to collaborations between institutions. We will use several examples to illustrate the various models and their direct impact on statistical methodology and the business. Topics covered are diverse and range from data science to innovative clinical trial design. We conclude by looking to the future and provide an overview of emerging methodological questions in the pharmaceutical industry which we think are ripe for future academia-industry partnerships.

Statistical Machine Learning II

Chairs: Harald Binder and Marvin Wright


Variable relation analysis utilizing surrogate variables in random forests
Stephan Seifert1, Sven Gundlach2, Silke Szymczak3
1University of Hamburg; 2Kiel University; 3University of Lübeck

The machine learning approach random forests [1] can be successfully applied to omics data, such as gene expression data, for classification or regression. However, interpretation of the trained prediction models is currently mainly limited to the selection of relevant variables, identified by so-called importance measures for each individual variable; relationships between the predictor variables are not considered. We developed a new RF-based variable selection method called Surrogate Minimal Depth (SMD) that incorporates variable relations into the selection of important variables [2]. This is achieved by exploiting surrogate variables, which were originally introduced to deal with missing predictor variables [3]. In addition to improving variable selection, surrogate variables and their relationship to the primary split variables, measured by the parameter mean adjusted agreement, can also be utilized as a proxy for the relations between the different variables. This relation analysis goes beyond the investigation of ordinary correlation coefficients because it takes the association with the outcome into account. I will present the basic concept of surrogate variables and mean adjusted agreement, the relation analysis of simulated data as a proof of concept, and the investigation of experimental breast cancer gene expression data sets to show the practical applicability of this new approach.

References

[1] L. Breiman, Mach. Learn. 2001, 45, 5-32.

[2] S. Seifert, S. Gundlach, S. Szymczak, Bioinformatics 2019, 35, 3663-3671.

[3] L. Breiman, J. Friedman, C. J. Stone, R. A. Olshen, Classification and Regression Trees, Taylor & Francis, 1984.
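To illustrate the central quantity, the following toy sketch (our own; the exact definition of the mean adjusted agreement in [2] may differ in detail) computes an adjusted agreement between a primary split and one candidate surrogate split. SMD-style relation analysis averages such values over the nodes and trees of a forest:

```python
import numpy as np

def adjusted_agreement(x_primary, t_primary, x_candidate, t_candidate):
    """Fraction of cases the candidate split routes the same way as the
    primary split, rescaled against the trivial 'always send everything
    to the larger child' baseline."""
    left_primary = x_primary <= t_primary
    left_candidate = x_candidate <= t_candidate
    agree = np.mean(left_primary == left_candidate)
    baseline = max(np.mean(left_primary), 1 - np.mean(left_primary))
    # values <= 0 mean the candidate is no better than the baseline rule
    return (agree - baseline) / (1 - baseline)
```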


Variable Importance in Random Forests in the Presence of Confounding
Robert Miltenberger, Christoph Wies, Gunter Grieser, Antje Jahn
University of Applied Sciences Darmstadt, Germany

Patients in need of a kidney transplant suffer from a lack of available organ donors. Still, patients commonly reject an allocated kidney when they consider its quality insufficient [1]. Rejection is of major concern, as the prolonged ischemic time can reduce the organ's quality and thus its usefulness for further patients. To better understand the association between organ quality and patient prognosis after transplantation, random survival forests will be applied to data on more than 50,000 kidney transplantations from the US organ transplantation registry. However, the US allocation process assigns high-quality kidneys to patients with good prognosis. Confounding is therefore of major concern and needs to be addressed.

In this talk, we investigate methods to address confounding in random forest analyses by using residuals from a generalized propensity score analysis. We show that, by considering residuals instead of the original variables, the permutation variable importance measures refer to semipartial correlations between outcome and variable, rather than to correlations distorted by confounder effects. This facilitates the interpretation of the variable importance measure. As our findings rely on linear models, we further investigate the approach for non-linear and non-additive models using simulations.
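A minimal sketch of the residualization idea described above (our own reading of the approach; model choices and names are illustrative, and the talk concerns survival forests rather than the regression forest used here):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression

def residualized_importance(X_exposure, X_confounders, y):
    """Replace the exposure of interest by its residual from a
    generalized propensity score model before growing the forest."""
    # generalized propensity score model: exposure regressed on confounders
    gps = LinearRegression().fit(X_confounders, X_exposure)
    resid = X_exposure - gps.predict(X_confounders)
    Z = np.column_stack([resid, X_confounders])
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(Z, y)
    # permutation importance of the residualized exposure is no longer
    # inflated (or masked) by its correlation with the confounders
    return permutation_importance(rf, Z, y, n_repeats=20, random_state=0)
```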

The proposed method is used to analyse the impact of kidney quality on failure-free survival after transplantation based on the US registry data. Results are compared to other methods that have been proposed for a better understanding and explainability of random forest analyses [2].

[1] Husain SA et al.: Association Between Declined Offers of Deceased Donor Kidney Allograft and Outcomes in Kidney Transplant Candidates. JAMA Netw Open. 2019; doi:10.1001/jamanetworkopen.2019.10312

[2] Paluszynska A, Biecek P, Jiang Y (2020). randomForestExplainer: Explaining and Visualizing Random Forests in Terms of Variable Importance. R package version 0.10.1. https://CRAN.R-project.org/package=randomForestExplainer


Identification of representative trees in random forests based on a new tree-based distance measure
Björn-Hergen Laabs, Inke R. König
Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Germany

In the life sciences, random forests are often used to train predictive models, but it is rather difficult to gain explanatory insight into the mechanics leading to a specific outcome, which impedes the implementation of random forests in clinical practice. Typically, variable importance measures are used, but they can neither explain how a variable influences the outcome nor find interactions between variables; furthermore, they ignore the tree structure of the forest entirely. A different approach is to select a single tree, or a small set of trees, that best represents the forest. The hope is that by simplifying a complex ensemble of decision trees to a few representative trees, it becomes possible to observe common tree structures, the importance of specific features, and variable interactions. Thus, representative trees could also help to understand interactions between genetic variants.

Intuitively, representative trees are those with minimal distance to all other trees in the ensemble, which requires a proper definition of the distance between two trees. The currently proposed tree-based distance metrics [1] compare trees with regard to either their predictions, the clustering in their terminal nodes, or the variables used for splitting. They therefore either need an additional data set to calculate the distances or capture only a few aspects of the tree architecture. We thus developed a new tree-based distance measure that requires no additional data set and incorporates more of the tree structure, by evaluating not only whether a certain variable was used for splitting in a tree but also where in the tree it was used. We compared our new method with the existing metrics in an extensive simulation study and show that our new distance metric is superior in depicting differences in tree structures. Furthermore, we found that the most representative tree selected by our method has better prediction performance on independent validation data than the trees selected by the other metrics.

[1] Banerjee et al. (2012), Identifying representative trees from ensembles, Statistics in Medicine 31(15), 1601-16
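As a simplified illustration of the idea (our own toy version; the distance measure proposed in the talk is richer), the sketch below derives a depth-weighted split-variable profile for each tree of a scikit-learn forest, computes pairwise distances between the profiles, and selects the medoid tree as the representative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def depth_profile(estimator, n_features):
    """Minimal depth at which each feature splits in the tree (inf if unused),
    mapped to (0, 1] so that shallow splits weigh more and unused features are 0."""
    t = estimator.tree_
    prof = np.full(n_features, np.inf)
    def walk(node, depth):
        if t.children_left[node] == -1:   # leaf node
            return
        f = t.feature[node]
        prof[f] = min(prof[f], depth)
        walk(t.children_left[node], depth + 1)
        walk(t.children_right[node], depth + 1)
    walk(0, 0)
    return 1 / (1 + prof)

def representative_tree(forest, n_features):
    """Index of the tree minimizing the summed distance to all other trees."""
    profiles = np.array([depth_profile(e, n_features) for e in forest.estimators_])
    dist = np.linalg.norm(profiles[:, None, :] - profiles[None, :, :], axis=2)
    return int(np.argmin(dist.sum(axis=1)))
```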


Interaction forests: Identifying and exploiting influential quantitative and qualitative interaction effects
Roman Hornung
University of Munich, Germany

Even though interaction effects are omnipresent in biomedical data and play a particularly prominent role in genetics, they receive little attention in analysis, in particular in prediction modelling. Identifying influential interaction effects is valuable, both because they allow important insights into the interplay between the covariates and because these effects can be used to improve the prediction performance of automatic prediction rules.

Random forest is one of the most popular machine learning methods, known for its ability to capture complex non-linear dependencies between the covariates and the outcome. A key feature of random forest is that the considered covariates can be ranked with respect to their contribution to prediction using various variable importance measures.

We developed 'interaction forest', a variant of random forest for categorical, metric, and survival outcomes that explicitly considers several types of interaction effects in the splitting performed by the trees constituting the forest. The new 'effect importance measure' (EIM) associated with interaction forest makes it possible to rank the interaction effects between covariate pairs with respect to their importance for prediction, in addition to ranking the univariable effects of the covariates. Using EIM, separate importance value lists for univariable effects, quantitative interaction effects, and qualitative interaction effects are provided. In a real-data study using 220 publicly available data sets, the prediction performance of interaction forest is statistically significantly better than that of random forest and of competing random forest variants that, like interaction forest, use multivariable splitting. Moreover, a simulation study suggests that EIM consistently identifies the relevant quantitative and qualitative interaction effects in data sets. Here, the rankings obtained from the EIM value lists for quantitative interaction effects on the one hand and qualitative interaction effects on the other are confirmed to be specific to each of these two types of interaction effects. These results indicate that interaction forest is a suitable tool for identifying and making use of relevant interaction effects in prediction modelling.
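To make the distinction between quantitative and qualitative interactions concrete, here is a toy sketch (our own illustration of the terminology, not the EIM itself, which is computed within the interaction forest procedure): the effect of one covariate is estimated within the levels of another, and the interaction is called qualitative if the sign of the effect flips:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def interaction_type(x, group, y):
    """Estimate the effect of x on y separately in the two levels of a
    binary covariate `group`. A sign flip across levels indicates a
    qualitative interaction; a mere change in magnitude a quantitative one."""
    effects = [
        LinearRegression().fit(x[group == g].reshape(-1, 1), y[group == g]).coef_[0]
        for g in (0, 1)
    ]
    kind = "qualitative" if np.sign(effects[0]) != np.sign(effects[1]) else "quantitative"
    return kind, effects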


A Machine Learning Approach to Empirical Dynamic Modeling for Biochemical Systems
Kevin Siswandi
University of Freiburg, Germany

BACKGROUND

In the biosciences, dynamic modeling plays a very important role in understanding and predicting the temporal behaviour of biochemical systems, with wide-ranging applications from bioengineering to precision medicine. Traditionally, dynamic modeling (e.g. in systems biology) is done with ordinary differential equations (ODEs) to predict system dynamics. Such models are typically constructed from first-principles equations (e.g. Michaelis-Menten kinetics) that are then iteratively modified to be consistent with experiments. Consequently, it can take several years before a model becomes quantitatively predictive. Moreover, such ODE models do not scale with increasing amounts of data. At the same time, the demand for highly accurate predictions is increasing in the biotechnology and synthetic biology industries. Here, we investigate a data-driven, machine learning approach to empirical dynamic modeling that allows for faster development than traditional first-principles modeling, with a particular focus on biochemical systems.

METHODS

We present a numerical framework for a machine learning approach to discover dynamics from time-series data. The main workflow consists of data augmentation, model training and validation, numerical integration, and model explanation. In contrast to other works, our method does not assume any prior (biological) knowledge or governing equations.

Specifically, by posing dynamics reconstruction as a supervised learning problem, the dynamics can be recovered from time-series measurements by solving the resulting optimisation problem. This is done by embedding the learning problem within the classical framework of a numerical method (e.g. a linear multi-step method, LMM). We evaluate this approach on canonical systems and on complex biochemical systems with nonlinear dynamics.
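The following minimal sketch (our own, simplified to a finite-difference derivative estimate instead of a full linear multi-step scheme; the random forest regressor is a placeholder model choice) shows the core supervised learning step: estimate dx/dt from a sampled trajectory, fit a model f: x -> dx/dt, and integrate the learned f to reproduce the dynamics:

```python
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.ensemble import RandomForestRegressor

# toy data: a damped oscillator trajectory sampled on a regular time grid
t = np.linspace(0, 20, 2000)
sol = solve_ivp(lambda t, x: [x[1], -x[0] - 0.1 * x[1]], (0, 20), [1.0, 0.0], t_eval=t)
X = sol.y.T                           # states, shape (n_times, 2)

# supervised learning problem: state -> estimated time derivative
dXdt = np.gradient(X, t, axis=0)      # central finite differences
model = RandomForestRegressor(n_estimators=200).fit(X, dXdt)

# roll the learned dynamics forward in time from the initial state
pred = solve_ivp(lambda t, x: model.predict(x.reshape(1, -1))[0],
                 (0, 20), X[0], t_eval=t)
```

Replacing the finite-difference step with a proper LMM discretization tightens the link between the learned function and the underlying ODE, at the cost of a more involved optimisation problem.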

RESULTS

We show that this method can discover the dynamics of our test systems given enough data. We further find that it can discover bifurcations, is robust to noise, and is capable of leveraging additional data to improve its prediction accuracy at scale. Finally, we employ various explainability studies to extract mechanistic insights from the biochemical systems.

CONCLUSION

By avoiding assumptions about specific mechanisms, we are able to propose a general machine learning workflow. It can thus be applied to new systems (e.g. pathways or hosts) and could be used to capture complex dynamic relationships that are still unknown in the literature. We believe that this data-driven approach has the potential to accelerate the development of predictive dynamic models.