Location: https://wwu.zoom.us/j/61240744560

Mathematical Methods in Medicine and Biology

Chairs: Ingmar Glauche and Matthias Horn

Future Prevalence of Type 2 Diabetes – A Comparative Analysis of Chronic Disease Projection Methods
Dina Voeltz1, Thaddäus Tönnies2, Ralph Brinks1,2,3, Annika Hoyer1
1Ludwig-Maximilians-Universität München, Germany; 2Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich-Heine-University Duesseldorf; 3Hiller Research Unit for Rheumatology Duesseldorf

Background: Precise projections of future chronic disease cases needing pharmaco-intensive treatments are necessary for effective resource allocation and health care planning in response to increasing disease burden.

Aim: To compare different projection methods to estimate the number of people diagnosed with type 2 diabetes (T2D) in Germany in 2040.

Methods: We compare the results of three methods to project the number of people with T2D in Germany in 2040. In a relatively simple approach, method 1) combines the sex- and age-specific prevalence of T2D in 2015 with sex- and age-specific population distributions projected by the German Federal Statistical Office (FSO). Methods 2) and 3) additionally account for the incidence of T2D and for mortality rates, using the mathematical relations proposed by the illness-death model for chronic diseases [1]. They are therefore more comprehensive than method 1), which likely adds to the validity and accuracy of their results. For this purpose, method 2) first models the prevalence of T2D employing a partial differential equation (PDE) which incorporates incidence and mortality [2]. This flexible yet simple PDE has been validated in the context of dementia, among others, and is recommended for chronic disease epidemiology. Subsequently, the estimated prevalence is multiplied by the population projection of the FSO [3]. Hence, method 2) uses the projected general mortality of the FSO and the mortality rate ratio of diseased vs. non-diseased people. By contrast, method 3) estimates the future mortality of non-diseased and diseased people independently of the FSO projection. These estimated future mortality rates serve as input for two PDEs that directly project the absolute number of cases. The sex- and age-specific incidence rate for methods 2) and 3) stems from the risk structure compensation scheme (Risikostrukturausgleich, MorbiRSA), which comprises data from about 70 million Germans in the public health insurance. The incidence rate is assumed to remain at its 2015 level throughout the projection horizon from 2015 to 2040.
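
The core of method 2) can be sketched numerically. Below is a minimal Python illustration of integrating the illness-death PDE along a single birth cohort (along its characteristic, the PDE reduces to an ordinary differential equation); all rates are hypothetical constants, not the MorbiRSA or FSO inputs used in the study.

```python
def project_prevalence(p0, incidence, mort_healthy, mort_diseased, years, dt=0.1):
    """Euler integration of the illness-death PDE along a characteristic
    (fixed birth cohort), where it reduces to
        dp/dt = (1 - p) * (incidence - p * (mort_diseased - mort_healthy)).
    All rates are assumed constant over time for this illustration."""
    p = p0
    for _ in range(int(years / dt)):
        p += dt * (1 - p) * (incidence - p * (mort_diseased - mort_healthy))
    return p

# Hypothetical rates per person-year (NOT the study's estimates):
p_2040 = project_prevalence(p0=0.10, incidence=0.01,
                            mort_healthy=0.02, mort_diseased=0.05, years=25)
```

In the study itself, age- and sex-specific rates enter the full PDE, and the resulting prevalence is multiplied by the FSO population projection to obtain case numbers.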

Results: Method 1) projects 8.3 million people with diagnosed T2D in Germany in 2040. Compared to 6.9 million people in 2015, this equals an increase of 21%. Methods 2) and 3) project 11.5 million (+65% compared to 2015) and 12.5 million (+85%) T2D patients, respectively.

Conclusions: The methods’ results differ substantially. Method 1) accounts for the aging of the German population but is otherwise relatively limited in scope. Methods 2) and 3) additionally consider underlying changes in the incidence and mortality rates affecting disease prevalence.

Mixed-effects ANCOVA for estimating the difference in population mean parameters in the case of nonlinearly related data
Ricarda Graf
University of Göttingen, Germany

Repeated measures data can be found in many fields. The two types of variation characteristic of this type of data – referred to as within-subject and between-subject variation – are accounted for by linear and nonlinear mixed-effects models. ANOVA-type models are sometimes applied for comparison of population means despite a nonlinear relationship in the data. Accurate parameter estimation through more appropriate nonlinear mixed-effects (NLME) models, such as for sigmoidal curves, might be hampered by insufficient data near the asymptotes, by the choice of starting values for the iterative optimization algorithms required in the absence of closed-form expressions of the likelihood, or by convergence problems of these algorithms.

The main objective of this thesis is to compare the performance of a one-way mixed-effects ANCOVA and an NLME three-parameter logistic regression model with respect to their accuracy in estimating the difference in population means. Data from a clinical trial1, in which the difference in mean blood pressure (BP50) between two groups was estimated by repeated-measures ANOVA, served as a reference for data simulation. A third, simplifying method, used in toxicity studies², was additionally included. It considers the two measurements per subject lying immediately below and above half the mean maximal response (E_max). Population means are obtained from the intersections of the horizontal line at half E_max with the line connecting the two data points per subject and group. A simulation study with two scenarios was conducted to compare bias, coverage rates and empirical SE of the three methods when estimating the difference in BP50, in order to identify the disadvantages of using the simpler linear instead of the nonlinear model. In the first scenario, the true individual blood pressure ranges were considered, while in the second scenario, measurements at characteristic points of the sigmoidal curves were considered, regardless of the true measurement ranges, in order to obtain a more distinct nonlinear relationship.
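
The third method amounts to a linear interpolation. A small sketch (with made-up numbers) of the intersection computation per subject:

```python
def bp50_from_bracketing_points(x_low, y_low, x_high, y_high, half_emax):
    """Intersect the line through the two measurements that bracket
    half-maximal response with the horizontal line y = half_emax."""
    slope = (y_high - y_low) / (x_high - x_low)
    return x_low + (half_emax - y_low) / slope

# Hypothetical subject: responses at 30% and 70% of E_max measured
# at blood pressures 90 and 110 mmHg; half E_max = 0.5.
est = bp50_from_bracketing_points(90.0, 0.3, 110.0, 0.7, 0.5)  # = 100.0
```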

The estimates of the mixed-effects ANCOVA model were more biased but also more precise than those of the NLME model. In the second scenario, the ANCOVA method could no longer detect the difference in BP50. The results of the third method did not seem reliable, since its estimates on average even reversed the direction of the true parameter.

NLME models should be preferred for data with a known nonlinear relationship if the available data allows it. Convergence problems can be overcome by using a Bayesian approach.

Explained Variation in the Linear Mixed Model
Nicholas Schreck
DKFZ Heidelberg, Germany

The coefficient of determination is a standard characteristic in linear models with quantitative response variables. It is widely used to assess the proportion of variation explained, to determine the goodness-of-fit and to compare models with different covariates.

However, no similar quantity has yet been agreed upon for the class of linear mixed models.

We introduce a natural extension of the well-known adjusted coefficient of determination in linear models to the variance components form of the linear mixed model.

This extension is dimensionless, has an intuitive and simple definition in terms of variance explained, is additive for several random effects and reduces to the adjusted coefficient of determination in the linear model.

To this end, we prove a full decomposition of the sum of squares of the dependent variable into the explained and residual variance.

Based on the restricted maximum likelihood equations, we introduce a novel measure for the explained variation which we allocate specifically to the contribution of the fixed and the random covariates of the model.

We illustrate that this empirical explained variation can in particular be used as an improved estimator of the classical additive genetic variance of continuous complex traits.
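
As a conceptual illustration (not the REML-based estimator of the abstract), the additive variance split underlying such a measure can be sketched as follows; for transparency the decomposition here uses the true simulation parameters, which the actual method of course estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
n_groups, n_per = 50, 10
g = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
u = rng.normal(scale=1.0, size=n_groups)                   # random intercepts
y = 2.0 * x + u[g] + rng.normal(scale=0.5, size=len(x))    # residual sd 0.5

# Additive decomposition of the variation of y into a part explained by the
# fixed covariate, a part explained by the random effect, and a residual part.
var_fixed = np.var(2.0 * x)
var_random = np.var(u[g])
var_resid = np.var(y - 2.0 * x - u[g])
total = var_fixed + var_random + var_resid

r2_fixed = var_fixed / total      # explained variation of the fixed covariate
r2_random = var_random / total    # explained variation of the random effect
```

The dimensionless, additive character of the measure corresponds to the two ratios summing (with the residual share) to one.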

Modelling acute myeloid leukemia: Closing the gap between model parameters and individual clinical patient data
Dennis Görlich
Institute of Biostatistics and Clinical Research, University Münster, Germany

In this contribution, we will illustrate and discuss our approach to fit a mechanistic mathematical model of acute myeloid leukemia (AML) to individual patient data, leading to personalized model parameter estimates.

We use a previously published model (Banck and Görlich, 2019) that describes healthy hematopoiesis and leukemia dynamics. Here, we consider a situation where the healthy hematopoiesis is calibrated to a population average and personalized leukemia parameters (self-renewal, proliferation, and treatment intensity) need to be estimated.

To link the mathematical model to clinical data, model predictions need to be aligned with observable clinical outcome measures. In AML research, blast load, complete remission, and survival are typically considered. Based on the model’s properties, especially its capability to predict the considered outcomes, blast load turned out to be well suited for the model fitting process.

We formulated an optimization problem to estimate personalized model parameters based on the comparison between observed and predicted blast load (cf. Görlich, 2021).

A grid search was performed to evaluate the fitness landscape of the optimization problem. It showed that, depending on the patient’s individual blast course, noisy fitness landscapes can occur. In these cases, a gradient-descent algorithm will usually perform poorly. This problem can be overcome by applying, e.g., the differential evolution algorithm (Price et al., 2006). The estimated personalized leukemia parameters can be further correlated with observed clinical data. A preliminary analysis showed promising results.
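
A minimal sketch of this optimization step, with a toy objective standing in for the blast-load discrepancy (the two parameter names and the ripple term are hypothetical, chosen only to mimic a noisy landscape):

```python
import numpy as np
from scipy.optimize import differential_evolution

def fitness(theta):
    """Toy stand-in for the discrepancy between predicted and observed blast
    load; the ripple term mimics the noisy landscapes seen in the grid search."""
    self_renewal, proliferation = theta
    return ((self_renewal - 0.7) ** 2 + (proliferation - 1.2) ** 2
            + 0.01 * np.sin(40.0 * self_renewal))

# Differential evolution searches the bounded parameter space without
# gradients, so the ripples do not trap it the way they trap gradient descent.
result = differential_evolution(fitness, bounds=[(0.0, 1.0), (0.0, 3.0)],
                                seed=42, tol=1e-8)
```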

Finally, the application of mechanistic mathematical models in combination with personalized model fitting seems to be a promising approach within clinical research.

Dennis Görlich (accepted). Fitting Personalized Mechanistic Mathematical Models of Acute Myeloid Leukaemia to Clinical Patient Data. Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, Volume 3: BIOINFORMATICS 2021

Jan C. Banck and Dennis Görlich (2019). In-silico comparison of two induction regimens (7 + 3 vs 7 + 3 plus additional bone marrow evaluation) in acute myeloid leukemia treatment. BMC Systems Biology, 13(1):18.

Kenneth V. Price, Rainer M. Storn and Jouni A. Lampinen (2006). Differential Evolution – A Practical Approach to Global Optimization. Berlin Heidelberg: Springer-Verlag.

Effect of missing values in multi-environmental trials on variance component estimates
Jens Hartung, Hans-Peter Piepho
University of Hohenheim, Germany

A common task in the analysis of multi-environmental trials (MET) by linear mixed models (LMM) is the estimation of variance components (VCs). Most often, MET data are imbalanced, e.g., due to selection. The imbalance mechanism can be missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). If the missing-data pattern in MET is not MNAR, likelihood-based methods are the preferred methods for analysis as they can account for selection. Likelihood-based methods used to estimate VCs in LMM have the property that all VC estimates are constrained to be non-negative, and thus the estimators are generally biased. Therefore, there are two potential causes of bias in MET analysis: a MNAR data pattern and the small-sample properties of likelihood-based estimators. The current study tries to distinguish between these two possible sources of bias. A simulation study with MET data typical for cultivar evaluation trials was conducted. The missing data pattern and the size of the VCs were varied. The results showed that, for the simulated MET, VC estimates from likelihood-based methods are mainly biased due to the small-sample properties of likelihood-based methods when the ratio of genotype variance to error variance is small.
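
The small-sample truncation effect can be reproduced in a few lines. A sketch with a simple one-way random-effects model (hypothetical dimensions; an ANOVA-type estimator truncated at zero stands in for full REML):

```python
import numpy as np

rng = np.random.default_rng(7)
n_sim, n_geno, n_rep = 2000, 20, 2
sigma2_g, sigma2_e = 0.05, 1.0     # small genotype-to-error variance ratio

estimates = []
for _ in range(n_sim):
    g = rng.normal(scale=np.sqrt(sigma2_g), size=n_geno)
    y = g[:, None] + rng.normal(scale=np.sqrt(sigma2_e), size=(n_geno, n_rep))
    msb = n_rep * y.mean(axis=1).var(ddof=1)   # between-genotype mean square
    msw = y.var(axis=1, ddof=1).mean()         # within-genotype mean square
    # Non-negativity constraint: the source of the small-sample bias.
    estimates.append(max(0.0, (msb - msw) / n_rep))

bias = np.mean(estimates) - sigma2_g           # positive: upward bias
```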

Open Topics

Chairs: Andre Scherag and Reinhard Vonthein

Using Historical Data to Predict Health Outcomes – The Prediction Design
Stella Erdmann, Manuel Feißt, Johannes Krisam, Meinhard Kieser
Institute of Medical Biometry and Informatics, University of Heidelberg, Germany

The gold standard for the investigation of the efficacy of a new therapy is a randomized controlled trial (RCT). This is costly, time-consuming and not always practicable (e.g. for lethal conditions with limited treatment possibilities) or even possible in a reasonable time frame (e.g. in rare diseases due to small sample sizes). At the same time, huge quantities of control-condition data in analyzable format from former RCTs or from real-world data (RWD), i.e., patient‐level data gathered outside the conventional clinical trial setting, are often neglected, if not completely ignored. To overcome these shortcomings, alternative study designs that use data more efficiently would be desirable.

Assuming that the standard therapy and its mode of functioning is well known and large volumes of patient data exist, it is possible to set up a sound prediction model to determine the treatment effect of this standard therapy for future patients. When a new therapy is intended to be tested against the standard therapy, the vision would be to conduct a single-arm trial and to use the prediction model to determine the effect of the standard therapy on the outcome of interest of patients receiving the test treatment only, instead of setting up a two-arm trial for this comparison. While the advantages of using historical data to estimate the counterfactual are obvious (increased efficiency, lower cost, alleviating participants’ fear of being on placebo), bias could be caused by confounding (e.g. by indication, severity, or prognosis) or a number of other data issues that could jeopardize the validity of the non-randomized comparison.

The aim is to investigate if and how such a design – the prediction design – may be used to provide information on treatment effects by leveraging existing infrastructure and data sources (historical data of RCTs and/or RWD). Therefore, we investigate under what assumptions a linear prediction model could be used to predict the counterfactual of patients precisely enough to construct a test for evaluating the treatment effect for normally distributed endpoints. In particular, it is investigated what amount of data is necessary (for the historical data and for the single-arm trial to be conducted). Via simulation studies, it is examined how sensitive the design is to violations of these assumptions. The results are compared to reasonable (conventional) benchmark scenarios, e.g., a single-arm study with a pre-defined threshold, or a setting where propensity score matching is performed.
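
A stripped-down sketch of the idea for normally distributed endpoints (all numbers hypothetical; the naive test below ignores the prediction uncertainty that the actual design has to account for):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Historical control data: outcome depends linearly on one covariate.
n_hist, n_trial = 500, 60
x_hist = rng.normal(size=n_hist)
y_hist = 1.0 + 0.8 * x_hist + rng.normal(scale=1.0, size=n_hist)
beta = np.linalg.lstsq(np.c_[np.ones(n_hist), x_hist], y_hist, rcond=None)[0]

# Single-arm trial under the new therapy with a true effect of 0.5.
x_new = rng.normal(size=n_trial)
y_new = 1.0 + 0.8 * x_new + 0.5 + rng.normal(scale=1.0, size=n_trial)

# Predicted counterfactual under standard therapy, and a one-sample t-test
# on the differences (a full analysis must propagate the prediction error).
y_pred = np.c_[np.ones(n_trial), x_new] @ beta
t_stat, p_value = stats.ttest_1samp(y_new - y_pred, 0.0)
```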

Arguments for exhuming nonnegative garrote out of grave
Edwin Kipruto, Willi Sauerbrei
Medical Center-University of Freiburg, Germany

Background: The original nonnegative garrote (Breiman, 1995) seems to have been forgotten despite some of its good conceptual properties. Its unpopularity is probably caused by its dependence on least squares estimates, which do not exist in high-dimensional data and perform very poorly under a high degree of multicollinearity. However, Yuan and Lin (2007) showed that the nonnegative garrote is a flexible approach that can be combined with estimators other than least squares, such as ridge regression, so that the aforementioned challenges can be circumvented. Despite this proposal, it is hardly used in practice. Considerable attention has been given to prediction models, compared to descriptive models, whose aim is to summarize the data structure in a compact manner (Shmueli, 2010). Here, our main interest is in descriptive modeling; as a byproduct, we will also present prediction results.

Objectives: To evaluate the performance of the nonnegative garrote and compare results with some popular approaches, using three different real datasets with low to high degrees of multicollinearity as well as high-dimensional data.

Methods: We evaluated four penalized regression methods (Nonnegative garrote, lasso, adaptive lasso, relaxed lasso) and two classical variable selection methods (best subset, backward elimination) with and without post-estimation shrinkage.

Results: The nonnegative garrote can be used with initial estimators other than least squares in highly correlated and in high-dimensional datasets. Negligible differences between methods were observed in predictions, while considerable differences were observed in the number of variables selected.

Conclusion: To fit the nonnegative garrote in highly correlated data and in high-dimensional settings, the proposed initial estimates can be used as an alternative to least squares estimates.
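
A sketch of the two-stage procedure with ridge initial estimates; since the shrinkage factors are constrained to be nonnegative, stage 2 is exactly a positive-lasso problem on a rescaled design (data and tuning parameters below are made up):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(11)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta = np.array([2.0, -1.5, 0, 0, 1.0, 0, 0, 0, 0, 0])
y = X @ beta + rng.normal(size=n)

# Stage 1: initial estimates -- ridge instead of least squares (Yuan & Lin,
# 2007), so the garrote remains applicable under multicollinearity or p > n.
init = Ridge(alpha=1.0).fit(X, y).coef_

# Stage 2: nonnegative shrinkage factors c_j >= 0. With Z_j = X_j * init_j,
# the garrote criterion ||y - Z c||^2 + lam * sum(c) is a lasso with a
# positivity constraint on the rescaled design.
Z = X * init
c = Lasso(alpha=0.1, positive=True, fit_intercept=False).fit(Z, y).coef_
beta_garrote = c * init   # final garrote coefficients (sparse)
```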

On the assessment of methods to identify influential points in high-dimensional data
Shuo Wang, Edwin Kipruto, Willi Sauerbrei
Medical Center – University of Freiburg, Germany

Extreme values and influential points in predictors often strongly affect the results of statistical analyses in low- and high-dimensional settings. Many methods to detect such values have been proposed, but there is no consensus on their advantages and disadvantages, nor guidance for practice. We will present various classes of methods and illustrate their use in several high-dimensional data sets. First, we consider a simple pre-transformation which is combined with feature ranking lists to identify influential points, concentrating on univariable situations (Boulesteix and Sauerbrei, 2011, DOI: 10.1002/bimj.201000189). The procedure will be extended by checking for influential points in bivariate models and by adding some steps to the multivariable approach.

Second, to increase the stability of feature ranking lists, we will use various aggregation approaches to search for extreme values in features and for influential observations. The former induce rank changes of a specific feature, while the latter cause a global change of the ranking. For the detection of extreme values, we apply the simple pre-transformation to the data and detect the features whose ranks change significantly after the transformation. For the detection of influential observations, we consider a combination of leave-one-out and rank comparison to detect the observations causing large rank changes. These methods are applied to several publicly available datasets.
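
The leave-one-out rank-comparison step can be sketched as follows (univariable correlation ranking with one planted influential observation; the data and the planted values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 8
X = rng.normal(size=(n, p))
y = X[:, 0] + rng.normal(scale=0.5, size=n)
X[0, 1] = 12.0   # planted extreme predictor value in feature 1 ...
y[0] = 10.0      # ... paired with an extreme response: an influential point

def feature_ranks(X, y):
    """Rank features by absolute univariable correlation (rank 0 = strongest)."""
    score = np.abs(np.array([np.corrcoef(X[:, j], y)[0, 1]
                             for j in range(X.shape[1])]))
    return np.argsort(np.argsort(-score))

full = feature_ranks(X, y)
# Leave-one-out: total displacement of the feature ranking when observation i
# is removed; influential observations cause large, global ranking changes.
influence = np.array([
    np.abs(feature_ranks(np.delete(X, i, axis=0), np.delete(y, i)) - full).sum()
    for i in range(n)
])
most_influential = int(np.argmax(influence))
```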

Acceleration of diagnostic research: Is there a potential for seamless designs?
Werner Vach1, Eric Bibiza-Freiwald2, Oke Gerke3, Tim Friede4, Patrick Bossuyt5, Antonia Zapf2
1Basel Academy for Quality and Research in Medicine, Switzerland; 2Institute of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf; 3Department of Nuclear Medicine, Odense University Hospital; 4Department of Medical Statistics, University Medical Center Goettingen; 5Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam University Medical Centers

Background: New diagnostic tests to identify a well-established disease state have to undergo a series of scientific studies from test construction until finally demonstrating a societal impact. Traditionally, these studies are performed with substantial time gaps in between. Seamless designs allow us to combine a sequence of studies in one protocol and may hence accelerate this process.

Aim: A systematic investigation of the potential of seamless designs in diagnostic research.

Methods: We summarized the major study types in diagnostic research and identified their basic characteristics with respect to applying seamless designs. This information was used to identify major hurdles and opportunities for seamless designs.

Results: 11 major study types were identified, with the following basic characteristics: type of recruitment (case-control vs. population-based), application of a reference standard, inclusion of a comparator, paired or unpaired application of a comparator, assessment of patient-relevant outcomes, and the possibility of blinding test results.

Two basic hurdles could be identified: 1) Accuracy studies are hard to combine with post-accuracy studies, as the former are required to justify the latter, and as the application of a reference test in outcome studies is a threat to the study’s integrity. 2) Questions that can be clarified by other study designs should be addressed before performing a randomized diagnostic study.

However, there is a substantial potential for seamless designs since all steps from the construction until the comparison with the current standard can be combined in one protocol. This may include a switch from case-control to population-based recruitment as well as a switch from a single arm study to a comparative accuracy study. In addition, change in management studies can be combined with an outcome study in discordant pairs. Examples from the literature illustrate the feasibility of both approaches.

Conclusions: There is a potential for seamless designs in diagnostic research.

Reference: Vach W, Bibiza E, Gerke O, Bossuyt PM, Friede T, Zapf A (2021). A potential for seamless designs in diagnostic research could be identified. J Clin Epidemiol. 29:51-59. doi: 10.1016/j.jclinepi.2020.09.019.

The augmented binary method for composite endpoints based on forced vital capacity (FVC) in systemic sclerosis-associated interstitial lung disease
Carolyn Cook1, Michael Kreuter2, Susanne Stowasser3, Christian Stock4
1mainanalytics GmbH, Sulzbach, Germany; 2Center for Interstitial and Rare Lung Diseases, Pneumology and Respiratory Care Medicine, Thoraxklinik, University of Heidelberg, Heidelberg, Germany and German Center for Lung Research, Heidelberg, Germany; 3Boehringer Ingelheim International GmbH, Ingelheim am Rhein, Germany; 4Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim am Rhein, Germany

The augmented binary method (Wason & Seaman. Stat Med, 2013; 32(26)) is a novel method for precisely estimating response rates, and differences among response rates, defined on the basis of a composite endpoint that combines a dichotomized continuous variable with additional, inherently binary components. The method is an alternative to traditional approaches such as logistic regression techniques. Due to the complexity and computational demands of the method, experience in clinical studies has been limited thus far and is mainly restricted to oncological studies. Operating characteristics and, thus, potential statistical benefits are unclear for other settings.
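
A minimal sketch of a composite responder definition of this type (threshold and event probabilities are hypothetical); the augmented binary method gains precision by modelling the continuous component on its original scale rather than only through this indicator:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
fvc_change = rng.normal(-2.0, 8.0, size=n)   # % change in FVC (hypothetical)
hospitalized = rng.random(n) < 0.15          # binary component
died = rng.random(n) < 0.05                  # binary component

# Composite endpoint: responder = FVC decline not beyond the dichotomization
# threshold AND no hospitalization AND alive.
threshold = -5.0                             # hypothetical cut-off
responder = (fvc_change > threshold) & ~hospitalized & ~died
response_rate = responder.mean()
```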

We aimed to perform a Monte Carlo simulation study to assess operating characteristics of the augmented binary method in settings relevant to randomized controlled trials and non-interventional studies in systemic sclerosis-associated interstitial lung disease (SSc-ILD), a rare, chronic autoimmune disease, where composite endpoints of the above described type are frequently applied.

An extensive simulation study was performed assessing type I error, power, coverage, and bias of the augmented binary method and a standard logistic model for the composite endpoint. Parameters were varied to resemble lung function decline (as measured through the forced vital capacity, FVC), hospitalization events and mortality in patients with SSc-ILD over a 1- and 2-year period. A relative treatment effect of 50% on FVC was assumed, while the assumed effects on hospitalizations and mortality were derived from joint modeling analyses of existing trial data (as indirect effects of the treatment on FVC). Further, the methods were applied, as an example, to data from the SENSCIS trial, a phase III randomized, double-blind, placebo-controlled trial investigating the efficacy and safety of nintedanib in patients with SSc-ILD.

The simulation study is currently in progress and results will be available by the end of January. In preliminary results, modest gains in power and precision were observed, with acceptable compromises of type I error, if any. In scenarios with lower statistical power, these results were more likely to affect inferences concerning the treatment effect. In the exemplary application of the augmented binary method to trial data, confidence intervals and p-values for selected endpoints involving FVC decline, hospitalization and mortality were smaller.

Based on preliminary results from a simulation study, we identified areas where the augmented binary method conveys an appreciable advantage compared to standard methods.

Beyond binary: Causal inference for adaptive treatment strategies and time-varying or multi-component exposures

Chairs: Ryan Andrews and Vanessa Didelez

Doubly robust estimation of adaptive dosing rules
Erica E. M. Moodie1, Juliana Schulz2
1McGill University; 2HEC Montreal

Dynamic weighted ordinary least squares (dWOLS) was proposed as a simple analytic tool for estimating optimal adaptive treatment strategies. The approach aims to combine the double robustness of G-estimation with the ease of implementation of Q-learning; however, early methodology was limited to the continuous-outcome/binary-treatment setting. In this talk, I will introduce generalized dWOLS, an extension that allows for continuous-valued treatments to estimate optimal dosing strategies, and demonstrate the approach by estimating an optimal Warfarin dosing rule.
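
A sketch of dWOLS in the original single-stage, binary-treatment setting that the talk starts from (simulated data; the balancing weights |A − π̂(X)| are what deliver the double robustness):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(4)
n = 2000
x = rng.normal(size=n)
a = rng.binomial(1, 1 / (1 + np.exp(-x)))            # confounded treatment
y = x + a * (1.0 + 0.5 * x) + rng.normal(size=n)     # blip: psi0 + psi1 * x

# dWOLS: weighted OLS of the outcome on history and treatment-by-tailoring
# terms, with balancing weights w = |a - P(A=1|x)|.
ps = LogisticRegression().fit(x[:, None], a).predict_proba(x[:, None])[:, 1]
w = np.abs(a - ps)
design = np.c_[x, a, a * x]
fit = LinearRegression().fit(design, y, sample_weight=w)
psi0, psi1 = fit.coef_[1], fit.coef_[2]
# Estimated optimal rule: treat when psi0 + psi1 * x > 0.
```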

Efficient, doubly robust estimation of the effect of dose switching for switchers in a randomised clinical trial
Kelly Van Lancker1, An Vandebosch2, Stijn Vansteelandt1,3
1Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium; 2Janssen R&D, a division of Janssen Pharmaceutica NV, Beerse, Belgium; 3Department of Medical Statistics, London School of Hygiene and Tropical Medicine, London, United Kingdom

The interpretation of intention-to-treat analyses of randomised clinical trials is often hindered by noncompliance and treatment switching. This has recently given rise to vigorous research activity on the identification and estimation of so-called estimands.

Motivated by an ongoing clinical trial conducted by Janssen Pharmaceutica in which a flexible dosing regimen is compared to placebo, we evaluate how switchers in the treatment arm (i.e., patients who were switched to the higher dose) would have fared had they been kept on the low dose, in order to understand whether flexible dosing is potentially beneficial for them. Comparing these patients’ responses with those of patients who stayed on the low dose is unlikely to provide a satisfactory evaluation, because the latter patients are usually in a better health condition and the available information is too limited to enable a reliable adjustment. In view of this, we will transport data from a fixed-dosing trial that was conducted concurrently on the same target, albeit not in an identical patient population.

In particular, we will propose a doubly robust estimator, which relies on an outcome model and a propensity score model for the association between study membership and patient characteristics. The proposed estimator is easy to evaluate, asymptotically unbiased if either model is correctly specified, and efficient (under the restricted semi-parametric model where the randomisation probabilities are known and independent of baseline covariates) when both models are correctly specified. Theoretical properties are also evaluated through Monte Carlo simulations, and the method will be illustrated using the motivating example.
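
A generic AIPW-type sketch of the double robustness idea for transporting a mean outcome from a source study to a target population (simulated data; the talk's estimator additionally handles the switching structure and the trial-specific models):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(9)
n = 3000
x = rng.normal(size=n)
s = rng.binomial(1, 1 / (1 + np.exp(-0.5 * x)))   # study membership indicator
y = 1.0 + 0.8 * x + rng.normal(size=n)            # outcome, used where s == 1

# Doubly robust estimate of E[Y] in the full population: combines an outcome
# model fitted in the source study with a propensity model for study
# membership; consistent if either model is correctly specified.
out_model = LinearRegression().fit(x[s == 1].reshape(-1, 1), y[s == 1])
mu_hat = out_model.predict(x[:, None])
pi_hat = LogisticRegression().fit(x[:, None], s).predict_proba(x[:, None])[:, 1]
dr = np.mean(mu_hat + s * (y - mu_hat) / pi_hat)
```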

New causal criteria for decision making under fairness constraints
Mats Stensrud
Ecole Polytechnique Fédérale de Lausanne, Switzerland

To justify that a decision is fair, causal reasoning is important: we usually evaluate how the decision was made (the causes of the decision) and what would happen if a different decision were made (the effects of the decision).

Several causal (counterfactual) definitions of fairness have recently been suggested, but each of these definitions suffers from at least one of the following caveats: they rely on ill-defined interventions, require identification conditions that are unreasonably strong, can be gamed by decision makers with malicious intentions, or fail to capture arguably reasonable notions of discrimination.

Motivated by the shortcomings of the existing definitions of fairness, we introduce two new causal criteria to prevent discrimination in practice. These criteria can be applied to settings with non-binary and time-varying decisions. We suggest strategies to evaluate whether these criteria hold in observed data and give conditions that allow identification of counterfactual outcomes under new, non-discriminatory decision rules. The interpretation of our criteria is discussed in several examples.

Causal Inference for time-to-event data

Chairs: Sarah Friedrich and Jan Feifel

Truncation by death and the survival-incorporated median: What are we measuring? And why?
Judith J. Lok1, Qingyan Xiang2, Ronald J. Bosch3
1Department of Mathematics and Statistics, Boston University, United States of America; 2Department of Biostatistics, Boston University, United States of America; 3Center for Biostatistics in AIDS Research, Harvard University, United States of America

One could argue that if a person dies, their subsequent health outcomes are missing. On the other hand, one could argue that if a person dies, their health outcomes are completely obvious. This talk takes the second point of view, and advocates not always treating death as a mechanism through which health outcomes are missing, but rather as part of the outcome measure. This is especially useful when some people’s lives may be saved by a treatment we wish to study. We will show that both the median health score in those alive and the median health score in the always-survivors can lead one to believe that there is a trade-off between survival and good health scores, even in cases where, in clinical practice, both the probability of survival and the probability of a good health score are better for one treatment arm. To overcome this issue, we propose the survival-incorporated median as an alternative summary measure of health outcomes in the presence of death. It is the outcome value such that 50% of the population is alive with an outcome above that value. The survival-incorporated median can be interpreted as what happens to the “average” person, and is particularly relevant in settings with non-negligible mortality. We will illustrate our approach by estimating the effect of statins on neurocognitive function.
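
Computationally, the survival-incorporated median is simple: rank deaths below every observed score and take the overall median (the scores below are hypothetical):

```python
import numpy as np

def survival_incorporated_median(scores, alive):
    """The outcome value v such that 50% of the population is alive with an
    outcome above v: deaths are ranked below every observed score."""
    combined = np.where(alive, scores, -np.inf)
    return np.median(combined)

# Hypothetical neurocognitive scores; 2 of 6 subjects died.
scores = np.array([55.0, 60.0, 70.0, 80.0, 0.0, 0.0])
alive = np.array([True, True, True, True, False, False])
sim = survival_incorporated_median(scores, alive)   # = 57.5
# For comparison, the median among survivors alone would be 65.0.
```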

Multi-state modeling and causal censoring of treatment discontinuations in randomized clinical trials
Alexandra Nießl1, Jan Beyersmann1, Anja Loos2
1University of Ulm, Germany; 2Global Biostatistics, Merck KGaA, Darmstadt, Germany

The current COVID-19 pandemic and the subsequent restrictions have various consequences for planned and ongoing clinical trials. Their effects on the conduct of a clinical trial create several challenges in analyzing and interpreting study data. In particular, a substantial number of COVID-19-related treatment interruptions may affect the ability of a study to demonstrate its primary objective.

Recently, we investigated the impact of treatment discontinuations due to a clinical hold on the treatment effect of a clinical trial. A clinical hold order by the Food and Drug Administration (FDA) to the sponsor of a clinical trial is a measure to delay a proposed or suspend an ongoing clinical investigation. The phase III clinical trial START, with primary endpoint overall survival, served as the motivating data example to explore implications and potential statistical approaches for a trial continuing after a clinical hold is lifted. We proposed a multistate model incorporating the clinical hold as well as disease progression as intermediate events to investigate the impact of the clinical hold on the treatment effect. The multistate modeling approach offers several advantages: Firstly, it naturally models the dependence between PFS and OS. Secondly, it can easily be extended to additionally account for time-dependent exposures. Thirdly, it provides the framework for a simple causal analysis of treatment effects using censoring. Here, we censor patients at the beginning of the clinical hold. Using a realistic simulation study informed by the START data, we showed that our censoring approach is flexible and provides reasonable estimates of the treatment effect that would have been observed if no clinical hold had occurred. We pointed out that the censoring approach coincides with the causal g-computation formula and has a causal interpretation regarding the intention of the initial treatment.

Within the talk, we will present our multistate model approach and show our results with a focus on the censoring approach and the link to causal inference. Furthermore, we also propose a causal filtering approach. We will discuss the assumptions that have to be fulfilled for the ‘causal’ censoring or filtering to address treatment interruptions in general settings with an external time-dependent covariate inducing a time-varying treatment and, particularly, in the context of COVID-19.


Nießl, Alexandra, Jan Beyersmann, and Anja Loos. "Multistate modeling of clinical hold in randomized clinical trials." Pharmaceutical Statistics 19.3 (2020): 262-275.

Examining the causal mediating role of brain pathology on the relationship between subclinical cardiovascular disease and cognitive impairment: The Cardiovascular Health Study
Ryan M Andrews1, Vanessa Didelez1, Ilya Shpitser2, Michelle C Carlson2
1Leibniz Institute for Prevention Research and Epidemiology – BIPS, Germany; 2Johns Hopkins University

Accumulating evidence suggests a link between subclinical cardiovascular disease and the onset of cognitive impairment in later life. Less is known about the causal mechanisms underlying this relationship; a leading hypothesis is that brain biomarkers play an intermediary role. In this study, we aimed to estimate the proportion of the total effect of subclinical cardiovascular disease on incident cognitive impairment that is mediated through two brain biomarkers: brain hypoperfusion and white matter disease. To do this, we used data from the Cardiovascular Health Study, a large longitudinal cohort study of older adults across the United States. Because brain hypoperfusion and white matter disease may themselves be causally linked, with an uncertain temporal ordering, we could not use most multiple-mediator methods, whose assumptions (independent and causally ordered mediators) we did not believe would be met. We overcame this challenge by applying an innovative causal mediation method, inverse odds ratio weighting, which can accommodate multiple mediators regardless of their temporal ordering or possible effects on each other.
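A toy illustration of inverse odds ratio weighting with a single binary mediator may help fix ideas (the counts are invented and the study itself used fitted regression models, so this is only a sketch of the weighting principle): exposed units are down-weighted by the exposure-mediator odds ratio, which removes the exposure-mediator association in the weighted data and thereby isolates the direct effect.

```python
# Toy counts n[(a, m)] for binary exposure A and mediator M (illustrative only)
n = {(0, 0): 40, (0, 1): 10, (1, 0): 20, (1, 1): 30}

def odds_a_given_m(m):
    """Odds of exposure A=1 given mediator level m."""
    return n[(1, m)] / n[(0, m)]

def iorw_weight(a, m):
    """Inverse-odds-ratio weight: unexposed units keep weight 1; exposed units
    are weighted by the inverse of the exposure-mediator odds ratio
    (reference level M=0)."""
    if a == 0:
        return 1.0
    return odds_a_given_m(0) / odds_a_given_m(m)

def weighted_odds_of_exposure(m):
    """Odds of A=1 given M=m in the IORW-weighted pseudo-population."""
    w1 = n[(1, m)] * iorw_weight(1, m)
    w0 = n[(0, m)] * iorw_weight(0, m)
    return w1 / w0
```

After weighting, the odds of exposure are the same at every mediator level, so a weighted outcome regression on the exposure no longer captures the mediated pathway.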

We found that after imposing inclusion and exclusion criteria, approximately 20% of the effect of subclinical cardiovascular disease on incident cognitive impairment was jointly mediated by brain hypoperfusion and white matter disease. We also found that the mediated proportion varied by the type of cognitive impairment, with 21% of the effect being mediated among those with Mild Cognitive Impairment and 12% being mediated among those with dementia.

Interpreting our results as causal effects relies on the plausibility of many assumptions and must be done carefully. Based on subject matter knowledge and the results of several sensitivity analyses, we conclude that most (if not all) assumptions are indeed plausible; consequently, we believe our findings support the idea that brain hypoperfusion and white matter disease are on the causal pathway between subclinical cardiovascular disease and cognitive impairment, particularly Mild Cognitive Impairment. To our knowledge, our study is the first epidemiological study to support the existence of this etiological mechanism. We encourage future studies to extend and to replicate these results.

Statistical Methods for Spatial Cluster Detection in Rare Diseases: A Simulation Study of Childhood Cancer Incidence
Michael Schündeln1, Toni Lange2, Maximilian Knoll3, Claudia Spix4, Hermann Brenner5,6,7, Kayan Bozorgmehr8, Christian Stock9
1Pediatric Hematology and Oncology, Department of Pediatrics III, University Hospital Essen and the University of Duisburg-Essen, Essen, Germany.; 2Center for Evidence-based Healthcare, University Hospital and Faculty of Medicine Carl Gustav Carus, TU Dresden, Germany.; 3Clinical Cooperation Unit Radiation Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany.; 4German Childhood Cancer Registry, Institute for Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Centre of the Johannes Gutenberg University Mainz, Mainz, Germany.; 5Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany; 6Division of Preventive Oncology, German Cancer Research Center (DKFZ) and National Center for Tumor Diseases (NCT), Heidelberg, Germany; 7German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Heidelberg, Germany; 8Department of Population Medicine and Health Services Research, School of Public Health, Bielefeld University, Bielefeld, Germany; 9Institute of Medical Biometry and Informatics (IMBI), University of Heidelberg, Heidelberg, Germany

Background and objective: The potential existence of spatial clusters in childhood cancer incidence is a debated topic. Identification of such clusters may help to better understand etiology and develop preventive strategies. We evaluated widely used statistical approaches to cluster detection in this context.

Simulation Study: We simulated the incidence of newly diagnosed childhood cancer (140 per 1,000,000 children under 15 years) and of nephroblastoma (7 per 1,000,000). Clusters of defined size (1 to 50) and relative risk (RR; 1 to 100) were randomly assembled at the district level in Germany. For each combination of size and RR, 2,000 iterations were performed. We then applied three local clustering tests to the simulated data: the Besag-Newell method, the spatial scan statistic, and the Bayesian Besag-York-Mollié (BYM) approach with Integrated Nested Laplace Approximation. Finally, we systematically described the operating characteristics of the tests (sensitivity, specificity, predictive values, power, etc.).
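To illustrate the kind of statistic involved, the spatial scan statistic under a Poisson model scores each candidate cluster by a log-likelihood ratio; the following sketch (our own notation and guard clauses, not the simulation code of the study) scores high-risk candidates, with the most likely cluster being the candidate with the largest score and significance usually assessed by Monte Carlo replication:

```python
import math

def scan_llr(c, e, C):
    """Kulldorff-type log-likelihood ratio for a candidate cluster with
    c observed and e expected cases, out of C total cases (Poisson model).
    Returns 0 unless the candidate is a high-risk cluster (c > e)."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    outside = (C - c) * math.log((C - c) / (C - e)) if C > c else 0.0
    return inside + outside
```

The score grows with the excess of observed over expected cases, which is why sensitivity increases with cluster size, incidence and RR, as reported below.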

Results: Depending on the simulated setting, the performance of the tests varied considerably within and between methods. For all methods, sensitivity was positively associated with increasing size, incidence and RR of the high-risk area. In low-RR scenarios, the BYM method showed the highest specificity. The performance of all methods was lower in the nephroblastoma scenario than in the scenario including all cancer cases.

Conclusion: Reliable inference on the existence of spatial clusters in childhood cancer based on a single statistical approach remains a challenge. When aiming to identify high-risk clusters, multiple methods should be applied, ideally with known operating characteristics, and the joint evidence discussed critically.

Statistical Methods in Epidemiology

Chairs: Anke Huels and Jörg Rahnenführer

Epidemiological models in the public sphere – navigating the pandemic with statistics
Lars Koppers
Science Media Center Germany; Department für Wissenschaftskommunikation, Karlsruher Institut für Technologie, Germany

The coronavirus pandemic has shown how important basic mathematical and statistical knowledge is, even in everyday life. Since early 2020, statistical measures and models have also been discussed in public, ranging from simple case counts and averages to SIR models and simulations of active particles. But which models and measures help in which situations? Which conclusions can be drawn from a simulation, and which cannot? And how can complex relationships be communicated in a way that actually reaches the public?

The non-profit Science Media Center Germany (SMC) was founded in 2015 as an intermediary between science and science journalism. It provides timely assessments and quotes from scientists on current events and offers expertise and background knowledge on confusing or multilayered topics. The SMC Lab, the SMC's data lab, develops software and services for its own editorial team and for the journalistic community.

In the course of the coronavirus pandemic, the demand for statistical expertise in journalism grew exponentially. Quantities such as the doubling time, the reproduction number R or the properties of exponential growth must be explained in a way that enables journalists to report competently on the pandemic. An important focus is also the limitations of each model: after all, exponential growth may be an apt description of a time series over a short period, but in a finite population this model quickly reaches its limits.

With its coronavirus reports, initially daily and now weekly, the SMC helps to contextualize and explain the current data situation, such as the case counts reported by the Robert Koch Institute (RKI) and the DIVI intensive care registry. The RKI reporting data in particular require considerable explanation, since reporting delays and the fact that these data are not a random sample tempt readers to draw false conclusions.

In the field of epidemiological models, many groups published preprints and papers over the past year, often accompanied by publicly accessible dashboards and press releases from the associated institutions. Not every new model contributes to the state of knowledge, however; at times the modelling of a pandemic lacks subject-matter expertise, and the validation of forecasts is often insufficient. Engaging with the public impact of published work is necessary here, all the more so when it happens outside the usual peer review process.

Correcting for bias due to misclassification in dietary patterns using 24 hour dietary recall data
Timm Intemann1, Iris Pigeot1,2
1Leibniz Institute for Prevention Research and Epidemiology – BIPS, Germany; 2Institute of Statistics, Faculty of Mathematics and Computer Science, University of Bremen, Germany

The development of statistical methods for nutritional epidemiology is a challenge, as nutritional data are usually multidimensional and error-prone. Analysing dietary data requires an appropriate method that takes both multidimensionality and measurement error into account, but measurement error is often ignored when such data are analysed (1). For example, associations between dietary patterns and health outcomes are commonly investigated by first applying cluster analysis algorithms to derive dietary patterns and then fitting a regression model to estimate the associations. In such a naïve approach, errors in the underlying continuous dietary variables lead to misclassified dietary patterns and to biased effect estimates. To reduce this bias, we developed three correction algorithms for data assessed with a 24 hour dietary recall (24HDR), which has become the preferred dietary assessment tool in large epidemiological studies.

The newly developed correction algorithms combine the measurement error correction methods regression calibration (RC), simulation extrapolation (SIMEX) and multiple imputation (MI) with the cluster methods k-means cluster algorithm and the Gaussian mixture model. These new algorithms are based on univariate correction methods for Box-Cox transformed data (2) and consider the measurement error structure of 24HDR data. They consist mainly of the following three stages: (i) estimation of usual intakes, (ii) deriving patterns based on usual intakes and (iii) estimation of the association between these patterns and an outcome.
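To illustrate the SIMEX building block of these algorithms, here is a minimal univariate sketch of our own (not the authors' implementation): extra measurement error of variance lambda times sigma_u squared is repeatedly added, the shrinking naive slope is recorded, and the trend is extrapolated back to lambda = -1. For simplicity we use linear extrapolation; quadratic extrapolation is the more common choice in practice.

```python
import random
import statistics

def ols_slope(x, y):
    """Simple-regression slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def simex_slope(x_obs, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=100, seed=1):
    """SIMEX sketch: simulate extra error, track the attenuated slope,
    extrapolate linearly to lambda = -1 (the error-free case)."""
    rng = random.Random(seed)
    lams = [0.0]
    ests = [ols_slope(x_obs, y)]
    for lam in lambdas:
        sims = []
        for _ in range(B):
            x_sim = [a + rng.gauss(0.0, sigma_u * lam ** 0.5) for a in x_obs]
            sims.append(ols_slope(x_sim, y))
        lams.append(lam)
        ests.append(statistics.fmean(sims))
    b = ols_slope(lams, ests)                      # trend of the estimate in lambda
    a = statistics.fmean(ests) - b * statistics.fmean(lams)
    return a - b                                   # extrapolated value at lambda = -1
```

On error-prone data the naive slope is attenuated towards zero; the extrapolated SIMEX estimate recovers part of that loss.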

We apply the correction algorithms to real data from the IDEFICS/I.Family cohort to estimate the association between meal timing patterns and a marker for the long-term blood sugar level (HbA1c) in European children. Furthermore, we use the fitted parameters from this analysis to mimic the real cohort data in a simulation study. In this simulation study, we consider continuous and binary outcomes in different scenarios and compare the performance of the proposed correction algorithms and the naïve approach with respect to absolute, maximum and relative bias.

Simulation results show that the correction algorithms based on RC and MI perform better than the naïve and the SIMEX-based algorithms. Furthermore, the MI-based approach, which can use outcome information in the error model, is superior to the RC-based approach in most scenarios.


1. Shaw, P. et al. (2018). Epidemiologic analyses with error-prone exposures: Review of current practice and recommendations. Ann Epidemiol 28, 821-828.

2. Intemann, T. et al. (2019). SIMEX for correction of dietary exposure effects with Box-Cox transformed data. Biom J 62, 221-237.

Statistical analysis of Covid-19 data in Rhineland-Palatinate
Markus Schepers1, Konstantin Strauch1, Klaus Jahn3, Philipp Zanger2, Emilio Gianicolo1
1IMBEI Unimedizin Mainz, Germany; 2Institut für Hygiene und Infektionsschutz Abteilung Humanmedizin, Landesuntersuchungsamt; 3Gesundheitsministerium (MSAGD)

In this ongoing project we study the infection dynamics and settings of Covid-19 in Rhineland-Palatinate: what are the most common infection pathways? How does the virus typically spread?

Our analysis is based on data of all reported cases (positively tested individuals) in Rhineland-Palatinate during a specific time period, including at least 17 August to 10 November 2020. Around 20% of the reported cases have been traced to an infection cluster. This leads to a second data set of infection clusters, whose observation variables include the size of the infection cluster and the infection setting (such as 'private household' or 'restaurant'). In line with previous studies, we found that the majority of infection clusters occur in private households (including gatherings involving multiple households). Therefore, we are collecting additional information for a stratified sample of infection clusters with infection setting 'private household'. The stratification is according to counties (Landkreise) with separate public health departments (Gesundheitsämter) and the size of the infection cluster. We developed a questionnaire whose responses will provide the additional information; it contains questions on contact persons and on specific occasions and activities promoting the spread of the virus. We calculate descriptive statistics such as mean, median, standard deviation, minimum and maximum of the quantities of interest.

Results and observations so far include: Cities have a higher prevalence of Covid-19 cases than the countryside. Most of the infection clusters are local rather than over-regional. We also observe a phenomenon often called over-dispersion or super-spreading, meaning that a relatively small number of individuals and clusters is responsible for the majority of all infection transmissions.

Simultaneous regional monitoring of SARS-CoV-2 infections and COVID-19 mortality in Bavaria using the standardised infection mortality rate (sIFR)
Kirsi Manz, Ulrich Mansmann
Ludwig-Maximilians-Universität München, Germany


Background: Regional maps provide a quick overview of the spatial distribution of SARS-CoV-2 infections and make it possible to identify regional differences. To avoid false-positive signals, disease maps are smoothed; this allows an appropriate interpretation of the geographic information.

Aim

We introduce the standardised infection mortality rate (sIFR) as a measure for simultaneously monitoring, at the regional level, the divergence of standardised COVID-19-specific infection and death rates. Regional deviations of both processes from a global standard allow regional measures to be prioritised between infection control and patient care.

Materials and methods

The regional sIFR is the quotient of the standardised mortality rate and the standardised infection rate. It describes how much the regional deviation in the death process differs from the regional deviation in the infection process. The sIFR values are estimated using a Bayesian convolution model and displayed in maps. Our analyses use the reported SARS-CoV-2 data for Bavaria in 2020 and consider four periods of three months each.
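As a minimal numerical sketch of the defining quotient (our own illustrative code and numbers; the actual estimates come from the Bayesian convolution model):

```python
def sifr(observed_deaths, expected_deaths, observed_infections, expected_infections):
    """Standardised infection mortality rate: the ratio of a region's
    standardised mortality ratio (SMR) to its standardised infection ratio (SIR).
    sIFR > 1 means deaths deviate from the global standard more than infections do."""
    smr = observed_deaths / expected_deaths
    sir = observed_infections / expected_infections
    return smr / sir
```

For example, a region with three times the expected deaths but only 1.5 times the expected infections has sIFR = 2, flagging it for mortality-directed measures in addition to infection control.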

Results and discussion

The empirical infection fatality in Bavaria shows a declining trend over the periods. Regions whose deviations in deaths from the Bavarian standard exceed those in infections (sIFR > 2) are observed only in the Upper Palatinate during the first three months, across the entire east of Bavaria in summer, and in the north of Bavaria in late summer/autumn. We show regional changes of the sIFR values for Bavaria's regions over time. This identifies regions that, in addition to managing the spread of infection, require measures to control mortality.

Causal Inference and Statistical Methods for Epidemiology

Chairs: Ryan Andrews and Vanessa Didelez

Causal inference methods for small non-randomized studies: Methods and an application in COVID-19
Sarah Friedrich, Tim Friede
University Medical Center Göttingen, Germany

The usual development cycles are too slow for the development of vaccines, diagnostics and treatments in pandemics such as the ongoing SARS-CoV-2 pandemic. Given the pressure in such a situation, there is a risk that findings of early clinical trials are overinterpreted despite their limitations in terms of size and design. Motivated by a non-randomized open-label study investigating the efficacy of hydroxychloroquine in patients with COVID-19, we describe in a unified fashion various alternative approaches to the analysis of non-randomized studies. Widely used tools to reduce the impact of treatment-selection bias are propensity score (PS) methods and g-computation. Conditioning on the propensity score allows one to replicate the design of a randomized controlled trial, conditional on observed covariates. Moreover, doubly robust estimators, which remain consistent if either the propensity model or the outcome model is correctly specified, provide additional protection against model misspecification. Here, we investigate the properties of propensity-score-based methods, including three variations of doubly robust estimators, in small sample settings typical for early trials, in a simulation study.
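As a sketch of the doubly robust idea (our own simplified code, not one of the three estimator variations studied in the abstract), the augmented IPW estimator of an average treatment effect combines outcome-model predictions with inverse-probability-weighted residuals:

```python
def aipw_ate(a, y, ps, mu1, mu0):
    """Augmented IPW (doubly robust) estimate of the average treatment effect.
    a: treatment indicators, y: outcomes, ps: estimated propensity scores,
    mu1/mu0: outcome-model predictions under treatment/control. Consistent if
    either the propensity model or the outcome model is correct."""
    terms = []
    for ai, yi, ei, m1, m0 in zip(a, y, ps, mu1, mu0):
        t1 = m1 + ai * (yi - m1) / ei            # augmented treated mean
        t0 = m0 + (1 - ai) * (yi - m0) / (1 - ei)  # augmented control mean
        terms.append(t1 - t0)
    return sum(terms) / len(terms)
```

When the outcome model is exactly right, the weighted residual corrections vanish and the estimator reduces to the g-computation contrast of the predictions.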

Testing Instrument Validity in Multivariable Mendelian Randomisation
Maximilian Michael Mandl1, Anne-Laure Boulesteix1, Stephen Burgess2, Verena Zuber3
1Ludwig-Maximilians-Universität München; 2University of Cambridge; 3Imperial College London

Identification of causal effects in biomedical sciences is a challenging task. Most causal inference methods rely on specific assumptions which in practice may be unrealistic and too restrictive. Mendelian Randomisation (MR), however, is an instrumental variable approach that makes use of genetic variants to infer a causal effect of a risk factor on an outcome. Because genetic variants are randomised during meiosis, they are natural instrumental variables with the potential to meet these restrictive methodological requirements, so that causal effects can be consistently inferred even in the presence of unobserved confounders. This setting still requires the genetic variants to be independent of the outcome conditional on the risk factor and unobserved confounders, which is known as the exclusion-restriction assumption (ERA). Violations of this assumption, i.e. effects of the instrumental variables on the outcome through paths other than the risk factor included in the model, can be caused by pleiotropy, a common phenomenon in human genetics. As an extension of the standard MR approach, multivariable MR includes multiple potential risk factors in one joint model, accounting for measured pleiotropy. Genetic variants that deviate from the ERA appear as outliers to the MR model fit and can be detected by general heterogeneity statistics proposed in the literature. In MR analyses these statistics are often inflated by heterogeneity in how genetic variants exert their downstream effects on the exposures of interest, which impedes the detection of outlying instruments with traditional methods.

Removing valid instruments from, or keeping invalid instruments in, the MR model may bias the causal effect estimates and produce false positive findings. As different heterogeneity measures lead to different conclusions about which instruments are outlying, researchers face a typical decision problem, also known as researcher degrees of freedom. These free choices in the selection of valid instruments can lead to serious problems such as fishing for significance.

Firstly, we demonstrate the impact of outliers and how arbitrary choices in the selection of instrumental variables can induce false positive findings, in realistic simulation studies and in the analysis of real data investigating the effect of blood lipids on coronary heart disease and Alzheimer's disease. Secondly, we propose a method that corrects for overdispersion of the heterogeneity statistics in MR analysis by using the estimated inflation factor to correctly remove outlying instruments, thereby accounting for pleiotropic effects.
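A minimal sketch of the inverse-variance-weighted (IVW) estimate and Cochran's Q with per-variant contributions may clarify where outliers show up (our own simplified first-order notation; the proposed overdispersion correction itself is not reproduced here):

```python
def ivw_with_heterogeneity(beta_exp, beta_out, se_out):
    """IVW causal estimate from per-variant ratio estimates beta_out/beta_exp,
    with Cochran's Q and per-variant Q contributions. Variants violating the
    exclusion-restriction assumption tend to show large contributions."""
    ratios = [bo / bx for bx, bo in zip(beta_exp, beta_out)]
    weights = [(bx / s) ** 2 for bx, s in zip(beta_exp, se_out)]  # 1/Var of ratio
    b_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    contrib = [w * (r - b_ivw) ** 2 for w, r in zip(weights, ratios)]
    return b_ivw, sum(contrib), contrib
```

With homogeneous ratio estimates Q is near zero; a single pleiotropic variant inflates Q and dominates the contribution vector, and also pulls the pooled estimate away from the ratio shared by the valid instruments.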

Causal Discovery with Incomplete Cohort Data
Janine Witte1,2, Ronja Foraita1, Ryan M. Andrews1, Vanessa Didelez1,2
1Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany; 2University of Bremen, Germany


Cohort studies in health research often involve the collection of large numbers of variables over a period of time. They thus form the ideal basis for exploring the relationships among many variables simultaneously, e.g. by methods such as constraint-based causal discovery. These methods aim at inferring a causal graph, combining causal assumptions with statistical tests for conditional independence. Missing values are a typical problem in practice. Simple methods for dealing with incomplete data, such as list-wise deletion and mean imputation, can lead to inefficient and biased inference.


We consider test-wise deletion and multiple imputation for causal discovery. The former applies each conditional independence test to the records containing complete information on the variables used for the test. For multiple imputation, the missing values are imputed M>1 times, the conditional independence test is run on each of the M data sets, and the test results are combined using an appropriate pooling method. We implemented multiple imputation and pooling procedures for causal discovery with continuous, discrete and mixed data. We then compared the performance of test-wise deletion and multiple imputation in scenarios with different missing data patterns typical for cohort data.


Both test-wise deletion and multiple imputation rely on untestable assumptions about the missingness mechanism. Test-wise deletion is computationally simple and can in principle be combined with any conditional independence test. However, it ignores possibly valuable information in partially observed records, hence the power can be low. Multiple imputation has the potential to exploit more information, and outperformed test-wise deletion in several of our simulation scenarios. The simulations also showed, however, that conditional independence testing after multiple imputation is impaired by small sample sizes and large numbers of conditioning variables, especially when the variables are categorical or mixed. Care needs to be taken when choosing the imputation models, as multiple imputation may break down when the number of variables is large, as is typical for cohort studies. Preliminary results suggest that drop-out is best dealt with using test-wise deletion.


Both test-wise deletion and multiple imputation are promising strategies for dealing with missing values in causal discovery, each with their own advantages. Multiple imputation can potentially exploit more information than test-wise deletion, but requires some care when choosing the imputation models. R code for combining test-wise deletion and multiple imputation with different conditional independence tests is available.

What Difference Does Multiple Imputation Make In Longitudinal Modeling of EQ-5D-5L Data: Empirical Analyses of Two Datasets
Lina Maria Serna Higuita1, Inka Roesel1, Fatima Al Sayah2, Maresa Buchholz3, Ines Buchholz3, Thomas Kohlmann3, Peter Martus1, You-Shan Feng1
1Institute for Clinical Epidemiology and Applied Biostatistics, Medical University of Tübingen, Tübingen, Germany; 2Alberta PROMs and EQ-5D Research and Support Unit (APERSU), School of Public Health, University of Alberta, Alberta, Canada; 3Institute for Community Medicine, Medical University Greifswald, Greifswald, Germany

Background: Although multiple imputation (MI) is the state-of-the-art method for managing missing data, it is not clear how missing values in multi-item instruments should be handled, e.g. by MI at the item or at the score level. In addition, longitudinal data analysis techniques such as mixed models (MM) may be equally valid. We therefore explored the differences in modeling the scores of a health-related quality of life questionnaire (EQ-5D-5L) using MM with and without MI at the item and score level, in two real data sets.

Methods: We explored 1) agreement, using the observed missing data patterns of EQ-5D-5L responses from a Canadian study of patients with type-II diabetes at three time points (Alberta's Caring for Diabetes (ABCD); n=2,040); and 2) validation, using simulated missing patterns for complete cases of a German multi-center study of rehabilitation patients pre- and post-treatment (German Rehabilitation (GR); n=691). Two missing mechanisms (MCAR and MAR) at eight percentages of missing values (5%–65%) were applied to the GR data. The approaches to handling missing EQ-5D-5L scores were: approach-1) MM using respondents with complete cases; approach-2) MM using all available data; approach-3) MM after MI of the EQ-5D-5L scores; and approach-4) MM after MI of the EQ-5D-5L items. Agreement was assessed by comparing predicted values and regression coefficients. Validation was examined using mean squared errors (MSE) and standard errors (SE) compared to the original dataset.


Agreement: The ABCD respondents with missing EQ-5D-5L (40.3%) had significantly poorer self-rated health and lower academic achievement. All four approaches estimated similar baseline scores (ABCD ≈ 0.798). At follow-up, approach-1 resulted in the highest mean scores (ABCD = 0.792), while approach-4 produced the lowest scores (ABCD = 0.765). The largest slope of change was observed for approach-4 (visit 1 to visit 3: -0.027), while the smallest was observed for approach-2 (visit 1 to visit 3: -0.011).

Validation: SE and MSE increased with increasing percentages of simulated missing GR data. All approaches showed similar SE and MSE (SE: 0.006-0.011; MSE: 0.032-0.033); however, approach-4 produced the most inaccurate predictions, underestimating the score.

Discussion: In these data, complete-case analyses overestimated the scores, and MM after MI of items yielded the lowest scores. As there was no loss of accuracy, MM without MI might be the most parsimonious choice for dealing with missing data when baseline covariates are complete. However, MI may be needed when baseline covariates are missing and/or more than two time points are considered.

Exploring missing patterns and missingness mechanisms in longitudinal patient-reported outcomes using data from a non-randomized controlled trial study
Pimrapat Gebert1,2,3, Daniel Schindel1, Johann Frick1, Liane Schenk1, Ulrike Grittner2,3
1Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Medical Sociology and Rehabilitation Science; 2Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology; 3Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany

The missing data mechanism plays an important role in handling and analyzing data subject to missingness. Longitudinal patient-reported outcome measures (PROMs) are usually far from complete, especially in seriously ill patients. To choose an appropriate strategy for handling missing data, most statistical approaches require knowledge of the missingness patterns and assumptions about the type of missing data mechanism. We demonstrate how to explore missingness patterns and mechanisms using PROMs data from the Oncological Social Care Project (OSCAR) study, including the global health status/QoL (GH/QoL) scale of the EORTC QLQ-C30, the Patient Reaction Assessment (PRA-D), the Revised Illness Perception Questionnaire (IPQ-R), the German modified version of the Autonomy Preference Index (API-DM), the Decisional Conflict Scale (DCS), and the European Health Literacy Survey (HLS-EU-Q6). Linear random-effects pattern-mixture models were fitted to identify missingness not at random (MNAR) for each pattern. We found that the missing data on the GH/QoL scale of the EORTC QLQ-C30 could be assumed MNAR for missingness due to massive worsening of health status and death. However, there was no evidence of MNAR in any of the other PROM measures. Although determining the true missing data mechanism is impossible, a pattern-mixture model can be useful for evaluating the effects of informative missingness in longitudinal PROMs.

Teaching and Didactics in Biometry

Chairs: Carolin Herrmann and Maren Vens

How to enhance gameful learning in the STEM subjects
Amir Madany Mamlouk
Institute for Neuro- and Biocomputing, University of Lübeck, Germany

Playing games is fun, and learning should really be just as much fun. But at universities, in the STEM subjects in particular, it usually is not fun at all. On the contrary: many students drop out of their studies because they cannot meet the requirements and cannot close existing gaps in their knowledge, and others become ill during their studies under the same pressure. In this lecture, I would like to raise awareness of the fact that our current study system often runs counter to all the principles of successful game design. Furthermore, I would like to describe my own efforts to correct this systemic misalignment between learning at universities and gameful learning. Over the last few years, we have developed a multiple award-winning experience-points-based assessment system (XPerts – From Zero to Hero) and systematically evaluated it in practice in a bioinformatics lecture. I will illustrate this with a few examples and offer suggestions on how even the smallest changes can already bring about a fundamental shift in the teaching and learning culture of your own courses.

Challenges of online teaching and what we have learned – the example of the master's programme Medical Biometry/Biostatistics and the certificate Medical Data Science at Heidelberg University
Marietta Kirchner, Regina Krisam, Meinhard Kieser
Institute of Medical Biometry and Informatics, Heidelberg University, Germany

The Institute of Medical Biometry and Informatics at Heidelberg University has offered the part-time master's programme Medical Biometry/Biostatistics since 2006 and the certificate Medical Data Science since 2019. Both programmes run alongside participants' jobs, with courses held as block courses on three consecutive days with several 90-minute units. When, in March 2020, all in-person teaching at Heidelberg University was suspended with immediate effect due to the COVID-19 pandemic, the ongoing and upcoming courses had to be reorganised quickly to keep the programmes running successfully. Heidelberg University provided an online curriculum, which was continuously adapted, as well as a video conferencing system for synchronous courses.

The abrupt interruption and the rapid switch to online teaching posed new challenges for which no established best practices were available. The hands-on programming units in R and the block format were additional challenges, for participants and lecturers alike. Even though the range of online courses has grown steadily in recent years, it has not been comprehensively investigated whether they offer value comparable to traditional classroom teaching and which prerequisites must be in place for a successful teaching and learning situation. Implemented properly, online teaching can improve student performance (Shah, 2016).

But what makes online teaching good? Successful online courses exploit the advantages of the online tools used and foster communication between lecturers and students (Oliver, 1999). The video conferencing system provided offers various strategies for creating a fruitful online learning environment. Introductions to the system included recommendations on its use and on encouraging and shaping interaction with the students.

The talk presents the challenges and opportunities that arose from the perspective of the programme organisers and the lecturers. The learners' perspective is presented on the basis of course evaluations and extensive feedback from conversations and e-mails. We present the experience of two semesters of online teaching, focusing on the questions "What did we do to ensure that the content was conveyed successfully?" and "What have we learned for future courses – face-to-face or online?".


R. Oliver (1999). Exploring strategies for online teaching and learning. Distance Education, 20:240-254. DOI: 10.1080/0158791990200205

D. Shah (2016). Online education: should we take it seriously? Climacteric, 19:3-6. DOI: 10.3109/13697137.2015.1115314

The iBikE Smart Learner: evaluation of an interactive web-based learning tool to specifically address statistical misconceptions
Sophie K. Piper1,2, Ralph Schilling1,2, Oliver Schweizerhof1,2, Anne Pohrt1,2, Dörte Huscher1,2, Uwe Schöneberg1,2, Eike Middell3, Ulrike Grittner1,2
1Institute of Biometry and Clinical Epidemiology, Charité – Universitätsmedizin Berlin, Charitéplatz 1, D-10117 Berlin, Germany; 2Berlin Institute of Health (BIH), Anna-Louisa-Karsch Str. 2, 10178 Berlin, Germany; 3Dr. Eike Middell, Moosdorfstr. 4, 12435 Berlin


Statistics is often an unpopular subject for medical students and researchers. However, methodological skills are essential for the correct interpretation of research results and thus for the quality of research in general. Understanding statistical concepts in particular plays a central role. In standard medical training, relatively little attention is paid to the development of these competencies, so that researching physicians (from students to professors) often have deficits and misconceptions.

The most typical example is the incorrect interpretation of the p-value. Misconceptions lead to misinterpretations of what statistics can do and where certain methods reach their limits. Therefore, methods are misused and/or results are misinterpreted, which in turn can have consequences for further research and ultimately for patients.


We developed a learning tool called the "iBikE-Smart Learner" – an interactive, web-based teaching program similar to the AMBOSS learning software for medical students. It is designed to address common misconceptions in statistics in a targeted (modular) manner and provides teaching elements adapted to the individual knowledge and needs of the user.

Specifically, we were able to complete the first module, "Statistical misconceptions about the p-value". This module consists of a self-contained set of multiple-choice questions directly addressing common misconceptions about the p-value, based on typical examples in medical research. A first (beta) version of the "iBikE-Smart Learner" was already available at the end of October 2019 and has been tested internally by experienced staff members of our institute.

In November 2020, we started a randomized controlled trial among researchers at the Charité to evaluate this first module. We plan to recruit 100 participants. The primary outcome is the overall performance rate, which will be compared between users randomized to the full version of the tool and those randomized to a control version with all teaching features turned off. Additionally, self-reported statistical literacy before and after using the tool as well as a subjective evaluation of the tool's usefulness are assessed.
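As a purely illustrative sketch of the kind of primary analysis described above – comparing an overall performance rate between two randomized arms – the following computes a two-sample z-test for a difference in proportions. The counts are invented and this is not the study's prespecified analysis:

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-sample z-test for a difference in proportions (pooled SE)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a - p_b, z, p_value

# hypothetical data: 40/50 correct in the full-version arm, 30/50 in control
diff, z, p = two_proportion_z(40, 50, 30, 50)
```

In a real trial analysis one would of course also report a confidence interval for the difference and consider covariate adjustment.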


By the time of submission of this abstract, 30 participants had been recruited for the ongoing randomized controlled evaluation study. We plan to promote the iBikE-Smart Learner at the Charité and to present results of the evaluation study.


We developed and evaluated a first module of the “iBikE-Smart Learner” as a web-based teaching tool addressing common misconceptions about the p-value.

The catalogue of learning objectives in medical biometry (Lernzielkatalog Medizinische Biometrie) for undergraduate medical education
Ursula Berger1, Carolin Herrmann2
1LMU München, Germany; 2Charité – Universitätsmedizin Berlin, Germany

The catalogue of learning objectives in medical biometry for undergraduate medical education covers central biometric terms, measures, concepts and methods, as well as skills that give medical students a basic understanding of biometry and data analysis. It is intended to facilitate the planning of courses on medical biometry in undergraduate medical education and to offer students guidance.

The catalogue lists the individual learning topics grouped under broader themes. For each learning topic, the abilities, skills and knowledge required of students are described by verbs that also reflect the level of the learning objectives. In addition, the learning topics are supplemented with notes and hints for lecturers. In drafting the learning topics, the new national competence-based catalogue of learning objectives in medicine (NKLM 2.0) was taken into account in the development stage available at the time (November 2020). The catalogue prescribes neither a sequence nor a time frame for a curriculum and can therefore be used flexibly in differently structured curricula and in different types of undergraduate medical degree programmes.

The preparation of the catalogue was coordinated by the joint working group Teaching and Didactics of Biometry of the German Society for Medical Informatics, Biometry and Epidemiology (GMDS) and the German Region of the International Biometric Society (IBS-DR). Several workshops were held in 2020 in which, with the participation of many colleagues in the field, a first version was drafted; it was presented to the professional community for comment in December 2020 (send to: LZK-Biometrie@charite.de). Now that the comment phase has ended, the revised version of the catalogue will be presented.

Statistical humor in classroom: Jokes and cartoons for significant fun with relevant effect
Annette Aigner
Charité Universitätsmedizin Berlin, Germany

Small talk with a statistician: Q: "What's your relationship with your parents?" A: "1:2"

Jokes like this – short or long – but also cartoons and other humorous devices not only amuse statisticians; they also give students easy, positive access to a subject generally perceived as difficult, such as statistics.

This article aims to highlight the relevance and positive effects of humor in teaching in general, and especially of easy-to-use materials such as jokes and cartoons. Hints and suggestions for their proper use are given, although of course there are no limits to their implementation in the classroom. In addition, the article contains a collection of freely available online resources that can be used immediately in the statistics classroom and in which everyone can find materials suitable for their specific teaching situation. As an exemplary application, materials for use in an introductory session on linear regression are shown and the author's personal experiences are briefly summarized.

As statisticians, we know that statistics is fun – now we should also convey this to students, why not with the help of jokes and cartoons?


STRengthening Analytical Thinking for Observational Studies: a brief overview of some contributions of the STRATOS initiative

Chairs: Anne-Laure Boulesteix and Willi Sauerbrei

On recent progress of topic groups and panels
Willi Sauerbrei1, Michal Abrahamowicz2, Marianne Huebner3, Ruth Keogh4 on behalf of the STRATOS initiative, Freiburg, Germany
1Medical Center – University of Freiburg, Germany; 2McGill University, Montreal, Canada; 3Michigan State University, East Lansing, USA; 4London School of Hygiene and Tropical Medicine, UK

Observational studies present researchers with a number of analytical challenges, related both to the complexity of the underlying processes and to imperfections of the available data (e.g. unmeasured confounders, missing data, measurement errors). Whereas many methods have been proposed to address specific challenges, there is little consensus regarding which of the alternative methods are preferable for which types of data. Often there is also a lack of solid evidence from systematic validation and comparison of the methods' performance.

To address these complex issues, the STRATOS initiative was launched in 2013. In 2021, STRATOS involves more than 100 researchers from 19 countries worldwide with backgrounds in biostatistical and epidemiological methods. The initiative has 9 topic groups (TGs), each focusing on a different set of 'generic' analytical challenges (e.g. measurement error or survival analysis), and 11 panels (e.g. publications, simulation studies, visualisation) that coordinate the initiative, share best research practices, and disseminate research tools and results from the work of the TGs.

We will provide a short overview of recent progress, point to research that is urgently needed, and emphasize the importance of knowledge translation. More details are provided in short reports from all TGs and some panels, which have appeared regularly in the Biometric Bulletin, the newsletter of the International Biometric Society (https://stratos-initiative.org/publications), since issue 3 of 2017.

Statistical analysis of high-dimensional biomedical data: issues and challenges in translation to medically useful results
Lisa Meier McShane on behalf of the high-dimensional data topic group, US-NCI, Bethesda, USA
Division of Cancer Treatment and Diagnosis, U.S. National Cancer Institute, National Institutes of Health, USA

Successful translation of research involving high-dimensional biomedical data to medically useful results requires a research team with expertise including clinical and laboratory science, bioinformatics, computational science, and statistics. A proliferation of public databases and powerful data analysis tools has led to many biomedical publications reporting results suggested to have potential clinical application. However, many of these results cannot be reproduced in subsequent studies, or the findings, although meeting statistical significance criteria or other numerical performance criteria, have no clear clinical utility. Many factors have been suggested as contributors to irreproducible or clinically non-translatable biomedical research, including poor study design, analytic instability of measurement methods, sloppy data handling, inappropriate and misleading statistical analysis methods, improper reporting or interpretation of results, and on rare occasions, outright scientific misconduct. Although these challenges can arise in a variety of medical research studies, this talk will focus on research involving the use of novel measurement technologies such as "omics assays", which generate large volumes of data requiring specialized expertise and computational approaches for proper management, analysis and interpretation [http://iom.edu/Reports/2012/Evolution-of-Translational-Omics.aspx]. Research team members share responsibility for ensuring that research is performed with integrity and that best practices are followed to ensure reproducible results. Further, strong engagement of statisticians and other computational scientists with experts in the relevant medical specialties is critical to the generation of medically interpretable and useful findings.
Through a series of case studies, the many dimensions of reproducible and medically translatable omics research are explored and recommendations aiming to increase the translational value of the research output are discussed. 

Towards stronger simulation studies in statistical research
Tim Morris on behalf of the Simulation panel, London, UK
MRC Clinical Trials Unit at University College London, UK

Simulation studies are a tool for understanding and evaluating statistical methods. They are sometimes necessary to generate evidence about which methods are suitable and – importantly – unsuitable for use, and when. In medical research, statisticians have been pivotal to the introduction of reporting guidelines such as CONSORT. The idea is that these give readers enough understanding of how a study was conducted that they could replicate the study themselves. Simulation studies are relatively easy to replicate but, as a profession, we tend to forget our fondness for clear reporting. In this talk, I will describe some common failings and make suggestions about the structure and details that help to clarify published reports of simulation studies.


General discussion about potential contributions to the future work of the STRATOS initiative (https://stratos-initiative.org/).

Statistics in Nursing Sciences

Chairs: Werner Brannath and Karin Wolf-Ostermann

Methodological impulses and statistical analysis methods that can contribute to theory development and theory testing in nursing science
Albert Bruehl
Philosophisch-Theologische Hochschule Vallendar

Statistical analysis methods can provide impulses for theory development in nursing science. The methods for this are the development and the testing of hypotheses. Standard introductory statistics textbooks in the social sciences define hypothesis testing as the sole principal task of statistics. Hypothesis development would be an additional task for applied statistics, one that can become particularly important for subjects such as those studied in nursing science (Brühl, Fried, 2020).

In many research questions within nursing science we are in fact dealing with attempts to model empirical phenomena via constructs. Examples are the models underlying the constructs of care dependency ("Pflegebedürftigkeit") and quality of care ("Pflegequalität"). These models are neither theoretically grounded nor empirically supported.

When regressions with classical null-hypothesis tests are used to analyse data on constructs such as care dependency and quality of care, we learn that the constructs currently in use are of little empirical help. This holds for multivariate regressions in which care-grade criteria explain working time poorly (Rothgang, 2020); for non-parametric regressions, multivariate regression splines and multi-level models in which resident and organisational variables do not explain working time well (Brühl, Planer, 2019); and also for logistic regressions (Görres et al., 2017) and logistic multi-level analyses (Brühl, Planer, 2019) that do not explain quality indicators well. Despite the modest success of these statistical analyses, application routines, e.g. for staffing assessment and for measuring quality of care, are usually established on this basis anyway.

This way of using statistics yields hardly any starting points for the further development of the constructs employed. Structure-finding methods are better suited to this purpose. One example is the use of different variants of ordinal multidimensional scaling (Borg, 2018), which help in the further development of the construct of care dependency (Teigeler, 2017) and in capturing process quality (Brühl et al., 2021). Another method that can help here is multiple correspondence analysis (Greenacre, 2017), which can be used even with small sample sizes and with nominal data. For theory testing, confirmatory variants of these structure-finding methods can be employed. The talk presents examples of all of these.
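The analyses cited above use ordinal (non-metric) multidimensional scaling; as a simplified illustration of the underlying idea only, the classical metric variant can be computed directly from a distance matrix via double centering and an eigendecomposition. This sketch is not the method of the cited studies:

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical (metric) MDS: embed n points in k dimensions from a
    symmetric n-by-n distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    b = -0.5 * j @ d2 @ j                    # double-centred Gram matrix
    eigval, eigvec = np.linalg.eigh(b)       # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:k]       # take the top-k eigenvalues
    pos = np.clip(eigval[idx], 0, None)      # guard against tiny negatives
    return eigvec[:, idx] * np.sqrt(pos)

# toy example: four points on a line, pairwise distances |i - j|
d = np.abs(np.subtract.outer(np.arange(4.0), np.arange(4.0)))
coords = classical_mds(d, k=1)
```

Ordinal MDS replaces the squared distances by a monotone transformation fitted iteratively (e.g. via stress majorization), which is what makes it usable for the rank-level data common in nursing research.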


Borg, I., Groenen, P. J., & Mair, P. (2018). Applied multidimensional scaling and unfolding (2nd ed.). Springer-Verlag.

Brühl, A., Planer, K. (2019): PiBaWü – Zur Interaktion von Pflegebedürftigkeit, Pflegequalität und Personalbedarf. Freiburg: Lambertus

Brühl, A. (2020): Anwendung von statistischen Analyseverfahren, die die Entwicklung von Theorien in der Pflegewissenschaft fördern, S. 7-37. In: Brühl, A., Fried, K. (Hrsg.) (2020): Innovative Statistik in der Pflegeforschung. Freiburg: Lambertus

Brühl, A., Sappok-Laue, H., Lau, S., Christ-Kobiela, P., Müller, J., Sesterhenn-Ochtendung, B., Stürmer-Korff, R., Stelzig, A., Lobb, M., Bleidt, W. (2021): Indicating Care Process Quality: A Multidimensional Scaling Analysis. Journal of Nursing Measurement, Volume 30, Number 2, 2021 (Advance online publication) http://dx.doi.org/10.1891/JNM-D-20-00096

Greenacre, M. (2017). Correspondence Analysis in Practice (Third Edition). Chapman & Hall / CRC Interdisciplinary Statistics. Boca Raton: CRC Press Taylor and Francis Group.

Görres, Stefan; Rothgang, Heinz (2017): Modellhafte Pilotierung von Indikatoren in der stationären Pflege (MoPIP). Abschlussbericht zum Forschungsprojekt. (SV14-9015). Unter Mitarbeit von Sophie Horstmann, Maren Riemann, Julia Bidmon, Susanne Stiefler, Sabrina Pohlmann, Mareike Würdemann et al. UBC-Zentrum für Alterns- und Pflegeforschung, UBC-Zentrum für Sozialpolitik. Bremen

Rothgang, H., Görres, S., Darmann-Finck, I., Wolf-Ostermann, K., Becke, G, Brannath, W. (2020): Zweiter Zwischenbericht. Online verfügbar unter: https://www.gs-qsa-pflege.de/wp-content/uploads/2020/02/2.-Zwischenbericht-Personalbemessung-%C2%A7-113c-SGB-XI.pdf, zuletzt geprüft am 07.09.2020

Teigeler, Anna Maria (2017): Die multidimensionale Skalierung als grundlegendes Verfahren zur Explikation des Pflegebedürftigkeitsverständnisses von beruflich Pflegenden. Masterthesis an der Philosophisch-Theologischen Hochschule Vallendar. https://kidoks.bsz-bw.de/files/1097/Masterthesis +11+17+V.pdf, letzter Zugriff am 30.05.2019.

Health services research in nursing science – challenges and opportunities

Prof. Dr. Karin Wolf-Ostermann1

1Universität Bremen

Health services research in nursing science is, on the one hand, a commitment to nursing science as a scientific discipline and, on the other, a clear indication that it also entails a mandate for the evidence-based design of care. Drawing on current studies, the talk examines opportunities and challenges for health services research in nursing science, taking dementia as an example. Using study examples, three areas in particular are examined more closely:

  • the definition of outcome measures that are relevant – also from the perspective of those affected,
  • the question of the target groups of interventions,
  • the discussion of suitable study designs – not least with regard to the challenges of evaluating "new technologies" in care.

There is a growing need for future research on these methodological challenges. Moreover, it must be discussed more intensively than before how the politically, ethically and legally founded concern for participation can be taken into account. And, not least, it must be explored more thoroughly how the dissemination of research results and the implementation of (evidence-based) interventions can succeed better.

Statistical challenges in Nursing Science – a practical example
Maja von Cube1, Martin Wolkewitz1, Christiane Kugler2
1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg; 2Medizinische Fakultät der Albert-Ludwigs-Universität Freiburg, Institut für Pflegewissenschaft, Klinisch-Theoretisches Institut des Universitätsklinikums

We give a practical example of a study in nursing science. The goal of our study is to investigate whether pets increase the quality of life of patients who have had an organ transplantation. As these patients are considered to be at higher risk of acquiring infections, clinical practice restricts keeping pets after a transplantation. Nonetheless, pets are presumed to facilitate a healthy lifestyle and thus to have a positive impact on human health and well-being.

In this practical example, we use data from an observational longitudinal follow-up study (n=533) in which clinical parameters, including the acquisition of infections as well as quality-of-life measurements, were assessed. The latter were measured at seven time points with the Hospital Anxiety and Depression Scale (HADS) and the Short Form health survey (SF-36), a patient-reported questionnaire with 36 items. Additionally, information on pets is available for a non-random cross-sectional subsample (n=226).

By combining information from the two datasets, we study whether pets increase the quality of life after an organ transplantation using a linear regression model. Moreover, we use time-to-event analysis to estimate the effect of pets on the time to first infection.

This study poses numerous statistical challenges, including the study design, confounding, multiple testing, missing values, competing risks and significant differences in survival between the baseline cohort and the cross-sectional sample. We use this study as a practical example to show how statistical considerations can help to minimize the risk of typical biases arising in clinical epidemiology. Yet, rather than proposing sophisticated statistical approaches, we discuss pragmatic solutions.
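As a purely illustrative sketch of the first analysis mentioned above – a linear regression of quality of life on pet ownership with covariate adjustment – the following uses simulated data. The variable names, effect sizes and the single confounder (age) are invented for illustration and are not those of the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: quality-of-life score modelled on pet ownership
# while adjusting for age as a single illustrative confounder.
n = 200
age = rng.uniform(20, 70, n)
pet = rng.integers(0, 2, n)                      # 0 = no pet, 1 = pet
qol = 60 + 5.0 * pet - 0.3 * age + rng.normal(0, 5, n)

# design matrix: intercept, pet indicator, age
x = np.column_stack([np.ones(n), pet, age])
beta, *_ = np.linalg.lstsq(x, qol, rcond=None)   # ordinary least squares
pet_effect = beta[1]  # adjusted difference in mean QoL, pet vs. no pet
```

In the real study the adjustment set would be chosen from subject-matter knowledge, and the non-random cross-sectional subsample would additionally require a careful discussion of selection bias, as the abstract notes.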

Staffing structure and outcome in long-term residential care –
methods and limitations of a statistical analysis of longitudinal routine data

Werner Brannath1 and Pascal Rink1
1Institut für Statistik und KKSB, Fachbereich Mathematik und Informatik, Universität Bremen

In view of demographic change and the simultaneous shortage of professional nurses, the question of the required size and adequate structure of the nursing staff of a long-term residential care facility is gaining societal and political importance. In addition to a study commissioned by the legislator to develop a uniform procedure for assessing staffing needs, this question was investigated in an observational study based on longitudinal routine data (StaVaCare 2.0). The aim of the latter was to gain insights into the complex relationship between the resident structure (care mix) and the qualification structure and deployment of the nursing staff (case mix) of a facility with regard to its care outcome. The restriction to routine data naturally led to limitations and complications concerning data quality and data density, the statistical analyses and their interpretation. On the other hand, it offered the possibility of a complete survey within each facility. Moreover, there is as yet no generally accepted procedure for measuring and assessing care outcomes. This talk describes and discusses the statistical approaches chosen in StaVaCare 2.0 to solve or mitigate these difficulties.

Görres S, Brannath W, Böttcher S, Schulte K, Arndt G, Bendig J, Rink P, Günay S (2020). Stabilität und Variation des Care-Mix in Pflegeheimen unter Berücksichtigung von Case-Mix, Outcome und Organisationscharakteristika (StaVaCare 2.0). Abschlussbericht des Modellvorhabens mit Anhang.