Track: Track 4

Causal Inference and Statistical Methods for Epidemiology

Chairs: Ryan Andrews and Vanessa Didelez

Causal inference methods for small non-randomized studies: Methods and an application in COVID-19
Sarah Friedrich, Tim Friede
University Medical Center Göttingen, Germany

The usual development cycles are too slow for the development of vaccines, diagnostics and treatments in pandemics such as the ongoing SARS-CoV-2 pandemic. Given the pressure in such a situation, there is a risk that findings of early clinical trials are overinterpreted despite their limitations in terms of size and design. Motivated by a non-randomized open-label study investigating the efficacy of hydroxychloroquine in patients with COVID-19, we describe in a unified fashion various alternative approaches to the analysis of non-randomized studies. Widely used tools to reduce the impact of treatment-selection bias are propensity score (PS) methods and g-computation. Conditioning on the propensity score allows one to replicate the design of a randomized controlled trial, conditional on observed covariates. Moreover, doubly robust estimators provide additional protection: they remain consistent if either the propensity score model or the outcome model is correctly specified. Here, we investigate the properties of propensity score based methods, including three variations of doubly robust estimators, in small sample settings typical for early trials, in a simulation study.
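As an illustration of the doubly robust idea mentioned above, the following is a minimal sketch of an augmented inverse probability weighted (AIPW) estimator of the average treatment effect. It is not the authors' implementation and not one of the three variants they study: the function name `aipw_ate`, the toy data, and the use of stratum-wise means as propensity and outcome models (for a single discrete covariate) are assumptions made purely for illustration.

```python
from collections import defaultdict


def aipw_ate(records):
    """AIPW estimate of the average treatment effect (illustrative sketch).

    records: list of (x, a, y) with a discrete covariate x, a binary
    treatment a (0/1) and a numeric outcome y. Both nuisance models are
    stratum-wise means, so every stratum must contain treated and
    untreated subjects (positivity); otherwise a ZeroDivisionError occurs.
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, a, y in records:
        strata[x][a].append(y)

    # Propensity score e(x) = P(A=1 | X=x) and outcome means
    # m_a(x) = E[Y | A=a, X=x], estimated within each stratum.
    e, m0, m1 = {}, {}, {}
    for x, groups in strata.items():
        n_x = len(groups[0]) + len(groups[1])
        e[x] = len(groups[1]) / n_x
        m0[x] = sum(groups[0]) / len(groups[0])
        m1[x] = sum(groups[1]) / len(groups[1])

    # g-computation term plus inverse-probability-weighted residuals.
    total = 0.0
    for x, a, y in records:
        total += (m1[x] - m0[x]
                  + a * (y - m1[x]) / e[x]
                  - (1 - a) * (y - m0[x]) / (1 - e[x]))
    return total / len(records)


# Hypothetical toy data: treatment is more likely in stratum x=1,
# where outcomes are also higher, so the naive comparison is confounded.
data = ([(0, 1, y) for y in (4, 6)] + [(0, 0, y) for y in (1, 2, 3)]
        + [(1, 1, y) for y in (7, 8, 9)] + [(1, 0, y) for y in (5, 7)])

treated = [y for _, a, y in data if a == 1]
control = [y for _, a, y in data if a == 0]
naive_difference = sum(treated) / len(treated) - sum(control) / len(control)
ate = aipw_ate(data)
print(round(naive_difference, 3), round(ate, 3))
```

The toy data are constructed so that the covariate-adjusted effect is 2.5 while the naive difference in means is 3.2. With saturated nuisance models, as here, the AIPW estimate coincides with g-computation; the two differ once parametric models are used, which is the setting where double robustness matters.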

Testing Instrument Validity in Multivariable Mendelian Randomisation
Maximilian Michael Mandl1, Anne-Laure Boulesteix1, Stephen Burgess2, Verena Zuber3
1Ludwig-Maximilians-Universität München; 2University of Cambridge; 3Imperial College London

Identification of causal effects in biomedical sciences is a challenging task. Most causal inference methods rely on specific assumptions which in practice may be unrealistic and too restrictive. However, Mendelian Randomisation (MR) is an instrumental variable approach that makes use of genetic variants to infer a causal effect of a risk factor on an outcome. Due to the randomisation of the genetic variants during meiosis, these are predestined instrumental variables that have the potential to naturally meet the restrictive methodological requirements. Thus, causal effects can be consistently inferred even if unobserved confounders are present. Obviously, this setting still requires the genetic variants to be independent of the outcome conditional on the risk factor and unobserved confounders, which is known as the exclusion-restriction assumption (ERA). Violations of this assumption, i.e. the effect of the instrumental variables on the outcome through a different path than the risk factor included in the model, can be caused by pleiotropy, which is a common phenomenon in human genetics. As an extension to the standard MR approach, multivariable MR includes multiple potential risk factors in one joint model accounting for measured pleiotropy. Genetic variants which deviate from the ERA appear as outliers to the MR model fit and can be detected by general heterogeneity statistics proposed in the literature. In MR analysis these are often inflated due to heterogeneity of how genetic variants exert their downstream effect on the exposures of interest, which impedes detection of outlying instruments using the traditional methods.

Removing valid instruments or keeping invalid instruments in the MR model may lead to a bias of the causal effect estimates and false positive findings. As different heterogeneity measures lead to a variety of conclusions with regard to outlying instruments, researchers face a typical decision problem, also known as researcher degrees of freedom. These free choices in the selection of valid instruments can lead to serious problems like fishing for significance.

Firstly, we demonstrate the impact of outliers and how arbitrary choices in the selection of instrumental variables can induce false positive findings in realistic simulation studies and in the analysis of real data investigating the effect of blood lipids on coronary heart disease and Alzheimer’s disease. Secondly, we propose a method that corrects for overdispersion of the heterogeneity statistics in MR analysis by making use of the estimated inflation factor to correctly remove outlying instruments and thereby account for pleiotropic effects.
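To make the role of heterogeneity statistics concrete, here is a small sketch for the univariable case: an inverse-variance weighted (IVW) estimate, Cochran's Q with per-variant contributions, and outlier flagging with and without deflating the contributions by an inflation factor. This is not the authors' method. In particular, the median-based inflation factor (in the style of genomic control), the function names and the summary statistics are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist, median

CHI2_1_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def ivw_heterogeneity(beta_x, beta_y, se_y):
    """IVW causal estimate and per-variant contributions to Cochran's Q."""
    w = [1.0 / s ** 2 for s in se_y]
    theta = (sum(wi * bx * by for wi, bx, by in zip(w, beta_x, beta_y))
             / sum(wi * bx * bx for wi, bx in zip(w, beta_x)))
    q_j = [wi * (by - theta * bx) ** 2
           for wi, bx, by in zip(w, beta_x, beta_y)]
    return theta, q_j


def flag_outliers(q_j, alpha=0.05, inflation=1.0):
    """Indices of variants whose deflated Q contribution is significant."""
    flagged = []
    for j, q in enumerate(q_j):
        # For 1 df: P(chi2 > q) = 2 * (1 - Phi(sqrt(q))).
        p = 2.0 * (1.0 - NormalDist().cdf(math.sqrt(q / inflation)))
        if p < alpha:
            flagged.append(j)
    return flagged


# Hypothetical summary statistics; the last variant is strongly pleiotropic.
beta_x = [0.1, 0.2, 0.3, 0.4, 0.5, 0.3]
beta_y = [0.06, 0.09, 0.16, 0.19, 0.26, 0.35]
se_y = [0.02] * 6

theta, q_j = ivw_heterogeneity(beta_x, beta_y, se_y)
# Assumed stand-in for an "estimated inflation factor": never below 1.
inflation = max(1.0, median(q_j) / CHI2_1_MEDIAN)

flagged_raw = flag_outliers(q_j)
flagged_corrected = flag_outliers(q_j, inflation=inflation)
print(flagged_raw, flagged_corrected)
```

In this toy example the uncorrected rule flags several variants, whereas the deflated rule flags only the last one. Whether a median-based factor is appropriate in a real analysis is a modelling decision that the sketch does not settle.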

Causal Discovery with Incomplete Cohort Data
Janine Witte1,2, Ronja Foraita1, Ryan M. Andrews1, Vanessa Didelez1,2
1Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany; 2University of Bremen, Germany


Cohort studies in health research often involve the collection of large numbers of variables over a period of time. They thus form the ideal basis for exploring the relationships among many variables simultaneously, e.g. by methods such as constraint-based causal discovery. These methods aim at inferring a causal graph, combining causal assumptions with statistical tests for conditional independence. A typical problem in practice is missing values. Simple methods for dealing with incomplete data, such as list-wise deletion and mean imputation, can lead to inefficient and biased inference.


We consider test-wise deletion and multiple imputation for causal discovery. The former applies each conditional independence test to the records containing complete information on the variables used for the test. For multiple imputation, the missing values are imputed M>1 times, the conditional independence test is run on each of the M data sets, and the test results are combined using an appropriate pooling method. We implemented multiple imputation and pooling procedures for causal discovery with continuous, discrete and mixed data. We then compared the performance of test-wise deletion and multiple imputation in scenarios with different missing data patterns typical for cohort data.
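The difference between test-wise and list-wise deletion can be sketched for a single conditional independence test. The example below uses a Fisher z-test on the partial correlation of two continuous variables given one conditioning variable. It is a simplified illustration and not the authors' R code: the function names, the restriction to one conditioning variable and the toy records are assumptions.

```python
import math
from statistics import NormalDist


def pearson(u, v):
    """Pearson correlation of two equally long numeric lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u)
                           * sum((b - mv) ** 2 for b in v))


def ci_test_testwise(records, x, y, z):
    """Fisher z-test of 'x independent of y given z' with test-wise deletion.

    Only records with observed values for x, y and z are used; missing
    values (None) in other variables do not lead to exclusion.
    """
    rows = [r for r in records if all(r[k] is not None for k in (x, y, z))]
    n = len(rows)
    xs, ys, zs = ([r[k] for r in rows] for k in (x, y, z))
    r_xy, r_xz, r_yz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    # One conditioning variable: the scaling factor is sqrt(n - 1 - 3).
    stat = math.sqrt(n - 4) * math.atanh(partial)
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(stat)))
    return n, partial, p_value


# Hypothetical toy cohort: w is often missing but is not part of the test.
records = []
for i in range(12):
    z = float(i)
    records.append({
        "x": z + (1.0 if i % 2 else -1.0),
        "y": 2.0 * z + (1.0 if i % 3 == 0 else -0.5),
        "z": z,
        "w": None if i % 4 == 0 else 1.0,
    })
records[5]["x"] = None  # one record lacks x

n_used, partial, p_value = ci_test_testwise(records, "x", "y", "z")
n_listwise = sum(all(v is not None for v in r.values()) for r in records)
print(n_used, n_listwise)
```

Here test-wise deletion keeps 11 records for the test, whereas list-wise deletion would keep only 8. The multiple imputation alternative would instead run the test on each of the M completed data sets and pool the results, which is not shown in this sketch.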


Both test-wise deletion and multiple imputation rely on untestable assumptions about the missingness mechanism. Test-wise deletion is computationally simple and can in principle be combined with any conditional independence test. However, it ignores possibly valuable information in partially observed records, hence the power can be low. Multiple imputation has the potential to exploit more information, and outperformed test-wise deletion in several of our simulation scenarios. The simulations also showed, however, that conditional independence testing after multiple imputation is impaired by small sample sizes and large numbers of conditioning variables, especially when the variables are categorical or mixed. Care needs to be taken when choosing the imputation models, as multiple imputation may break down when the number of variables is large, as is typical for cohort studies. Preliminary results suggest that drop-out is best dealt with using test-wise deletion.


Both test-wise deletion and multiple imputation are promising strategies for dealing with missing values in causal discovery, each with their own advantages. Multiple imputation can potentially exploit more information than test-wise deletion, but requires some care when choosing the imputation models. R code for combining test-wise deletion and multiple imputation with different conditional independence tests is available.

What Difference Does Multiple Imputation Make In Longitudinal Modeling of EQ-5D-5L Data: Empirical Analyses of Two Datasets
Lina Maria Serna Higuita1, Inka Roesel1, Fatima Al Sayah2, Maresa Buchholz3, Ines Buchholz3, Thomas Kohlmann3, Peter Martus1, You-Shan Feng1
1Institute for Clinical Epidemiology and Applied Biostatistics, Medical University of Tübingen, Tübingen, Germany; 2Alberta PROMs and EQ-5D Research and Support Unit (APERSU), School of Public Health, University of Alberta, Alberta, Canada; 3Institute for Community Medicine, Medical University Greifswald, Greifswald, Germany

Background: Although multiple imputation (MI) is the state-of-the-art method for managing missing data, it is not clear how missing values in multi-item instruments should be handled, e.g. MI at item or at score level. In addition, longitudinal data analysis techniques such as mixed models (MM) may be equally valid. We therefore explored the differences in modeling the scores of a health-related quality of life questionnaire (EQ-5D-5L) using MM with and without MI at item and score level, in two real data sets.

Methods: We explored 1) agreement analysis using the observed missing data patterns of EQ-5D-5L responses in a Canadian study that included patients with type-II diabetes at three time points (Alberta’s Caring for Diabetes (ABCD); n=2,040); and 2) validation analysis using simulated missing data patterns for complete cases of a German multi-center study of rehabilitation patients pre- and post-treatment (German Rehabilitation (GR); n=691). Two missing data mechanisms (MCAR and MAR) at eight proportions of missing values (5%-65%) were applied to the GR data. The approaches to handling missing EQ-5D-5L scores in all datasets were: approach-1) MM using only respondents with complete data, approach-2) MM using all available data, approach-3) MM after MI of the EQ-5D-5L scores, and approach-4) MM after MI of the EQ-5D-5L items. Agreement was assessed by comparing predicted values and regression coefficients. Validation was examined using mean squared errors (MSE) and standard errors (SE) compared to the original dataset.


Agreement: The ABCD respondents with missing EQ-5D-5L (40.3%) had significantly poorer self-rated health, and lower academic achievement. All 4 approaches estimated similar baseline scores (ABCD≈0.798). At follow up, approach-1 resulted in the highest mean scores (ABCD=0.792) while approach-4 produced the lowest scores (ABCD=0.765). The largest slope of change was observed for approach-4 (visit1–visit3: -0.027), while the smallest slopes were observed for approach-2 (visit3–visit1:-0.011).

Validation: SE and MSE increased with increasing percentages of simulated missing GR data. All approaches showed similar SE and MSE (SE: 0.006-0.011; MSE: 0.032-0.033); however, approach-4 resulted in the most inaccurate predictions, underestimating the score.

Discussion: In these data, complete case analyses overestimated the scores and MM after MI by items yielded the lowest scores. As there was no loss of accuracy, MM without MI, when baseline covariates are complete, might be the most parsimonious choice to deal with missing data. However, MI may be needed when baseline covariates are missing and/or more than two timepoints are considered.

Exploring missing patterns and missingness mechanisms in longitudinal patient-reported outcomes using data from a non-randomized controlled trial study
Pimrapat Gebert1,2,3, Daniel Schindel1, Johann Frick1, Liane Schenk1, Ulrike Grittner2,3
1Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Medical Sociology and Rehabilitation Science; 2Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute of Biometry and Clinical Epidemiology; 3Berlin Institute of Health (BIH), Anna-Louisa-Karsch-Str. 2, 10178 Berlin, Germany

The missing data mechanism plays an important role in handling and analyzing data with missing values. Longitudinal patient-reported outcome measures (PROMs) are usually far from complete, especially in seriously ill patients. To choose an appropriate strategy for handling missing data, most statistical approaches require knowledge about missingness patterns and assumptions about the type of missing data mechanism. We demonstrated how to explore missingness patterns and mechanisms using PROMs data from the Oncological Social Care Project (OSCAR) study, including global health status/QoL (GH/QoL) in the EORTC QLQ-C30, the Patient Reaction Assessment (PRA-D), the Revised Illness Perception Questionnaire (IPQ-R), the German modified version of the Autonomy Preference Index (API-DM), the Decision Conflict Scale (DCS), and the European Health Literacy Survey (HLS-EU-Q6). Linear random-effects pattern-mixture models were fitted to identify missing not at random (MNAR) mechanisms for each pattern. We found that missing GH/QoL data in the EORTC QLQ-C30 could be assumed to be MNAR when they were missing due to massive worsening of health status or death. However, there was no evidence of MNAR in any of the other PROMs. Although determining the true missing data mechanism is impossible, a pattern-mixture model can be useful in evaluating the effects of informative missingness in longitudinal PROMs.

Teaching and Didactics in Biometry

Chairs: Carolin Herrmann and Maren Vens

How to enhance gameful learning in the STEM subjects
Amir Madany Mamlouk
Institute for Neuro- and Biocomputing, University of Lübeck, Germany

Playing games is fun, and learning should be just as much fun. But at universities – in the STEM subjects in particular – it’s usually not fun at all. On the contrary, many students drop out of their studies because they are not up to the requirements and cannot close existing gaps in knowledge. Others get sick during their studies because they cannot cope with the demands. In this talk, I would like to raise awareness of the fact that our current study system often runs counter to all the principles of successful game design. Furthermore, I would like to tell you about my own efforts to correct this systemic misalignment between learning at universities and gameful learning. Over the last few years, we have developed a multiple award-winning experience points-based assessment system (XPerts – From Zero to Hero) and systematically evaluated it in practice using a lecture on bioinformatics. I will illustrate this with a few examples and offer suggestions on how you can achieve a fundamental change in the teaching and learning culture in your own courses, even with the smallest of changes.

Challenges of online teaching and what we have learned: the example of the master's programme Medical Biometry/Biostatistics and the certificate Medical Data Science at Heidelberg University
Marietta Kirchner, Regina Krisam, Meinhard Kieser
Institute of Medical Biometry and Informatics, Heidelberg University, Germany

The Institute of Medical Biometry and Informatics at Heidelberg University has offered the continuing-education master's programme Medical Biometry/Biostatistics since 2006 and the certificate Medical Data Science since 2019. Both programmes are designed for working professionals; their courses are held as block courses on 3 consecutive days, each with several 90-minute units. When all in-person teaching at Heidelberg University was suspended with immediate effect in March 2020 because of the COVID-19 pandemic, the ongoing and upcoming courses had to be reorganized quickly to keep the study programmes running successfully. Heidelberg University provided an online curriculum, which was continuously adapted, as well as a video conferencing system for synchronous teaching.

The abrupt interruption and the rapid switch to online teaching created new challenges for which established procedures could not be relied upon. The practical programming units in R and the block format posed additional challenges, for participants and lecturers alike. Although the range of online courses has grown steadily in recent years, it has not been comprehensively investigated whether they offer value comparable to traditional in-person teaching and which conditions must be created for a successful teaching and learning situation. Implemented properly, online teaching can improve student performance (Shah, 2016).

But what makes good online teaching? Successful online courses exploit the advantages of the online tools used and promote communication between lecturers and students (Oliver, 1999). The video conferencing system provided offers various strategies for creating a fruitful online learning environment. Introductions to the video conferencing system included recommendations on how to use the system and on how to encourage and structure interaction with students.

The talk presents the challenges and opportunities that arose from the perspective of the programme organizers and the lecturers. The learners' perspective is presented on the basis of course evaluations and detailed feedback from conversations and e-mails. We present the experience gained from two semesters of online teaching, focusing on the questions "What did we do to ensure that the content was conveyed successfully?" and "What have we learned for future courses: in person or online?".


R. Oliver (1999). Exploring strategies for online teaching and learning. Distance Education, 20:240-254. DOI: 10.1080/0158791990200205

D. Shah (2016). Online education: should we take it seriously? Climacteric, 19:3-6, DOI: 10.3109/13697137.2015.1115314

The iBikE Smart Learner: evaluation of an interactive web-based learning tool to specifically address statistical misconceptions
Sophie K. Piper1,2, Ralph Schilling1,2, Oliver Schweizerhof1,2, Anne Pohrt1,2, Dörte Huscher1,2, Uwe Schöneberg1,2, Eike Middell3, Ulrike Grittner1,2
1Institute of Biometry and Clinical Epidemiology, Charité – Universitätsmedizin Berlin, Charitéplatz 1, D-10117 Berlin, Germany; 2Berlin Institute of Health (BIH), Anna-Louisa-Karsch Str. 2, 10178 Berlin, Germany; 3Dr. Eike Middell, Moosdorfstr. 4, 12435 Berlin


Statistics is often an unpopular subject for medical students and researchers. However, methodological skills are essential for the correct interpretation of research results and thus for the quality of research in general. Understanding statistical concepts in particular plays a central role. In standard medical training, relatively little attention is paid to the development of these competencies, so that researching physicians (from students to professors) often have deficits and misconceptions.

The most typical example is the incorrect interpretation of the p-value. Misconceptions lead to misinterpretations of what statistics can do and where certain methods reach their limits. Therefore, methods are misused and/or results are misinterpreted, which in turn can have consequences for further research and ultimately for patients.


We developed a learning tool called the „iBikE-Smart Learner“ – an interactive, web-based teaching program similar to the AMBOSS learning software for medical students. It is designed to address common misconceptions in statistics in a targeted (modular) manner and provides teaching elements adapted to the individual knowledge and demand of the user.

Specifically, we were able to complete the first module „Statistical misconceptions about the p-value“. This module consists of a self-contained set of multiple-choice questions directly addressing common misconceptions about the p-value based on typical examples in medical research. A first (beta) version of the „iBikE-Smart Learner“ was already available at the end of October 2019 and has been tested internally by experienced staff members of our institute.

In November 2020, we started a randomized controlled trial among researchers at the Charité to evaluate this first module. We plan to recruit 100 participants. The primary outcome is the overall performance rate, which will be compared between users randomized to the full version of the tool and those randomized to a control version that has all teaching features turned off. Additionally, self-reported statistical literacy before and after using the tool as well as a subjective evaluation of the tool’s usefulness are assessed.


By the time this abstract was submitted, 30 participants had been recruited for the ongoing randomized controlled evaluation study. We plan to promote the iBikE-Smart Learner and show results of the evaluation study at the Charité.


We developed and evaluated a first module of the “iBikE-Smart Learner” as a web-based teaching tool addressing common misconceptions about the p-value.

The Learning Objectives Catalogue for Medical Biometry in the Study of Human Medicine
Ursula Berger1, Carolin Herrmann2
1LMU München, Germany; 2Charité – Universitätsmedizin Berlin, Germany

The learning objectives catalogue for medical biometry in the study of human medicine comprises central biometric terms, key figures, concepts and methods as well as skills that give medical students a basic understanding of biometry and data analysis. It is intended to facilitate the planning of courses on medical biometry within the medical curriculum and to offer students guidance.

The catalogue lists the learning topics grouped under overarching themes. For each learning topic, the abilities, skills and knowledge required of students are described by verbs that also reflect the level of knowledge, i.e. the level of the learning objective. In addition, the learning topics were supplemented with notes and hints for teachers. In compiling the learning topics, the new National Competence-Based Catalogue of Learning Objectives in Medicine (Nationaler Kompetenzbasierter Lernzielkatalog Medizin, NKLM 2.0) was taken into account in its currently available stage of development (11.2020). The catalogue does not prescribe a sequence or a time frame for a curriculum and can therefore be applied flexibly in differently structured curricula and in different types of medical degree programmes.

The development of the catalogue was coordinated by the joint working group Teaching and Didactics of Biometry of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS) and the German Region of the International Biometric Society (IBS-DR). To this end, several workshops were held in 2020, in which a first version was developed with the participation of many colleagues from the field; this version was presented to the professional community for comment in December 2020. Now that the commenting phase has ended, the revised version of the learning objectives catalogue for medical biometry in the study of human medicine will be presented.

Statistical humor in classroom: Jokes and cartoons for significant fun with relevant effect
Annette Aigner
Charité Universitätsmedizin Berlin, Germany

Small talk with a statistician: Q: „What’s your relationship with your parents?“ A: „1:2“

Jokes like this one, short or long, as well as cartoons and other humorous devices, not only amuse statisticians but also give students easy, positive access to a subject that is generally perceived as difficult, such as statistics.

This article aims to highlight the relevance and positive effects of humor in teaching in general, and especially of easy-to-use materials such as jokes and cartoons. Hints and suggestions for their proper use are given, but of course there are no limits to their implementation in the classroom. In addition, the article contains a collection of freely available online resources that can be used immediately in the statistics classroom and in which everyone can find materials suitable for their specific teaching situation. As an exemplary application, materials for use in an introductory session on linear regression are shown and the author’s personal experiences are briefly summarized.

As statisticians, we know that statistics is fun. Now we should also convey this to students, so why not with the help of jokes and cartoons?


STRengthening Analytical Thinking for Observational Studies: a brief overview of some contributions of the STRATOS initiative

Chairs: Anne-Laure Boulesteix and Willi Sauerbrei

On recent progress of topic groups and panels
Willi Sauerbrei1, Michal Abrahamowicz2, Marianne Huebner3, Ruth Keogh4 on behalf of the STRATOS initiative, Freiburg, Germany
1Medical Center – University of Freiburg, Germany, 2McGill, Montreal, Canada, 3Michigan State University, East Lansing, USA, 4London School of Hygiene and Tropical Medicine, UK

Observational studies present researchers with a number of analytical challenges, related both to the complexity of the underlying processes and to imperfections of the available data (e.g. unmeasured confounders, missing data, measurement errors). Whereas many methods have been proposed to address specific challenges, there is little consensus regarding which of the alternative methods are preferable for which types of data. Often, there is also a lack of solid evidence from systematic validation and comparison of the performance of the methods.

To address these complex issues, the STRATOS initiative was launched in 2013. In 2021, STRATOS involves more than 100 researchers from 19 countries worldwide with backgrounds in biostatistical and epidemiological methods. The initiative has 9 Topic Groups (TGs), each focusing on a different set of ‘generic’ analytical challenges (e.g. measurement errors or survival analysis), and 11 panels (e.g. publications, simulation studies, visualisation) that coordinate the initiative, share best research practices and disseminate research tools and results from the work of the TGs.

We will provide a short overview of recent progress, point to some urgently needed research and emphasize the importance of knowledge translation. More details are provided in short reports from all TGs and some panels, which have been regular contributions in the Biometric Bulletin, the newsletter of the International Biometric Society, since issue 3 of 2017.

Statistical analysis of high-dimensional biomedical data: issues and challenges in translation to medically useful results
Lisa Meier McShane on behalf of the high-dimensional data topic group, US-NCI, Bethesda, USA
Division of Cancer Treatment and Diagnosis, U.S. National Cancer Institute, National Institutes of Health, USA

Successful translation of research involving high-dimensional biomedical data to medically useful results requires a research team with expertise including clinical and laboratory science, bioinformatics, computational science, and statistics. A proliferation of public databases and powerful data analysis tools has led to many biomedical publications reporting results suggested to have potential clinical application. However, many of these results cannot be reproduced in subsequent studies, or the findings, although meeting statistical significance criteria or other numerical performance criteria, have no clear clinical utility. Many factors have been suggested as contributors to irreproducible or clinically non-translatable biomedical research, including poor study design, analytic instability of measurement methods, sloppy data handling, inappropriate and misleading statistical analysis methods, improper reporting or interpretation of results, and on rare occasions, outright scientific misconduct. Although these challenges can arise in a variety of medical research studies, this talk will focus on research involving use of novel measurement technologies such as “omics assays”, which generate large volumes of data requiring specialized expertise and computational approaches for proper management, analysis and interpretation. Research team members share responsibility for ensuring that research is performed with integrity and that best practices are followed to ensure reproducible results. Further, strong engagement of statisticians and other computational scientists with experts in the relevant medical specialties is critical to the generation of medically interpretable and useful findings. Through a series of case studies, the many dimensions of reproducible and medically translatable omics research are explored, and recommendations aiming to increase the translational value of the research output are discussed.

Towards stronger simulation studies in statistical research
Tim Morris on behalf of the Simulation panel, London, UK
MRC Clinical Trials Unit at University College London, UK

Simulation studies are a tool for understanding and evaluating statistical methods. They are sometimes necessary to generate evidence about which methods are suitable and – importantly – unsuitable for use, and when. In medical research, statisticians have been pivotal to the introduction of reporting guidelines such as CONSORT. The idea is that these give readers enough understanding of how a study was conducted that they could replicate the study themselves. Simulation studies are relatively easy to replicate but, as a profession, we tend to forget our fondness for clear reporting. In this talk, I will describe some common failings and make suggestions about the structure and details that help to clarify published reports of simulation studies.

General discussion about potential contributions to the future work of the STRATOS initiative

Statistics in Nursing Sciences

Chairs: Werner Brannath and Karin Wolf-Ostermann

Methodological impulses and statistical analysis procedures that can contribute to theory development and theory testing in nursing science
Albert Bruehl
Philosophisch-Theologische Hochschule Vallendar

Statistical analysis procedures can provide impulses for theory development in nursing science. The methods for this are the development and testing of hypotheses. Standard introductory textbooks on statistics in the social sciences define only hypothesis testing as the main task of statistics. Hypothesis development would be an additional task for the application of statistics, one that can become particularly important for subject matter of the kind treated in nursing science (Brühl, Fried, 2020).

In many research questions within nursing science, we are dealing with attempts to model empirical phenomena by means of constructs. Examples are the models for the constructs "need for care" (Pflegebedürftigkeit) and "quality of care" (Pflegequalität). These models are neither theoretically grounded nor empirically supported.

When regressions with classical null hypothesis (H0) tests are used for data analysis in the area of constructs such as need for care and quality of care, we learn that the constructs currently in use for need for care and quality of care are of little empirical help. This holds for multivariate regressions that explain working times poorly on the basis of care grade criteria (Rothgang, 2020); it holds for non-parametric regressions, multivariate regression splines and multilevel models that do not explain working time well with resident and organizational variables (Brühl, Planer 2019); and it also holds for logistic regressions (Görres et al., 2017) and logistic multilevel analyses (Brühl, Planer, 2019) that do not explain quality indicators well. Despite the modest success of the statistical analyses, application routines, e.g. for staffing assessment and for measuring quality of care, are usually established on this basis anyway.

This way of using statistics yields hardly any starting points for further developing the constructs in use. Structuring procedures are better suited to this purpose. One example is the use of different variants of ordinal multidimensional scaling (Borg, 2018), which help in further developing the construct of need for care (Teigeler, 2017) and in capturing process quality (Brühl et al, 2021). Another procedure that can help here is multiple correspondence analysis (Greenacre, 2017), which can also be used with small sample sizes and with nominal data. For theory testing, confirmatory variants of the structuring procedures can be used. Examples are presented in the talk.


Borg, I., Groenen, P. J., & Mair, P. (2018). Applied multidimensional scaling and unfolding (2nd ed.). Springer-Verlag.

Brühl, A., Planer, K. (2019): PiBaWü – Zur Interaktion von Pflegebedürftigkeit, Pflegequalität und Personalbedarf. Freiburg: Lambertus

Brühl, A. (2020): Anwendung von statistischen Analyseverfahren, die die Entwicklung von Theorien in der Pflegewissenschaft fördern, S. 7–37. In: Brühl, A., Fried, K. (Hrsg.) (2020): Innovative Statistik in der Pflegeforschung. Freiburg: Lambertus

Brühl, A., Sappok-Laue, H., Lau, S., Christ-Kobiela, P., Müller, J., Sesterhenn-Ochtendung, B., Stürmer-Korff, R., Stelzig, A., Lobb, M., Bleidt, W. (2021): Indicating Care Process Quality: A Multidimensional Scaling Analysis. Journal of Nursing Measurement, Volume 30, Number 2, 2021 (Advance online publication)

Greenacre, M. (2017). Correspondence Analysis in Practice (Third Edition). Chapman & Hall / CRC Interdisciplinary Statistics. Boca Raton: CRC Press Taylor and Francis Group.

Görres, Stefan; Rothgang, Heinz (2017): Modellhafte Pilotierung von Indikatoren in der stationären Pflege (MoPIP). Abschlussbericht zum Forschungsprojekt. (SV14-9015). With contributions from Sophie Horstmann, Maren Riemann, Julia Bidmon, Susanne Stiefler, Sabrina Pohlmann, Mareike Würdemann et al. UBC-Zentrum für Alterns- und Pflegeforschung, UBC-Zentrum für Sozialpolitik. Bremen

Rothgang, H., Görres, S., Darmann-Finck, I., Wolf-Ostermann, K., Becke, G., Brannath, W. (2020): Zweiter Zwischenbericht. Available online at:, last checked 07.09.2020

Teigeler, Anna Maria (2017): Die multidimensionale Skalierung als grundlegendes Verfahren zur Explikation des Pflegebedürftigkeitsverständnisses von beruflich Pflegenden. Master's thesis, Philosophisch-Theologische Hochschule Vallendar. +11+17+V.pdf, last accessed 30.05.2019.

Health Services Research in Nursing Science – Challenges and Opportunities

Prof. Dr. Karin Wolf-Ostermann1

1Universität Bremen

Health services research in nursing science is, on the one hand, a commitment to nursing science as a scientific discipline and, on the other, a clear indication that a mandate to shape care on the basis of evidence follows from it. Drawing on current studies, the talk examines opportunities and challenges for health services research in nursing science, using dementia as an example. Study examples are used to look at three areas in particular:

  • the definition of outcome measures that are relevant, also from the perspective of those “affected”,
  • the question of the target groups of interventions,
  • the discussion of suitable study designs, not least in view of the challenges in evaluating “new technologies” in care.

Here there is a growing need for future research on methodological challenges. In addition, there must be more discussion than before of how the politically, ethically and legally grounded aim of participation can be taken into account. And, not least, there must be more intensive discussion of how the dissemination of research findings and the implementation of (evidence-based) interventions can succeed better.

Statistical challenges in Nursing Science – a practical example
Maja von Cube1, Martin Wolkewitz1, Christiane Kugler2
1Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg; 2Medizinische Fakultät der Albert-Ludwigs-Universität Freiburg, Institut für Pflegewissenschaft, Klinisch-Theoretisches Institut des Universitätsklinikums

We give a practical example of a study in Nursing Science. The goal of our study is to investigate whether pets increase the quality of life of patients who have undergone organ transplantation. As these patients are considered to be at higher risk of acquiring infections, clinical practice places restrictions on keeping pets after a transplantation. Nonetheless, pets are presumed to facilitate a healthy lifestyle and thus to have a positive impact on human health and well-being.

In this practical example, we use data from an observational longitudinal follow-up study (n=533) in which clinical parameters, including the acquisition of infections, as well as quality of life measurements were assessed. The latter were measured at seven time points with the Hospital Anxiety and Depression Scale (HADS) and the Short Form health survey (SF-36), a patient-reported questionnaire with 36 items. Additionally, information on pets is available for a non-random cross-sectional subsample (n=226).

By combining information from the two datasets, we study whether pets increase the quality of life after an organ transplantation using a linear regression model. Moreover, we use time-to-event analysis to estimate the effect of pets on the time to first infection.

This study poses numerous statistical challenges, including the study design, confounding, multiple testing, missing values, competing risks and significant differences in survival between the baseline cohort and the cross-sectional sample. We use this study as a practical example to show how statistical considerations can help to minimize the risk of typical biases arising in clinical epidemiology. Yet, rather than proposing sophisticated statistical approaches, we discuss pragmatic solutions.
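To illustrate one of the challenges named above, competing risks in a time-to-first-infection analysis, the following is a minimal sketch of the Aalen-Johansen cumulative incidence estimator. It is not the authors' analysis: the function name, the event coding (0 = censored, 1 = infection, 2 = competing event such as death) and the toy data are assumptions made purely for illustration.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen estimate of the cumulative incidence of `cause`.

    times  : follow-up time per patient
    events : 0 = censored, any other value = event type (hypothetical coding)
    Returns a list of (time, cumulative incidence) pairs, one per event time.
    """
    # Count how many patients have each event type at each distinct time.
    by_time = {}
    for t, e in zip(times, events):
        by_time.setdefault(t, Counter())[e] += 1

    n_at_risk = len(times)
    event_free = 1.0  # probability of no event of any type before time t
    cif = 0.0
    curve = []
    for t in sorted(by_time):
        counts = by_time[t]
        d_cause = counts[cause]
        d_any = sum(c for e, c in counts.items() if e != 0)
        if d_any > 0:
            # Increment: P(event-free so far) * cause-specific hazard at t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1.0 - d_any / n_at_risk
            curve.append((t, cif))
        # Everyone observed at t (events and censorings) leaves the risk set.
        n_at_risk -= sum(counts.values())
    return curve


# Toy data (invented): 1 = infection, 2 = competing event, 0 = censored.
times = [1, 2, 3, 4, 5]
events = [1, 2, 0, 1, 0]

aj = cumulative_incidence(times, events)

# Naive approach: treat competing events as censored (equals 1 - Kaplan-Meier).
naive = cumulative_incidence(times, [e if e == 1 else 0 for e in events])

print("Aalen-Johansen:", aj[-1])
print("Naive 1 - KM:  ", naive[-1])
```

On this toy data the final Aalen-Johansen estimate is about 0.5, whereas treating the competing event as censoring gives about 0.6, which shows the upward bias of the naive approach.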

Staff Structure and Outcome in Residential Long-Term Care –
Methods and Limitations of a Statistical Analysis of Longitudinal Routine Data

Werner Brannath1 and Pascal Rink1
1Institut für Statistik und KKSB, Fachbereich Mathematik und Informatik, Universität Bremen

In view of demographic change and the simultaneous shortage of professional nursing staff, the question of the required size and adequate structure of the nursing staff in a residential long-term care facility is gaining societal and political importance. Alongside a study commissioned by the legislator to develop a uniform procedure for assessing staffing requirements, this question was pursued in an observational study based on longitudinal routine data (StaVaCare 2.0). The aim of the latter was to gain insight into the complex relationship between the resident structure (case-mix) and the qualification structure and deployment of the nursing staff (care-mix) of a facility with regard to its care outcome. The restriction to routine data naturally led to limitations and complications regarding data quality and data density, the statistical analyses and their interpretation. On the other hand, it made a complete survey within the facilities possible. Moreover, there is as yet no generally accepted procedure for recording and assessing care outcome. This talk describes and discusses the statistical approaches chosen in StaVaCare 2.0 to resolve or mitigate these difficulties.

Görres S, Brannath W, Böttcher S, Schulte K, Arndt G, Bendig J, Rink P, Günay S (2020). Stabilität und Variation des Care-Mix in Pflegeheimen unter Berücksichtigung von Case-Mix, Outcome und Organisationscharakteristika (StaVaCare 2.0). Abschlussbericht des Modellvorhabens mit Anhang.

Opening Session / Keynote: Machine Learning in Biometry

Chairs: Werner Brannath and Katja Ickstadt

Speakers: Andreas Faldum (Conference president), Frank Müller (Dean of the Medical Faculty), Werner Brannath (President of the IBS-DR), Markus Lewe (Mayor, City of Münster) || Keynote speaker: Chris Holmes

Title: Machine Learning in Biometrics
Chris Holmes

Machine learning (ML) and artificial intelligence (AI) have had a major impact across many disciplines including biometrics. In the first half of this talk we will review some of the characteristics of ML that make for successful applications and also those features that present challenges, in particular around robustness and reproducibility. Relatively speaking, ML is mainly concerned with prediction while the majority of biometric analyses are focussed on inference. In the second half of the talk we will review the prediction-inference dichotomy and explore, from a Bayesian perspective, the theoretical foundations on how modern ML predictive models can be utilised for inference.