## Track: Track 4

### Keynote: Estimands and Causality / Closing Session

Chairs: Werner Brannath and Annette Kopp-Schneider

**Semiparametric Sensitivity Analysis: Unmeasured Confounding in Observational Studies**

Daniel Scharfstein*Department of Population Health Sciences, University of Utah School of Medicine, USA*

Establishing cause-effect relationships from observational data often relies on untestable assumptions. It is crucial to know whether, and to what extent, the conclusions drawn from non-experimental studies are robust to potential unmeasured confounding. In this paper, we focus on the average causal effect (ACE) as our target of inference. We build on the work of Franks et al. (2019) and Robins et al. (2000) by specifying non-identified sensitivity parameters that govern a contrast between the conditional (on measured covariates) distributions of the outcome under treatment (control) between treated and untreated individuals. We use semi-parametric theory to derive the non-parametric efficient influence function of the ACE, for fixed sensitivity parameters. We utilize this influence function to construct a one-step, split-sample bias-corrected estimator of the ACE. Our estimator depends on semi-parametric models for the distribution of the observed data; importantly, these models do not impose any restrictions on the values of sensitivity analysis parameters. We establish that our estimator has $\sqrt{n}$ asymptotics. We utilize our methodology to evaluate the causal effect of smoking during pregnancy on birth weight. We also evaluate the performance of estimation procedure in a simulation study. This is joint work with Razieh Nabi, Edward Kennedy, Ming-Yueh Huang, Matteo Bonvini and Marcela Smid.

Closing: Andreas Faldum, Werner Brannath / Annette Kopp-Schneider

### Mathematical Methods in Medicine and Biology

Chairs: Ingmar Glauche and Matthias Horn

**Future Prevalence of Type 2 Diabetes – A Comparative Analysis of Chronic Disease Projection Methods**

Dina Voeltz^{1}, Thaddäus Tönnies^{2}, Ralph Brinks^{1,2,3}, Annika Hoyer^{1}^{1}Ludwig-Maximilians-Universität München, Germany; ^{2}Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich-Heine-University Duesseldorf; ^{3}Hiller Research Unit for Rheumatology Duesseldorf

Background: Precise projections of future chronic disease cases needing pharmaco-intensive treatments are necessary for effective resource allocation and health care planning in response to increasing disease burden.

Aim: To compare different projection methods to estimate the number of people diagnosed with type 2 diabetes (T2D) in Germany in 2040.

Methods: We compare the results of three methods to project the number of people with T2D in Germany 2040. In a relatively simple approach, method 1) combines the sex- and age-specific prevalence of T2D in 2015 with sex- and age-specific population distributions projected by the German Federal Statistical Office (FSO). Methods 2) and 3) additionally account for incidence of T2D and mortality rates using mathematical relations as proposed by the illness-death model for chronic diseases [1]. Therefore, they are more comprehensive than method 1), which likely adds to their results’ validity and accuracy. For this purpose, method 2) firstly models the prevalence of T2D employing a partial differential equation (PDE) which incorporates incidence and mortality [2]. This flexible, yet simple PDE used yields is validated in contexts of dementia, amongst others, and is recommended for chronic disease epidemiology. Subsequently, the estimated prevalence is multiplied with the population projection of the FSO [3]. Hence, method 2) uses the projected general mortality of the FSO and the mortality rate ratio of diseased vs. non-diseased people. By contrast, method 3) estimates future mortality of non-diseased and diseased people independently from the projection of the FSO. These estimated future mortality rates function as input for two PDEs to directly project the absolute number of cases. The sex- and age-specific incidence rate for methods 2) and 3) stems from the risk structure compensation (Risikostrukturausgleich, MorbiRSA) which comprises data from about 70 million Germans in the public health insurance. The incidence rate is assumed to remain as in 2015 throughout the overall projection horizon from 2015 to 2040.

Results: Method 1) projects 8.3 million people with diagnosed T2D in Germany in 2040. Compared to 6.9 million people in 2015, this equals an increase by 21%. Methods 2) and 3) project 11.5 million (+65% compared to 2015) and 12.5 million (+85%) T2D patients, respectively.

Conclusions: The methods’ results differ substantially. Method 1) accounts for the aging of the German population but is otherwise relatively little comprehensive. Method 2) and 3) additionally consider underlying changes in the incidence and mortality rates affecting disease prevalence.

**Mixed-effects ANCOVA for estimating the difference in population mean parameters in case of nonlinearly related data**

Ricarda Graf*University of Göttingen, Germany*

Repeated measures data can be found in many fields. The two types of variation characteristic for this type of data – referred to as within-subject and between-subject variation – are accounted for by linear and nonlinear mixed-effects models. ANOVA-type models are sometimes applied for comparison of population means despite a nonlinear relationship in the data. Accurate parameter estimation through more appropriate nonlinear-mixed effects (NLME) models, such as for sigmoidal curves, might be hampered due to insufficient data near the asymptotes, the choice of starting values for the iterative optimization algorithms used given the lack of closed-form expressions of the likelihood or due to convergence problems of these algorithms.

The main objective of this thesis is to compare the performance of a one-way mixed-effects ANCOVA and a NLME three-parameter logistic regression model with respect to the accuracy in estimating the difference in population means. Data from a clinical trial1, in which the difference in mean blood pressure (BP50) between two groups was estimated by repeated-measures ANOVA, served as a reference for data simulation. A third simplifying method, used in toxicity studies², was additionally included. It considers the two measurements per subject lying immediately below and above mean half maximal response (E_max). Population means are obtained by considering the intersections of the horizontal line represented by half E_max and the line derived from connecting the two data points per subject and group. A simulation study with two scenarios was conducted to compare bias, coverage rates and empirical SE of the three methods when estimating the difference in BP50 for purpose of identification of the disadvantages by using the simpler linear instead of the nonlinear model. In the first scenario, the true individual blood pressure ranges were considered, while in the second scenario, measurements at characteristic points of the sigmoidal curves were considered, regardless of the true measurement ranges, in order to obtain a more distinct nonlinear relationship.

The estimates of the mixed-effects ANCOVA model were more biased but also more precise compared with the NLME model. The ANCOVA method could not detect the difference in BP50 in the second scenario anymore. The results of the third method did not seem reliable since its estimates did on average even reverse the direction of the true parameter.

NLME models should be preferred for data with a known nonlinear relationship if the available data allows it. Convergence problems can be overcome by using a Bayesian approach.

**Explained Variation in the Linear Mixed Model**

Nicholas Schreck*DKFZ Heidelberg, Germany*

The coefficient of determination is a standard characteristic in linear models with quantitative response variables. It is widely used to assess the proportion of variation explained, to determine the goodness-of-fit and to compare models with different covariates.

However, there has not been an agreement on a similar quantity for the class of linear mixed models yet.

We introduce a natural extension of the well-known adjusted coefficient of determination in linear models to the variance components form of the linear mixed model.

This extension is dimensionless, has an intuitive and simple definition in terms of variance explained, is additive for several random effects and reduces to the adjusted coefficient of determination in the linear model.

To this end, we prove a full decomposition of the sum of squares of the independent variable into the explained and residual variance.

Based on the restricted maximum likelihood equations, we introduce a novel measure for the explained variation which we allocate specifically to the contribution of the fixed and the random covariates of the model.

We illustrate that this empirical explained variation can in particular be used as an improved estimator of the classical additive genetic variance of continuous complex traits.

**Modelling acute myeloid leukemia: Closing the gap between model parameters and individual clinical patient data**

Dennis Görlich*Institute of Biostatistics and Clinical Research, University Münster, Germany*

In this contribution, we will illustrate and discuss our approach to fit a mechanistic mathematical model of acute myeloid leukemia (AML) to individual patient data, leading to personalized model parameter estimates.

We use a previously published model (Banck and Görlich, 2019) that describes the healthy hematopoiesis and the leukemia dynamics. Here, we consider a situation where the healthy hematopoiesis is calibrated to a population average and personalized leukemia parameters (self renewal, proliferation, and treatment intensity) needs to be estimated.

To link the mathematical model to clinical data model predictions needs to be aligned to observable clinical outcome measures. In AML research, blast load, complete remission, and survival are typically considered. Based on the model’s properties, especially the capability to predict the considered outcomes, blast load turned out to be well suited for the model fitting process.

We formulated an optimization problem to estimate personalized model parameters based on the comparison between observed and predicted blast load (cf. Görlich, 2021).

A grid search was performed to evaluate the fitness landscape of the optimization problem. The grid search approach showed that, depending on the patient’s individual blast course, noisy fitness landscapes can occur. In these cases, a gradient-descent algorithm will usually perform poorly. This problem can be overcome by application of e.g. the differential evolution algorithm (Price et al., 2006). The estimated personalized leukemia parameters can be further correlated to observed clinical data. A preliminary analysis showed promising results.

Finally, the application of mechanistic mathematical models in combination with personalized model fitting seems to be a promising approach within clinical research.

References

Dennis Görlich (accepted). Fitting Personalized Mechanistic Mathematical Models of Acute Myeloid Leukaemia to Clinical Patient Data. Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, Volume 3: BIOINFORMATICS 2021

Jan C. Banck and Dennis Görlich (2019). In-silico comparison of two induction regimens (7 + 3 vs 7 + 3 plus additional bone marrow evaluation) in acute myeloid leukemia treatment. BMC Systems Biology, 13(1):18.

Kenneth V. Price, Rainer M. Storn and Jouni A. Lampinen (2006). Differential Evolution – A Practical Approach to Global Optimization. Berlin Heidelberg: Springer-Verlag.

**Effect of missing values in multi-environmental trials on variance component estimates**

Jens Hartung, Hans-Peter Piepho*University of Hohenheim, Germany*

A common task in the analysis of multi-environmental trials (MET) by linear mixed models (LMM) is the estimation of variance components (VCs). Most often, MET data are imbalanced, e.g., due to selection. The imbalance mechanism can be missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). If the missing-data pattern in MET is not MNAR, likelihood-based methods are the preferred methods for analysis as they can account for selection. Likelihood-based methods used to estimate VCs in LMM have the property that all VC estimates are constrained to be non-negative and thus the estimators are generally biased. Therefore, there are two potential causes of bias in MET analysis: a MNAR data pattern and the small-sample properties of likelihood-based estimators. The current study tries to distinguish between both possible sources of bias. A simulation study with MET data typical for cultivar evaluation trials was conducted. The missing data pattern and size of VCs was varied. The results showed that for the simulated MET, VC estimates from likelihood-based methods are mainly biased due to the small-sample properties of likelihood-based methods for a small ratio of genotype variance to error variance.

### Open Topics

Chairs: Andre Scherag and Reinhard Vonthein

**Using Historical Data to Predict Health Outcomes – The Prediction Design**

Stella Erdmann, Manuel Feißt, Johannes Krisam, Meinhard Kieser*Institute of Medical Biometry and Informatics, University of Heidelberg, Germany*

The gold standard for the investigation of the efficacy of a new therapy is a randomized controlled trial (RCT). This is costly, time consuming and not always practicable (e.g. for lethal conditions with limited treatment possibilities) or even possible in a reasonable time frame (e.g. in rare diseases due to small sample sizes). At the same time, huge quantities of available control-condition data in analyzable format of former RCTs or real-world data (RWD), i.e., patient‐level data gathered outside the conventional clinical trial setting, are neglected if not often completely ignored. To overcome these shortcomings, alternative study designs using data more efficiently would be desirable.

Assuming that the standard therapy and its mode of functioning is well known and large volumes of patient data exist, it is possible to set up a sound prediction model to determine the treatment effect of this standard therapy for future patients. When a new therapy is intended to be tested against the standard therapy, the vision would be to conduct a single-arm trial and to use the prediction model to determine the effect of the standard therapy on the outcome of interest of patients receiving the test treatment only, instead of setting up a two-arm trial for this comparison. While the advantages of using historical data to estimate the counterfactual are obvious (increased efficiency, lower cost, alleviating participants’ fear of being on placebo), bias could be caused by confounding (e.g. by indication, severity, or prognosis) or a number of other data issues that could jeopardize the validity of the non-randomized comparison.

The aim is to investigate if and how such a design – the prediction design – may be used to provide information on treatment effects by leveraging existing infrastructure and data sources (historical data of RCTs and/or RWD). Therefore, we investigate under what assumptions a linear prediction model could be used to predict the counterfactual of patients precisely enough to construct a test for evaluating the treatment effect for normally distributed endpoints. In particular, it is investigated what amount of data is necessary (for the historical data and for the single arm trial to be conducted). Via simulation studies, it is examined how sensible the design acts towards violations of the assumptions. The results are compared to reasonable (conventional) benchmark scenarios, e.g., the setting of a single-arm study with pre-defined threshold or a setting, where a propensity score matching was performed.

**Arguments for exhuming nonnegative garrote out of grave**

Edwin Kipruto, Willi Sauerbrei*Medical Center-University of Freiburg, Germany*

Background: The original nonnegative garrote (Breiman 1995) seems to have been forgotten despite some of its good conceptual properties. Its unpopularity is probably caused by dependence on least square estimates which does not have solution in high dimensional data and performs very poorly in high degree of multicollinearity. However, Yuan and Lin (2007) showed that nonnegative garrote is a flexible approach that can be used in combination with other estimators besides least squares such as ridge hence the aforementioned challenges can be circumvented; despite this proposal, it is hardly used in practice. Considerable attention has been given to prediction models compared to descriptive models where the aim is to summarize the data structure in a compact manner (Shmueli, 2010). Here our main interest is on descriptive modeling and as a byproduct we will present results of prediction.

Objectives: To evaluate the performance of nonnegative garrote and compare results with some popular approaches using three different real datasets with low to high degree of multicollinearity and in high dimensional data

Methods: We evaluated four penalized regression methods (Nonnegative garrote, lasso, adaptive lasso, relaxed lasso) and two classical variable selection methods (best subset, backward elimination) with and without post-estimation shrinkage.

Results: Nonnegative garrote can be used with other initial estimators besides least squares in highly correlated data and in high dimensional datasets. Negligible differences in predictions were observed in methods while considerable differences were observed in the number of variables selected.

Conclusion: To fit nonnegative garrote in highly correlated data and in high dimensional settings the proposed initial estimates can be used as an alternative to least squares estimates.

**On the assessment of methods to identify influential points in high-dimensional data**

Shuo Wang, Edwin Kipruto, Willi Sauerbrei

Medical Center – University of Freiburg, Germany

Extreme values and influential points in predictors often strongly affect the results of statistical analyses in low and high-dimensional settings. Many methods to detect such values have been proposed but there is no consensus on advantages and disadvantages as well as guidance for practice. We will present various classes of methods and illustrate their use in several high-dimensional data. First, we consider a simple pre-transformation which is combined with feature ranking lists to identify influential points, concentrating on univariable situations (Boulesteix and Sauerbrei, 2011, DOI: 10.1002/bimj.201000189). The procedure will be extended by checking for influential points in bivariate models and by adding some steps to the multivariable approach.

Second, to increase stability of feature ranking lists, we will use various aggregation approaches to explore for extreme values in features and influential observations. The former incurs the rank changes of a specific feature, while the latter causes a universal ranking change. For the detection of extreme values, we employ the simple pretransformation on data and detect the features whose ranks significantly changed after the transformation. For the detection of influential observations, we consider a combination of leave-one-out and rank comparison to detect the observations causing large rank changes. These methods are applied in several publicly available datasets.

**Acceleration of diagnostic research: Is there a potential for seamless designs?**

Werner Vach^{1}, Eric Bibiza-Freiwald^{2}, Oke Gerke^{3}, Tim Friede^{4}, Patrick Bossuyt^{5}, Antonia Zapf^{2}^{1}Basel Academy for Quality and Research in Medicine, Switzerland; ^{2}Institute of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf; ^{3}Department of Nuclear Medicine, Odense University Hospital; ^{4}Department of Medical Statistics, University Medical Center Goettingen; ^{5}Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam University Medical Centers

Background: New diagnostic tests to identify a well-established disease state have to undergo a series of scientific studies from test construction until finally demonstrating a societal impact. Traditionally, these studies are performed with substantial time gaps in between. Seamless designs allow us to combine a sequence of studies in one protocol and may hence accelerate this process.

Aim: A systematic investigation of the potential of seamless designs in diagnostic research.

Methods: We summarized the major study types in diagnostic research and identified their basic characteristics with respect to applying seamless designs. This information was used to identify major hurdles and opportunities for seamless designs.

Results: 11 major study types were identified. The following basic characteristics were identified: type of recruitment (case-control vs population-based), application of a reference standard, inclusion of a comparator, paired or unpaired application of a comparator, assessment of patient relevant outcomes, possibility for blinding of test results.

Two basic hurdles could be identified: 1) Accuracy studies are hard to combine with post-accuracy studies, as the first are required to justify the latter and as application of a reference test in outcome studies is a threat to the study’s integrity. 2) Questions, which can be clarified by other study designs, should be clarified before performing a randomized diagnostic study.

However, there is a substantial potential for seamless designs since all steps from the construction until the comparison with the current standard can be combined in one protocol. This may include a switch from case-control to population-based recruitment as well as a switch from a single arm study to a comparative accuracy study. In addition, change in management studies can be combined with an outcome study in discordant pairs. Examples from the literature illustrate the feasibility of both approaches.

Conclusions: There is a potential for seamless designs in diagnostic research.

Reference: Vach W, Bibiza E, Gerke O, Bossuyt PM, Friede T, Zapf A (2021). A potential for seamless designs in diagnostic research could be identified. J Clin Epidemiol. 29:51-59. doi: 10.1016/j.jclinepi.2020.09.019.

**The augmented binary method for composite endpoints based on forced vital capacity (FVC) in systemic sclerosis-associated interstitial lung disease**

Carolyn Cook^{1}, Michael Kreuter^{2}, Susanne Stowasser^{3}, Christian Stock^{4}^{1}mainanalytics GmbH, Sulzbach, Germany; ^{2}Center for Interstitial and Rare Lung Diseases, Pneumology and Respiratory Care Medicine, Thoraxklinik, University of Heidelberg, Heidelberg, Germany and German Center for Lung Research, Heidelberg, Germany; ^{3}Boehringer Ingelheim International GmbH, Ingelheim am Rhein, Germany; ^{4}Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim am Rhein, Germany

Background

The augmented binary method (Wason & Seaman. Stat Med, 2013; 32(26)) is a novel method for precisely estimating response rates and differences among response rates defined based on a composite endpoint that contains a dichotomized continuous variable and additional inherently binary components. The method is an alternative to traditional approaches such as logistic regression techniques. Due to the complexity and computational demands of the method, experience in clinical studies has been limited thus far and is mainly restricted to oncological studies. Operating characteristics and, thus, potential statistical benefits are unclear for other settings.

Objective

We aimed to perform a Monte Carlo simulation study to assess operating characteristics of the augmented binary method in settings relevant to randomized controlled trials and non-interventional studies in systemic sclerosis-associated interstitial lung disease (SSc-ILD), a rare, chronic autoimmune disease, where composite endpoints of the above described type are frequently applied.

Methods

An extensive simulation study was performed assessing type I error, power, coverage, and bias of the augmented binary method and a standard logistic model for the composite endpoint. Parameters were varied to resemble lung function decline (as measured through the forced vital capacity, FVC), hospitalization events and mortality in patients with SSc-ILD over a 1- and 2-year period. A relative treatment effect of 50% on FVC was assumed, while assumed effects on hospitalizations and mortality were derived from joined modeling analyses of existing trial data (as indirect effects of the treatment on FVC). Further, the methods were exemplarily applied to data from the SENSCIS trial, a phase III randomized, double-blind, placebo-controlled trial to investigate the efficacy and safety of nintedanib in patients with SSc-ILD.

Results

The simulation study is currently in progress and results will be available by the end of January. In preliminary results modest gains in power and precision were observed, with acceptable compromises of type I error, if any. In scenarios with lower statistical powers, these results were more likely to make a difference on inferences concerning the treatment effect. In the exemplary application of the augmented binary method to trial data confidence intervals and p-values on selected endpoints involving FVC decline, hospitalization and mortality were smaller.

Conclusion

Based on preliminary results from a simulation study, we identified areas where the augmented binary method conveys an appreciable advantage compared to standard methods.