Chairs: Ingmar Glauche and Matthias Horn

**Future Prevalence of Type 2 Diabetes – A Comparative Analysis of Chronic Disease Projection Methods**

Dina Voeltz^{1}, Thaddäus Tönnies^{2}, Ralph Brinks^{1,2,3}, Annika Hoyer^{1}

^{1}Ludwig-Maximilians-Universität München, Germany; ^{2}Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich-Heine-University Duesseldorf; ^{3}Hiller Research Unit for Rheumatology Duesseldorf

Background: Precise projections of future chronic disease cases needing pharmaco-intensive treatments are necessary for effective resource allocation and health care planning in response to increasing disease burden.

Aim: To compare different projection methods to estimate the number of people diagnosed with type 2 diabetes (T2D) in Germany in 2040.

Methods: We compare the results of three methods to project the number of people with T2D in Germany in 2040. In a relatively simple approach, method 1) combines the sex- and age-specific prevalence of T2D in 2015 with the sex- and age-specific population distributions projected by the German Federal Statistical Office (FSO). Methods 2) and 3) additionally account for the incidence of T2D and for mortality rates, using mathematical relations from the illness-death model for chronic diseases [1]. They are therefore more comprehensive than method 1), which likely adds to the validity and accuracy of their results. Method 2) first models the prevalence of T2D with a partial differential equation (PDE) that incorporates incidence and mortality [2]. This flexible yet simple PDE has been validated in the context of dementia, amongst others, and is recommended for chronic disease epidemiology. Subsequently, the estimated prevalence is multiplied by the population projection of the FSO [3]. Hence, method 2) uses the general mortality projected by the FSO and the mortality rate ratio of diseased vs. non-diseased people. By contrast, method 3) estimates the future mortality of non-diseased and diseased people independently of the FSO projection. These estimated future mortality rates serve as input for two PDEs that directly project the absolute number of cases. The sex- and age-specific incidence rate for methods 2) and 3) stems from the risk structure compensation (Risikostrukturausgleich, MorbiRSA) data, which comprise about 70 million Germans in the public health insurance. The incidence rate is assumed to remain at its 2015 level throughout the projection horizon from 2015 to 2040.
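The prevalence equation behind method 2) can be illustrated numerically. The following is a minimal sketch, assuming the commonly cited illness-death form (∂/∂t + ∂/∂a) p = (1 − p)[i − p(m1 − m0)], with m1 − m0 rewritten in terms of the general mortality m and the mortality rate ratio R. The function names, the Euler step size, and all rate values are hypothetical and are not taken from the study; in particular, the rates are held constant here, whereas the abstract uses sex- and age-specific inputs.

```python
def prevalence_derivative(p, incidence, general_mortality, rate_ratio):
    """Right-hand side of the assumed illness-death prevalence equation.

    The excess mortality m1 - m0 is expressed through general mortality m
    and the rate ratio R = m1 / m0, using m = p * m1 + (1 - p) * m0.
    """
    excess = general_mortality * (rate_ratio - 1.0) / (1.0 + p * (rate_ratio - 1.0))
    return (1.0 - p) * (incidence - p * excess)


def project_cohort_prevalence(p_start, years, incidence, general_mortality,
                              rate_ratio, steps_per_year=10):
    """Follow one birth cohort along its characteristic line (age and
    calendar time advance together) using explicit Euler steps."""
    p = p_start
    dt = 1.0 / steps_per_year
    for _ in range(years * steps_per_year):
        p += dt * prevalence_derivative(p, incidence, general_mortality, rate_ratio)
    return p


# Hypothetical inputs: 10% prevalence in 2015 and constant rates for 25 years.
p_2040 = project_cohort_prevalence(
    p_start=0.10, years=25, incidence=0.010,
    general_mortality=0.015, rate_ratio=1.8,
)
```

Multiplying such a projected prevalence by the projected population of the corresponding sex and age group would give the case number for that group, in the spirit of method 2).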

Results: Method 1) projects 8.3 million people with diagnosed T2D in Germany in 2040. Compared to 6.9 million people in 2015, this is an increase of 21%. Methods 2) and 3) project 11.5 million (+65% compared to 2015) and 12.5 million (+85%) T2D patients, respectively.

Conclusions: The methods’ results differ substantially. Method 1) accounts for the aging of the German population but is otherwise less comprehensive. Methods 2) and 3) additionally consider underlying changes in the incidence and mortality rates that affect disease prevalence.
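Why method 1) captures population aging but nothing else can be seen from a small sketch: the prevalence per stratum stays fixed at its base-year value, so only shifts in the population structure change the case count. The strata, prevalences, and population sizes below are invented for illustration and do not reproduce the FSO projection or the study data.

```python
def project_cases_fixed_prevalence(prevalence_by_group, population_by_group):
    """Method-1-style projection: base-year prevalence times projected
    population, summed over all sex and age strata."""
    return sum(prevalence_by_group[group] * population_by_group[group]
               for group in prevalence_by_group)


# Hypothetical strata keyed by (sex, age band); values are illustrative only.
prevalence_2015 = {
    ("f", "40-64"): 0.05, ("f", "65+"): 0.20,
    ("m", "40-64"): 0.07, ("m", "65+"): 0.24,
}
population_2015 = {
    ("f", "40-64"): 15.0e6, ("f", "65+"): 9.5e6,
    ("m", "40-64"): 15.0e6, ("m", "65+"): 7.5e6,
}
# An older population in the target year: fewer people aged 40-64, more aged 65+.
population_2040 = {
    ("f", "40-64"): 13.0e6, ("f", "65+"): 12.0e6,
    ("m", "40-64"): 13.0e6, ("m", "65+"): 10.5e6,
}

cases_2015 = project_cases_fixed_prevalence(prevalence_2015, population_2015)
cases_2040 = project_cases_fixed_prevalence(prevalence_2015, population_2040)
relative_increase = cases_2040 / cases_2015 - 1.0
```

Because the prevalence dictionary is identical in both calls, any difference between `cases_2015` and `cases_2040` is driven purely by the age structure.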

**Mixed-effects ANCOVA for estimating the difference in population mean parameters in case of nonlinearly related data**

Ricarda Graf

*University of Göttingen, Germany*

Repeated measures data can be found in many fields. The two types of variation characteristic of such data – referred to as within-subject and between-subject variation – are accounted for by linear and nonlinear mixed-effects models. ANOVA-type models are sometimes applied to compare population means despite a nonlinear relationship in the data. Accurate parameter estimation with more appropriate nonlinear mixed-effects (NLME) models, such as those for sigmoidal curves, might be hampered by insufficient data near the asymptotes, by the choice of starting values for the iterative optimization algorithms (needed because the likelihood has no closed-form expression), or by convergence problems of these algorithms.

The main objective of this thesis is to compare the performance of a one-way mixed-effects ANCOVA and an NLME three-parameter logistic regression model with respect to their accuracy in estimating the difference in population means. Data from a clinical trial¹, in which the difference in mean blood pressure (BP50) between two groups was estimated by repeated-measures ANOVA, served as a reference for data simulation. A third, simplifying method, used in toxicity studies², was additionally included. It considers the two measurements per subject lying immediately below and above the mean half-maximal response (half of E_max). Population means are obtained from the intersections of the horizontal line at half E_max with the line connecting the two data points per subject and group. A simulation study with two scenarios was conducted to compare the bias, coverage rates and empirical SE of the three methods when estimating the difference in BP50, in order to identify the disadvantages of using the simpler linear model instead of the nonlinear model. In the first scenario, the true individual blood pressure ranges were considered, while in the second scenario, measurements at characteristic points of the sigmoidal curves were considered, regardless of the true measurement ranges, in order to obtain a more distinct nonlinear relationship.
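The third, interpolation-based method can be sketched in a few lines. The code below is an illustration only: the three-parameter logistic parametrization (upper asymptote, midpoint, slope), the function names, and all numbers are assumptions and are not taken from the thesis or the cited trial.

```python
import math


def logistic3(x, e_max, x50, slope):
    """Assumed three-parameter logistic curve with upper asymptote e_max,
    midpoint x50 (where the response equals e_max / 2) and slope."""
    return e_max / (1.0 + math.exp(-slope * (x - x50)))


def half_max_crossing(point_below, point_above, half_e_max):
    """x-value at which the straight line through two (x, y) measurements
    intersects the horizontal line y = half_e_max."""
    (x0, y0), (x1, y1) = point_below, point_above
    if y0 == y1:
        raise ValueError("the two measurements must have different responses")
    return x0 + (half_e_max - y0) * (x1 - x0) / (y1 - y0)


# Hypothetical subject: E_max = 30, true midpoint at x = 100.
e_max, x50, slope = 30.0, 100.0, 0.1

# Measurements placed symmetrically around the midpoint ...
symmetric = half_max_crossing(
    (90.0, logistic3(90.0, e_max, x50, slope)),
    (110.0, logistic3(110.0, e_max, x50, slope)),
    e_max / 2.0,
)
# ... and asymmetrically, where the chord no longer crosses exactly at x50.
asymmetric = half_max_crossing(
    (95.0, logistic3(95.0, e_max, x50, slope)),
    (130.0, logistic3(130.0, e_max, x50, slope)),
    e_max / 2.0,
)
```

In this toy setting the interpolation recovers the midpoint only when the two measurements bracket it symmetrically; with uneven spacing the curvature of the sigmoid shifts the estimate.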

The estimates of the mixed-effects ANCOVA model were more biased but also more precise than those of the NLME model. In the second scenario, the ANCOVA method could no longer detect the difference in BP50. The results of the third method did not seem reliable, since its estimates on average even reversed the direction of the true parameter.

NLME models should be preferred for data with a known nonlinear relationship if the available data allows it. Convergence problems can be overcome by using a Bayesian approach.

**Explained Variation in the Linear Mixed Model**

Nicholas Schreck

*DKFZ Heidelberg, Germany*

The coefficient of determination is a standard characteristic in linear models with quantitative response variables. It is widely used to assess the proportion of variation explained, to determine the goodness-of-fit and to compare models with different covariates.

However, there is not yet agreement on a comparable quantity for the class of linear mixed models.

We introduce a natural extension of the well-known adjusted coefficient of determination in linear models to the variance components form of the linear mixed model.

This extension is dimensionless, has an intuitive and simple definition in terms of variance explained, is additive for several random effects and reduces to the adjusted coefficient of determination in the linear model.

To this end, we prove a full decomposition of the sum of squares of the dependent variable into the explained and residual variance.

Based on the restricted maximum likelihood equations, we introduce a novel measure for the explained variation which we allocate specifically to the contribution of the fixed and the random covariates of the model.

We illustrate that this empirical explained variation can in particular be used as an improved estimator of the classical additive genetic variance of continuous complex traits.
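For orientation, the classical adjusted coefficient of determination, to which the proposed measure is said to reduce in the linear model, can be computed as follows. This sketch covers only the ordinary linear-model quantity; it is not the authors' mixed-model extension, and the function name and example values are hypothetical.

```python
def adjusted_r_squared(y, fitted, n_covariates):
    """Classical adjusted R^2 for a linear model with an intercept.

    y            : observed responses
    fitted       : model predictions for the same observations
    n_covariates : number of covariates, not counting the intercept
    """
    n = len(y)
    mean_y = sum(y) / n
    ss_total = sum((value - mean_y) ** 2 for value in y)
    ss_residual = sum((value - fit) ** 2 for value, fit in zip(y, fitted))
    r_squared = 1.0 - ss_residual / ss_total
    # Penalize for the degrees of freedom used by the covariates.
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_covariates - 1)


# Hypothetical responses and fitted values from a model with one covariate.
example = adjusted_r_squared([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5], 1)
```

The unadjusted proportion of explained variation in this example is 0.8; the adjustment lowers it because one degree of freedom is spent on the covariate.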

**Modelling acute myeloid leukemia: Closing the gap between model parameters and individual clinical patient data**

Dennis Görlich

*Institute of Biostatistics and Clinical Research, University Münster, Germany*

In this contribution, we will illustrate and discuss our approach to fit a mechanistic mathematical model of acute myeloid leukemia (AML) to individual patient data, leading to personalized model parameter estimates.

We use a previously published model (Banck and Görlich, 2019) that describes healthy hematopoiesis and leukemia dynamics. Here, we consider a situation in which healthy hematopoiesis is calibrated to a population average and the personalized leukemia parameters (self-renewal, proliferation, and treatment intensity) need to be estimated.

To link the mathematical model to clinical data, model predictions need to be aligned with observable clinical outcome measures. In AML research, blast load, complete remission, and survival are typically considered. Based on the model’s properties, especially its capability to predict the considered outcomes, blast load turned out to be well suited for the model fitting process.

We formulated an optimization problem to estimate personalized model parameters based on the comparison between observed and predicted blast load (cf. Görlich, 2021).

A grid search was performed to evaluate the fitness landscape of the optimization problem. It showed that, depending on the patient’s individual blast course, noisy fitness landscapes can occur. In these cases, a gradient-descent algorithm will usually perform poorly. This problem can be overcome by applying, for example, the differential evolution algorithm (Price et al., 2006). The estimated personalized leukemia parameters can be further correlated with observed clinical data. A preliminary analysis showed promising results.
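To illustrate why a population-based optimizer can cope with a noisy fitness landscape, the following is a minimal, self-contained sketch of a differential evolution loop (the DE/rand/1/bin variant). It does not use the AML model: the exponential toy curve, the artificial ripple term, the parameter bounds, and all names are assumptions made purely for illustration.

```python
import math
import random


def differential_evolution(objective, bounds, pop_size=20, weight=0.7,
                           crossover=0.9, generations=200, seed=0):
    """Minimal DE/rand/1/bin minimizer; returns (best_parameters, best_score)."""
    rng = random.Random(seed)
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    scores = [objective(member) for member in population]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members, all different from the current one.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate is mutated
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover:
                    value = population[a][d] + weight * (
                        population[b][d] - population[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = population[i][d]
                trial.append(value)
            trial_score = objective(trial)
            if trial_score <= scores[i]:  # greedy one-to-one replacement
                population[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best]


def toy_blast_curve(t, start_load, decay_rate):
    """Hypothetical stand-in for a model-predicted blast course."""
    return start_load * math.exp(-decay_rate * t)


TIMES = [0, 1, 2, 3, 4, 5]
OBSERVED = [toy_blast_curve(t, 80.0, 0.3) for t in TIMES]


def rugged_fitness(params):
    """Squared error plus an artificial ripple that creates local minima."""
    start_load, decay_rate = params
    sse = sum((toy_blast_curve(t, start_load, decay_rate) - obs) ** 2
              for t, obs in zip(TIMES, OBSERVED))
    ripple = 200.0 * (1.0 - math.cos(2.0 * math.pi * (decay_rate - 0.3) / 0.05))
    return sse + ripple


best_params, best_score = differential_evolution(
    rugged_fitness, bounds=[(10.0, 100.0), (0.0, 1.0)])
```

The optimizer only compares objective values and never evaluates a gradient, which is the property that makes this family of methods attractive when the landscape is rugged.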

Finally, the application of mechanistic mathematical models in combination with personalized model fitting seems to be a promising approach within clinical research.

References

Dennis Görlich (accepted). Fitting Personalized Mechanistic Mathematical Models of Acute Myeloid Leukaemia to Clinical Patient Data. Proceedings of the 14th International Joint Conference on Biomedical Engineering Systems and Technologies, Volume 3: BIOINFORMATICS 2021

Jan C. Banck and Dennis Görlich (2019). In-silico comparison of two induction regimens (7 + 3 vs 7 + 3 plus additional bone marrow evaluation) in acute myeloid leukemia treatment. BMC Systems Biology, 13(1):18.

Kenneth V. Price, Rainer M. Storn and Jouni A. Lampinen (2006). Differential Evolution – A Practical Approach to Global Optimization. Berlin Heidelberg: Springer-Verlag.

**Effect of missing values in multi-environmental trials on variance component estimates**

Jens Hartung, Hans-Peter Piepho

*University of Hohenheim, Germany*

A common task in the analysis of multi-environmental trials (MET) by linear mixed models (LMM) is the estimation of variance components (VCs). Most often, MET data are imbalanced, e.g., due to selection. The imbalance mechanism can be missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). If the missing-data pattern in MET is not MNAR, likelihood-based methods are preferred for the analysis, as they can account for selection. Likelihood-based methods used to estimate VCs in LMM constrain all VC estimates to be non-negative, and thus the estimators are generally biased. Therefore, there are two potential causes of bias in MET analysis: an MNAR data pattern and the small-sample properties of likelihood-based estimators. The current study tries to distinguish between these two possible sources of bias. A simulation study with MET data typical of cultivar evaluation trials was conducted. The missing-data pattern and the size of the VCs were varied. The results showed that, for the simulated MET, VC estimates from likelihood-based methods are biased mainly because of the small-sample properties of these methods when the ratio of genotype variance to error variance is small.
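The bias induced by a non-negativity constraint can be demonstrated with a small simulation. The sketch below is a deliberate simplification and not the study's design: it uses a balanced one-way layout without missing data, and it uses the ANOVA estimator truncated at zero as a stand-in for a constrained likelihood-based estimate (treating the two as interchangeable for balanced data is an assumption here). All names and design values are hypothetical.

```python
import random


def genotype_variance_estimate(n_genotypes, n_reps, var_genotype, var_error, rng):
    """Simulate one balanced one-way trial and return the unconstrained
    ANOVA estimate of the genotype variance (which may be negative)."""
    data = []
    for _ in range(n_genotypes):
        effect = rng.gauss(0.0, var_genotype ** 0.5)
        data.append([effect + rng.gauss(0.0, var_error ** 0.5)
                     for _ in range(n_reps)])
    means = [sum(row) / n_reps for row in data]
    grand_mean = sum(means) / n_genotypes
    ms_between = n_reps * sum((m - grand_mean) ** 2 for m in means) / (n_genotypes - 1)
    ms_within = sum((value - m) ** 2
                    for row, m in zip(data, means)
                    for value in row) / (n_genotypes * (n_reps - 1))
    return (ms_between - ms_within) / n_reps


def mean_estimates(n_genotypes, n_reps, var_genotype, var_error,
                   n_sim=2000, seed=1):
    """Average the unconstrained and the zero-truncated estimates over
    n_sim simulated trials."""
    rng = random.Random(seed)
    raw = [genotype_variance_estimate(n_genotypes, n_reps,
                                      var_genotype, var_error, rng)
           for _ in range(n_sim)]
    truncated = [max(0.0, value) for value in raw]
    return sum(raw) / n_sim, sum(truncated) / n_sim


# Small ratio of genotype variance to error variance (0.05 : 1).
raw_mean, truncated_mean = mean_estimates(
    n_genotypes=10, n_reps=2, var_genotype=0.05, var_error=1.0)
```

With such a small variance ratio, many unconstrained estimates fall below zero; setting them to zero pulls the average of the constrained estimates above the true genotype variance, while the unconstrained average stays close to it.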