Open Topics

Chairs: Andre Scherag and Reinhard Vonthein

Using Historical Data to Predict Health Outcomes – The Prediction Design
Stella Erdmann, Manuel Feißt, Johannes Krisam, Meinhard Kieser
Institute of Medical Biometry and Informatics, University of Heidelberg, Germany

The gold standard for investigating the efficacy of a new therapy is the randomized controlled trial (RCT). RCTs are costly and time-consuming, and they are not always practicable (e.g., for lethal conditions with limited treatment possibilities) or even feasible within a reasonable time frame (e.g., in rare diseases, where only small sample sizes can be achieved). At the same time, huge quantities of control-condition data that are available in analyzable format from former RCTs or as real-world data (RWD), i.e., patient-level data gathered outside the conventional clinical trial setting, are neglected or even completely ignored. To overcome these shortcomings, alternative study designs that use data more efficiently would be desirable.

Assuming that the standard therapy and its mode of action are well known and that large volumes of patient data exist, it is possible to set up a sound prediction model to determine the treatment effect of this standard therapy for future patients. When a new therapy is to be tested against the standard therapy, the vision would be to conduct a single-arm trial instead of a two-arm trial and to use the prediction model to estimate what the outcome of interest would have been under the standard therapy for patients who receive only the test treatment. While the advantages of using historical data to estimate the counterfactual are obvious (increased efficiency, lower cost, alleviating participants’ fear of being on placebo), bias could be caused by confounding (e.g., by indication, severity, or prognosis) or by a number of other data issues that could jeopardize the validity of the non-randomized comparison.

The aim is to investigate if and how such a design – the prediction design – may be used to provide information on treatment effects by leveraging existing infrastructure and data sources (historical data of RCTs and/or RWD). To this end, we investigate under what assumptions a linear prediction model can predict the counterfactual of patients precisely enough to construct a test for evaluating the treatment effect for normally distributed endpoints. In particular, we investigate what amount of data is necessary, both for the historical data and for the single-arm trial to be conducted. Simulation studies are used to examine how sensitive the design is to violations of the assumptions. The results are compared to reasonable (conventional) benchmark scenarios, e.g., a single-arm study with a pre-defined threshold or a setting in which propensity score matching was performed.
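The basic idea can be illustrated with a minimal sketch. It is not the authors' test: it assumes a single covariate, simulated data, and hypothetical function names, and it uses a naive one-sample t statistic on the observed-minus-predicted outcomes that treats the fitted model as known, i.e., it ignores the uncertainty of the prediction model.

```python
import math
import random
import statistics


def fit_simple_linear_model(x, y):
    """Ordinary least squares for y = intercept + slope * x (one covariate)."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def prediction_design_statistic(hist_x, hist_y, trial_x, trial_y):
    """Mean observed-minus-predicted outcome in the single arm, with a naive t statistic.

    Assumption: the model fitted on historical controls is treated as fixed,
    so the statistic ignores estimation error in the prediction model.
    """
    intercept, slope = fit_simple_linear_model(hist_x, hist_y)
    # Predicted counterfactual (outcome under standard therapy) for each trial patient.
    diffs = [y - (intercept + slope * x) for x, y in zip(trial_x, trial_y)]
    mean_diff = statistics.fmean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / standard_error


rng = random.Random(1)


def simulate(n, effect):
    """Hypothetical data: normal endpoint, linear in one covariate, plus a treatment effect."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [2.0 + 1.5 * xi + effect + rng.gauss(0, 1) for xi in x]
    return x, y


hist_x, hist_y = simulate(2000, effect=0.0)   # historical controls
trial_x, trial_y = simulate(100, effect=0.8)  # single-arm trial on the new therapy
estimate, t_stat = prediction_design_statistic(hist_x, hist_y, trial_x, trial_y)
print(f"estimated effect: {estimate:.2f}, naive t statistic: {t_stat:.2f}")
```

In this sketch the historical sample is much larger than the single-arm trial, which is what makes ignoring the model uncertainty tolerable; how large each sample has to be is exactly the question the abstract raises.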

Arguments for exhuming nonnegative garrote out of grave
Edwin Kipruto, Willi Sauerbrei
Medical Center-University of Freiburg, Germany

Background: The original nonnegative garrote (Breiman 1995) seems to have been forgotten despite some of its good conceptual properties. Its unpopularity is probably caused by its dependence on least squares estimates, which have no unique solution in high-dimensional data and perform very poorly under a high degree of multicollinearity. However, Yuan and Lin (2007) showed that the nonnegative garrote is a flexible approach that can be combined with initial estimators other than least squares, such as ridge, so that these challenges can be circumvented. Despite this proposal, it is hardly used in practice. Prediction models have received considerable attention compared with descriptive models, where the aim is to summarize the data structure in a compact manner (Shmueli, 2010). Here, our main interest is in descriptive modeling; as a byproduct, we will present prediction results.

Objectives: To evaluate the performance of the nonnegative garrote and compare it with some popular approaches using three different real datasets with low to high degrees of multicollinearity and in high-dimensional data.

Methods: We evaluated four penalized regression methods (nonnegative garrote, lasso, adaptive lasso, relaxed lasso) and two classical variable selection methods (best subset, backward elimination) with and without post-estimation shrinkage.

Results: The nonnegative garrote can be used with initial estimators other than least squares in highly correlated data and in high-dimensional datasets. Differences in predictions between the methods were negligible, whereas the number of selected variables differed considerably.

Conclusion: To fit the nonnegative garrote in highly correlated data and in high-dimensional settings, the proposed initial estimates can be used as an alternative to least squares estimates.
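For readers unfamiliar with the method, the following is a minimal sketch of a nonnegative garrote with ridge initial estimates, in the spirit of the combination described above. It is an illustration under stated assumptions, not the authors' implementation: the data are simulated, the function names, the ridge penalty `alpha`, and the garrote penalty `lam` are hypothetical choices, and the shrinkage factors are obtained by a simple coordinate descent rather than by any particular published algorithm.

```python
import numpy as np


def ridge_estimate(X, y, alpha):
    """Ridge coefficients (X'X + alpha * I)^{-1} X'y for centered X and y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)


def nonnegative_garrote(X, y, beta_init, lam, n_iter=200):
    """Minimize 0.5 * ||y - Z c||^2 + lam * sum(c) subject to c >= 0,

    where column j of Z is x_j * beta_init[j]. Returns the shrinkage
    factors c and the garrote coefficients c * beta_init.
    """
    Z = X * beta_init
    col_norm = (Z ** 2).sum(axis=0)
    c = np.ones(X.shape[1])
    residual = y - Z @ c
    for _ in range(n_iter):
        for j in range(len(c)):
            if col_norm[j] == 0.0:
                c[j] = 0.0  # zero column: factor has no effect on the fit
                continue
            partial = residual + Z[:, j] * c[j]  # residual without variable j
            new_cj = max(0.0, (Z[:, j] @ partial - lam) / col_norm[j])
            residual = partial - Z[:, j] * new_cj
            c[j] = new_cj
    return c, c * beta_init


# Hypothetical simulated data with correlated predictors.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p)) + 0.8 * rng.normal(size=(n, 1))
true_beta = np.array([2.0, -1.5, 0.0, 0.0, 1.0, 0.0])
y = X @ true_beta + rng.normal(size=n)
X = X - X.mean(axis=0)
y = y - y.mean()

lam = 20.0  # hypothetical garrote penalty; in practice chosen e.g. by cross-validation
beta_init = ridge_estimate(X, y, alpha=1.0)
c, beta_garrote = nonnegative_garrote(X, y, beta_init, lam)
print("shrinkage factors:", np.round(c, 3))
print("garrote coefficients:", np.round(beta_garrote, 3))
```

A shrinkage factor of exactly zero removes the variable from the model, which is how the garrote performs selection and shrinkage in one step. Because the ridge estimate exists even when least squares does not, the same code path also runs when the number of variables exceeds the number of observations.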

On the assessment of methods to identify influential points in high-dimensional data
Shuo Wang, Edwin Kipruto, Willi Sauerbrei
Medical Center – University of Freiburg, Germany

Extreme values and influential points in predictors often strongly affect the results of statistical analyses in low- and high-dimensional settings. Many methods to detect such values have been proposed, but there is neither consensus on their advantages and disadvantages nor guidance for practice. We will present various classes of methods and illustrate their use in several high-dimensional datasets. First, we consider a simple pre-transformation that is combined with feature ranking lists to identify influential points, concentrating on univariable situations (Boulesteix and Sauerbrei, 2011, DOI: 10.1002/bimj.201000189). The procedure will be extended by checking for influential points in bivariate models and by adding some steps to the multivariable approach.

Second, to increase the stability of feature ranking lists, we will use various aggregation approaches to search for extreme values in features and for influential observations. The former change the rank of a specific feature, whereas the latter change the ranking as a whole. To detect extreme values, we apply the simple pre-transformation to the data and identify the features whose ranks changed substantially after the transformation. To detect influential observations, we combine leave-one-out with rank comparison and identify the observations that cause large rank changes. These methods are applied to several publicly available datasets.
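The combination of leave-one-out and rank comparison can be sketched as follows. This is only an illustration under assumptions not stated in the abstract: features are ranked by their absolute Pearson correlation with a continuous outcome, the influence of an observation is summarized as the total absolute rank change over all features, and the data and function names are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def feature_ranks(features, outcome):
    """Rank features (1 = strongest) by absolute correlation with the outcome."""
    scores = [abs(pearson(column, outcome)) for column in features]
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for rank, j in enumerate(order, start=1):
        ranks[j] = rank
    return ranks


def leave_one_out_rank_change(features, outcome):
    """Total absolute rank change caused by deleting each observation in turn."""
    full_ranks = feature_ranks(features, outcome)
    changes = []
    for i in range(len(outcome)):
        reduced = [column[:i] + column[i + 1:] for column in features]
        ranks = feature_ranks(reduced, outcome[:i] + outcome[i + 1:])
        changes.append(sum(abs(a - b) for a, b in zip(full_ranks, ranks)))
    return changes


# Hypothetical data: feature 0 carries signal, the others are noise, and
# observation 0 gets an extreme outcome and an extreme value in feature 7.
rng = random.Random(3)
n_obs, n_feat = 30, 8
outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
outcome[0] = 6.0
features = [[value + rng.gauss(0, 0.5) for value in outcome]]
features += [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_feat - 1)]
features[7][0] = 6.0

changes = leave_one_out_rank_change(features, outcome)
most_influential = max(range(n_obs), key=lambda i: changes[i])
print("rank change per deleted observation:", changes)
print("observation with the largest rank change:", most_influential)
```

Observations whose deletion produces a much larger rank change than the others are candidates for closer inspection. What counts as "large" is left open here; the abstract does not specify a threshold.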

Acceleration of diagnostic research: Is there a potential for seamless designs?
Werner Vach1, Eric Bibiza-Freiwald2, Oke Gerke3, Tim Friede4, Patrick Bossuyt5, Antonia Zapf2
1Basel Academy for Quality and Research in Medicine, Switzerland; 2Institute of Medical Biometry and Epidemiology, University Medical Center Hamburg-Eppendorf; 3Department of Nuclear Medicine, Odense University Hospital; 4Department of Medical Statistics, University Medical Center Goettingen; 5Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Amsterdam University Medical Centers

Background: New diagnostic tests to identify a well-established disease state have to undergo a series of scientific studies from test construction until finally demonstrating a societal impact. Traditionally, these studies are performed with substantial time gaps in between. Seamless designs allow us to combine a sequence of studies in one protocol and may hence accelerate this process.

Aim: A systematic investigation of the potential of seamless designs in diagnostic research.

Methods: We summarized the major study types in diagnostic research and identified their basic characteristics with respect to applying seamless designs. This information was used to identify major hurdles and opportunities for seamless designs.

Results: Eleven major study types were identified. The following basic characteristics were identified: type of recruitment (case-control vs. population-based), application of a reference standard, inclusion of a comparator, paired or unpaired application of a comparator, assessment of patient-relevant outcomes, and possibility of blinding test results.

Two basic hurdles could be identified: 1) Accuracy studies are hard to combine with post-accuracy studies, as the former are required to justify the latter and as application of a reference test in outcome studies is a threat to the study’s integrity. 2) Questions that can be clarified by other study designs should be clarified before performing a randomized diagnostic study.

However, there is substantial potential for seamless designs, since all steps from test construction to the comparison with the current standard can be combined in one protocol. This may include a switch from case-control to population-based recruitment as well as a switch from a single-arm study to a comparative accuracy study. In addition, change-in-management studies can be combined with an outcome study in discordant pairs. Examples from the literature illustrate the feasibility of both approaches.

Conclusions: There is a potential for seamless designs in diagnostic research.

Reference: Vach W, Bibiza E, Gerke O, Bossuyt PM, Friede T, Zapf A (2021). A potential for seamless designs in diagnostic research could be identified. J Clin Epidemiol. 129:51-59. doi: 10.1016/j.jclinepi.2020.09.019.

The augmented binary method for composite endpoints based on forced vital capacity (FVC) in systemic sclerosis-associated interstitial lung disease
Carolyn Cook1, Michael Kreuter2, Susanne Stowasser3, Christian Stock4
1mainanalytics GmbH, Sulzbach, Germany; 2Center for Interstitial and Rare Lung Diseases, Pneumology and Respiratory Care Medicine, Thoraxklinik, University of Heidelberg, Heidelberg, Germany and German Center for Lung Research, Heidelberg, Germany; 3Boehringer Ingelheim International GmbH, Ingelheim am Rhein, Germany; 4Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim am Rhein, Germany


The augmented binary method (Wason & Seaman. Stat Med, 2013; 32(26)) is a novel method for precisely estimating response rates and differences among response rates defined based on a composite endpoint that contains a dichotomized continuous variable and additional inherently binary components. The method is an alternative to traditional approaches such as logistic regression techniques. Due to the complexity and computational demands of the method, experience in clinical studies has been limited thus far and is mainly restricted to oncological studies. Operating characteristics and, thus, potential statistical benefits are unclear for other settings.


We aimed to perform a Monte Carlo simulation study to assess the operating characteristics of the augmented binary method in settings relevant to randomized controlled trials and non-interventional studies in systemic sclerosis-associated interstitial lung disease (SSc-ILD), a rare, chronic autoimmune disease in which composite endpoints of the type described above are frequently applied.


An extensive simulation study was performed to assess type I error, power, coverage, and bias of the augmented binary method and of a standard logistic model for the composite endpoint. Parameters were varied to resemble lung function decline (as measured by the forced vital capacity, FVC), hospitalization events and mortality in patients with SSc-ILD over a 1- and 2-year period. A relative treatment effect of 50% on FVC was assumed, while the assumed effects on hospitalizations and mortality were derived from joint modeling analyses of existing trial data (as indirect effects of the treatment on FVC). Further, the methods were applied, as an example, to data from the SENSCIS trial, a phase III randomized, double-blind, placebo-controlled trial investigating the efficacy and safety of nintedanib in patients with SSc-ILD.
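How type I error and power are estimated by Monte Carlo simulation for a composite endpoint of this type can be sketched as follows. The sketch does not implement the augmented binary method, and it replaces the logistic model by a pooled two-proportion z-test. The responder definition (FVC change above a threshold and no hospitalization or death), all parameter values, and all function names are hypothetical assumptions chosen for illustration only.

```python
import math
import random


def simulate_arm(rng, n, mean_fvc_change, sd_fvc_change, p_event, threshold):
    """Count responders: FVC change above the threshold and no event."""
    responders = 0
    for _ in range(n):
        fvc_change = rng.gauss(mean_fvc_change, sd_fvc_change)
        event = rng.random() < p_event  # hospitalization or death (hypothetical)
        if fvc_change > threshold and not event:
            responders += 1
    return responders


def two_proportion_p_value(r1, n1, r2, n2):
    """Two-sided p-value of the pooled z-test for equal response probabilities."""
    pooled = (r1 + r2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        return 1.0
    z = (r1 / n1 - r2 / n2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def rejection_rate(mean_control, mean_treated, n_sim=1000, n_per_arm=150,
                   alpha=0.05, seed=0):
    """Proportion of simulated trials in which the test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        # Hypothetical values: SD 6, event probability 0.10, threshold -5.
        r_control = simulate_arm(rng, n_per_arm, mean_control, 6.0, 0.10, -5.0)
        r_treated = simulate_arm(rng, n_per_arm, mean_treated, 6.0, 0.10, -5.0)
        p_value = two_proportion_p_value(r_treated, n_per_arm, r_control, n_per_arm)
        if p_value < alpha:
            rejections += 1
    return rejections / n_sim


# Null scenario: same mean FVC change in both arms.
type_one_error = rejection_rate(mean_control=-4.0, mean_treated=-4.0)
# Alternative: mean decline halved in the treated arm (50% relative effect).
power = rejection_rate(mean_control=-4.0, mean_treated=-2.0, seed=1)
print(f"estimated type I error: {type_one_error:.3f}, estimated power: {power:.3f}")
```

The dichotomization step in `simulate_arm` discards how far each patient's FVC change lies from the threshold. The augmented binary method is designed to use that continuous information, which is the source of the precision gains it aims for.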


The simulation study is currently in progress, and results will be available by the end of January. In preliminary results, modest gains in power and precision were observed, with acceptable compromises in type I error, if any. In scenarios with lower statistical power, these gains were more likely to make a difference to inferences concerning the treatment effect. In the exemplary application of the augmented binary method to trial data, confidence intervals were narrower and p-values smaller for selected endpoints involving FVC decline, hospitalization and mortality.


Based on preliminary results from a simulation study, we identified areas where the augmented binary method conveys an appreciable advantage compared to standard methods.