Chairs: Gudio Knapp and Gerta Rücker

**Investigating treatment-effect modification by a continuous covariate in IPD meta-analysis: an approach using fractional polynomials**

Willi Sauerbrei^{1}, Patrick Royston^{2}^{1}Medical Center – University of Freiburg, Germany; ^{2}MRC Clinical Trials Unit at UCL, London, UK

Context: In clinical trials, there is considerable interest in investigating whether a treatment effect is similar in all patients, or that some prognostic variable indicates a differential response to treatment. To examine this, a continuous predictor is usually categorised into groups according to one or more cutpoints. Several weaknesses of categorisation are well known.

Objectives: To avoid the disadvantages of cutpoints and to retain full information, it is preferable to keep continuous variables continuous in the analysis. The aim is to derive a statistical procedure to handle this situation when individual patient data (IPD) are available from several studies.

Methods: For continuous variables, the multivariable fractional polynomial interaction (MFPI) method provides a treatment effect function (TEF), that is, a measure of the treatment effect on the continuous scale of the covariate (Royston and Sauerbrei, Stat Med 2004, 2509‐25). MFPI is applicable to most of the popular regression models, including Cox and logistic regression. A meta‐analysis approach for averaging functions across several studies has been proposed (Sauerbrei and Royston, Stat Med 2011, 3341‐60). A first example combining these two techniques (called metaTEFs) was published (Kasenda et al, BMJ Open 2016; 6:e011148). Another approach called meta-stepp was proposed (Wang et al, Stat Med 2016, 3704- 16). Using the data from Wang (8 RCTs in patients with breast cancer) we will illustrate various issues of our metaTEFs approach.

Results and Conclusions: We used metaTEFs to investigate a potential treatment effect modifier in a meta‐analysis of IPD from eight RCTs. In contrast to cutpoint‐based analyses, the approach avoids several critical issues and gives more detailed insight into how the treatment effect is related to a continuous biomarker. MetaTEFs retains the full information when performing IPD meta‐analyses of continuous effect modifiers in randomised trials. Early experience suggests it is a promising approach.

**Standardisierte Mittelwertdifferenzen aus Mixed Model Repeated Measures – Analysen**

Lars Beckmann, Ulrich Grouven, Guido Skipka*Institut für Qualität und Wirtschaftlichkeit im Gesundheitswesen (IQWiG), Deutschland*

In klinischen Studien werden für Patientinnen und Patienten häufig Daten zur gesundheitsbezogenen Lebensqualität und zur Symptomatik zu aufeinanderfolgenden Zeitpunkten erhoben. Für die Auswertung dieser longitudinalen Daten werden in der Literatur lineare gemischte Modelle für Messwiederholungen (Mixed Models Repeated Measures – Modelle [MMRM]) vorgeschlagen. Diese Endpunkte werden in der Regel mit Skalen mit nicht natürlichen Einheiten gemessen.

Es liegt nahe, für die Bestimmung einer klinischen Relevanz oder für die Durchführung von Metaanalysen auf standardisierte Mittelwertdifferenzen (SMD), wie beispielsweise Cohens d oder Hedges’ g, zurückzugreifen. Allerdings ist unklar, wie die für die SMD benötigte gepoolte Standardabweichung aus MMRM – Analysen berechnet werden kann. Anhand einer Simulationsstudie wurden verschiedene Verfahren zur Schätzung einer SMD untersucht. Die Verfahren lassen sich unterteilen in Ansätze, die auf die im MMRM geschätzten Standardfehler der Mittelwertdifferenz (MD) zurückgreifen, und in Ansätze, die die individuellen Patientendaten (IPD) benutzen.

Simuliert wurden Daten einer randomisierten kontrollierten Studie. Die longitudinalen Daten wurden mittels eines autoregressiven Modells 1. Ordnung (AR) für die Abhängigkeiten zwischen den Erhebungszeitpunkten simuliert. Parameter für die Simulationen waren die SMD, die Varianz für die Änderung zum Ausgangswert, die Korrelation für das AR sowie die Stichprobengrößen in den Therapiearmen. Der betrachtete Endpunkt ist die Differenz zwischen den Therapiearmen hinsichtlich der mittleren Änderung zum Ausgangswert über den gesamten Studienverlauf. Die verschiedenen Verfahren wurden bezüglich Überdeckungswahrscheinlichkeit, Verzerrung, Mean Squared Error (MSE), Power und Fehler 1. Art sowie Konkordanz von MD und SMD bez. der statistischen Signifikanz und der Überdeckung des wahren Effektes verglichen.

Die Verfahren, bei denen die gepoolte Standardabweichung aus Standardfehlern des MMRM berechnet wird, zeigen Verzerrungen, die zu einer deutlichen Überschätzung des wahren Effektes führen. Verfahren, die die gepoolte Standardabweichung aus den beobachteten Veränderungen zum Studienanfang schätzen, zeigen eine deutlich geringere Verzerrung und einen geringeren MSE. Allerdings ist die Power, im Vergleich zur MD, kleiner.

Die Schätzung einer SMD mittels der Standardfehler aus dem MMRM ist nicht angemessen. Dies ist insbesondere bei der Bewertung von großen SMDs zu berücksichtigen. Zu einer angemessenen Schätzung einer SMD sind Verfahren notwendig, aus denen die gepoolte Standardabweichung der Änderung zum Ausgangswert mit IPD geschätzt werden kann.

**Robust Covariance Estimation in Multivariate Meta-Regression**

Thilo Welz *TU Dortmund University, Germany*

Univariate Meta-Regression (MR) is an important technique for medical and psychological research and has been deeply researched. Its multivariate counterpart, however, remains less explored. Multivariate MR holds the potential to incorporate the dependency structure of multiple effect measures as opposed to performing multiple univariate analyses. We explore the possibilities for robust estimation of the covariance of the coefficients in our multivariate MR model. More specifically, we extend heteroscedasticity consistent (also called sandwich or HC-type) estimators from the univariate to the multivariate context. These, along with the Knapp-Hartung adjustment, proved useful in previous work (see Viechtbauer (2015) for an analysis of Knapp-Hartung and Welz & Pauly (2020) for HC-estimators in univariate MR). In our simulations we focus on the bivariate case, which is important for incorporating secondary outcomes as in Copas et al. (2018), but higher dimensions are also possible. The validity of the considered robust estimators is evaluated based on the type-I-error and power of statistical tests based on these estimators. We compare our robust estimation approach with a classical (non-robust) procedure. Finally, we highlight some of the numerical and statistical issues we encountered and provide pointers for others wishing to employ these methods in their analyses.

**A Bayesian approach to combine rater assessments**

Lorenz Uhlmann^{1,2}, Christine Fink^{3}, Christian Stock^{2}, Marc Vandemeulebroecke^{1}, Meinhard Kieser^{2} * *^{1}Novartis Pharma AG, Basel, Switzerland; ^{2}Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany; ^{3}Department of Dermatology, University Medical Center, Ruprecht-Karls University, Heidelberg, Germany

Background: Ideally, endpoints in clinical studies are objectively measurable and easy to assess. However, sometimes this is infeasible and alternative approaches based on (more subjective) rater assessments need to be considered. A Bayesian approach to combine such rater assessments and to estimate relative treatment effects is proposed. Methods: We focus on a setting where each subject is observed under the condition of every group and where one or multiple raters assign scores that constitute the endpoints. We further assume that the raters compare the arms in a pairwise way by simply scoring them on an individual subject-level. This setting has principle similarities to network meta-analysis where groups (or treatment arms) are ranked in a probabilistic fashion. Many ideas from this field, such as heterogeneity (within raters) or inconsistency (between raters), can be directly applied. We build on Bayesian methodology used in this field and derive models for normally distributed and ordered categorical scores which take into account an arbitrary number of raters and groups. Results: A general framework is created which is at the same time easy to implement and allows for a straightforward interpretation of the results. The method is illustrated with a real clinical study example on a computer-aided hair detection and removal algorithm in dermatoscopy. Raters assessed the image quality of pictures generated by the algorithm compared to pictures of unshaved and shaved nevis. Conclusion: A Bayesian approach to combine rater assessments based on an ordinal or continuous scoring system to compare groups in a pairwise fashion is proposed and illustrated using a real data example. The model allows to assess all pairwise comparisons among multiple groups. Since the approach is based on the well-established network meta-analysis methodology, many characteristics can be inferred from that methodology.