Non Clinical Statistics I

Can statistics save preclinical research?
Ulrich Dirnagl
Charité / Berlin Institute of Health, Germany


Meggie Danziger (1,2), Ulrich Dirnagl (1,2), Ulf Toelch (2)
(1) Charité – Universitätsmedizin Berlin, Germany; (2) BIH QUEST Center for Transforming Biomedical Research

Low statistical power in preclinical experiments has been repeatedly pointed out as a roadblock to successful replication and translation. To increase reproducibility of preclinical experiments under ethical and budget constraints, it is necessary to devise strategies that improve the efficiency of confirmatory studies.

To this end, we simulate, based on empirical pre-study odds, two preclinical research trajectories from the exploratory stage to the results of a within-lab replication study. In a first step, the exploratory data are used to decide whether to proceed to a replication. One trajectory (T1) employs the conventional significance threshold for this decision. The second trajectory (T2) uses a more lenient threshold based on an a priori determined smallest effect size of interest (SESOI). The sample size of a potential replication study is calculated via a standard power analysis using the initial exploratory effect size (T1) or the SESOI (T2). The two trajectories are compared with regard to the number of experiments proceeding to replication, the number of animals tested, and the positive predictive value (PPV).
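The two quantities that drive this comparison, the sample size obtained from a power analysis and the PPV, can be illustrated with a minimal Python sketch. It is not the authors' simulation code: the normal-approximation formula for a two-sided comparison of two groups, the default alpha and power, and the function names n_per_group and ppv are assumptions made for illustration, and the PPV is written in terms of a prior probability of a true effect rather than odds.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate animals per group for a two-sided two-group comparison.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is a standardized effect size (assumed > 0).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def ppv(prior, power, alpha=0.05):
    """Positive predictive value of a significant result.

    prior is the assumed probability that a tested effect is real.
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical effect sizes, not values taken from the abstract.
    for d in (1.0, 0.5):
        print(f"d = {d}: n per group = {n_per_group(d)}")
    print(f"PPV at prior 0.5, power 0.8: {ppv(0.5, 0.8):.3f}")
```

Because the required sample size scales with 1/d^2 in this approximation, planning a replication around a large exploratory effect size yields a much smaller study than planning around a smaller SESOI.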

Our simulations show that under the conventional significance threshold, only 32 percent of the initial exploratory experiments progress to the replication stage. Using the decision criterion based on a SESOI, 65 percent of initial studies proceed to replication. T1 results in the lowest number of animals needed for replication (n = 7 per group) but yields a PPV that is below pre-study odds. T2 increases PPV above pre-study odds while keeping sample size at a reasonably low number (n = 23 per group).

Our results reveal that current practice, represented by T1, impedes efforts to replicate preclinical experiments. Optimizing decision criteria and experimental design by employing easily applicable variations as shown in T2 keeps tested animal numbers low while generating more robust preclinical evidence that may ultimately benefit translation.

Information sharing across genes for improved parameter estimation in concentration-response curves
Franziska Kappenberg, Jörg Rahnenführer
TU Dortmund University, Germany

Technologies for measuring expression values of tens of thousands of genes simultaneously are well established. In toxicology, such high-dimensional data can be used to estimate concentration-response curves and thereby understand the biological processes initiated at different concentrations. Increasing the number of concentrations or the number of replicates per concentration can improve the accuracy of the fit, but incurs substantial additional costs. A statistical approach to obtaining higher-quality fits is to exploit similarities between the concentration-response data of different genes, an idea also known as information sharing across genes. In a Bayesian framework, parameters of the concentration-response curves can be linked according to a priori assumptions about, or estimates of, the distributions of the parameters.

Here, we consider the special case of the sigmoidal 4pLL model for estimating the curves associated with single genes, and we are interested in the EC50 value of the curve, i.e., the concentration at which the half-maximal effect is reached. This value is a parameter of the 4pLL model and can be considered a reasonable indicator of a relevant expression effect of the corresponding gene. We introduce an empirical Bayes method for information sharing across genes in this situation by modelling the distribution of the EC50 values across all genes. Based on this distribution, a weighted mean of the individually estimated parameter and the overall mean of the estimated parameters of all genes is calculated for each gene. In other words, parameters are shrunk towards an overall mean. We evaluate our approach in several simulation studies that differ in the strength of the assumptions made about the distribution of the EC50 values. Finally, the method is applied to a real gene expression dataset to demonstrate the influence of the analysis strategy on the results.
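The shrinkage idea can be sketched in a few lines of Python. The abstract does not specify how the weights are chosen, so the sketch assumes a simple normal-normal model on (for example, log-transformed) EC50 estimates with a method-of-moments estimate of the between-gene variance; the function name shrink_estimates and the example values are hypothetical.

```python
from statistics import mean, variance


def shrink_estimates(estimates, std_errors):
    """Shrink per-gene estimates towards their overall mean.

    Assumed model: estimate_i ~ N(theta_i, se_i^2), theta_i ~ N(mu, tau^2).
    The shrunken value is w_i * estimate_i + (1 - w_i) * mu with
    w_i = tau^2 / (tau^2 + se_i^2). Requires at least two genes.
    """
    mu = mean(estimates)
    mean_se2 = mean([se ** 2 for se in std_errors])
    # Method-of-moments estimate of the between-gene variance, floored at 0.
    tau2 = max(0.0, variance(estimates) - mean_se2)
    shrunk = []
    for est, se in zip(estimates, std_errors):
        denom = tau2 + se ** 2
        weight = tau2 / denom if denom > 0 else 1.0
        shrunk.append(weight * est + (1 - weight) * mu)
    return shrunk


if __name__ == "__main__":
    # Hypothetical log-EC50 estimates and standard errors for four genes.
    log_ec50 = [0.2, 1.1, 0.9, 2.4]
    se = [0.3, 0.3, 0.8, 0.8]
    for raw, post in zip(log_ec50, shrink_estimates(log_ec50, se)):
        print(f"{raw:5.2f} -> {post:5.2f}")
```

Under these assumptions, genes with imprecise individual estimates (large standard errors) are pulled more strongly towards the overall mean than genes with precise estimates.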

An intuitive time-dose-response model for cytotoxicity data with varying exposure times
Julia Christin Duda, Jörg Rahnenführer
TU Dortmund University, Germany

Modeling approaches for dose-response or concentration-response analyses are slowly becoming more popular in toxicological applications. For cytotoxicity assays, not only the concentration but also the exposure or incubation time of the compound administered to cells can be varied and might influence the response. A popular concentration-response model is the four-parameter log-logistic (4pLL) model or, more specifically tailored to cytotoxicity data, the two-parameter log-logistic (2pLL) model. Both models, however, describe the response as a function of the concentration only.
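For readers unfamiliar with these curves, the following Python sketch shows one common parameterization of the 4pLL model and the 2pLL model as a special case. The parameter names and the choice of fixing the asymptotes at 100 and 0 percent for the 2pLL model are assumptions for illustration; the abstract itself does not state the parameterization.

```python
def four_pll(conc, lower, upper, ec50, h):
    """4pLL curve: response as a function of concentration only.

    lower/upper are the asymptotes, ec50 the concentration of
    half-maximal effect, h > 0 the slope. Decreasing in conc.
    """
    if conc <= 0:
        return upper  # no compound: response stays at the upper asymptote
    return lower + (upper - lower) / (1.0 + (conc / ec50) ** h)


def two_pll(conc, ec50, h):
    """2pLL curve: 4pLL with asymptotes fixed at 100 and 0 percent (assumed)."""
    return four_pll(conc, lower=0.0, upper=100.0, ec50=ec50, h=h)


if __name__ == "__main__":
    # Hypothetical parameter values.
    for c in (0.0, 1.0, 3.0, 10.0):
        print(c, round(two_pll(c, ec50=3.0, h=1.5), 1))
```

In this parameterization the response at conc = ec50 lies exactly halfway between the two asymptotes, which is what makes the EC50 directly readable from the fitted parameters.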

We propose a two-step procedure and a new time-concentration-response model for cytotoxicity data in which both concentration and exposure time are varied. The parameter of interest for the estimation is the EC50 value, i.e., the concentration at which half of the maximal effect is reached. The procedure consists of a testing step and a modeling step. In the testing step, a nested ANOVA test is performed to decide whether the exposure time has an effect on the shape of the concentration-response curve. If no effect is identified, a classical 2pLL model is fitted. Otherwise, a new time-concentration-response model called td2pLL is fitted. In this model, we incorporate exposure time information into the 2pLL model by making the EC50 parameter dependent on the exposure time.
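The idea of a time-dependent EC50 can be sketched as follows in Python. The abstract does not give the functional form of the dependence, so the power-law decay towards a limiting value used here, the parameter names delta, gamma and c0, and the function names are assumptions for illustration only; this is not the interface of the R package td2pLL.

```python
def ec50_at(time, delta, gamma, c0):
    """Assumed time-dependent EC50: decays with exposure time towards c0.

    time must be > 0; delta, gamma and c0 are hypothetical parameters.
    """
    return delta * time ** (-gamma) + c0


def td2pll_response(conc, time, h, delta, gamma, c0):
    """2pLL response (in percent) with the EC50 replaced by ec50_at(time)."""
    if conc <= 0:
        return 100.0  # no compound: full viability
    ec50 = ec50_at(time, delta, gamma, c0)
    return 100.0 / (1.0 + (conc / ec50) ** h)


if __name__ == "__main__":
    # Hypothetical parameters: longer exposure lowers the EC50.
    params = dict(h=2.0, delta=2.0, gamma=1.0, c0=1.0)
    for t in (1.0, 2.0, 4.0):
        ec50 = ec50_at(t, params["delta"], params["gamma"], params["c0"])
        resp = td2pll_response(2.0, t, **params)
        print(f"t = {t}: EC50 = {ec50:.2f}, response at conc 2.0 = {resp:.1f}")
```

With gamma set to 0 the EC50 no longer depends on time and the sketch reduces to an ordinary 2pLL curve, which mirrors the fallback taken when the testing step finds no effect of exposure time.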

In simulation studies inspired by and based on a real data set, we compare the proposed procedure against various alternatives with respect to the precision of the EC50 estimates. In all simulations, the new procedure provides estimates with higher or comparable precision, which demonstrates its universal applicability in corresponding toxicological experiments. In addition, we show that the use of optimal designs for cytotoxicity experiments further improves the EC50 estimates in all considered scenarios while reducing numerical problems. To facilitate application in toxicological practice, the developed methods will be made available to practitioners via the R package td2pLL and a corresponding vignette that demonstrates their use on an example dataset.