Non Clinical Statistics II

On the Role of Historical Control Data in Preclinical Development
Helena Geys
Johnson & Johnson, Belgium

Many companies establish historical control databases in order to contextualize the results of a single study against previous studies performed under similar conditions, to properly design studies and/or to construct quality control instruments.

Typical preclinical experiments involve a control group of untreated animals and groups of animals exposed to increasing doses of the test compound. The ultimate aim is to test for a dose-related trend in the response of interest. Usually one would focus on one particular experiment. However, since such experiments are conducted in genetically homogeneous animal strains, historical control data from previous similar experiments are sometimes used in interpreting the results of a current study.
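The abstract does not name a specific trend test; as a minimal illustration under that caveat, a Cochran-Armitage test for a dose-related trend in a binary response (e.g. tumor incidence) could be sketched as follows. The dose scores and group counts are invented for the example.

```python
import numpy as np
from scipy.stats import norm

def cochran_armitage(doses, n, r):
    """Cochran-Armitage test for a linear trend in proportions.

    doses : numeric dose scores per group
    n     : number of animals per group
    r     : number of responders per group
    Returns the z statistic and one-sided p-value for an increasing trend.
    """
    doses, n, r = map(np.asarray, (doses, n, r))
    p_bar = r.sum() / n.sum()                      # pooled response rate
    t = np.sum(doses * (r - n * p_bar))            # trend statistic
    var = p_bar * (1 - p_bar) * (
        np.sum(n * doses ** 2) - np.sum(n * doses) ** 2 / n.sum()
    )
    z = t / np.sqrt(var)
    return z, norm.sf(z)

# Hypothetical 4-group design: control plus three increasing doses
z, p = cochran_armitage([0, 1, 2, 3], [50, 50, 50, 50], [2, 5, 10, 20])
```

With a clearly increasing incidence, as in the invented counts above, the statistic is strongly positive and the p-value small.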

The use of historical control data in supporting inferences varies across assays. In genetic toxicology and safety pharmacology, for example, a response may be considered positive in a specific experiment if the result lies outside the distribution of the historical negative control data (95% control limits). In carcinogenicity studies, by contrast, historical control data are particularly useful for classifying tumors as rare or common and for evaluating disparate findings in dual concurrent controls.

Historical control data are often used to carry out an informal equivalence test, whereby a New Molecular Entity (NME) is considered to be “safe” when the results from the treatment groups fall entirely within the negative control distribution.
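As a sketch of these two uses (not any company's actual procedure, and assuming approximate normality of the endpoint), 95% control limits from historical negative controls and the informal "within the distribution" check might look like this; all data below are invented:

```python
import numpy as np

def control_limits(historical, z=1.96):
    """Normal-theory 95% control limits from historical negative controls.

    For skewed endpoints a log transform or empirical quantiles would be
    more appropriate than the normal approximation used here.
    """
    m, s = np.mean(historical), np.std(historical, ddof=1)
    return m - z * s, m + z * s

def within_limits(values, limits):
    """Informal equivalence check: do all treatment-group values fall
    inside the historical negative control distribution?"""
    lo, hi = limits
    values = np.asarray(values)
    return bool(np.all((values >= lo) & (values <= hi)))

rng = np.random.default_rng(0)
historical = rng.normal(100.0, 10.0, size=200)  # invented historical controls
limits = control_limits(historical)
```

A treatment group whose values all lie within `limits` would pass the informal equivalence check; a single value outside the limits would flag a potentially positive response.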

In addition, formal statistical procedures have been proposed that incorporate historical control data, combining them with the current control group in tests for a dose-related trend.
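One classical formal procedure, in the spirit of beta-binomial borrowing, fits a beta distribution to historical control incidence rates by the method of moments and uses it as a prior for the concurrent control group. This is a sketch of the general idea, not the specific procedures alluded to in the abstract, and the historical rates below are invented:

```python
import numpy as np

def beta_prior_from_history(rates):
    """Method-of-moments beta fit to historical control incidence rates."""
    m, v = np.mean(rates), np.var(rates, ddof=1)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

# Invented historical control tumor rates from five previous studies
a, b = beta_prior_from_history([0.02, 0.05, 0.04, 0.03, 0.06])

# Concurrent control: r responders out of n animals; conjugate beta update
r, n = 3, 50
post_mean = (a + r) / (a + b + n)
```

The posterior mean shrinks the concurrent control rate `r / n` toward the historical prior mean `a / (a + b)`, which stabilizes the control estimate in small studies.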

Clearly, historical control data play an important role in preclinical development as quality control and interpretation instruments. Yet the question of when and how to use historical control data is still not settled and remains subject to ongoing debate. In this presentation we will highlight the pros and cons and the important role a preclinical statistician can play in this debate.


A comparison of different statistical strategies for the analysis of data in reproductive toxicology involving historical negative controls
Bernd-Wolfgang Igl, Monika Brüning, Bernd Baier
Boehringer Ingelheim Pharma GmbH & Co. KG, Germany

A fundamental requirement of regulatory bodies for the development of new pharmaceuticals is to perform nonclinical developmental and reproductive toxicology (DART) studies to reveal any possible effect of the test item on mammalian reproduction. DART studies are usually performed in rats and a further (non-rodent) species and aim to support human clinical trials and market access. General recommendations are given in ICH guideline S5, which allows various phase-dependent designs for a large number of parameters. The statistical evaluation of DART data is quite multifaceted due to more or less complex correlation structures between mother and offspring, e.g. maternal weight development, fetal weight, ossification status and number of littermates, all depending on the test item dose.
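The litter effect is the central correlation structure here: fetuses from the same mother are more alike than fetuses from different mothers. A minimal simulation (all parameters invented) illustrates this and estimates the intraclass correlation from a one-way ANOVA decomposition:

```python
import numpy as np

rng = np.random.default_rng(42)
n_litters, litter_size = 30, 8
sigma_litter, sigma_fetus = 0.4, 0.3   # invented variance components

# Simulate fetus weights sharing a random litter effect per mother
litter_effects = rng.normal(0.0, sigma_litter, n_litters)
weights = (5.0 + litter_effects[:, None]
           + rng.normal(0.0, sigma_fetus, (n_litters, litter_size)))

# One-way ANOVA estimate of the intraclass correlation, ICC(1)
ms_between = litter_size * np.var(weights.mean(axis=1), ddof=1)
ms_within = np.mean(np.var(weights, axis=1, ddof=1))
icc = (ms_between - ms_within) / (ms_between + (litter_size - 1) * ms_within)
```

With the variance components above, the true intraclass correlation is 0.16 / (0.16 + 0.09) ≈ 0.64; ignoring a correlation of this size and treating fetuses as independent would badly overstate the effective sample size.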

Initially, we will sketch a Scrum-inspired project that was set up as a cooperation between Boehringer Ingelheim’s Reproductive Toxicology and Non-Clinical Statistics groups. Then, we will describe the particular role and relevance of historical control data in reproductive toxicology. This will be followed by a presentation of common statistical models and some related open problems. Finally, we will give some simulation-based results on statistical power and sample size for the detection of certain events in DART studies.


A Nonparametric Bayesian Model for Historical Control Data in Reproductive Toxicology
Ludger Sandig1, Bernd Baier2, Bernd-Wolfgang Igl3, Katja Ickstadt4
1Fakultät Statistik, Technische Universität Dortmund; 2Reproductive Toxicology, Nonclinical Drug Safety, Boehringer Ingelheim Pharma GmbH & Co. KG; 3Non-Clinical Statistics, Biostatistics and Data Sciences Europe, Boehringer Ingelheim Pharma GmbH & Co. KG; 4Lehrstuhl für mathematische Statistik und biometrische Anwendungen, Fakultät Statistik, Technische Universität Dortmund

Historical control data are of fundamental importance for the interpretation of developmental and reproductive toxicology studies. Modeling such data presents two challenges: outcomes are observed on different measurement scales (continuous, counts, categorical) and on multiple nested levels (fetuses within a litter, litters within a group, groups within a set of experiments). We propose a nonparametric Bayesian approach to tackle both of them. By using a hierarchical Dirichlet process mixture model we can capture the dependence structure of observables both within and between litters. Additionally, we can accommodate an arbitrary number of variables on arbitrary measurement scales at the fetus level, e.g. fetus weight (continuous) and malformation status (categorical). In a second step we extend the model to incorporate observables at higher levels in the hierarchy, e.g. litter size or maternal weight. Inference in these models is possible using Markov chain Monte Carlo (MCMC) techniques, which we implemented in R. We illustrate our approach on several real-world datasets.
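The full hierarchical Dirichlet process model is beyond a short sketch, but its nonparametric building block, the stick-breaking construction of a Dirichlet process mixture, can be illustrated as follows (the concentration parameter, base measure, and kernel scale are all invented; this is forward sampling, not the authors' MCMC inference):

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(alpha, n_sticks):
    """Truncated stick-breaking weights of a Dirichlet process DP(alpha, G0)."""
    betas = rng.beta(1.0, alpha, n_sticks)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

alpha = 2.0
weights = stick_breaking(alpha, 100)          # mixture weights
atoms = rng.normal(0.0, 1.0, 100)             # atoms drawn from base measure G0

# Sample observations from the resulting mixture of normals; observations
# sharing an atom form a cluster, mimicking within-litter similarity
ks = rng.choice(100, size=500, p=weights / weights.sum())
obs = rng.normal(atoms[ks], 0.1)
```

A few sticks carry most of the weight, so the draws cluster around a small, data-determined number of atoms; the hierarchical version shares these atoms across litters.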


Weight Loss as a Safety Indicator in Rodents
Tina Lang, Issam Ben Khedhiri
Bayer AG, Germany

In preclinical research, the assessment of animal well-being is crucial to ensure ethical standards and compliance with guidelines. It is a difficult task to define rules within which well-being is deemed acceptable, and to decide when the suffering of an animal exceeds a tolerable burden so that it has to be sacrificed. Indicators include, for example, food refusal, listlessness and, most prominently, body weight.

For rodents, a popular rule states that an animal that experiences > 20% body weight loss exceeds the limits of tolerable suffering and has to be taken out of the experiment. However, research experiments are highly varied in nature (Talbot et al., 2020). A single absolute rule for all of them can lead to unnecessary deaths of laboratory animals that are still within reasonable limits of well-being but, for various reasons, fall below the body weight limit.

An additional challenge is posed by studies on juvenile rodents, which are still in their growth phase. Here, a weight loss might not be observable, but a reduced weight gain could indicate complications. As a solution, their weight gain is routinely compared to the mean weight gain of a control group. If the weight gain differs by more than a certain percentage, the animal is excluded from the experiment. With frequent weighing and small weight gains in the control group, this leads to the mathematically driven exclusion of animals that are fit and healthy.

We propose a different approach to safety monitoring that, first, unifies the assessment for juvenile and adult animals and, second, compensates for the differing conditions of different experiments.

If a reasonable control group can be kept within the study design, the body weight within the control group is assumed to be lognormally distributed. About 99.73% of all control animals are expected to lie within the interval of the mean log body weight plus or minus three standard deviations. We conclude that this interval contains the acceptable body weights. As the theoretical mean and standard deviation of the log body weight are unknown, we checked how their empirical counterparts perform.
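A minimal sketch of this rule (with invented control weights; the exact thresholds and handling in the authors' procedure may differ) computes the empirical interval on the log scale and checks a candidate animal against it:

```python
import numpy as np

def acceptable_log_weight_interval(control_weights):
    """Empirical mean +/- 3 SD interval of log body weight.

    Under lognormality, about 99.73% of healthy control animals
    are expected to fall inside this interval.
    """
    logw = np.log(control_weights)
    m, s = logw.mean(), logw.std(ddof=1)
    return m - 3 * s, m + 3 * s

def is_acceptable(weight, interval):
    lo, hi = interval
    return lo <= np.log(weight) <= hi

rng = np.random.default_rng(7)
# Invented control rat body weights in grams
control = rng.lognormal(mean=np.log(250.0), sigma=0.05, size=40)
interval = acceptable_log_weight_interval(control)
```

Equivalently, an animal is acceptable if its weight lies between `exp(lo)` and `exp(hi)`; because the interval is taken from the concurrent control group, it adapts to the conditions of each experiment rather than imposing one absolute percentage cut-off.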

We investigated whether the rule keeps all healthy animals in the study and excludes only suffering animals. Our data show that it outperforms the traditional rules by far: many animals that would have been excluded by the traditional rules can now stay in the study. Thus, the new rule supports animal welfare and also increases the power of the experiment.