Early bird or night owl – the influence of testing time on anxiety, exploration, and learning in mice and its potential for improving reproducibility
The reproducibility crisis highlights the need for improvement strategies throughout the life sciences. Within the realm of animal-based research, we demonstrated significant effects of the time of behavioural testing on different mouse strains, thus questioning current standardisation practices.
Why time matters
Mice mainly rest during the day and are highly active at night. For this reason, researchers worldwide (us included) regularly face the question of whether to subject mice to behavioural testing during their active or their resting phase, in other words, in the middle of the night or during normal working hours. In practice, the choice of testing time often favours practicability over the animals’ physiology and behaviour, leading researchers to conduct behavioural tests during the animals’ resting phase. That this decision might influence test results has been discussed in various publications, which indicate that behavioural target variables depend on the time of day at which testing took place (e.g., Chesler et al., 2002). The most likely reason for this phenomenon is the circadian rhythm, which not only governs sleep and wakefulness but also regulates other crucial systems, such as hunger, stress, heart function, alertness, and immunity. Despite this awareness, circadian effects are rarely taken into account in experimental design and statistical analysis, and are commonly not even reported in publications. This is all the more surprising given that, in modern, artificially lit, temperature- and humidity-controlled animal facilities, the testing time could easily be adjusted and controlled to harmonise the animals’ needs with the experimenters’ demands.
Furthermore, what does this mean for the comparability and reproducibility of behavioural data between experiments and laboratories? Do we need to fully control, and thus standardise, the testing time for each and every experiment all over the world to obtain reasonably similar findings? No. Besides being completely unrealistic, full control of the testing time would in fact create highly idiosyncratic data that hold true only for the specific conditions under which the experiment took place. In agricultural and human studies, in contrast to animal studies, it is well recognised that there will always be a certain amount of uncontrollable variation. Accordingly, a more heterogeneous approach has been suggested to increase the representativeness of study populations and hence the reproducibility of data. This is achieved by systematically increasing the variance of the factor in question, which in this case is the testing time. In theory, testing time can only serve as an effective heterogenisation factor if it interacts with the treatment under investigation. Ideally, systematically varying the testing time within an experiment should then lead to more robust results.
By day one way, by night another?
To our knowledge, only very limited data are available on possible interactions between testing time and treatment, and a systematic, hypothesis-driven approach with distinct, carefully chosen time windows was missing. Our aim was therefore to test for robust testing time effects in exactly such an approach. By simply using different laboratory mouse strains to mimic the effects of different ‘treatments’, we aimed to investigate whether potential effects of testing time and strain were additive or synergistic. Synergistic effects, i.e. interaction effects, would indicate that testing time affects the strains in different ways. This would lead to results that are highly dependent on the specific time of testing. The more an experiment is then standardised to a specific time window, the lower the chances of reproducing the findings under slightly different conditions. However, based on the recommendations of the Food and Drug Administration (FDA) and the European Agency for the Evaluation of Medicinal Products (EMEA), each study should be replicated at least once. The difficulty of reproducing spurious results then causes an increase in animal numbers. Public support for animal experiments, however, is granted on the explicit understanding that the research is relevant and produces valid and reproducible results using the smallest possible number of animals. The many contradicting results we found in the literature while searching for suitable strains clearly showed how researchers worldwide struggle to reproduce data.
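The distinction between additive and synergistic effects can be made concrete with a small sketch. The group means below are invented purely for illustration (they are not data from our study), and the names `strain_A`, `strain_B`, `time_1`, and `time_2` are hypothetical placeholders. The idea is simply that, under additivity, the strain difference is the same at every testing time, whereas an interaction shows up as a strain difference that changes with time.

```python
# Hypothetical group means (arbitrary units) for two strains at two testing times.
# All numbers are invented for illustration only; they are not study data.
additive = {
    ("strain_A", "time_1"): 10.0, ("strain_A", "time_2"): 14.0,
    ("strain_B", "time_1"): 6.0,  ("strain_B", "time_2"): 10.0,
}
interacting = {
    ("strain_A", "time_1"): 10.0, ("strain_A", "time_2"): 8.0,
    ("strain_B", "time_1"): 6.0,  ("strain_B", "time_2"): 9.0,
}


def strain_difference(means, time):
    """Strain A minus strain B at a given testing time."""
    return means[("strain_A", time)] - means[("strain_B", time)]


def interaction_contrast(means):
    """Difference between the strain differences at the two testing times.

    Zero means the effects of strain and time are purely additive;
    a non-zero value means the strain difference depends on testing time.
    """
    return strain_difference(means, "time_1") - strain_difference(means, "time_2")


for label, means in [("additive", additive), ("interacting", interacting)]:
    print(
        label,
        strain_difference(means, "time_1"),
        strain_difference(means, "time_2"),
        interaction_contrast(means),
    )
```

In the additive table the strain difference is identical at both times, so an experiment standardised to either time would reach the same conclusion. In the interacting table the difference even changes sign, so the conclusion depends entirely on when the animals were tested.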
Our idea was to use very common laboratory strains that, at the same time, differ in their anxiety-like and exploratory behaviours. C57BL/6J and DBA/2N mice emerged as the most suitable candidates because of the large number of contradicting results in the literature. In order to select time windows that are representative of increased, decreased, or intermediate activity, we recorded and analysed the home cage behaviour of all mice in every cage for a period of 48 hours. Setting up a camera system that would reliably record the movements of each mouse day and night without disturbing baseline activity was challenging, but the data obtained allowed us to pick three testing times across the dark and light phases, covering periods of minimal, intermediate, and maximal activity.
Why varying testing times may affect reproducibility
The effect of testing time on strain differences was unexpectedly large. During the light phase, i.e. the resting phase, the strains differed significantly in almost every parameter. A similar, slightly attenuated pattern was found during peak activity in the dark phase. However, a completely different picture emerged for the period of intermediate activity during the dark phase, where no differences between the strains could be detected. Descriptively, the direction of the difference between strains was even reversed.
Interestingly, the effects of time and strain were not additive. Time had a distinct, strain-specific influence that did not seem to depend on the light or dark phase or on home cage activity. It was precisely this interaction effect that made testing time an extremely interesting candidate for a subsequent simulation approach, in which we simulated the use of a “systematic heterogenisation across two testing times”. We then compared the results of these simulated heterogenised experiments with those of conventionally standardised replicate experiments that each used a single testing time, and observed drastically improved reproducibility in the heterogenised design.
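The logic behind this comparison can be illustrated with a toy simulation. Please note that this is not the simulation approach we used in the paper; it is a minimal sketch under invented assumptions. The "true" strain differences in `TRUE_DIFF`, the noise level `NOISE_SD`, and the group size `N_PER_STRAIN` are all hypothetical values chosen only to mimic a strain-by-time interaction. Each simulated replicate experiment is run either at one randomly chosen testing time (standardised) or split evenly across two randomly chosen testing times (heterogenised), and we then look at how much the estimated strain difference varies between replicates.

```python
import random
import statistics

# Hypothetical "true" strain differences (strain A minus strain B) at three
# testing times. Invented for illustration; not data from the study.
TRUE_DIFF = {"light": 10.0, "dark_intermediate": 0.0, "dark_peak": 8.0}
NOISE_SD = 3.0      # assumed between-animal standard deviation
N_PER_STRAIN = 12   # assumed number of animals per strain and experiment


def estimated_difference(times, rng):
    """Simulate one experiment whose animals are split evenly across `times`
    and return the estimated strain difference (mean A minus mean B)."""
    per_time = N_PER_STRAIN // len(times)
    strain_a, strain_b = [], []
    for t in times:
        for _ in range(per_time):
            strain_a.append(TRUE_DIFF[t] + rng.gauss(0.0, NOISE_SD))
            strain_b.append(rng.gauss(0.0, NOISE_SD))
    return statistics.mean(strain_a) - statistics.mean(strain_b)


def spread_between_replicates(n_times, n_replicates=2000, seed=1):
    """Standard deviation of the estimated strain difference across replicate
    experiments, each run at `n_times` randomly chosen testing times."""
    rng = random.Random(seed)
    estimates = [
        estimated_difference(rng.sample(sorted(TRUE_DIFF), n_times), rng)
        for _ in range(n_replicates)
    ]
    return statistics.stdev(estimates)


standardised = spread_between_replicates(n_times=1)
heterogenised = spread_between_replicates(n_times=2)
print(f"standardised: {standardised:.2f}, heterogenised: {heterogenised:.2f}")
```

Under these assumptions, replicates of the heterogenised design scatter less around the average strain difference than replicates of the standardised design, because each heterogenised experiment already averages over part of the time-dependent variation instead of capturing a single time window.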
These very promising findings warrant further investigation of the influence of testing time on other outcome variables and call for a proof-of-principle study using systematically heterogenised testing times.
Altogether, our data clearly demonstrate that time matters and that the testing time should definitely be taken into account in the experimental design and statistical analysis!
 Chesler, E. J., Wilson, S. G., Lariviere, W. R., Rodriguez-Zas, S. L., & Mogil, J. S. Identification and ranking of genetic and laboratory environment factors influencing a behavioral trait, thermal nociception, via computational analysis of a large data archive. Neuroscience & Biobehavioral Reviews, 2002; 26: 907-923.