Discovering human germ cell mutagens with whole genome sequencing: Insights from power calculations reveal the importance of controlling for between-family variability
- DOI
- Language of the publication
- English
- Date
- 2018-04-20
- Type
- Article
- Author(s)
- Webster, R.J.
- Williams, A.
- Marchetti, F.
- Yauk, C.L.
- Publisher
- Elsevier
Abstract
Mutations in germ cells pose potential genetic risks to offspring. However, de novo mutations are rare events that are spread across the genome and are difficult to detect. Thus, studies in this area have generally been under-powered, and no human germ cell mutagen has been identified. Whole Genome Sequencing (WGS) of human pedigrees has been proposed as an approach to overcome these technical and statistical challenges. WGS enables analysis of a much wider breadth of the genome than traditional approaches. Here, we performed power analyses to determine the feasibility of using WGS in human families to identify germ cell mutagens. Different statistical models were compared in the power analyses: ANOVA and multiple regression for one-child families, and mixed effect models sampling two to four siblings per family. Assumptions were based on parameters from the existing literature, such as the mutation-by-paternal age effect. We explored two scenarios: a constant effect due to an exposure that occurred in the past, and an accumulating effect where the exposure is continuing. Our analysis revealed the importance of modeling inter-family variability of the mutation-by-paternal age effect. Statistical power was improved by models accounting for the family-to-family variability. Our power analyses suggest that sufficient statistical power can be attained with 4–28 four-sibling families per treatment group, when the increase in mutations ranges from 40% to 10%, respectively. Modeling family variability using mixed effect models reduced the required sample size compared with a multiple regression approach. Much larger sample sizes were required to detect an interaction effect between environmental exposures and paternal age. These findings inform study design and statistical modeling approaches to improve power and reduce sequencing costs for future studies in this area.
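The one-child comparison described above (ANOVA versus multiple regression with paternal age as a covariate) can be illustrated with a small Monte Carlo power simulation. This is a minimal sketch, not the authors' code: the linear paternal-age model, the 20 to 45 year age range, the intercept, the mean slope and its between-family standard deviation are all illustrative assumptions rather than the parameters used in the study. Poisson counts are drawn with Knuth's method, and the exposure test uses a normal approximation in place of the t distribution. Each simulated family contributes one child whose expected count depends on a family-specific paternal-age slope, standing in for the between-family variability the abstract highlights. The mixed effect models for multi-sibling families are not reproduced here; they would need a dedicated fitting routine (for example statsmodels' `mixedlm`).

```python
import math
import random
from statistics import NormalDist, fmean


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson count with Knuth's multiplication method (fine for moderate lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def residualize(v, z=None):
    """Residuals of v after regressing it on an intercept and, if given, covariate z."""
    mv = fmean(v)
    if z is None:
        return [vi - mv for vi in v]
    mz = fmean(z)
    szz = sum((zi - mz) ** 2 for zi in z)
    slope = sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v)) / szz
    return [vi - mv - slope * (zi - mz) for zi, vi in zip(z, v)]


def exposure_z(y, exposed, age=None):
    """Wald statistic for the exposure coefficient via Frisch-Waugh-Lovell.

    age=None gives the two-group (one-way ANOVA) comparison; passing age gives
    the multiple regression y ~ exposed + age.
    """
    ex, ey = residualize(exposed, age), residualize(y, age)
    sxx = sum(e * e for e in ex)
    beta = sum(a * b for a, b in zip(ex, ey)) / sxx
    df = len(y) - (2 if age is None else 3)
    s2 = sum((b - beta * a) ** 2 for a, b in zip(ex, ey)) / df
    return beta / math.sqrt(s2 / sxx)


def power(n_per_group, effect, adjust_for_age, n_sims=300, alpha=0.05, seed=1,
          intercept=10.0, slope_mean=1.5, slope_sd=0.3):
    """Share of simulated one-child-per-family studies that detect the exposure.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approximation to t
    hits = 0
    for _ in range(n_sims):
        ages, counts, exposed = [], [], []
        for group in (0, 1):  # 0 = control families, 1 = exposed families
            for _ in range(n_per_group):
                age = rng.uniform(20.0, 45.0)  # hypothetical paternal age at conception
                # Family-specific paternal-age slope: the between-family variability.
                slope = max(0.0, rng.gauss(slope_mean, slope_sd))
                mean = (intercept + slope * age) * (1.0 + effect * group)
                ages.append(age)
                counts.append(poisson(rng, mean))
                exposed.append(float(group))
        z = exposure_z(counts, exposed, ages if adjust_for_age else None)
        if abs(z) > crit:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for adjust in (False, True):
        label = "regression (age-adjusted)" if adjust else "ANOVA (unadjusted)"
        print(label, power(30, 0.10, adjust_for_age=adjust))
```

Because both calls share a seed, they analyse identical simulated datasets, so any difference in estimated power comes from the model alone. Under these assumptions the age-adjusted regression should detect a 10% increase more often than the unadjusted two-group comparison, since removing the linear paternal-age trend shrinks the residual variance. The part of the family-to-family slope variation that is not explained by age stays in the residual; that is the variance multi-sibling designs with a family-level random effect aim to absorb.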
Plain language summary
Health Canada (HC) is responsible for assessing the health risks posed by chemical exposures. In support of its mandate, HC works to develop and optimize methods that can be used to identify and characterize chemically induced health effects. HC is pioneering the use of genomic technologies to determine if there are environmental exposures that cause genetic changes in parents that can be transmitted to their children. Each human cell contains a genome with a set of genetic instructions. If environmental exposures induce genetic changes (mutations) in sperm or eggs, these may be passed on to the child at conception. Offspring with more mutations have higher rates of genetic diseases. HC is exploring whether whole genome sequencing can be used to compare genetic information of father-mother-offspring trios to detect unexpected genetic changes. Before this work can be done, analyses are needed to determine the number of children and families required to detect statistically significant changes in mutations in an exposed group. Such analyses, termed power calculations, are also used to identify the most appropriate statistical model. In this study, the results showed that when both paternal age and between-family variability in mutation rates are statistically controlled for, the effects of an exposure can be detected with 4 to 28 families per treatment group, using a ‘generalised mixed model’. The most powerful models sampled four siblings per family. Sampling two siblings also controls for a modest amount of family variability, providing more power and smaller sample sizes than a study design that uses only one child per family. Overall, this study shows that genome sequencing is a feasible approach to identify environmental chemicals that affect heritable mutation rates. The results are being used to inform the experimental design for sequencing studies aiming to identify environmental exposures that cause heritable effects in humans.
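The link between residual variability and the number of families needed can be seen in the standard normal-approximation sample-size formula for comparing two group means, n = 2(z_{1-α/2} + z_{1-β})²σ²/δ², where δ is the difference in mean mutation counts to detect and σ is the residual standard deviation per family. This sketch is not the generalised mixed model used in the study and ignores paternal age; it only shows that any modeling step that shrinks σ, such as controlling for between-family variability, lowers the required sample size. The function name, the baseline of 60 de novo mutations per child and the two σ values are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def families_per_group(delta: float, sigma: float, alpha: float = 0.05,
                       power: float = 0.80) -> int:
    """Families per group for a two-sided, two-sample comparison of mean mutation counts.

    Normal-approximation formula: n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2,
    where delta is the mean difference to detect and sigma the residual SD per family
    (one child per family). The result is rounded up to a whole family.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)


if __name__ == "__main__":
    baseline = 60.0  # hypothetical mean de novo mutations per child
    for increase, sigma in [(0.10, 15.0), (0.10, 10.0), (0.40, 15.0)]:
        n = families_per_group(baseline * increase, sigma)
        print(f"{increase:.0%} increase, sigma={sigma}: {n} families per group")
```

With these hypothetical numbers, a 10% increase (δ = 6) needs 99 families per group at σ = 15 but 44 at σ = 10, while a 40% increase (δ = 24) at σ = 15 needs 7. For Poisson-like counts, σ cannot fall below roughly the square root of the mean, so modeling gains are bounded.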
Subject
- Health
- Health and safety