Nature | Vol 588 | 10 December 2020 | 277

Article

Multiple wheat genomes reveal global 
variation in modern breeding

Sean Walkowiak1,2,41, Liangliang Gao3,41, Cecile Monat4,41, Georg Haberer5,  
Mulualem T. Kassa6, Jemima Brinton7, Ricardo H. Ramirez-Gonzalez7, Markus C. Kolodziej8, 
Emily Delorean3, Dinushika Thambugala9, Valentyna Klymiuk1, Brook Byrns1,  
Heidrun Gundlach5, Venkat Bandi10, Jorge Nunez Siri10, Kirby Nilsen1,11, Catharine Aquino12, 
Axel Himmelbach4, Dario Copetti13,14, Tomohiro Ban15, Luca Venturini16, Michael Bevan7, 
Bernardo Clavijo17, Dal-Hoe Koo3, Jennifer Ens1, Krystalee Wiebe1, Amidou N’Diaye1,  
Allen K. Fritz3, Carl Gutwin10, Anne Fiebig4, Christine Fosker17, Bin Xiao Fu2,  
Gonzalo Garcia Accinelli17, Keith A. Gardner18, Nick Fradgley18, Juan Gutierrez-Gonzalez19, 
Gwyneth Halstead-Nussloch13, Masaomi Hatakeyama12,13, Chu Shin Koh20, Jasline Deek21, 
Alejandro C. Costamagna22, Pierre Fobert6, Darren Heavens17, Hiroyuki Kanamori23,  
Kanako Kawaura15, Fuminori Kobayashi23, Ksenia Krasileva17, Tony Kuo24,25, Neil McKenzie7, 
Kazuki Murata26, Yusuke Nabeka26, Timothy Paape13, Sudharsan Padmarasu4,  
Lawrence Percival-Alwyn18, Sateesh Kagale6, Uwe Scholz4, Jun Sese25,27, Philomin Juliana28, 
Ravi Singh28, Rie Shimizu-Inatsugi13, David Swarbreck17, James Cockram18, Hikmet Budak29, 
Toshiaki Tameshige15, Tsuyoshi Tanaka23, Hiroyuki Tsuji15, Jonathan Wright17, Jianzhong Wu23, 
Burkhard Steuernagel7, Ian Small30, Sylvie Cloutier31, Gabriel Keeble-Gagnère32,  
Gary Muehlbauer19, Josquin Tibbets32, Shuhei Nasuda26, Joanna Melonek30, Pierre J. Hucl1, 
Andrew G. Sharpe20, Matthew Clark16, Erik Legg33, Arvind Bharti33, Peter Langridge34, 
Anthony Hall17, Cristobal Uauy7, Martin Mascher4,35, Simon G. Krattinger8,36,  
Hirokazu Handa23,37, Kentaro K. Shimizu13,15, Assaf Distelfeld38, Ken Chalmers34,  
Beat Keller8, Klaus F. X. Mayer5,39, Jesse Poland3, Nils Stein4,40, Curt A. McCartney9 ✉, 
Manuel Spannagl5 ✉, Thomas Wicker8 ✉ & Curtis J. Pozniak1 ✉

Advances in genomics have expedited the improvement of several agriculturally 
important crops but similar efforts in wheat (Triticum spp.) have been more 
challenging. This is largely owing to the size and complexity of the wheat genome1, 
and the lack of genome-assembly data for multiple wheat lines2,3. Here we generated 
ten chromosome pseudomolecule and five scaffold assemblies of hexaploid wheat to 
explore the genomic diversity among wheat lines from global breeding programs. 
Comparative analysis revealed extensive structural rearrangements, introgressions 
from wild relatives and differences in gene content resulting from complex breeding 
histories aimed at improving adaptation to diverse environments, grain yield and 
quality, and resistance to stresses4,5. We provide examples outlining the utility of these 
genomes, including a detailed multi-genome-derived nucleotide-binding leucine-rich 
repeat protein repertoire involved in disease resistance and the characterization of 
Sm16, a gene associated with insect resistance. These genome assemblies will provide 
a basis for functional gene discovery and breeding to deliver the next generation of 
modern wheat cultivars.

Wheat is a staple food across all parts of the world and is one of the 
most widely grown and consumed crops7. As the human population 
continues to grow, wheat production must increase by more than 50% 
over current levels by 2050 to meet demand7. Efforts to increase wheat 
production may be aided by comprehensive genomic resources from 
global breeding programs to identify within-species allelic diversity and 
determine the best allele combinations to produce superior cultivars2,8.

Two species dominate current global wheat production: allotetra-
ploid (AABB) durum wheat (Triticum turgidum ssp. durum), which 
is used to make couscous and pasta9, and allohexaploid (AABBDD) 

bread wheat (Triticum aestivum), used for making bread and noodles. 
A, B and D in these designations correspond to separate subgenomes 
derived from three ancestral diploid species with similar but distinct 
genome structure and gene content that diverged between 2.5 and 
6 million years ago10. The large genome size (16 Gb for bread wheat), 
high sequence similarity between subgenomes and abundance of 
repetitive elements (about 85% of the genome) hampered early wheat 
genome-assembly efforts3. However, chromosome-level assemblies 
have recently become available for both tetraploid11,12 and hexaploid 
wheat1,13. Although these genome assemblies are valuable resources, 

https://doi.org/10.1038/s41586-020-2961-x

Received: 3 April 2020

Accepted: 9 September 2020

Published online: 25 November 2020

Open access

 Check for updates

*A list of affiliations appears at the end of the paper.

https://doi.org/10.1038/s41586-020-2961-x
http://crossmark.crossref.org/dialog/?doi=10.1038/s41586-020-2961-x&domain=pdf


278 | Nature | Vol 588 | 10 December 2020

Article
they do not fully capture within-species genomic variation that can be 
used for crop improvement, and comparative genome data from mul-
tiple individuals is still needed to expedite bread wheat research and 
breeding. Until now, comparative genomics of multiple bread wheat 
lines have been limited to exome-capture sequencing4,5,14, low-coverage 
sequencing2 and whole-genome scaffolded assemblies13,15–17. Here we 
report multiple reference-quality genome assemblies and explore 
genome variation that, owing to past breeder selection, differs greatly 
between bread wheat lines. These genome assemblies usher a new era 
for bread wheat and equip researchers and breeders with the tools 
needed to improve bread wheat and meet future food demands.

Global variation in wheat genomes
To expand on the genome assembly of wheat for Chinese Spring1, we 
generated ten reference-quality pseudomolecule assemblies (RQAs) 
and five scaffold-level assemblies of hexaploid wheat (Supplemen-
tary Note 1, Supplementary Tables 1–3). For each RQA, we performed 
de novo assembly of contigs (contig N50 > 48 kb) that were combined 
into scaffolds (N50 > 10 Mb) spanning more than 14.2 Gb (Supplemen-
tary Note 1). The completeness of the genomes was supported by a 
universal single-copy orthologue (BUSCO) analysis that identified more 
than 97% of the expected gene content in each genome (Supplemen-
tary Note 1). More than 94% of the scaffolds were ordered, oriented 
and curated using 10X Genomics linked reads and three-dimensional 
chromosome conformation capture sequencing (Hi-C) to generate 
21 pseudomolecules, as done previously for wheat1,12 and barley (Hor-
deum vulgare)18. The size and structure of the genomes were similar to 
that of Chinese Spring, and we observed high collinearity between the 
pseudomolecules (Extended Data Fig. 1). We also independently vali-
dated the scaffold placement and orientation in the pseudomolecule 
assembly of CDC Landmark by Oxford Nanopore long-read sequencing 
(Extended Data Fig. 2a, Supplementary Note 2). To complement the 
RQAs, we generated scaffold-level assemblies of five additional bread 
wheat lines (Supplementary Note 1). To determine the global context 
of the 15 assemblies, we combined our data with existing datasets4,5,19 
(Fig. 1a, Supplementary Table 4). The genetic relationships were in 
agreement with those reported in previous studies4,5 and reflected 
pedigree, geographical location and growth habit (that is, spring ver-
sus winter type). There was also a clear separation between the newly 
assembled genomes and Chinese Spring, supporting that they capture 
geographical and historical variation not represented in the Chinese 
Spring assembly.

Polyploidy and CNV drive gene diversification
Single-nucleotide polymorphisms (SNPs), insertions or deletions 
(indels), presence/absence variation (PAV) and gene copy number varia-
tion (CNV) influence agronomically important traits. This is particularly 
true for polyploid species such as wheat, in which gene redundancy 
can buffer the effect of genome variation17. To assess gene content, we 
projected around 107,000 high-confidence gene models from Chinese 
Spring1 onto the RQAs (Supplementary Note 3). The total number of 
projected genes exhibited a narrow range, between 118,734 and 120,967 
(Supplementary Table 5). We identified orthologous groups among 
projected genes and used the alignment of the orthologous groups to 
examine SNPs in coding sequences (Supplementary Note 3). The peak 
positions of nucleotide diversity across the three subgenomes were 
highly similar to those reported in previous studies20, supporting a 
strong representation of breeding diversity within the RQAs (Extended 
Data Fig. 3a, b). The correlation of synonymous nucleotide diversity π 
(r = 0.11–0.29) and Tajima’s D (r = 0.02–0.06) between homeologues 
was low (Supplementary Tables 6–8). This suggested that polyploidiza-
tion increased the number of targets of selection and contributed to 
broad adaptation of bread wheat, as in wild polyploid plant species20–22. 

Further investigation of orthologous groups indicated that 88.1% were 
unambiguous (clusters containing at most one member in each cultivar) 
(Extended Data Fig. 3c, Supplementary Table 5). Orthologous groups 
comprising exactly one gene in each line (‘complete’) were the most 
frequent (approximately 73.5% of genes per cultivar), suggesting strong 
retention of orthologous genes within the ten RQAs. The residual genes 
represented either singleton genes with no reciprocal best BLAST hits 
or genes located in complex clusters in at least one cultivar. Roughly 
12% of genes showed PAVs, and their clustering resulted in relationships 
(Fig. 1b) that were consistent with SNP-based phylogenetic similarities 
(Fig. 1a). In addition, approximately 26% of the projected genes were 
found in tandem duplications, indicating that CNV is a strong contribu-
tor of genetic variation in wheat.

To provide an example of gene expansion on emerging breeding 
targets, we performed a more detailed analysis of the restorer of fertil-
ity (Rf) gene families (Supplementary Note 4). Rf genes are involved in 
restoring pollen fertility in hybrid breeding programs23, and we iden-
tified a previously undescribed clade within the mitochondrial tran-
scription termination factor (mTERF) family (Supplementary Table 9), 
which has recently been implicated in fertility restoration in barley24. Of 
note, this clade shows evolutionary patterns similar to those of Rf-like 
pentatricopeptide repeat (PPR) proteins, representatives of which 
are associated with Rf3, a major locus used in hybrid wheat breeding 
programs (Extended Data Fig. 4). Although wheat is currently not a 
hybrid crop, there is substantial interest in Rf genes and their potential 
application in hybrid wheat production systems25. To our knowledge, 
no Rf genes have been cloned in wheat and our analysis of Rf genes in 
multiple RQAs and identification of an Rf clade in wheat is an impor-
tant step forward in tackling the challenges of hybrid wheat breeding.

The wheat NLR repertoire
To further exemplify the use of multi-genome comparisons for char-
acterizing agronomically relevant gene families, we examined gene 
expansion in nucleotide-binding leucine-rich repeat (NLR) proteins, 
which are major components of the innate immune system and are 
often causal genes for disease resistance in plants26,27. We performed 
de novo annotation of loci that contain conserved NLR motifs (NB-ARC–
leucine-rich repeat) and identified around 2,500 loci with NLR signa-
tures in each RQA (Supplementary Tables 10, 11). A redundancy analysis 
showed that only 31–34% of the NLR signatures are shared across all 
genomes, and the number of unique signatures ranged from 22 to 192 
per wheat cultivar. We estimated the number of unique NLR signatures 
that can be detected by incrementally adding more wheat genomes to 
the dataset; this revealed that 90% of the NLR complement is reached 
at between 8 (considering 95% sequence identity) and 11 wheat lines 
(considering 100% protein sequence identity) (Fig. 1c). The total NLR 
complement of all wheat lines consisted of 5,905 (98% identity) to 7,780 
(100% identity) unique NLR signatures, highlighting the size and com-
plexity of the repertoire of receptors involved in disease resistance.

Transposon signatures identify introgressions
Transposable elements make up a large majority of the wheat genome 
and have a critical role in genome structure and gene regulation. We 
characterized the overall transposable element content (81.6%) and its 
composition (69% long terminal-repeat retrotransposons (LTR) and 
12.5% DNA transposons) in the RQAs (Supplementary Table 5). Across all 
RQAs, we annotated 1.22 × 106 full length (fl)-LTRs, which clustered lines 
into the same groups we observed from our analysis of PAV and SNPs 
(Fig. 1a, b, Extended Data Fig. 3d). Generally, unique fl-LTRs (147,450) 
were young (median of 0.9 million years) and were enriched in the 
highly recombining, more distal chromosomal regions (Fig. 1d). By con-
trast, shared fl-LTRs were older (median of 1.3 million years) and were 
more evenly distributed across the pericentric regions (Fig. 1d). The 


Nature | Vol 588 | 10 December 2020 | 279

RLC-Angela fl-LTRs were the most abundant (21,000–27,000 full-length 
copies per genome) and analysis of variant patterns identified several 
chromosomal segments that contained numerous unique or rare ret-
rotransposon insertions (Extended Data Fig. 5), which, on the basis 
of breeding history, we hypothesize to represent introgressions. For 
example, the LongReach Lancer RQA revealed two unique regions, a 
pericentric region on chromosome 2B and a segment on the end of 
chromosome 3D (Fig. 2a, b), both of which affect chromosome length 
(Extended Data Fig. 5). We used pedigree analysis to postulate the 
source of the introgressions and performed whole-genome sequenc-
ing of multiple accessions of putative donors. LongReach Lancer carries 
the stem rust resistance gene Sr36, derived from an introgression from 
Triticum timopheevii, and the resistance genes Lr24 (leaf rust) and Sr24 
(stem rust), derived from tall wheatgrass28,29 (Thinopyrum ponticum). 
We generated whole-genome sequence reads from multiple T. ponticum 
and T. timopheevii accessions (Supplementary Table 12) and alignment 
to the LongReach Lancer RQA confirmed a T. ponticum introgression 
spanning a region of approximately 60 Mb of chromosome 3D (Fig. 2a), 
whereas T. timopheevii aligned to the majority (427 Mb) of chromo-
some 2B (Fig. 2b). Overall, we identified 341 chromosomal segments 
larger than 20 Mb with unique or rare fl-LTR insertion patterns that 
were present in only 1 to 4 of the RQA genomes, of which 273 insertion 

patterns were uniquely associated with a single genome (Supplemen-
tary Tables 13–16). The majority of unique regions were in PI190962 
(spelt wheat; Triticum aestivum ssp. spelta), which was expected, given 
that it diverged from modern bread wheat several thousand years ago.

A similar strategy was used to confirm RLC-Angela variation at the 
telomeric region of chromosome 2A in Jagger, Mace, SY Mattis and 
CDC Stanley (Fig. 2c), which corresponds to the 2NvS introgression 
from Aegilops ventricosa (Supplementary Note 5). This introgression 
is a well-known source of resistance to wheat blast30, and contains the 
Lr37–Yr17–Sr38 gene cluster, which provides resistance to several rust 
diseases31. Sequencing of A. ventricosa accessions (Supplementary 
Table 12) followed by comparison of chromosomes with the RQAs con-
firmed that Jagger, Mace, SY Mattis and CDC Stanley carry the 2NvS 
introgression, which spans about 33 Mb on chromosome 2A (Fig. 2c, 
Extended Data Fig. 6a). We annotated the coding genes within this 
region and identified 535 high-confidence genes; more than 10% were 
predicted to be associated with disease resistance, including genes that 
encode putative NB-ARC and NLRs (Extended Data Fig. 6b, Supplemen-
tary Tables 17, 18). Furthermore, we used genotyping by sequencing to 
detect the 2NvS segment in three wheat panels and discovered that its 
frequency has been increasing in breeding germplasm and its pres-
ence is consistently associated with higher grain yield (Extended Data 

Unique

2

3

4

5

6

7

8

9

10

11

LTR-retrotransposon density

Low                      

In
se

rt
io

n 
tim

e 
(M

yr
)

Chromosomal location (% length)
0

P
rin

ci
p

al
 c

om
p

on
en

t 
2

Principal component 1
U

ni
q

ue
 N

LR
s

2,000

5,000

8,000

0              

Number of lines

100%

98%

95%

Chinese Spring

Norin 61

Jagger

Julius

ArinaLrFor

SY Mattis

LongReach Lancer

PI190962 (spelt wheat)

CDC Stanley

CDC Landmark

Mace

Similarity in PAV

                          92

2
0
2
0
2
0
2
0
2
0
2
0
2
0
2
0
2
0
2
0
2
0

Norin 61

Chinese
Spring

Mace

LongReach Lancer

Robigus Julius

SY Mattis

Claire

ArinaLrFor

Paragon

Cadenza

Jagger

CDC Stanley

CDC Landmark

Weebill 1

PI190962
(spelt wheat)

Sequence
identity

a b

c d

5             10            15

  High

N
um

b
er

 o
f l

in
es

100755025

88

Fig. 1 | Patterns of variation in the wheat genome. a, Principal component 
analysis of polymorphisms from exome-capture sequencing of about 1,200 
lines (grey markers), 16 lines from whole-genome shotgun resequencing 
(orange markers) and our new assemblies (black markers). Text colours reflect 
different geographical locations and winter or spring growth. b, Dendrogram 
of pairwise Jaccard similarities for gene PAV between all RQA assemblies.  
c, Number of unique NLRs at different per cent identity cut-offs as the number 

of genomes increases. Dashed vertical lines represent 90% of the NLR 
complement. Markers indicate the mean values of all permutations of the order 
of adding genomes. Whiskers show maximum and minimum values based on 
one million random permutations. d, Chromosomal location versus insertion 
age distribution of unique to (reading downward) increasingly shared syntenic 
full-length LTR retrotransposons.


280 | Nature | Vol 588 | 10 December 2020

Article

Fig. 6c, d, Supplementary Tables 19, 20). Of note, we identified about 
60 genes belonging to the cytochrome P450 superfamily, which have 
been implicated in abiotic and biotic stress tolerance32 and have been 
functionally validated to influence grain yield in wheat33. Together, 
these data indicate that the modern wheat gene pool contains many 
chromosomal segments of diverse ancestral origins, which can be iden-
tified by their transposable-element signatures. We also confirmed the 
wild-relative origins of three introgressions within the RQA assemblies—
a first step towards characterizing causal genes for breeding targets, 
such as resistance to wheat blast and rust fungi.

Centromere dynamics
Centromeres are vital for cell division and chromosome pairing during 
meiosis. In plants, functional centromeres are defined by the epige-
netic placement of the modified histone CENH334. We therefore used 

CENH3 chromatin immunoprecipitation and sequencing (ChIP–seq)35 
to determine the positions and sizes (about 7.5–9.6 Mb) of the cen-
tromeres for each RQA (Supplementary Tables 21, 22), which were 
consistent with previous estimates for wheat1. Furthermore, all chro-
mosomes showed a single active site, implying that previous reports 
of multiple active centromeres in Chinese Spring1 were artefacts of 
misoriented scaffolds. However, we found examples in which the rela-
tive position of the centromere was shifted owing to several pericentric 
inversions, including inversions on chromosomes 4B and 5B (Extended 
Data Fig. 7a, b). We also observed one instance in which the centro-
meric position changed, but was not associated with a structural event. 
Specifically, on chromosome 4D in Chinese Spring, the centromere is 
shifted by around 25 Mb relative to the consensus position (Fig. 2d). 
This shift was previously recognized by cytology but was hypothesized 
to result from a pericentric inversion36. However, the high degree of 
collinearity between genomes supports the hypothesis that Cen4D in 

a e

b

c f

d g

Jagger chromosome 2A

LongReach Lancer chromosome 2B

LongReach Lancer chromosome 3D

T. timopheevii 

T. ponticum 

A. ventricosa

i

ii
iii
iv

i

ii
iii

iv

i

ii
iii

iv

No. of lines

RLC_Angela

1    2   3   4

Read depth

Min.                Max.

Ju
liu

s 
ch

ro
m

os
om

e 
4D

p
os

iti
on

 (M
b

)

Centromere
shift

Chinese Spring chromosome 4D
position (Mb)

0      100     200     300     400     500

0

100

200

300

400

500

Chinese Spring (5B/7B non-carrier)

ArinaLrFor (5B/7B carrier)

762                              325  305                              1

1           166 222                                       737

1                                 428 480                                        993

1            157 174                            488

7BL 7BS

5BS                               5BL

Translocation breakpoint

7BL                                         5BL

5BS                       7BS

SY Mattis cytology (5B/7B carrier)

7BL/5BL

7BS/5BS

10 μm

6B

3A

2B

7A
7D 7B

1B

3D

2D

5D

3B

4D

C
hr

om
os

om
e 

7B
 (M

b
)

Chromosome 5B (Mb)
100  120   140  160  180  200

SY Mattis Hi-C
(5B/7B carrier)

Norin 61 Hi-C
(5B/7B non-carrier)

400

350

300

250

200
100  120   140  160  180  200

Fig. 2 | Introgressions and large-scale structural variation in wheat.  
a–c, T. ponticum introgression on chromosome 3D in LongReach Lancer (a),  
T. timopheevi introgression on chromosome 2B in LongReach Lancer (b) and  
A. ventricosa introgression on chromosome 3D in Jagger (c). Track i, map of 
polymorphic RLC-Angela retrotransposon insertions (legend at bottom); track 
ii, density of projected gene annotations from Chinese Spring (blue bars, 
scaled to maximum value); track iii, per cent identity to Chinese Spring based 
on chromosome alignment (yellow; scale is 0–100%); track iv, read depth of 

wheat wild relatives (blue–yellow heat map; legend at bottom). d, Dot plot 
alignment showing chromosome-level collinearity (black) with relative density 
of CENH3 ChIP–seq mapped to 100-kb bins for Chinese Spring (blue) and Julius 
(red); the arrow indicates a centromere shift. e, Robertsonian translocation 
between chromosomes 5B and 7B in ArinaLrFor. f, g, Cytology (f) and Hi-C (g) 
confirm the 5B/7B translocation in SY Mattis (left) compared with the 
non-carrier Norin 61 (right). In f, five independent cells were observed; the 
translocation was confirmed independently ten times. Scale bar, 10 μm.


Nature | Vol 588 | 10 December 2020 | 281

Chinese Spring has shifted to a non-homologous position; this shifting 
of centromeres to non-homologous sites has also been reported in 
maize37. By characterizing the centromere positions for these diverse 
wheat lines, we provide strong evidence for changes in centromere 
position caused by structural rearrangements and centromere shifts.

Large-scale structural variation between genomes
Structural variants are common in wheat38, and impact genome struc-
ture and gene content. We characterized large structural variants 
using pairwise genome alignments (Extended Data Fig. 1), changes in 
three-dimensional topology of chromosomes revealed by Hi-C confor-
mation capture directionality biases along the genome39,40 (Extended 
Data Fig. 8, Supplementary Table 23), which were confirmed by Oxford 
Nanopore long-read sequencing (Extended Data Fig. 2) and cytological 
karyotyping (Extended Data Fig. 7c, Supplementary Table 24, Sup-
plementary Note 6). The most prominent event was a translocation 
between chromosomes 5B and 7B, observed in ArinaLrFor, SY Mattis 
(Fig. 2e–g) and Claire. Normally, chromosomes 5B and 7B are approxi-
mately 737 and 762 Mb long, respectively, and we estimated that the 
recombined chromosomes are 488 Mb (5BS/7BS) and 993 Mb (7BL/5BL) 
long, making 7BL/5BL the largest wheat chromosome (Extended Data 
Fig. 9a). In ArinaLrFor and SY Mattis, the 7BL/5BL breakpoint resides 
within an approximately 5-kb GAA microsatellite, which we were 
able to span using polymerase chain reaction (PCR) (Extended Data 
Fig. 9b, c). By contrast, the breakpoint on 5BS/7BS was less syntenic, 
and we detected polymorphic fluorescence in situ hybridization signals 
between ArinaLrFor and SY Mattis on the 5BS portion of the translo-
cated chromosome segment, suggesting that the regions adjacent to 
the translocation events differ on 5BS/7BS (Supplementary Note 6). 
To determine the stability of the translocation in breeding, we geno-
typed for the translocation event in a panel of 538 wheat lines that 

represent most of the UK wheat gene pool grown since the 1920s41. 
The translocation occurred in 66% of the lines and was selectively neu-
tral (Supplementary Note 7). Notably, the Ph1 locus on chromosome 
5B, which controls the pairing of homeologous chromosomes during 
meiosis42, is near the translocation breakpoint, but remained highly 
syntenic between translocation carriers and non-carriers. Genetic 
mapping and analysis of short-read sequencing data indicated that 
the 5B/7B translocated chromosomes recombine freely with 5B and 7B 
chromosomes (Extended Data Fig. 9d), suggesting that chromosome 
pairing is not affected by the translocation.

Haplotype-based gene mapping
To develop improved wheat cultivars, breeders shuffle allelic vari-
ants by making targeted crosses and exploiting the recombination 
that occurs during meiosis. These alleles, however, are not inherited 
independently, but rather as haplotype blocks that often extend 
across multiple genes that are in genetic linkage43,44. We quantified 
haplotype variation along chromosomes across the assemblies, and 
developed visualization software to support its utility (Supplemen-
tary Note 8). We used these haplotypes to characterize a locus that 
provides resistance to the orange wheat blossom midge (OWBM, Sito-
diplosis mosellana Géhin), one of the most damaging insect pests of 
wheat, which is endemic in Europe, North America, west Asia and the 
Far East. Upon hatching, the first-instar larvae feed on the developing 
grains and damage the kernels (Fig. 3a). Sm1 is the only gene in wheat 
known to provide resistance to OWBM6. CDC Landmark, Robigus and 
Paragon are all resistant to the OWBM, and all three carry the same 
7.3-Mb haplotype within the Sm1 locus on chromosome 2B (Fig. 3b). 
To identify Sm1 gene candidates, we used high-resolution genetic 
mapping and refined the locus to a 587-kb interval in the CDC Land-
mark RQA (Fig. 3c, Extended Data Fig. 10a, Supplementary Table 25). 

Sm1

800 Mb

25 Mb

Sm1

15
.7

 M
b

17
.0

 M
b

15
.2

 M
b

15
.7

 M
b

1  Sm1 carrier

2  Sm1 non-carrier

3  Sm1 non-carrier

Haplotypes

Chinese Spring
(Sm1 non-carrier)

CDC Landmark
(Sm1 carrier)

Alternative haplotype         

1  Sm1 carrier

2  Sm1 non-carrier

3  Sm1 non-carrier
      (that is, Waskada)

G
1

8
2R

W
98

*

MSP

ca b CDC Landmark
Paragon
Robigus

Mace
Claire

CDC Stanley
Chinese Spring

Weebill 1
Norin 61
Cadenza

Julius
ArinaLrFor

LongReach Lancer
SY Mattis

Jagger

CDC Landmark
Paragon
Robigus

Mace
Claire

CDC Stanley
Chinese Spring

Weebill 1
Norin 61
Cadenza

Julius
ArinaLrFor

LongReach Lancer
SY Mattis

Jagger

0 Mb

5 Mb

NB-ARC LRR S/T kinase

Transmembrane Mutations

A
d

ul
t

La
rv

ae
H

ea
lth

y
D

am
ag

ed

Fig. 3 | Cloning of the gene Sm1. a, The orange wheat blossom midge oviposits 
eggs on wheat spikes and the larvae feed on developing wheat grains, resulting 
in moderate to severe damage to mature kernels. b, Top, sections of 
chromosome 2B of the same colour in the same position share haplotypes 
(based on 5-Mb bins), with the exception of those in grey, which indicates a 
line-specific haplotype. The position of Sm1 is indicated with respect to the 
CDC Landmark assembly. Bottom, zoomed-in view of haplotype blocks (based 
on 250-kb bins) from 5 to 25 Mb positions on chromosome 2B, surrounding 
Sm1. CDC Landmark, Robigus and Paragon all carry the same haplotype 

surrounding Sm1 (teal). c, Top, anchoring of the Sm1 fine map to the physical 
maps of Chinese Spring and CDC Landmark and graphical genotypes of three 
haplotypes critical to localizing the Sm1 candidate gene. Bottom, annotation of 
the Sm1 candidate gene, which encodes NB-ARC and LRR motifs in addition to 
the integrated serine/threonine (S/T) kinase and MSP domains. Two 
independent ethyl-methanesulfonate-induced mutations (W98* and G182R) 
result in loss of function and susceptibility to the orange wheat blossom midge 
(light blue lines). An alternative haplotype was observed in the kinase region of 
Waskada (black).


282 | Nature | Vol 588 | 10 December 2020

Article
Through extensive genotyping of diverse breeding lines, we found an 
OWBM-susceptible line, Waskada, that displayed a resistant haplo-
type except near one gene, which we annotated in CDC Landmark to 
encode a canonical NLR with kinase and major sperm protein (MSP) 
integrated domains (Fig. 3c). Oxford Nanopore long-read sequenc-
ing further confirmed the structure of the gene in CDC Landmark 
(Extended Data Fig. 10b). By contrast, the remaining assemblies (sus-
ceptible to OWBM) lacked the NB-ARC domain, but the kinase and MSP 
domains remained intact (Fig. 3c). We sequenced the Waskada allele 
and found it contains the NB-ARC domain, but an alternative haplotype 
within the kinase domain (Fig. 3c, Extended Data Fig. 10c). This gene 
is expressed in wheat kernels and seedlings of Sm1 carrier lines, and 
the lack of cDNA amplification of the NB-ARC domain for non-carrier 
lines further supported an alternative gene structure (Extended Data 
Fig. 10c). We generated two knockout-mutant lines of this candidate 
gene in the Sm1 carrier line Unity45, and both were consistently rated 
as susceptible to OWBM (Supplementary Table 26). Sequencing of the 
candidate gene in these two mutants revealed a single point mutation 
in each line: a G>A mutation resulting in a Gly>Arg (G182R) amino acid 
substitution in the NB-ARC domain, and a G>A mutation, resulting in 
a stop codon (W98*) before the NB-ARC domain (Fig. 3c). The kinase 
domain encoded by Sm1 belongs to the serine/threonine class46, similar 
to those of Rpg5, which provides stem rust resistance47, and Tsn1, which 
encodes sensitivity to the necrotrophic effector ToxA produced by Par-
astagonospora nodorum and Pyrenophora tritici-repentis48; however, 
both Rpg5 and Tsn1 lack the MSP domain. To our knowledge, this is the 
first report of an NB-ARC-LRR-kinase-MSP coding gene associated with 
insect resistance. Additional research is needed to functionally validate 
these domains and their putative role in OWBM resistance using tools 
such as gene editing. Nevertheless, we developed a high-throughput 
and low-cost competitive allele-specific PCR marker (KASP) that dis-
criminates between OWBM-susceptible and OWBM-resistant lines with 
perfect accuracy (Extended Data Fig. 10d, Supplementary Table 27). 
Our analyses, along with the haplotype and synteny viewers (https://
kiranbandi.github.io/10wheatgenomes/, http://10wheatgenomes.
plantinformatics.io/ and http://www.crop-haplotypes.com/), laid 
the foundation for identifying haplotypes for Sm1. Haplotypes can 
now be genotyped in breeding programs using single-marker or 
high-throughput-sequencing-based approaches, which can integrate 
desirable genes into improved cultivars more efficiently.

Discussion
We have built on the genome-sequence resources available for wheat 
and related species to produce ten RQAs and five scaffolded assem-
blies that represent hexaploid wheat lines from different regions, 
growth habits and breeding programs1,11,12,18,20,49. We have identified 
and characterized SNPs, PAV, CNV, centromere shifts, large-scale 
structural variants and introgressions from wild relatives of wheat 
that can be used to identify and characterize important breeding 
targets. This was complemented by a transposable-element-analysis 
approach to identify candidate introgressions from wild relatives of 
wheat, for which we provided high-quality assemblies of segments 
already used in global breeding programs. Together, these RQAs 
present an opportunity for breeders and researchers to perform 
high-resolution manipulation of genomic segments and pave the 
way to identifying genes responsible for in-demand traits, as we 
demonstrated for resistance to the insect pest OWBM. Functional 
gene studies will also be facilitated by comparative gene analyses, 
as exemplified by our analyses of orthologous groups, Rf genes and 
NLR immune receptors26. Finally, we highlight haplotype blocks, 
which will facilitate marker development for applied breeding43,50. 
Equipped with multiple layers of data describing variation in wheat, 
we now have powerful tools to increase the rate of wheat improve-
ment to meet future food demands.

Online content
Any methods, additional references, Nature Research reporting sum-
maries, source data, extended data, supplementary information, 
acknowledgements, peer review information; details of author con-
tributions and competing interests; and statements of data and code 
availability are available at https://doi.org/10.1038/s41586-020-2961-x.

1. The International Wheat Genome Sequencing Consortium. Shifting the limits in wheat 
research and breeding using a fully annotated reference genome. Science 361, eaar7191 
(2018).

2. Montenegro, J. D. et al. The pangenome of hexaploid bread wheat. Plant J. 90, 1007–1013 
(2017).

3. International Wheat Genome Sequencing Consortium (IWGSC). A chromosome-based 
draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 
1251788 (2014).

4. He, F. et al. Exome sequencing highlights the role of wild-relative introgression in shaping 
the adaptive landscape of the wheat genome. Nat. Genet. 51, 896–904 (2019); correction 
51, 1194 (2019).

5. Pont, C. et al. Tracing the ancestry of modern bread wheats. Nat. Genet. 51, 905–911 
(2019).

6. Kassa, M. T. et al. A saturated SNP linkage map for the orange wheat blossom midge 
resistance gene Sm1. Theor. Appl. Genet. 129, 1507–1517 (2016).

7. Tadesse, W. et al. Genetic gains in wheat breeding and its role in feeding the world. Crop 
Breed. Genet. Genom. 1, e190005 (2019).

8. Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in 
cultivated and wild rice. Nat. Genet. 50, 278–284 (2018).

9. Dubcovsky, J. & Dvorak, J. Genome plasticity a key factor in the success of polyploid 
wheat under domestication. Science 316, 1862–1866 (2007).

10. Marcussen, T. et al. Ancient hybridizations among the ancestral genomes of bread wheat. 
Science 345, 1250092 (2014).

11. Avni, R. et al. Wild emmer genome architecture and diversity elucidate wheat evolution 
and domestication. Science 357, 93–97 (2017).

12. Maccaferri, M. et al. Durum wheat genome highlights past domestication signatures and 
future improvement targets. Nat. Genet. 51, 885–895 (2019).

13. Zimin, A. V. et al. The first near-complete assembly of the hexaploid bread wheat 
genome, Triticum aestivum. Gigascience 6, 1–7 (2017).

14. Winfield, M. O. et al. Targeted re-sequencing of the allohexaploid wheat exome. Plant 
Biotechnol. J. 10, 733–742 (2012).

15. Arora, D., Gross, T. & Brueggeman, R. Allele characterization of genes required for 
rpg4-mediated wheat stem rust resistance identifies Rpg5 as the R gene. Phytopathology 
103, 1153–1161 (2013).

16. Adamski, N. M. et al. A roadmap for gene functional characterisation in crops with large 
genomes: lessons from polyploid wheat. eLife 9, e55646 (2020).

17. Uauy, C. Wheat genomics comes of age. Curr. Opin. Plant Biol. 36, 142–148 (2017).
18. Mascher, M. et al. A chromosome conformation capture ordered sequence of the barley 

genome. Nature 544, 427–433 (2017).
19. Edwards, D. et al. Bread matters: a national initiative to profile the genetic diversity of 

Australian wheat. Plant Biotechnol. J. 10, 703–708 (2012).
20. Jordan, K. W. et al. A haplotype map of allohexaploid wheat reveals distinct patterns of 

selection on homoeologous genomes. Genome Biol. 16, 48 (2015).
21. Paape, T. et al. Patterns of polymorphism and selection in the subgenomes of the 

allopolyploid Arabidopsis kamchatica. Nat. Commun. 9, 3909 (2018).
22. Paape, T. et al. Conserved but attenuated parental gene expression in allopolyploids: 

Constitutive zinc hyperaccumulation in the allotetraploid Arabidopsis kamchatica. Mol. 
Biol. Evol. 33, 2781–2800 (2016).

23. Melonek, J., Stone, J. D. & Small, I. Evolutionary plasticity of restorer-of-fertility-like 
proteins in rice. Sci. Rep. 6, 35152 (2016).

24. Bernhard, T., Koch, M., Snowdon, R. J., Friedt, W. & Wittkop, B. Undesired fertility 
restoration in msm1 barley associates with two mTERF genes. Theor. Appl. Genet. 132, 
1335–1350 (2019).

25. Whitford, R. et al. Hybrid breeding in wheat: technologies to improve hybrid wheat seed 
production. J. Exp. Bot. 64, 5411–5428 (2013).

26. Keller, B., Wicker, T. & Krattinger, S. G. Advances in wheat and pathogen genomics: 
Implications for disease control. Annu. Rev. Phytopathol. 56, 67–87 (2018).

27. Steuernagel, B. et al. Rapid cloning of disease-resistance genes in plants using 
mutagenesis and sequence capture. Nat. Biotechnol. 34, 652–655 (2016).

28. Bariana, H. S. et al. Mapping of durable adult plant and seedling resistances to stripe rust 
and stem rust diseases in wheat. Aust. J. Agric. Res. 52, 1247–1255 (2001).

29. Chemayek, B. et al. Tight repulsion linkage between Sr36 and Sr39 was revealed by 
genetic, cytogenetic and molecular analyses. Theor. Appl. Genet. 130, 587–595 
(2017).

30. Cruz, C. D. et al. The 2NS translocation from Aegilops ventricosa confers resistance to the 
Triticum pathotype of Magnaporthe oryzae. Crop Sci. 56, 990–1000 (2016).

31. Helguera, M. et al. PCR assays for the Lr37-Yr17-Sr38 cluster of rust resistance genes and 
their use to develop isogenic hard red spring wheat lines. Crop Sci. 43, 1839–1847 (2003).

32. Li, Y. & Wei, K. Comparative functional genomics analysis of cytochrome P450 gene 
superfamily in wheat and maize. BMC Plant Biol. 20, 93 (2020).

33. Gunupuru, L. R. et al. A wheat cytochrome P450 enhances both resistance to 
deoxynivalenol and grain yield. PLoS ONE 13, e0204992 (2018).

34. Li, B. et al. Wheat centromeric retrotransposons: the new ones take a major role in 
centromeric structure. Plant J. 73, 952–965 (2013).

35. Gent, J. I., Wang, K., Jiang, J. & Dawe, R. K. Stable patterns of CENH3 occupancy through 
maize lineages containing genetically similar centromeres. Genetics 200, 1105–1116 (2015).

https://kiranbandi.github.io/10wheatgenomes/
https://kiranbandi.github.io/10wheatgenomes/
http://10wheatgenomes.plantinformatics.io/
http://10wheatgenomes.plantinformatics.io/
http://www.crop-haplotypes.com/
https://doi.org/10.1038/s41586-020-2961-x


Nature | Vol 588 | 10 December 2020 | 283

36. Koo, D. H., Sehgal, S. K., Friebe, B. & Gill, B. S. Structure and stability of telocentric 
chromosomes in wheat. PLoS ONE 10, e0137747 (2015).

37. Schneider, K. L., Xie, Z., Wolfgruber, T. K. & Presting, G. G. Inbreeding drives maize 
centromere evolution. Proc. Natl Acad. Sci. USA 113, E987–E996 (2016).

38. Saxena, R. K., Edwards, D. & Varshney, R. K. Structural variations in plant genomes. Brief. 
Funct. Genomics 13, 296–307 (2014).

39. Harewood, L. et al. Hi-C as a tool for precise detection and characterisation of 
chromosomal rearrangements and copy number variation in human tumours. Genome 
Biol. 18, 125 (2017).

40. Himmelbach, A. et al. Discovery of multi-megabase polymorphic inversions by 
chromosome conformation capture sequencing in large-genome plant species. Plant J. 
96, 1309–1316 (2018).

41. Fradgley, N. et al. A large-scale pedigree resource of wheat reveals evidence for 
adaptation and selection by breeders. PLoS Biol. 17, e3000071 (2019).

42. Martín, A. C., Rey, M. D., Shaw, P. & Moore, G. Dual effect of the wheat Ph1 locus on 
chromosome synapsis and crossover. Chromosoma 126, 669–680 (2017).

43. Bevan, M. W. et al. Genomic innovation for crop improvement. Nature 543, 346–354 
(2017).

44. Luján Basile, S. M. et al. Haplotype block analysis of an Argentinean hexaploid wheat 
collection and GWAS for yield components and adaptation. BMC Plant Biol. 19, 553 (2019).

45. Fox, S. L. et al. Unity hard red spring wheat. Can. J. Plant Sci. 90, 71–78 (2010).
46. Hanks, S. K., Quinn, A. M. & Hunter, T. The protein kinase family: conserved features and 

deduced phylogeny of the catalytic domains. Science 241, 42–52 (1988).
47. Brueggeman, R. et al. The stem rust resistance gene Rpg5 encodes a protein with 

nucleotide-binding-site, leucine-rich, and protein kinase domains. Proc. Natl Acad. Sci. 
USA 105, 14970–14975 (2008).

48. Faris, J. D. et al. A unique wheat disease resistance-like gene governs effector-triggered 
susceptibility to necrotrophic pathogens. Proc. Natl Acad. Sci. USA 107, 13544–13549 
(2010).

49. Luo, M. C. et al. Genome sequence of the progenitor of the wheat D genome Aegilops 
tauschii. Nature 551, 498–502 (2017).

50. Borrill, P., Harrington, S. A. & Uauy, C. Applying the latest advances in genomics and 
phenomics for trait discovery in polyploid wheat. Plant J. 97, 56–72 (2019).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 
4.0 International License, which permits use, sharing, adaptation, distribution 
and reproduction in any medium or format, as long as you give appropriate 

credit to the original author(s) and the source, provide a link to the Creative Commons license, 
and indicate if changes were made. The images or other third party material in this article are 
included in the article’s Creative Commons license, unless indicated otherwise in a credit line 
to the material. If material is not included in the article’s Creative Commons license and your 
intended use is not permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. To view a copy of this license, 
visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2020

1Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, 
Canada. 2Grain Research Laboratory, Canadian Grain Commission, Winnipeg, Manitoba, 
Canada. 3Department of Plant Pathology, Kansas State University, Manhattan, KS, USA. 
4Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 
Germany. 5Helmholtz Zentrum München—German Research Center for Environmental 
Health, Neuherberg, Germany. 6Aquatic and Crop Resource Development, National 
Research Council Canada, Saskatoon, Saskatchewan, Canada. 7John Innes Centre, Norwich 
Research Park, Norwich, UK. 8Department of Plant and Microbial Biology, University of 
Zurich, Zurich, Switzerland. 9Morden Research and Development Centre, Agriculture and 
Agri-Food Canada, Morden, Manitoba, Canada. 10Department of Computer Science, 
University of Saskatchewan, Saskatoon, Saskatchewan, Canada. 11Brandon Research and 
Development Centre, Agriculture and Agri-Food Canada, Brandon, Manitoba, Canada. 
12Genomics/Transcriptomics group, Functional Genomics Center Zurich, Zurich, 
Switzerland. 13Department of Evolutionary Biology and Environmental Studies, University of 
Zurich, Zurich, Switzerland. 14Institute of Agricultural Sciences, ETHZ, Zurich, Switzerland. 
15Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan. 16Life 
Sciences Department, Natural History Museum, London, UK. 17Earlham Institute, Norwich 
Research Park, Norwich, UK. 18The John Bingham Laboratory, NIAB, Cambridge, UK. 
19Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN, 
USA. 20Global Institute for Food Security, University of Saskatchewan, Saskatoon, 
Saskatchewan, Canada. 21School of Plant Sciences and Food Security, Tel Aviv University, 
Ramat Aviv, Israel. 22Department of Entomology, University of Manitoba, Winnipeg, 
Manitoba, Canada. 23Institute of Crop Science, NARO, Tsukuba, Japan. 24Centre for 
Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada. 25National Institute 
of Advanced Industrial Science and Technology (AIST), Tokyo, Japan. 26Laboratory of Plant 
Genetics, Graduate School of Agriculture, Kyoto University, Kyoto, Japan. 27Humanome Lab, 
Tokyo, Japan. 28Global Wheat Program, International Maize and Wheat Improvement Center 
(CIMMYT), Texcoco, Mexico. 29Montana BioAg, Missoula, MT, USA. 30Australian Research 
Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, 
University of Western Australia, Perth, Western Australia, Australia. 31Ottawa Research and 
Development Centre, Agriculture and Agri-Food Canada, Ottawa, Ontario, Canada. 
32Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia. 
33Syngenta, Durham, NC, USA. 34School of Agriculture, Food and Wine, University of 
Adelaide, Adelaide, South Australia, Australia. 35German Centre for Integrative Biodiversity 
Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany. 36Biological and Environmental 
Science & Engineering Division, King Abdullah University of Science and Technology, 
Thuwal, Saudi Arabia. 37Graduate School of Life and Environmental Sciences, Kyoto 
Prefectural University, Kyoto, Japan. 38Institute of Evolution and Department of Evolutionary 
and Environmental Biology, University of Haifa, Haifa, Israel. 39School of Life Sciences 
Weihenstephan, Technical University of Munich, Freising, Germany. 40Center for Integrated 
Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, Germany. 
41These authors contributed equally: Sean Walkowiak, Liangliang Gao, Cecile Monat. 
✉e-mail: curt.mccartney@canada.ca; manuel.spannagl@helmholtz-muenchen.de; wicker@
botinst.uzh.ch; curtis.pozniak@usask.ca

http://creativecommons.org/licenses/by/4.0/
mailto:curt.mccartney@canada.ca
mailto:manuel.spannagl@helmholtz-muenchen.de
mailto:wicker@botinst.uzh.ch
mailto:wicker@botinst.uzh.ch
mailto:curtis.pozniak@usask.ca


Article
Methods

No statistical methods were used to predetermine sample size. The 
field experiments were randomized, but the wheat lines sequenced 
and assembled were not selected at random. The investigators were 
not blinded to allocation during experiments and outcome assessment.

Assemblies and annotation
Genome assemblies. We assembled the genomes of 15 diverse wheat 
lines using two approaches (Supplementary Table 1). The RQA approach 
used the DeNovoMAGIC v.3.0 assembly pipeline, previously used for 
the wild emmer wheat11, durum wheat12 and Chinese Spring RefSeqv1.0 
assemblies. In brief, high-molecular-weight DNA was extracted from 
wheat seedlings as described previously51. Illumina 450-bp paired-end 
(PE), 800-bp PE and mate-pair (MP) libraries of three different sizes (3 
kb, 6 kb and 9 kb) were generated. Sequencing was performed at the 
University of Illinois Roy J. Carver Biotechnology Center. 10X Genom-
ics Chromium libraries were prepared and sequenced at the Genome 
Canada Genome Innovation Centre using the manufacturers’ recom-
mendations to achieve a minimum of 30 × coverage. Hi-C libraries were 
prepared using previously described methods40. Using the Illumina PE, 
MP, 10X Genomics Chromium, and Hi-C, chromosome scale assemblies 
were prepared as described previously18. For cultivars assembled to a 
scaffold level, we used the W2RAP-contigger using k = 200 (Supple-
mentary Note 1). Two MP libraries (10 kb and 13 kb) were produced for 
each line except Weebill 1, for which two additional MP libraries were 
used. Mate pairs were processed, filtered and used to scaffold contigs 
as described in the W2RAP pipeline (https://github.com/bioinfolog-
ics/w2rap). Scaffolds of less than 500 bp were removed from the final 
assemblies. Additionally, we performed Oxford Nanopore sequenc-
ing of CDC Landmark using R9 flow cells and the GridION sequencing 
technology (Supplementary Note 2).

Nucleotide diversity analysis. The variant call format data files from 
two wheat exome-capture studies4,5 were retrieved, combined, and 
filtered to retain hexaploid accessions and polymorphisms detected 
in both studies. The 10X Genomics Chromium sequencing data for 
each of the RQA lines were aligned to Chinese Spring RefSeqv1.0 using 
the LongRanger v.2.1.6 software. Alignment files from the accessions 
assembled here and 16 Bioplatforms Australia lines19 with alignments 
obtained from the DAWN project52 were then used for variant calling by 
GATK v.3.8 at the same genomic positions identified by exome-capture 
sequencing. The variant files from the exome-capture studies, DAWN 
project and 10+ Wheat Genomes lines were then merged and subjected 
to principal component analysis (PCA) using the prcomp function in 
R v.3.6.1.

Gene projections. We used the previously published high-confidence 
gene models for Chinese Spring to assess the gene content in each 
assembly. Representative coding sequences of each informant locus 
were aligned to pseudomolecules of each line separately using BLAT53 
v.3.5 with the ‘fine’ parameter and a maximal intron size of 70 kb. BLAT 
matches seeded an additional alignment by exonerate54 in the genomic 
neighbourhood encompassing 20 kb upstream and downstream of the 
match position. Exonerate alignments required a minimal and maximal 
intron sizes of 30 bp and 20 kb, respectively. A linear regression of 
colocalized matches with complete alignments of the informant were 
computed for 10,000 such pairs to derive a normalization function 
and to render comparable scoring schemes for both methods. Sub-
sequently, we selected the top-scoring match for each mapping pair 
as the locus for the gene projection. Projections were then filtered by 
alignment coverage (Supplementary Note 3), the open reading frame 
(ORF) contiguity, the observed mapping frequency of the informant, 
coverage of start and stop codons, and the orthology or potential dis-
location of the match scaffold relative to its informant chromosome. 

Identification of orthologous groups was analogous to the approach 
used previously55. Reciprocal best BLAST hit (RBH) graphs were derived 
from pairwise all-against-all BLASTn v2.8 transcript searches (minimal 
e-value ≤ 1 × 10−30). Hits were assigned to homeologous groups on the 
basis of gene models of Chinese Spring following a previously described 
homeologue classification9. Multiple sequence alignments for the 
population genetics analysis were performed using MUSCLE v.3.8 with 
default parameters (Supplementary Note 3). Using the gene projec-
tions, we quantified average pairwise genetic diversity (π), polymor-
phism (Watterson’s θW), and Tajima’s D using compute and polydNdS 
in the libsequence v.1.0.3-1 package56. We retained diversity estimates 
for genes that were in all of the genomes and had ≤100 segregating 
sites. PAV was determined from the orthologous groups limited to 
one-to-one relations where there was no match in at least one genome.

Analysis of the Rf-like gene family. For Rf genes, the genome se-
quences were scanned for ORFs in six frame translations with the getorf 
program of the EMBOSS v.6.6.0 package. ORFs longer than 89 codons 
were searched for the presence of PPR motifs using hmmsearch from 
the HMMER v.3.2.1 package (http://hmmer.org) and the hidden Markov 
models defined previously. The PF02536 profile from the Pfam v32.0 
database (http://pfam.xfam.org) was used to screen for ORFs carrying 
mTERF motifs. Downstream processing of the hmmsearch results fol-
lowed the pipeline described previously57. ORFs with low hmmsearch 
scores were removed from the analysis as they are unlikely to repre-
sent functional PPR proteins. Only genes encoding mTERF proteins 
longer than 100 amino acids were included in the analysis. RFL-PPR 
sequences were identified as described23. The phylogenetic analyses 
were performed as described previously23. Conserved, non-PPR genes 
delimiting the borders of analysed RFL clusters were identified in the 
Chinese Spring RefSeqv1.0 reference genome and used to search for 
syntenic regions in the remaining wheat accessions with BLAST v.2.8. 
See Supplementary Note 4 for more details.

NLR repertoire.  NLR signatures were annotated using 
NLR-Annotator58,59 (https://github.com/steuernb/NLR-Annotator) 
with the option -a. We estimated redundancy of NLR signatures between 
genomes at different thresholds of identity: 95%, 98% and 100%. For the 
165 amino acids in the consensus of all NB-ARC motifs, this translates to 
8, 3 and 0 mismatches of a concatenated motif sequence. To calculate 
the overall redundancy in all genomes, we counted the number of LR 
signatures added to a non-redundant set by adding genomes iteratively. 
This was done for 1 million random permutations.

Repeat annotation. Transposons were detected and classified by a 
homology search against the REdat_9.7_Poaceae section of the PGSB 
transposon library60 using vmatch (http://www.vmatch.de) with the fol-
lowing parameters: identity ≥70%, minimal hit length 75 bp, seedlength 
12 bp (exact command line: -d -p -l 75 -identity 70 -seedlength 12 -exdrop 
5). To remove overlapping annotations, the output was filtered for 
redundant hits via a priority-based approach in which higher-scoring 
matches where assigned first and lower-scoring hits at overlapping 
positions were either shortened or removed if there was ≥90% overlap 
with a priority hit or if <50 bp remained. Tandem repeats where iden-
tified with TandemRepeatFinder v.4.09 under default parameters61 
and subjected to overlap removal as described above. Full-length LTR 
retrotransposons were identified with LTRharvest (http://genometools.
org/documents/ltrharvest.pdf). All candidates were subsequently an-
notated for PfamA domains using HMMER v.3.0 and filtered to remove 
false positives, non-canonical hybrids and gene-containing elements. 
The inner domain order served as a criterion for the LTR retrotranspo-
son superfamily classification, either Gypsy (RLG: RT-RH-INT), Copia 
(RLC: INT-RT-RH) or undetermined (RLX). The insertion age of fl-LTRs 
was calculated from the divergence between the 5′ and 3′ long terminal 
repeats, which are identical upon insertion. The genetic distance was 

https://github.com/bioinfologics/w2rap
https://github.com/bioinfologics/w2rap
http://hmmer.org
http://pfam.xfam.org
https://github.com/steuernb/NLR-Annotator
http://www.vmatch.de
http://genometools.org/documents/ltrharvest.pdf
http://genometools.org/documents/ltrharvest.pdf


calculated with EMBOSS v.6.6.0 distmat (Kimura2-parameter correc-
tion) using a random mutation rate of 1.3 × 10−8.

Analysis of centromeric regions. For each line with a RQA, ChIP 
was performed according to previous methods62 with slight modi-
fication using a wheat-specific CENH3 antibody36. An antigen with 
the peptide sequence RTKHPAVRKTKALPKK, corresponding to the 
N terminus of wheat CENH3, was used to produce an antibody using 
the custom-antibody production facility provided by Thermo Fisher 
Scientific. The customized antibody was purified and obtained as pel-
lets. The antibody pellet (0.396 mg) was dissolved in 2 ml PBS buffer, 
pH 7.4, resulting in a working concentration of 198 ng μl−1. Nuclei were 
isolated from 2-week-old seedlings, digested with micrococcal nuclease 
and incubated overnight at 4 °C with 3 μg of antibody or rabbit serum 
(control). Antibodies were captured using Dynabeads Protein G and 
the chromatin eluted using 100 μl of 1% sodium dodecyl sulfate, 0.1 
M NaHCO3 preheated to 65 °C. DNA isolation was then performed us-
ing ChIP DNA Clean & Concentrator Kit, and ChIP–seq libraries were 
constructed using TruSeq ChIP Library Preparation Kit and sequenced 
with a NovoSeq S4, which generated 150-bp paired-end reads.

For Chinese Spring, we used two datasets, SRR168679963 (dataset 1) 
and the dataset generated in this study (dataset 2). Sequence reads were 
de-multiplexed, trimmed and aligned to each of the respective RQAs 
using HISAT2 v.2.1.064. Alignments were sorted, filtered for minimum 
alignment quality of 30, counted in 100-kb bins using samtools v.1.10 
and BEDtools v.2.29, and visualized in R v.3.6.1. To define the midpoint of 
each centromere, we identified the highest density of CENH3 ChIP–seq 
reads using a smoothing spline in R v.3.6.1 with smooth.spline function 
(number of knots = 1,000) and identified the peak of the smooth spline 
as the centre of the respective centromere for a given chromosome. 
To compare centromeric positions of different genomes, the CENH3 
ChIP–seq density was plotted along with MUMmer v.4.0 chromosome 
alignments. To determine the overall size of wheat centromeres, we 
considered each 100-kb bin with CENH3 ChIP–seq read density that 
was greater than three times the background (genome average) level of 
read density to be an active centromeric bin. The number of enriched 
bins for each genome were counted and averaged to a total of 21 chro-
mosomes. This calculation included counting of unanchored bins.

Analysis of introgressions
Identification of full-length RLC-Angela retrotransposons. Retro-
transposon profiles were created for each genome using the RLC-Angela 
family65 and consensus sequences obtained from the TREP database 
(www.botinst.uzh.ch/en/research/genetics/thomasWicker/trep-db.
html). First, BLASTn was used to compare the ~1,700-bp LTR of 
RLC-Angela to each genome. Matching elements and 500 bp of flank-
ing sequences were aligned to identify precise LTR borders as well as 
different sub-families and/or sequences variants. We then used BLASTn 
to compare the 18 consensus LTR sequences against each genome and 
then screened for pairs of full-length LTRs that are found in the same 
orientation within a window of 7.5–9.5 kb (RLC-Angela elements are ~8.7 
kb long). These initial candidate full-length elements were screened for 
the presence of RLC-Angela polyprotein sequences by BLASTx, as well 
as for the typical 5-bp target-site duplications. We allowed a maximum 
of two mismatches between the two target-site duplications. All identi-
fied full-length RLC-Angela copies were then aligned to a RLC-Angela 
consensus sequence with the program Water from the EMBOSS v.6.6.0 
package (www.ebi.ac.uk/Tools/emboss/). These alignments were used 
to compile all nucleotide polymorphisms into a single file. The variant 
call file was then used for PCA using the snpgdsPCA function in the R 
package SNPrelate v.3.11.

Sequencing of the tertiary gene pool of wheat. Genomic DNA 
(gDNA) was extracted and purified from young leaf tissue collected 
from multiple accessions of T. timopheevii, A. ventricosa and T. ponticum 

(Supplementary Table 12) following a standard CTAB–chloroform 
extraction method. Yield and integrity were evaluated by fluorom-
etry (Qubit 2.0) and agarose gel electrophoresis. Paired-end libraries 
were prepared following the Nextera DNA Flex protocol. In brief, 500 
ng gDNA from each accession was fragmented and amplified with a 
limited-cycle PCR. Each library was uniquely dual-indexed with a dis-
tinct 10-bp index code (IDT for Illumina Nextera DNA UD) for multiplex-
ing, and quantified by qPCR (Kapa Biosystems). Final average library 
size was estimated on a Tapestation 2200. Libraries were normalized 
and pooled for sequencing on an Illumina NovaSeq 6000 S4 to generate 
~5× coverage per genotype. Sequencing data were de-multiplexed and 
aligned to appropriate RQAs (Supplementary Table 12) in semi-perfect 
mode using the BBMap v.38 short-read alignment software (https://
sourceforge.net/projects/bbmap/).

Structural variation
We karyotyped the lines using mitotic metaphase chromosomes 
prepared by the conventional acetocarmine-squash method. 
Non-denaturing fluorescence in situ hybridization (ND-FISH) of 
three repetitive sequence probes, Oligo-pSc119.2-1, Oligo-pTa535 
and Oligo-pTa713, was performed as described66,67 (Supplementary 
Note 6). Chromosomes were counterstained with DAPI. Chromo-
some images were captured with an Olympus BX61 epifluorescence 
microscope and a CCD camera DP80. Images were processed and 
pseudocoloured with ImageJ v.1.51n in the Fiji package. For karyo-
typing, at least four chromosomes per accession were examined 
and compared to the karyotype of Chinese Spring as described pre-
viously68. Hierarchal clustering of karyotype polymorphisms was 
performed using the Ward method in R v.3.0.2, which was used to 
estimate distance. Next, we applied Hi-C analysis for inversion calling 
as described previously40. In brief, adapters were removed and reads 
were mapped to Chinese Spring using minimap2 v.2.1069 as we have 
done previously21. The raw Hi-C link counts were calculated in 1 Mb 
non-overlapping sliding windows and then normalized as described 
in our previous work40. Finally, the normalized Hi-C link matrix was 
subjected to inversion calling using R.

We performed flow cytometry of wheat cultivars Arina and Forno 
as previously described70, except that we used a FACSAria SORP flow 
cytometer and cell sorter (Becton Dickinson). The 5B/7B translocation 
breakpoints were identified by comparison of chromosomes 5B and 7B 
from ArinaLrFor and Julius. Sequence collinearity between ArinaLrFor 
and Julius was detected by BLASTn searches of 1,000-bp sequence 
windows every 100 kb along the chromosomes. Once an interruption 
of synteny was detected, sequence segments at the positions of syn-
teny loss were extracted and used for local alignments to determine 
the precise breakpoint positions. PCR amplification of the 5BS/7BS 
and 7BL/5BL translocation sites was performed using standard PCR 
cycling conditions.

Characterization of haplotypes
Development of a wheat genome haplotype database. To iden-
tify haplotypes, pairwise chromosome alignments were performed 
between the RQA using MUMmer v.4.0, which were combined 
with pairwise nucleotide BLASTn analyses of the genes ± 2,000 bp 
using custom scripts in R v.3.6.1 (https://github.com/Uauy-Lab/
pangenome-haplotypes)71 (Supplementary Note 8). The resultant 
haplotypes were uploaded to an interactive viewer (http://www.
crop-haplotypes.com/). Pairwise BLASTn comparisons of the genes 
were also used to identify structural variants, and were uploaded into 
AccuSyn (https://accusyn.usask.ca/) and SynVisio (https://synvisio.
github.io/#/) to create a wheat-specific database (https://kiranbandi.
github.io/10wheatgenomes/). Pretzel (https://github.com/plantinfor-
matics/pretzel) was also used to visualize and compare the RQA and 
the projected gene annotations (http://10wheatgenomes.plantinfor-
matics.io/).

http://www.botinst.uzh.ch/en/research/genetics/thomasWicker/trep-db.html
http://www.botinst.uzh.ch/en/research/genetics/thomasWicker/trep-db.html
http://www.ebi.ac.uk/Tools/emboss/
https://sourceforge.net/projects/bbmap/
https://sourceforge.net/projects/bbmap/
https://github.com/Uauy-Lab/pangenome-haplotypes
https://github.com/Uauy-Lab/pangenome-haplotypes
http://www.crop-haplotypes.com/
http://www.crop-haplotypes.com/
https://accusyn.usask.ca/
https://synvisio.github.io/#/
https://synvisio.github.io/#/
https://kiranbandi.github.io/10wheatgenomes/
https://kiranbandi.github.io/10wheatgenomes/
https://github.com/plantinformatics/pretzel
https://github.com/plantinformatics/pretzel
http://10wheatgenomes.plantinformatics.io/
http://10wheatgenomes.plantinformatics.io/


Article

Characterization of Sm1. Sm1-linked markers6 were located in RQAs 
using BLAST v.2.8.0. Two high-resolution mapping populations were 
developed, 99B60-EJ2D/Thatcher and 99B60-EJ2G/Infinity. Progeny 
heterozygous for crossover events near Sm1 were identified in the F2 
generation, and the crossovers were fixed in the F3 generation. The 
resulting F2-derived F3 families were analysed with KASP markers within 
the Sm1 region and tested for resistance to OWBM in field nurseries 
to identify markers associated with Sm1. Ethyl methanesulfonate was 
used to develop knockout mutants in the Sm1 gene. Approximately 
3,200 seeds of the Canadian spring wheat variety Unity (an Sm1 carrier) 
were soaked in a 0.2% (v/v) aqueous ethyl methanesulfonate solution 
for 22 h at 22 °C. The seed was then rinsed in distilled water and sown 
in a field nursery. The M1 seed was grown to maturity and bulk har-
vested. Approximately 6,000 M2 seeds were space planted in two field 
nurseries located in Brandon and Glenlea, Manitoba, Canada. Spikes 
were collected on a per-plant basis at maturity and were classified as 
resistant, susceptible or undamaged as done previously6,72. Putative 
Sm1-knockout mutants were re-tested for OWBM resistance in indoor 
cage tests73 in the M3 and M4 generations. M4-derived families were 
tested for resistance to OWBM in field nurseries (randomized complete 
block design, six environments, and eight replicates per environment).

Candidate genes were identified between Sm1 flanking markers on 
the CDC Landmark assembly using the projected gene annotations and 
FGENESH v.2.6 (http://www.softberry.com/), which were compared to 
the projected genes of non-carriers. Both 5′ and 3′ rapid amplification 
of cDNA ends (5′ and 3′ RACE) were used to verify the transcription 
initiation and termination sites of the gene candidate, whose structure 
was predicted by FGENESH v.2.6. In brief, RNA was extracted from the 
leaves of Unity (Sm1 carrier) seedlings (using the Qiagen RNeasy kit), 
RACE PCR performed (Invitrogen GeneRacer kit), and the PCR product 
cloned (Invitrogen TOPO TA Cloning kit for sequencing) and sequenced 
by Sanger sequencing. Prediction of the conserved domains was done 
using the NCBI Conserved Domain Search tool (https://www.ncbi.
nlm.nih.gov/Structure/cdd/wrpsb.cgi) and PROSITE (release 2020_01; 
https://prosite.expasy.org/). The LRR domain was defined on the basis 
of the presence of 2–42 LRR motif repeats of 20–30 amino acids each. 
LRR motifs were manually annotated74. Prediction of transmembrane 
regions and orientation was performed using the program TMpred 
NCBI Conserved Domain Search tool (https://embnet.vital-it.ch/soft-
ware/TMPRED_form.html).

To study the expression of Sm1, total RNA was extracted from four 
biological replicates from four wheat genotypes (Unity, CDC Landmark, 
Waskada and Thatcher) from two different tissues; seedling leaves and 
developing kernels (five days post anthesis) using NucleoSpin RNA Plant 
kit (Macherey-Nagel) according to the manufacturer’s instructions. 
RNA was treated with RNase-free DNase (rDNase) (Macherey-Nagel) 
and reversed transcribed into cDNA using SuperScript IV Reverse Tran-
scriptase kit (Invitrogen) according to the manufacturer’s instructions 
and the NB-ARC domain amplified by PCR.

Reporting summary
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper.

Data availability
All sequence reads assemblies have been deposited into the National 
Center for Biotechnology Information sequence read archive (SRA) 
(see Supplementary Table 1 for accession numbers). Sequence reads 
for the RQAs, T. ponticum, A. ventricosa and T. timopheevii have been 
deposited into the SRA (accession no. PRJNA544491) and ChIP–seq 
short read-data used for centromere characterization is deposited 
under accession no. PRJNA625537. All Hi-C data have been deposited in 
the European Nucleotide Archive (Supplementary Table 1). The RQAs 

are available for direct user download at https://wheat.ipk-gatersleben.
de/. All assemblies and projected annotations are available for com-
parative analysis at Ensembl Plants (https://plants.ensembl.org/index.
html). Comparative analysis viewers are also online for synteny (https://
kiranbandi.github.io/10wheatgenomes/, http://10wheatgenomes.
plantinformatics.io/) and haplotypes (http://www.crop-haplotypes.
com/). Seed stocks of the assembled lines are available at the UK Germ-
plasm Resources Unit (https://www.seedstor.ac.uk/).

Code availability
Code for custom genome visualizers have been deposited in the 
public domain for haplotype viewer (https://github.com/Uauy-Lab/
pangenome-haplotypes), Pretzel (https://github.com/plantinformat-
ics/pretzel), AccuSyn (https://github.com/jorgenunezsiri/accusyn) and 
SynVisio (https://github.com/kiranbandi/synvisio). Additional scripts 
used for ChIP–seq analysis of the centromeres are provided at https://
github.com/wheatgenetics/centromere.
 
51. Dvorak, J., Mcguire, P. E. & Cassidy, B. Apparent sources of the A genomes of wheats 

inferred from polymorphism in abundance and restriction fragment length of repeated 
nucleotide-sequences. Genome 30, 680–689 (1988).

52. Watson-Haigh, N. S., Suchecki, R., Kalashyan, E., Garcia, M. & Baumann, U. DAWN: a 
resource for yielding insights into the diversity among wheat genomes. BMC Genomics 
19, 941 (2018).

53. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
54. Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence 

comparison. BMC Bioinformatics 6, 31 (2005).
55. Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for 

genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28, 33–36 
(2000).

56. Thornton, K. Libsequence: a C++ class library for evolutionary genetic analysis. 
Bioinformatics 19, 2325–2327 (2003).

57. Cheng, S. et al. Redefining the structural motifs that determine RNA binding and RNA 
editing by pentatricopeptide repeat proteins in land plants. Plant J. 85, 532–547 (2016).

58. Steuernagel, B. et al. Physical and transcriptional organisation of the bread wheat 
intracellular immune receptor repertoire. Preprint at https://doi.org/10.1101/339424 (2018).

59. Steuernagel, B. et al. The NLR-Annotator tool enables annotation of the intracellular 
immune receptor repertoire. Plant Physiol. 183, 468–482 (2020).

60. Spannagl, M. et al. PGSB PlantsDB: updates to the database framework for comparative 
plant genome research. Nucleic Acids Res. 44 (D1), D1141–D1147 (2016).

61. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids 
Res. 27, 573–580 (1999).

62. Nagaki, K. et al. Chromatin immunoprecipitation reveals that the 180-bp satellite repeat 
is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics 163, 
1221–1225 (2003).

63. Guo, X. et al. De novo centromere formation and centromeric sequence expansion in 
wheat and its wide hybrids. PLoS Genet. 12, e1005997 (2016).

64. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome 
alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–
915 (2019).

65. Wicker, T. et al. Impact of transposable elements on genome structure and evolution in 
bread wheat. Genome Biol. 19, 103 (2018).

66. Tang, Z., Yang, Z. & Fu, S. Oligonucleotides replacing the roles of repetitive sequences 
pAs1, pSc119.2, pTa-535, pTa71, CCS1, and pAWRC.1 for FISH analysis. J. Appl. Genet. 55, 
313–318 (2014).

67. Zhao, L. et al. Cytological identification of an Aegilops variabilis chromosome carrying 
stripe rust resistance in wheat. Breed. Sci. 66, 522–529 (2016).

68. Komuro, S., Endo, R., Shikata, K. & Kato, A. Genomic and chromosomal distribution 
patterns of various repeated DNA sequences in wheat revealed by a fluorescence in situ 
hybridization procedure. Genome 56, 131–137 (2013).

69. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–
3100 (2018).

70. Kubaláková, M., Vrána, J., Cíhalíková, J., Simková, H. & Doležel, J. Flow karyotyping and 
chromosome sorting in bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 104, 
1362–1372 (2002).

71. Brinton, J. et al. A haplotype-led approach to increase the precision of wheat breeding. 
Commun. Biol. https://doi.org/10.1038/s42003-020-01413-2 (2020).

72. Thomas, J. et al. Chromosome location and markers of Sm1: a gene of wheat that 
conditions antibiotic resistance to orange wheat blossom midge. Mol. Breed. 15, 183–192 
(2005).

73. Lamb, R. J. et al. Resistance to Sitodiplosis mosellana (Diptera: Cecidomyiidae) in spring 
wheat (Gramineae). Can. Entomol. 132, 591–605 (2000).

74. la Cour, T. et al. Analysis and prediction of leucine-rich nuclear export signals. Protein 
Eng. Des. Sel. 17, 527–536 (2004).

Acknowledgements We are grateful for funding from the Canadian Triticum Applied Genomics 
research project (CTAG2) funded by Genome Canada, Genome Prairie, the Western Grains 
Research Foundation, Government of Saskatchewan, Saskatchewan Wheat Development 
Commission, Alberta Wheat Commission, Viterra and Manitoba Wheat and Barley Growers 

http://www.softberry.com/
https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi
https://prosite.expasy.org/
https://embnet.vital-it.ch/software/TMPRED_form.html
https://embnet.vital-it.ch/software/TMPRED_form.html
http://www.ncbi.nlm.nih.gov/sra?term=PRJNA544491
http://www.ncbi.nlm.nih.gov/sra?term=PRJNA625537
https://wheat.ipk-gatersleben.de/
https://wheat.ipk-gatersleben.de/
https://plants.ensembl.org/index.html
https://plants.ensembl.org/index.html
https://kiranbandi.github.io/10wheatgenomes/
https://kiranbandi.github.io/10wheatgenomes/
http://10wheatgenomes.plantinformatics.io/
http://10wheatgenomes.plantinformatics.io/
http://www.crop-haplotypes.com/
http://www.crop-haplotypes.com/
https://www.seedstor.ac.uk/
https://github.com/Uauy-Lab/pangenome-haplotypes
https://github.com/Uauy-Lab/pangenome-haplotypes
https://github.com/plantinformatics/pretzel
https://github.com/plantinformatics/pretzel
https://github.com/jorgenunezsiri/accusyn
https://github.com/kiranbandi/synvisio
https://github.com/wheatgenetics/centromere
https://github.com/wheatgenetics/centromere
https://doi.org/10.1101/339424


Association. Funding was also provided by the Biotechnology and Biological Sciences Research 
Council (BBSRC) via the projects Designing Future Wheat (BB/P016855/1), sLOLA (BB/
J003557/1) and MAGIC Pangenome (BB/P010741/1, BB/P010733/1 and BB/P010768/1), by AMED 
NBRP (JP17km0210142), the German Federal Ministry of Education and Research (FKZ 
031B0190, WHEATSeq, 2819103915 and 2819104015), German Network for Bioinformatics and 
Infrastructure de.NBI (FKZ 031A536A, 031A536B), German Federal Ministry of Food and 
Agriculture (BMEL FKZ 2819103915 WHEATSEQ), Israel Science Foundation (Grant 1137/17), JST 
CREST (JPMJCR16O3), US National Science Foundation (1339389), Kansas Wheat Commission 
and Kansas State University, MEXT KAKENHI, The Birth of New Plant Species (JP16H06469, 
JP16H06464, JP16H06466 and JP16K21727), National Agriculture and Food Research 
Organization (NARO) Vice President Fund, Swiss Federal Office of Agriculture (NAP-PGREL), 
Agroscope, Delley Seeds and Plants, ETH Zurich Institute of Agricultural Sciences, Fenaco 
Co-operative, IP-SUISSE, swisssem, JOWA, SGPV-FSPC, Swiss National Science Foundation 
(31003A_182318 and CRSII5_183578), University of Zurich Research Priority Program Evolution in 
Action, King Abdullah University of Science and Technology, Grains Research and Development 
Corporation (GRDC), Australian Research Council (CE140100008) and Groupe Limagrain. We 
are grateful for the computational support of the Functional Genomics Center Zurich, the 
Molecular Plant Breeding Group—ETH Zurich, and the Global Institute of Food Security (GIFS), 
Saskatoon. We acknowledge the contribution of the Australian Wheat Pathogens Consortium 
(https://data.bioplatforms.com/organization/edit/bpa-wheat-cultivars) in the generation of data 
used in this publication. The Initiative is supported by funding from Bioplatforms Australia 
through the Australian Government National Collaborative Research Infrastructure Strategy 
(NCRIS). We thank S. Wu for DNA preparations for assembly and ChIP–seq library preparations; 
O. Francisco-Pabalan and J. Santos, T. Wisk and S. Wolfe for their provision of OWBM images;  
M. Knauft, I. Walde, S. König, T. Münch, J. Bauernfeind and D. Schüler for their contribution to 
Hi-C data generation and sequencing, DNA sequencing and IT administration and sequence 
data management; J. Vrána for karyotyping the wheat cultivars Arina and Forno; and R. Regier 
for project management, administration and support.

Author contributions Project establishment: K.C., A.D., A. Hall, B.K., S.G.K., E.L., P.L., 
K.F.X.M., J.P., C.J.P., K.K.S., M.S. and N.S. Project coordination: A. Hall, C.J.P. and N.S. 

Genome assemblies were contributed as follows: CDC Stanley and CDC Landmark: P.J.H., 
C.J.P., A.G.S., B.B., C.S.K., A.N., K.N. and S.W.; Julius: K.F.X.M., N.S., M.M., C.M. and U.S.; 
Jagger: G.M., J.P. and L.G.; ArinaLrFor: B.K., S.G.K. and M.C.K.; Mace and LongReach Lancer: 
K.C., P.L., G.K.-G. and J.T.; Norin 61: K.K.S., H.H., S.N., J.S., K. Kawaura, H.T., T. Tameshige, T.B., 
D.C., M.H., R.S.-I., C.A., F.K., J.G.-G. and N.S.; SY Mattis: E.L. and A.B.; spelt (PI190962): A.D., 
C.J.P. and J.D.; Robigus, Claire, Paragon and Cadenza: M.B., M.C., B.C., C.F., N.F. and D.H.; 
Weebill 1: M.C., B.C., J.C., K.A.G., L.P.-A. and L.V. Sequencing, assembly and analysis were 
contributed by WRA2P computational assembly: A. Hall, B.C., G.G.A., K. Krasileva, N.M., 
D.S. and J. Wright; 10X Genomics: H.B., C.J.P., J.E., S.K. and K.W.; Hi-C and structural 
analysis: M.M., N.S., A. Himmelbach, C.M., S.P. and L.G.; pseudomolecule assemblies: M.M., 
C.M. and N.S.; gene projections and TE analysis: K.F.X.M., M.S., H.G. and G.H.; diversity and 
polymorphism analysis: K.K.S., E.D., T.P., G.H.-N., D.C., M.H., G.H., H.H., H.K., M.S., K.M., T. 
Tameshige, T. Tanaka, J.S. and J. Wu; centromere diversity: J.P. and D.H.K.; 5B/7B 
translocation: S.G.K., T.W., J.C. and M.C.K; 2NvS introgression: J.P., A.K.F., L.G., P.J., C.J.P., R.S. 
and S.W.; TE-based introgressions: T.W., B.B., J.E., M.C.K., J.P., C.J.P., J.T. and S.W; cytological 
karyotyping: S.N., K.M., Y.N., J.S. and T.K.; diversification of Rf genes: J.M. and I.S.; NLR 
repertoire: S.G.K. and B.S.; Sm1 gene cloning: C.A.M., C.J.P., C.U., J.B., A.C.C., S.C., P.F., 
M.T.K., V.K., D.T. and K.W.; haplotype database: C.U., J.B. and R.H.R.-G.; visualization software: 
C.G., V.B., G.K.-G., J.N.S., J.T. and J.M.; BLAST server: M.M., A.F. and U.S.; C.J.P and S.W. 
drafted the manuscript with input from all authors. All co-authors contributed to and edited 
the final version.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-
2961-x.
Correspondence and requests for materials should be addressed to C.A.M., M.S., T.W. and C.J.P.
Peer review information Nature thanks Victor Albert, Rudi Appels and the other, anonymous, 
reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.

https://data.bioplatforms.com/organization/edit/bpa-wheat-cultivars
https://doi.org/10.1038/s41586-020-2961-x
https://doi.org/10.1038/s41586-020-2961-x
http://www.nature.com/reprints


Article

Extended Data Fig. 1 | Chromosome-scale collinearity between the RQA. 
Genomes were aligned chromosome by chromosome using MUMmer and are 
represented as dot plots. The introgression on chromosome 2B of LongReach 

Lancer (red rectangles) and 5B/7B translocation in SY Mattis and ArinaLrFor 
(purple rectangles) are indicated.


Extended Data Fig. 2 | Evaluation of the CDC Landmark RQA using Oxford 
Nanopore Long Reads. a, Scaffold-scaffold long read contact map showing 
shared read IDs between scaffold ends along the ordered scaffolds in the CDC 
Landmark pseudomolecules. The diagonal pattern indicates that adjacent 
scaffolds share the same long reads and are therefore properly ordered and 
oriented by Hi-C in the RQA. b, Characterization of inversion events on 

chromosomes 2A, 3A, and 3D. The directionality biases estimated from 
alignments of Hi-C data against Chinese Spring (left, top), and chromosome 
alignment of the inversion events between CDC Landmark and Chinese Spring 
RQAs (left, bottom) are shown. Long reads spanning the inversion events and 
magnified views of the reads aligning to the left and right boundaries of the 
inversions (right) are provided.


Article

Extended Data Fig. 3 | Diversity of genes and TEs. a, Average pairwise genetic 
diversity of the homeologues (coding sequences only) of the A, B and D 
subgenomes. The mode of the A, B and D subgenome is 0.00057, 0.00082, and 
0.0002, respectively. b, Tajima’s D estimates of coding sequences for each 
wheat subgenome. The lower and upper range of the boxplot hinges 
correspond to the first and third quartiles (the 25th and 75th percentiles). 
Boxplots show centre line, median; box limits, upper and lower quartiles; 

whiskers, 1.5 × interquartile range. c, Total gene counts and orthologues for the 
RQA. Genes in orthologous groups with exactly one gene for each line 
(Complete; dark brown), genes contained in unambiguous orthologous groups 
missing an orthologue for at least one line, that is, PAV (2-10 Lines; light brown), 
and genes with ambiguous orthologues or CNV (Other; pink) are indicated. d, 
Per cent of pairwise shared syntenic fl-LTRs between wheat lines.


Extended Data Fig. 4 | Evolutionary relationships among PPR and mTERF 
gene sequences. a, The RFL clade is in blue and all remaining P-class PPRs are in 
green. b, Clustered mTERF sequences are in blue and the remaining mTERFs are 
shown in green. The scale bar represents number of substitutions per site. c, 
Sequence inversions and copy number variation at the Rf3 locus on 

chromosome 1B. RFL genes are shown as light pink triangles above the 
chromosome scale. Conserved non-PPR genes used as syntenic anchors are 
shown on the chromosome scale as coloured triangles. The total number (T) 
and the number of putatively functional RFL genes with 10 or more PPR motifs 
(F) are indicated on the right side of each panel.


Article

Extended Data Fig. 5 | Identification of alien introgressions from wheat 
relatives. A feature of foreign chromosomal introgressions is that they contain 
unique patterns of TE insertions. Shown are stretches of >20 Mb containing 
multiple polymorphic RLC-Angela retrotransposons that are found only in one 
or a few (≤4) of the sequenced lines. One representative chromosome for each 
wheat subgenome is shown. Individual polymorphic retrotransposons are 
indicated as coloured vertical lines. Colours correspond to the number of 

cultivars a foreign segment is found in. Regions of particular interest are 
indicated by black rectangles. These include the 2NvS alien introgression from 
A. ventricosa at the end of chromosome 2A in Jagger, Mace, SY Mattis and CDC 
Stanley, as well as introgression in the central region of chromosome 2B from  
T. timopheevi in LongReach Lancer, and introgression at the end of 
chromosome 3D from T. ponticum in LongReach Lancer.


Extended Data Fig. 6 | Detailed characterization of the 2NvS introgression 
from A. ventricosa. a, Pairwise alignments of the first 50 Mb of chromosome 
2A. The black arrow indicates a possible unique haplotype within spelt. b, 
Orthologous genes between the 2NvS introgression from A. ventricosa in Jagger 
and the genes on chromosomes 2A, 2B, and 2D in Chinese Spring. c, Frequency 

of 2NvS introgression carriers in North American datasets from CIMMYT, 
Kansas State, and the USDA Winter Wheat Regional Performance Nursery 
(RPN) over time. d, Per cent yield difference in lines that carry the 2NvS 
introgression. Two sided t-tests were performed to test for the significance of 
the impact of the 2NvS introgression. **P < 0.01; ***P < 0.001.


Article

Extended Data Fig. 7 | Centromere positions and karyotype variation. 
Functional centromere positions in the RQA have undergone structural and 
positional rearrangement. Chromosome alignments showing collinearity 
(black scaffolds in same orientation, grey scaffolds in opposite orientation) 
with relative density of CENH3 ChIP–seq mapped to 100 kb genomic bins for 

Chinese Spring (blue) and a representative genome of comparison (red) for 
chromosome 4B of CDC Stanley (a), and chromosome 5B of Julius (b). c, 
Detailed list and clustering of cytological features carried by each wheat line 
(Supplementary Note 6). Features that are identical (dark grey) or have a gain 
(black) or loss (light grey) relative to Chinese Spring are indicated.


Extended Data Fig. 8 | Hi-C validates inversions identified from pairwise 
chromosome alignments. Pairwise alignments of chromosome 6B from the 
RQA and Chinese Spring are shown. Above each alignment dot plot, the 
directionality biases estimated from alignments of Hi-C data against Chinese 

Spring are shown. Boundaries of diagonal segments are indicative of inversions 
and coincide with inversion boundaries identified from the chromosome 
alignments.


Article

Extended Data Fig. 9 | Characterization of a translocation involving wheat 
chromosomes 5B and 7B. a, Cytogenetic karyotypes of Forno (left) and Arina 
(right), the parents of ArinaLrFor. Note that the large recombinant 
chromosome 7B is represented by a distinct peak. b, Sequence of the 
translocation breakpoint on chromosome 7B of ArinaLrFor. Note that the exact 
breakpoint lies in a sequence gap (stretch of Ns). The bp positions are indicated 
at the left. Forward PCR primers are shown in red and reverse primers in blue. 
The overlap of the two reverse primers is shown in purple. The outer primer 
pair was used for PCR, while the inner pair was used for a nested PCR. c, PCR 
amplification of the fragment spanning the translocation breakpoint. The 

nested PCR yielded a ~5 kb fragment that spanned the translocation breakpoint 
and its identity was confirmed by sequencing. Both PCR and nested PCR were 
performed in duplicate; both replicates of the nested PCR were sequenced 
using the Sanger method. For gel source data, see Supplementary Fig. 1. d, 
Mapping of Illumina reads from the cultivars Arina and Forno on to the 
pseudomolecules of ArinaLrFor. Sequence derived from Forno is shown in blue, 
while sequenced derived from Arina is in red. Note that chromosomes 5B and 
7B are derived from both parents, indicating that these parental chromosomes 
can recombine freely, despite the presence of a large 5B/7B translocation in 
Arina.


Extended Data Fig. 10 | See next page for caption.


Article
Extended Data Fig. 10 | Confirmation of gene expression and gene 
structure for Sm1. a, Critical recombinants from the 99B60-EJ2G/Infinity and 
99B60-EJ2D/Thatcher populations used to fine map Sm1. The 99B60-EJ2G/
Infinity cross had 5,170 F2 plants, while 99B60-EJ2D/Thatcher cross had 5,264 
F2 plants; only recombinant haplotypes between orange wheat blossom midge 
resistant (R) and susceptible (S) genotypes are shown. b, Oxford Nanopore 
long read confirmation of the Sm1 gene candidate in the CDC Landmark RQA 
(left), and alternative haplotype in Chinese Spring (right). Vertical coloured 
lines indicate sequence variants. c, Amplification of cDNA for the NB-ARC 

domain of the Sm1 gene candidate (top) and actin control (bottom) derived 
from RNA isolated from developing kernels (left) and wheat seedlings (right). 
Unity and CDC Landmark are carriers of Sm1. Waskada carries an alternative 
haplotype and does not carry Sm1 (see main text). Thatcher was used as a 
susceptible parent for fine mapping of Sm1 and does not contain the associated 
NB-ARC domain. The experiment was replicated on four independent 
biological samples for each condition. d, Distribution of an Sm1 allele-specific 
PCR marker in a diverse panel of >300 wheat lines.


1

nature research  |  reporting sum
m

ary
O

ctober 2018

Corresponding author(s): Curtis Pozniak

Last updated by author(s): Aug 11, 2020

Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency 
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.

Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.

n/a Confirmed

The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement

A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly

The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section.

A description of all covariates tested

A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons

A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings

For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes

Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated

Our web collection on statistics for biologists contains articles on many of the points above.

Software and code
Policy information about availability of computer code

Data collection No software was used to collect data for this study.

Data analysis A multitude of software and databases were used in this study, all of which have been listed, cited, or provided. These include: 
DeNovoMAGIC v3.0, W2RAP (no versions, https://github.com/bioinfologics/w2rap), LongRanger v2.1.6, GATK v3.8, R v3.6.1 and v3.0.2, 
BLAT v3.5, BLAST v2.8 , MUSCLE v3.8, libsequence v1.8.3, EMBOSS v6.6.0, HMMER 3.1b2, PFAM v32.0, NLR-Annotator (no version, 
https://github.com/steuernb/NLR-Annotator), Vmatch v2.3.0, TandemRepeatFinder v4.07b, LTRharvest genometools-1.5.9, HMMER 
v3.0, MUMmer v3.23 (haplotype database) and v4 (all other analyses), HISAT v2.1.0, SNPrelate v3.11, BBTools/BBMap  v38, ImageJ 
v1.51n, minimap2 v2.13, FGENESH v2.6, NCBI Conserved Domain Search tool (no version, https://www.ncbi.nlm.nih.gov/Structure/cdd/
wrpsb), PROSITE release 2020_01, TMpred v25, STAR v2.6.0b., AUGUSTUS v3.2.3., GMAP v2017-06-20, EvidenceModeler v1.1.1, AHRD 
v1.6, MCScanX v2.0, samtools v1.10, BEDtools v2.29, and custom data scripts (https://github.com/Uauy-Lab/pangenome-haplotypes; 
http://people.beocat.ksu.edu/~jpoland/centromeres/).

For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Data
Policy information about availability of data

All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability

All sequence reads have been deposited into the National Center for Biotechnology Information sequence read archive (SRA) (see Supplementary Table 1 for 
accession numbers).  Sequence reads for the RQAs, Th. ponticum, Ae. ventricosa and T. timopheevii have been deposited into the SRA (no. PRJNA544491) and ChIP-


2

nature research  |  reporting sum
m

ary
O

ctober 2018

seq short read-data used for centromere characterization is deposited as PRJNA625537.  All Hi-C data has been deposited in the European Nucleotide Archive 
(Supplementary Table 1). The RQAs and projected annotations are available for direct user download at https://wheat.ipk-gatersleben.de/. All RQA assemblies have 
also been deposited at EBI with the following accession numbers:  GCA_903993795; GCA_903993985; GCA_903993975; GCA_903994175; GCA_903994195; 
GCA_904066035; GCA_903994155; GCA_903994165; GCA_903994185; GCA_903995565.  These data will be syncrhonized across multiple platforms including NCBI 
and at Ensembl Plants (https://plants.ensembl.org/index.html). Comparative analysis viewers are also online for synteny (https://
kiranbandi.github.io/10wheatgenomes/; http://10wheatgenomes.plantinformatics.io/) and haplotypes (http://www.crop-haplotypes.com/). Seed stocks of the 
assembled lines are available at the UK Germplasm Resources Unit (https://www.seedstor.ac.uk/).

Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences Behavioural & social sciences  Ecological, evolutionary & environmental sciences

For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size No statistical methods were used to establish sample size. The samples that were sequenced were selected to represent modern breeding 
material from different continents that had known differences in pedigree and were known to carry different genes/traits/chromosomal 
segments of interest.  

Data exclusions All sequencing data generated was used in the genome assembly and analyses. Whenever possible, all data was included in the supporting 
analyses. Data exclusion applies only to some of the subsequent supporting analysis, which was pre-established based on limitations in the 
data.  For example, we excluded the scaffolded assemblies from some analyses because the analyses required chromosome 
pseudomolecules. We performed diversity analysis both with the spelt genome but also excluding the spelt genome because it is a different 
species and is much more diverged and biased the results.

Replication In all analyses that support the genome assemblies, the number of replicates or iterations are indicated in materials and methods or 
supplemental tables. In each case, all replications were successful and were used. The genome assemblies themselves were validated using 
multiple methods (i.e. BUSCO, genetic maps, HiC, 10x Genomics, cytology, and comparions to Chinese Spring).  The CDC Landmark assembly 
was further validated using Oxford Nanopore long read sequencing.  This helped validate the other approaches.  

Randomization Randomization does not directly apply to the genome sequencing and assembly; however it applies to some of the supporting analyses. In 
these cases, the group design and data seeding for computational analysis are described in the materials and methods and adhere to widely 
accepted standards. For example, analysis of NLRs (Fig. 1c), 1 million random permutations were used. For the field experiments established 
for phenotyping of  Sm1, all samples were replicated and randomized using appropriate experimental designs.  

Blinding Blinding does not apply to this study, as the study focuses on genome sequencing. This study focuses on plants genomics and the results of 
the study are not impacted by the concealment of treatment, data, or groups.

Reporting for specific materials, systems and methods
We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 

Materials & experimental systems
n/a Involved in the study

Antibodies

Eukaryotic cell lines

Palaeontology

Animals and other organisms

Human research participants

Clinical data

Methods
n/a Involved in the study

ChIP-seq

Flow cytometry

MRI-based neuroimaging

Antibodies
Antibodies used Chromatin immunoprecipitation (ChIP) was performed ausing wheat CenH3 antibody (Koo et al., 2015).  A antigen with the 

peptide sequence ‘RTKHPAVRKTKALPKK’ corresponding to the N-terminus of wheat CENH3 was used to produce antibody 
utilizing the custom-antibody production facility provided by the Thermo Fisher Scientific, Illinois, USA (abs@thermofisher.com). 
A 0.396 mg of the antibody pellet was dissolved in 2 ml of PBS buffer, pH 7.4 resulting in 198 ng/uL of the working concentration. 

Validation In the manuscript, we validate the antibody according to a previous study of Chinese Spring (Koo et al., 2015) and achieved near 


3

nature research  |  reporting sum
m

ary
O

ctober 2018

identical results (Supplementary Table 12). Additional controls were used in the study where the antibody was substituted with 
rabbit serum, which serves as nonspecific binding control in chromatin immunoprecipitation assay.

ChIP-seq
Data deposition

Confirm that both raw and final processed data have been deposited in a public database such as GEO.

Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.

Data access links 
May remain private before publication.

The data for the project has been deposited at NCBI: PRJNA625537 and analysis files are available for download: http://
people.beocat.ksu.edu/~jpoland/centromeres/

Files in database submission BED files, delta files (MUMmer), data analysis scripts

Genome browser session 
(e.g. UCSC)

Data for visualization is available at http://people.beocat.ksu.edu/~jpoland/centromeres/

Methodology

Replicates NA.  Samples were obtained from 2-week-old seedlings. 

Sequencing depth Paired-end reads were generated at varying levels of read depth, data was deposited at NCBI (PRJNA625537).

Antibodies Wheat CenH3 antibody - see: Koo DH, Sehgal SK, Friebe B, Gill BS (2015) Structure and stability of telocentric chromosomes 
in wheat. PLoS One 10: e0137747.

Peak calling parameters Reads mapped per 100kb bin were counted for each sample using BEDtools and output as a bed file.  Scripts for data 
analysis are provided at http://people.beocat.ksu.edu/~jpoland/centromeres/. Unlike studies involving transcription factors, 
CENH3 ChIP-seq provides clear distinct peaks that are ~100 fold greater than background. 

Data quality SAM output files from HISAT2 were converted to BAM, sorted and filtered for minimum alignment quality of 30 using 
SAMtools. 

Software Reads for each sample were aligned to each of the respective genome assemblies using HISAT2.Reads mapped per 100kb 
bin were counted for each sample using BEDtools and output as a bed file. Scripts for data analysis are provided at http://
people.beocat.ksu.edu/~jpoland/centromeres/. 


	Multiple wheat genomes reveal global variation in modern breeding
	Global variation in wheat genomes
	Polyploi