Planta (2022) 256:93
https://doi.org/10.1007/s00425-022-03998-w

ORIGINAL ARTICLE

Genetic variation and structural diversity in major seed proteins among and within Camelina species

Dwayne Hegedus1,2 · Cathy Coutu1 · Branimir Gjetvaj1 · Abdelali Hannoufa3 · Myrtle Harrington1 · Sara Martin3 · Isobel A. P. Parkin1 · Suneru Perera1,2 · Janitha Wanasundara1,2

Received: 8 April 2022 / Accepted: 12 September 2022 / Published online: 6 October 2022
© Crown 2022

Abstract

Main conclusion Genetic variation in seed protein composition, seed protein gene expression and predictions of seed protein physiochemical properties were documented in C. sativa and other Camelina species.

Abstract Seed protein diversity was examined in six Camelina species (C. hispida, C. laxa, C. microcarpa, C. neglecta, C. rumelica and C. sativa). Differences were observed in seed protein electrophoretic profiles, total seed protein content and amino acid composition between the species. Genes encoding major seed proteins (cruciferins, napins, oleosins and vicilins) were catalogued for C. sativa and RNA-Seq analysis established the expression patterns of these and other genes in developing seed from anthesis through to maturation. Examination of 187 C. sativa accessions revealed limited variation in seed protein electrophoretic profiles, though sufficient to group the majority into classes based on high MW protein profiles corresponding to the cruciferin region. C. sativa possessed four distinct types of cruciferins, named CsCRA, CsCRB, CsCRC and CsCRD, which corresponded to orthologues in Arabidopsis thaliana with members of each type encoded by homeologous genes on the three C. sativa sub-genomes.
Total protein content and amino acid composition varied only slightly; however, RNA-Seq analysis revealed that CsCRA and CsCRB genes contributed > 95% of the cruciferin transcripts in most lines, whereas CsCRC genes were the most highly expressed cruciferin genes in others, including the type cultivar DH55. This was confirmed by proteomics analyses. Cruciferin is the most abundant seed protein and contributes the most to functionality. Modelling of the C. sativa cruciferins indicated that each type possesses different physiochemical attributes that were predicted to impart unique functional properties. As such, opportunities exist to create C. sativa cultivars with seed protein profiles tailored to specific technical applications.

Keywords Camelina sativa · Cruciferin · Gene expression · Protein functionality · Protein modelling

Abbreviations
daa    Days after anthesis
G1     Sub-genome I
G2     Sub-genome II
G3     Sub-genome III
HVR    Hypervariable region
IA     Intrachain disulphide bond-containing
IE     Interchain disulphide bond-containing
PGRC   Plant Gene Resources Center
RMSD   Root mean square difference

Communicated by Dorothea Bartels.

* Dwayne Hegedus
Dwayne.Hegedus@agr.gc.ca

1 Agriculture and Agri-Food Canada, 107 Science Place, Saskatoon, SK S7N 0X2, Canada
2 Department of Food and Bioproduct Sciences, University of Saskatchewan, Saskatoon, SK, Canada
3 Agriculture and Agri-Food Canada, London, ON, Canada

http://orcid.org/0000-0002-7790-5567

Introduction

Interest in Camelina sativa (camelina), grown in Europe in medieval times for food and fuel, stems from the need to diversify annual crop rotation portfolios with crops that have smaller environmental footprints and the potential to produce valuable secondary products. It is compatible with practices used to produce contemporary oilseed crops, such as canola/oilseed rape and soybean, can be grown on marginal lands with fewer inputs and has higher tolerance to drought and cold (Vollman et al. 1996). It is also naturally resistant to several diseases (Sharma et al. 2002; Eynck et al. 2012) and insects (Deng et al. 2002; Henderson et al. 2004; Soroka et al. 2015) that afflict canola.

Camelina sativa seed comprises approximately 36–47% oil (Moser 2012) and 43% protein (Zubr 2003). While it is being aggressively marketed as a diesel and aviation fuel feedstock (Li and Mupondwa 2014), the high levels of polyunsaturated fatty acids, in particular α-linolenic acid (38% of total fatty acids), make it an attractive source of ω3 fatty acids in food and feed. α-Linolenic acid is the precursor for the essential long chain polyunsaturated fatty acids eicosapentaenoic acid (20:5 ω3) and docosahexaenoic acid (22:6 ω3) that have human health benefits. Farmed fish species, such as salmon and cod, can convert α-linolenic acid to these longer chain polyunsaturated fatty acids when camelina oil is substituted for fish oil in their diet (Hixson et al. 2014; Hixson and Parrish 2014). This was attributed to the induction of two genes encoding fatty acyl elongases in the livers of fish fed diets containing only camelina oil (Xue et al. 2014, 2015). Other studies have reported increased ω3 fatty acid levels in chicken meat (Ariza et al. 2010) and eggs (Kakani et al. 2012), as well as in milk (Szumacher-Strabel et al. 2011) when camelina meal is incorporated into the diet at fairly low levels.
Complete replacement of fish oil with camelina oil in farmed fish diets seems possible as this has no impact on weight gain, fillet sensory quality (Hixson et al. 2014) or the ability to mount an immune response (Booman et al. 2014), though some differences in tissue lipid composition (Hixson et al. 2014) and intestinal function (Morias et al. 2012) have been noted.

Camelina meal is also being considered as a protein source in farmed fish, poultry and livestock. Atlantic cod (Gadus morhua) tolerated up to 24% inclusion of camelina meal in place of fish meal in their diets without affecting weight gain (Hixson et al. 2016a). Salmonids were more sensitive to fish meal replacement and tolerated up to 5% (Atlantic salmon, Salmo salar) (Hixson et al. 2016b) and 14% (rainbow trout, Oncorhynchus mykiss) (Ye et al. 2016) camelina meal in their diets without ill effects. In cod, high inclusion rates were associated with increased expression of appetite-stimulating hormones and decreased expression of appetite-suppressing hormones indicating that the meal is affecting nutritional quality or palatability (Tuziak et al. 2014). In broiler chickens, low energy and nitrogen utilisation from camelina-based meals was attributed to high jejunal digesta viscosity, likely due to high levels of seed coat mucilage remaining in the meal, and to the presence of glucosinolates which can affect palatability (Pekel et al. 2015). Conversely, in growing pigs the ileal digestibility of crude protein from camelina expeller cake was only slightly less than the comparable canola product and was recommended for use in swine diets (Almeida et al. 2013). In cattle, the amount of undegraded protein in the rumen differed among meals from ten camelina genotypes (Colombini et al. 2014), but was generally higher than for canola meal.
With the exception of glucosinolates, the levels of anti-nutritional factors, including phytic acid, condensed tannins and sinapine, were lower in the camelina meals than in canola meal.

The essential amino acid composition of camelina meal is comparable to that from canola, soybean and flax meals (Zubr 2003); however, differences in amino acid profiles among camelina lines have been reported (Colombini et al. 2014). Of particular significance are the essential amino acids lysine and methionine as they cannot be synthesised de novo by animals and must be provided in the diet, though methionine can be converted to cysteine. Both are limiting in plant-based diets, most notably in cereals and some legumes (Ufaz and Galili 2008), and are added as supplements to feeds at a significant cost to fish (Wilson and Halver 1986), poultry (Kidd et al. 1998) and swine (Brinegar et al. 1950) production.

Camelina breeding is still in its infancy, but release of the camelina genome (Kagale et al. 2014) and transcriptome data (Liang et al. 2013; Nguyen et al. 2013; Mudalkar et al. 2014; Kagale et al. 2016) will facilitate rapid advances in crop improvement. As in other Brassicaceae, the major seed proteins in camelina are the 2S albumins (napins) and 12S globulins (cruciferins), with transcript data indicating that there are 8 and 17 expressed members of these gene families, respectively, in C. sativa cv. Sunesson (Nguyen et al. 2013). The napin dimer possesses four disulphide bonds and consequently these proteins are rich in cysteine, while cruciferins tend to have higher levels of lysine. Oleosins are amphiphilic proteins with well-separated hydrophilic and hydrophobic domains, an attribute that allows them to interact with both lipid and water. While less abundant than cruciferin or napin, they play a major role in seed lipid accumulation and stabilisation of oil bodies, as well as other aspects of plant development (D’Andrea 2016).
Manipulation of amino acid levels is possible through mutation (Kita et al. 2010; Marsolais et al. 2010) or down-regulation (Schmidt et al. 2011) of the major seed storage protein genes.

To date, there has been no broad examination of C. sativa seed protein or seed amino acid content diversity. To this end, we established seed protein profiles for six Camelina species and 187 C. sativa accessions from a global diversity collection held at the Plant Gene Resources Center for Canada (pgrc.agr.gc.ca). Amino acid content was determined for representatives from each major seed protein profile group and transcriptomic analysis was conducted to catalogue the expressed seed protein genes from the most diverse lines. These studies established that there is potential to select or engineer C. sativa lines with altered seed protein and/or amino acid profiles that may be more useful in food/feed or technical applications.

Materials and methods

Plant materials

A list of Camelina species, accessions and their source is provided in Suppl. Table S1a. Another 187 C. sativa accessions were obtained from PGRC (Agriculture & Agri-Food Canada, Saskatoon) (Suppl. Table S1b). C. sativa DH55 is a doubled haploid line for which the genome sequence is available (Kagale et al. 2014).

Seed protein extraction and separation

Seeds of C. hispida var. hispida, C. hispida var. grandiflora, C. sativa, C. laxa, C. neglecta, C. microcarpa (4x and 6x) and C. rumelica were generated at the Agriculture and Agri-Food Canada, Ottawa Research and Development Centre under controlled conditions within a growth chamber with randomised individual position and re-randomisation of position every two weeks. Self-incompatible taxa were hand pollinated to induce seed set. Seeds of C. sativa lines obtained from PGRC were generated at the Agriculture and Agri-Food Canada, Saskatoon Research and Development Centre.
Plants were grown in 6-inch pots in a soilless medium (Stringam 1971) in a growth chamber with a photoperiod of 16 h and light/dark temperatures of 20 °C/16 °C. At maturity, water was withheld and plants were allowed to dry, at which point seed was collected from the entire plant and seed from each plant kept separate.

Seeds (30 mg) from individual plants grown at the same time and under the same conditions, each representing one biological replicate, were ground under liquid nitrogen using a Helix grinder (Helix Technologies Inc., French Lick, IN, USA). The material was suspended in 1.2 ml of lysis buffer (7 M urea, 2 M thiourea, 19 mM Tris–HCl, 14 mM Tris-base, 0.2% Triton X-100) with 8% Complete mini EDTA-free protease inhibitor (Roche Diagnostics, Laval, Canada), 1.5 mg/ml DNase I (Roche Diagnostics) in dilution buffer (10 mM Tris–Cl pH 7.5, 150 mM NaCl, 1 mM MgCl2), and 0.01 mg/ml bovine pancreas RNase A (Sigma-Aldrich, Oakville, Canada) added just prior to use (Withana-Gamage et al. 2013a). Soluble proteins were isolated by centrifugation at 10,000 × g for 20 min. Disulfide bonds were reduced by incubation for 30 min at 4 °C with 1.0 mM DTT when required. Protein concentrations were determined using a Qubit 2.0 Fluorometer (Thermo Fisher Scientific, Nepean, Canada).

An Experion Pro260 analysis kit (Bio-Rad Laboratories, Mississauga, Canada) was used to determine the relative proportion of each protein based on size from the seed extracts. Fresh, not frozen, protein samples were adjusted to 0.5 µg/µl and treated according to the manufacturer’s protocol (Experion Pro260 Analysis kit, Bio-Rad Laboratories). In brief, gel solution, gel-stain solution, Pro260 ladder and sample buffer were prepared with Experion Pro260 analysis kit reagents. Note that only the Pro260 ladder was heated to 100 °C; the samples were heated to 65 °C to prevent thiourea in the buffer from denaturing the proteins.
Experion Pro260 chip micro-channels were used to separate proteins on an Experion automated electrophoresis station (Bio-Rad Laboratories). The resulting electropherograms were analysed using the percentage determination function in the Experion software, which calculates each protein peak as a percent of the total protein within the sample.

Amino acid analysis

Seeds (3 g) from individual plants grown at the same time and under the same conditions, each representing one biological replicate, were defatted using hexane based on the methods of Troeng (1955) and Barthet and Daun (2004). Seeds were placed in sealed, steel tubes with 3 ball bearings and 25 ml of hexane (Sigma-Aldrich). Samples were ground for 45 min using an Eberbach shaker followed by filtration to remove oils and hexanes. Defatted meal was air-dried overnight followed by storage at −20 °C. Total nitrogen content of the defatted meal was determined using a Flash EA 112 Series N/Protein 2000 Organic Elemental Analyzer (Thermo Fisher Scientific). This system uses a dynamic flash combustion system coupled with a gas chromatographic separation system based on the AOAC Method 972.43 (1999). Approximately 15 mg of defatted meal from each sample (biological replicate) was analysed in triplicate (technical replicates). The nitrogen to protein conversion factor used was 6.25 (Mariotti 2008; AACC Method 46–18.01 1999). Moisture levels in the defatted meal were determined as weight loss upon drying to stability at 105 °C for 24 h in a forced air oven (AACC Method 44–01.01 1999). Approximately 700 mg of defatted camelina meal was dried for each sample.

Amino acid profiles were analysed following the procedure of AOAC Method 994.12 (2005) and Tuan and Phillips (1997). Tryptophan was quantified following the method of Nielsen and Hurrell (1985). For C. sativa lines from the PGRC repository, protein hydrolysis was conducted using a microwave, acid hydrolysis method modified from Lill et al.
(2007) and Kabaha et al. (2011). Acid hydrolysis converts asparagine and glutamine into aspartic acid and glutamic acid, respectively; therefore, each amide was quantified together with its corresponding acid. Separation and quantification of amino acids was performed using a high-performance liquid chromatography (HPLC) system (Waters Alliance 2695) equipped with a Waters 2475 fluorescence detector with an excitation wavelength of 250 nm and an emission wavelength of 395 nm. Amino acids were resolved using a multistep gradient elution with an injection volume of 5 μl. Response peaks were recorded with the software Empower (Waters Corporation, Brossard, Canada). Pre-column derivatization using AccQ-Fluor (Waters Corporation) was done for all samples, except tryptophan which was diluted prior to application.

For all amino acids except cysteine, methionine and tryptophan, meal equivalent to 5 mg of protein was hydrolysed with 6 M HCl (Optima grade, Thermo Fisher Scientific) with 1% phenol using a CEM Discover SPD Microwave digester (ramp time 5.5 min, hold at 195 °C for 10 min, maximum pressure at 140 psi and maximum power at 300 W). Hydrolysates were neutralised with sodium hydroxide, filtered through a 0.45 μm Phenex RC syringe filter and applied to a Waters Oasis HLB C18 Cartridge. The flow-through and washes were collected. Cysteine and methionine were determined as cysteic acid and methionine sulfone after oxidation with performic acid followed by microwave hydrolysis with 6 M HCl, then neutralised and filtered as described. Tryptophan was determined by hydrolysing 10 mg of protein in 4.2 M NaOH in a 10 ml quartz hydrolysis tube with a Teflon liner using a CEM Discover SPD Microwave digester (ramp time 6.0 min, hold at 215 °C for 20 min, with maximum pressure set at 140 psi and maximum power at 300 W). Hydrolysed samples were neutralised with HCl and filtered prior to application on a Waters Oasis HLB C18 Cartridge. The flow-through and washes were collected.
Samples were stored at −20 °C prior to dilution and HPLC analysis. DL 2-aminobutyric acid and DL 5-methyl-tryptophan (Sigma-Aldrich) were used as internal standards.

For experiments with Camelina species, amino acid analysis was conducted as described above, except the hydrolysis was performed as follows. Defatted meal was placed into 10 ml Pyrex screw cap vials with protein equivalents of 5 mg (nitrogen to protein conversion factor of 6.25). Hydrolysis was done in 2 ml of 6 M HCl (Optima grade, Thermo Fisher Scientific) with 1% (w/v) phenol for 24 h at 110 °C, with the exception of cysteine and methionine which were oxidised to cysteic acid and methionine sulfone prior to hydrolysis in 6 M HCl. Tryptophan was not assessed.

Amino acids were reported as % w/w (weight of the specific amino acid/weight of all amino acids recovered × 100). For samples from each biological replicate, representing single plants grown at the same time and under the same conditions, amino acid and nitrogen analysis were performed in triplicate (technical replicates) and moisture determination as a single reading. Technical replicates of the same sample presenting a large coefficient of variation (> 10%) were repeated. Statistical differences between biological replicates were identified using JMP 13 software. A one-way analysis of variance (ANOVA) and the multiple comparison Tukey honestly significant difference (HSD) test were used to identify and rank significant differences (P ≤ 0.05).

RNA-Seq analysis

C. sativa DH55 flower buds along the main raceme were marked at anthesis and developing bolls taken every 4 days from anthesis to seed maturity (40 days). RNA was isolated separately from samples from each time point.
Buds from lines identified as belonging to one of three protein profile groups, either Group 1 (CN113733 and CN30476), Group 2 (CN30477 and CN45816), or Group 3 (CN111331 and CN114265), were also marked at anthesis and bolls sampled similarly; however, prior to RNA isolation, equal amounts (by weight) of material from each time point were pooled into a single sample representing an average developmental profile for each line. This allowed the suite of seed protein genes expressed in each line to be compared, although it was not possible to determine when they were expressed.

RNA isolation was performed similarly to the method of Suzuki et al. (2004), with volumes modified to allow extraction in 1.5 ml tubes. RNA was quantified on a Qubit using the BR RNA kit (Invitrogen/Thermo Fisher Scientific), and library generation (TruSeq stranded mRNA kit) and Illumina sequencing (800,000–1,000,000 reads per sample) were performed by the National Research Council of Canada DNA Services Lab (Saskatoon, Canada). Reads were trimmed for adapters and quality using Trimmomatic 0.30, with a Phred33 quality score cutoff of 15 used for leading, trailing, and sliding window (4 bp) trimming, discarding any reads with under 55 bp remaining after trimming. CLC Genomics Workbench 11.0.1 was used to run RNAseq Analysis (version 2.1), which mapped the reads to the genome and calculated the transcripts per million (TPM). Quantile normalisation was applied to improve between-sample comparisons.

Proteomics analysis

Seed protein was solubilised in non-reducing protein loading buffer (2% SDS, 10% glycerol, 0.01% bromophenol blue in 60 mM Tris–HCl buffer, pH 6.8) and separated by electrophoresis on 12% SDS-PAGE gels. A high molecular weight region (49–54 kDa) was cut from the gel and subjected to LC-MS/MS analysis at the Genome BC Proteomics Centre, University of Victoria, Canada, as per the following procedure. Trypsin digests were performed as previously described (Loiselle et al. 2005).
Briefly, the gel slice was cut into 1 mm cubes and transferred to a Genomics Solutions Progest (DigiLab Inc., Holliston, MA, USA) perforated digestion tray. The gel pieces were de-stained (methanol/water/acetic acid, 50/45/5, by vol.) prior to reduction with 10 mM dithiothreitol and alkylation with 100 mM iodoacetamide. Modified sequencing-grade porcine trypsin solution (20 ng/µl) (Promega, Madison, WI, USA) was added at an enzyme/protein ratio of 1:50. Proteins were then digested for 5 h at 37 °C prior to collection of the tryptic digests and acid extraction of the gel slices (acetonitrile/water/formic acid, 50/40/10, by vol.). The samples were then lyophilised and stored at −80 °C prior to analysis.

The peptide digest was separated by on-line reverse-phase chromatography using an EASY-nLC II system (Thermo Fisher Scientific) with a reverse-phase Magic C-18AQ pre-column (100 µm I.D., 2 cm length, 5 µm, 100 Å) and reverse-phase nano-analytical column Magic C-18AQ (75 µm I.D., 15 cm length, 5 µm, 100 Å) (Michrom BioResources Inc., Auburn, AL, USA) both prepared in-house, at a flow rate of 300 nl/min. The chromatography system was coupled on-line with an LTQ Orbitrap Velos mass spectrometer equipped with a Nanospray II source (Thermo Fisher Scientific). Solvents were A: 2% acetonitrile, 0.1% formic acid; B: 90% acetonitrile, 0.1% formic acid. After pre-column (~ 10 µl, 249 bar) and nanocolumn (~ 6 µl, 249 bar) equilibration, samples were separated by gradient elution (0 min: 5% B; 45 min: 45% B; 2 min: 80% B; hold 8 min: 80% B).

The LTQ Orbitrap Fusion (Thermo Fisher Scientific) parameters were as follows: nano-electrospray ion source with spray voltage 2.1 kV, capillary temperature 225 °C. Survey MS1 scan m/z range 400–2,000 profile mode, resolution 60,000 FWHM at 400 m/z with AGC target 1E6, and one microscan with maximum inject time of 500 ms.
The siloxane lock mass (m/z 445.120024) was used for internal calibration, with preview mode for FTMS master scans on, injection waveforms on, monoisotopic precursor selection on and rejection of charge state 1. The samples were analysed using a top-15 FTMS/IT-CID method in which the fifteen most intense ions with charge states 2–4 exceeding 5000 counts were selected for CID ion trap MS/MS fragmentation (ITMS scans 2–16) with detection in centroid mode. Dynamic exclusion settings were: repeat count: 2; repeat duration: 15 s; exclusion list size: 500; exclusion duration: 60 s with a 10 ppm mass window. The CID activation isolation window was: 2 Da; AGC target: 1E4; maximum inject time: 100 ms; activation time: 10 ms; activation Q: 0.250; and normalised collision energy 35%.

A database was generated based on the published proteome of C. sativa (Kagale et al. 2014, 2016) and common contaminant sequences (human keratin and porcine trypsin) added. All cruciferin, napin, vicilin, and oleosin sequences were manually curated prior to inclusion in the database. The following sequences were corrected: napins (Csa11g017000, Csa12g024720, Csa12g024730), cruciferins (Csa14g004960, Csa03g005050, Csa11g015240), vicilins (Csa19g031870, Csa01g025880, Csa01g025890, Csa16g016660, Csa05g038120) and oleosin (Csa12g079570). All seed protein sequences were deposited in GenBank (accessions OL404969-OL405008). Tandem mass spectra were extracted, charge state deconvoluted and deisotoped by Proteome Discoverer version 1.4. All MS/MS samples were analysed using Mascot version 1.4.1.14 (Matrix Science, London, UK). Mascot was set up to search with a fragment ion mass tolerance of 0.60 Da and a parent ion tolerance of 8.0 ppm. Carbamidomethyl of cysteine was specified as a fixed modification. Deamidation of asparagine and glutamine, oxidation of methionine and propionamide of cysteine were specified as variable modifications.
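As a rough illustration of what the parent ion tolerance quoted above means in absolute terms, the following Python sketch converts a parts-per-million tolerance into a mass window and tests whether an observed mass falls inside it. The function names and the example masses are hypothetical and are not taken from the search described here; only the 8.0 ppm figure comes from the text, and the actual matching logic inside Mascot may differ.

```python
from typing import Tuple


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error of an observation, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def ppm_window(theoretical: float, tol_ppm: float) -> Tuple[float, float]:
    """Lower and upper mass bounds for a symmetric ppm tolerance."""
    delta = theoretical * tol_ppm / 1e6
    return theoretical - delta, theoretical + delta


def within_tolerance(observed: float, theoretical: float,
                     tol_ppm: float = 8.0) -> bool:
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass.

    The 8.0 ppm default mirrors the parent ion tolerance stated in the text.
    """
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


if __name__ == "__main__":
    # Hypothetical peptide mass, chosen only to make the arithmetic easy.
    theoretical_mass = 1500.0
    low, high = ppm_window(theoretical_mass, 8.0)
    print(f"8 ppm window around {theoretical_mass}: {low:.4f} to {high:.4f}")
    for observed_mass in (1500.010, 1500.020):
        print(observed_mass, within_tolerance(observed_mass, theoretical_mass))
```

Because a ppm tolerance scales with mass, the absolute window widens for heavier precursors, whereas the fragment ion tolerance in the text is a fixed 0.60 Da.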
Scaffold (version Scaffold_4.8.4, Proteome Software Inc., Portland, OR, USA) was used to validate MS/MS based peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 95.0% probability by the Scaffold Local FDR algorithm. Protein identifications were accepted if they could be established at greater than 95.0% probability and contained at least 2 identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm (Nesvizhskii et al. 2003). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. Proteins sharing significant peptide evidence were grouped into clusters.

Phylogenetic analysis

Phylogenetic analysis was conducted using MEGA version 6.06 (Tamura et al. 2013). Sequences were aligned using MUSCLE with parameters set at gap opening penalty 10, gap extension penalty 0.2 and gap separation distance 4 for protein alignments and gap opening penalty 15, gap extension penalty 6.66, transition weight 0.5 for DNA alignments. Maximum likelihood trees were constructed using the best substitution model for each data set with 500 bootstrap iterations.

Protein modelling

The Swiss Model First Approach (Waterhouse et al. 2018) was used to identify the best template and to generate an initial structure for each cruciferin. The SWISS-MODEL template library (SMTL version 2020-05-20, PDB release 2020-05-15) (Bienert et al. 2017) was searched for evolutionarily related structures matching the target sequence using default settings (http://swissmodel.expasy.org). The best template, PDB 3KGL.1.A, was found with HHblits and identified as a homotrimer. The template structure was obtained from X-ray crystallography with a resolution of 2.98 angstroms. A structural alignment was calculated and the fit adjusted to the template using Swiss PDB Viewer, SPDBV (https://spdbv.vital-it.ch).
The resultant structurally aligned SPDBV project files were submitted to Swiss Model workspace. Loops were constructed for untemplated regions and adjacent residues with low root mean square differences (RMSD) using the Scan Loop Data Base for realistic loop options. When an acceptable loop was not identified, the residues associated with the loop were submitted for modelling to the DaReUS-Loop server (https://bioserv.rpbs.univ-paris-diderot.fr/services/DaReUS-Loop). Energy minimization of the structure was done after loop selection. Energy minimization computations (bonds, angles, torsion, improper, non-bonded and electrostatic) were conducted with the GROMOS96 module in Swiss PDB Viewer. Model quality was reviewed using QMEAN and GMQE from Swiss Model, Ramachandran plot statistics were calculated using ProCheck (https://servicesn.mbi.ucla.edu/PROCHECK) and Z-Score from ProSA (https://prosa.services.came.sbg.ac.at/prosa.php). RMSD of the final structure was calculated for the structurally aligned residues against the template 3KGL.1.A using Swiss PDB Viewer (van Gunstern 1996).

Electrostatic surface potentials of the molecules were calculated using the default settings in the APBS electrostatic plugin (Dolinsky et al. 2007). The molecule was prepared using PDB2PQR workflow to add missing side chains and hydrogen atoms, to assign partial charges and radii, and to remove ligands. The electrostatic map was calculated with the grid spacing set to 0.5 with molecular surface visualisation set at ± 5 on the solvent-excluded surface (Connolly surface). The protein dielectric constant was set at 2, the solvent dielectric constant at 78, and the temperature at 310 K. Hydrophobicity was ranked using the Eisenberg scale (Eisenberg et al. 1984). Models were coloured using the color_h PyMOL script (https://pymolwiki.org/index.php/Color_h).
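To make the hydrophobicity ranking more concrete, the sketch below computes a mean per-residue Eisenberg hydrophobicity for a short peptide in Python. It is only an illustration and not the procedure used for the cruciferin models: the scale values are the normalised consensus values as they are commonly tabulated and should be checked against Eisenberg et al. (1984) before reuse, and the function name and example peptides are hypothetical.

```python
# Normalised consensus hydrophobicity values (one-letter amino acid codes).
# Assumption: values as commonly tabulated for the Eisenberg scale; verify
# against the original publication before relying on them.
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def mean_hydrophobicity(sequence: str) -> float:
    """Average scale value over all residues of a protein sequence.

    Raises ValueError for an empty sequence or for residue codes that are
    not among the 20 standard amino acids.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = sorted(set(seq) - set(EISENBERG))
    if unknown:
        raise ValueError(f"unsupported residue codes: {unknown}")
    return sum(EISENBERG[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical peptides: one aliphatic, one charged.
    for peptide in ("ILV", "KRDE"):
        print(peptide, round(mean_hydrophobicity(peptide), 3))
```

More positive values indicate more hydrophobic residues on this scale, which is the ordering a colouring script would use to shade a model surface.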
ClustalW was used for multiple sequence alignments. Evolutionary sequence conservation was determined using the ConSurf server (https://consurf.tau.ac.il/) (Landau et al. 2005). Phosphorylation sites were identified using NetPhos 2.0 (http://www.cbs.dtu.dk/services/NetPhos-2.0). PyMOL (https://pymolwiki.org/index.php/FindSurfaceResidues) was used to colour each of the identified sites. Surface accessible phosphorylation sites on the trimer were identified using the find surface residues feature in PyMOL. The cutoff to define exposed or not exposed residues was set at 2.0 Å². CASTp (computed atlas of surface topography of proteins) was used to calculate the main pocket of the trimer. Pocket volume, area, circumference, openings and sum of mouth areas were reported using Connolly solvent-excluded surface area, which is the contact surface created when a sphere of size 1.4 angstroms is rolled over the model.

Results

Seed protein profile diversity in Camelina species

Total seed protein from lines representing the spectrum of Camelina species (Suppl. Table S1) was separated by capillary electrophoresis under reducing (with β-ME) and non-reducing conditions (without β-ME) (Fig. 1; Suppl. Table S2). While many of the major peaks were in common between the species, a scheme to differentiate them based on unique peaks and patterns specific to each was developed (Suppl. Fig. S1). The C. sativa/C. microcarpa 4X/C. microcarpa 6X/C. rumelica rumelica/C. rumelica transcapida group could be differentiated from the C. neglecta/C. laxa/C. hispida hispida/C. hispida grandiflora group by the presence or absence of a 17 kDa peak under reducing conditions. C. sativa could then be differentiated by the presence of a 14 kDa peak and C. rumelica rumelica/C. rumelica transcapida differentiated from C. microcarpa 4X/C. microcarpa 6X by the presence or absence of a 33 kDa peak. C.
microcarpa 4X exhibited a 54 kDa peak under non-reducing conditions, while C. microcarpa 6X did not. C. neglecta could be differentiated from C. laxa/C. hispida hispida/C. hispida grandiflora by a 12 kDa peak under reducing conditions and the latter further differentiated by 33 and 29 kDa peaks.

Protein and amino acid content in meal from Camelina species

The percent protein of defatted meal varied considerably between species, but generally less so between accessions of the same species (Table 1). Meal from the C. microcarpa 4X lines exhibited the lowest protein content, approximately 31%, while meal from C. hispida hispida, C. laxa, C. rumelica transcapida and lines within the C. rumelica rumelica and C. sativa groups approached or exceeded 40%.

Amino acid content in the meal also varied significantly within and between species (Table 2). Of the essential amino acids most often added as supplements to feeds, lysine levels varied from a low of 4.77% (w/w) in meal from C. rumelica rumelica 609 to a high of 5.74% in C. sativa 1063 meal. Meal from the C. sativa lines generally had higher levels of lysine. Of the sulphur-containing amino acids, methionine was highest in meal from C. rumelica rumelica 609 and lowest in C. microcarpa 6X 198 meal, while cysteine was highest in C. rumelica rumelica 247 meal, but lowest in C. rumelica rumelica 1034 meal. Interestingly, histidine levels were significantly higher in the meal from C.
rumelica rumelica 1034 (4.77%), which was almost twice that found in meal from https://bioserv.rpbs.univ-paris-diderot.fr/services/DaReUS-Loop https://bioserv.rpbs.univ-paris-diderot.fr/services/DaReUS-Loop https://servicesn.mbi.ucla.edu/PROCHECK https://prosa.services.came.sbg.ac.at/prosa.php https://pymolwiki.org/index.php/Color_h https://pymolwiki.org/index.php/Color_h https://consurf.tau.ac.il/ http://www.cbs.dtu.dk/services/NetPhos-2.0 https://pymolwiki.org/index.php/FindSurfaceResidues Planta (2022) 256:93 1 3 Page 7 of 23 93 the other species. Serine content was highest (5.39%) in C. sativa 605 meal, but lowest (4.43%) in meal from another C. sativa line, 252. Threonine was also lowest (3.83%) in meal from C. sativa line 1662, but exceeded 4.5% in other C. sativa lines and other Camelina species. Seed protein profile diversity in C. sativa As variation in seed protein profile was observed with the nine C. sativa accessions examined above, the analysis was extended to include a global collection of 187 C. sativa lines from the PGRC. Lines could be classified based on the similarity of seed protein profiles under reducing or non- reducing conditions. It should be noted that while classifica- tion of the lines based on protein profiles generated under the two conditions was generally in agreement, some lines were placed into different groups dependent upon the condition under which the seed protein was separated. This allowed for an even finer level of discrimination when both data sets were considered. A complete list of the lines tested with accompanying capillary electrophoresis electropherograms can be found in Suppl. Table S3. Under reducing conditions, seven different profiles were noted with the majority of the lines exhibiting one of three Fig. 1 Seed protein profiles from various Camelina species. Traces were generated by capillary electrophoretic separation of total seed protein under reducing (upper panel) and non-reducing (lower panel) conditions. 
Commons peaks (black numbers), peaks differing between species (red numbers) and peaks unique to a species (green numbers) Planta (2022) 256:93 1 3 93 Page 8 of 23 profiles as exemplified by lines CN113733, CN111311 and CN30477 (Fig. 2). Lines with these profiles exhibited several unique protein peaks or patterns between 22 and 36 kDa (Fig. 2a). Three distinct profiles were observed under non- reducing conditions with the pattern of proteins ranging from 49 to 54 kDa being one of the more distinguishing features (Fig. 2b). Profile 1 (e.g. CN113733) had a single peak ca. 51. kDa with a small higher molecular weight (MW) shoulder. Profile 2 (e.g. CN30477) was distinguish- able by a unique peak at ca. 23 kDa, by a peak at ca. 36 kDa appearing as a shoulder on a common higher MW peak at ca. 39 kDa, and by two smaller, broad peaks of relatively equal abundance at ca. 52 and 55 kDa. Lines exhibiting Profile 3 (e.g. CN111331) were similar to Profile 1, but had two large peaks at ca. 51 and 54 kDa. Lines in the same category often showed slight differences in the ratio of proteins, but the profiles were very similar (Fig. 2c). Protein and amino acid content in meal from diverse C. sativa accessions Percent protein in defatted meal was found to vary con- siderably among the C. sativa lines; however, this did not correlate with protein profiles (Table 3). Meal from line CN113733 had the highest protein content (53.71%), while meal from line CN111331 had the lowest (43.26%). It should be noted that the average meal protein content among these lines (49.49%) was higher than in the nine accessions exam- ined above (40.41%). This likely reflects the different loca- tions and conditions under which the plants were propagated for these experiments. The amino acid content in meal from the lines repre- senting the three seed protein profiles was also examined to estimate the extent of diversity for this trait among the lines in the PGRC collection (Table 4). 
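The amino acid values in Tables 2 and 4 are expressed as % (w/w), which the table footnotes define as the mass of a given amino acid divided by the total recovered amino acid mass, multiplied by 100. A minimal Python sketch of that arithmetic follows; the function name and the example masses are hypothetical and are not data from the study.

```python
def percent_aa(recovered_mg):
    """Return % (w/w) for each amino acid.

    recovered_mg maps an amino acid name to its recovered mass in mg.
    Each value is divided by the total recovered mass and multiplied by 100,
    following the definition given in the table footnotes.
    """
    total = sum(recovered_mg.values())
    if total <= 0:
        raise ValueError("total recovered amino acid mass must be positive")
    return {aa: 100.0 * mg / total for aa, mg in recovered_mg.items()}


# Hypothetical recovered masses (mg), for illustration only.
example_mg = {"lysine": 5.0, "methionine": 2.0, "all_others": 93.0}
shares = percent_aa(example_mg)
```

Because every value is normalised to the recovered total, the percentages for one sample always sum to 100, so a higher share of one amino acid necessarily lowers the shares of the others.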
While a correlation between seed protein profile and amino acid content was not observed, the lines examined exhibited significant differences in meal amino acid content. Of the essential amino acids required by monogastric animals, methionine (converted to cysteine), threonine and lysine are often lacking in plant-based diets. In this regard, meal from lines CN113733, CN30476 and CN111331 had significantly higher levels (ca. 7–8% more) of lysine, while less variation was found for methionine and threonine levels. Meal from line CN30477 had generally higher levels of essential aliphatic amino acids, namely leucine, isoleucine and valine, than the other lines, while meal from line CN114265 had significantly higher levels of cysteine. Meal from line CN45816 had significantly higher levels of glutamic acid (ca. 4–7% more), but lower levels of hydroxyl amino acids (serine, threonine and tyrosine) as did meal from line CN114265. Meal from line CN30476 had the highest levels of serine and threonine.

Genes encoding major seed storage proteins in C. sativa

Examination of the C. sativa DH55 genome sequence (Kagale et al. 2014) identified genes encoding major seed proteins, namely cruciferin, napin, vicilin and oleosin, which were then annotated according to their relationship to the presumed A. thaliana orthologues and location of the gene on a specific C. sativa sub-genome (Suppl. Table S4). Twelve genes encoded the main Brassicaceae seed storage protein, cruciferin, of which five were located on sub-genome I (G1), four on sub-genome II (G2) and three on sub-genome III (G3). Phylogenetic comparison to the four genes encoding cruciferin in A. thaliana (AtCRA, AtCRB, AtCRC and At1g03890) revealed that two tandemly linked genes on G1 (Csa11g070580 and Csa11g070590) and one of the genes on G2 (Csa18g009670) were most similar to AtCRA and were named accordingly (Fig. 3; Suppl. Fig. S2a).
A CRA orthologue was not found on any of the C. sativa G3 chromosomes, while single orthologues of AtCRB and AtCRC were found on each of the three sub-genomes. The phylogenetic analysis also revealed genes encoding a fourth type of cruciferin in C. sativa, hereafter referred to as CsCruD, which was most similar to the cruciferin encoded by the A. thaliana At1g03890 locus. Single CsCruD orthologues were found on each of the C. sativa sub-genomes, each linked in tandem to a CsCruB gene, which is similar to the arrangement in the A. thaliana genome.

Table 1 Protein content in meal from various Camelina species

| Species | Line | Protein1 (%) | SD | Significance category2 |
|---|---|---|---|---|
| C. hispida grandiflora | 248 | 36.17 | 3.71 | BCDEFGHI |
| C. hispida hispida | 240 | 39.40 | 1.42 | ABCD |
| C. laxa | 612 | 37.39 | 0.68 | ABCDEFG |
| C. neglecta | 246 | 34.06 | 2.66 | DEFGHI |
| C. microcarpa 4X | 168 | 31.54 | 0.12 | HI |
| C. microcarpa 4X | 718 | 31.42 | 0.98 | I |
| C. microcarpa 4X | 965 | 31.19 | 1.86 | GHI |
| C. microcarpa 6X | 198 | 32.94 | 0.89 | FGHI |
| C. microcarpa 6X | 818 | 33.69 | 0.47 | EFGHI |
| C. rumelica rumelica | 247 | 39.24 | 0.84 | ABCD |
| C. rumelica rumelica | 609 | 37.96 | 1.32 | ABCDEF |
| C. rumelica rumelica | 1022 | 37.01 | 1.37 | ABCDEFGH |
| C. rumelica rumelica | 1034 | 36.97 | 1.90 | ABCDEFGH |
| C. rumelica rumelica | 1255 | 37.42 | 0.66 | ABCDEFG |
| C. rumelica transcapida | 245 | 40.03 | 2.99 | ABC |
| C. sativa | 239 | 41.70 | 0.49 | AB |
| C. sativa | 252 | 40.43 | 3.24 | ABC |
| C. sativa | 596 | 35.78 | 1.07 | CDEFGHI |
| C. sativa | 605 | 39.16 | 1.17 | ABCDE |
| C. sativa | 621 | 41.68 | 0.87 | AB |
| C. sativa | 1044 | 40.72 | 1.21 | ABC |
| C. sativa | 1062 | 39.91 | 0.86 | ABC |
| C. sativa | 1063 | 41.83 | 2.03 | A |
| C. sativa | 1662 | 42.49 | 0.20 | A |

1 Mean ± SD (n = 3 biological replicates each with 3 technical replicates)
2 Letters denote significant differences (P = 0.05). Tukey–Kramer comparison for least squares means

Table 2 Amino acid content in meal from various Camelina species. Amino acid content (% w/w)1,2,3

| Species | ID # | Alanine | Arginine | Aspartate/Asparagine | Cysteic acid | Glutamate/Glutamine | Glycine | Histidine | Isoleucine | Leucine | Lysine | Methionine | Phenylalanine | Proline | Serine | Threonine | Tyrosine | Valine |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C. hispida grandiflora | 248 | 4.77 ± 0.16 ABCD | 9.51 ± 0.30 AB | 8.58 ± 0.33 DE | 7.98 ± 0.34 CDEFGHI | 17.23 ± 0.52 ABC | 6.22 ± 0.13 ABCD | 2.41 ± 0.08 BC | 3.48 ± 0.04 BCDE | 5.70 ± 0.04 HIJK | 5.47 ± 0.30 ABCDEF | 2.57 ± 0.08 CDEFG | 3.66 ± 0.06 HI | 4.97 ± 0.16 ABCDE | 5.18 ± 0.09 ABCD | 4.09 ± 0.11 BCDEF | 3.17 ± 0.07 AB | 4.99 ± 0.08 FGHI |
| C. hispida hispida | 240 | 4.50 ± 0.09 D | 9.29 ± 0.21 AB | 8.44 ± 0.13 E | 8.22 ± 0.44 BCDEFGH | 17.51 ± 0.38 ABC | 6.09 ± 0.11 BCDE | 2.28 ± 0.05 BC | 3.69 ± 0.07 ABCD | 5.92 ± 0.04 FGHIJK | 5.46 ± 0.24 ABCDEF | 2.81 ± 0.16 ABCD | 3.69 ± 0.04 GHI | 4.96 ± 0.04 ABCDE | 4.87 ± 0.10 CDEFGHIJ | 4.18 ± 0.14 ABCDEF | 3.20 ± 0.04 AB | 4.89 ± 0.13 HI |
| C. laxa | 612 | 4.48 ± 0.09 D | 9.50 ± 0.34 AB | 8.84 ± 0.17 BCDE | 7.80 ± 0.22 FGHI | 17.83 ± 0.29 A | 5.16 ± 0.06 IJKL | 2.57 ± 0.06 B | 3.59 ± 0.17 ABCDE | 6.35 ± 0.07 ABCD | 5.27 ± 0.10 ABCDEF | 2.67 ± 0.13 ABCDEF | 4.09 ± 0.05 AB | 5.05 ± 0.07 ABCDE | 4.60 ± 0.20 GHIJ | 4.04 ± 0.10 DEF | 2.93 ± 0.07 B | 5.25 ± 0.24 CDEFG |
| C. neglecta | 246 | 4.47 ± 0.17 D | 9.90 ± 0.16 A | 9.15 ± 0.53 BCDE | 7.90 ± 0.34 DEFGHI | 17.44 ± 0.38 ABC | 6.09 ± 0.20 BCDE | 2.52 ± 0.08 BC | 3.64 ± 0.06 ABCD | 5.73 ± 0.13 HIJK | 4.88 ± 0.18 EF | 2.50 ± 0.14 FGH | 3.90 ± 0.05 CDEF | 4.81 ± 0.06 CDEF | 4.64 ± 0.07 GHIJ | 4.02 ± 0.07 DEF | 3.20 ± 0.02 AB | 5.23 ± 0.08 CDEFGH |
| C. microcarpa 4X | 168 | 4.72 ± 0.13 ABCD | 9.26 ± 0.41 AB | 9.15 ± 0.12 BCDE | 6.93 ± 0.18 J | 16.94 ± 0.12 ABC | 6.02 ± 0.13 CDEFG | 2.43 ± 0.04 BC | 3.52 ± 0.04 BCDE | 6.10 ± 0.08 BCDEFG | 5.31 ± 0.25 ABCDEF | 2.60 ± 0.08 BCDEF | 3.91 ± 0.06 CDEF | 5.24 ± 0.08 AB | 5.03 ± 0.18 ABCDEFG | 4.57 ± 0.14 A | 3.11 ± 0.08 AB | 5.15 ± 0.04 CDEFGHI |
| C. microcarpa 4X | 718 | 4.79 ± 0.06 ABCD | 10.13 ± 0.82 A | 9.28 ± 0.91 BCDE | 8.43 ± 0.30 BCDEFG | 15.66 ± 0.52 DE | 5.68 ± 0.29 EFGHI | 2.54 ± 0.08 B | 3.57 ± 0.14 ABCDE | 5.60 ± 0.12 K | 5.58 ± 0.21 ABCDE | 2.51 ± 0.08 FGH | 3.81 ± 0.05 DEFGH | 5.02 ± 0.19 ABCDE | 5.09 ± 0.14 ABCDEF | 4.27 ± 0.14 ABCDE | 3.03 ± 0.04 AB | 5.00 ± 0.11 EFGHI |
| C. microcarpa 4X | 965 | 4.68 ± 0.07 ABCD | 9.52 ± 0.25 AB | 11.24 ± 0.51 A | 7.72 ± 0.34 FGHIJ | 16.71 ± 0.69 ABCDE | 5.38 ± 0.16 HIJK | 2.30 ± 0.06 BC | 3.49 ± 0.17 BCDE | 5.96 ± 0.11 FGHIJ | 5.32 ± 0.10 ABCDEF | 2.62 ± 0.10 BCDEF | 3.78 ± 0.04 EFGHI | 4.80 ± 0.10 DEF | 4.58 ± 0.13 GHIJ | 3.97 ± 0.17 DEF | 2.92 ± 0.03 B | 5.01 ± 0.08 DEFGHI |
| C. microcarpa 6X | 198 | 4.67 ± 0.11 ABCD | 9.52 ± 0.48 AB | 10.11 ± 0.29 AB | 7.08 ± 0.43 IJ | 17.49 ± 0.20 ABC | 6.41 ± 0.17 ABCD | 2.49 ± 0.03 BC | 3.49 ± 0.03 BCDE | 5.66 ± 0.06 HIJK | 4.89 ± 0.13 CDEF | 2.25 ± 0.18 H | 3.86 ± 0.04 DEFG | 4.83 ± 0.03 BCDEF | 5.03 ± 0.08 ABCDEFGH | 3.95 ± 0.05 DEF | 3.06 ± 0.06 AB | 5.22 ± 0.07 CDEFGHI |
| C. microcarpa 6X | 818 | 4.57 ± 0.12 BCD | 9.77 ± 0.42 AB | 8.64 ± 0.21 CDE | 8.48 ± 0.38 ABCDEF | 17.26 ± 0.19 ABC | 4.91 ± 0.12 KL | 2.44 ± 0.06 BC | 3.76 ± 0.06 AB | 6.32 ± 0.20 ABCDE | 5.11 ± 0.40 BCDEF | 2.73 ± 0.08 ABCDEF | 3.97 ± 0.12 ABCD | 5.12 ± 0.09 ABCD | 4.57 ± 0.20 HIJ | 4.04 ± 0.11 DEF | 2.97 ± 0.05 AB | 5.35 ± 0.12 BCDE |
| C. rumelica rumelica | 247 | 4.48 ± 0.07 D | 10.04 ± 0.12 A | 8.15 ± 0.08 E | 9.32 ± 0.40 A | 17.25 ± 0.35 ABC | 5.50 ± 0.06 GHIJ | 2.40 ± 0.02 BC | 3.51 ± 0.04 BCDE | 5.96 ± 0.08 FGHIJ | 4.92 ± 0.09 DEF | 2.72 ± 0.14 ABCDEF | 3.82 ± 0.04 DEFGH | 5.12 ± 0.04 ABCD | 4.72 ± 0.08 EFGHIJ | 4.08 ± 0.09 CDEF | 2.98 ± 0.03 AB | 5.03 ± 0.07 DEFGHI |
| C. rumelica rumelica | 609 | 4.40 ± 0.13 D | 9.83 ± 0.25 AB | 8.43 ± 0.08 E | 8.92 ± 0.29 AB | 17.26 ± 0.43 ABC | 4.96 ± 0.17 KL | 2.51 ± 0.01 BC | 3.76 ± 0.08 ABC | 6.40 ± 0.11 AB | 4.77 ± 0.21 F | 2.89 ± 0.07 A | 4.07 ± 0.08 ABC | 5.06 ± 0.12 ABCDE | 4.42 ± 0.21 J | 4.02 ± 0.14 DEF | 2.97 ± 0.09 AB | 5.32 ± 0.08 BCDEF |
| C. rumelica rumelica | 1022 | 4.79 ± 0.23 ABCD | 9.16 ± 0.42 AB | 9.75 ± 0.69 BC | 8.18 ± 0.22 BCDEFGH | 16.45 ± 0.60 BCDE | 5.54 ± 0.20 FGHI | 2.48 ± 0.17 BC | 3.44 ± 0.16 CDE | 5.98 ± 0.19 EFGHIJ | 5.65 ± 0.35 ABC | 2.68 ± 0.18 ABCDEF | 3.81 ± 0.05 DEFGH | 5.02 ± 0.25 ABCDE | 4.78 ± 0.05 DEFGHIJ | 4.50 ± 0.28 AB | 2.92 ± 0.24 B | 4.87 ± 0.04 I |
| C. rumelica rumelica | 1034 | 5.05 ± 0.17 A | 8.85 ± 0.40 B | 8.96 ± 0.13 BCDE | 6.91 ± 0.21 J | 15.51 ± 0.26 E | 6.50 ± 0.17 ABC | 4.70 ± 0.38 A | 3.46 ± 0.08 BCDE | 5.63 ± 0.12 JK | 5.84 ± 0.32 A | 2.32 ± 0.06 GH | 3.82 ± 0.04 DEFGH | 4.51 ± 0.08 F | 5.28 ± 0.24 ABC | 4.37 ± 0.15 ABCD | 2.95 ± 0.23 AB | 5.34 ± 0.06 BCDEF |
| C. rumelica rumelica | 1255 | 4.55 ± 0.05 CD | 9.96 ± 0.18 A | 8.82 ± 0.11 CDE | 7.94 ± 0.39 DEFGHI | 16.81 ± 0.14 ABCD | 5.19 ± 0.16 IJKL | 2.54 ± 0.02 BC | 3.87 ± 0.03 A | 6.46 ± 0.04 A | 4.80 ± 0.12 F | 2.84 ± 0.10 AB | 4.11 ± 0.02 A | 4.74 ± 0.07 EF | 4.62 ± 0.03 GHIJ | 4.26 ± 0.06 ABCDE | 3.12 ± 0.04 AB | 5.35 ± 0.07 BCD |
| C. rumelica transcapida | 245 | 4.74 ± 0.05 ABCD | 9.45 ± 0.24 AB | 8.68 ± 0.06 CDE | 7.57 ± 0.25 HIJ | 17.06 ± 0.19 ABC | 6.59 ± 0.13 AB | 2.41 ± 0.03 BC | 3.51 ± 0.08 BCDE | 5.68 ± 0.03 IJK | 5.61 ± 0.21 ABCD | 2.53 ± 0.10 FGH | 3.62 ± 0.02 I | 4.81 ± 0.03 CDEF | 5.34 ± 0.11 AB | 4.33 ± 0.09 ABCD | 3.14 ± 0.04 AB | 4.92 ± 0.05 GHI |
| C. sativa | 239 | 4.46 ± 0.06 D | 9.64 ± 0.10 AB | 8.50 ± 0.09 DE | 7.88 ± 0.28 EFGHI | 17.70 ± 0.36 AB | 6.19 ± 0.20 ABCDE | 2.21 ± 0.06 C | 3.72 ± 0.05 ABCD | 5.99 ± 0.12 EFGHI | 5.36 ± 0.12 ABCDEF | 2.81 ± 0.09 ABCDE | 3.74 ± 0.06 FGHI | 4.86 ± 0.06 CDEF | 4.52 ± 0.05 IJ | 4.09 ± 0.10 BCDEF | 3.18 ± 0.11 AB | 5.14 ± 0.08 CDEFGHI |
| C. sativa | 252 | 4.79 ± 0.05 ABCD | 9.52 ± 0.23 AB | 8.71 ± 0.05 CDE | 7.58 ± 0.13 GHIJ | 17.02 ± 0.18 ABC | 6.71 ± 0.05 A | 2.39 ± 0.02 BC | 3.50 ± 0.05 BCDE | 5.68 ± 0.04 IJK | 5.39 ± 0.20 ABCDEF | 2.55 ± 0.10 EFG | 3.68 ± 0.02 GHI | 4.73 ± 0.04 EF | 5.39 ± 0.05 A | 4.27 ± 0.10 ABCDE | 3.22 ± 0.04 A | 4.90 ± 0.10 HI |
| C. sativa | 596 | 4.90 ± 0.08 ABC | 9.60 ± 0.33 AB | 9.20 ± 0.23 BCDE | 7.71 ± 0.23 FGHIJ | 16.71 ± 0.45 ABCDE | 6.04 ± 0.19 CDEF | 2.29 ± 0.11 BC | 3.28 ± 0.11 E | 6.01 ± 0.12 DEFGHI | 5.31 ± 0.25 ABCDEF | 2.59 ± 0.10 BCDEFG | 3.93 ± 0.05 BCDE | 5.17 ± 0.16 ABC | 4.90 ± 0.06 BCDEFGHI | 4.50 ± 0.16 AB | 2.96 ± 0.18 AB | 4.91 ± 0.20 GHI |
| C. sativa | 605 | 4.58 ± 0.05 BCD | 9.81 ± 0.30 AB | 8.32 ± 0.28 E | 8.12 ± 0.45 BCDEFGH | 17.10 ± 0.33 ABC | 5.81 ± 0.12 DEFGH | 2.55 ± 0.03 B | 3.47 ± 0.06 BCDE | 6.04 ± 0.08 CDEFGH | 5.22 ± 0.30 ABCDEF | 2.67 ± 0.16 ABCDEF | 3.85 ± 0.05 DEFG | 5.30 ± 0.07 A | 4.74 ± 0.11 DEFGHIJ | 4.28 ± 0.09 ABCDE | 3.05 ± 0.05 AB | 5.09 ± 0.09 DEFGHI |
| C. sativa | 621 | 4.50 ± 0.15 D | 9.94 ± 0.29 A | 8.59 ± 0.20 CDE | 8.51 ± 0.36 ABCDEF | 17.03 ± 0.22 ABC | 5.23 ± 0.27 IJKL | 2.45 ± 0.05 BC | 3.62 ± 0.09 ABCD | 6.42 ± 0.14 AB | 4.88 ± 0.31 EF | 2.83 ± 0.11 ABC | 4.11 ± 0.09 AB | 4.92 ± 0.02 BCDE | 4.63 ± 0.22 GHIJ | 4.14 ± 0.09 BCDEF | 3.02 ± 0.04 AB | 5.18 ± 0.09 CDEFGHI |
| C. sativa | 1044 | 4.54 ± 0.08 CD | 9.49 ± 0.13 AB | 8.87 ± 0.10 BCDE | 8.67 ± 0.55 ABCDE | 17.31 ± 0.46 ABC | 4.98 ± 0.09 JKL | 2.46 ± 0.04 BC | 3.53 ± 0.17 BCDE | 6.10 ± 0.08 BCDEFG | 5.32 ± 0.23 ABCDEF | 2.67 ± 0.16 ABCDEF | 3.89 ± 0.05 DEF | 5.04 ± 0.15 ABCDE | 4.70 ± 0.13 EFGHIJ | 4.03 ± 0.06 DEF | 2.93 ± 0.09 B | 5.48 ± 0.24 ABC |
| C. sativa | 1062 | 4.46 ± 0.06 D | 9.76 ± 0.26 AB | 8.67 ± 0.09 CDE | 8.74 ± 0.75 ABCD | 17.26 ± 0.18 ABC | 4.75 ± 0.06 L | 2.43 ± 0.06 BC | 3.75 ± 0.04 ABC | 6.38 ± 0.07 ABC | 4.96 ± 0.13 CDEF | 2.67 ± 0.17 ABCDEF | 3.95 ± 0.07 ABCDE | 5.13 ± 0.08 ABCD | 4.43 ± 0.06 J | 3.91 ± 0.09 EF | 2.97 ± 0.06 AB | 5.80 ± 0.11 A |
| C. sativa | 1063 | 4.95 ± 0.19 AB | 9.60 ± 0.44 AB | 9.61 ± 0.29 BCD | 7.18 ± 0.34 IJ | 16.32 ± 0.25 CDE | 5.98 ± 0.29 CDEFG | 2.44 ± 0.05 BC | 3.42 ± 0.08 DE | 5.82 ± 0.08 GHIJK | 5.74 ± 0.21 AB | 2.55 ± 0.07 DEFG | 3.81 ± 0.02 DEFGH | 4.89 ± 0.15 BCDE | 5.15 ± 0.15 ABCDE | 4.46 ± 0.18 ABC | 3.03 ± 0.06 AB | 5.05 ± 0.06 DEFGHI |
| C. sativa | 1662 | 4.53 ± 0.10 CD | 9.55 ± 0.18 AB | 8.85 ± 0.09 BCDE | 8.83 ± 0.73 ABC | 17.17 ± 0.31 ABC | 4.97 ± 0.09 JKL | 2.37 ± 0.07 BC | 3.55 ± 0.03 BCDE | 6.23 ± 0.11 ABCDEF | 5.32 ± 0.17 ABCDEF | 2.69 ± 0.18 ABCDEF | 3.91 ± 0.06 CDEF | 4.96 ± 0.08 ABCDE | 4.68 ± 0.10 FGHIJ | 3.84 ± 0.14 F | 2.92 ± 0.06 B | 5.62 ± 0.10 AB |

1 % AA (w/w) = mg of specific amino acid divided by the total recovered mg (sum of 19 recovered amino acids; tryptophan not determined) multiplied by 100.
2 Mean ± SD (n = 3). Means were calculated using a nested mixed model using the restricted maximum likelihood (REML) method. Letters that differ within a column were significantly different using Tukey's HSD test (P < 0.05).
3 Highest value, lowest value (colour highlighting in the original table)

Vicilin is a cupin-domain protein similar in structure to cruciferin. In total, eight genes encoding vicilin-like proteins were identified in the C. sativa DH55 genome (Suppl. Table S4). Phylogenetic analysis revealed that five of the C. sativa vicilins formed two related subgroups that were most similar to the A. thaliana vicilin AtPAP85 (also known as vicilin 1); accordingly, these vicilins were denoted CsVic1A and CsVic1B (Fig. 3; Suppl. Fig. S2b). The CsVic1A subgroup contained homeologues from all three sub-genomes (Csa19g031870, Csa01g025880 and Csa15g039290), while the CsVic1B subgroup included a gene on G3 (Csa15g039300) and a gene on G2 (Csa01g025890), but was missing a G1 homeologue. The two tandem Vic1 genes on G2 represent both subgroups, as did the two tandemly linked genes on G3. The remaining vicilin genes (Csa07g016060, Csa16g016660 and Csa05g038120) were most similar to A. thaliana vicilin AtVCL22 (denoted herein as vicilin 2) with homeologues present on each of the three C. sativa sub-genomes.

Fig. 2 Seed protein profiles from C. sativa accessions. Virtual digital gels (left-hand side) and traces (right-hand side) were generated by capillary electrophoretic separation of total seed protein under reducing (a) and non-reducing (b and c) conditions. a and b C. sativa lines representing the three main profiles (profile 1: CN113733, profile 2: CN30477 and profile 3: CN111331). Arrows denote differences between profiles. c Variation among four lines exhibiting seed protein profile 1

The original annotation of the C. sativa DH55 genome identified five genes encoding the 2S albumin, napin (Kagale et al. 2014); however, a transcriptomic study indicated that as many as eight genes might exist (Nguyen et al. 2013). As this did not correspond with the expectation of gene number based on the genomic prediction, the assembly of the genomic regions containing the napin genes was re-examined. This revealed that three of the genes that had been previously annotated as single genes by Kagale et al. (2014) were in fact closely related genes linked in tandem and had been misassembled.
In agreement with the previous transcriptomic study, eight genes encoding napin were identified after separation of the tandem genes, four of which were in a cluster on G1 and four in a cluster on G3. Phylogenetic analysis revealed that the C. sativa napins were most similar to AtSESA1, AtSESA2, AtSESA3, and AtSESA4, which are also closely related and linked in tandem on A. thaliana chromosome 4, and distinct from the other A. thaliana napin, AtSESA5, which is encoded by a gene on chromosome 5 (Fig. 3; Suppl. Fig. S2c). Each of the eight C. sativa proteins could be paired to one of the eight napins reported in the earlier transcriptomic study (Nguyen et al. 2013) (Suppl. Fig. S3); however, the C. sativa proteins were renamed according to their genomic locations as per cruciferin and vicilin (Suppl. Table S4). No genes encoding napin were found on G2. The first two genes in the tandem series on G1 (Csa11g017000) and G3 (Csa12g024720) appear to be homeologues based on phylogenetic analysis; however, the other paralogues in each tandem series appear to have arisen through separate gene duplication events (Suppl. Fig. S2c) and the fact that there are four in each cluster appears to be coincidental.

Table 3 Protein content in meal from C. sativa lines with various seed protein profiles

| Species | Protein profile | Line | Protein1 (%) | SE | Significance category2 |
|---|---|---|---|---|---|
| C. sativa | 1 | CN113733 | 53.71 | 0.24 | A |
| C. sativa | 1 | CN30476 | 47.27 | 0.66 | C |
| C. sativa | 2 | CN30477 | 49.55 | 0.74 | BC |
| C. sativa | 2 | CN45816 | 51.77 | 0.25 | AB |
| C. sativa | 3 | CN111331 | 43.26 | 0.39 | D |
| C. sativa | 3 | CN114265 | 51.44 | 0.97 | AB |

1 Mean ± SE (n = 4, except for CN111331 where n = 3)
2 Letters denote significant differences (P = 0.05). Tukey–Kramer comparison for least squares means

Table 4 Amino acid content in meal from C. sativa lines with various seed protein profiles. Amino acid content (% w/w) per accession1,2

| Amino acid | Profile 1: CN113733 | Profile 1: CN30476 | Profile 2: CN30477 | Profile 2: CN45816 | Profile 3: CN111331 | Profile 3: CN114265 | Average |
|---|---|---|---|---|---|---|---|
| Alanine | 4.74 ± 0.12 B | 4.89 ± 0.08 A | 4.72 ± 0.06 BC | 4.61 ± 0.11 C | 5.01 ± 0.11 A | 4.63 ± 0.11 BC | 4.76 ± 0.16 |
| Arginine | 9.82 ± 0.29 AB | 9.46 ± 0.32 C | 9.59 ± 0.16 BC | 9.98 ± 0.31 A | 9.56 ± 0.24 BC | 9.78 ± 0.31 ABC | 9.69 ± 0.32 |
| Aspartate/Asparagine | 9.45 ± 0.14 AB | 9.4 ± 0.24 AB | 9.59 ± 0.11 A | 9.26 ± 0.45 BC | 9.49 ± 0.21 AB | 9.09 ± 0.22 C | 9.38 ± 0.29 |
| Cysteic acid | 3.46 ± 0.21 B | 3.38 ± 0.28 B | 3.13 ± 0.27 B | 3.37 ± 0.62 B | 3.27 ± 0.32 B | 3.95 ± 0.58 A | 3.44 ± 0.48 |
| Glutamate/Glutamine | 17.68 ± 0.33 BC | 17.93 ± 0.21 B | 17.89 ± 0.12 B | 18.63 ± 0.52 A | 17.45 ± 0.47 C | 17.98 ± 0.3 B | 17.93 ± 0.46 |
| Glycine | 5.17 ± 0.03 C | 5.41 ± 0.05 B | 5.5 ± 0.05 B | 5.49 ± 0.06 AB | 5.64 ± 0.18 A | 5.53 ± 0.18 AB | 5.45 ± 0.17 |
| Histidine | 2.73 ± 0.06 A | 2.69 ± 0.07 AB | 2.61 ± 0.06 BC | 2.67 ± 0.04 AB | 2.55 ± 0.1 C | 2.66 ± 0.11 AB | 2.66 ± 0.09 |
| Isoleucine | 3.77 ± 0.09 B | 3.71 ± 0.09 B | 4.05 ± 0.09 A | 3.81 ± 0.16 B | 3.77 ± 0.08 B | 3.72 ± 0.11 B | 3.81 ± 0.16 |
| Leucine | 6.93 ± 0.14 AB | 6.83 ± 0.11 B | 7.04 ± 0.12 A | 6.85 ± 0.12 B | 6.85 ± 0.14 B | 6.85 ± 0.13 B | 6.9 ± 0.14 |
| Lysine | 5.81 ± 0.08 A | 5.86 ± 0.09 A | 5.42 ± 0.1 B | 5.55 ± 0.07 B | 5.8 ± 0.14 A | 5.52 ± 0.16 B | 5.66 ± 0.21 |
| Methionine | 1.84 ± 0.16 AB | 1.77 ± 0.16 B | 1.86 ± 0.18 AB | 1.75 ± 0.2 B | 1.85 ± 0.2 AB | 2.02 ± 0.27 A | 1.85 ± 0.21 |
| Phenylalanine | 4.36 ± 0.07 AB | 4.37 ± 0.15 AB | 4.42 ± 0.05 A | 4.26 ± 0.15 B | 4.36 ± 0.16 AB | 4.33 ± 0.13 AB | 4.36 ± 0.13 |
| Proline | 5.53 ± 0.09 A | 5.39 ± 0.04 B | 5.26 ± 0.06 C | 5.46 ± 0.16 AB | 5.55 ± 0.13 A | 5.49 ± 0.11 AB | 5.44 ± 0.14 |
| Serine | 4.57 ± 0.09 C | 4.78 ± 0.09 A | 4.59 ± 0.09 BC | 4.52 ± 0.13 C | 4.71 ± 0.07 AB | 4.54 ± 0.09 C | 4.62 ± 0.13 |
| Threonine | 3.89 ± 0.05 ABC | 3.98 ± 0.11 A | 3.94 ± 0.06 AB | 3.81 ± 0.13 BC | 3.95 ± 0.07 AB | 3.81 ± 0.1 C | 3.9 ± 0.11 |
| Tryptophan | 1.38 ± 0.07 A | 1.23 ± 0.08 B | 1.32 ± 0.08 AB | 1.25 ± 0.13 B | 1.25 ± 0.09 AB | 1.31 ± 0.14 AB | 1.29 ± 0.11 |
| Tyrosine | 3.2 ± 0.04 C | 3.28 ± 0.04 AB | 3.35 ± 0.02 A | 3.18 ± 0.12 C | 3.23 ± 0.07 BC | 3.21 ± 0.06 C | 3.25 ± 0.08 |
| Valine | 5.67 ± 0.11 AB | 5.64 ± 0.16 AB | 5.74 ± 0.1 A | 5.55 ± 0.19 B | 5.69 ± 0.1 AB | 5.55 ± 0.18 B | 5.64 ± 0.15 |

1 %AA (w/w) = mg of specific amino acid divided by the total recovered mg (sum of 19 recovered amino acids; tryptophan not determined) multiplied by 100
2 Mean ± SD (n = 4 except for CN111331 where n = 3). Letters within a row denote significant differences (P = 0.05). Tukey–Kramer comparison for least squares means

Oleosins possess hydrophilic and hydrophobic domains that allow them to organise storage triglycerides into the oil bodies commonly found in cells of oilseed embryos. In total, 12 C. sativa genes were found to encode oleosins comprising three homeologues related to each of the genes encoding the four major A. thaliana oil body-associated oleosins (OLEO1 to 4) (Fig. 3; Suppl. Fig. S2d). C. sativa orthologues of members of the extended oleosin-like family were also identified.

Fig. 3 Phylogenetic analysis of major C. sativa seed proteins. Maximum likelihood trees were constructed using the best substitution model for each data set with 500 bootstrap iterations. Numbers beside nodes indicate percentage of trees agreeing with the consensus

Temporal expression of C. sativa seed storage protein genes through seed development

RNA-Seq analysis was conducted with C. sativa DH55 developing bolls from anthesis to seed maturity (40 days) to ascertain the expression profile of genes encoding seed proteins (Table 5). Transcripts derived from all of the genes encoding the two major seed storage proteins, namely cruciferin (12) and napin (8), were identified. Both sets exhibited similar expression patterns with a sharp increase in expression detected between 8 and 12 days after anthesis (daa) and a sharp decline between 28 and 32 daa. Members of the tandem napin clusters on G1 and G3 differed greatly in their levels of expression, but not in their temporal patterns. The three homeologous genes encoding cruciferin CsCruD isoforms were expressed at lower levels than those encoding CsCruA, CsCruB or CsCruC, suggesting that CsCruD may contribute less to overall seed protein composition. There was some evidence for genome partitioning with respect to the level of expression of homeologous genes encoding CsCruA, CsCruB or CsCruC.

The expression of the homeologous genes encoding CsVic1A on G1 (Csa19g031870), G2 (Csa01g025880) and G3 (Csa15g039290) increased sharply at 12 daa and high levels of transcripts were detected throughout seed development. The expression of the gene encoding CsVic1B on G3 (Csa15g039300) increased more gradually until 28 daa before declining sharply, while few transcripts were detected from its homeologous partner on G2 (Csa01g025890).

Temporal patterns were also apparent in the expression of genes encoding oleosins. In general, the expression of genes encoding oleosins increased between 8 and 12 daa, though those encoding CsOle-1 were induced slightly earlier. Transcript levels from homeologous genes encoding CsOle-3 declined after 20 daa, while the expression of genes encoding CsOle-1, CsOle-2 and CsOle-4 remained elevated or continued to increase until the seeds were mature (40 daa).

Of the other proteins known to contribute to seed protein composition, many genes encoding dehydrins or members of various late embryogenesis abundant (LEA) protein families were also expressed at high levels during the later stages of seed development as expected (Suppl. Table S5).

Comparison of the high molecular weight proteome in diverse C. sativa accessions

The feature that most distinguished the C.
sativa accessions was a high molecular weight region (49–55 kDa) appearing under non-reducing conditions; therefore, proteomics analysis of this region was conducted with two lines representing each of the three major seed protein profiles observed under non-reducing conditions, namely Profile 1 (CN113733 and CN30476), Profile 2 (CN30477 and CN45816) and Profile 3 (CN111331 and CN114265) (Suppl. Fig. S4). As expected, the most abundant proteins within this fraction were cruciferins (Suppl. Table S6), of which all four types were represented. Across all lines, CsCruA (MW 52 kDa) was the most abundant cruciferin and approximately three times more so than CsCruB (MW 51 kDa). The level of CsCruD (MW 50 kDa) was low, but relatively similar among the lines, while the amount of CsCruC (MW 55 kDa) varied extensively. Higher levels of CsCruC were present in line CN45816, while lines CN113733 and CN111331 had 10–12 times less. The relative abundance of the cruciferin isoforms did not fully explain the differences in protein profiles in this region; however, other proteins of similar MW were found in this fraction, including a group of nitrile specifier proteins that were even more abundant than CsCruD.

Comparison of the seed transcriptome in diverse C. sativa accessions

To examine the genetic basis underlying the different seed protein profiles among the C. sativa accessions, RNA-Seq analysis (Suppl. Table S7) was also conducted with these lines. Lines from the same seed protein profile groups did not exhibit seed protein gene expression patterns that were indicative of a specific group, although differences in temporal patterns could not be evaluated since bolls from all stages of development were pooled in this experiment. Genetic variation existed in the overall patterns between the lines and in comparison to the collective profile for C. sativa DH55, which has an electrophoretic protein profile similar to Profile 3 (Table 5).
The napin genes encoding CsNap-1, CsNap-3, CsNap-4 on the G1 and G3 sub-genomes were expressed at the highest levels, while genes encoding CsNap-2 were expressed at appreciably lower levels (ca. 10–50%) than the other CsNap genes in all of the lines. This pattern was similar to that observed with C. sativa DH55, although in this line CsNap-1-G1 was expressed at a lower level and CsNap-2-G1 at high levels. Notably, in DH55 CsNap-2-G3 was induced much later and for a shorter period of time than the other napin genes (Table 5), which may have also contributed to the lower overall transcript levels in the other C. sativa lines.

The expression pattern of genes encoding CsCruA and CsCruB was similar in all C. sativa lines, including DH55. CsCruA-2-G1 and CsCruA-1-G2 were expressed at comparable levels and approximately twice that of CsCruA-1-G1, while the expression of the CruB genes was in the following order, CsCruB-1-G3 > CsCruB-1-G1 > CsCruB-1-G2. The pattern of CruC expression was markedly different between the lines. CN45816 and DH55 (Table 5; Suppl. Table S7) exhibited very high levels of CsCruC-1-G3 expression (in fact, the highest of all of the cruciferin genes), high levels of CsCruC-1-G1 expression and lower levels of CsCruC-1-G2 expression. Conversely, CN30476 and CN114265 expressed mainly CsCruC-1-G3 and only at lower levels, while CN113733, CN30477 and CN111331 possessed few or no CruC transcripts. As in DH55, the expression of genes encoding CsCruD was also low in the other C. sativa lines when compared to genes encoding CsCruA and CsCruB. Proteomic analysis of the high MW protein region, of which cruciferin was the most abundant member, confirmed these patterns (Suppl. Table S6).

The expression of vicilin genes was similar to DH55 with higher levels of expression detected from genes encoding CsVic1A and with comparatively little contribution from those encoding CsVic1B.
The genes encoding CsVic2 on sub-genomes G1 and G3 were expressed at approximately 30% of the level of the genes encoding CsVic1A, with the CsVic2 gene on G2 contributing few transcripts. Genes encoding oleosins CsOle-1, CsOle-2 and CsOle-4 were expressed at higher levels than those encoding CsOle-3. This was similar to the pattern in DH55, though it should be noted that expression of genes encoding CsOle-3 declined as seed development progressed, while the expression of genes encoding the other oleosins continued to increase throughout (Table 5).

Structural diversity of C. sativa cruciferins

In its natural form, cruciferin exists as a hexamer with a stochastic composition dependent on the availability of individual protomers (subunits). The functional properties of cruciferin are, therefore, an average of the functional properties of the subunits contributing to the whole. As variation was observed in the expression of genes encoding CruC and in actual cruciferin composition in the meal, the structure and potential functional properties of C. sativa cruciferins were examined.

Homology models of C. sativa cruciferins representing each of the four main classes (CsCruA, CsCruB, CsCruC and CsCruD) were constructed using the B. napus procruciferin (Cru2/3a, PDB 3KGL) as a template (Fig. 4; Suppl. Fig. S5). The C. sativa cruciferins had a reasonable degree of sequence identity with the B. napus template: 86.9% (CsCruA), 74.3% (CsCruB), 61.6% (CsCruC) and 51% (CsCruD). The difference between CsCruC and the template was largely attributed to an extended hypervariable region (HVR) II (Fig. 5; Suppl. Fig. S6), while CsCruD is phylogenetically distinct from the other cruciferins. Nonetheless, each of the C.
sativa cruciferins possessed a highly conserved core structure consisting of two jelly roll β-barrels and two extended helix regions comprised of 27 β-sheets, six α-helices and three 3₁₀-helices, which is typical of cupin domains associated with 11S and 7S globulins (Tandang-Silvas et al. 2010). The HVR regions cannot be resolved by crystallography as they do not possess ordered secondary structures, such as β-sheets or α-helices, and likely form loops protruding from the core (Adachi et al. 2001; Tandang-Silvas et al. 2010). To account for this, the energy minimization approach used by Withana-Gamage et al. (2011) to model A. thaliana cruciferin loops was employed; however, models were first constructed for those loops that had a similar modelled loop in the Scan Loop Data Base. The DaReUS-Loop server was used to construct loops for those without an acceptable template in the database. Only then were stereochemical alterations made to minimise energy based on the GROMOS 96 force field calculations.

Several parameters indicated that the C. sativa cruciferin models were of high quality and geometrically correct (Suppl. Table S8). G-factor scores based on torsion angles and covalent bond geometry ranged from −0.09 to −0.16, which was well within the generally accepted range of 0 to −0.5. Ramachandran plots showed that the sum of the percentage of residues in the core, allowed and additionally-allowed regions was 100% for CsCruA, CsCruB and CsCruD and 99.96% for CsCruB. Qualitative Model Energy ANalysis (QMEAN) scores, a composite measure of several geometric parameters (Benkert et al. 2008) with 0 considered as a good model and values < −0.4 generally considered poor, ranged from −0.98 to −0.27. Z-scores, a measure of overall model quality based on the deviation of the total energy of the structure with respect to an energy distribution derived from random conformations (Benkert et al. 2011), ranged from 6.73 to 7.11.
These scores were similar to those for models of A. thaliana cruciferins (Withana-Gamage et al. 2011) and within the range observed for models of proteins of similar size. RMSD values derived by superimposing the C. sativa cruciferin models on the template indicated close alignment of the backbone, with all values below 0.5 Å.

Alignment of the C. sativa cruciferins indicated a high degree of variability between CsCruA, CsCruB, CsCruC and CsCruD in each of the five HVRs (Fig. 5; Suppl. Fig. S6); these are also referred to as disordered regions due to their inability to be modelled or resolved by crystallographic methods (Adachi et al. 2001, 2003). HVR-I and HVR-V reside at the amino- and carboxy-terminus of the mature cruciferin, respectively, with the differences between the C. sativa cruciferin types attributed to amino acids with various properties. The three major solvent-exposed loops are represented by HVR-I, HVR-III (a.k.a. the extended loop region) and HVR-IV and were replete with charged (glutamate, arginine and lysine) and polar (asparagine, glutamine and serine) amino acids. The CsCruC paralogues had the longest HVR-II regions, although this was much shorter than that found within A. thaliana CruC (Suppl. Fig. S6). Hexamer formation proceeds by interaction of the interchain disulphide bond-containing (IE) faces of two trimers after proteolytic processing at the β-cleavage site (Fig. 5), which permits movement of HVR-IV to the periphery of the protein and exposes the trimer-interacting regions. The four trimer-interacting regions were highly conserved in CsCruA, CsCruB and CsCruC (Fig. 5); however, several differences were noted in CsCruD, in particular in polar and charged residues important for hydrogen bond and ionic interactions between the trimers (Adachi et al. 2001, 2003; Tandang-Silvas et al. 2010). This suggests that while CsCruD may form trimers, its participation may lead to hexamers with less stable structures.
HVR-II and HVR-V remain on the IE face and their high degree of variability contributes to the lower degree of evolutionary conservation, as well as variation in electrostatic potential and hydrophobicity (Fig. 4), which may also influence the stability of trimer-trimer interactions. Additional cysteine residues not predicted to be involved in inter- or intrachain disulphide bond formation were present in CsCruB and CsCruC (Fig. 5; Suppl. Fig. S6), which could promote interactions with other proteins/molecules or inter-subunit disulphide bond exchanges (Shimada et al. 1980; Inquello et al. 1993).

Fig. 4 Structural modelling, evolutionary conservation, surface electrostatic potential, surface hydrophobicity and predicted phosphorylation of C. sativa cruciferins. Structural modelling panel: yellow = β-sheet, red = α-helix and green = loops. IE–interchain interacting face. IA–intrachain interacting face. One representative from each cruciferin type is shown: CsCRA (CRA-1-G1), CsCRB (CRB-1-G1), CsCRC (CRC-1-G1) and CsCRD (CsCRD-1-G1)

In the context of functional properties (i.e. the properties that proteins confer in multi-component systems), the physicochemical properties of native cruciferin are directly related to the nature of the surface-exposed residues (Withana-Gamage et al. 2013a, 2013b, 2015, 2020). CsCruD had the highest percentage of negatively charged amino acids (11.1%; total net charge −14) and the lowest isoelectric point (4.99) of the C. sativa, A. thaliana and B. napus cruciferins (Table 6). CsCruD had a grand average hydropathicity (GRAVY) value of −0.375, making it the least hydrophilic of all the cruciferins examined, while CsCruC was the most hydrophilic cruciferin (GRAVY = −0.627) and was comparable to A. thaliana CruC. This suggests that CsCruC would be the most soluble in aqueous solution, while CsCruD would be the least soluble.
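The GRAVY values in Table 6 follow the Kyte and Doolittle (1982) definition: the sum of the hydropathy indices of all residues divided by the sequence length. A minimal Python sketch of that calculation, together with a count of charged residues (Asp + Glu and Arg + Lys, as in the Table 6 footnote), is given here. The function names and the short peptide are illustrative assumptions and not the authors' pipeline; the cruciferin sequences themselves are not reproduced.

```python
# Kyte-Doolittle hydropathy indices (Kyte and Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean Kyte-Doolittle index per residue."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


def charged_residue_summary(sequence):
    """Count Asp + Glu (negative) and Arg + Lys (positive) residues."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    negative = residues.count("D") + residues.count("E")
    positive = residues.count("R") + residues.count("K")
    return {
        "negative": negative,
        "positive": positive,
        "pct_negative": round(100 * negative / len(residues), 1),
        "pct_positive": round(100 * positive / len(residues), 1),
    }


if __name__ == "__main__":
    peptide = "MARKQEDSLV"  # hypothetical peptide, not a cruciferin sequence
    print(round(gravy(peptide), 3))
    print(charged_residue_summary(peptide))
```

More negative GRAVY values indicate a more hydrophilic protein, which is the basis for the solubility inference above. As a consistency check on Table 6, 45 negatively charged residues in a 405-residue protomer corresponds to 11.1%, the value listed for CsCruD.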
The spatial arrangement of hydrophilic and hydrophobic residues on the exposed surfaces was also markedly different for the cruciferin types. The intrachain disulphide bond-containing (IA) faces of CsCruB and CsCruC had negatively charged peripheries with a positively charged central region, while the IA face of CsCruD was dominated by negatively charged amino acids (Fig. 4). As expected, the IA faces of all cruciferins were generally hydrophilic; however, in CsCruA, CsCruB and CsCruC, hydrophobic residues tended to occur in small clusters, while those in CsCruD were more evenly distributed across its surface (Fig. 4).

Phosphorylation of cruciferin was first noted in A. thaliana (Wan et al. 2007) and now appears to be a general occurrence in seed and vegetative storage proteins (Mouzo et al. 2018). Phosphorylation of serine, threonine or tyrosine was predicted to occur on 23–37 residues in the C. sativa cruciferin forms (Fig. 5; Suppl. Fig. S6; Suppl. Table S9). These were predicted to occur within the core structure, on the IE face and on the surface (IA face, periphery and in solvent accessible cavities) (Fig. 4), indicating that this post-translational modification may influence protein folding, subunit interactions, as well as surface-active properties.

An important property for proteins used as food ingredients is their ability to bind/sequester small molecules, such as pigments and flavours. This is related to the number, size and chemical properties of pockets in the tertiary and quaternary structure that are accessible to the solvent. The total number of pockets (1.4 Å probe) in the C. sativa cruciferin trimers ranged from 214 (CsCruD) to 260 (CsCruC) (Table 6). A larger central pocket forms when the protomers associate to form the trimer and is accessible via an opening on the IE face. The size of this pocket is also a measure of packing efficiency. In homomeric form, CsCruB had the largest pocket volume (17,173.2 Å³), twice that of CsCruA (8709.5 Å³) and four to five times that of CsCruC (4178.7 Å³) and CsCruD (3070.3 Å³) (Table 6). The CsCruB central pocket was also the most accessible, with a mouth opening area of 1856.0 Å² and 15 individual openings (orientations through which a water molecule may pass). CsCruD had the smallest pocket volume and was also the least accessible, with a mouth opening area of only 12.8 Å² and one opening. CsCruC had a similar pocket volume with a wider mouth area (275.5 Å²); however, this was accessible by only a single opening.

Fig. 5 Alignment and features associated with C. sativa cruciferins

Table 5 Expression of C. sativa cv. DH55 genes encoding seed storage proteins. Values are gene expression (normalized transcripts per million) at the indicated days post-anthesis. Scale (heat-map legend in the original): 0, 100, 500, 1,000, 5,000, 10,000, 20,000, 30,000

| Gene | Protein | 4 | 8 | 12 | 16 | 20 | 24 | 28 | 32 | 36 | 40 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Csa11g017000 | CsNap-1-G1 | 20 | 73 | 4,357 | 4,672 | 5,844 | 2,932 | 4,672 | 70 | 52 | 127 |
| Csa11g017005 | CsNap-2-G1 | 51 | 157 | 14,291 | 14,291 | 10,198 | 24,333 | 17,574 | 509 | 565 | 711 |
| Csa11g017010 | CsNap-3-G1 | 104 | 109 | 25,510 | 24,333 | 20,372 | 14,291 | 24,333 | 90 | 155 | 154 |
| Csa11g017020 | CsNap-4-G1 | 80 | 63 | 20,372 | 20,372 | 17,574 | 17,574 | 20,372 | 51 | 93 | 97 |
| Csa12g024720 | CsNap-1-G3 | 43 | 509 | 17,574 | 11,281 | 11,281 | 11,281 | 14,291 | 199 | 264 | 308 |
| Csa12g024725 | CsNap-2-G3 | 4 | 44 | 2,540 | 3,248 | 4,672 | 1,779 | 12,490 | 14 | 32 | 27 |
| Csa12g024730 | CsNap-3-G3 | 70 | 167 | 24,333 | 25,510 | 24,333 | 20,372 | 25,510 | 61 | 68 | 120 |
| Csa12g024735 | CsNap-4-G3 | 102 | 156 | 30,695 | 30,695 | 25,510 | 25,510 | 30,695 | 335 | 402 | 446 |
| Csa11g070580 | CsCruA-1-G1 | 13 | 22 | 3,768 | 5,844 | 6,486 | 3,768 | 2,642 | 110 | 130 | 110 |
| Csa11g070590 | CsCruA-2-G1 | 31 | 14 | 10,507 | 10,507 | 10,507 | 10,507 | 10,507 | 601 | 647 | 507 |
| Csa18g009670 | CsCruA-1-G2 | 41 | 17 | 10,198 | 12,490 | 14,291 | 12,490 | 10,198 | 396 | 424 | 390 |
| Csa14g004960 | CsCruB-1-G1 | 20 | 23 | 3,036 | 6,486 | 6,875 | 4,672 | 4,357 | 130 | 140 | 126 |
| Csa03g005050 | CsCruB-1-G2 | 3 | 1 | 1,110 | 2,540 | 3,469 | 1,261 | 683 | 22 | 19 | 16 |
| Csa17g006950 | CsCruB-1-G3 | 23 | 32 | 4,672 | 8,181 | 8,181 | 8,181 | 6,486 | 32 | 58 | 45 |
| Csa11g015240 | CsCruC-1-G1 | 34 | 13 | 6,875 | 10,198 | 12,490 | 10,198 | 98 | 330 | 354 | 207 |
| Csa10g014100 | CsCruC-1-G2 | 11 | 41 | 2,642 | 3,469 | 5,407 | 1,590 | 97 | 53 | 56 | 59 |
| Csa12g021990 | CsCruC-1-G3 | 68 | 69 | 12,490 | 17,574 | 30,695 | 30,695 | 678 | 1,251 | 1,261 | 720 |
| Csa14g004970 | CsCruD-1-G1 | 1 | 11 | 1,178 | 2,642 | 2,138 | 717 | 781 | 3 | 4 | 7 |
| Csa03g005060 | CsCruD-1-G2 | 0 | 1 | 337 | 730 | 533 | 123 | 180 | 2 | 3 | 0 |
| Csa17g006960 | CsCruD-1-G3 | 4 | 20 | 892 | 1,752 | 2,540 | 1,296 | 892 | 9 | 12 | 30 |
| Csa11g019460 | CsOle1-1-G1 | 10 | 251 | 2,138 | 3,600 | 3,348 | 3,600 | 3,469 | 2,389 | 2,276 | 2,138 |
| Csa10g017840 | CsOle1-1-G2 | 16 | 483 | 3,348 | 5,407 | 4,357 | 4,869 | 8,181 | 3,469 | 4,357 | 5,844 |
| Csa12g028090 | CsOle1-1-G3 | 22 | 346 | 2,932 | 4,869 | 4,093 | 6,486 | 5,844 | 4,672 | 6,486 | 6,875 |
| Csa11g057650 | CsOle2-1-G1 | 7 | 28 | 1,419 | 2,430 | 2,199 | 2,642 | 1,972 | 3,348 | 3,469 | 2,430 |
| Csa10g047190 | CsOle2-1-G2 | 5 | 51 | 1,251 | 1,694 | 1,590 | 1,844 | 1,694 | 1,296 | 1,752 | 1,538 |
| Csa12g079570 | CsOle2-1-G3 | 13 | 52 | 1,844 | 2,932 | 2,752 | 3,248 | 4,093 | 11,281 | 11,281 | 10,507 |
| Csa11g082710 | CsOle3-1-G1 | 1 | 69 | 775 | 853 | 507 | 371 | 298 | 87 | 75 | 105 |
| Csa18g022020 | CsOle3-1-G2 | 1 | 110 | 885 | 879 | 439 | 196 | 375 | 69 | 62 | 39 |
| Csa02g041750 | CsOle3-1-G3 | 1 | 70 | 1,037 | 1,261 | 637 | 361 | 360 | 109 | 94 | 108 |
| Csa04g015780 | CsOle4-1-G1 | 6 | 3 | 853 | 1,972 | 2,642 | 2,138 | 2,701 | 2,752 | 3,036 | 3,469 |
| Csa06g008780 | CsOle4-1-G2 | 8 | 1 | 862 | 2,389 | 3,036 | 2,701 | 2,932 | 1,037 | 1,251 | 1,086 |
| Csa09g014800 | CsOle4-1-G3 | 8 | 0 | 950 | 2,701 | 3,114 | 2,430 | 3,036 | 1,261 | 1,296 | 1,280 |
| Csa19g031870 | CsVic1A-1-G1 | 5 | 1 | 432 | 755 | 2,276 | 2,752 | 2,540 | 4,357 | 3,248 | 3,768 |
| Csa01g025880 | CsVic1A-1-G2 | 5 | 2 | 178 | 405 | 1,678 | 3,036 | 2,138 | 10,198 | 6,875 | 10,198 |
| Csa15g039290 | CsVic1A-1-G3 | 3 | 2 | 165 | 359 | 1,186 | 2,540 | 1,635 | 5,407 | 4,672 | 4,869 |
| Csa01g025890 | CsVic1B-1-G2 | 23 | 10 | 1 | 0 | 0 | 0 | 2 | 0 | 0 | 2 |
| Csa15g039300 | CsVic1B-1-G3 | 4 | 15 | 43 | 307 | 747 | 631 | 717 | 29 | 31 | 52 |
| Csa07g016060 | CsVic2-1-G1 | 2 | 18 | 730 | 989 | 1,972 | 892 | 561 | 402 | 335 | 445 |
| Csa16g016660 | CsVic2-1-G2 | 0 | 0 | 40 | 43 | 50 | 25 | 17 | 26 | 11 | 8 |
| Csa05g038120 | CsVic2-1-G3 | 2 | 17 | 670 | 910 | 1,460 | 760 | 650 | 235 | 239 | 337 |

Discussion

Current interest in C. sativa is mainly centred around oil and its use in bio-fuels (Li and Mupondwa 2014) or as a supplement in animal and fish feeds (Hixson et al.
2014; Hixson and Parrish 2014); however, utilisation of its meal protein (Colombini et al. 2014; Pekel et al. 2015; Hixson et al. 2016a, 2016b) will be necessary to achieve maximal commercial exploitation and valorization. C. sativa seed comprises about 43% protein (Zubr 2003), but little or nothing is known about other closely related Camelina species. The current study established that Camelina species exhibit different seed protein profiles and these differences can separate genotypes representing them. The percent protein in defatted meal also varied between species and less so between lines within the same species. Meal from C. microcarpa had the lowest protein content, 31%, while meal from C. hispida hispida, C. laxa, C. rumelica transcaspica and some C. rumelica rumelica and C. sativa lines all reached or exceeded 40%. This is slightly higher than the 38% reported for canola meal, but less than the 46% for soybean meal (So and Duncan 2021), which have been bred for oil and protein content, respectively.

Table 6 Properties of B. napus, A. thaliana and C. sativa cruciferins

| Property* | B. napus 3KGL | A. thaliana CRA | A. thaliana CRB | A. thaliana CRC | C. sativa CruA | C. sativa CruB | C. sativa CruC | C. sativa CruD |
|---|---|---|---|---|---|---|---|---|
| Protomer | | | | | | | | |
| Formula | C2247H3515N671O696S8 | C2200H3442N658O670S8 | C2118H3322N616O636S15 | C2436H3814N734O756S12 | C2178H3408N644O664S7 | C2116H3310N618O647S16 | C2288H3610N688O710S13 | C1975H3057N565O612S10 |
| Amino acids | 466 | 449 | 432 | 501 | 445 | 435 | 469 | 405 |
| Mr (kDa) | 51.3 | 50.1 | 48.1 | 55.9 | 49.5 | 48.3 | 52.5 | 44.8 |
| pI | 6.6 | 7.26 | 6.36 | 6.36 | 6.41 | 5.96 | 6.51 | 4.99 |
| Negative residues | 43 (9.2%) | 45 (10.0%) | 42 (9.7%) | 45 (9.0%) | 46 (10.3%) | 40 (9.2%) | 45 (9.6%) | 45 (11.1%) |
| Positive residues | 41 (8.8%) | 45 (10.0%) | 39 (9.0%) | 42 (8.4%) | 43 (9.7%) | 34 (7.8%) | 43 (9.2%) | 32 (7.9%) |
| GRAVY | −0.557 | −0.562 | −0.432 | −0.691 | −0.487 | −0.46 | −0.627 | −0.375 |
| Total charge | 0 | −2 | −5 | −2 | −5 | −8 | −1 | −14 |
| Trimer | | | | | | | | |
| Total pockets | − | 228 | 270 | 283 | 221 | 247 | 260 | 214 |
| Central pocket volume (Å³) | − | 17,419.4 | 9959.7 | 5092.9 | 8709.5 | 17,173.2 | 4178.7 | 3070.3 |
| Central pocket area (Å²) | − | 10,024.4 | 6755.4 | 3133.1 | 4799.9 | 8821.3 | 2369.6 | 1741.2 |
| Central pocket circumference (Å) | − | 896.1 | 449.1 | 218.4 | 251.9 | 733.9 | 86.4 | 14.4 |
| Central pocket openings | − | 28 | 15 | 1 | 6 | 15 | 1 | 1 |
| Central pocket mouth area (Å²) | − | 1695.1 | 762.2 | 577.7 | 624.8 | 1856.0 | 275.5 | 12.8 |

*Mr molecular weight, pI isoelectric point, total number of negatively charged residues (Asp + Glu), total number of positively charged residues (Arg + Lys), GRAVY–grand average hydropathy value according to Kyte and Doolittle (1982). Negative scores indicate increasing hydrophilicity, positive scores indicate increasing hydrophobicity

Lysine and methionine are not synthesised de novo by animals and must be obtained from their diets. These are also limiting in wholly plant-based diets and are often added as supplements to feeds used for monogastric animals, such as fish (Wilson and Halver 1986), poultry (Kidd et al. 1998) and swine (Brinegar et al. 1950). Meals derived from cruciferous oilseeds generally have higher levels of lysine and methionine than cereals, with C. sativa exhibiting a reasonably-balanced essential amino acid profile. Like protein content, amino acid content in the meal also varied between Camelina species. Lysine levels were lowest in meal from C.
rumelica rumelica (4.77%) and highest in most C. sativa lines (up to 5.74% in line 1063). Histidine was highest in the meal from C. rumelica rumelica (4.77% in line 1034), almost twice that found in meals from any of the other Camelina species. Interestingly, the amino acid composition of the two major seed proteins, napin and cruciferin, would account for only about one-half of the total lysine and histidine (Suppl. Table S10), indicating that unincorporated/free amino acids or other proteins of lesser abundance are major contributors to the overall meal amino acid profile.

Variation in meal amino acid composition was observed between lines within a species. Methionine and cysteine were highest in meal from C. rumelica rumelica lines 609 (2.89%) and 247 (9.32%), respectively, but lowest in C. rumelica rumelica line 1034. Serine content was highest in meal from C. sativa line 605 (5.39%), but lowest in line 252 (4.43%). Threonine was also lowest in meal from C. sativa line 1662 (3.83%); however, other C. sativa lines exceeded 4.5%, similar to other Camelina species. This analysis clearly demonstrates that variation among C. sativa lines and in related species exists, which could be accessed to develop lines producing meals with amino acid compositions that are better suited for monogastric diets. However, it remains to be demonstrated whether adequate levels of several or all limiting essential amino acids can be achieved in the same genetic background, as regulatory mechanisms governing carbon/nitrogen partitioning may not permit this. With respect to essential amino acids, canola meal has comparable levels of histidine (3.39%), isoleucine (3.47%), leucine (6.19%), phenylalanine (4.06%) and threonine (4.27%), slightly lower levels of lysine (5.92%), and lower levels of cysteine (2.29%), methionine (1.94%), tyrosine (2.50%) and valine (4.97%) (Wanasundara et al. 2016) than were found in lines from the various Camelina species examined here.
It should be noted that differences in analytical techniques must be considered in such comparisons and significant variation in protein and amino acid content has been reported in canola meal from different crushing plants (Le Thanh et al. 2019).

For the most part, variation in seed protein profile between C. sativa lines was limited in the 187 accessions examined, which is in keeping with genotypic analyses (Singh et al. 2015; Luo et al. 2019; Chaudhary et al. 2020). This may be attributed to the notion that C. sativa is a recent allopolyploid where most homeologous genes are expressed and little sub-genome fractionation has occurred (Kagale et al. 2014, 2016). Despite this, most of the lines could be placed into one of three classes based on differences in the electrophoretic profile of high molecular weight proteins consisting mainly of cruciferin. C. sativa possesses 12 genes encoding cruciferin, with each of the three sub-genomes having a contingent of homeologues (Kagale et al. 2014). The 12 C. sativa cruciferins are phylogenetically related to the four A. thaliana cruciferins, namely AtCRA (At5g44120), AtCRB (At1g03880), AtCRC (At4g28520), and AtCRD (At1g03890). A CRA orthologue is not present on any of the C. sativa sub-genome G3 chromosomes; however, a tandem duplication occurs on G1 chromosome 11 yielding CsCruA-1-G1 (Csa11g070580) and CsCruA-2-G1 (Csa11g070590). Interestingly, the CsCruB and CsCruD paralogues are also closely linked on each of the sub-genomes, similar to that in A. thaliana, even though they are the two most distantly related cruciferins. This signature is suggestive of a duplication event that occurred in a progenitor genome with sufficient time for divergence before the original triplication event that gave rise to the ancestor of both A. thaliana and C. sativa. It is especially interesting that this arrangement has been maintained through subsequent genome polyploidization and fractionation events in C. sativa.
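The distribution of cruciferin genes across sub-genomes described above can be tallied directly from the gene identifiers and protein names listed in Table 5. The Python sketch below does this under two assumptions that are inferred from the naming in this article rather than stated by the authors: that protein names follow a `Type-copy-subgenome` pattern (e.g. `CsCruA-2-G1`) and that the two digits after `Csa` in a gene identifier give the chromosome number (e.g. `Csa11g070580` on chromosome 11). The function names are illustrative.

```python
import re
from collections import Counter, defaultdict

# Cruciferin gene identifiers and protein names as listed in Table 5.
CRUCIFERIN_GENES = {
    "Csa11g070580": "CsCruA-1-G1",
    "Csa11g070590": "CsCruA-2-G1",
    "Csa18g009670": "CsCruA-1-G2",
    "Csa14g004960": "CsCruB-1-G1",
    "Csa03g005050": "CsCruB-1-G2",
    "Csa17g006950": "CsCruB-1-G3",
    "Csa11g015240": "CsCruC-1-G1",
    "Csa10g014100": "CsCruC-1-G2",
    "Csa12g021990": "CsCruC-1-G3",
    "Csa14g004970": "CsCruD-1-G1",
    "Csa03g005060": "CsCruD-1-G2",
    "Csa17g006960": "CsCruD-1-G3",
}


def chromosome(gene_id):
    """Chromosome number, assuming IDs of the form Csa<2-digit chromosome>g<number>."""
    match = re.fullmatch(r"Csa(\d{2})g\d+", gene_id)
    if match is None:
        raise ValueError(f"unexpected gene identifier: {gene_id}")
    return int(match.group(1))


def tally_by_subgenome(genes):
    """Number of genes of each cruciferin type on each sub-genome."""
    counts = defaultdict(Counter)
    for protein in genes.values():
        ctype, _copy, subgenome = protein.split("-")
        counts[ctype][subgenome] += 1
    return {ctype: dict(c) for ctype, c in counts.items()}


def chromosomes_by_type(genes):
    """Set of chromosomes carrying each cruciferin type."""
    result = defaultdict(set)
    for gene_id, protein in genes.items():
        result[protein.split("-")[0]].add(chromosome(gene_id))
    return dict(result)


if __name__ == "__main__":
    print(tally_by_subgenome(CRUCIFERIN_GENES))
    print(chromosomes_by_type(CRUCIFERIN_GENES))
```

On the Table 5 entries, the tally gives two CsCruA genes on G1, one on G2 and none on G3, and places the CsCruB and CsCruD genes on the same three chromosomes (3, 14 and 17), consistent with the tandem duplication and close linkage described in the text.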
The situation with the organisation of napin genes is equally compelling. The A. thaliana genome contains five genes encoding napin; four of these are linked in tandem on chromosome 4 and are closely related, while the fifth is present on chromosome 5. Camelina sativa also has two clusters of four napin genes, one on G1 and the other on G3; no napin genes occur on any G2 chromosomes. This arrangement, however, appears to be coincidental as phylogenetic comparisons between the genes within the A. thaliana and C. sativa napin clusters indicate that each evolved through a different duplicative route. When the napin gene or gene cluster was lost from G2 might be resolved by examination of genomes from other Camelina species (Chaudhary et al. 2020). Two genes encoding vicilin 1 lie in tandem on both G2 and G3, while a single gene is present on G1. This genomic arrangement and phylogenetic analysis suggest that these two sub-genomes are more closely related to one another than to G1, a notion which is supported by genotypic data (Chaudhary et al. 2020).

RNA-Seq analysis of seven C. sativa lines revealed that the same homeologues/paralogues encoding napins, oleosins and vicilins were expressed and at similar levels; however, the expression of cruciferin homeologues/paralogues differed widely between lines in some instances. In the C. sativa type strain DH55, genes encoding cruciferins were mainly expressed from the 12th to the 28th day post-anthesis. The general pattern of expression according to transcript levels was CsCruC > CsCruA > CsCruB > CsCruD. This same relative expression profile is also present in A. thaliana (TAIR; https://www.arabidopsis.org/) and, thus, appeared to be evolutionarily conserved and possibly of functional importance. However, upon examination of six additional C. sativa lines, only CN45816 shared this pattern with DH55.
In the other five lines, genes encoding CsCruA and CsCruB contributed the majority of the transcripts with those encoding CsCruC and CsCruD providing only a minor component. These general patterns were confirmed by proteomic analysis. The differences in the abundance of cruciferin isoforms/types between the lines have significant consequences as cruciferin is the most abundant seed storage protein and, as such, is the principal contributor to the physiochemical and nutritional properties of meal protein. Cruciferin is a hexamer with the degree of heterogeneity determined by the stoichiometry of the various protomers. While this serves to homogenise the physiochemical properties of individual cruciferin types (Withana-Gamage et al. 2011, 2013a, 2013b, 2015, 2020), it is conceivable that C. sativa lines could be selected that produce meals or globulin isolates with properties suited to specific applications. Reduction in the expression of the entire napin gene family via RNA interference (Nguyen et al. 2013) and targeted disruption of homeologous genes encoding CsCruC (Lyzenga et al. 2019) have been successful in altering C. sativa seed protein composition and, by inference, the physiochemical properties of the meal.

Vicilins are similar to cruciferins in that they are bicupin-domain globulins; however, they remain as trimers similar to the 7S globulins in legumes (Shewry et al. 1995). In A. thaliana, the genes encoding vicilins 1 and 2 are expressed at low levels during seed development (TAIR; https://www.arabidopsis.org/) and these proteins likely contribute little to seed protein composition. Conversely, genes encoding CsVic1A were expressed at levels comparable to those encoding CsCruB and more so than those encoding CsCruC in many of the C. sativa lines. Interestingly, neither the A. thaliana nor the C. sativa vicilin 2 proteins were predicted to contain a signal peptide and are, therefore, unlikely to be deposited within protein storage vacuoles.
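The ranking of cruciferin types reported for DH55 (CsCruC > CsCruA > CsCruB > CsCruD) can be illustrated with the Table 5 values. The Python sketch below sums the normalized transcripts per million across homeologues of each type at a single time point and ranks the totals. Using day 20 post-anthesis and summing across homeologues are choices made here for illustration only; the article does not state how the overall ranking was derived, and the function names are hypothetical.

```python
from collections import defaultdict

# Day-20 post-anthesis values (normalized TPM) transcribed from Table 5 for cv. DH55.
DAY20_TPM = {
    "CsCruA-1-G1": 6486,
    "CsCruA-2-G1": 10507,
    "CsCruA-1-G2": 14291,
    "CsCruB-1-G1": 6875,
    "CsCruB-1-G2": 3469,
    "CsCruB-1-G3": 8181,
    "CsCruC-1-G1": 12490,
    "CsCruC-1-G2": 5407,
    "CsCruC-1-G3": 30695,
    "CsCruD-1-G1": 2138,
    "CsCruD-1-G2": 533,
    "CsCruD-1-G3": 2540,
}


def totals_by_type(tpm_by_gene):
    """Sum TPM over all homeologues/paralogues of each cruciferin type."""
    totals = defaultdict(int)
    for protein, tpm in tpm_by_gene.items():
        totals[protein.split("-")[0]] += tpm
    return dict(totals)


def rank_types(tpm_by_gene):
    """Cruciferin types ordered from highest to lowest summed TPM."""
    totals = totals_by_type(tpm_by_gene)
    return sorted(totals, key=totals.get, reverse=True)


def transcript_share(tpm_by_gene):
    """Percentage of cruciferin transcripts contributed by each type."""
    totals = totals_by_type(tpm_by_gene)
    grand_total = sum(totals.values())
    return {ctype: 100 * value / grand_total for ctype, value in totals.items()}


if __name__ == "__main__":
    print(rank_types(DAY20_TPM))
    for ctype, pct in transcript_share(DAY20_TPM).items():
        print(f"{ctype}: {pct:.1f}%")
```

With the day-20 values the order is CsCruC, CsCruA, CsCruB, CsCruD, in agreement with the pattern described for DH55. The same functions could be applied to data from the other lines to show the shift towards CsCruA and CsCruB, although those values are not given in this section.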
Given the sequence and structural similarity between A. thaliana and C. sativa cruciferin isoforms, it may be assumed that they share similar physiochemical properties. Cruciferins and other 11S/12S globulins contain two conserved β-barrel or cupin domains; however, the five hypervariable regions confer different properties on individual isoforms (Tandang-Silvas et al. 2010). As noted with A. thaliana cruciferins (Withana-Gamage et al. 2011), HVR-I and HVR-III are located on the solvent-exposed surface of the IA face in the hexamer, while HVR-IV moves to the periphery after cleavage at the β-site. In both A. thaliana and C. sativa, CruC possesses an extended, glutamine-rich, HVR-II within the alpha subunit. In specialised A. thaliana lines producing homomeric cruciferins, AtCRC was found to form a more compact and less hydrophobic hexamer than either homomeric AtCRA or AtCRB. This resulted in increased thermostability and reduced susceptibility to hydrolysis by pepsin, but altered its ability to form heat-induced gels and to stabilise oil-in-water emulsions (Withana-Gamage et al. 2013a, 2013b, 2015, 2020). Furthermore, reduced proteolytic susceptibility is one of several factors that contribute to the antigenic potential of cupin-like proteins (Mills et al. 2002), making elimination of CsCruC in C. sativa an attractive goal (Lyzenga et al. 2019). Homomeric AtCRA and AtCRB formed strong heat-induced gels (Withana-Gamage et al. 2015) and possessed good ability to stabilise oil-in-water emulsions over a wide pH range (Withana-Gamage et al. 2020). Structural features that facilitate flavour or small molecule binding, such as the size of the central pocket and mouth opening (Guichard 2006), were most prominent in CsCruB followed by CsCruA. CsCruD has an unusual HVR-IV that is rich in arginine rather than glutamine residues as in other cruciferin types.
Its IA face (solvent-exposed) is dominated by negatively charged amino acids with a more even distribution of hydrophobic residues, suggesting that it may possess unique properties. CruD also presents an enigma. It is expressed at very low levels compared to genes encoding other cruciferins. It also possesses alterations in polar and charged residues important for interaction between trimers (Adachi et al. 2001, 2003; Tandang-Silvas et al. 2010), suggesting that it may destabilise hexamers when present. While this may seem counter-intuitive, seed storage proteins must be both stable and rapidly mobilised during seed germination. Following imbibition, globulin mobilisation is achieved through the sequential hydrolysis of a limited number of internal sites by metallo-endopeptidases followed by a more general degradation by cysteine proteases (Muntz et al. 2001; Tan-Wilson and Wilson 2011). Slight structural instability introduced by CruD may assist in this process when this minor isoform is present and may explain why it remains in A. thaliana and C. sativa, as well as in other Brassicaceae.

In conclusion, the wealth of information on seed protein diversity in Camelina species provided in this work will initially be useful in breeding/engineering lines with higher protein content and amino acid profiles suitable for animal and, possibly, human diets. The plant protein industry is already moving in this direction and beyond, with particular interest in purified protein isolates, mainly albumins (napins) and globulins (cruciferins), for specific food applications (So and Duncan 2021). In the future, knowledge of the genes and their expression patterns that underlie the protein profiles will permit the creation of specialised C.
sativa lines that, for example, produce homogeneous cruciferins with properties tailored to specific applications. Indeed, targeted disruption of entire cruciferin gene families, notably CsCruC, has already been demonstrated in C. sativa (Lyzenga et al. 2019). It is only a matter of time before this is applied to other oilseed species.

Author contribution statement DD, SM, IAP, AH and JW conceived, designed and funded the research. BG, MH and SP conducted experiments. CC analysed data. DD, IAP and JW wrote the manuscript. All authors read and approved the manuscript.

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s00425-022-03998-w.

Acknowledgements This work was funded by the Agriculture and Agri-Food Canada Canadian Crop Genomics Initiative and the Global Institute for Food Security.

Funding Open Access provided by Agriculture & Agri-Food Canada.

Declarations

Conflict of interest Authors declare that they do not have any conflict of interest.

Data availability statement The datasets generated during and/or analysed during the current study are deposited in publicly available repositories as indicated or available from the corresponding author upon reasonable request.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Adachi M, Takenaka Y, Gidamis AB, Mikami B, Utsumi S (2001) Crystal structure of soybean proglycinin A1aB1b homotrimer. J Mol Biol 305:291–305
Adachi M, Kanamori J, Masuda T, Yagasaki K, Kitamura K, Mikami B, Utsumi S (2003) Crystal structure of soybean 11S globulin: glycinin A3B4 homohexamer. Proc Natl Acad Sci USA 100:7395–7400
Almeida FN, Htoo JK, Thomson J, Stein HH (2013) Amino acid digestibility in camelina products fed to growing pigs. Can J Anim Sci 93:335–343
AOAC Method 972.43. (1997) Microchemical determination of carbon, hydrogen, and nitrogen, automated method. In: Official methods of analysis of AOAC International, 16th edn. AOAC International, Arlington, VA, USA
AACC Method 44–01.01. (1999) Calculation of percent moisture. In: Approved methods of analysis, 11th edn. AACC International, St. Paul, MN, USA. https://doi.org/10.1094/AACCIntMethod-44-01.01
AACC Method 46–18.01. (1999) Crude protein, calculated from percentage of total nitrogen, in feeds and feedstuffs. In: Approved methods of analysis, 11th edn. AACC International, St. Paul, MN, USA. https://doi.org/10.1094/AACCIntMethod-46-18.01
AOAC Method 994.12. (2005) Amino acids in feeds: Performic acid oxidation with acid hydrolysis–sodium metabisulfite method. In: Official methods of analysis of AOAC International, 18th edn. AOAC International, Gaithersburg, MD, USA
Ariza AE, Quezada N, Cherian G (2010) Feeding Camelina sativa meal to meat-type chickens: effect on production performance and tissue fatty acid composition. J Appl Poult Res 19:157–168
Barthet VJ, Daun JK (2004) Oil content analysis: myths and reality.
In: Luthria DL (ed) Oi