Plastome analysis unveils Inverted Repeat (IR) expansion and positive selection in Sea Lavenders (Limonium, Plumbaginaceae, Limonioideae, Limonieae)

Abstract The genus Limonium, commonly known as Sea Lavenders, is one of the most species-rich genera of the family Plumbaginaceae. In this study, two new plastomes for the genus Limonium, viz. L. tetragonum and L. bicolor, were sequenced and compared to available Limonium plastomes, viz. L. aureum and L. tenellum, to understand the gene content and structural variations within the family. The loss of the rpl16 intron and pseudogenisation of rpl23 was observed. This study reports, for the first time, expansion of the IRs to include the ycf1 gene in Limonium plastomes, incongruent with previous studies. Two positively selected genes, viz. ndhF and ycf2, were identified. Furthermore, putative barcodes are proposed for the genus, based on the nucleotide diversity of four Limonium plastomes.


Introduction
The family Plumbaginaceae of the order Caryophyllales is highly diverse, rich in species and displays a cosmopolitan distribution with its maximum diversity in the temperate areas of the northern hemisphere (Kubitzki 1993). It is sister to Polygonaceae Chase et al. 2016) and further classified into two subfamilies: Limonioideae (formerly Staticoideae) and Plumbaginoideae (Lledó et al. , 2001Hernández-Ledesma et al. 2015). Limonioideae is further divided into two tribes, Limonieae (consisting of 24 genera) and the monotypic Aegialitideae, whereas Plumbaginoideae consists of four genera. Limonioideae is a sub-cosmopolitan group distributed mostly in the Mediterranean and Indo-Turanian regions, but a few genera have also diversified in the Southern Hemisphere. Limonium Mill., Acantholimon Boiss. and Armeria (DC.) Willd., all belonging to Limonioideae, are the most species-rich genera, comprising 80-90% of the species in Plumbaginaceae (Koutroumpa et al. 2018).
The genus Limonium Mill., popularly known as sea lavenders, belongs to the subfamily Limonioideae and tribe Limonieae (Kubitzki 1993;Lledó et al. 2005;Malekmohammadi et al. 2017). The genus is represented by ca. 600 species and is the sole genus of Plumbaginaceae exhibiting a sub-cosmopolitan distribution (Koutroumpa et al. 2018). The genus comprises several ornamental and medicinally-important species (Malekmohammadi et al. 2017). Limonium are herbs or shrubs growing in saline or metal-rich soils, mostly in coastal areas (Kubitzki 1993). Variation of reproductive systems (sexual and apomixis), as well as events like hybridisation and polyploidy, complicate the delimitation of most of the species of Limonium (Kubitzki 1993;Cowan et al. 1998;Palacios et al. 2000;Akhani et al. 2013;Róis et al. 2016).
Certain phylogenetic studies tried to resolve the relationships within Limonium at a global scale (Lledó et al. 2005); however, many others were confined to either a specific geographic area or some specific sections of Limonium (Palacios et al. 2000;Lledó et al. 2005;Lledó et al. 2011;Akhani et al. 2013;Róis et al. 2016). More recently, Malekmohammadi et al. (2017) tried to resolve the phylogenetic relationships within Limonium including 76 species of the Mediterranean region. They found two wellsupported clades for subgenera Limonium and Pteroclados (Boiss.) Pignatti, which confirmed the earlier findings by Lledó et al. (2005Lledó et al. ( , 2011. Their study was based on one nuclear and several plastid loci, but lacked comprehensive sampling. Koutroumpa et al. (2018) carried out a phylogenetic study of Plumbaginaceae which included 23 genera of the family with an emphasis on the genus Limonium (201 spp.), based on three chloroplast and one nuclear marker. The study again confirmed most of the previous molecular phylogenies and led to the proposal of new sections and altering some of the existing sections. However, taxonomic difficulties, diversity in reproductive modes and sub-cosmopolitan distribution of Limonium necessitate further studies on the genus.
With the advent of sequencing technologies, the availability of large genome-scale data has made it easier to understand phylogeny and detect polyploidy events (Gompert and Mock 2017;Thomas et al. 2017;McKain et al. 2018). Five plastomes have been sequenced for Plumbaginaceae thus far, viz. Limonium sinense (Girard) Kuntze , L. aureum (L.) Chaz. (Zhang et al. 2020), L. tenellum (Turcz.) Kuntze (Yao et al. 2019) and one each of Ceratostigma Bunge and Plumbago L. The recent phylogenomic study to understand the evolution of Caryophyllales incorporated plastome sequences of L. tenellum, Plumbago auriculata Lam. and Ceratostigma willmottianum Stapf (Yao et al. 2019). The study reported that all Plumbaginaceae members, except Limonium, exhibit expanded IR to accommodate ycf1. However, no studies have been carried out to understand and compare the structure, composition and evolution of the plastome within Plumbaginaceae and Limonium in particular.
The present study reports the plastome sequences of two Asian Limonium species, viz. L. tetragonum (Thunb.) Bullock and L. bicolor (Bunge) Kuntze and compares the structure, composition and diversity within the genus by combining them with other available plastomes. L. tetragonum is a biennial species characterised by a spicate inflorescence, yellow corolla, acute calyx with pink at the base, white in upper parts and distributed in Japan, Korea, New Caledonia and Primorye (Ohwi 1965;POWO 2021). Limonium bicolor is a perennial species, characterised by a paniculate inflorescence, yellow corolla, somewhat rounded calyx, pink to purplish at base, white in upper parts and distributed in China and Mongolia (Wu and Raven 1996; POWO 2021). The former is known for its anti-cancerous properties (Kong et al. 2008;Bae et al. 2016), while the latter is most studied for its salt glands Lu et al. 2020). An attempt has also been made to unravel the structural variations within Plumbaginaceae plastomes and propose molecular markers for easier discrimination of Limonium species.

Assembly and annotation
The raw reads obtained after Illumina sequencing were analysed using FastQC V0.11.7 (Andrews 2010) software to ensure the quality of the reads and Phred score. The assembly was carried out using NOVOPlasty V4.2 (Dierckxsens et al. 2017). Forward and reverse reads with a read length of 150 bp and insert size of 300 bp, as well as seed sequence (rbcL of L. tetragonum), were used as input. De novo, as well as referencebased assembly, were carried out. The plastome sequence of L. aureum (MN623109) was used as a reference for the assembly of both the plastomes. The assembly and orientation of the Inverted Repeats (IRs), Large Single Copy (LSC) and Small Single Copy (SSC) regions were confirmed by NCBI blast and graphic view using Geneious prime 2020.2.2 (https://www.geneious.com). The assembled plastomes were annotated using Geneious prime 2020.2.2 (https://www.geneious.com) and Geseq -Annotation of Organellar Genomes (Tillich et al. 2017). The transfer RNAs (tRNAs) were identified with tRNA-scan-SE (Lowe and Chan 2016). Graphical maps of both plastomes were visualised using OGDRAW (Lohse et al. 2013).

Comparative plastome analysis
All Plumbaginaceae plastomes available so far were included for comparison. Four plastomes of Limonium, viz. L. aureum (MN623109), L. tenellum (MK397871), L. tetragonum, L. bicolor and Ceratostigma willmottianum (MK397862), as well as Plumbago auriculata (MH286308), were included in the analysis. The plastome of L. sinense was not included in any of the analyses due to ambiguities observed in the assembly. The selected six plastomes were aligned using a Geneious prime 2020.2.2 plugin MAFFT v.7.450 (Katoh and Standley 2013). The contraction and expansion of IRs at the junction sites were examined and plotted using IRscope (Amiryousefi et al. 2018). Percentage sequence identity for all six plastomes was plotted using Mul-tiPIPMaker (http://pipmaker.bx.psu.edu/pipmaker/) considering L. tetragonum as a reference (Schwartz et al. 2003).

Identification of Simple Sequence Repeats (SSRs) and repeats
Simple Sequence Repeats across the four Limonium plastomes were detected using MISA online server (Beier et al. 2017). The minimum repeat units for mono-, di-, tri-, tetra-, penta-and hexanucleotide repeats were set as 10, 5, 4, 3, 3 and 3, respectively, whereas forward, reverse, palindromic and complementary repeats were identified using the programme REPuter (Kurtz et al. 2001). The threshold for repeat length was set to 30 with more than 90% similarity and the hamming distance was set to 3.

Nucleotide diversity
The four Limonium plastomes were aligned using Geneious prime 2020.2.2 plugin MAFFT v.7.450 (Katoh and Standley 2013). The aligned sequences were curated manually and, due to the presence of ambiguous ycf1 region in L. tenellum at IRb/ SSC junction, the sequences were trimmed for all four taxa before the analysis. Nucleotide diversity and sliding window analysis were performed using the software DnaSP v.6.12.01 (Rozas et al. 2017).

Codon usage
The percentage codon usage of protein-coding regions of all four Limonium plastomes was calculated using Geneious prime 2020.2.2. To examine the frequency and uniformity of Synonymous codon and codon biases, the Relative Synonymous Codon Usage (RSCU) was also determined in DnaSP v.6.12.01 software (Rozas et al. 2017).

Positive selection analysis
In order to detect protein-coding genes under selection in the genus Limonium, sequences of each gene were aligned using the MAFFT v.7.450 plugin of Geneious prime 2020.2.2. The aligned sequences were again manually checked for an end-toend alignment. The phylogenetic tree for each protein-coding gene was constructed using the FastTree plugin (Price et al. 2010) of Geneious prime 2020.2.2. The sitespecific model was performed using CODEML algorithms (Yang 1998) implemented in EasyCodeML software (Gao et al. 2019). Seven codon substitution models (M0, M1a, M2a, M3, M7, M8 and M8a) were investigated and compared to identify positively selected sites, based on likelihood ratio tests.

Phylogenomic analysis
A total of 39 plastome sequences, including six from Plumbaginaceae were considered as ingroups for phylogenomic analysis. The outgroup was composed of members of Amaranthaceae. All the plastome sequences were aligned using the MAFFT v.7.450 plugin of Geneious prime 2020.2.2. Maximum Likelihood analysis, based on the best fit model GTR+F+R5, was performed using IQtree 1.6.12-MacOSX (Nguyen et al. 2015). The best likelihood score for the consensus tree was -979751.011.

General features of the plastomes
The average organelle coverage for the plastomes of L. tetragonum and L. bicolor was 1014X and 1009X, respectively. Plastomes of L. tetragonum and L. bicolor exhibited a typical quadripartite structure (Fig. 1). Total plastome lengths of L. tetragonum and L. bicolor were 154,691 bp and 154,617 bp with LSC (84,568 bp and 84,541 bp), SSC (12,997 bp and 12,964 bp) and a pair of IRs (28,563 bp and 28,556 bp), respectively ( Table 1). The GC content of both plastomes was 37%. Both the plastomes exhibited 83 protein coding genes, 37 tRNA genes and four rRNA genes (duplicated in IR region) (Tables 1, 2). The assembled and annotated plastomes of L. tetragonum and L. bicolor were deposited to the NCBI database with accession numbers MW085088 and MW085089, respectively.

IR expansion and contraction
The IR regions of plastomes are divided by four junctions viz., IRb/LSC, IRb/SSC, IRa/SSC and IRa/LSC. All six plastomes of Plumbaginaceae (including four Limonium) were compared for their IR boundaries. The annotations available on NCBI database were used for Limonium aureum, L. tenellum, Ceratostigma willmottianum and Plumbago auriculata.
The IRb/LSC junction of three Limonium species viz. L. tetragonum, L. bicolor and L. aureum was characterised by the presence of the rpl2 gene (Fig. 2). The gene also extends in IRb with 11 bp, 10 bp and 11 bp, respectively. However, in L. tenellum and  C. willmottianum, it was exclusively present in the LSC region, 937 bp and 1,041 bp away from the junction, respectively. In Plumbago, IR was expanded to include the rps19 gene at the junction, while rpl2 was duplicated in the IR region. The next junction, i.e. IRb/SSC, was characteried by the presence of the gene in L. tetragonum, L. bicolor, L. aureum and P. auriculata (49 bp of all Limonium species and 51 bp of Plumbago in IR region). In L. tenellum, the junction exhibited rrn23 (IR) and trnR-ACG (SSC), which could probably be due to wrong assembly or annotation. In Ceratostigma, ndhF appeared to be shifted to SSC, 118 bp away from IRb border.
IRa/SSC junction of all compared species was characterised by the presence of rps15 and ycf1 genes, except in L. tenellum. The IRa/LSC junction was characterised by rpl32 and trnH in L. tetragonum, L. bicolor and L. aureum, while in L. tenellum and C. willmottianum, it was characterised by ycf2 and trnH. In P. auriculata, the junction was bordered by rps19 and trnH (Fig. 2).
Plastomes of L. aureum, L. bicolor and L. tetragonum exhibited two copies of ycf1, except for L. tenellum which exhibited a single copy. All three plastomes are characterised by the ycf1 gene having a length of 5,298 bp and IR has been expanded to accommodate the ycf1 gene. Plumbago and Ceratostigma also exhibited the ycf1 gene duplicated in IRs. However, the annotation provided for L. tenellum (MK397871) exhibits a single copy of ycf1.

Structural comparison
Plastomes of Limonium tetragonum, L. tenellum, L. bicolor and L. aureum were compared with two Plumbaginaceae plastomes, keeping L. tetragonum as a reference. Sequence divergence amongst the four compared Limonium plastomes was similar as compared to Plumbago and Ceratostigma. Limonium tenellum exhibited partial deletion at the IRa/ LSC junction in the ycf1 gene (Fig. 3). Our results suggest that the gene rpl23 has been pseudogenised in all the studied plastomes of the three genera of Plumbaginaceae. The length of rpl23 in all four Limonium plastomes was observed to be 171 bp, 270 bp in Plumbago and 50 bp in Ceratostigma. A similar loss of intron was observed in L. aureum, L. bicolor and L. tetragonum in our study. In all four compared species of Limonium, the length of rpl16 was 408 bp without any intron. However, the two other genera of Plumbaginaceae, i.e. Plumbago and Ceratostigma, exhibited the presence of intron. The length of the gene was 1,449 bp in the former, while in the latter, it was 1,469 bp.

Nucleotide diversity
The nucleotide diversity (Pi) of four Limonium plastomes was analysed, except for the ycf1 region, which was removed due to ambiguous alignment. Sliding window analysis yielded some regions with higher Pi values. High nucleotide diversity was found in two spacer regions viz. trnY-GUA-trnT-GGU, trnL-UAG-ccsA and one gene ycf3 with Pi values 0.015, 0.043 and 0.02, respectively (Fig. 4).

SSR and repeat analyses
The plastomes of L. bicolor exhibited 6 compound, 26 mono-, 9 di-, 5 tri-and 5 tetranucleotide repeats, while L. tetragonum exhibited 4 compound, 31 mono-, 9 di-, 4 tri-, 6 tetra-and 1 hexanucleotide repeats (Fig. 5a). The size of repeats ranged from 10 to 168 in L. bicolor and 10 to 109 in L. tetragonum (Fig. 5b). Repeats in both the plastomes were found to be AT-rich. The highest number of SSRs was found in the LSC region, followed by IR and SSC regions in both the plastomes (Fig. 5c).   All four species of Limonium exhibited only forward and palindrome type in RE-Puter analysis (Suppl. material 1).

Codon usage
The four Limonium plastomes were compared for their codon usage. The plastomes of L. tetragonum, L. tenellum, L. bicolor and L. aureum exhibited 27,290, 24,093, 26,682 and 27,308 codons, respectively. Leucine was the most abundant while Cysteine was the least abundant amino acid in all the compared plastomes (Fig.  6). Codon usage was biased towards A and T in all the compared plastomes. The highest codon preference was 1.88, 1.89, 1.82 and 1.91 while the lowest was 0.33, 0.36, 0.37 and 0.36, respectively, in L. tetragonum, L. tenellum, L. bicolor and L. aureum (Suppl. material 2). Codon usage was biased towards A and T in all the compared plastomes.

Positive selection analysis
A total of 79 consensus protein-coding genes of four Limonium species were evaluated with respect to selective pressure. Two genes were found to have undergone positive selection viz. ndhF and ycf2 with ω values 2.2278 and 19.657, respectively (Suppl. material 3).

Discussion
In this study, two Limonium plastomes were assembled and the structure and composition of four Limonium plastomes were compared. The plastomes were conserved in terms of size and structure ranging from 154,617 to 154,691 bp, except for L. tenellum with 150,515 bp. Expansion and contraction of IRs/SSC account for huge variation, evolutionary events and also affect the plastome sizes (Zhu et al. 2016;Darshetkar et al. 2019;Maurya et al. 2020). Yao et al. (2019) reported the expansion of IRs in Polygonaceae and Plumbaginaceae members to accommodate ycf1, except for the genus Limonium. However, the study included a single plastome sequence of Limonium tenellum. In all the Limonium plastomes, except L. tenellum, IR has been expanded to accommodate ycf1, unlike that reported by Yao et al. (2019). We assume the single copy of ycf1 in L. tenellum could be a result of erroneous assembly and annotation. Sequencing more plastomes for the genus will help to better understand the arrangement and position of ycf1 in Limonium. Ribosomal Protein L23 is a protein component of the 60s large ribosomal subunit. The comprehensive study of plastomes of Caryophyllales (Yao et al. 2019) reported that the gene has undergone pseudogenisation at least 11 times in the order. Pseudogenisation has also been reported in the family Polygonaceae, a sister family of Plumbaginaceae (Logacheva et al. 2008). The results of our study corroborate these earlier studies. Transfer of the functional copy of the gene to the nucleus has been reported for many plastid genes in angiosperms (Millen et al. 2001;Jansen et al. 2011;Daniell et al. 2016). The functional copy of rpl23 might have been transferred to the nucleus, even in the members of Plumbaginaceae as predicted by Yao et al. (2019). Ribosomal Protein L16 codes for a protein component of the 50s ribosomal subunit. The loss of rpl16 intron was reported in Limonium gmelinii Kuntze and Limonium latifolium Kuntze (Campagna and Downie 1998). Recently, Yao et al. (2019) reported the loss of rpl16 intron in L. tenellum plastome. Hence, our study confirms the loss of rpl16 intron on the branch leading to Limonium as reported by Yao et al. (2019).
The value of the ratio of synonymous and nonsynonymous substitutions (Ka/Ks or ω) above 1 indicates that the corresponding genes experience positive selection, however, ω values ranging from 0.5 to 1 indicate relaxed selection (Tomoko 1995). The ndhF gene was found to be positively selected in the genus Rheum L. of Polygonaceae. The study reported that the higher expression levels of ndhF were observed in Rheum under environmental stress conditions (Li et al. 2016). Most of the Limonium species also grow in saline soils, which could be the reason behind the adaptive evolution of the gene. Positive selection of the ycf2 gene has already been reported in several angiosperms (Jiang et al. 2018;Zhong et al. 2019).
The sampling for the phylogenomic analysis followed the studies of Walker et al. (2018) and Yao et al. (2019). The ingroup consisted of plastomes belonging to the FTPP clade, i.e. Frankeniaceae, Tamaricaceae, Plumbaginaceae and Polygonaceae (they were all recovered as strongly-supported monophyletic groups), while the outgroup consisted of species belonging to Amaranthaceae. The position of the four Limonium species studied fits with the earlier studies (Yao et al. 2019). In the present study, phylogenetic position of four Limonium species belonging to subgenus Limonium (Koutroumpa et al. 2018) was studied. A study based on rbcL, trnL intron and trnL-F intergenic spacer included two species viz. L. tenellum and L. tetragonum, which were resolved in the same clade, but were placed into two different subsections, according to Boissier (1848), Rhodanthae and Chrysanthae, respectively. However, the study treated them under subgenus Limonium (Lledo et al. 2005). Later, Malekmohammadi et al. (2017) studied the phylogeny for the genus, based on several plastid and single nuclear (ITS) markers. The study included L. tetragonum and L. aureum and both the species were placed in the "L. aureum clade". Recently, Koutroumpa et al. (2018) carried out phylogenetic studies of the genus, based on three chloroplast (trnL-F, matK and rbcL) and one nuclear marker (ITS). Three species, L. tetragonum, L. aureum and L. tenellum were included in the study. These three species were all resolved in the same clade. These species were earlier placed under sect. Plathymenium, L. tetragonum and L. aureum belonging to subsect. Chrysanthae and L. tenellum under subsect. Rhodanthae (Boissier 1848). Earlier studies reported plastome data dealing with single Limonium species and placed them as sister to Plumbago species (Yao et al. 2019;Li et al. 2020;Zhang et al. 2020). The present phylogenetic study, for the first time, included L. bicolor and analysed four Limonium plastomes together. However, inclusion of more Limonium plastomes would help in understanding the intrageneric relationships.

Conclusions
The present study has made an effort to understand the structural changes in the plastomes of Plumbaginaceae by including two newly-generated Limonium plastome sequences. The study also confirms the loss of rpl16 intron in the genus Limonium and pseudogenisation of the rpl23 gene in Plumbaginaceae. Our results also revealed, for the first time, the expansion of the IRs to accommodate the ycf1 gene in Limonium as in other Plumbaginaceae members. The annotation available for L. tenellum exhibits ycf1 in the SSC region. Hence, the sequencing of more plastomes would aid in identifying the exact position of ycf1. Two positively-selected genes were identified, viz. ndhF and ycf2. The positive selection of these genes could be linked to the evolution of ndhF to adapt to extreme environmental conditions, such as salt stress. It would be interesting to identify the adaptive sites in the ndhF amino acid by adding more ndhF sequences of Limonium species, while the expansion of IRs and accommodation of ycf2 genes could be related to the re-arrangement of the plastome. The function of ycf2 is still not clear, but it would be interesting to study the ycf2 evolution and re-arrangement in the whole order. High nucleotide diversity was observed in two spacer regions trnY-GUA--trnT-GGU, trnL-UAG--ccsA and one gene ycf3, which could be used as potential DNA barcodes for the genus. Future studies will focus on identifying adaptive codon sites in positively-selected genes and correlating those with the habitats and environmental conditions and validation of the proposed barcodes by including more Limonium species.