BIOINFORMATIC APPROACH IN THE IDENTIFICATION OF ARABIDOPSIS GENE HOMOLOGOUS IN AMARANTHUS

Bioinformatics offers efficient tools for molecular genetics, and sequence homology search algorithms have become an indispensable part of many research strategies. Appropriate management of known data stored in publicly available databases can benefit research in many ways. Here, we report the identification in Amaranthus of the DNA sequence encoding an RmlC-like cupins superfamily protein, a sequence that is known from the Arabidopsis genome but has not yet been sequenced in Amaranthus. A BLAST-based approach was used to identify homologous sequences in the nucleotide database and to find parts of the Arabidopsis sequence where primers could be designed. In total, 64 hits were found in the nucleotide database for the Arabidopsis RmlC-like cupins sequence. Query cover between the RmlC-like cupins nucleotide sequence and its homologues currently stored in public nucleotide databases ranged from 10% to 100%. The most conserved region, identified among the matches, spanned nucleotides 1506 to 1925 of the RmlC-like cupins DNA sequence stored in the database. The in silico results were subsequently used in PCR analysis, which confirmed the specificity of the designed primers. A unique 250 bp fragment was obtained for Amaranthus cruentus and the hybrid Amaranthus hypochondriacus x hybridus. Bioinformatics-based analysis of unknown parts of plant genomes, as shown in this study, is a valuable complementary tool in PCR-based analysis of plant variability. The approach is suitable for plants in which genomic data for the gene of interest are still missing, as demonstrated here for Amaranthus.


INTRODUCTION
Bioinformatics provides an interdisciplinary tool used to manage and analyse biological data, including known nucleic acid sequences (Cannataro et al., 2009). Many features of nucleic acids can serve as motifs in bioinformatic algorithms, both to describe genomic variability and to understand it better. Individual sequence motifs are recognized by their nucleotide order and nucleotide preference, and many motif discovery algorithms have been used in molecular and bioinformatic studies (Aravind and Koonin, 1999; Hertz and Stormo, 1999; La and Livesay, 2005; Rasouli et al., 2013; Gardner and Slezak, 2014).
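To illustrate what "nucleotide preference" at each motif position means in practice, the sketch below builds a position frequency matrix from a few aligned motif instances and reads off a consensus sequence. It is a generic illustration, not one of the cited motif discovery algorithms, and the toy sites are hypothetical rather than data from this study.

```python
from collections import Counter

def position_frequency_matrix(sites):
    """Count nucleotides at each position of equal-length, aligned motif sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all motif sites must have the same length")
    return [Counter(s[i] for s in sites) for i in range(length)]

def consensus(pfm):
    """Most frequent nucleotide at each position (ties broken alphabetically)."""
    return "".join(min(col, key=lambda nt: (-col[nt], nt)) for col in pfm)

# Hypothetical toy motif instances, not sequences from this study.
sites = ["ACGTA", "ACGTT", "ACCTA", "TCGTA"]
pfm = position_frequency_matrix([s.upper() for s in sites])
print(consensus(pfm))  # ACGTA
```

Positions where one nucleotide dominates (here position 2, always C) are the conserved part of the motif; positions with mixed counts carry the variability.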
Here, bioinformatic algorithms were applied to known cupin DNA sequences. Cupins are reported to be a germin-related superfamily of proteins that is structurally conserved but functionally highly diverse (Khuri et al., 2001). Analyses of EST and microbial GenBank databases showed that these proteins have representatives in many prokaryotic and eukaryotic organisms; moreover, 26 residues of cupin intermotif regions were found in cereal proteins. In this study, we identify conserved nucleotides in the available genomic data for the Arabidopsis RmlC-like cupins superfamily protein that are suitable for bioinformatics-based primer design, and then use PCR to detect the presence of this sequence in the genome of Amaranthus.

MATERIAL AND METHODOLOGY
Plants of Amaranthus cruentus (Ficha cultivar) and the hybrid Amaranthus hypochondriacus x hybridus (hybrid K-433) were grown under in vitro conditions. DNA was extracted according to the instructions of the GeneJET Plant Genomic DNA Purification Mini Kit (Thermo Scientific). A Nanodrop Nanophotometer™ was used to assess the quantity and quality of the extracted DNA. To analyse the BLAST alignments for primer design, only nucleotide sequences with a query cover of more than 75% and an E-value of 0.0 were chosen. Primers were designed in Primer-BLAST (Ye et al., 2012) so that only the RmlC-like cupins sequence would be amplified, based on the conserved part of this gene as predicted bioinformatically. The following primers were returned as specific and used in the study: forward 5´-ccgaagtttcatccgatggc-3´ and reverse 5´-ctttgaaagctccccctccg-3´. PCR amplifications were performed in a Bio-Rad C1000™ Thermocycler with the following program: an initial denaturation step at 95 °C for 5 min, followed by 40 cycles of 95 °C for 30 s, 58 °C for 40 s and 72 °C for 40 s, and a final extension at 72 °C for 10 min. The amplified products were inspected by electrophoresis on a 1.5% agarose gel in 1× TBE buffer, stained with GelRed™ and photographed under UV light.
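The primer pair listed above can be sanity-checked for length, GC content and a rough melting temperature. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C); this rule is an assumption chosen for simplicity, it is not the model Primer-BLAST applies, and for 20-mers it is only a first approximation. The function names are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rough melting temperature in °C by the Wallace rule, 2*(A+T) + 4*(G+C).
    Derived for short oligonucleotides; only a first approximation for 20-mers."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

# Primer sequences as reported in the Methods, written 5' to 3'.
primers = {
    "forward": "ccgaagtttcatccgatggc",
    "reverse": "ctttgaaagctccccctccg",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, Wallace Tm {wallace_tm(seq)} °C")
```

For the forward and reverse primers this gives 55% and 60% GC, respectively, with both primers 20 nt long; a nearest-neighbour Tm calculation would be the better choice when comparing against the 58 °C annealing temperature.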

RESULTS AND DISCUSSION
First, the Arabidopsis RmlC-like cupins sequence was aligned using megaBLAST. Query cover between the RmlC-like cupins nucleotide sequence and its homologues currently known for taxid:3193 and stored in databases ranged from 10% to 100% (Figure 1). In total, 64 hits were found in the nucleotide database for the Arabidopsis RmlC-like cupins sequence. Subsequently, a conserved region was identified among the matches with a query cover of more than 75% and an E-value of 0.0; this region spanned nucleotides 1506 to 1925 of the RmlC-like cupins DNA sequence. Variable regions within the most conserved part of the Arabidopsis RmlC-like cupins sequence are listed in Table 1.
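The selection criterion used here (query cover above 75% and an E-value of 0.0) is simple to apply to a tabular hit list. The sketch assumes the hits were exported from BLAST+ with `-outfmt "6 qseqid sseqid evalue bitscore qcovs"`; the study does not state how its hit table was obtained, and the example rows are hypothetical, not the 64 hits reported above.

```python
import csv
import io

# Column order assumed for: blastn ... -outfmt "6 qseqid sseqid evalue bitscore qcovs"
FIELDS = ["qseqid", "sseqid", "evalue", "bitscore", "qcovs"]

def filter_hits(handle, min_qcov=75.0, max_evalue=0.0):
    """Return subject IDs whose query cover exceeds min_qcov and whose E-value is
    at most max_evalue (BLAST prints 0.0 for extremely small E-values)."""
    kept = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blank and comment lines
            continue
        hit = dict(zip(FIELDS, row))
        if float(hit["qcovs"]) > min_qcov and float(hit["evalue"]) <= max_evalue:
            kept[hit["sseqid"]] = None  # one subject can have several HSP rows
    return list(kept)

# Hypothetical rows for illustration only.
example = io.StringIO(
    "AT2G18540\tsubject_A\t0.0\t3500\t100\n"
    "AT2G18540\tsubject_B\t0.0\t900\t60\n"
    "AT2G18540\tsubject_C\t1e-50\t400\t80\n"
)
print(filter_hits(example))  # ['subject_A']
```

`subject_B` fails the coverage threshold and `subject_C` fails the E-value threshold, so only `subject_A` would enter the comparison used for primer design.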
When the obtained data were compared to assess the possibility of designing primers for PCR identification of RmlC-like cupins in plant species, candidate sites were found at nucleotides 1533-1552 and 1765-1784 (Figure 2). Primer-BLAST (Ye et al., 2012) was used for primer design. Besides Primer-BLAST, similar programs such as FastPCR (Kalendar et al., 2011) and csPCR (Dasu et al., 2010) are reported by Gardner et al. (2014) to be well suited to low-throughput analyses that rely on manual inspection or a graphical user interface. These tools design primers and also predict primer Tm, secondary structure and primer dimers. Bäumlein et al. (1995) reported that the cupin superfamily shares conserved residues with vicilin and legumin storage proteins. This was also confirmed by the BLAST search performed here, as vicilin-like and provicilin-like alignments were found in the nucleotide database for Camelina sativa, Morus notabilis, Elaeis guineensis, Citrus sinensis, Vitis vinifera, Brassica rapa and Tarenaya hassleriana. None of these alignments had a query cover of more than 75% and an E-value of 0.0, so they were not used in the comparison for primer design. In Arabidopsis, the analysed RmlC-like cupins superfamily protein is described as having nutrient reservoir activity (http://sgdb.cbi.pku.edu.cn/gene_info.php?id=AT2G18540). Besides the vicilin-like and provicilin-like annotations mentioned above, the other alignments found by BLAST in this study were annotated as one of three types: globulin 1-S, hypothetical protein, or uncharacterized protein. For most germin-like proteins isolated from plants, the literature reports an unknown function or bifunctionality (Woo et al., 2000).
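Because the candidate primer sites lie at nucleotides 1533-1552 and 1765-1784, the expected product spans 1784 - 1533 + 1 = 252 bp, consistent with the 250 bp fragment observed on the gel. The sketch below computes this length and adds a minimal exact-match in silico PCR; the function names are hypothetical, and unlike Primer-BLAST's specificity check it does not tolerate primer mismatches or search both strands.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def amplicon_length(fwd_start, rev_end):
    """Product length for 1-based, inclusive coordinates fwd_start..rev_end."""
    return rev_end - fwd_start + 1

def in_silico_pcr(template, forward, reverse):
    """Exact-match sketch: return the sense-strand product from the forward primer
    to the reverse primer's binding site, or None if either site is absent."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer's binding site appears on the sense strand as its reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]

print(amplicon_length(1533, 1784))  # 252
```

Given the AT2G18540 sequence as `template` (not reproduced in this document), `in_silico_pcr(template, "ccgaagtttcatccgatggc", "ctttgaaagctccccctccg")` would return a 252-nt product if both primers match exactly at the reported sites.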
After the bioinformatic analysis of the 2672 bp Arabidopsis RmlC-like cupins superfamily protein DNA sequence (SeedGeneDB accession code AT2G18540), which was aligned with Mega-BLAST (Zhang et al., 2000) in BLASTn searches against plant (taxid:3193) nucleotide sequences in the NCBI nonredundant, nonmouse and nonhuman nucleotide databases, primers were designed for PCR identification of this sequence in plants for which no homologous nucleotide sequences exist in databases. Amaranthus cruentus (Ficha cultivar) and the hybrid Amaranthus hypochondriacus x hybridus (hybrid K-433) were the biological objects used to test the suitability and amplification efficiency of the designed primer pair. After optimization of the PCR conditions, a specific monomorphic fragment of 250 bp was obtained (Figure 3). This fragment length corresponded to the size of the conserved region flanked by the primers designed in the bioinformatic part of the study. The specificity of the amplicons was inspected on a 1.5% agarose gel.
Sequence homology search algorithms have become commonly used and efficient tools in molecular genetics (Karpov and Bloom, 2010). Nowadays, many different motif-finding algorithms are available, and Lin (http://biochem218.stanford.edu/Projects%202012/Lin.pdf) noted that it is impossible to provide a comprehensive report of all of them. Each algorithm has its own advantages and disadvantages. One aim of pattern discovery is to find specific motifs in nucleotide or protein sequences, either to better understand their structure and function (Bailey, 2008) or to identify them (Khuri et al., 2001).

CONCLUSION
We presented a bioinformatic approach for identifying specific parts of plant genomes that are known for some species but not for others. Nucleotide comparisons based on BLAST analysis offer a tool for selecting the most conserved sites of known sequences. Based on such comparisons, universal primers can be designed and used for species in which genomic data for the gene of interest are still missing.

Figure 1
Differences in query cover among the sequences returned by the alignment of the Arabidopsis RmlC-like cupins sequence against taxid:3193.

Table 1
Characterization of variable nucleotide motifs in the most conserved region of RmlC-like cupins superfamily protein.
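Table 1 lists variable motifs within the most conserved region. One simple way to locate such positions, sketched here as an assumption rather than as the authors' procedure, is to scan a multiple alignment of the region for columns that are not identical across all sequences. The toy alignment is hypothetical, and mapping columns to coordinates starting at 1506 assumes the Arabidopsis reference has no gaps in the region.

```python
def variable_columns(aligned, start=1):
    """Return (position, observed residues) for alignment columns that are not identical.

    `aligned` holds equal-length, pre-aligned sequences; gaps ('-') count as a state.
    `start` is the coordinate assigned to the first column (hypothetically 1506 here).
    """
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for i in range(length):
        column = {s[i].upper() for s in aligned}
        if len(column) > 1:  # more than one residue or gap state in this column
            result.append((start + i, "".join(sorted(column))))
    return result

# Hypothetical toy alignment, not the sequences behind Table 1.
toy = ["ACGTACGT", "ACGAACGT", "ACGTAC-T"]
print(variable_columns(toy, start=1506))  # [(1509, 'AT'), (1512, '-G')]
```

Runs of consecutive variable columns would correspond to variable motifs, while long stretches with no variable columns are the candidates for primer binding sites.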