The high molecular diversity and large number of viral species complicate the recovery and interpretation of viral biological, genomic and proteomic data. The large amount of information spread across a range of databases makes data searches inefficient and their assessment time-consuming, and it demands a high level of computational expertise. The number of complete genome sequences and scientific articles related to geminiviruses and their associated DNA satellites has increased dramatically over the last 10 years. This trend is due, at least in part, to the use of rolling-circle amplification (RCA) of circular DNA with the DNA polymerase of bacteriophage phi29 to clone geminivirus genomes (Inoue-Nagata et al., 2004) and, more recently, to next-generation sequencing (NGS) approaches, also known as high-throughput sequencing (Radford et al., 2012; Al Rwahnih et al., 2015; Roossinck et al., 2015).

GeminivirusDW offers a single, user-friendly environment for retrieving information about geminiviruses and DNA satellites. It integrates different levels and kinds of information to promote fast and easy recovery of data based on descriptors from different web repositories (GenBank and PubMed, both from the National Center for Biotechnology Information). The web interface contains search modules for complete genome, protein and gene sequences and for scientific publications. Furthermore, GeminivirusDW implements useful tools, such as basic local alignment search (BLAST), pairwise comparison and identity matrices between sequences (SDT Tools), phylogenetic analysis and viral taxon network analysis.
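To illustrate what an identity matrix between sequences contains, the following is a minimal Python sketch, not the SDT implementation. It assumes the sequences have already been aligned to the same length (SDT itself aligns each pair separately), and the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric identity matrix keyed by (name1, name2)."""
    names = list(aligned)
    matrix = {(name, name): 100.0 for name in names}
    for n1, n2 in combinations(names, 2):
        score = pairwise_identity(aligned[n1], aligned[n2])
        matrix[(n1, n2)] = score
        matrix[(n2, n1)] = score
    return matrix


# Toy aligned sequences (illustrative only, not real geminivirus data).
toy = {"seqA": "ACGTACGT", "seqB": "ACGTTCGT", "seqC": "ACG-ACGA"}
matrix = identity_matrix(toy)
for (n1, n2), score in sorted(matrix.items()):
    print(f"{n1}\t{n2}\t{score:.1f}")
```

Skipping gapped columns is only one possible convention; other tools count gaps as mismatches, which lowers the reported identity.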

GeminivirusDW will be maintained by the National Institute of Science and Technology in Plant-Pest Interactions (INCT-IPP), located at the Institute of Biotechnology Applied to Agriculture (BIOAGRO) on the Universidade Federal de Viçosa campus, Viçosa, Minas Gerais, Brazil.

The authors declare that they are not responsible for missing information or for errors in the information attached to the genome sequence data, nor for the poor quality of sequences deposited in web repositories. However, GeminivirusDW is open-access and therefore allows the inclusion of information provided by the research groups responsible for these genome sequences. Users can also contribute actively by sending feedback, minor corrections and suggestions to

The family Geminiviridae is a group of single-stranded DNA viruses that cause important economic losses worldwide (Hanley-Bowdoin et al., 1999; Rojas et al., 2005). Nine genera (Becurtovirus, Begomovirus, Curtovirus, Eragrovirus, Mastrevirus, Topocuvirus, Turncurtovirus, Capulavirus and Grablovirus) are currently recognized based upon the type of insect vector, host range, genome organization and phylogeny (Brown et al., 2012; Varsani et al., 2014; Varsani et al., 2017). All have monopartite genomes, with the exception of viruses classified in the genus Begomovirus, which can be monopartite (a single genomic DNA of approx. 2.9 kb) or bipartite (two genomic DNA components of approx. 2.6 kb each, referred to as DNA-A and DNA-B) (Brown et al., 2012) (Figure 1).

Figure 1. Genomic organization of the seven genera in the family Geminiviridae. LIR, long intergenic region; SIR, short intergenic region; CR, common region; CP, capsid protein; Rep, replication-associated protein; TrAP, transactivator protein; REn, replication enhancer; MP, movement protein; NSP, nuclear shuttle protein; Reg, regulatory gene; SD, symptom determinant; SS, silencing suppressor; TGS, transcriptional gene silencing. Note that the DNA-A component of Old World bipartite geminiviruses contains a V2 ORF (adapted from Varsani et al., 2017).

The association of geminiviruses with two types of ssDNA satellites (alpha- and betasatellites) has been well documented (Zhou, 2013). These molecules are about half the size of viral genome components (approximately 1.35 kb), and many of them modulate the symptoms caused by their helper geminiviruses. They depend on the helper virus for replication (only in the case of betasatellites), encapsidation and vector transmission (Briddon et al., 2003; Briddon & Stanley, 2006).

Alphasatellites are similar to the DNA-R component of nanoviruses, containing a single ORF that encodes a replication-associated protein (Rep). They have an A-rich region and a predicted stem-loop structure. The A-rich region is the only feature that can be used to distinguish alphasatellites from nanovirus DNA-R components, and it has been suggested that this region may function to increase the size of alphasatellite molecules to half the size of begomovirus genomic components. The loop of the predicted alphasatellite stem-loop structure contains the nonanucleotide TAGTATTAC, which is common to nanoviruses and similar to the TAATATTAC nonanucleotide sequence of geminiviruses. Alphasatellites are typically associated with monopartite geminiviruses from the Old World that are also associated with betasatellites. Only recently have alphasatellites been found associated with New World bipartite geminiviruses (Paprotka et al., 2010; Romay et al., 2010). These satellites do not contribute significantly to disease development. They can replicate autonomously, but require the helper virus for systemic infection and insect transmission (Briddon & Stanley, 2006).
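As a minimal sketch of how the nonanucleotide motifs mentioned above could be located in a circular DNA sequence, the Python example below searches both strands and allows a match to span the point where the linear record starts and ends. The function names and the toy sequence are hypothetical, and the sketch is not part of GeminivirusDW.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (unknown symbols become N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement.get(base, "N") for base in reversed(seq.upper()))


def find_motif_circular(genome: str, motif: str) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for each motif occurrence.

    The genome is treated as circular. For '-' hits, the start is the
    position on the given strand where the reverse complement begins.
    """
    genome = genome.upper()
    motif = motif.upper()
    if len(genome) < len(motif):
        return []
    # Append the first len(motif) - 1 bases so matches can wrap the origin.
    extended = genome + genome[: len(motif) - 1]
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = extended.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = extended.find(pattern, start + 1)
    return sorted(hits)


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


GEMINIVIRUS_NONA = "TAATATTAC"
ALPHASATELLITE_NONA = "TAGTATTAC"

# Toy circular sequence in which the motif spans the end/start junction.
toy_genome = "ATTACGGCCGGCCTAAT"
hits = find_motif_circular(toy_genome, GEMINIVIRUS_NONA)
difference = mismatches(GEMINIVIRUS_NONA, ALPHASATELLITE_NONA)
print(hits, difference)
```

The two nonanucleotides differ at a single position, which is why a plain exact-match search has to be run once per motif.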

Betasatellites contain a single ORF, which encodes a pathogenicity determinant protein known as betaC1. These agents contribute to the development of typical disease symptoms, enhance the pathogenicity of their helper geminiviruses, and modulate virus host range by modulating the host defense response (Saunders et al., 2004; Saeed et al., 2005). Betasatellite genomes share no significant sequence homology with their helper geminiviruses other than a potential stem-loop structure containing the TAATATTAC sequence. Nevertheless, they have a highly conserved genome organization consisting of a region known as the satellite conserved region (SCR), the betaC1 ORF (conserved both in sequence and position among betasatellites) and an adenine-rich region.


Arguello-Astorga G, Lopez-Ochoa L, Kong LJ, Orozco BM, Settlage SB & Hanley-Bowdoin L. 2004. A novel motif in geminivirus replication proteins interacts with the plant retinoblastoma-related protein. Journal of Virology 78:4817-4826.

Ascencio-Ibanez JT, Sozzani R, Lee TJ, Chu TM, Wolfinger RD, Cella R & Hanley-Bowdoin L. 2008. Global analysis of arabidopsis gene expression uncovers a complex array of changes impacting pathogen response and cell cycle during geminivirus infection. Plant Physiology 148:436-454.

Bernardo P, Golden M, Akram M, Naimuddin M, Nadarajan N, Fernandez E, Granier M, Rebelo AG, Peterschmitt M, Martin DP & Roumagnac P. 2013. Identification and characterization of a highly divergent geminivirus: evolutionary and taxonomic implications. Virus Research 177:35-45.

Briddon RW & Stanley J. 2006. Subviral agents associated with plant single-stranded DNA viruses. Virology 344:198-210.

Briddon RW, Patil BL, Bagewadi B, Nawaz-ul Rehman MS & Fauquet CM. 2010. Distinct evolutionary histories of the DNA-A and DNA-B components of bipartite geminiviruses. BMC Evolutionary Biology 10:1-17.

Briddon RW. 2003. Cotton leaf curl disease, a multicomponent begomovirus complex. Molecular Plant Pathology 4:427-434.

Brown JK, Fauquet CM, Briddon RW, Zerbini FM, Moriones E & Navas-Castillo J. 2012. Family Geminiviridae. In Virus Taxonomy 9th Report of the International Committee on Taxonomy of Viruses, pp. 351-373. Edited by AMQ King, MJ Adams, EB Carstens & EJ Lefkowitz. London, UK: Elsevier Academic Press.

Fauquet CM, Briddon RW, Brown JK, Moriones E, Stanley J, Zerbini FM & Zhou X. 2008. Geminivirus strain demarcation and nomenclature. Archives of Virology 153:783-821.

Fondong VN. 2013. Geminivirus protein structure and function. Molecular Plant Pathology 14:635-649.

Frischmuth S, Wege C, Hulser D & Jeske H. 2007. The movement protein BC1 promotes redirection of the nuclear shuttle protein BV1 of Abutilon mosaic geminivirus to the plasma membrane in fission yeast. Protoplasma 230:117-123.

Gilbertson RL, Sudarshana M, Jiang H, Rojas MR & Lucas WJ. 2003. Limitations on geminivirus genome size imposed by plasmodesmata and virus-encoded movement protein: Insights into DNA trafficking. Plant Cell 15:2578-2591.

Hanley-Bowdoin L, Settlage SB, Orozco BM, Nagar S & Robertson D. 1999. Geminiviruses: Models for plant DNA replication, transcription, and cell cycle regulation. Critical Reviews in Plant Sciences 18:71-106.

Hulo C, Castro E, Masson P, Bougueleret L, Bairoch A, Xenarios I & Le Mercier P. 2011. ViralZone: a knowledge resource to understand virus diversity. Nucleic Acids Research 39:D576-D582.

Inoue-Nagata AK, Albuquerque LC, Rocha WB & Nagata T. 2004. A simple method for cloning the complete begomovirus genome using the bacteriophage φ29 DNA polymerase. Journal of Virological Methods 116:209-211.

Kjemtrup S, Sampson KS, Peele C, Nguyen L, Conkling M, Thompson W & Robertson D. 1998. Gene silencing from DNA carried by a geminivirus. Plant Journal 14:91-100.

Kleinow T, Holeiter G, Nischang M, Stein M, Karayavuz M, Wege C & Jeske H. 2008. Post-translational modifications of Abutilon mosaic virus movement protein (BC1) in fission yeast. Virus Research 131:86-94.

Kumar J, Kumar J, Singh SP & Tuli R. 2014. Association of satellites with a mastrevirus in natural infection: complexity of Wheat dwarf India virus disease. Journal of Virology, ahead of print 10.1128/JVI.02911-13.

Kunik T, Salomon R, Zamir D, Navot N, Zeidan M, Michelson I, Gafni Y & Czosnek H. 1994. Transgenic tomato plants expressing the Tomato yellow leaf curl virus capsid protein are resistant to the virus. Biotechnology 12:500-504.

Latham JR, Saunders K, Pinner MS & Stanley J. 1997. Induction of plant cell division by Beet curly top virus gene C4. Plant Journal 11:1273-1283.

Laufs J, Schumacher S, Geisler N, Jupin I & Gronenborn B. 1995. Identification of the nicking tyrosine of geminivirus Rep protein. FEBS Letters 377:258-262.

Lazarowitz SG. 1992. Geminiviruses: Genome structure and gene function. Critical Reviews in Plant Sciences 11:327-349.

Luque A, Sanz-Burgos AP, Ramirez-Parra E, Castellano MM & Gutierrez C. 2002. Interaction of geminivirus Rep protein with replication factor C and its potential role during geminivirus DNA replication. Virology 302:83-94.

Mansoor S, Briddon RW, Zafar Y & Stanley J. 2003. Geminivirus disease complexes: An emerging threat. Trends in Plant Science 8:128-134.

Monci F, Sanchez-Campos S, Navas-Castillo J & Moriones E. 2002. A natural recombinant between the geminiviruses Tomato yellow leaf curl Sardinia virus and Tomato yellow leaf curl virus exhibits a novel pathogenic phenotype and is becoming prevalent in Spanish populations. Virology 303:317-326.

Morales FJ & Anderson PK. 2001. The emergence and dissemination of whitefly-transmitted geminiviruses in Latin America. Archives of Virology 146:415-441.

Noueiry AO, Lucas WJ & Gilbertson RL. 1994. Two proteins of a plant DNA virus coordinate nuclear and plasmodesmal transport. Cell 76:925-932.

Orozco BM & Hanley-Bowdoin L. 1998. Conserved sequence and structural motifs contribute to the DNA binding and cleavage activities of a geminivirus replication protein. Journal of Biological Chemistry 273:24448-24456.

Palmer KE & Rybicki EP. 1998. The molecular biology of mastreviruses. Advances in Virus Research 50:183-234.

Paprotka T, Metzler V & Jeske H. 2010. The first DNA 1-like alpha satellites in association with New World geminiviruses in natural infections. Virology 404:148-157.

Poojari S, Alabi OJ, Fofanov VY & Naidu RA. 2013. A leafhopper-transmissible DNA virus with novel evolutionary lineage in the family Geminiviridae implicated in Grapevine Redleaf Disease by next-generation sequencing. PLoS ONE 8(6):e64194.

Radford AD, Chapman D, Dixon L, Chantrey J, Darby AC & Hall N. 2012. Application of next-generation sequencing technologies in virology. Journal of General Virology 93:1853-1868.

Rojas MR, Hagen C, Lucas WJ & Gilbertson RL. 2005. Exploiting chinks in the plant's armor: Evolution and emergence of geminiviruses. Annual Review of Phytopathology 43:361-394.

Rojas MR, Noueiry AO, Lucas WJ & Gilbertson RL. 1998. Bean dwarf mosaic geminivirus movement proteins recognize DNA in a form- and size-specific manner. Cell 95:105-113.

Romay G, Chirinos D, Geraud-Pouey F & Desbiez C. 2010. Association of an atypical alphasatellite with a bipartite New World begomovirus. Archives of Virology 155:1843-1847.

Saeed M, Behjatnia SA, Mansoor S, Zafar Y, Hasnain S & Rezaian MA. 2005. A single complementary-sense transcript of a geminiviral DNA β satellite is determinant of pathogenicity. Molecular Plant-Microbe Interactions 18:7-14.

Sanderfoot AA & Lazarowitz SG. 1995. Cooperation in viral movement: The geminivirus BL1 movement protein interacts with BR1 and redirects it from the nucleus to the cell periphery. Plant Cell 7:1185-1194.

Saunders K, Norman A, Gucciardo S & Stanley J. 2004. The DNA beta satellite component associated with ageratum yellow vein disease encodes an essential pathogenicity protein (betaC1). Virology 324:37-47.

Settlage SB, Miller AB, Gruissem W & Hanley-Bowdoin L. 2001. Dual interaction of a geminivirus replication accessory factor with a viral replication protein and a plant cell cycle regulator. Virology 279:570-576.

Stanley J. 1995. Analysis of African cassava mosaic virus recombinants suggest strand nicking occurs within the conserved nonanucleotide motif during the initiation of rolling circle DNA replication. Virology 206:707-712.

Stenger DC, Revington GN, Stevenson MC & Bisaro DM. 1991. Replicational release of geminivirus genomes from tandemly repeated copies: Evidence for rolling-circle replication of a plant viral DNA. Proceedings of the National Academy of Sciences 88:8029-8033.

Trinks D, Rajeswaran R, Shivaprasad PV, Akbergenov R, Oakeley EJ, Veluthambi K, Hohn T & Pooggin MA. 2005. Suppression of RNA silencing by a geminivirus nuclear protein, AC2, correlates with transactivation of host genes. Journal of Virology 79:2517-2527.

Vanitharani R, Chellappan P, Pita JS & Fauquet CM. 2004. Differential roles of AC2 and AC4 of cassava geminiviruses in mediating synergism and suppression of posttranscriptional gene silencing. Journal of Virology 78:9487-9498.

Varsani A, Navas-Castillo J, Moriones E, Hernández-Zepeda C, Idris A, Brown JK, Zerbini FM & Martin DP. 2014. Establishment of three new genera in the family Geminiviridae: Becurtovirus, Eragrovirus and Turncurtovirus. Archives of Virology doi:10.1007/s00705-014-2050-2.

Varsani A, Roumagnac P, Fuchs M, Navas-Castillo J, Moriones E, Idris A, Briddon RW, Rivera-Bustamante R, Zerbini FM & Martin DP. 2017. Capulavirus and Grablovirus: two new genera in the family Geminiviridae. Archives of Virology doi:10.1007/s00705-017-3268-6.

Were HK, Winter S & Maiss E. 2004. Viruses infecting cassava in Kenya. Plant Disease 88:17-22.

Xie Q, Sanz-Burgos AP, Guo H, Garcia JA & Gutierrez C. 1999. Grab proteins, novel members of the NAC domain family, isolated by their interaction with a geminivirus protein. Plant Molecular Biology 39:647-656.


R version 3.4.0 (2017-04-21) -- "You Stupid Darkness"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see 

Protein-Protein BLAST

blastp: 2.2.29+
Package: blast 2.2.29, build Jul  1 2014 18:29:55 

Nucleotide-Nucleotide BLAST

blastn: 2.2.29+
Package: blast 2.2.29, build Jul  1 2014 18:29:55 


 CLUSTAL 2.1 Multiple Sequence Alignments

                DATA (sequences)

-INFILE=file.ext                             :input sequences.
-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (old alignment).

                VERBS (do things)

-OPTIONS            :list the command line parameters
-HELP  or -CHECK    :outline the command line params.
-FULLHELP           :output full help content.
-ALIGN              :do full multiple alignment.
-TREE               :calculate NJ tree.
-PIM                :output percent identity matrix (while calculating the tree)
-BOOTSTRAP(=n)      :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT            :output the input sequences in a different file format.

                PARAMETERS (set things)

***General settings:****
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE   :use FAST algorithm for the alignment guide tree
-TYPE=       :PROTEIN or DNA sequences
-NEGATIVE    :protein alignment with negative values in matrix
-OUTFILE=    :sequence alignment file name
-CASE        :LOWER or UPPER (for GDE output only)
-SEQNOS=     :OFF or ON (for Clustal output only)
-SEQNO_RANGE=:OFF or ON (NEW: for all output formats)
-RANGE=m,n   :sequence range to write starting m to m+n
-MAXSEQLEN=n :maximum allowed input sequence length
-QUIET       :Reduce console output to minimum
-STATS=      :Log some alignents statistics to file

***Fast Pairwise Alignments:***
-KTUPLE=n    :word size
-TOPDIAGS=n  :number of best diags.
-WINDOW=n    :window around best diags.
-PAIRGAP=n   :gap penalty

***Slow Pairwise Alignments:***
-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f  :gap opening penalty        
-PWGAPEXT=f   :gap opening penalty

***Multiple Alignments:***
-NEWTREE=      :file for new guide tree
-USETREE=      :file for old guide tree
-MATRIX=       :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX=    :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f     :gap opening penalty        
-GAPEXT=f      :gap extension penalty
-ENDGAPS       :no end gap separation pen. 
-GAPDIST=n     :gap separation pen. range
-NOPGAP        :residue-specific gaps off  
-NOHGAP        :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.    
-MAXDIV=n      :% ident. for delay
-TYPE=         :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting
-NUMITER=n     :maximum number of iterations to perform
-NOWEIGHTS     :disable sequence weighting

***Profile Alignments:***
-PROFILE      :Merge two alignments by profile alignment
-NEWTREE1=    :file for new guide tree for profile1
-NEWTREE2=    :file for new guide tree for profile2
-USETREE1=    :file for old guide tree for profile1
-USETREE2=    :file for old guide tree for profile2

***Sequence to Profile Alignments:***
-SEQUENCES   :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE=    :file for new guide tree
-USETREE=    :file for old guide tree

***Structure Alignments:***
-NOSECSTR1     :do not use secondary structure-gap penalty mask for profile 1 
-NOSECSTR2     :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE   :output in alignment file
-HELIXGAP=n    :gap penalty for helix core residues 
-STRANDGAP=n   :gap penalty for strand core residues
-LOOPGAP=n     :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n  :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n:number of residues outside strand to be treated as terminal 

-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n        :seed number for bootstraps.
-KIMURA        :use Kimura's correction.   
-TOSSGAPS      :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display


  MAFFT v7.205 (2014/10/20)
  MBE 30:772-780 (2013), NAR 30:3059-3066 (2002)
High speed:
  % mafft in > out
  % mafft --retree 1 in > out (fast)

High accuracy (for <~200 sequences x <~2,000 aa/nt):
  % mafft --maxiterate 1000 --localpair  in > out (% linsi in > out is also ok)
  % mafft --maxiterate 1000 --genafpair  in > out (% einsi in > out)
  % mafft --maxiterate 1000 --globalpair in > out (% ginsi in > out)

If unsure which option to use:
  % mafft --auto in > out

--op # :         Gap opening penalty, default: 1.53
--ep # :         Offset (works like gap extension penalty), default: 0.0
--maxiterate # : Maximum number of iterative refinement, default: 0
--clustalout :   Output: clustal format, default: fasta
--reorder :      Outorder: aligned, default: input order
--quiet :        Do not report progress
--thread # :     Number of threads (if unsure, --thread -1) 


MUSCLE v3.8.31 by Robert C. Edgar
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

Basic usage

    muscle -in <inputfile> -out <outputfile>

Common options (for a complete list please see the User Guide):

    -in <inputfile>    Input file in FASTA format (default stdin)
    -out <outputfile>  Output alignment in FASTA format (default stdout)
    -diags             Find diagonals (faster for similar sequences)
    -maxiters <n>      Maximum number of iterations (integer, default 16)
    -maxhours <h>      Maximum time to iterate in hours (default no limit)
    -html              Write output in HTML format (default FASTA)
    -msf               Write output in GCG MSF format (default FASTA)
    -clw               Write output in CLUSTALW format (default FASTA)
    -clwstrict         As -clw, with 'CLUSTAL W (1.81)' header
    -log[a] <logfile>  Log to file (append if -loga, overwrite if -log)
    -quiet             Do not write progress messages to stderr
    -version           Display version information and exit

Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
Fastest possible (nucleotides): -maxiters 1 -diags