Fig. 4 | Microbiome

Fig. 4

From: The genomic underpinnings of eukaryotic virus taxonomy: creating a sequence-based framework for family-level virus classification

Overview of virus taxonomy prediction by GRAViTy. Schematic diagram of the processing steps used to construct classifiers from viruses with assigned taxonomic status (reference virus genomes) and of the pipeline used to classify viruses of interest (virus queries). In summary, protein sequences are extracted from reference virus genomes and clustered based on pairwise BLASTp bit scores. The sequences in each cluster are then aligned and turned into a protein profile hidden Markov model (PPHMM). Reference genomes are subsequently scanned against the database of PPHMMs to determine the locations of their genes, and genomic organisation models (GOMs) are constructed for each virus family. The PPHMM and GOM databases are the main machinery of our genome annotator (Annotator). To classify viruses of interest, the queries, together with the reference viruses, are first annotated with information on the presence of genes and the degree of similarity of their genomic organisation to the various reference families (Feature table). Pairwise similarity scores (composite generalised Jaccard similarity) are then estimated and passed to the classifier, which identifies taxonomic candidates for each query using the 1-nearest neighbour algorithm. A UPGMA dendrogram and a similarity acceptance cut-off for each virus family are also estimated from the pairwise similarity scores and used by the evaluator to evaluate the taxonomic candidates. The analysis is performed in parallel for the six virus Baltimore groups; the best matches are taken as the finalised taxonomic assignments.
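
The classification step described in the caption can be illustrated with a minimal Python sketch. It is not the GRAViTy implementation: it assumes each genome has already been reduced to a vector of non-negative feature scores, uses a single generalised Jaccard similarity (sum of element-wise minima over sum of element-wise maxima) instead of the composite score, and applies a per-family acceptance cut-off directly to the nearest-neighbour similarity. The dendrogram-based part of the evaluation is omitted. The function names, family labels, feature values, and cut-offs are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def generalised_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Generalised Jaccard similarity of two non-negative feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    # Two all-zero vectors share no features; treat them as dissimilar.
    return numerator / denominator if denominator > 0 else 0.0


def classify_1nn(
    query: Sequence[float],
    references: List[Tuple[str, Sequence[float]]],
    cutoffs: Dict[str, float],
) -> Tuple[Optional[str], float, bool]:
    """Return (candidate family, similarity, accepted) for one query.

    The candidate is the family of the single most similar reference
    (1-nearest neighbour). It is accepted only if the similarity reaches
    that family's cut-off (hypothetical rule, for illustration only).
    """
    best_family: Optional[str] = None
    best_score = -1.0
    for family, vector in references:
        score = generalised_jaccard(query, vector)
        if score > best_score:
            best_family, best_score = family, score
    if best_family is None:
        return None, 0.0, False
    # A family without a cut-off can never be accepted in this sketch.
    accepted = best_score >= cutoffs.get(best_family, float("inf"))
    return best_family, best_score, accepted


# Hypothetical reference feature vectors and cut-offs.
references = [
    ("FamilyA", [5.0, 0.0, 2.0]),
    ("FamilyB", [0.0, 4.0, 1.0]),
]
cutoffs = {"FamilyA": 0.5, "FamilyB": 0.5}

family, score, accepted = classify_1nn([4.0, 0.0, 2.0], references, cutoffs)
print(family, round(score, 3), accepted)
```

In this toy example the query shares most of its feature mass with the first reference, so `FamilyA` is returned as the candidate and passes the cut-off. A query that resembles no reference closely would still receive a nearest neighbour, but it would be flagged as not accepted.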
