Fig. 1 | Microbiome

Fig. 1

From: A multi-source domain annotation pipeline for quantitative metagenomic and metatranscriptomic functional profiling

The MetaCLADE workflow. a The MetaCLADE workflow is shown in the rectangular green box; its two main steps are illustrated in white boxes. MetaCLADE takes as input (i) a set of reads to be annotated, in which ORFs have already been identified, and (ii) the CLADE model library. The CLADE model library is used to identify all domain hits for an ORF. The large set of identified hits is then combined with gathering thresholds pre-computed for each domain model (pink box) to carry out the second main step of MetaCLADE (right white box): overlapping domain hits are selected based on three filtering features. The output of the workflow is an annotation of the ORF, possibly consisting of several domains. The figure illustrates the best expected annotation of an ORF, namely a domain possibly flanked by domain fragments. The rectangular pink box illustrates the pre-computed step. For each domain, the CLADE library contains several hundred models that MetaCLADE uses to identify hits. For domain D1, shown in the blue cylinder on the left, the model library contains the consensus model (blue line, bottom) and hundreds of CCMs generated from sequences spread across the phylogenetic tree of species. Coloured lines represent models built from sequences belonging to the phylogenetic clades shown in the same colour tone. The blue box on the right illustrates the pre-computation of the domain-specific parameters used to discriminate positive sequences (light blue, yellow and dark red) from negative sequences (blue, orange and red). Dots in the plots correspond to sequences. Both the sequence spaces defined by bit-scores and mean-bit-scores (white plots) and the probability spaces obtained with the naive Bayes classifier (plots in which probabilities are associated with regions) are shown.
b Phylogenetic tree of the species from which the CCMs used in MetaCLADE were generated [43].
c Histogram reporting the number of CCMs available in MetaCLADE, organised by clade.
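
The pre-computed step in panel a, in which a naive Bayes classifier separates positive from negative sequences in the space of bit-scores and mean-bit-scores, can be sketched roughly as follows. This is a minimal illustration and not the MetaCLADE implementation: the Gaussian form of the per-feature likelihoods is an assumption, the function names are hypothetical, and the score values are invented toy data.

```python
import math

def fit_gaussian(rows):
    """Return per-feature (mean, variance) for a list of feature tuples."""
    n = len(rows)
    stats = []
    for col in zip(*rows):
        mean = sum(col) / n
        # Small constant avoids a zero variance on degenerate inputs.
        var = sum((x - mean) ** 2 for x in col) / n + 1e-9
        stats.append((mean, var))
    return stats

def log_likelihood(x, stats):
    """Sum of independent Gaussian log-densities (the 'naive' assumption)."""
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (xi - mean) ** 2 / (2 * var)
        for xi, (mean, var) in zip(x, stats)
    )

def posterior_positive(x, pos, neg):
    """P(positive | x) from class priors and Gaussian likelihoods."""
    total = len(pos) + len(neg)
    lp = math.log(len(pos) / total) + log_likelihood(x, fit_gaussian(pos))
    ln = math.log(len(neg) / total) + log_likelihood(x, fit_gaussian(neg))
    m = max(lp, ln)  # subtract the max before exponentiating for stability
    return math.exp(lp - m) / (math.exp(lp - m) + math.exp(ln - m))

# Toy (bit-score, mean-bit-score) pairs, invented for illustration only.
positives = [(120.0, 1.1), (95.0, 0.9), (150.0, 1.3), (110.0, 1.0)]
negatives = [(30.0, 0.2), (45.0, 0.35), (25.0, 0.15), (40.0, 0.3)]

p_high = posterior_positive((130.0, 1.2), positives, negatives)
p_low = posterior_positive((35.0, 0.25), positives, negatives)
```

Evaluating `posterior_positive` over a grid of score pairs would give a probability map of the kind the caption describes, with each region of the score space assigned a probability of containing a true domain hit.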
