Fig. 1 | Microbiome

From: Improved OTU-picking using long-read 16S rRNA gene amplicon sequencing and generic hierarchical clustering

Boxplots of clustering accuracy on simulated sequencing of mock communities. Clustering accuracy was measured with the adjusted Rand index (ARI; y-axis) for five simulated sequencing read lengths and four clustering programs. Values close to one indicate clustering in agreement with the ground truth, and values close to zero indicate agreement no better than expected by chance. Simulated sequencing reads were generated from mock communities of low (left panel; 100 genomes/mock), medium (center panel; 250 genomes/mock), and high (right panel; 500 genomes/mock) complexity. Each sequencing technology (x-axis; MiSeq: 2 × 150 and 2 × 250 bp paired-end reads; PacBio: 450, 750, and 1450 bp CCS reads) was simulated on 10 mock communities at each complexity level, so each box-and-whisker plot contains 10 observations. Black dots are outliers, and the horizontal line inside each box marks the median. Red, green, blue, and violet correspond to the programs cd-hit, usearch, oclust MSA (genetic distances computed from a multiple sequence alignment, followed by complete-linkage hierarchical clustering), and oclust PW (genetic distances computed from pairwise comparisons, followed by complete-linkage hierarchical clustering), respectively. Clustering was performed at 1 % increments from 1 to 6 % dissimilarity (99 to 94 % identity), and the best identity threshold per program and technology is shown.
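The ARI compares two partitions of the same reads (here, the true genome of origin versus the assigned OTU) using their contingency table, corrected so that random labelings score about zero. The following is a minimal from-scratch sketch of the standard Hubert and Arabie formula, not the authors' evaluation code; it should agree with scikit-learn's `adjusted_rand_score`. The function name `adjusted_rand_index` and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand index between two flat labelings of the same items."""
    if len(truth) != len(predicted):
        raise ValueError("labelings must have the same length")
    total = comb(len(truth), 2)  # number of item pairs
    if total == 0:
        return 1.0  # fewer than two items: trivially identical partitions
    # Contingency counts: items in true class i AND predicted cluster j.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    # Pairs that share a class (rows) or share a cluster (columns).
    sum_a = sum(comb(c, 2) for c in Counter(truth).values())
    sum_b = sum(comb(c, 2) for c in Counter(predicted).values())
    expected = sum_a * sum_b / total  # index expected under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings a single cluster
    return (sum_ij - expected) / (max_index - expected)


# Hypothetical toy example: five reads from three genomes, clustered into two OTUs.
truth = ["gA", "gA", "gB", "gB", "gC"]  # genome of origin per read
otus = [1, 1, 2, 2, 2]                  # OTU assignment per read
print(round(adjusted_rand_index(truth, otus), 3))  # 0.545
```

Because the score depends only on which items are grouped together, renaming clusters does not change it, and merging reads from different genomes (gB and gC above) pulls it below one.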
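Complete-linkage hierarchical clustering defines the distance between two clusters as the largest distance between any of their members, so cutting the tree at a threshold t yields clusters whose members are all within t of one another. The following is a minimal, naive sketch of that idea on a precomputed distance matrix, swept over 1 to 6 % dissimilarity. It is not oclust's implementation: the `p_distance` function (fraction of mismatched positions on pre-aligned, equal-length sequences) is a stand-in assumption for the genetic distances oclust computes, and the toy sequences are hypothetical.

```python
def p_distance(a, b):
    """Fraction of differing positions; a stand-in distance, not oclust's."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def complete_linkage(dist, threshold):
    """Naive agglomerative complete-linkage clustering cut at `threshold`."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Complete linkage: the largest member-to-member distance.
        return max(dist[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under complete linkage.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # every remaining merge would exceed the cut height
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


def mutate(seq, positions):
    # Hypothetical toy substitutions at the given positions.
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    return "".join(swap[c] if i in positions else c for i, c in enumerate(seq))


base = "ACGT" * 25  # 100 bp toy sequence
seqs = [base, mutate(base, {0}), mutate(base, {0, 1, 2, 3, 4}), mutate(base, set(range(50, 60)))]
dist = [[p_distance(x, y) for y in seqs] for x in seqs]
for k in range(1, 7):
    print(f"{k} %: {len(complete_linkage(dist, k / 100))} clusters")
# 1-4 %: 3 clusters; 5-6 %: 2 clusters
```

Sweeping the threshold this way and scoring each result against the known genome labels (for example with the ARI) is one way to select a best threshold per program, as the figure reports.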

Back to article page