Figure 1 | Microbiome

From: Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis

Outline of the approach for one feature in three control and three experimental samples. The count values for feature i, sample j are converted to probabilities by Monte Carlo sampling from the Dirichlet distribution with the addition of a uniform prior. Each count value is now represented by a vector of probabilities 1:n, where n is the number of Monte Carlo instances sampled: three instances are shown in the example, but 128 are used by default. Each probability in the vector is consistent with the number of counts in feature i given the total number of reads observed for sample j. Each Monte Carlo Dirichlet instance is center log-ratio transformed giving a vector of transformed values. These values are the base 2 logarithm of the abundance of the feature in each Dirichlet instance in each sample divided by the geometric mean abundance of the Dirichlet instance of the sample. Significance tests for control samples (C1 : C3) vs experimental samples (E1 : E3) are performed on each element in the vector of clr values. Each resulting P value is corrected using the Benjamini–Hochberg procedure. The expected values are reported for both the distribution of P values and for the distribution of Benjamini–Hochberg corrected values. clr, centered log-ratio; FDR, false discovery rate.
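The steps in this legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`dirichlet_clr`, `benjamini_hochberg`, `expected_p_values`) are hypothetical, the uniform prior is assumed to be 0.5 per feature, Welch's t-test stands in for the unspecified "significance tests", and the example counts are invented purely to make the sketch runnable.

```python
import numpy as np
from scipy import stats


def dirichlet_clr(counts, n_instances=128, prior=0.5, seed=0):
    """Return clr values with shape (instances, features, samples).

    counts is a (features, samples) array of read counts.
    prior=0.5 is an assumed value for the uniform prior.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    n_feat, n_samp = counts.shape
    clr = np.empty((n_instances, n_feat, n_samp))
    for j in range(n_samp):
        # Each row of p is one Monte Carlo Dirichlet instance for sample j.
        p = rng.dirichlet(counts[:, j] + prior, size=n_instances)
        log_p = np.log2(p)
        # Subtracting the mean log equals dividing by the geometric mean.
        clr[:, :, j] = log_p - log_p.mean(axis=1, keepdims=True)
    return clr


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P values for a 1-D array."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest P value.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(m)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def expected_p_values(counts, is_experimental, n_instances=128, seed=0):
    """Expected raw and BH-corrected P values per feature."""
    clr = dirichlet_clr(counts, n_instances, seed=seed)
    mask = np.asarray(is_experimental, dtype=bool)
    p = np.empty(clr.shape[:2])
    q = np.empty_like(p)
    for k in range(n_instances):
        # Welch's t-test (an assumption) on one Dirichlet instance.
        res = stats.ttest_ind(clr[k][:, ~mask], clr[k][:, mask],
                              axis=1, equal_var=False)
        p[k] = res.pvalue
        q[k] = benjamini_hochberg(p[k])
    # Average over the Monte Carlo instances to get expected values.
    return p.mean(axis=0), q.mean(axis=0)


# Invented counts: 4 features (rows) in C1..C3 and E1..E3 (columns).
example_counts = np.array([
    [120, 135, 110, 480, 510, 450],
    [300, 280, 310, 290, 305, 295],
    [50, 45, 60, 55, 48, 52],
    [900, 950, 870, 880, 910, 940],
])
groups = [False, False, False, True, True, True]
exp_p, exp_q = expected_p_values(example_counts, groups)
print(exp_p)
print(exp_q)
```

Because each instance is corrected separately and the results are then averaged, the reported values reflect the sampling uncertainty in the counts rather than a single point estimate.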
