library(phyloseq)
library(ggplot2)
library(patchwork)
PARTII_alpha_Diversity_ANF
1 Prepare workspace
1.1 Load libraries
1.2 Load custom functions
::load_all() devtools
Warning: Objects listed as exports, but not present in namespace:
• primer_trim
1.3 Define output folder
<- here::here("outputs", "alpha_diversity")
output_alpha if (!dir.exists(output_alpha)) dir.create(output_alpha, recursive = TRUE)
1.4 Load the data and inspect the phyloseq object
<- readRDS(here::here("data",
physeq "asv_table",
"phyloseq_object_alpha_beta_div.rds"))
2 Data Structure
- Phyloseq object
physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 213 taxa and 18 samples ]
sample_data() Sample Data: [ 18 samples by 21 sample variables ]
tax_table() Taxonomy Table: [ 213 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 213 tips and 212 internal nodes ]
refseq() DNAStringSet: [ 213 reference sequences ]
2.1 Composition of our phyloseq object physeq
2.1.1 An ASV table with the absolute counts
Be careful: Rows are samples, columns are ASVs
@otu_table[1:10,1:10] physeq
OTU Table: [10 taxa and 10 samples]
taxa are columns
ASV1 ASV2 ASV3 ASV4 ASV5 ASV6 ASV7 ASV8 ASV9 ASV10
S11B 117 25 85 70 87 40 57 34 41 0
S1B 67 0 23 0 51 48 0 0 27 58
S2B 43 0 35 15 42 52 0 0 0 43
S2S 103 87 76 12 99 43 36 72 46 0
S3B 59 0 32 0 49 73 0 0 0 57
S3S 81 10 0 20 36 0 0 0 0 50
S4B 11 6 38 33 43 46 0 8 0 37
S4S 68 6 38 0 62 0 0 11 30 46
S5B 176 18 62 109 0 35 56 13 33 82
S5S 182 0 36 101 51 0 33 0 29 42
2.1.2 A metadata table with information (e.g. physicochemical, categorical variables) about samples
@sam_data physeq
SampName Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk
S11B S11B South South5B SGF 35 5370 46830 580 6010
S1B S1B North North1B NBF 52 660 32195 10675 955
S2B S2B North North2B NBF 59 890 25480 16595 670
S2S S2S North North2S NBS 0 890 25480 16595 670
S3B S3B North North3B NBF 74 835 13340 25115 1115
S3S S3S North North3S NBS 0 715 26725 16860 890
S4B S4B North North4B NBF 78 2220 3130 29835 2120
S4S S4S North North4S NBS 78 2220 3130 29835 2120
S5B S5B North North5B NBF 42 1620 55780 23795 2555
S5S S5S North North5S NBS 0 1620 56555 22835 2560
S6B S6B South South1B SGF 13 2520 39050 705 3630
S6S S6S South South1S SGS 0 2435 35890 915 3735
S7B S7B South South2B SGF 26 0 0 0 4005
S7S S7S South South2S SGS 0 4535 26545 1340 6585
S8B S8B South South3B SGF 33 0 0 0 5910
S8S S8S South South3S SGS 0 4260 36745 985 5470
S9B S9B South South4B SGF 25 4000 31730 485 4395
S9S S9S South South4S SGS 0 5465 32860 820 5045
Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S
S11B 1690 3.324 0.083 0.756 0.467 0.115 9.539 4.138 0.0182 23.0308 38.9967
S1B 115 1.813 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204
S2B 395 2.592 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627
S2S 395 3.381 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176
S3B 165 1.438 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549
S3S 200 1.656 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960
S4B 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S4S 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S5B 1355 2.028 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192
S5S 945 2.669 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213
S6B 1295 2.206 0.249 0.768 0.629 0.236 9.013 3.455 0.0000 22.0197 39.0877
S6S 1300 3.004 0.251 0.927 0.653 0.266 8.776 3.230 0.0134 22.0515 39.0884
S7B 1600 3.016 0.157 0.895 0.491 0.176 8.968 4.116 0.0000 23.6669 38.9699
S7S 1355 1.198 0.165 1.099 0.432 0.180 8.256 3.182 0.0000 23.6814 38.9708
S8B 1590 3.868 0.253 0.567 0.533 0.169 8.395 3.126 0.0000 23.1236 39.0054
S8S 2265 3.639 0.255 0.658 0.665 0.247 8.991 3.843 0.0132 23.3147 38.9885
S9B 1180 3.910 0.107 0.672 0.490 0.134 8.954 4.042 0.0172 22.6306 38.9094
S9S 1545 3.607 0.139 0.644 0.373 0.167 9.817 3.689 0.0062 22.9545 38.7777
Sigma_t
S11B 26.9631
S1B 26.0046
S2B 26.0521
S2S 26.0137
S3B 26.2987
S3S 26.0332
S4B 26.9415
S4S 26.9415
S5B 26.1037
S5S 26.1065
S6B 27.3241
S6S 27.3151
S7B 26.7536
S7S 26.7488
S8B 26.9423
S8S 26.8713
S9B 27.0131
S9S 26.8172
2.1.3 A table of taxonomic classification level of each ASV
@tax_table[1:10,] physeq
Taxonomy Table: [10 taxa by 7 taxonomic ranks]:
Kingdom Phylum Class Order
ASV1 "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Synechococcales"
ASV2 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
ASV3 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV4 "Archaea" "Thermoplasmatota" "Thermoplasmata" "Marine Group II"
ASV5 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV6 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV7 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "Rhodospirillales"
ASV8 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
ASV9 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV10 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
Family Genus Species
ASV1 "Cyanobiaceae" "Synechococcus CC9902" NA
ASV2 "Pseudoalteromonadaceae" "Pseudoalteromonas" NA
ASV3 "Clade I" "Clade Ia" NA
ASV4 NA NA NA
ASV5 "Clade I" "Clade Ia" NA
ASV6 "Clade II" NA NA
ASV7 "AEGEAN-169 marine group" NA NA
ASV8 "Pseudoalteromonadaceae" "Pseudoalteromonas" NA
ASV9 "Clade I" "Clade Ia" NA
ASV10 "Clade I" "Clade Ia" NA
2.1.4 A Phylogenetic tree
@phy_tree physeq
Phylogenetic tree with 213 tips and 212 internal nodes.
Tip labels:
ASV1, ASV2, ASV3, ASV4, ASV5, ASV6, ...
Rooted; includes branch lengths.
2.1.5 A table with the ASV sequences
@refseq physeq
DNAStringSet object of length 213:
width seq names
[1] 402 GGAATTTTCCGCAATGGGCGAA...CGAAAGCCAGGGGAGCGAAAGG ASV1
[2] 425 GGAATATTGCACAATGGGCGCA...CGAAAGCGTGGGGAGCAAACAG ASV2
[3] 400 GGAATCTTGCACAATGGAGGAA...CGAAAGCATGGGTAGCGAAGAG ASV3
[4] 383 CGAAAACTTGACAATGCGAGCA...CGAAGCCTAGGGGCACGAACCG ASV4
[5] 400 GGAATCTTGCACAATGGAGGAA...CGAAAGCATGGGTAGCGAAGAG ASV5
... ... ...
[209] 426 GGAATTTTGCGCAATGGACGAA...CGAAAGCGTGGGGAGCGAACAG ASV209
[210] 403 GGAATATTGCACAATGGGCGCA...GGTCAACACTGACGCTCATGTA ASV210
[211] 360 CGAAAACTTCACACTGCAGGAA...GAACGGATCCGACGGTCAGGGA ASV211
[212] 400 GGAATATTGGACAATGGGCGAA...CGAAAGCGTGGGTAGCAAACAG ASV212
[213] 404 GGAATATTGCACAATGGGCGCA...GTCAACACTGACGCTCATGTAC ASV213
3 Subsampling normalization
3.1 Rarefaction Curves
Before normalization by sub-sampling, let’s have a look at rarefaction curves, evaluate your sequencing effort and make decisions
3.1.1 Identify your minimum sample size
::sample_sums(physeq) phyloseq
S11B S1B S2B S2S S3B S3S S4B S4S S5B S5S S6B S6S S7B S7S S8B S8S
975 837 893 983 878 889 917 1077 1018 1006 1076 937 878 936 846 958
S9B S9S
888 991
What is the minimum sample size?
3.1.2 Run rarefaction curves using our custom function ggrare()
(defined in R/alpha_diversity.R
)
#Make rarefaction curves & Add min sample size line
ggrare(physeq, step = 10, color = "Description", se = FALSE) +
geom_vline(xintercept = min(sample_sums(physeq)), color = "gray60")
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 3
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Do you think is a good idea to normalize your data using this minimal sample size?
3.2 Normalization process for alpha diversity: sub-sampling
<- phyloseq::rarefy_even_depth(physeq, rngseed = TRUE) physeq_rar
Check the number of sequences for each sample using sample_sums function
Did you lost a lot of ASVs?
3.3 Run rarefaction curves on normalized data
<- ggrare(physeq_rar, step = 10, color = "Description", se = TRUE) p0
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 3
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
3.4 Group separation
+ facet_wrap(~Geo, ncol = 2) p0
4 IV-Alpha Diversity
4.1 Indices
4.1.1 Get taxonomy-based diversity indices
#Get indices with alpha function (NB: index="all" if you want all the indices)
<- microbiome::alpha(
alpha_indices
physeq_rar,index = c("observed", "diversity_gini_simpson",
"diversity_shannon", "evenness_pielou",
"dominance_relative")
)
#save
write.table(alpha_indices,
file = file.path(output_alpha, "indices_alpha_resultat.txt"),
sep = "\t")
#which type?
class(alpha_indices)
[1] "data.frame"
#see
alpha_indices
observed diversity_gini_simpson diversity_shannon evenness_pielou
S11B 36 0.9447863 3.146480 0.8780420
S1B 35 0.9477924 3.177766 0.8937989
S2B 43 0.9577758 3.408022 0.9060997
S2S 36 0.9414576 3.112066 0.8684385
S3B 41 0.9503989 3.275906 0.8821441
S3S 38 0.9491313 3.246802 0.8925706
S4B 40 0.9570249 3.350694 0.9083230
S4S 40 0.9363446 3.097474 0.8396788
S5B 32 0.9271578 2.978541 0.8594252
S5S 33 0.9253250 2.993341 0.8560944
S6B 41 0.9439556 3.213508 0.8653415
S6S 28 0.8247810 2.440971 0.7325395
S7B 36 0.9435901 3.157702 0.8811735
S7S 44 0.9488744 3.278296 0.8663140
S8B 33 0.9452117 3.108474 0.8890226
S8S 39 0.9517578 3.307644 0.9028494
S9B 36 0.9443038 3.105908 0.8667202
S9S 33 0.9487545 3.180393 0.9095912
dominance_relative
S11B 0.10274791
S1B 0.10274791
S2B 0.09677419
S2S 0.10991637
S3B 0.10991637
S3S 0.10991637
S4B 0.08721625
S4S 0.14336918
S5B 0.16606930
S5S 0.18876941
S6B 0.13620072
S6S 0.38351254
S7B 0.13022700
S7S 0.10633214
S8B 0.09438471
S8S 0.09677419
S9B 0.09438471
S9S 0.10394265
What can you notice for one sample?
How to show this graphically?
4.1.2 Add the alpha indices result to your metadata (sample_data) phyloseq object
Important because many times you will probably want to add new variables in the phyloseq class object!!!
#Turn into sample_data object : sample_data function
<- phyloseq::sample_data(alpha_indices)
alpha_indices #See
class(alpha_indices)
[1] "sample_data"
attr(,"package")
[1] "phyloseq"
#Add alpha_indices to phyloseq sample_data object: merge_phyloseq function!
<- phyloseq::merge_phyloseq(physeq_rar, alpha_indices)
physeq_rar #See the result
sample_data(physeq_rar)
4.2 Alpha diversity representations
This section will show you how to plot by different ways the alpha diversity and its customization. Understand how it works!
4.2.1 Alpha representations using phyloseq::plot_richness()
You are limited to the indices calculated by the phyloseq::estimate_richness function (i.e.”Observed”, “Chao1”, “ACE”, “Shannon”, “Simpson”, “InvSimpson”, “Fisher”).
4.2.1.1 Selected indices + SampName
x
allow you to choose the column from sample_data(physeq_rar) for applying the label
::plot_richness(physeq_rar, x = "SampName",
phyloseqmeasures = c("Observed", "Shannon", "Simpson"))
4.2.2 Color by group: color = Geo
& change sample name
For color option pass the column of sample_data(physeq_rar)
that you want. Here different colors is applied depending on Geo
(which is North and South, so 2 different colors)
::plot_richness(physeq_rar,
phyloseqx = "Description",
color="Geo",
measures=c("Observed", "Shannon", "Simpson"))
4.2.3 Make box_plot by adding geom_boxplot function
::plot_richness(physeq_rar,
phyloseqx="Geo",
color="Geo",
measures=c("Observed", "Shannon", "Simpson")) +
::geom_boxplot() ggplot2
4.2.4 Make box_plot : geom_boxplot + fill color of boxplot (fill) + transparency (with alpha)
::plot_richness(physeq_rar,
phyloseqx = "Geo",
measures = c("Observed", "Shannon", "Simpson")) +
::geom_boxplot(aes(fill = Geo), alpha = 0.4) ggplot2
4.2.5 Alpha representations using Microbiome::boxplot_alpha
(not shown)
Again, you are limited to the indices calculated by the Microbiome::alpha function
4.2.6 Alpha representations using ggplot2
Interest: Freedom!! you can use ANY indices that you have calculated from different packages & included in sample_data
#Before : Change your phyloseq class oject sample_data as a dataframe
<- data.frame(sample_data(physeq_rar)) metadata
4.2.6.1 Boxplot, color control, points and Mean SD: stat_summary()
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color=c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
stat_summary(fun = mean, geom = "point", shape = 17, size = 3, color = "white") +
stat_summary(fun.data = "mean_se", geom = "errorbar", width = .1, color = "white")
4.2.6.2 Combine graphs on same figure: patchwork
#Put your graphs in different variables P1,P2,P3
<- ggplot(metadata, aes(x = Geo, y = observed)) +
p1 geom_boxplot(alpha = 0.6,
fill = c("#00AFBB","#E7B800"),
color=c("#00AFBB","#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
<- ggplot(metadata, aes(x = Geo, y = evenness_pielou)) +
p2 geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
<- ggplot(metadata, aes(x = Geo, y = diversity_gini_simpson)) +
p3 geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
#Put the graph of p1, p2 and p3 on same Figure
+ p2 + p3 +
p1 ::plot_annotation(tag_levels = "A") +
patchwork::plot_layout(guides = "collect") patchwork
5 Statistical hypothesis for alpha diversity
5.0.1 Normality test: Check the Normal or not normal distribution of your data to choose the right test!
5.0.1.1 Shapiro test: H0 Null Hypothesis: follows Normal distribution!
Means if p<0.05 -> reject the H0 (so does not follow a normal distribution)
5.0.1.2 Q-Qplots: Compare your distribution with a theoretical normal distribution
If your data follow a normal distribution, you’re expecting a linear relationship theoritical vs. experimental
Our custom function indices_normality()
(defined in R/alpha_diversity.R
) plots the results of Shapiro test as well as Q-Qplots.
5.0.2 Select indices to test & run normality check
|>
metadata ::select(observed,
dplyr
diversity_gini_simpson,
diversity_shannon,|>
evenness_pielou) indices_normality(nrow = 3, ncol = 2)
What are your conclusions?
5.1 ANOVA: parametric (follows normal distribution) AND at least 3 groups
5.1.0.1 Anova for Observed ASV and 4 groups
# How many groups used? See the column "groupe" of metadata:
factor(metadata$groupe)
[1] SGF NBF NBF NBS NBF NBS NBF NBS NBF NBS SGF SGS SGF SGS SGF SGS SGF SGS
Levels: NBF NBS SGF SGS
5.1.0.2 Variance
# Check homogeneity of variance between groups
# (avoid bias in ANOVA result & keep the power of the test)
# H0= equality of variances in the different populations
::bartlett.test(observed ~ groupe, metadata) stats
Bartlett test of homogeneity of variances
data: observed by groupe
Bartlett's K-squared = 3.1798, df = 3, p-value = 0.3647
Conclusion?
5.1.1 Alternative to Bartlett : Levene test (package car
), less sensitive to normality deviation
Global Test: Anova tell you if that some of the group means are different, but you don’t know which pairs of groups are different!
<- stats::aov(observed ~ groupe, metadata)
aov_observed summary(aov_observed)
Df Sum Sq Mean Sq F value Pr(>F)
groupe 3 13.03 4.343 0.211 0.887
Residuals 14 288.75 20.625
5.1.1.1 Which pairs of groups are different? -> Post-hoc test: Tukey multiple pairwise-comparisons
<- stats::TukeyHSD(aov_observed, method = "bh")
signif_pairgroups signif_pairgroups
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: stats::aov(formula = observed ~ groupe, data = metadata)
$groupe
diff lwr upr p adj
NBS-NBF -1.45 -10.304898 7.404898 0.9631679
SGF-NBF -1.80 -10.148478 6.548478 0.9217657
SGS-NBF -2.20 -11.054898 6.654898 0.8866424
SGF-NBS -0.35 -9.204898 8.504898 0.9994302
SGS-NBS -0.75 -10.083882 8.583882 0.9953019
SGS-SGF -0.40 -9.254898 8.454898 0.9991510
5.2 Kruskal-Wallis: non-parametric & at least three groups
5.2.0.1 Kruskal for diversity_shannon and 4 groups
Global test
::kruskal.test(diversity_shannon ~ groupe, data = metadata) stats
Kruskal-Wallis rank sum test
data: diversity_shannon by groupe
Kruskal-Wallis chi-squared = 2.9544, df = 3, p-value = 0.3987
5.2.0.2 Post hoc test: Dunn test (pairwise group test)
<- FSA::dunnTest(diversity_shannon ~ groupe,
signifgroup data = metadata,
method = "bh")
Warning: groupe was coerced to a factor.
#See
signifgroup
Comparison Z P.unadj P.adj
1 NBF - NBS 1.5218359 0.1280502 0.7683013
2 NBF - SGF 1.2439326 0.2135244 0.6405731
3 NBS - SGF -0.3490449 0.7270556 0.7270556
4 NBF - SGS 0.4048921 0.6855568 0.8226682
5 NBS - SGS -1.0596259 0.2893148 0.5786297
6 SGF - SGS -0.7678988 0.4425473 0.6638209
5.3 T-test: parametric, 2 groups (i.e North Vs. Sud)
::bartlett.test(observed ~ Geo, metadata) stats
Bartlett test of homogeneity of variances
data: observed by Geo
Bartlett's K-squared = 0.38191, df = 1, p-value = 0.5366
<- stats::t.test(observed ~ Geo, data = metadata)
observed_ttest #see
observed_ttest
Welch Two Sample t-test
data: observed by Geo
t = 0.66008, df = 15.246, p-value = 0.5191
alternative hypothesis: true difference in means between group North and group South is not equal to 0
95 percent confidence interval:
-2.966072 5.632739
sample estimates:
mean in group North mean in group South
37.55556 36.22222
5.4 Wilcoxon rank sum: non-parametric & 2 Groups
<- ggpubr::compare_means(diversity_shannon ~ Geo,
pairwise_test
metadata,method = "wilcox.test")
#See
pairwise_test
# A tibble: 1 × 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 diversity_shannon South North 0.863 0.86 0.86 ns Wilcoxon
5.4.1 Boxplot representation with p-value information
#Boxplot as previously seen
<- ggplot(metadata, aes(x = Geo, y = diversity_shannon)) +
graph_shan geom_boxplot(alpha=0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe),
position = position_jitter(0.02) ,
cex=2.2)+
stat_summary(fun = mean, geom = "point",
shape = 17, size = 3,
color = "white")
#Add p-value on graph
+ ggpubr::stat_pvalue_manual(
graph_shan
pairwise_test,y.position = 3.5,
label = "p.adj = {p.adj}",
color = "blue",
linetype = 1,
tip.length = 0.01
)
6 Taxonomy: barplot graph
6.1 Abundance Transformation
6.1.1 Counts in percentage using phyloseq::transform_sample_counts()
<- phyloseq::transform_sample_counts(physeq_rar, function(x) x/sum(x) * 100) pourcentS
See plot:
::plot_bar(pourcentS) phyloseq
What are the separation lines?
6.1.2 Summarise at a given taxonomic level with phyloseq::tax_glom()
Remember ranks can be obtained with phyloseq::rank_names()
::rank_names(pourcentS) phyloseq
[1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
<- phyloseq::tax_glom(pourcentS,
Phylum_glom taxrank = "Phylum",
NArm = FALSE)
#Plot at Phylum taxonomic rank, with color
::plot_bar(Phylum_glom, fill = "Phylum") phyloseq
NArm?
6.1.3 Filter phylum (mean of the line): phyloseq::filter_taxa()
Let’s filter out the phylums with a mean relative abundance inferior to 1%
<- phyloseq::filter_taxa(Phylum_glom,
Phylum_1 flist = function(x) mean(x) >= 1,
prune = TRUE)
#Plot at Phylum taxonomic rank, with color
::plot_bar(Phylum_1, fill = "Phylum") phyloseq
6.1.4 How to save a table into a file: exemple of phylum taxonomic table
write.table(df_export(otu_table(Phylum_glom)),
row.names = FALSE,
file = file.path(output_alpha, "Phylum_pourcent.tsv"),
sep = "\t")
6.1.5 Remove black lines
::plot_bar(Phylum_glom, "Description", fill = "Phylum") +
phyloseqgeom_bar(aes(colour = Phylum), stat = "identity")
6.2 Microbiome package
6.2.1 microbiome::aggregate_taxa()
# Order Rank
<- microbiome::aggregate_taxa(pourcentS, "Order")
Order_microb
#Filter at 1%
<- phyloseq::filter_taxa(Order_microb, function(x) mean(x) >= 1, prune = TRUE) Order1
6.2.2 microbiome::plot_composition()
<- microbiome::plot_composition(Order1,
p_order otu.sort = "abundance",
sample.sort = "Description",
x.label = "Description",
plot.type = "barplot",
verbose = FALSE) +
::labs(x = "", y = "Relative abundance (%)")
ggplot2#see
p_order
#Average by group :average_by option
<- microbiome::plot_composition(Order1,
p_order_groupe otu.sort = "abundance",
sample.sort = "Description",
x.label = "Description",
plot.type = "barplot",
verbose = FALSE,
average_by = "Geo") +
::labs(x = "", y = "Relative abundance (%)")
ggplot2
#see
p_order_groupe
6.2.3 Interactive barplot with plotly::ggplotly()
::ggplotly(p_order) plotly
6.2.4 How to manage colors in barplots
With the number of Phyla, Order etc a barplot can become very confusing… Need to have distinct color for each taxonomic groups.
Use the library RColorBrewer
et scale_fill_manual()
See here to understand the possibilities
You can visualise RColorBrewer
’s palettes with the following command:
::display.brewer.all() RColorBrewer
6.2.5 Build your own palette
Let’s assemble from two RColorBrewer’s palettes a single 13 colors palette
#See Set2 colors
<- RColorBrewer::brewer.pal(name = "Set2", n = 8)) (col1
[1] "#66C2A5" "#FC8D62" "#8DA0CB" "#E78AC3" "#A6D854" "#FFD92F" "#E5C494"
[8] "#B3B3B3"
#See Paired colors
<- RColorBrewer::brewer.pal(name = "Paired", n = 5)) (col2
[1] "#A6CEE3" "#1F78B4" "#B2DF8A" "#33A02C" "#FB9A99"
#Build your set of colors using brewer.pal or your own colors
<- c(col1, col2) mycolors
6.2.6 Use your palette in the p_order barplot
#Use scale_fill_manual
+
p_order ::scale_fill_manual("Order", values = mycolors) +
ggplot2theme(legend.position = "right",
legend.text = element_text(size=8))
6.2.7 To go even further in chosing colors
<- c(
order_color "SAR11 clade" = "cyan3",
"Enterobacterales" = "#33A02C", # lime green
"Synechococcales" = "#8DA0CB", # light steel blue
"Rhodospirillales"="pink",
"Marine Group II"="red",
"Actinomarinales"="purple",
"Pseudomonadales" = "#8B0000", # dark red
"Unknown"="#B3B3B3",
"Flavobacteriales" = "gold2",
"Chloroplast" = "#FDBF6F", # light orange
"Rhodobacterales" = "darkorange1" # dark orange
)
+
p_order ::scale_fill_manual("Order", values = order_color) +
ggplot2theme(legend.position = "right",
legend.text = element_text(size=8))
6.3 Other data Manipulation : select specific taxa, merge samples
6.3.1 Select Actinobacteria AND Bacteroidetes: phyloseq::subset_taxa()
<- phyloseq::subset_taxa(Phylum_glom, Phylum == "Actinobacteriota" | Phylum == "Bacteroidota")
myselection1
::plot_bar(myselection1, x = "Description", fill = "Phylum") phyloseq
::plot_bar(myselection1, x = "Description",
phyloseqfill="Phylum", facet_grid = ~Phylum)
6.3.2 Keep all with the exception of a class, a genus etc (e.g. contamination)
<- phyloseq::subset_taxa(physeq, Class != "Thermoplasmata" | is.na(Class)) myselection2
6.3.3 Understand:
! = means IS NOT
| = AND
Is.na = do not remove the NA (Not Assigned at the Class rank), by default it will be removed. be careful!
6.3.4 Merge samples (groups, duplicates etc)
Use a column from metadata to group/merge samples (North & South)
<- phyloseq::merge_samples(physeq_rar, "Geo")) (NordSud
Warning in asMethod(object): NAs introduced by coercion
Warning in asMethod(object): NAs introduced by coercion
Warning in asMethod(object): NAs introduced by coercion
Warning in asMethod(object): NAs introduced by coercion
phyloseq-class experiment-level object
otu_table() OTU Table: [ 209 taxa and 2 samples ]
sample_data() Sample Data: [ 2 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 209 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 209 tips and 208 internal nodes ]
6.3.5 Sample selection: phyloseq::subset_samples()
<- phyloseq::subset_samples(pourcentS, Geo == "North")) (sub_North
phyloseq-class experiment-level object
otu_table() OTU Table: [ 209 taxa and 9 samples ]
sample_data() Sample Data: [ 9 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 209 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 209 tips and 208 internal nodes ]
refseq() DNAStringSet: [ 209 reference sequences ]
6.3.6 Alternative way: phyloseq::prune_samples
Define what you want to keep
<- c("S1B", "S2S") keep
Then extract these samples from pourcentS phyloseq object
<- phyloseq::prune_samples(keep, pourcentS)
keep2samples sample_names(keep2samples)
[1] "S1B" "S2S"
6.4 Retrieve sequences from a phyloseq object
6.4.1 One sequence:
::writeXStringSet(physeq_rar@refseq["ASV1"],
Biostringsfilepath = file.path(output_alpha,"ASV1.fasta"),
format = "fasta")
6.4.2 By name
<- c("ASV2", "ASV8", "ASV32", "ASV58") listASV
::writeXStringSet(physeq_rar@refseq[listASV],
Biostringsfilepath = file.path(output_alpha,"several_asvs.fasta"),
format = "fasta")
6.4.3 From a selection
Let’s export a fasta files of all ASVs with a maximum relative abundance superior to 10% in North samples:
::subset_samples(pourcentS, Geo == "North") |>
phyloseq::filter_taxa(flist = function(x) max(x) >= 10, prune = TRUE) |>
phyloseq::refseq() |>
phyloseq::writeXStringSet(
Biostringsfilepath = file.path(output_alpha, "fancy_selection_asvs.fasta"),
format = "fasta"
)
6.4.4 Retrieve all sequences
::writeXStringSet(physeq_rar@refseq,
Biostringsfilepath = file.path(output_alpha,"all_asvs.fasta"),
format = "fasta")
7 Core microbiota analysis
Identify the taxa names of the core microbiota
7.0.1 Which core? Compare North & South core microbiota
#Create 2 phyloseq objects for North and South sample groups
<- phyloseq::subset_samples(pourcentS, Geo == "North")
sub_North <- phyloseq::subset_samples(pourcentS, Geo == "South") sub_South
#Check group North ok
@sam_data sub_North
SampName Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk
S1B S1B North North1B NBF 52 660 32195 10675 955
S2B S2B North North2B NBF 59 890 25480 16595 670
S2S S2S North North2S NBS 0 890 25480 16595 670
S3B S3B North North3B NBF 74 835 13340 25115 1115
S3S S3S North North3S NBS 0 715 26725 16860 890
S4B S4B North North4B NBF 78 2220 3130 29835 2120
S4S S4S North North4S NBS 78 2220 3130 29835 2120
S5B S5B North North5B NBF 42 1620 55780 23795 2555
S5S S5S North North5S NBS 0 1620 56555 22835 2560
Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S
S1B 115 1.813 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204
S2B 395 2.592 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627
S2S 395 3.381 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176
S3B 165 1.438 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549
S3S 200 1.656 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960
S4B 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S4S 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S5B 1355 2.028 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192
S5S 945 2.669 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213
Sigma_t observed diversity_gini_simpson diversity_shannon evenness_pielou
S1B 26.0046 35 0.9477924 3.177766 0.8937989
S2B 26.0521 43 0.9577758 3.408022 0.9060997
S2S 26.0137 36 0.9414576 3.112066 0.8684385
S3B 26.2987 41 0.9503989 3.275906 0.8821441
S3S 26.0332 38 0.9491313 3.246802 0.8925706
S4B 26.9415 40 0.9570249 3.350694 0.9083230
S4S 26.9415 40 0.9363446 3.097474 0.8396788
S5B 26.1037 32 0.9271578 2.978541 0.8594252
S5S 26.1065 33 0.9253250 2.993341 0.8560944
dominance_relative
S1B 0.10274791
S2B 0.09677419
S2S 0.10991637
S3B 0.10991637
S3S 0.10991637
S4B 0.08721625
S4S 0.14336918
S5B 0.16606930
S5S 0.18876941
7.0.2 Change first column name of taxonomy rank
Replace “Kingdom” by “Domain”, needed for the use of add_best function
#Before
colnames(sub_North@tax_table)[1]
[1] "Kingdom"
#Apply change for North
colnames(sub_North@tax_table)[1] <- "Domain"
#See
colnames(sub_North@tax_table)[1]
[1] "Domain"
7.0.3 Add the lowest taxonomy classification
<- microbiome::add_besthit(sub_North, sep = ":") sub_North
7.0.4 See the transformation of tax_table
head(sub_North@tax_table)
Taxonomy Table: [6 taxa by 7 taxonomic ranks]:
Domain Phylum Class
ASV1:Synechococcus CC9902 "Bacteria" "Cyanobacteria" "Cyanobacteriia"
ASV2:Pseudoalteromonas "Bacteria" "Proteobacteria" "Gammaproteobacteria"
ASV3:Clade Ia "Bacteria" "Proteobacteria" "Alphaproteobacteria"
ASV4:Marine Group II "Archaea" "Thermoplasmatota" "Thermoplasmata"
ASV5:Clade Ia "Bacteria" "Proteobacteria" "Alphaproteobacteria"
ASV6:Clade II "Bacteria" "Proteobacteria" "Alphaproteobacteria"
Order Family
ASV1:Synechococcus CC9902 "Synechococcales" "Cyanobiaceae"
ASV2:Pseudoalteromonas "Enterobacterales" "Pseudoalteromonadaceae"
ASV3:Clade Ia "SAR11 clade" "Clade I"
ASV4:Marine Group II "Marine Group II" NA
ASV5:Clade Ia "SAR11 clade" "Clade I"
ASV6:Clade II "SAR11 clade" "Clade II"
Genus Species
ASV1:Synechococcus CC9902 "Synechococcus CC9902" NA
ASV2:Pseudoalteromonas "Pseudoalteromonas" NA
ASV3:Clade Ia "Clade Ia" NA
ASV4:Marine Group II NA NA
ASV5:Clade Ia "Clade Ia" NA
ASV6:Clade II NA NA
7.0.5 Identify Core microbiota
#North
<- microbiome::core_members(sub_North,
(core_taxa_north detection = 0.0001,
prevalence = 50/100))
[1] "ASV1:Synechococcus CC9902"
[2] "ASV2:Pseudoalteromonas"
[3] "ASV3:Clade Ia"
[4] "ASV4:Marine Group II"
[5] "ASV5:Clade Ia"
[6] "ASV6:Clade II"
[7] "ASV9:Clade Ia"
[8] "ASV10:Clade Ia"
[9] "ASV11:AEGEAN-169 marine group"
[10] "ASV12:Prochlorococcus MIT9313.marinus"
[11] "ASV16:AEGEAN-169 marine group"
[12] "ASV18:Clade Ib"
[13] "ASV22:Clade Ia"
[14] "ASV23:Clade Ia"
[15] "ASV26:Chloroplast"
[16] "ASV30:Marine Group II"
[17] "ASV32:Dadabacteriales"
[18] "ASV33:SAR324 clade(Marine group B)"
[19] "ASV35:Clade IV"
[20] "ASV37:Marine Group III"
[21] "ASV49:SAR202 clade"
[22] "ASV53:AEGEAN-169 marine group"
7.0.6 Get core microbiota phyloseq object
Get the phyloseq object with also sequences, phylo tree etc.
<- microbiome::core(sub_North,
(phyloseq_core_north detection = 0.0001,
prevalence = .5))
phyloseq-class experiment-level object
otu_table() OTU Table: [ 22 taxa and 9 samples ]
sample_data() Sample Data: [ 9 samples by 26 sample variables ]
tax_table() Taxonomy Table: [ 22 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 22 tips and 21 internal nodes ]
refseq() DNAStringSet: [ 22 reference sequences ]
See full taxanomy of core members
<- as.data.frame(phyloseq::tax_table(phyloseq_core_north))) (tax_mat
Domain Phylum
ASV1:Synechococcus CC9902 Bacteria Cyanobacteria
ASV2:Pseudoalteromonas Bacteria Proteobacteria
ASV3:Clade Ia Bacteria Proteobacteria
ASV4:Marine Group II Archaea Thermoplasmatota
ASV5:Clade Ia Bacteria Proteobacteria
ASV6:Clade II Bacteria Proteobacteria
ASV9:Clade Ia Bacteria Proteobacteria
ASV10:Clade Ia Bacteria Proteobacteria
ASV11:AEGEAN-169 marine group Bacteria Proteobacteria
ASV12:Prochlorococcus MIT9313.marinus Bacteria Cyanobacteria
ASV16:AEGEAN-169 marine group Bacteria Proteobacteria
ASV18:Clade Ib Bacteria Proteobacteria
ASV22:Clade Ia Bacteria Proteobacteria
ASV23:Clade Ia Bacteria Proteobacteria
ASV26:Chloroplast Bacteria Cyanobacteria
ASV30:Marine Group II Archaea Thermoplasmatota
ASV32:Dadabacteriales Bacteria Dadabacteria
ASV33:SAR324 clade(Marine group B) Bacteria SAR324 clade(Marine group B)
ASV35:Clade IV Bacteria Proteobacteria
ASV37:Marine Group III Archaea Thermoplasmatota
ASV49:SAR202 clade Bacteria Chloroflexi
ASV53:AEGEAN-169 marine group Bacteria Proteobacteria
Class Order
ASV1:Synechococcus CC9902 Cyanobacteriia Synechococcales
ASV2:Pseudoalteromonas Gammaproteobacteria Enterobacterales
ASV3:Clade Ia Alphaproteobacteria SAR11 clade
ASV4:Marine Group II Thermoplasmata Marine Group II
ASV5:Clade Ia Alphaproteobacteria SAR11 clade
ASV6:Clade II Alphaproteobacteria SAR11 clade
ASV9:Clade Ia Alphaproteobacteria SAR11 clade
ASV10:Clade Ia Alphaproteobacteria SAR11 clade
ASV11:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
ASV12:Prochlorococcus MIT9313.marinus Cyanobacteriia Synechococcales
ASV16:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
ASV18:Clade Ib Alphaproteobacteria SAR11 clade
ASV22:Clade Ia Alphaproteobacteria SAR11 clade
ASV23:Clade Ia Alphaproteobacteria SAR11 clade
ASV26:Chloroplast Cyanobacteriia Chloroplast
ASV30:Marine Group II Thermoplasmata Marine Group II
ASV32:Dadabacteriales Dadabacteriia Dadabacteriales
ASV33:SAR324 clade(Marine group B) <NA> <NA>
ASV35:Clade IV Alphaproteobacteria SAR11 clade
ASV37:Marine Group III Thermoplasmata Marine Group III
ASV49:SAR202 clade Dehalococcoidia SAR202 clade
ASV53:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
Family
ASV1:Synechococcus CC9902 Cyanobiaceae
ASV2:Pseudoalteromonas Pseudoalteromonadaceae
ASV3:Clade Ia Clade I
ASV4:Marine Group II <NA>
ASV5:Clade Ia Clade I
ASV6:Clade II Clade II
ASV9:Clade Ia Clade I
ASV10:Clade Ia Clade I
ASV11:AEGEAN-169 marine group AEGEAN-169 marine group
ASV12:Prochlorococcus MIT9313.marinus Cyanobiaceae
ASV16:AEGEAN-169 marine group AEGEAN-169 marine group
ASV18:Clade Ib Clade I
ASV22:Clade Ia Clade I
ASV23:Clade Ia Clade I
ASV26:Chloroplast <NA>
ASV30:Marine Group II <NA>
ASV32:Dadabacteriales <NA>
ASV33:SAR324 clade(Marine group B) <NA>
ASV35:Clade IV Clade IV
ASV37:Marine Group III <NA>
ASV49:SAR202 clade <NA>
ASV53:AEGEAN-169 marine group AEGEAN-169 marine group
Genus Species
ASV1:Synechococcus CC9902 Synechococcus CC9902 <NA>
ASV2:Pseudoalteromonas Pseudoalteromonas <NA>
ASV3:Clade Ia Clade Ia <NA>
ASV4:Marine Group II <NA> <NA>
ASV5:Clade Ia Clade Ia <NA>
ASV6:Clade II <NA> <NA>
ASV9:Clade Ia Clade Ia <NA>
ASV10:Clade Ia Clade Ia <NA>
ASV11:AEGEAN-169 marine group <NA> <NA>
ASV12:Prochlorococcus MIT9313.marinus Prochlorococcus MIT9313 marinus
ASV16:AEGEAN-169 marine group <NA> <NA>
ASV18:Clade Ib Clade Ib <NA>
ASV22:Clade Ia Clade Ia <NA>
ASV23:Clade Ia Clade Ia <NA>
ASV26:Chloroplast <NA> <NA>
ASV30:Marine Group II <NA> <NA>
ASV32:Dadabacteriales <NA> <NA>
ASV33:SAR324 clade(Marine group B) <NA> <NA>
ASV35:Clade IV <NA> <NA>
ASV37:Marine Group III <NA> <NA>
ASV49:SAR202 clade <NA> <NA>
ASV53:AEGEAN-169 marine group <NA> <NA>
7.0.7 Visualise core microbiome with microbiome::plot_core()
Visualise the core microbiome of North samples
::plot_core(phyloseq_core_north,
microbiomeplot.type = "heatmap",
colours = rev(RColorBrewer::brewer.pal(8, "RdBu")),
prevalences = seq(from = 0, to = 1, by = .1),
detections = seq(from = 0.1, to = 5, by = 0.2)) +
scale_x_discrete(guide = guide_axis(n.dodge = 2))+
xlab("Detection Threshold (Relative Abundance (%))") +
ylab("ASVs")
Warning in microbiome::plot_core(phyloseq_core_north, plot.type = "heatmap", : The plot_core function is typically used with compositional
data. The data is not compositional. Make sure that you
intend to operate on non-compositional data.
Do the same for the South samples .. please!
What are your conclusions about the comparison between North & South core micobiota at the ASV level?