library(phyloseq)
library(ggplot2)
library(patchwork)
PARTII_alpha_Diversity_ANF
1 Prepare workspace
1.1 Load libraries
1.2 Load custom functions
::load_all() devtools
1.3 Define output folder
<- here::here("outputs", "alpha_diversity")
output_alpha if (!dir.exists(output_alpha)) dir.create(output_alpha, recursive = TRUE)
1.4 Load the data and inspect the phyloseq object
<- readRDS(here::here("data",
physeq "asv_table",
"phyloseq_object_alpha_beta_div.rds"))
2 Data Structure
- Phyloseq object
physeq
phyloseq-class experiment-level object
otu_table() OTU Table: [ 213 taxa and 18 samples ]
sample_data() Sample Data: [ 18 samples by 21 sample variables ]
tax_table() Taxonomy Table: [ 213 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 213 tips and 212 internal nodes ]
refseq() DNAStringSet: [ 213 reference sequences ]
2.1 Composition of our phyloseq object physeq
2.1.1 An ASV table with the absolute counts
Be careful: Rows are samples, columns are ASVs
@otu_table[1:10,1:10] physeq
OTU Table: [10 taxa and 10 samples]
taxa are columns
ASV1 ASV2 ASV3 ASV4 ASV5 ASV6 ASV7 ASV8 ASV9 ASV10
S11B 117 25 85 70 87 40 57 34 41 0
S1B 67 0 23 0 51 48 0 0 27 58
S2B 43 0 35 15 42 52 0 0 0 43
S2S 103 87 76 12 99 43 36 72 46 0
S3B 59 0 32 0 49 73 0 0 0 57
S3S 81 10 0 20 36 0 0 0 0 50
S4B 11 6 38 33 43 46 0 8 0 37
S4S 68 6 38 0 62 0 0 11 30 46
S5B 176 18 62 109 0 35 56 13 33 82
S5S 182 0 36 101 51 0 33 0 29 42
2.1.2 A metadata table with information (e.g. physicochemical, categorical variables) about samples
@sam_data physeq
SampName Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk
S11B S11B South South5B SGF 35 5370 46830 580 6010
S1B S1B North North1B NBF 52 660 32195 10675 955
S2B S2B North North2B NBF 59 890 25480 16595 670
S2S S2S North North2S NBS 0 890 25480 16595 670
S3B S3B North North3B NBF 74 835 13340 25115 1115
S3S S3S North North3S NBS 0 715 26725 16860 890
S4B S4B North North4B NBF 78 2220 3130 29835 2120
S4S S4S North North4S NBS 78 2220 3130 29835 2120
S5B S5B North North5B NBF 42 1620 55780 23795 2555
S5S S5S North North5S NBS 0 1620 56555 22835 2560
S6B S6B South South1B SGF 13 2520 39050 705 3630
S6S S6S South South1S SGS 0 2435 35890 915 3735
S7B S7B South South2B SGF 26 0 0 0 4005
S7S S7S South South2S SGS 0 4535 26545 1340 6585
S8B S8B South South3B SGF 33 0 0 0 5910
S8S S8S South South3S SGS 0 4260 36745 985 5470
S9B S9B South South4B SGF 25 4000 31730 485 4395
S9S S9S South South4S SGS 0 5465 32860 820 5045
Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S
S11B 1690 3.324 0.083 0.756 0.467 0.115 9.539 4.138 0.0182 23.0308 38.9967
S1B 115 1.813 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204
S2B 395 2.592 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627
S2S 395 3.381 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176
S3B 165 1.438 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549
S3S 200 1.656 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960
S4B 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S4S 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S5B 1355 2.028 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192
S5S 945 2.669 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213
S6B 1295 2.206 0.249 0.768 0.629 0.236 9.013 3.455 0.0000 22.0197 39.0877
S6S 1300 3.004 0.251 0.927 0.653 0.266 8.776 3.230 0.0134 22.0515 39.0884
S7B 1600 3.016 0.157 0.895 0.491 0.176 8.968 4.116 0.0000 23.6669 38.9699
S7S 1355 1.198 0.165 1.099 0.432 0.180 8.256 3.182 0.0000 23.6814 38.9708
S8B 1590 3.868 0.253 0.567 0.533 0.169 8.395 3.126 0.0000 23.1236 39.0054
S8S 2265 3.639 0.255 0.658 0.665 0.247 8.991 3.843 0.0132 23.3147 38.9885
S9B 1180 3.910 0.107 0.672 0.490 0.134 8.954 4.042 0.0172 22.6306 38.9094
S9S 1545 3.607 0.139 0.644 0.373 0.167 9.817 3.689 0.0062 22.9545 38.7777
Sigma_t
S11B 26.9631
S1B 26.0046
S2B 26.0521
S2S 26.0137
S3B 26.2987
S3S 26.0332
S4B 26.9415
S4S 26.9415
S5B 26.1037
S5S 26.1065
S6B 27.3241
S6S 27.3151
S7B 26.7536
S7S 26.7488
S8B 26.9423
S8S 26.8713
S9B 27.0131
S9S 26.8172
2.1.3 A table of taxonomic classification level of each ASV
@tax_table[1:10,] physeq
Taxonomy Table: [10 taxa by 7 taxonomic ranks]:
Kingdom Phylum Class Order
ASV1 "Bacteria" "Cyanobacteria" "Cyanobacteriia" "Synechococcales"
ASV2 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
ASV3 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV4 "Archaea" "Thermoplasmatota" "Thermoplasmata" "Marine Group II"
ASV5 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV6 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV7 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "Rhodospirillales"
ASV8 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Enterobacterales"
ASV9 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
ASV10 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "SAR11 clade"
Family Genus Species
ASV1 "Cyanobiaceae" "Synechococcus CC9902" NA
ASV2 "Pseudoalteromonadaceae" "Pseudoalteromonas" NA
ASV3 "Clade I" "Clade Ia" NA
ASV4 NA NA NA
ASV5 "Clade I" "Clade Ia" NA
ASV6 "Clade II" NA NA
ASV7 "AEGEAN-169 marine group" NA NA
ASV8 "Pseudoalteromonadaceae" "Pseudoalteromonas" NA
ASV9 "Clade I" "Clade Ia" NA
ASV10 "Clade I" "Clade Ia" NA
2.1.4 A Phylogenetic tree
@phy_tree physeq
Phylogenetic tree with 213 tips and 212 internal nodes.
Tip labels:
ASV1, ASV2, ASV3, ASV4, ASV5, ASV6, ...
Rooted; includes branch lengths.
2.1.5 A table with the ASV sequences
@refseq physeq
DNAStringSet object of length 213:
width seq names
[1] 402 GGAATTTTCCGCAATGGGCGAA...CGAAAGCCAGGGGAGCGAAAGG ASV1
[2] 425 GGAATATTGCACAATGGGCGCA...CGAAAGCGTGGGGAGCAAACAG ASV2
[3] 400 GGAATCTTGCACAATGGAGGAA...CGAAAGCATGGGTAGCGAAGAG ASV3
[4] 383 CGAAAACTTGACAATGCGAGCA...CGAAGCCTAGGGGCACGAACCG ASV4
[5] 400 GGAATCTTGCACAATGGAGGAA...CGAAAGCATGGGTAGCGAAGAG ASV5
... ... ...
[209] 426 GGAATTTTGCGCAATGGACGAA...CGAAAGCGTGGGGAGCGAACAG ASV209
[210] 403 GGAATATTGCACAATGGGCGCA...GGTCAACACTGACGCTCATGTA ASV210
[211] 360 CGAAAACTTCACACTGCAGGAA...GAACGGATCCGACGGTCAGGGA ASV211
[212] 400 GGAATATTGGACAATGGGCGAA...CGAAAGCGTGGGTAGCAAACAG ASV212
[213] 404 GGAATATTGCACAATGGGCGCA...GTCAACACTGACGCTCATGTAC ASV213
3 Subsampling normalization
3.1 Rarefaction Curves
Before normalization by sub-sampling, let’s have a look at rarefaction curves, evaluate your sequencing effort and make decisions
3.1.1 Identify your minimum sample size
::sample_sums(physeq) phyloseq
S11B S1B S2B S2S S3B S3S S4B S4S S5B S5S S6B S6S S7B S7S S8B S8S
975 837 893 983 878 889 917 1077 1018 1006 1076 937 878 936 846 958
S9B S9S
888 991
What is the minimum sample size?
3.1.2 Run rarefaction curves using our custom function ggrare()
(defined in R/alpha_diversity.R
)
#Make rarefaction curves & Add min sample size line
ggrare(physeq, step = 10, color = "Description", se = FALSE) +
geom_vline(xintercept = min(sample_sums(physeq)), color = "gray60")
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 3
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
Do you think is a good idea to normalize your data using this minimal sample size?
3.2 Normalization process for alpha diversity: sub-sampling
<- phyloseq::rarefy_even_depth(physeq, rngseed = TRUE) physeq_rar
Check the number of sequences for each sample using sample_sums function
Did you lost a lot of ASVs?
3.3 Run rarefaction curves on normalized data
<- ggrare(physeq_rar, step = 10, color = "Description", se = TRUE) p0
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 2
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 3
Warning in vegan::rarefy(x[i, , drop = FALSE], n, se = se): most observed count
data have counts 1, but smallest count is 4
3.4 Group separation
+ facet_wrap(~Geo, ncol = 2) p0
3.5 Rank Abundance Curves
<- data.frame(phyloseq::otu_table(physeq_rar))
tableASV <- data.frame(physeq_rar@sam_data)
metadonnees $Geo <- factor(metadonnees$Geo)
metadonnees<- rankabuncomp(tableASV, y=metadonnees, factor='Geo',scale='logabun', legend="topright") ab_ranktab
4 IV-Alpha Diversity
4.1 Indices
4.1.1 Get taxonomy-based diversity indices
#Get indices with alpha function (NB: index="all" if you want all the indices)
<- microbiome::alpha(
alpha_indices
physeq_rar,index = c("observed", "diversity_gini_simpson",
"diversity_shannon", "evenness_pielou",
"dominance_relative")
)
#save
write.table(alpha_indices,
file = file.path(output_alpha, "indices_alpha_resultat.txt"),
sep = "\t")
#which type?
class(alpha_indices)
[1] "data.frame"
#see
alpha_indices
observed diversity_gini_simpson diversity_shannon evenness_pielou
S11B 36 0.94478631 3.1464802 0.87804201
S1B 35 0.94779244 3.1777661 0.89379888
S2B 43 0.95777575 3.4080222 0.90609969
S2S 36 0.94145759 3.1120657 0.86843848
S3B 41 0.95039889 3.2759058 0.88214412
S3S 38 0.94913135 3.2468023 0.89257055
S4B 40 0.95702493 3.3506939 0.90832296
S4S 40 0.93634460 3.0974738 0.83967878
S5B 32 0.92715778 2.9785407 0.85942517
S5S 33 0.92532499 2.9933406 0.85609441
S6B 41 0.94395556 3.2135081 0.86534151
S6S 28 0.82478100 2.4409713 0.73253948
S7B 36 0.94359014 3.1577021 0.88117355
S7S 44 0.94887441 3.2782965 0.86631400
S8B 33 0.94521168 3.1084743 0.88902263
S8S 39 0.95175779 3.3076444 0.90284939
S9B 36 0.94430384 3.1059084 0.86672024
S9S 33 0.94875451 3.1803927 0.90959124
dominance_relative
S11B 0.102747909
S1B 0.102747909
S2B 0.096774194
S2S 0.109916368
S3B 0.109916368
S3S 0.109916368
S4B 0.087216249
S4S 0.143369176
S5B 0.166069295
S5S 0.188769415
S6B 0.136200717
S6S 0.383512545
S7B 0.130227001
S7S 0.106332139
S8B 0.094384707
S8S 0.096774194
S9B 0.094384707
S9S 0.103942652
What can you notice for one sample?
How to show this graphically?
4.1.2 Add the alpha indices result to your metadata (sample_data) phyloseq object
Important because many times you will probably want to add new variables in the phyloseq class object!!!
#Turn into sample_data object : sample_data function
<- phyloseq::sample_data(alpha_indices)
alpha_indices #See
class(alpha_indices)
[1] "sample_data"
attr(,"package")
[1] "phyloseq"
#Add alpha_indices to phyloseq sample_data object: merge_phyloseq function!
<- phyloseq::merge_phyloseq(physeq_rar, alpha_indices)
physeq_rar #See the result
sample_data(physeq_rar)
4.1.3 Get phylogeny based diversity indices: get_NRI_NTI function
#CalculateNRI,NTI,PD...: get_NRI_NTI function
<- MicrobiotaProcess::get_NRI_NTI(physeq_rar,
ind_comp abundance.weighted = FALSE,
metric = "all",
seed = 123)
#Retrieve only those of interest :select function, results are in ind_comp@alpha
<- as.data.frame(ind_comp@alpha)
indi_comp <- dplyr::select(indi_comp, NRI:PD)
NRI_NTI_PB #see
NRI_NTI_PB
NRI NTI PD
S11B -1.700640519 -1.73488080003 3.5323774
S1B 1.176239678 -0.76340361000 3.3155461
S2B -0.261257404 -0.66474586424 3.6970066
S2S -2.455150637 -0.74180248862 3.4222234
S3B 0.620451099 -1.29874536798 3.6834814
S3S -0.141085299 -0.73453694988 3.4403182
S4B -0.485562292 0.00455541779 3.3171476
S4S 1.773462103 -0.91361917079 3.4365770
S5B 0.156984003 0.13152798473 2.9592082
S5S -0.134251689 1.89245043841 2.4738611
S6B 0.218507592 1.08807022489 3.1227876
S6S 2.631228189 0.00037617274 2.4229824
S7B -1.470358724 -0.63713654961 3.1692736
S7S -2.518357417 0.70828292578 3.4530890
S8B -0.091630593 0.74473762595 2.9358002
S8S -1.205969382 0.44642821613 3.3017637
S9B -0.073949264 0.31007463902 2.8807192
S9S 0.443035146 2.45614696757 2.4373593
4.1.4 Again!!! Add the phylogenetic indices to your metadata (sample_data) phyloseq object
#Turn into sample_data object : sample_data function
<- phyloseq::sample_data(NRI_NTI_PB)
NRI_NTI_PB #Add alpha_indices to phyloseq sample_data object: merge_phyloseq function!
<- phyloseq::merge_phyloseq(physeq_rar, NRI_NTI_PB)
physeq_rar #See the result with all the indices included
sample_data(physeq_rar)
Can you give me one of the most diversified sample based on Simpson/Shannon/Richness/Pielou/PD values observed?
4.2 Alpha diversity representations
This section will show you how to plot by different ways the alpha diversity and its customization. Understand how it works!
4.2.1 Alpha representations using phyloseq::plot_richness()
You are limited to the indices calculated by the phyloseq::estimate_richness function (i.e.”Observed”, “Chao1”, “ACE”, “Shannon”, “Simpson”, “InvSimpson”, “Fisher”).
4.2.1.1 Selected indices + SampName
x
allow you to choose the column from sample_data(physeq_rar) for applying the label
::plot_richness(physeq_rar, x = "SampName",
phyloseqmeasures = c("Observed", "Shannon", "Simpson"))
4.2.2 Color by group: color = Geo
& change sample name
For color option pass the column of sample_data(physeq_rar)
that you want. Here different colors is applied depending on Geo
(which is North and South, so 2 different colors)
::plot_richness(physeq_rar,
phyloseqx = "Description",
color="Geo",
measures=c("Observed", "Shannon", "Simpson"))
4.2.3 Make box_plot by adding geom_boxplot function
::plot_richness(physeq_rar,
phyloseqx="Geo",
color="Geo",
measures=c("Observed", "Shannon", "Simpson")) +
::geom_boxplot() ggplot2
4.2.4 Make box_plot : geom_boxplot + fill color of boxplot (fill) + transparency (with alpha)
::plot_richness(physeq_rar,
phyloseqx = "Geo",
measures = c("Observed", "Shannon", "Simpson")) +
::geom_boxplot(aes(fill = Geo), alpha = 0.4) ggplot2
4.2.5 Alpha representations using Microbiome::boxplot_alpha
(not shown)
Again, you are limited to the indices calculated by the Microbiome::alpha function
4.2.6 Alpha representations using ggplot2
Interest: Freedom!! you can use ANY indices that you have calculated from different packages & included in sample_data
#Before : Change your phyloseq class oject sample_data as a dataframe
<- data.frame(sample_data(physeq_rar)) metadata
4.2.6.1 basic: points & color
#You use the columns of the metadata (Geo, observed, groupe etc)
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_point(aes(color = groupe, fill = groupe))
4.2.6.2 Deals with superposed points: geom_dotplot()
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_dotplot(binaxis = "y", stackdir = "center", stackgroups = TRUE,
binwidth = 0.5, aes(color = groupe, fill = groupe)) +
xlab("Geographic position") +
ylab("Number of Observed ASVs")
4.2.6.3 Boxplot & color control : scale_fill
& scale_color
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_boxplot(alpha = 0.7, aes(color = Geo, fill = Geo)) +
scale_fill_manual(values = c("#00AFBB", "#E7B800"))
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_boxplot(alpha = 0.7, aes(color = Geo, fill = Geo)) +
scale_fill_manual(values = c("#00AFBB", "#E7B800")) +
scale_color_manual(values = c("#00AFBB", "#E7B800"))
4.2.6.4 Boxplot, color control & points: geom_jitter()
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800"))+
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2)
4.2.6.5 Boxplot, color control, points and Mean SD: stat_summary()
ggplot(metadata, aes(x = Geo, y = observed)) +
geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color=c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
stat_summary(fun = mean, geom = "point", shape = 17, size = 3, color = "white") +
stat_summary(fun.data = "mean_se", geom = "errorbar", width = .1, color = "white")
4.2.6.6 Combine graphs on same figure: patchwork
#Put your graphs in different variables P1,P2,P3
<- ggplot(metadata, aes(x = Geo, y = observed)) +
p1 geom_boxplot(alpha = 0.6,
fill = c("#00AFBB","#E7B800"),
color=c("#00AFBB","#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
<- ggplot(metadata, aes(x = Geo, y = evenness_pielou)) +
p2 geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
<- ggplot(metadata, aes(x = Geo, y = diversity_gini_simpson)) +
p3 geom_boxplot(alpha = 0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe), position = position_jitter(0.07), cex = 2.2) +
theme(axis.title.x = element_blank())
#Put the graph of p1, p2 and p3 on same Figure
+ p2 + p3 +
p1 ::plot_annotation(tag_levels = "A") +
patchwork::plot_layout(guides = "collect") patchwork
5 Statistical hypothesis for alpha diversity
5.0.1 Normality test: Check the Normal or not normal distribution of your data to choose the right test!
5.0.1.1 Shapiro test: H0 Null Hypothesis: follows Normal distribution!
Means if p<0.05 -> reject the H0 (so does not follow a normal distribution)
5.0.1.2 Q-Qplots: Compare your distribution with a theoretical normal distribution
If your data follow a normal distribution, you’re expecting a linear relationship theoritical vs. experimental
Our custom function indices_normality()
(defined in R/alpha_diversity.R
) plots the results of Shapiro test as well as Q-Qplots.
5.0.2 Select indices to test & run normality check
|>
metadata ::select(observed,
dplyr
diversity_gini_simpson,
diversity_shannon,
evenness_pielou,|>
PD) indices_normality(nrow = 3, ncol = 2)
What are your conclusions?
5.1 ANOVA: parametric (follows normal distribution) AND at least 3 groups
5.1.0.1 Anova for Observed ASV and 4 groups
# How many groups used? See the column "groupe" of metadata:
factor(metadata$groupe)
[1] SGF NBF NBF NBS NBF NBS NBF NBS NBF NBS SGF SGS SGF SGS SGF SGS SGF SGS
Levels: NBF NBS SGF SGS
5.1.0.2 Variance
# Check homogeneity of variance between groups
# (avoid bias in ANOVA result & keep the power of the test)
# H0= equality of variances in the different populations
::bartlett.test(observed ~ groupe, metadata) stats
Bartlett test of homogeneity of variances
data: observed by groupe
Bartlett's K-squared = 3.17979, df = 3, p-value = 0.36473
Conclusion?
5.1.1 Alternative to Bartlett : Levene test (package car
), less sensitive to normality deviation
Global Test: Anova tell you if that some of the group means are different, but you don’t know which pairs of groups are different!
<- stats::aov(observed ~ groupe, metadata)
aov_observed summary(aov_observed)
Df Sum Sq Mean Sq F value Pr(>F)
groupe 3 13.028 4.3426 0.2105 0.8874
Residuals 14 288.750 20.6250
5.1.1.1 Which pairs of groups are different? -> Post-hoc test: Tukey multiple pairwise-comparisons
<- stats::TukeyHSD(aov_observed, method = "bh")
signif_pairgroups signif_pairgroups
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: stats::aov(formula = observed ~ groupe, data = metadata)
$groupe
diff lwr upr p adj
NBS-NBF -1.45 -10.304898 7.4048980 0.96316788
SGF-NBF -1.80 -10.148478 6.5484779 0.92176565
SGS-NBF -2.20 -11.054898 6.6548980 0.88664240
SGF-NBS -0.35 -9.204898 8.5048980 0.99943022
SGS-NBS -0.75 -10.083882 8.5838820 0.99530194
SGS-SGF -0.40 -9.254898 8.4548980 0.99915104
5.2 Kruskal-Wallis: non-parametric & at least three groups
5.2.0.1 Kruskal for diversity_shannon and 4 groups
Global test
::kruskal.test(diversity_shannon ~ groupe, data = metadata) stats
Kruskal-Wallis rank sum test
data: diversity_shannon by groupe
Kruskal-Wallis chi-squared = 2.95439, df = 3, p-value = 0.39871
5.2.0.2 Post hoc test: Dunn test (pairwise group test)
<- FSA::dunnTest(diversity_shannon ~ groupe,
signifgroup data = metadata,
method = "bh")
Warning: groupe was coerced to a factor.
#See
signifgroup
Comparison Z P.unadj P.adj
1 NBF - NBS 1.52183587 0.12805021 0.76830127
2 NBF - SGF 1.24393264 0.21352435 0.64057306
3 NBS - SGF -0.34904492 0.72705558 0.72705558
4 NBF - SGS 0.40489211 0.68555682 0.82266818
5 NBS - SGS -1.05962589 0.28931483 0.57862966
6 SGF - SGS -0.76789883 0.44254729 0.66382094
5.3 T-test: parametric, 2 groups (i.e North Vs. Sud)
::bartlett.test(observed ~ Geo, metadata) stats
Bartlett test of homogeneity of variances
data: observed by Geo
Bartlett's K-squared = 0.38191, df = 1, p-value = 0.53658
<- stats::t.test(observed ~ Geo, data = metadata)
observed_ttest #see
observed_ttest
Welch Two Sample t-test
data: observed by Geo
t = 0.660078, df = 15.246, p-value = 0.51905
alternative hypothesis: true difference in means between group North and group South is not equal to 0
95 percent confidence interval:
-2.9660719 5.6327386
sample estimates:
mean in group North mean in group South
37.555556 36.222222
5.4 Wilcoxon rank sum: non-parametric & 2 Groups
<- ggpubr::compare_means(diversity_shannon ~ Geo,
pairwise_test
metadata,method = "wilcox.test")
#See
pairwise_test
# A tibble: 1 × 8
.y. group1 group2 p p.adj p.format p.signif method
<chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 diversity_shannon South North 0.863 0.86 0.86 ns Wilcoxon
5.4.1 Boxplot representation with p-value information
#Boxplot as previously seen
<- ggplot(metadata, aes(x = Geo, y = diversity_shannon)) +
graph_shan geom_boxplot(alpha=0.6,
fill = c("#00AFBB", "#E7B800"),
color = c("#00AFBB", "#E7B800")) +
geom_jitter(aes(colour = groupe),
position = position_jitter(0.02) ,
cex=2.2)+
stat_summary(fun = mean, geom = "point",
shape = 17, size = 3,
color = "white")
#Add p-value on graph
+ ggpubr::stat_pvalue_manual(
graph_shan
pairwise_test,y.position = 3.5,
label = "p.adj = {p.adj}",
color = "blue",
linetype = 1,
tip.length = 0.01
)
6 Correlation & linear Regression
6.1 Correlation analysis
Methods available are spearman
, kendall
and pearson
. Correlation coefficient r is independent of change of origin and scale (So no data transformation!!). Correlation analysis describes the nature (strength (0->1) and direction +/-) of the relationship between two variables (r), whatever the range and the measurement units of them.
Considerations for statistical tests (test of the value being zero): * Pearson’s test is parametric (normal distribution required) * Spearman’s and Kendall’s tests are non-parametric
6.1.1 Select variables
#Select variables for bivariate correlation
<- dplyr::select(metadata, SiOH4:PO4,diversity_shannon)
myvariables #see
myvariables
SiOH4 NO2 NO3 NH4 PO4 diversity_shannon
S11B 3.324 0.083 0.756 0.467 0.115 3.1464802
S1B 1.813 0.256 0.889 0.324 0.132 3.1777661
S2B 2.592 0.105 1.125 0.328 0.067 3.4080222
S2S 3.381 0.231 0.706 0.450 0.109 3.1120657
S3B 1.438 0.057 1.159 0.369 0.174 3.2759058
S3S 1.656 0.098 0.794 0.367 0.095 3.2468023
S4B 2.457 0.099 1.087 0.349 0.137 3.3506939
S4S 2.457 0.099 1.087 0.349 0.137 3.0974738
S5B 2.028 0.103 1.135 0.216 0.128 2.9785407
S5S 2.669 0.136 0.785 0.267 0.114 2.9933406
S6B 2.206 0.249 0.768 0.629 0.236 3.2135081
S6S 3.004 0.251 0.927 0.653 0.266 2.4409713
S7B 3.016 0.157 0.895 0.491 0.176 3.1577021
S7S 1.198 0.165 1.099 0.432 0.180 3.2782965
S8B 3.868 0.253 0.567 0.533 0.169 3.1084743
S8S 3.639 0.255 0.658 0.665 0.247 3.3076444
S9B 3.910 0.107 0.672 0.490 0.134 3.1059084
S9S 3.607 0.139 0.644 0.373 0.167 3.1803927
6.1.2 Apply the method
#Apply method pearson
<- stats::cor(myvariables, method = "pearson")
matrixCor #see
matrixCor
SiOH4 NO2 NO3 NH4 PO4
SiOH4 1.00000000 0.26802252 -0.727703077 0.44354653 0.11949381
NO2 0.26802252 1.00000000 -0.484402421 0.62864562 0.57757348
NO3 -0.72770308 -0.48440242 1.000000000 -0.49147592 -0.16912716
NH4 0.44354653 0.62864562 -0.491475923 1.00000000 0.77443570
PO4 0.11949381 0.57757348 -0.169127163 0.77443570 1.00000000
diversity_shannon -0.20728823 -0.29316236 0.094763958 -0.24014733 -0.37791579
diversity_shannon
SiOH4 -0.207288231
NO2 -0.293162363
NO3 0.094763958
NH4 -0.240147333
PO4 -0.377915793
diversity_shannon 1.000000000
# we use a function defined in R/utils.R
# to move the row names content to a new column
df_export(matrixCor, new_rn = "variable")
variable SiOH4 NO2 NO3 NH4
1 SiOH4 1.00000000 0.26802252 -0.727703077 0.44354653
2 NO2 0.26802252 1.00000000 -0.484402421 0.62864562
3 NO3 -0.72770308 -0.48440242 1.000000000 -0.49147592
4 NH4 0.44354653 0.62864562 -0.491475923 1.00000000
5 PO4 0.11949381 0.57757348 -0.169127163 0.77443570
6 diversity_shannon -0.20728823 -0.29316236 0.094763958 -0.24014733
PO4 diversity_shannon
1 0.11949381 -0.207288231
2 0.57757348 -0.293162363
3 -0.16912716 0.094763958
4 0.77443570 -0.240147333
5 1.00000000 -0.377915793
6 -0.37791579 1.000000000
# we can now export
write.table(df_export(matrixCor, new_rn = "variable"),
file.path(output_alpha, "correlation_matrix.tsv"),
row.names = FALSE,
sep = "\t",
quote = FALSE)
6.1.3 Plot results: corrplot function
::corrplot(
corrplot
matrixCor,method="circle",
type="lower",
order='hclust',
tl.col = "black",
tl.srt = 45,
tl.cex=0.9,
diag = FALSE
)
6.1.4 Is the correlation is due to chance? Significance test!
The idea: Test the correlation at the population scale (=Rho) and compare to r (your samples). HO is : there is not a significant linear correlation between x and y in the population. For instance t-test allows to use sample data to generalize an assumption to an entire population.
#Test stats
<- corrplot::cor.mtest(matrixCor, conf.level = .95)
ptest #The p-value are stored in ptest$p
#see
$p ptest
SiOH4 NO2 NO3 NH4 PO4
SiOH4 0.0000000000 0.188313687 0.0045596985 0.110136746 0.425648355
NO2 0.1883136871 0.000000000 0.0721195375 0.012780256 0.035320670
NO3 0.0045596985 0.072119538 0.0000000000 0.051193376 0.300785372
NH4 0.1101367463 0.012780256 0.0511933761 0.000000000 0.013913210
PO4 0.4256483554 0.035320670 0.3007853722 0.013913210 0.000000000
diversity_shannon 0.3493471499 0.114960693 0.4193117807 0.112842904 0.052189815
diversity_shannon
SiOH4 0.349347150
NO2 0.114960693
NO3 0.419311781
NH4 0.112842904
PO4 0.052189815
diversity_shannon 0.000000000
6.1.5 Show only correlations with significant p-values
::corrplot(
corrplot
matrixCor,p.mat = ptest$p,
sig.level = .05,
method = "circle",
type = "lower",
order = 'hclust',
tl.col = "black",
tl.srt = 45,
tl.cex = 0.7,
diag = FALSE
)
6.2 Linear regression
Determination coefficient R2 provides percentage variation in y which is explained by all the x together. Its value is (usually) between 0 and 1 and it indicates strength of Linear Regression model. Higher the R2 value, data points are less scattered so it is a good model. Lesser the R2 value is more scattered the data points.
6.2.1 Shannon ~ Observed
ggplot(metadata, aes(x = observed, y = diversity_shannon)) +
geom_point() +
stat_smooth(method = "lm", col = "red") +
::stat_poly_eq(aes(label = paste(after_stat(rr.label),
ggpmiscafter_stat(p.value.label),
sep = "*\", \"*")))
What should be your conclusions…be careful…
What is the r value?
7 Taxonomy: barplot graph
7.1 Abundance Transformation
7.1.1 Counts in percentage using phyloseq::transform_sample_counts()
<- phyloseq::transform_sample_counts(physeq_rar, function(x) x/sum(x) * 100) pourcentS
See plot:
::plot_bar(pourcentS) phyloseq
What are the separation lines?
7.1.2 Summarise at a given taxonomic level with phyloseq::tax_glom()
Remember ranks can be obtained with phyloseq::rank_names()
::rank_names(pourcentS) phyloseq
[1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
<- phyloseq::tax_glom(pourcentS,
Phylum_glom taxrank = "Phylum",
NArm = FALSE)
#Plot at Phylum taxonomic rank, with color
::plot_bar(Phylum_glom, fill = "Phylum") phyloseq
NArm?
7.1.3 Filter phylum (mean of the line): phyloseq::filter_taxa()
Let’s filter out the phylums with a mean relative abundance inferior to 1%
<- phyloseq::filter_taxa(Phylum_glom,
Phylum_1 flist = function(x) mean(x) >= 1,
prune = TRUE)
#Plot at Phylum taxonomic rank, with color
::plot_bar(Phylum_1, fill = "Phylum") phyloseq
7.1.4 How to save a table into a file: exemple of phylum taxonomic table
write.table(df_export(otu_table(Phylum_glom)),
row.names = FALSE,
file = file.path(output_alpha, "Phylum_pourcent.tsv"),
sep = "\t")
7.1.5 Remove black lines
::plot_bar(Phylum_glom, "Description", fill = "Phylum") +
phyloseqgeom_bar(aes(colour = Phylum), stat = "identity")
7.2 Microbiome package
7.2.1 microbiome::aggregate_taxa()
# Order Rank
<- microbiome::aggregate_taxa(pourcentS, "Order")
Order_microb
#Filter at 1%
<- phyloseq::filter_taxa(Order_microb, function(x) mean(x) >= 1, prune = TRUE) Order1
7.2.2 microbiome::plot_composition()
<- microbiome::plot_composition(Order1,
p_order otu.sort = "abundance",
sample.sort = "Description",
x.label = "Description",
plot.type = "barplot",
verbose = FALSE) +
::labs(x = "", y = "Relative abundance (%)")
ggplot2#see
p_order
#Average by group :average_by option
<- microbiome::plot_composition(Order1,
p_order_groupe otu.sort = "abundance",
sample.sort = "Description",
x.label = "Description",
plot.type = "barplot",
verbose = FALSE,
average_by = "Geo") +
::labs(x = "", y = "Relative abundance (%)")
ggplot2
#see
p_order_groupe
7.2.3 Interactive barplot with plotly::ggplotly()
::ggplotly(p_order) plotly
7.2.4 How to manage colors in barplots
With the number of Phyla, Order etc a barplot can become very confusing… Need to have distinct color for each taxonomic groups.
Use the library RColorBrewer
et scale_fill_manual()
See here to understand the possibilities
You can visualise RColorBrewer
’s palettes with the following command:
::display.brewer.all() RColorBrewer
7.2.5 Build your own palette
Let’s assemble from two RColorBrewer’s palettes a single 13 colors palette
#See Set2 colors
<- RColorBrewer::brewer.pal(name = "Set2", n = 8)) (col1
[1] "#66C2A5" "#FC8D62" "#8DA0CB" "#E78AC3" "#A6D854" "#FFD92F" "#E5C494"
[8] "#B3B3B3"
#See Paired colors
<- RColorBrewer::brewer.pal(name = "Paired", n = 5)) (col2
[1] "#A6CEE3" "#1F78B4" "#B2DF8A" "#33A02C" "#FB9A99"
#Build your set of colors using brewer.pal or your own colors
<- c(col1, col2) mycolors
7.2.6 Use your palette in the p_order barplot
#Use scale_fill_manual
+
p_order ::scale_fill_manual("Order", values = mycolors) +
ggplot2theme(legend.position = "right",
legend.text = element_text(size=8))
7.3 Other data Manipulation : select specific taxa, merge samples
7.3.1 Select Actinobacteria AND Bacteroidetes: phyloseq::subset_taxa()
<- phyloseq::subset_taxa(Phylum_glom, Phylum == "Actinobacteriota" | Phylum == "Bacteroidota")
myselection1
::plot_bar(myselection1, x = "Description", fill = "Phylum") phyloseq
::plot_bar(myselection1, x = "Description",
phyloseqfill="Phylum", facet_grid = ~Phylum)
7.3.2 Keep all with the exception of a class, a genus etc (e.g. contamination)
<- phyloseq::subset_taxa(physeq, Class != "Thermoplasmata" | is.na(Class)) myselection2
7.3.3 Understand:
! = means IS NOT
| = AND
Is.na = do not remove the NA (Not Assigned at the Class rank), by default it will be removed. be careful!
7.3.4 Merge samples (groups, duplicates etc)
Use a column from metadata to group/merge samples (North & South)
<- phyloseq::merge_samples(physeq_rar, "Geo")) (NordSud
Warning in asMethod(object): NAs introduits lors de la conversion automatique
Warning in asMethod(object): NAs introduits lors de la conversion automatique
Warning in asMethod(object): NAs introduits lors de la conversion automatique
Warning in asMethod(object): NAs introduits lors de la conversion automatique
phyloseq-class experiment-level object
otu_table() OTU Table: [ 209 taxa and 2 samples ]
sample_data() Sample Data: [ 2 samples by 29 sample variables ]
tax_table() Taxonomy Table: [ 209 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 209 tips and 208 internal nodes ]
7.3.5 Sample selection: phyloseq::subset_samples()
<- phyloseq::subset_samples(pourcentS, Geo == "North")) (sub_North
phyloseq-class experiment-level object
otu_table() OTU Table: [ 209 taxa and 9 samples ]
sample_data() Sample Data: [ 9 samples by 29 sample variables ]
tax_table() Taxonomy Table: [ 209 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 209 tips and 208 internal nodes ]
refseq() DNAStringSet: [ 209 reference sequences ]
7.3.6 Alternative way: phyloseq::prune_samples
Define what you want to keep
<- c("S1B", "S2S") keep
Then extract these samples from pourcentS phyloseq object
<- phyloseq::prune_samples(keep, pourcentS)
keep2samples sample_names(keep2samples)
[1] "S1B" "S2S"
7.4 Retrieve sequences from a phyloseq object
7.4.1 One sequence:
::writeXStringSet(physeq_rar@refseq["ASV1"],
Biostringsfilepath = file.path(output_alpha,"ASV1.fasta"),
format = "fasta")
7.4.2 By name
<- c("ASV2", "ASV8", "ASV32", "ASV58") listASV
::writeXStringSet(physeq_rar@refseq[listASV],
Biostringsfilepath = file.path(output_alpha,"several_asvs.fasta"),
format = "fasta")
7.4.3 From a selection
Let’s export a fasta files of all ASVs with a maximum relative abundance superior to 10% in North samples:
::subset_samples(pourcentS, Geo == "North") |>
phyloseq::filter_taxa(flist = function(x) max(x) >= 10, prune = TRUE) |>
phyloseq::refseq() |>
phyloseq::writeXStringSet(
Biostringsfilepath = file.path(output_alpha, "fancy_selection_asvs.fasta"),
format = "fasta"
)
7.4.4 Retrieve all sequences
::writeXStringSet(physeq_rar@refseq,
Biostringsfilepath = file.path(output_alpha,"all_asvs.fasta"),
format = "fasta")
8 Core microbiota analysis
Identify the taxa names of the core microbiota
8.0.1 Which core? Compare North & South core microbiota
#Create 2 phyloseq objects for North and South sample groups
<- phyloseq::subset_samples(pourcentS, Geo == "North")
sub_North <- phyloseq::subset_samples(pourcentS, Geo == "South") sub_South
#Check group North ok
@sam_data sub_North
SampName Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk
S1B S1B North North1B NBF 52 660 32195 10675 955
S2B S2B North North2B NBF 59 890 25480 16595 670
S2S S2S North North2S NBS 0 890 25480 16595 670
S3B S3B North North3B NBF 74 835 13340 25115 1115
S3S S3S North North3S NBS 0 715 26725 16860 890
S4B S4B North North4B NBF 78 2220 3130 29835 2120
S4S S4S North North4S NBS 78 2220 3130 29835 2120
S5B S5B North North5B NBF 42 1620 55780 23795 2555
S5S S5S North North5S NBS 0 1620 56555 22835 2560
Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S
S1B 115 1.813 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204
S2B 395 2.592 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627
S2S 395 3.381 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176
S3B 165 1.438 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549
S3S 200 1.656 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960
S4B 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S4S 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542
S5B 1355 2.028 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192
S5S 945 2.669 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213
Sigma_t observed diversity_gini_simpson diversity_shannon evenness_pielou
S1B 26.0046 35 0.94779244 3.1777661 0.89379888
S2B 26.0521 43 0.95777575 3.4080222 0.90609969
S2S 26.0137 36 0.94145759 3.1120657 0.86843848
S3B 26.2987 41 0.95039889 3.2759058 0.88214412
S3S 26.0332 38 0.94913135 3.2468023 0.89257055
S4B 26.9415 40 0.95702493 3.3506939 0.90832296
S4S 26.9415 40 0.93634460 3.0974738 0.83967878
S5B 26.1037 32 0.92715778 2.9785407 0.85942517
S5S 26.1065 33 0.92532499 2.9933406 0.85609441
dominance_relative NRI NTI PD
S1B 0.102747909 1.17623968 -0.7634036100 3.3155461
S2B 0.096774194 -0.26125740 -0.6647458642 3.6970066
S2S 0.109916368 -2.45515064 -0.7418024886 3.4222234
S3B 0.109916368 0.62045110 -1.2987453680 3.6834814
S3S 0.109916368 -0.14108530 -0.7345369499 3.4403182
S4B 0.087216249 -0.48556229 0.0045554178 3.3171476
S4S 0.143369176 1.77346210 -0.9136191708 3.4365770
S5B 0.166069295 0.15698400 0.1315279847 2.9592082
S5S 0.188769415 -0.13425169 1.8924504384 2.4738611
8.0.2 Change first column name of taxonomy rank
Replace “Kingdom” by “Domain”, needed for the use of add_best function
#Before
colnames(sub_North@tax_table)[1]
[1] "Kingdom"
#Apply change for North
colnames(sub_North@tax_table)[1] <- "Domain"
#See
colnames(sub_North@tax_table)[1]
[1] "Domain"
8.0.3 Add the lowest taxonomy classification
<- microbiome::add_besthit(sub_North, sep = ":") sub_North
8.0.4 See the transformation of tax_table
head(sub_North@tax_table)
Taxonomy Table: [6 taxa by 7 taxonomic ranks]:
Domain Phylum Class
ASV1:Synechococcus CC9902 "Bacteria" "Cyanobacteria" "Cyanobacteriia"
ASV2:Pseudoalteromonas "Bacteria" "Proteobacteria" "Gammaproteobacteria"
ASV3:Clade Ia "Bacteria" "Proteobacteria" "Alphaproteobacteria"
ASV4:Marine Group II "Archaea" "Thermoplasmatota" "Thermoplasmata"
ASV5:Clade Ia "Bacteria" "Proteobacteria" "Alphaproteobacteria"
ASV6:Clade II "Bacteria" "Proteobacteria" "Alphaproteobacteria"
Order Family
ASV1:Synechococcus CC9902 "Synechococcales" "Cyanobiaceae"
ASV2:Pseudoalteromonas "Enterobacterales" "Pseudoalteromonadaceae"
ASV3:Clade Ia "SAR11 clade" "Clade I"
ASV4:Marine Group II "Marine Group II" NA
ASV5:Clade Ia "SAR11 clade" "Clade I"
ASV6:Clade II "SAR11 clade" "Clade II"
Genus Species
ASV1:Synechococcus CC9902 "Synechococcus CC9902" NA
ASV2:Pseudoalteromonas "Pseudoalteromonas" NA
ASV3:Clade Ia "Clade Ia" NA
ASV4:Marine Group II NA NA
ASV5:Clade Ia "Clade Ia" NA
ASV6:Clade II NA NA
8.0.5 Identify Core microbiota
#North
<- microbiome::core_members(sub_North,
(core_taxa_north detection = 0.0001,
prevalence = 50/100))
[1] "ASV1:Synechococcus CC9902"
[2] "ASV2:Pseudoalteromonas"
[3] "ASV3:Clade Ia"
[4] "ASV4:Marine Group II"
[5] "ASV5:Clade Ia"
[6] "ASV6:Clade II"
[7] "ASV9:Clade Ia"
[8] "ASV10:Clade Ia"
[9] "ASV11:AEGEAN-169 marine group"
[10] "ASV12:Prochlorococcus MIT9313.marinus"
[11] "ASV16:AEGEAN-169 marine group"
[12] "ASV18:Clade Ib"
[13] "ASV22:Clade Ia"
[14] "ASV23:Clade Ia"
[15] "ASV26:Chloroplast"
[16] "ASV30:Marine Group II"
[17] "ASV32:Dadabacteriales"
[18] "ASV33:SAR324 clade(Marine group B)"
[19] "ASV35:Clade IV"
[20] "ASV37:Marine Group III"
[21] "ASV49:SAR202 clade"
[22] "ASV53:AEGEAN-169 marine group"
8.0.6 Get core microbiota phyloseq object
Get the phyloseq object with also sequences, phylo tree etc.
<- microbiome::core(sub_North,
(phyloseq_core_north detection = 0.0001,
prevalence = .5))
phyloseq-class experiment-level object
otu_table() OTU Table: [ 22 taxa and 9 samples ]
sample_data() Sample Data: [ 9 samples by 29 sample variables ]
tax_table() Taxonomy Table: [ 22 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 22 tips and 21 internal nodes ]
refseq() DNAStringSet: [ 22 reference sequences ]
See full taxanomy of core members
<- as.data.frame(phyloseq::tax_table(phyloseq_core_north))) (tax_mat
Domain Phylum
ASV1:Synechococcus CC9902 Bacteria Cyanobacteria
ASV2:Pseudoalteromonas Bacteria Proteobacteria
ASV3:Clade Ia Bacteria Proteobacteria
ASV4:Marine Group II Archaea Thermoplasmatota
ASV5:Clade Ia Bacteria Proteobacteria
ASV6:Clade II Bacteria Proteobacteria
ASV9:Clade Ia Bacteria Proteobacteria
ASV10:Clade Ia Bacteria Proteobacteria
ASV11:AEGEAN-169 marine group Bacteria Proteobacteria
ASV12:Prochlorococcus MIT9313.marinus Bacteria Cyanobacteria
ASV16:AEGEAN-169 marine group Bacteria Proteobacteria
ASV18:Clade Ib Bacteria Proteobacteria
ASV22:Clade Ia Bacteria Proteobacteria
ASV23:Clade Ia Bacteria Proteobacteria
ASV26:Chloroplast Bacteria Cyanobacteria
ASV30:Marine Group II Archaea Thermoplasmatota
ASV32:Dadabacteriales Bacteria Dadabacteria
ASV33:SAR324 clade(Marine group B) Bacteria SAR324 clade(Marine group B)
ASV35:Clade IV Bacteria Proteobacteria
ASV37:Marine Group III Archaea Thermoplasmatota
ASV49:SAR202 clade Bacteria Chloroflexi
ASV53:AEGEAN-169 marine group Bacteria Proteobacteria
Class Order
ASV1:Synechococcus CC9902 Cyanobacteriia Synechococcales
ASV2:Pseudoalteromonas Gammaproteobacteria Enterobacterales
ASV3:Clade Ia Alphaproteobacteria SAR11 clade
ASV4:Marine Group II Thermoplasmata Marine Group II
ASV5:Clade Ia Alphaproteobacteria SAR11 clade
ASV6:Clade II Alphaproteobacteria SAR11 clade
ASV9:Clade Ia Alphaproteobacteria SAR11 clade
ASV10:Clade Ia Alphaproteobacteria SAR11 clade
ASV11:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
ASV12:Prochlorococcus MIT9313.marinus Cyanobacteriia Synechococcales
ASV16:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
ASV18:Clade Ib Alphaproteobacteria SAR11 clade
ASV22:Clade Ia Alphaproteobacteria SAR11 clade
ASV23:Clade Ia Alphaproteobacteria SAR11 clade
ASV26:Chloroplast Cyanobacteriia Chloroplast
ASV30:Marine Group II Thermoplasmata Marine Group II
ASV32:Dadabacteriales Dadabacteriia Dadabacteriales
ASV33:SAR324 clade(Marine group B) <NA> <NA>
ASV35:Clade IV Alphaproteobacteria SAR11 clade
ASV37:Marine Group III Thermoplasmata Marine Group III
ASV49:SAR202 clade Dehalococcoidia SAR202 clade
ASV53:AEGEAN-169 marine group Alphaproteobacteria Rhodospirillales
Family
ASV1:Synechococcus CC9902 Cyanobiaceae
ASV2:Pseudoalteromonas Pseudoalteromonadaceae
ASV3:Clade Ia Clade I
ASV4:Marine Group II <NA>
ASV5:Clade Ia Clade I
ASV6:Clade II Clade II
ASV9:Clade Ia Clade I
ASV10:Clade Ia Clade I
ASV11:AEGEAN-169 marine group AEGEAN-169 marine group
ASV12:Prochlorococcus MIT9313.marinus Cyanobiaceae
ASV16:AEGEAN-169 marine group AEGEAN-169 marine group
ASV18:Clade Ib Clade I
ASV22:Clade Ia Clade I
ASV23:Clade Ia Clade I
ASV26:Chloroplast <NA>
ASV30:Marine Group II <NA>
ASV32:Dadabacteriales <NA>
ASV33:SAR324 clade(Marine group B) <NA>
ASV35:Clade IV Clade IV
ASV37:Marine Group III <NA>
ASV49:SAR202 clade <NA>
ASV53:AEGEAN-169 marine group AEGEAN-169 marine group
Genus Species
ASV1:Synechococcus CC9902 Synechococcus CC9902 <NA>
ASV2:Pseudoalteromonas Pseudoalteromonas <NA>
ASV3:Clade Ia Clade Ia <NA>
ASV4:Marine Group II <NA> <NA>
ASV5:Clade Ia Clade Ia <NA>
ASV6:Clade II <NA> <NA>
ASV9:Clade Ia Clade Ia <NA>
ASV10:Clade Ia Clade Ia <NA>
ASV11:AEGEAN-169 marine group <NA> <NA>
ASV12:Prochlorococcus MIT9313.marinus Prochlorococcus MIT9313 marinus
ASV16:AEGEAN-169 marine group <NA> <NA>
ASV18:Clade Ib Clade Ib <NA>
ASV22:Clade Ia Clade Ia <NA>
ASV23:Clade Ia Clade Ia <NA>
ASV26:Chloroplast <NA> <NA>
ASV30:Marine Group II <NA> <NA>
ASV32:Dadabacteriales <NA> <NA>
ASV33:SAR324 clade(Marine group B) <NA> <NA>
ASV35:Clade IV <NA> <NA>
ASV37:Marine Group III <NA> <NA>
ASV49:SAR202 clade <NA> <NA>
ASV53:AEGEAN-169 marine group <NA> <NA>
8.0.7 Visualise core microbiome with microbiome::plot_core()
Visualise the core microbiome of North samples
::plot_core(phyloseq_core_north,
microbiomeplot.type = "heatmap",
colours = rev(RColorBrewer::brewer.pal(8, "RdBu")),
prevalences = seq(from = 0, to = 1, by = .1),
detections = seq(from = 0.1, to = 5, by = 0.2)) +
scale_x_discrete(guide = guide_axis(n.dodge = 2))+
xlab("Detection Threshold (Relative Abundance (%))") +
ylab("ASVs")
Warning in microbiome::plot_core(phyloseq_core_north, plot.type = "heatmap", : The plot_core function is typically used with compositional
data. The data is not compositional. Make sure that you
intend to operate on non-compositional data.
Do the same for the South samples .. please!
What are your conclusions about the comparison between North & South core micobiota at the ASV level?