library(phyloseq)
Preprocessing : phyloseq
1 Prepare workspace
Load phyloseq
We import the ASV table, the taxonomic assignment results and the sequences from the text files we exported in the dada2 practical.
Define where the previous practical outputs are located:
<- here::here("outputs", "dada2", "asv_table") input_dir
Import the ASV table:
<- read.table(file = file.path(input_dir, "asv_table.tsv"),
asv_table header = TRUE,
sep = "\t",
row.names = 1)
the results of the taxonomic assignment:
<- read.table(file = file.path(input_dir, "taxonomy.tsv"),
taxonomy header = TRUE,
sep = "\t",
row.names = 1)
and the ASV sequences:
<- Biostrings::readDNAStringSet(
asv_seq filepath = file.path(input_dir, "asv.fasta"),
format = "fasta"
)
We will also need some information about the samples, the file is located is another folder.
<- read.table(here::here("data",
context "context",
"mapfileFA.txt"),
header = TRUE,
row.names = 1)
2 Get a phyloseq object
2.1 Check sample file
Make sure sample names in the ASV table…
colnames(asv_table) |> sort()
[1] "S11B" "S1B" "S2B" "S2S" "S3B" "S3S" "S4B" "S4S" "S5B" "S5S"
[11] "S6B" "S6S" "S7B" "S7S" "S8B" "S8S" "S9B" "S9S"
match sample table ids.
row.names(context) |> sort()
[1] "S11B" "S1B" "S2B" "S2S" "S3B" "S3S" "S4B" "S4S" "S5B" "S5S"
[11] "S6B" "S6S" "S7B" "S7S" "S8B" "S8S" "S9B" "S9S"
You can do it in a more formal way using the function setdiff()
. This function returns the elements of x
not present in y
.
setdiff(x = colnames(asv_table),
y = row.names(context))
character(0)
Perfect! The ASV table sample names match with the contextual data table.
2.2 Assemble ASV table, taxonomy and contextual data
Use the phyloseq::phyloseq()
function to create a phyloseq object. A phyloseq object is usualy composed by an ASV table, a taxonomy table and a table describing the samples. You can also add ASV sequences and a phylogenetic tree
<- phyloseq::phyloseq(
physeq ::otu_table(asv_table, taxa_are_rows = TRUE),
phyloseq::tax_table(as.matrix(taxonomy)),
phyloseq::sample_data(context),
phyloseq::refseq(asv_seq)
phyloseq )
3 Add a phylogenetic tree
3.1 Why?
Knowing the ASVs phylogenetic relatedness will help you to have a better understanding of the communities your studying.
To quote the evolutionary biologist Theodosius Dobzhansky:
Nothing in Biology Makes Sense Except in the Light of Evolution.
A phylogenetic tree reconstructed from the ASV sequences will be used to measure their relatedness.
3.2 Alignment using DECIPHER
Before reconstructing a phylogenetic tree we need to align the ASVs sequences.
<- refseq(physeq) |>
aln ::AlignSeqs(anchor = NA) DECIPHER
Once it is done, you can visualise the alignment using the function DECIPHER::BrowseSeqs()
::BrowseSeqs(aln, highlight = 0) DECIPHER
3.3 Infering the phylogenetic tree
We will infer a phylogenetic from our alignement using the library phangorn
.
First, let’s convert our DNAStringSet
alignment to the phangorn
phyDat
format.
<- as.matrix(aln) |> phangorn::phyDat(type = "DNA") phang_align
Then, we compute pairwise distances of our aligned sequences using equal base frequencies (JC69 model used by default).
<- phangorn::dist.ml(phang_align, model = "JC69") dm
Finally, we reconstruct a neighbour joining tree.
<- phangorn::NJ(dm) treeNJ
Other approaches to reconstruct a phylogenetic tree exist. If you want to try them with phangorn
, have a look here
We need the tree to be rooted for future analysis. We can do that using the function phangorn::midpoint()
<- phangorn::midpoint(tree = treeNJ) treeNJ
Once we have a rooted tree, we can add it to the phyloseq object.
<- phyloseq::merge_phyloseq(physeq,treeNJ) physeq
3.4 If I do not have a tree
For some reasons, it is sometimes not relevant or not possible to infer a tree from our data.
For example, the metabarcode you are using is not carrying enough phylogenetic information to reconstruct a tree.
Or you have so many ASVs that infering a tree would require more computational ressource that what you can afford.
In that case, it is fine. You will still be able to perform most of the analyses introduced in the alpha and beta diversity practicals.
4 Extract information from a phyloseq object
Here a list of phyloseq
functions to extract diverse information from your phyloseq
object.
- Get the total number of ASVs
::ntaxa(physeq) phyloseq
[1] 160
- Get the total number of samples
::nsamples(physeq) phyloseq
[1] 18
- Get sample names
::sample_names(physeq) phyloseq
[1] "S11B" "S1B" "S2B" "S2S" "S3B" "S3S" "S4B" "S4S" "S5B" "S5S"
[11] "S6B" "S6S" "S7B" "S7S" "S8B" "S8S" "S9B" "S9S"
- Get ASV identification
::taxa_names(physeq) |> head() phyloseq
[1] "ASV_001" "ASV_002" "ASV_003" "ASV_004" "ASV_005" "ASV_006"
- Get the names of variables in sam_data
::sample_variables(physeq) phyloseq
[1] "Geo" "Description" "groupe" "Pres" "PicoEuk"
[6] "Synec" "Prochloro" "NanoEuk" "Crypto" "SiOH4"
[11] "NO2" "NO3" "NH4" "PO4" "NT"
[16] "PT" "Chla" "T" "S" "Sigma_t"
- Display all variables with values
::get_variable(physeq) phyloseq
Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk Crypto SiOH4
S11B South South5B SGF 35 5370 46830 580 6010 1690 3.324
S1B North North1B NBF 52 660 32195 10675 955 115 1.813
S2B North North2B NBF 59 890 25480 16595 670 395 2.592
S2S North North2S NBS 0 890 25480 16595 670 395 3.381
S3B North North3B NBF 74 835 13340 25115 1115 165 1.438
S3S North North3S NBS 0 715 26725 16860 890 200 1.656
S4B North North4B NBF 78 2220 3130 29835 2120 235 2.457
S4S North North4S NBS 78 2220 3130 29835 2120 235 2.457
S5B North North5B NBF 42 1620 55780 23795 2555 1355 2.028
S5S North North5S NBS 0 1620 56555 22835 2560 945 2.669
S6B South South1B SGF 13 2520 39050 705 3630 1295 2.206
S6S South South1S SGS 0 2435 35890 915 3735 1300 3.004
S7B South South2B SGF 26 0 0 0 4005 1600 3.016
S7S South South2S SGS 0 4535 26545 1340 6585 1355 1.198
S8B South South3B SGF 33 0 0 0 5910 1590 3.868
S8S South South3S SGS 0 4260 36745 985 5470 2265 3.639
S9B South South4B SGF 25 4000 31730 485 4395 1180 3.910
S9S South South4S SGS 0 5465 32860 820 5045 1545 3.607
NO2 NO3 NH4 PO4 NT PT Chla T S Sigma_t
S11B 0.083 0.756 0.467 0.115 9.539 4.138 0.0182 23.0308 38.9967 26.9631
S1B 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204 26.0046
S2B 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627 26.0521
S2S 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176 26.0137
S3B 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549 26.2987
S3S 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960 26.0332
S4B 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542 26.9415
S4S 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542 26.9415
S5B 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192 26.1037
S5S 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213 26.1065
S6B 0.249 0.768 0.629 0.236 9.013 3.455 0.0000 22.0197 39.0877 27.3241
S6S 0.251 0.927 0.653 0.266 8.776 3.230 0.0134 22.0515 39.0884 27.3151
S7B 0.157 0.895 0.491 0.176 8.968 4.116 0.0000 23.6669 38.9699 26.7536
S7S 0.165 1.099 0.432 0.180 8.256 3.182 0.0000 23.6814 38.9708 26.7488
S8B 0.253 0.567 0.533 0.169 8.395 3.126 0.0000 23.1236 39.0054 26.9423
S8S 0.255 0.658 0.665 0.247 8.991 3.843 0.0132 23.3147 38.9885 26.8713
S9B 0.107 0.672 0.490 0.134 8.954 4.042 0.0172 22.6306 38.9094 27.0131
S9S 0.139 0.644 0.373 0.167 9.817 3.689 0.0062 22.9545 38.7777 26.8172
- Sum the reads by sample
::sample_sums(physeq) phyloseq
S11B S1B S2B S2S S3B S3S S4B S4S S5B S5S S6B S6S S7B S7S S8B S8S
879 835 783 929 786 904 808 1050 905 896 970 900 823 852 842 849
S9B S9S
787 873
- Sum the reads by species/taxa/OTUs (for all samples)
::taxa_sums(physeq) |> head() phyloseq
ASV_001 ASV_002 ASV_003 ASV_004 ASV_005 ASV_006
1528 958 896 803 681 640
- Sort species/taxa/OTUs by number of reads
::taxa_sums(physeq) |>
phyloseqsort(decreasing = TRUE) |>
head()
ASV_001 ASV_002 ASV_003 ASV_004 ASV_005 ASV_006
1528 958 896 803 681 640
- Get taxonomic ranks (useful for other commands)
::rank_names(physeq) phyloseq
[1] "Kingdom" "Phylum" "Class" "Order" "Family" "Genus" "Species"
- Show abundance of a given ASV
::get_sample(physeq, i = "ASV_001") phyloseq
S11B S1B S2B S2S S3B S3S S4B S4S S5B S5S S6B S6S S7B S7S S8B S8S
111 60 45 104 60 78 13 66 180 178 155 70 48 79 52 67
S9B S9S
74 88
- Show ASVs and abundance for a given sample
::get_taxa(physeq, i = "S2B") |> head() phyloseq
ASV_001 ASV_002 ASV_003 ASV_004 ASV_005 ASV_006
45 52 0 0 0 0
- Get phyla detected in your samples
::get_taxa_unique(physeq, taxonomic.rank = "Phylum") phyloseq
[1] "Cyanobacteria" "Proteobacteria"
[3] "Thermoplasmatota" "Actinobacteriota"
[5] "Bacteroidota" "Dadabacteria"
[7] "SAR324 clade(Marine group B)" "Chloroflexi"
[9] "Verrucomicrobiota" "Marinimicrobia (SAR406 clade)"
[11] "Desulfobacterota"
If you do not remember the name of the taxonomic ranks in your phyloseq object use phyloseq::rank_names(physeq)
- Number of ASVs per phylum
table(phyloseq::tax_table(physeq)[, "Phylum"])
Actinobacteriota Bacteroidota
14 11
Chloroflexi Cyanobacteria
2 9
Dadabacteria Desulfobacterota
2 1
Marinimicrobia (SAR406 clade) Proteobacteria
8 87
SAR324 clade(Marine group B) Thermoplasmatota
2 20
Verrucomicrobiota
4
You can also plot a barchart to visualise the results.
plot_bar(physeq, fill = "Phylum")
More about plots in the alpha and beta diversity practicals.