Introduction to phyloseq

Nicolas Henry

ASV table

S11B S1B S2B S2S S3B S3S S4B S4S S5B S5S S6B S6S S7B S7S S8B S8S S9B S9S
ASV_001 111 60 45 104 60 78 13 66 180 178 155 70 48 79 52 67 74 88
ASV_002 110 22 52 78 48 0 46 41 62 36 60 32 57 49 60 91 48 66
ASV_003 24 0 0 84 9 0 0 7 16 0 90 295 108 43 60 10 77 73
ASV_004 88 55 0 96 49 39 47 61 0 51 81 33 0 0 55 61 87 0
ASV_005 75 0 0 12 0 26 31 0 70 60 28 23 80 64 77 55 60 20
ASV_006 58 0 0 35 0 0 0 0 56 36 73 47 26 75 59 55 56 64
ASV_007 32 0 0 74 0 0 10 10 14 0 56 73 78 36 66 24 52 70
ASV_008 39 31 54 42 67 0 45 0 0 0 33 19 30 45 45 34 36 54
ASV_009 0 57 39 0 50 45 42 47 94 38 0 0 0 48 0 0 0 33
ASV_010 0 58 60 0 67 50 43 150 0 62 0 0 0 0 0 0 0 0
ASV_011 0 26 0 46 0 35 0 30 31 28 44 25 37 27 30 40 33 54
ASV_012 0 50 32 0 59 96 59 120 0 0 0 0 0 0 0 0 0 0
ASV_013 24 0 0 0 0 0 0 0 0 100 69 0 0 104 0 0 0 83
ASV_014 0 29 0 0 0 92 52 94 34 31 0 0 0 0 0 22 0 0
ASV_015 0 56 0 0 49 54 0 70 54 29 0 0 0 0 0 0 0 0
ASV_016 20 0 0 61 0 0 0 0 0 0 22 18 67 19 47 0 25 14
ASV_017 0 43 57 0 59 42 0 63 0 0 0 0 0 0 0 0 0 0
ASV_018 0 60 64 0 0 20 0 76 0 0 0 0 0 0 0 0 0 0
ASV_019 42 0 0 10 0 0 0 0 0 0 8 21 11 14 22 24 35 23
ASV_020 58 0 0 37 0 0 0 0 0 0 0 0 23 25 0 20 25 0

Information about one ASV

  • Taxonomy
taxonomy[1,]
        Kingdom  Phylum        Class          Order           Family       Genus                Species
ASV_001 Bacteria Cyanobacteria Cyanobacteriia Synechococcales Cyanobiaceae Synechococcus CC9902 NA
  • Sequence
asv_seq[["ASV_001"]]
406-letter DNAString object
seq: TGGGGAATTTTCCGCAATGGGCGAAAGCCTGACGGA...TGACGCTCATGGACGAAAGCCAGGGGAGCGAAAGGG

Information about the samples

Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S Sigma_t
S1B North North1B NBF 52 660 32195 10675 955 115 1.813 0.256 0.889 0.324 0.132 9.946 3.565 0.0000 22.7338 37.6204 26.0046
S2B North North2B NBF 59 890 25480 16595 670 395 2.592 0.105 1.125 0.328 0.067 9.378 3.391 0.0000 22.6824 37.6627 26.0521
S2S North North2S NBS 0 890 25480 16595 670 395 3.381 0.231 0.706 0.450 0.109 8.817 3.345 0.0000 22.6854 37.6176 26.0137
S3B North North3B NBF 74 835 13340 25115 1115 165 1.438 0.057 1.159 0.369 0.174 8.989 2.568 0.0000 21.5296 37.5549 26.2987
S3S North North3S NBS 0 715 26725 16860 890 200 1.656 0.098 0.794 0.367 0.095 7.847 2.520 0.0000 22.5610 37.5960 26.0332
S4B North North4B NBF 78 2220 3130 29835 2120 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542 26.9415
S4S North North4S NBS 78 2220 3130 29835 2120 235 2.457 0.099 1.087 0.349 0.137 8.689 3.129 0.0000 18.8515 37.4542 26.9415
S5B North North5B NBF 42 1620 55780 23795 2555 1355 2.028 0.103 1.135 0.216 0.128 8.623 3.137 0.0102 24.1905 38.3192 26.1037
S5S North North5S NBS 0 1620 56555 22835 2560 945 2.669 0.136 0.785 0.267 0.114 9.146 3.062 0.0000 24.1789 38.3213 26.1065
S6B South South1B SGF 13 2520 39050 705 3630 1295 2.206 0.249 0.768 0.629 0.236 9.013 3.455 0.0000 22.0197 39.0877 27.3241
S6S South South1S SGS 0 2435 35890 915 3735 1300 3.004 0.251 0.927 0.653 0.266 8.776 3.230 0.0134 22.0515 39.0884 27.3151
S7B South South2B SGF 26 0 0 0 4005 1600 3.016 0.157 0.895 0.491 0.176 8.968 4.116 0.0000 23.6669 38.9699 26.7536
S7S South South2S SGS 0 4535 26545 1340 6585 1355 1.198 0.165 1.099 0.432 0.180 8.256 3.182 0.0000 23.6814 38.9708 26.7488
S8B South South3B SGF 33 0 0 0 5910 1590 3.868 0.253 0.567 0.533 0.169 8.395 3.126 0.0000 23.1236 39.0054 26.9423
S8S South South3S SGS 0 4260 36745 985 5470 2265 3.639 0.255 0.658 0.665 0.247 8.991 3.843 0.0132 23.3147 38.9885 26.8713
S9B South South4B SGF 25 4000 31730 485 4395 1180 3.910 0.107 0.672 0.490 0.134 8.954 4.042 0.0172 22.6306 38.9094 27.0131
S9S South South4S SGS 0 5465 32860 820 5045 1545 3.607 0.139 0.644 0.373 0.167 9.817 3.689 0.0062 22.9545 38.7777 26.8172
S11B South South5B SGF 35 5370 46830 580 6010 1690 3.324 0.083 0.756 0.467 0.115 9.539 4.138 0.0182 23.0308 38.9967 26.9631

How do we get a FASTA file of the Cyanobacteria ASVs found in the South samples?

Subset samples

south_context <- subset(context, Geo == "South")


Geo Description groupe Pres PicoEuk Synec Prochloro NanoEuk Crypto SiOH4 NO2 NO3 NH4 PO4 NT PT Chla T S Sigma_t
S6B South South1B SGF 13 2520 39050 705 3630 1295 2.206 0.249 0.768 0.629 0.236 9.013 3.455 0.0000 22.0197 39.0877 27.3241
S6S South South1S SGS 0 2435 35890 915 3735 1300 3.004 0.251 0.927 0.653 0.266 8.776 3.230 0.0134 22.0515 39.0884 27.3151
S7B South South2B SGF 26 0 0 0 4005 1600 3.016 0.157 0.895 0.491 0.176 8.968 4.116 0.0000 23.6669 38.9699 26.7536
S7S South South2S SGS 0 4535 26545 1340 6585 1355 1.198 0.165 1.099 0.432 0.180 8.256 3.182 0.0000 23.6814 38.9708 26.7488
S8B South South3B SGF 33 0 0 0 5910 1590 3.868 0.253 0.567 0.533 0.169 8.395 3.126 0.0000 23.1236 39.0054 26.9423
S8S South South3S SGS 0 4260 36745 985 5470 2265 3.639 0.255 0.658 0.665 0.247 8.991 3.843 0.0132 23.3147 38.9885 26.8713
S9B South South4B SGF 25 4000 31730 485 4395 1180 3.910 0.107 0.672 0.490 0.134 8.954 4.042 0.0172 22.6306 38.9094 27.0131
S9S South South4S SGS 0 5465 32860 820 5045 1545 3.607 0.139 0.644 0.373 0.167 9.817 3.689 0.0062 22.9545 38.7777 26.8172
S11B South South5B SGF 35 5370 46830 580 6010 1690 3.324 0.083 0.756 0.467 0.115 9.539 4.138 0.0182 23.0308 38.9967 26.9631


south_samples <- row.names(south_context)
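The two calls above can be tried on a small example. The sketch below uses a made-up contextual table (`toy_context` and its values are hypothetical, not the course data) to show how `subset()` keeps the rows matching a condition and how `row.names()` then gives the names of the selected samples.

```r
# Toy contextual table: sample names are stored as row names (made-up values)
toy_context <- data.frame(
  Geo  = c("North", "South", "South"),
  Chla = c(0.00, 0.01, 0.00),
  row.names = c("S1B", "S6B", "S7B")
)

# Keep only the rows whose Geo column equals "South"
toy_south <- subset(toy_context, Geo == "South")

# The row names of the subset are the names of the selected samples
toy_south_samples <- row.names(toy_south)
print(toy_south_samples)
```

This only works as intended when the sample names are stored as row names, which is assumed here as in the `context` table.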

Which ASVs are Cyanobacteria?

cyano_taxo <- subset(taxonomy, Phylum == "Cyanobacteria")


        Kingdom  Phylum        Class          Order           Family       Genus                   Species
ASV_001 Bacteria Cyanobacteria Cyanobacteriia Synechococcales Cyanobiaceae Synechococcus CC9902    NA
ASV_012 Bacteria Cyanobacteria Cyanobacteriia Synechococcales Cyanobiaceae Prochlorococcus MIT9313 marinus
ASV_021 Bacteria Cyanobacteria Cyanobacteriia Chloroplast     NA           NA                      NA
ASV_044 Bacteria Cyanobacteria Cyanobacteriia Synechococcales Cyanobiaceae Prochlorococcus MIT9313 marinus
ASV_062 Bacteria Cyanobacteria Cyanobacteriia Chloroplast     NA           NA                      NA
ASV_063 Bacteria Cyanobacteria Cyanobacteriia Chloroplast     NA           NA                      NA
ASV_070 Bacteria Cyanobacteria Cyanobacteriia Synechococcales Cyanobiaceae Prochlorococcus MIT9313 marinus
ASV_073 Bacteria Cyanobacteria Cyanobacteriia Chloroplast     NA           NA                      NA
ASV_120 Bacteria Cyanobacteria Cyanobacteriia Synechococcales Cyanobiaceae Cyanobium PCC-6307      NA


cyano_asvs <- row.names(cyano_taxo)

Subset the ASV table

asv_table_cyano <- asv_table[cyano_asvs, south_samples]


S6B S6S S7B S7S S8B S8S S9B S9S S11B
ASV_001 155 70 48 79 52 67 74 88 111
ASV_012 0 0 0 0 0 0 0 0 0
ASV_021 13 9 16 13 21 13 19 9 6
ASV_044 0 0 0 0 0 0 0 16 0
ASV_062 0 0 0 11 0 12 0 0 0
ASV_063 0 0 0 0 0 0 0 13 0
ASV_070 0 0 0 0 0 0 0 0 0
ASV_073 0 0 0 0 0 0 0 10 0
ASV_120 0 0 0 0 0 0 0 9 0
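Indexing a table by row and column names, as done above, can be sketched on a tiny matrix. The object `toy_asv` and its counts are made up for illustration. The `drop = FALSE` argument is an addition not used in the slides: it keeps the result two-dimensional even if only one ASV or one sample is selected.

```r
# Toy abundance table: ASVs in rows, samples in columns (made-up counts)
toy_asv <- matrix(
  c(10, 0, 5,
     0, 0, 0,
     3, 7, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV_a", "ASV_b", "ASV_c"), c("S1B", "S6B", "S7B"))
)

# Select rows and columns by name; the order of the names sets the output order
toy_sub <- toy_asv[c("ASV_a", "ASV_b"), c("S6B", "S7B"), drop = FALSE]
print(toy_sub)
```

Asking for a row or column name that is not in the table raises an error, which is a useful safeguard against typos in sample or ASV names.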

Cyanobacteria present in south samples

cyano_total <- apply(asv_table_cyano, 1, sum)


        total
ASV_001   744
ASV_012     0
ASV_021   119
ASV_044    16
ASV_062    23
ASV_063    13
ASV_070     0
ASV_073    10
ASV_120     9


cyano_subset <- names(cyano_total)[cyano_total > 0]
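The filtering step above sums the counts of each ASV across samples and keeps the ASVs with a non-zero total. As a minimal sketch on made-up counts (`toy_counts` is hypothetical), the code below shows that `rowSums()` gives the same totals as `apply(x, 1, sum)` and how the names of the ASVs that are present are recovered.

```r
# Toy abundance table restricted to a few samples (made-up counts)
toy_counts <- matrix(
  c(4, 0, 1,
    0, 0, 0,
    2, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ASV_a", "ASV_b", "ASV_c"), c("S6B", "S7B", "S8B"))
)

# Total count per ASV, computed in two equivalent ways
totals_apply   <- apply(toy_counts, 1, sum)
totals_rowsums <- rowSums(toy_counts)

# Names of the ASVs observed at least once in these samples
toy_present <- names(totals_rowsums)[totals_rowsums > 0]
print(toy_present)
```

`rowSums()` is usually the more readable choice for this task, but both forms return a named vector that can be filtered in the same way.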

Extract sequences

asv_seq_cyano <- asv_seq[cyano_subset]


>ASV_001
TGGGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGGGATGAAGGCCTCTGGGCTGTAAACCTCTTTTATCAAGGAAGAAGATCTGACGGTACTTGATGAATAAGCCACGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGGAGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGTCCGCAGGCGGCCCTTCAAGTCTGCTGTTAAAAAGTGGAGCTTAACTCCATCATGGCAGTGGAAACTGTTGGGCTTGAGTGTGGTAGGGGCAGAGGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAAGAACACCAGTGGCGAAGGCGCTCTGCTGGGCCATCACTGACGCTCATGGACGAAAGCCAGGGGAGCGAAAGGG
>ASV_021
TGAGGAATTTTCCGCAATGGGCGCAAGCCTGACGGAGCAATACCGCGTGAGGGATGACGGCCTTTGGGTTGTAAACCTCTTTTTTCAAGGAGGAAGTTCTGACGTGTACTTGAAGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAAGACGGAGGATGCAAGTGTTATCCGGAATTATTGGGCGTAAAGCGTCTGTAGGTTGCCTAACAAGTCTGTTGTTAAAGGTTAAAGCTTAACTTTAAAACTGCAGCAGAAACTGCTAGGCTTGAGTACAGTCGAAGTAGAGGGAATTTCCAGTGAAGCGGTGAAATGCGTAGATATTGGAAGGAACACCAATGGCGAAAGCACTCTACTAGACTTTTACTGACACTCAGAGACGAAAGCTAGGGTAGCAAATGGG
>ASV_044
TGGGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGGGACGAAGGCCTCTGGGCTGTAAACCTCTTTTCTCAAGGAAGAAGATATGACGGTACTTGAGGAATAAGCCACGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGGAGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGTCCGCAGGCGGCTTTTCAAGTCTGCTGTTAAAGCGTGGAGCTTAACTCCATCATGGCAGTGGAAACTGAAAGGCTTGAGTATGGTAGGGGCAGAGGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAAGAACACCAGTGGCGAAGGCGCTCTGCTGGGCCATTACTGACGCTCATGGACGAAAGCCAGGGGAGCGAAAGGG
>ASV_062
TGGGGAATTTTCCGCAATGGACGAAAGTCTGACGGAGCGACGCTGCGTGAAGGATGACGGCCTGCGGGTTGTAAACTTCTTTTCTCGAAGAAGAAGCTCTGACGGTATTCGAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATGCAAGTGTTATTCGGAATGATTGGGCGTAAAGCGTCTGTAGGCTGTATAGAAAGTCTTTTGTTAAATGCCTCGGCTCAACCGAGATCCAGCAAAGGAAACTTCTATACTTGAGGGAAGTAGAGGTACAGGGAATTCCCGGTGGAGCGGTGAAATGCGTAGATATCGGGAGGAACACCAATATGGCGAAGGCACTGTACTGGGCTTTACCTGACGCTCAGAGACGAAAGCTAAAGGAGTGATTAGG
>ASV_063
TGAGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAATACCGCGTGAGGGATGAAGGATTTTGGTCTGTAAACCTCTTTTCTCAAGAAAGAAGTTCTGACGGTACTTGAGGAATAAGCATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGGGGATGCAAGCGTTATCCGGAATCATTGGGCGTAAAGCGCCTGTAGGTTGTTTAATAAGTCTGTTGTTAAAGACTGGGGCTTAACCCCAGGAAAGCAATGGAAACTACTAGACTAGAGTATGGTAGGGGTAAAGGGAATTTCTAGTGTAGCGGTGAAATGCGTAGATATTAGAAAGAACACCGGTGGCGAAAGCGCTTTACTGGACCATTACTGACACTCAGAGGCGAAAGCTAGGGTAGCCAAAGGG
>ASV_073
TGGGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCGACGCTGCGTGAAGGATGACGGCCTGAGGGTTGTAAACTTCTTTTCTCGAAGAAGAATCAATGACGGTATTCGAGGAATAAGCATCGGCTAACTCTGTGCCAGCAGCCGCGGTAAGACAGAGGATGCAAGTGTTATTCGGATTGATTGGGCGTAAAGCGTCTGTAGGCGGTTTAGAAAGTCTTTTGTGAAATACTTCAGCTCAACTGGGGCTCCGCAAAAGAAACTTCTAGACTTGAGGGAAGTAGAGGTACAGGGAATTTCCGGTGGAGCGGTGAAATGCGTAGATATCGGAAGGAACACCAATATGGCGAAGGCACTGTACTGGGCTTTACCTGACGCTAAGAGACGAAAGCTAAAGGAGTGATTAGG
>ASV_120
TGGGGAATTTTCCGCAATGGGCGAAAGCCTGACGGAGCAACGCCGCGTGAGGGATGAAGGCCTCTGGGCTGTAAACCTCTTTTATCAAGGAAGAAGATCTGACGGTACTTGATGAATAAGCCACGGCTAATTCCGTGCCAGCAGCCGCGGTAATACGGGAGTGGCAAGCGTTATCCGGAATTATTGGGCGTAAAGCGTCCGCAGGCGGCCTGACAAGTCTGCTGTTAAAGCGTGGAGCTTAACTCCATTTCAGCAGTGGAAACTGTCAGGCTTGAGTGTGGTAGGGGCAGAGGGAATTCCCGGTGTAGCGGTGAAATGCGTAGATATCGGGAAGAACACCAGTGGCGAAAGCGCTCTGCTGGGCCATCACTGACGCTCATGGACGAAAGCCAGGGGAGCGAAAGGG
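The selected sequences still have to be written to a file to obtain the FASTA file. A minimal sketch, assuming the sequences are stored as a Biostrings `DNAStringSet` (the slides show `DNAString` objects, so this seems likely but is not confirmed): `toy_seq` holds made-up, shortened sequences and the output path is a temporary file chosen for illustration.

```r
library(Biostrings)

# Toy sequences standing in for asv_seq_cyano (made-up and much shorter than real ASVs)
toy_seq <- DNAStringSet(c(
  ASV_a = "TGGGGAATTTTCCGCAATGG",
  ASV_c = "TGAGGAATTTTCCGCAATGG"
))

# Write the sequences to a FASTA file (here a temporary file)
fasta_path <- tempfile(fileext = ".fasta")
writeXStringSet(toy_seq, filepath = fasta_path, format = "fasta")

# Read the file back to check that names and sequences were preserved
toy_back <- readDNAStringSet(fasta_path)
print(names(toy_back))
```

With the course objects, the same call would take `asv_seq_cyano` and a file name of your choice in place of `toy_seq` and `fasta_path`.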

Go fancy with a phylogenetic tree

A complex data structure

[Diagram: the ASV table (abundance information) is linked to the samples' contextual data and to the sequences; the sequences are linked to the taxonomy and to the phylogenetic tree.]

Could it be easier?

Yes!

The phyloseq package

phyloseq workflow

phyloseq object

Create our phyloseq object

library(phyloseq)

physeq <- phyloseq(
  otu_table(asv_table,taxa_are_rows = TRUE),
  tax_table(as.matrix(taxonomy)),
  sample_data(context),
  refseq(asv_seq),
  phy_tree(asv_tree)
)


phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 160 taxa and 18 samples ]
sample_data() Sample Data:       [ 18 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 160 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 160 tips and 158 internal nodes ]
refseq()      DNAStringSet:      [ 160 reference sequences ]
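To experiment with the constructor without the course data, the sketch below builds a very small phyloseq object from made-up tables (all `toy_*` objects and their values are hypothetical). It leaves out the sequences and the tree to stay short, and it assumes, as in the call above, that ASVs are in rows and that sample names are the row names of the contextual table.

```r
library(phyloseq)

taxa    <- c("ASV_a", "ASV_b", "ASV_c")
samples <- c("S1B", "S6B", "S7B")

# Abundance table as a numeric matrix, ASVs in rows (made-up counts)
toy_asv <- matrix(
  c(10, 0, 5,
     0, 0, 0,
     3, 7, 0),
  nrow = 3, byrow = TRUE, dimnames = list(taxa, samples)
)

# Taxonomy as a character matrix with one row per ASV
toy_tax <- matrix(
  c("Bacteria", "Cyanobacteria",
    "Bacteria", "Cyanobacteria",
    "Bacteria", "Proteobacteria"),
  nrow = 3, byrow = TRUE, dimnames = list(taxa, c("Kingdom", "Phylum"))
)

# Contextual data as a data frame with samples as row names
toy_context <- data.frame(
  Geo  = c("North", "South", "South"),
  Chla = c(0.00, 0.01, 0.00),
  row.names = samples
)

toy_physeq <- phyloseq(
  otu_table(toy_asv, taxa_are_rows = TRUE),
  tax_table(toy_tax),
  sample_data(toy_context)
)
print(toy_physeq)
```

The components are matched through their ASV and sample names, so these names need to agree across tables. Functions such as `ntaxa()`, `nsamples()` and `rank_names()` can then be used to check what the object contains.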

Redo our selection


subset_taxa(physeq,Phylum == "Cyanobacteria")


phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 9 taxa and 18 samples ]
sample_data() Sample Data:       [ 18 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 9 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 9 tips and 8 internal nodes ]
refseq()      DNAStringSet:      [ 9 reference sequences ]


subset_taxa(physeq,Phylum == "Cyanobacteria") |>
  subset_samples(Geo == "South") 


phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 9 taxa and 9 samples ]
sample_data() Sample Data:       [ 9 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 9 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 9 tips and 8 internal nodes ]
refseq()      DNAStringSet:      [ 9 reference sequences ]


subset_taxa(physeq,Phylum == "Cyanobacteria") |>
  subset_samples(Geo == "South") |>
  filter_taxa(function(x) sum(x) > 0, prune = TRUE)


phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 7 taxa and 9 samples ]
sample_data() Sample Data:       [ 9 samples by 20 sample variables ]
tax_table()   Taxonomy Table:    [ 7 taxa by 7 taxonomic ranks ]
phy_tree()    Phylogenetic Tree: [ 7 tips and 6 internal nodes ]
refseq()      DNAStringSet:      [ 7 reference sequences ]
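The filtered object still carries its sequences, so the original goal can be reached by reading them back with `refseq()`. The sketch below replays the same three steps on a made-up object (all `toy_*` names and values are hypothetical); it passes the `DNAStringSet` directly to `phyloseq()`, which is one accepted way to attach reference sequences.

```r
library(phyloseq)
library(Biostrings)

taxa    <- c("ASV_a", "ASV_b", "ASV_c", "ASV_d")
samples <- c("S1B", "S6B", "S7B")

# Made-up counts: ASV_b is absent from the South samples (S6B, S7B)
toy_asv <- matrix(
  c(10, 0, 5,
     4, 0, 0,
     0, 2, 1,
     3, 7, 0),
  nrow = 4, byrow = TRUE, dimnames = list(taxa, samples)
)
toy_tax <- matrix(
  c("Bacteria", "Cyanobacteria",
    "Bacteria", "Cyanobacteria",
    "Bacteria", "Cyanobacteria",
    "Bacteria", "Proteobacteria"),
  nrow = 4, byrow = TRUE, dimnames = list(taxa, c("Kingdom", "Phylum"))
)
toy_context <- data.frame(
  Geo  = c("North", "South", "South"),
  Chla = c(0.00, 0.01, 0.00),
  row.names = samples
)
# Made-up short sequences, named after the ASVs
toy_seq <- DNAStringSet(c(
  ASV_a = "TGGGGAATTT", ASV_b = "TGGGGAATTC",
  ASV_c = "TGAGGAATTT", ASV_d = "TGGGGAATAT"
))

toy_physeq <- phyloseq(
  otu_table(toy_asv, taxa_are_rows = TRUE),
  tax_table(toy_tax),
  sample_data(toy_context),
  toy_seq
)

# Same selection as above: Cyanobacteria, South samples, non-zero totals
toy_cyano_south <- subset_taxa(toy_physeq, Phylum == "Cyanobacteria") |>
  subset_samples(Geo == "South") |>
  filter_taxa(function(x) sum(x) > 0, prune = TRUE)

# Sequences of the ASVs that remain after filtering
toy_cyano_seq <- refseq(toy_cyano_south)
print(names(toy_cyano_seq))
```

The resulting `DNAStringSet` can be passed to `Biostrings::writeXStringSet()` to produce the FASTA file.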

subset_taxa(physeq,Phylum == "Cyanobacteria") |>
  subset_samples(Geo == "South") |>
  filter_taxa(function(x) sum(x) > 0, prune = TRUE) |>
  plot_bar(fill="Genus")


phyloseq in this course

  • Create a phyloseq object from your data
  • Explore the taxonomic composition of your communities
  • Compare community composition with ordinations

Now it is your turn!

References

McMurdie, Paul J., and Susan Holmes. 2013. “Phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data.” PLOS ONE 8 (4): 1–11. https://doi.org/10.1371/journal.pone.0061217.