My experience using the nf-core/ampliseq pipeline for metabarcoding analysis

Charlotte Berthelier

September 30, 2025

Overview of Metabarcoding

Metabarcoding is a molecular technique used to identify and quantify the diversity of organisms in environmental samples by amplifying and sequencing specific genetic markers (barcodes) from environmental DNA (eDNA).

Metabarcoding Workflow

Environmental genomics

  1. Collect samples (water, sediment, tissue) containing eDNA from various organisms.
  2. Extract environmental DNA from the samples.
  3. Amplify specific genetic markers by PCR (e.g., 16S rRNA for bacteria, 18S rRNA for eukaryotes, COI for animals).
  4. Sequence the amplified DNA with high-throughput sequencing technologies (e.g., Illumina, PacBio).
  5. Process sequencing data to identify and quantify different species.

Advantages of using nf-core pipelines

  • Standardized Workflows:

    nf-core provides standardized, community-curated workflows for analyzing data, ensuring consistency and reproducibility.

  • Nextflow Integration:

    Built on Nextflow, a powerful workflow management system, allowing for scalable and portable data analysis.

Why nf-core/ampliseq?

* Pre-configured bioinformatics analysis workflow for amplicon sequencing, supporting multiple input formats.

* Developed and maintained by a global community of researchers and bioinformaticians, ensuring continuous improvement and updates.

* Comprehensive documentation and tutorials make it accessible for users of all skill levels.

Setting up the pipeline

Requirements to launch ampliseq on the BioPipes VM

  • Software dependencies: Nextflow
  • Nextflow config file for the VM: ampliseq_vm.config (or your own nextflow.config), which defines parameters specific to the BioPipes VM compute environment for nf-core pipelines (such as resourceLimits and partitions).
  • At least 8 CPUs are needed to run the pipeline

Files needed:

  • Raw FastQ files (.fastq.gz)
  • Samplesheet: .tsv, .csv or .yml file listing samples and paths to FastQ files.
  • Parameters file: .yml file containing pipeline-specific configurations (genomic references, tools, etc.).
  • Bash script (.sh) to submit the ampliseq pipeline to a SLURM cluster.

Samplesheet input

The sample sheet file can be tab-separated (.tsv), comma-separated (.csv), or in YAML format (.yml/.yaml) and can have two to four columns/entries with the following headers:

Header        Status    Description
sampleID      required  Unique sample identifiers
forwardReads  required  Paths to (forward) reads zipped FastQ files
reverseReads  optional  Paths to reverse reads zipped FastQ files, required if the data is paired-end
run           optional  If the data was produced by multiple sequencing runs
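
The header rules above can be checked before launching the pipeline. The following is a minimal sketch, not the pipeline's own validation: the function name `check_samplesheet` is hypothetical, and it only tests what this slide states (required headers, unique sample identifiers, a forward reads path on every row).

```python
import csv
import io

REQUIRED = ("sampleID", "forwardReads")
OPTIONAL = ("reverseReads", "run")


def check_samplesheet(handle, delimiter="\t"):
    """Return a list of problems found in a samplesheet (empty list = OK)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    problems = []

    for name in REQUIRED:
        if name not in header:
            problems.append(f"missing required column: {name}")
    for name in header:
        if name not in REQUIRED + OPTIONAL:
            problems.append(f"unexpected column: {name}")
    if problems:
        return problems  # header is unusable, so skip the row checks

    seen = set()
    for line_no, row in enumerate(reader, start=2):
        sample = (row["sampleID"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sampleID")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sampleID {sample}")
        seen.add(sample)
        if not (row["forwardReads"] or "").strip():
            problems.append(f"line {line_no}: empty forwardReads")
    return problems


# Two rows taken from the samplesheet example in this deck.
example = io.StringIO(
    "sampleID\trun\tforwardReads\treverseReads\n"
    "S11B\trun_01\tdata/raw/S11B_R1.fastq.gz\tdata/raw/S11B_R2.fastq.gz\n"
    "S1B\trun_01\tdata/raw/S1B_R1.fastq.gz\tdata/raw/S1B_R2.fastq.gz\n"
)
print(check_samplesheet(example))
```

For a real file, open it with `open("samplesheet.tsv", newline="")` and pass the handle; use `delimiter=","` for a .csv samplesheet.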

Samplesheet input (example)

samplesheet.tsv

sampleID run forwardReads reverseReads
S11B run_01 data/raw/S11B_R1.fastq.gz data/raw/S11B_R2.fastq.gz
S1B run_01 data/raw/S1B_R1.fastq.gz data/raw/S1B_R2.fastq.gz
S2B run_01 data/raw/S2B_R1.fastq.gz data/raw/S2B_R2.fastq.gz
S2S run_01 data/raw/S2S_R1.fastq.gz data/raw/S2S_R2.fastq.gz
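
Writing such a file by hand gets tedious with many samples. As a rough sketch, assuming the files follow the `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming seen in the example above, the rows could be generated from a list of paths. The helper names `build_samplesheet` and `to_tsv` are hypothetical, and a single run label is assumed for all samples.

```python
import csv
import io
from pathlib import PurePosixPath

COLUMNS = ["sampleID", "run", "forwardReads", "reverseReads"]


def build_samplesheet(paths, run="run_01"):
    """Pair *_R1/_R2.fastq.gz files and return one row per sample."""
    forward, reverse = {}, {}
    for path in paths:
        name = PurePosixPath(path).name
        if name.endswith("_R1.fastq.gz"):
            forward[name[: -len("_R1.fastq.gz")]] = path
        elif name.endswith("_R2.fastq.gz"):
            reverse[name[: -len("_R2.fastq.gz")]] = path
    rows = []
    for sample in sorted(forward):
        rows.append({
            "sampleID": sample,
            "run": run,
            "forwardReads": forward[sample],
            # Left empty when no reverse file was found (single-end data).
            "reverseReads": reverse.get(sample, ""),
        })
    return rows


def to_tsv(rows):
    """Serialize the rows as tab-separated text with a header line."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


files = [
    "data/raw/S11B_R1.fastq.gz", "data/raw/S11B_R2.fastq.gz",
    "data/raw/S1B_R1.fastq.gz", "data/raw/S1B_R2.fastq.gz",
]
print(to_tsv(build_samplesheet(files)), end="")
```

On disk, the list of paths could come from `sorted(str(p) for p in pathlib.Path("data/raw").glob("*.fastq.gz"))`.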

Parameter file (example)

params_ampliseq.yaml

input: "input/samplesheet.tsv"
FW_primer: "GTGYCAGCMGCCGCGGTAA"
RV_primer: "CCGYCAATTYMTTTRAGTTT"
outdir: "output/ampliseq"

# Read trimming and quality filtering
max_ee: 2
trunclenf: 200
trunclenr: 200
trunc_qmin: 2

# Amplicon Sequence Variants (ASV) calculation
sample_inference: "pooled"

# Taxonomic db
dada_ref_taxonomy: "pr2=5.0.0"

# Asv filtering
min_frequency: 2
min_samples: 3

# Generic options
# Max job request
max_cpus: 32
max_memory: "100.GB"
max_time: "720.h"
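
To give an intuition for max_ee and the truncation lengths: the "expected errors" of a read is the sum of its per-base error probabilities, each derived from the Phred quality score Q as 10^(-Q/10). The sketch below is an illustration of that arithmetic only, not the DADA2 code the pipeline runs; it assumes Phred+33 quality encoding, and the function names are hypothetical.

```python
def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a FASTQ quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def passes_filter(quality, max_ee=2, trunc_len=200):
    """Truncate the read to trunc_len, then compare its expected errors to max_ee."""
    if len(quality) < trunc_len:
        return False  # reads shorter than the truncation length are discarded
    return expected_errors(quality[:trunc_len]) <= max_ee


high_quality = "I" * 250  # 'I' encodes Q40, i.e. an error probability of 0.0001
low_quality = "+" * 250   # '+' encodes Q10, i.e. an error probability of 0.1

print(round(expected_errors(high_quality[:200]), 4))  # 200 * 0.0001
print(passes_filter(high_quality), passes_filter(low_quality))
```

With these settings, a 200-base read can carry an average error probability of at most 0.01 per base (roughly Q20) and still pass.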

Submit the ampliseq pipeline to a SLURM cluster (example)

Using bash script: ampliseq.sh

#!/bin/bash

nextflow run nf-core/ampliseq \
  -bg \
  -r 2.14.0 \
  -profile docker \
  -c "practicals/ampliseq_files/ampliseq_vm.config" \
  -params-file practicals/ampliseq_files/ampliseq_parameters_simple.yaml


# Path to files:
# ├── ampliseq.sh
# └── practicals
#     └── ampliseq_files
#         ├── samplesheet.tsv
#         ├── ampliseq_vm.config
#         └── ampliseq_parameters_simple.yaml

Running time

Approximately 30 minutes for ANF training data

The longest step is the taxonomic assignment.

Output files (1)

  • summary_report.html: pipeline summary report as standalone HTML file that can be viewed in your web browser. Example here.

Output files (2)

  • ASV_tax.*.tsv: Taxonomic classification for each ASV sequence.

ASV_tax.pr2_5_0_0 in our example

ASV_ID  Domain    Supergroup  Division        Subdivision       Class                Order             Family             Genus           Species             Confidence  Sequence
b7db    Bacteria  PANNAM      Proteobacteria  Proteobacteria_X  Gammaproteobacteria  Alteromonadales   Marinobacteraceae  Marinobacter    Marinobacter_sp.    1           TACGGAGGGTGCAA…
e01e    Bacteria  FCB         Bacteroidetes   Bacteroidetes_X   Bacteroidia          Flavobacteriales  Flavobacteriaceae  Flavobacterium  Flavobacterium_sp.  0.99        TACGGAGGATCCAAGCG…
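
A table like this is easy to summarise with a few lines of Python. The sketch below counts ASVs per taxonomic rank and optionally ignores low-confidence assignments. It assumes a tab-separated file with the column names shown on this slide; the inline data is an excerpt with only four of the columns, and `count_by_rank` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def count_by_rank(handle, rank="Class", min_confidence=0.0):
    """Count ASVs per taxon at the given rank, skipping low-confidence rows."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        confidence = float(row["Confidence"] or 0)
        if confidence >= min_confidence:
            counts[row[rank] or "unassigned"] += 1
    return counts


# Excerpt of the two example rows (Sequence and most ranks left out).
excerpt = (
    "ASV_ID\tClass\tGenus\tConfidence\n"
    "b7db\tGammaproteobacteria\tMarinobacter\t1\n"
    "e01e\tBacteroidia\tFlavobacterium\t0.99\n"
)

print(count_by_rank(io.StringIO(excerpt), rank="Class"))
print(count_by_rank(io.StringIO(excerpt), rank="Genus", min_confidence=0.995))
```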

Output files (3)

  • ASV_table.tsv: summary of the Amplicon Sequence Variants (ASVs) detected in each of your samples.
ASV_ID A120_A A120_B A141_A A141_B ...
b7db 2841 1118 614 442
bab5 9163 7036 19075 13799
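
The min_frequency and min_samples settings from the parameter file act on a count table of this shape. As a hedged illustration, assuming min_frequency is a minimum total count across all samples and min_samples a minimum number of samples in which the ASV occurs, the filtering logic can be sketched as follows. The function `filter_asvs` and the ASV `rare` are made up for the example; the pipeline performs this step with its own tools.

```python
def filter_asvs(table, min_frequency=2, min_samples=3):
    """Keep ASVs that are abundant enough and present in enough samples.

    table maps an ASV identifier to its list of per-sample counts.
    """
    kept = {}
    for asv, counts in table.items():
        total = sum(counts)
        present_in = sum(1 for c in counts if c > 0)
        if total >= min_frequency and present_in >= min_samples:
            kept[asv] = counts
    return kept


table = {
    "b7db": [2841, 1118, 614, 442],     # from the example above
    "bab5": [9163, 7036, 19075, 13799],  # from the example above
    "rare": [0, 5, 0, 0],                # hypothetical ASV seen in one sample only
}

print(sorted(filter_asvs(table)))
```

Here the hypothetical `rare` ASV is dropped because it occurs in a single sample, fewer than the three required by min_samples.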

Output files (4)

After preprocessing (primer trimming, QC) and ASV inference with DADA2, the QIIME2 analysis goals are:

  1. Taxonomic profiling
  2. Diversity (α & β)
  3. Differential abundance
  4. Visual summaries (barplots, rarefaction curves, etc.)

Output files (4.1) - Abundance and relative abundance

The abundance tables are the final data for further downstream analysis and visualisations.

Output                    Description
Absolute abundance table  Raw ASV counts per sample
Relative abundance table  Normalized counts (e.g. proportions per sample)
  • Useful for comparing taxa abundances across samples (.tsv)
  • These tables feed into plots & statistical tests
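
The link between the two tables is a simple per-sample normalisation. The sketch below shows the proportion-based variant mentioned above, where each count is divided by the total count of its sample. The function name is hypothetical, and the toy table holds only two ASVs, so the resulting proportions are not those of a real dataset.

```python
def relative_abundance(table):
    """Convert per-sample counts to proportions that sum to 1 in each sample.

    table maps an ASV identifier to its list of per-sample counts.
    """
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[i] for counts in table.values())
              for i in range(n_samples)]
    return {
        asv: [c / t if t else 0.0 for c, t in zip(counts, totals)]
        for asv, counts in table.items()
    }


counts = {
    "b7db": [2841, 1118],
    "bab5": [9163, 7036],
}

for asv, proportions in relative_abundance(counts).items():
    print(asv, [round(p, 3) for p in proportions])
```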

Output files (4.2) - Alpha diversity & rarefaction curves

Alpha diversity measures the species diversity within samples.

  • Rarefaction curves: check sampling depth sufficiency; plot number of observed features vs sequencing depth per sample
  • Alpha diversity indices: metrics like Observed ASVs, Shannon, Simpson, Faith’s PD etc.
  • Purpose: evaluate within-sample diversity; check outliers, effect of filtering
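
For reference, three of these indices can be computed directly from the counts of one sample: observed ASVs is the number of ASVs with a non-zero count, Shannon is -Σ p·log(p) and Simpson is 1 - Σ p², where p is the proportion of each ASV. This is a minimal sketch with a hypothetical function name; the logarithm base is left as a parameter because tools differ on it (natural log is used here by default), and Faith's PD is left out because it needs a phylogenetic tree.

```python
import math


def alpha_diversity(counts, base=math.e):
    """Observed ASVs, Shannon and Simpson indices for one sample."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    proportions = [c / total for c in present]
    return {
        "observed": len(present),
        "shannon": -sum(p * math.log(p, base) for p in proportions),
        "simpson": 1 - sum(p * p for p in proportions),
    }


even_sample = [25, 25, 25, 25]   # four equally abundant ASVs
skewed_sample = [97, 1, 1, 1]    # one dominant ASV

print(alpha_diversity(even_sample))
print(alpha_diversity(skewed_sample))
```

Both toy samples have the same number of observed ASVs, but the even sample scores higher on Shannon and Simpson, which is why several indices are usually reported together.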

Output files (4.3) - Beta diversity & ordination

Beta diversity measures the differences in community composition between samples.

  • Distance metrics (e.g. Bray-Curtis, UniFrac if phylogenetic tree available)
  • Ordination plots (PCoA, NMDS) to visualise differences between samples/groups
  • Permutational tests (e.g. PERMANOVA / Adonis) to assess significance of grouping
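
As an example of a distance metric, the Bray-Curtis dissimilarity between two samples is the sum of absolute count differences divided by the sum of all counts: 0 means identical communities and 1 means no shared ASVs. The sketch below is a plain illustration with a hypothetical function name; in practice the counts are usually normalised (for instance rarefied) first so that sequencing depth does not drive the result.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples given as count lists."""
    if len(a) != len(b):
        raise ValueError("both samples must cover the same ASVs")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]
sample_3 = [0, 20, 0]

print(bray_curtis(sample_1, sample_2))
print(bray_curtis(sample_1, sample_3))
```

Computing this value for every pair of samples gives the distance matrix that ordination methods such as PCoA take as input.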

Output files (4.4) - Interactive barplots

  • Barplots of taxonomic composition (e.g. phylum, genus) per sample or group
  • Interactive version: allows zooming, selection of taxa, collapsing taxonomic levels
  • Helps visually assess major community members and compare across conditions

Output files (4.5) - Differential abundance

  • Methods used: e.g. ANCOM (Analysis of Composition of Microbiomes) or ANCOM-BC (with Bias Correction)
  • Aim: identify taxa (ASVs or higher level) with significantly different abundance between groups

Note

There are many other output files; I'll let you explore them!

Complete list here

Thank you for your attention!