September 30, 2025
Metabarcoding is a molecular technique used to identify and quantify the diversity of organisms in environmental samples by amplifying and sequencing specific genetic markers (barcodes) from environmental DNA (eDNA).
Environmental genomics
nf-core pipelines
Standardized Workflows:
nf-core provides standardized, community-curated workflows for analyzing data, ensuring consistency and reproducibility.
Nextflow Integration:
Built on Nextflow, a powerful workflow management system, allowing for scalable and portable data analysis.
nf-core/ampliseq?
* Pre-configured bioinformatics analysis workflow for amplicon sequencing, supporting multiple input formats.
* Developed and maintained by a global community of researchers and bioinformaticians, ensuring continuous improvement and updates.
* Comprehensive documentation and tutorials make it accessible for users of all skill levels.
Requirements to launch ampliseq on the BioPipes VM
ampliseq_vm.config (or your own nextflow.config) to define parameters specific to the compute environment of the BioPipes VM for nf-core pipelines (such as resourceLimits and partitions).
Files needed:
* .fastq.gz read files.
* .tsv, .csv or .yml file listing samples and paths to FastQ files.
* .yml file containing pipeline-specific configurations (genomic references, tools, etc.).
* .sh to submit the ampliseq pipeline to a SLURM cluster.
The sample sheet file can be tab-separated (.tsv), comma-separated (.csv), or in YAML format (.yml/.yaml) and can have two to four columns/entries with the following headers:
| sampleID | forwardReads | reverseReads | run |
|---|---|---|---|
| required | required | optional | optional |
| Unique sample identifiers | Paths to (forward) reads zipped FastQ files | Paths to reverse reads zipped FastQ files, required if the data is paired-end | If the data was produced by multiple sequencing runs |
samplesheet.tsv
| sampleID | run | forwardReads | reverseReads |
|---|---|---|---|
| S11B | run_01 | data/raw/S11B_R1.fastq.gz | data/raw/S11B_R2.fastq.gz | 
| S1B | run_01 | data/raw/S1B_R1.fastq.gz | data/raw/S1B_R2.fastq.gz | 
| S2B | run_01 | data/raw/S2B_R1.fastq.gz | data/raw/S2B_R2.fastq.gz | 
| S2S | run_01 | data/raw/S2S_R1.fastq.gz | data/raw/S2S_R2.fastq.gz | 
| … | … | … | … | 
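The sample sheet rules above (required `sampleID` and `forwardReads` columns, unique identifiers, gzipped FastQ paths) can be checked before launching the pipeline. The following is a minimal sketch, not part of nf-core/ampliseq: the function name `check_samplesheet` is hypothetical, it only handles the tab-separated variant, and accepting the `.fq.gz` suffix alongside `.fastq.gz` is an assumption.

```python
import csv
import io

REQUIRED = ("sampleID", "forwardReads")
OPTIONAL = ("reverseReads", "run")


def check_samplesheet(handle):
    """Return a list of problems found in a tab-separated sample sheet."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    problems = [f"missing required column: {c}" for c in REQUIRED if c not in header]
    problems += [f"unexpected column: {c}" for c in header if c not in REQUIRED + OPTIONAL]
    if problems:
        return problems

    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample = row["sampleID"]
        if sample in seen:
            problems.append(f"line {line_no}: duplicate sampleID {sample}")
        seen.add(sample)
        if not row["forwardReads"]:
            problems.append(f"line {line_no}: forwardReads is empty")
        for col in ("forwardReads", "reverseReads"):
            path = row.get(col)
            # Assumption: gzipped FastQ files end in .fastq.gz or .fq.gz
            if path and not path.endswith((".fastq.gz", ".fq.gz")):
                problems.append(f"line {line_no}: {col} is not a gzipped FastQ path: {path}")
    return problems


# Two rows copied from the samplesheet.tsv example above
example = (
    "sampleID\trun\tforwardReads\treverseReads\n"
    "S11B\trun_01\tdata/raw/S11B_R1.fastq.gz\tdata/raw/S11B_R2.fastq.gz\n"
    "S1B\trun_01\tdata/raw/S1B_R1.fastq.gz\tdata/raw/S1B_R2.fastq.gz\n"
)
print(check_samplesheet(io.StringIO(example)))
```

An empty list means no problem was detected. The check does not verify that the files exist on disk, which would be a natural extension on the VM.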
params_ampliseq.yaml
input: "input/samplesheet.tsv"
FW_primer: "GTGYCAGCMGCCGCGGTAA"
RV_primer: "CCGYCAATTYMTTTRAGTTT"
outdir: "output/ampliseq"
# Read trimming and quality filtering
max_ee: 2
trunclenf: 200
trunclenr: 200
trunc_qmin: 2
# Amplicon Sequence Variants (ASV) calculation
sample_inference: "pooled"
# Taxonomic db
dada_ref_taxonomy: "pr2=5.0.0"
# Asv filtering
min_frequency: 2
min_samples: 3
# Generic options
# Max job request
max_cpus: 32
max_memory: "100.GB"
max_time: "720.h"
Using bash script: ampliseq.sh
#!/bin/bash
nextflow run nf-core/ampliseq \
  -bg \
  -r 2.14.0 \
  -profile docker \
  -c "practicals/ampliseq_files/ampliseq_vm.config" \
  -params-file practicals/ampliseq_files/ampliseq_parameters_simple.yaml
# Path to files:
# ├── ampliseq.sh
# ├── practicals
# │   └── ampliseq_files
# │      ├── samplesheet.tsv
# │      ├── ampliseq_vm.config
# │      └── ampliseq_parameters_simple.yaml
Run time: approximately 30 minutes for the ANF training data.
The longest step is the taxonomic assignment!
summary_report.html: pipeline summary report as a standalone HTML file that can be viewed in your web browser. Example here.
ASV_tax.*.tsv: Taxonomic classification for each ASV sequence.
ASV_tax.pr2_5_0_0 in our example
| ASV_ID | Domain | Supergroup | Division | Subdivision | Class | Order |
|---|---|---|---|---|---|---|
| b7db | Bacteria | PANNAM | Proteobacteria | Proteobacteria_X | Gammaproteobacteria | Alteromonadales | 
| e01e | Bacteria | FCB | Bacteroidetes | Bacteroidetes_X | Bacteroidia | Flavobacteriales | 
| Family | Genus | Species | Confidence | Sequence |
|---|---|---|---|---|
| Marinobacteraceae | Marinobacter | Marinobacter_sp. | 1 | TACGGAGGGTGCAA… | 
| Flavobacteriaceae | Flavobacterium | Flavobacterium_sp. | 0.99 | TACGGAGGATCCAAGCG… | 
ASV_table.tsv: summary of the Amplicon Sequence Variants (ASVs) detected in each of your samples.
| ASV_ID | A120_A | A120_B | A141_A | A141_B | ... |
|---|---|---|---|---|---|
| b7db | 2841 | 1118 | 614 | 442 | … | 
| bab5 | 9163 | 7036 | 19075 | 13799 | … | 
After preprocessing (primer trimming, QC) and ASV inference with DADA2, the goals of the QIIME2 analysis are:
The abundance tables are the final data for further downstream analysis and visualisations.
| Output | Description | 
|---|---|
| Absolute abundance table | Raw ASV counts per sample | 
| Relative abundance table | Normalized counts (e.g. proportions per sample) | 
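To make the difference between the two tables concrete, here is a minimal sketch that turns raw counts into per-sample proportions. The function name `relative_abundance` and the nested-dictionary layout are assumptions made for illustration, not the pipeline's own code. The counts are the two ASVs shown in the ASV_table.tsv excerpt, so the resulting proportions are only illustrative: a real sample contains many more ASVs.

```python
def relative_abundance(counts):
    """Convert raw ASV counts into per-sample proportions (each sample sums to 1)."""
    samples = sorted({s for row in counts.values() for s in row})
    # Total number of reads per sample, summed over all ASVs
    totals = {s: sum(row.get(s, 0) for row in counts.values()) for s in samples}
    return {
        asv: {s: (row.get(s, 0) / totals[s] if totals[s] else 0.0) for s in samples}
        for asv, row in counts.items()
    }


# Absolute abundance: ASV -> sample -> raw count (values from the excerpt above)
absolute = {
    "b7db": {"A120_A": 2841, "A120_B": 1118},
    "bab5": {"A120_A": 9163, "A120_B": 7036},
}

relative = relative_abundance(absolute)
for asv, row in relative.items():
    print(asv, {sample: round(value, 3) for sample, value in row.items()})
```

Dividing by the per-sample total removes the effect of sequencing depth, which is why relative abundances are easier to compare across samples than raw counts.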
Alpha diversity measures the species diversity within samples.
Beta diversity measures the differences in species composition between samples.
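As an illustration of the two notions, the sketch below computes one common alpha diversity index (Shannon) and one common beta diversity measure (Bray-Curtis dissimilarity) from raw counts. These two metrics are chosen here as examples; the slides do not state which metrics the pipeline reports. The function names are hypothetical, and the counts are the two ASVs from the ASV_table.tsv excerpt, so the values are illustrative only.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln(p)) over the ASVs present in one sample."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (0 = identical, 1 = nothing shared)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


# Counts for ASVs b7db and bab5 in two samples (from the excerpt above)
a120_a = [2841, 9163]
a141_a = [614, 19075]

print("Shannon A120_A:", shannon(a120_a))
print("Shannon A141_A:", shannon(a141_a))
print("Bray-Curtis A120_A vs A141_A:", bray_curtis(a120_a, a141_a))
```

Both functions expect the counts of the two samples to be listed in the same ASV order. Because Bray-Curtis on raw counts is sensitive to sequencing depth, it is usually computed on rarefied or otherwise normalized tables.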
Note
There are many other output files; I'll let you explore them!
Complete list here
There’s no shame in getting help!
Thank you for your attention!