BBmap/BBsplit report explained
6 weeks ago
Rezenman • 0

Hey all, I am using BBMap to align metagenomics reads to the SILVA reference database (containing 510,000 16S sequences, each with a unique FASTA header). I didn't fully understand the report that BBMap outputs, and I would love to get an explanation of each parameter here, or a link.

The thing that confuses me is that the mapped read files (both read 1 and read 2) each contain 289947 reads, which does not match what's written in the report.

This is the report:

Thanks a lot for your help

Shahar

mapping BBmap BBsplit Deep-sequencing • 514 views

Can you provide the exact command line you are using? Since you are mapping against a 16S database, there is probably a lot of multi-mapping happening.


Yes sure,

bbsplit.sh in1=/home/labs/bfreich/shaharr/microbiome_seq_290920/raw_data/results/TS/TS_S7_R1_001.fastq.gz in2=/home/labs/bfreich/shaharr/microbiome_seq_290920/raw_data/results/TS/TS_S7_R2_001.fastq.gz ref=/home/labs/bfreich/shaharr/microbiome_paper/BBsplit/ref/ basename=TS_out_%_#.fq outu1=TS_clean1.fq outu2=TS_clean2.fq usejni=t


BBsplit is meant to be used against a small number of reference genomes, and I am not sure it can be used against an entire 16S database in a meaningful way. What is your aim here?


To separate 16S reads from metagenomics data. Any advice on a different way of doing that?


Great, I'll try it too. In the meantime, do you have any advice on my original question regarding the bbsplit report?


Maybe, but you would need to clarify what confuses you about the report.

You highlighted the error rate for Read 1, and this seems completely fine to me. You have 128355 reads that contain some kind of error, which corresponds to 56.9% of your 225577 mapped reads. These errors may be base substitutions, deletions or insertions. Of course, a single read can have, for example, an insertion and a base substitution at the same time, which is very likely if you have a 16S read that is not yet present in the reference database.
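To make the arithmetic explicit, here is a minimal Python sketch that recomputes the percentage from the two counts quoted above. The helper name `percent_of` and the variable names are only illustrative; they are not part of BBMap or its report.

```python
def percent_of(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts quoted from the Read 1 section of the report discussed above.
mapped_reads = 225577
reads_with_errors = 128355

error_rate_pct = percent_of(reads_with_errors, mapped_reads)
print(f"{error_rate_pct:.1f}% of mapped reads contain at least one error")
# prints: 56.9% of mapped reads contain at least one error
```

Note that the denominator is the number of mapped reads, not the total number of input reads, so this percentage says nothing about how many reads mapped in the first place.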

However, the crucial point is that just 0.36% of your reads map at all, so by narrowing in on the 16S you are discarding most of the information in your data. Instead, use a tool like MetaPhlAn, which can make good use of everything in your metagenomic dataset. HUMAnN, for example, can shed light on the abundance of microbial metabolic pathways, something you can't tell from 16S amplicon data.

If you want to stick with BBMap regardless, you can use bloomfilter.sh to enrich/extract the 16S reads from the sample. Follow the protocol in /bbmap/docs/guides/TaxonomyGuide.txt for a walkthrough of species identification with Seal.