Question

BBMap not mapping reads to sequences known to exist in reference

0

Entering edit mode

3.2 years ago

robert.murphy ▴ 80

I am trying to use BBMap to map some degenerate primers to an assembled reference genome. The primers are for the adenylation domain of NRPS biosynthetic gene clusters. I have used SeqKit amplicon with this primer pair and got the expected results (I can share if needed), however BBMap mapps nothing.

I am running BBMap twice, once for each primer, would it be better to run in a single go as forward and reverse pair?

Input

bbmap.sh ref=$refSeq in=$input out=$output

Forward primer: GCSTACSYSATSTACACSTCSGG

Reverse primer: SASGTCVCCSGTSCGGTA

Output

I am omitting the output about index generation.

   ------------------   Results   ------------------

Genome:                 1
Key Length:             13
Max Indel:              16000
Minimum Score Ratio:    0.56
Mapping Mode:           normal
Reads Used:             15755   (7876943 bases)

Mapping:                0.351 seconds.
Reads/sec:              44903.74
kBases/sec:             22450.28


Read 1 data:            pct reads       num reads       pct bases          num bases

mapped:                   0.0000%               0         0.0000%                  0
unambiguous:              0.0000%               0         0.0000%                  0
ambiguous:                0.0000%               0         0.0000%                  0
low-Q discards:           0.0000%               0         0.0000%                  0

perfect best site:        0.0000%               0         0.0000%                  0
semiperfect site:         0.0000%               0         0.0000%                  0

Match Rate:                   NA               NA            NaN%                  0
Error Rate:                  NaN%               0            NaN%                  0
Sub Rate:                    NaN%               0            NaN%                  0
Del Rate:                    NaN%               0            NaN%                  0
Ins Rate:                    NaN%               0            NaN%                  0
N Rate:                      NaN%               0            NaN%                  0

What could be causing this to occur as I assume it is something I am doing wrong?

alignment • 1.3k views

ADD COMMENT • link updated 3.2 years ago by GenoMax 141k • written 3.2 years ago by robert.murphy ▴ 80

0

Entering edit mode

These are degenerated sequences and very short, so my guess is that an NGS aligner such as BBMap will never align these as alignment scores are super small, as a consequence of short length and minimal matches. I think it is simply the wrong tool as not developed with this task in mind.

ADD REPLY • link 3.2 years ago by ATpoint 82k

0

Entering edit mode

Okay this makes sense. Is it possible to force BBMap to allow such alignments? If not do you have any other tools you would recommend?

ADD REPLY • link 3.2 years ago by robert.murphy ▴ 80

0

Entering edit mode

What was wrong with SeqKit amplicon?

ADD REPLY • link 3.2 years ago by ATpoint 82k

0

Entering edit mode

The end goal is to search within a metagenome for these domains. SeqKit returned nothing from a large polled metagenomic assembly so I am wanting to check with a different alignment tool.

ADD REPLY • link 3.2 years ago by robert.murphy ▴ 80

0

Entering edit mode

I am not sure about BBMap but bowtie was developed for short ungapped alignments, but it does not work on degenerate sequences. You could write a little piece of code to (rather than the degenerate primers) create every possible sequence based on the degenerated sequences, and then run these sequences with bowtie against your metagenomes. bowtie, not bowtie2!

ADD REPLY • link 3.2 years ago by ATpoint 82k

0

Entering edit mode

If you do write a program to generate all possible combinations of those IUPAC codes then BBMap can be tried with following parameters (which are normally used for miRNA)

ambig=all vslow perfectmode maxsites=1000

ADD REPLY • link 3.2 years ago by GenoMax 141k

score 0 · Answer 1 · 2021-03-04

BBMap aligner is not the correct tool for this. You could try bbduk.sh in filter mode to see if that helps. Replace all IUPAC characters with N's in your query. Since you have a lot of degenerate characters this may not work at all.

From BBDuk guide. Matching degenerate sequences such as primers:

bbduk.sh in=reads.fq out=matching.fq literal=ACGTTNNNNNGTC copyundefined k=13 mm=f

This will clone the reference sequences to represent every possibility for the degenerate bases (Ns and other non-ACGT IUPAC symbols). For example, this would create ACGTTAAAAAGTC, ACGTTAAAACGTC, ACGTTAAAAGGTC, and so forth (all 1024 possibilities). If you are interested in seaching for new life by mining shotgun metagenomic reads for 16s sequences that do not quite match your primers... this (and hdist) might be a good place to start! But it's also useful for adapters with barcodes.

You can also try the tools mentioned in this thread: Automated Primer-Blast like funcionnality You will need to have your metagenome data in fasta format.