BLAST parameters for sequences with high similarity
6 weeks ago

Hi all,

I have a dataset of V4 sequences from MiSeq and a dataset of full-length Sanger 16S sequences.

I'm trying to match up V4 sequences with their full-length sequence in the other dataset using BLAST. I'm using a pident > 99 (percent identity) threshold to assign a V4 sequence to a 16S sequence, but the assignments change depending on the BLAST parameters used.

I'm finding that the results are very sensitive to word_size. I get more hits when I switch from megablast (word_size=28) to blastn (word_size=11), and these additional hits more often reach pident=100.0.

I'm wondering what best practice is here when I'm expecting high similarity between sequences. Should I be using blastn or megablast? Is BLAST even the correct tool here?

Thanks in advance!

BLAST Sanger 16S v4 miseq • 310 views

If you have full-length sequences and the sequences from MiSeq are similar, you should perhaps try doing global alignments (e.g. Needle from EMBOSS) instead.


Wouldn't global alignments attempt to align the entire sequences against each other? With the V4 sequences being shorter, I'm not sure this would work. I could trim the 16S sequences to the predicted V4 region first.


It sounded like you wanted to align entire MiSeq reads to the reference, hence my suggestion.
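To illustrate the point about read length: a "fitting" (semi-global) alignment aligns the whole read end to end but does not penalise the unaligned flanks of the longer reference, so trimming the 16S sequence first is not strictly required. The sketch below is a plain-Python illustration using unit edit costs; the function name `best_fit_edits` is hypothetical, and this is neither how EMBOSS Needle nor BLAST scores alignments.

```python
def best_fit_edits(read, ref):
    """Fewest edits needed to align all of `read` to some substring of `ref`.

    Returns (edits, end), where `end` is the 0-based exclusive position in
    `ref` at which the best alignment stops. Unit costs are an assumption
    made for illustration only.
    """
    # Row 0: an empty read can "start" anywhere in the reference at no cost,
    # which is what makes the reference's leading flank free.
    prev = [0] * (len(ref) + 1)
    for i, r in enumerate(read, start=1):
        # Column 0: i read bases aligned against nothing costs i edits.
        curr = [i] + [0] * len(ref)
        for j, c in enumerate(ref, start=1):
            curr[j] = min(
                prev[j - 1] + (r != c),  # match or mismatch
                prev[j] + 1,             # read base with no reference partner
                curr[j - 1] + 1,         # reference base with no read partner
            )
        prev = curr
    # Taking the minimum over the last row makes the trailing flank free too.
    edits = min(prev)
    return edits, prev.index(edits)
```

For example, `best_fit_edits("ACGT", "TTACGTTT")` finds the read inside the reference with zero edits. Dividing the edit count by the read length gives a rough mismatch rate over the full read, which is a different quantity from BLAST's pident over a local alignment.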

6 weeks ago

That sounds like a cool project.

When you expect high sequence similarity between queries and database, megablast is the way to go (it's faster too). It may also use different default settings than blastn beyond word size, such as match/mismatch scores and gap penalties, and some of that can explain the variation in your results.

You may want to consider limiting the maximum number of hits or HSPs reported per query (e.g. with -max_target_seqs and -max_hsps).
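One way to make the two runs comparable is to spell out the task, word size and hit limits explicitly instead of relying on defaults. The sketch below only builds the argument list for BLAST+ `blastn`; the helper name `blastn_command`, the file and database names, and the chosen limits are placeholders, not recommendations.

```python
def blastn_command(query, db, task, word_size=None,
                   max_target_seqs=5, max_hsps=1):
    """Build a BLAST+ blastn argument list with explicit settings."""
    cmd = [
        "blastn",
        "-task", task,                       # "megablast" or "blastn"
        "-query", query,
        "-db", db,
        # Tabular output with query length, so coverage can be computed later.
        "-outfmt",
        "6 qseqid sseqid pident length qlen qstart qend evalue bitscore",
        "-max_target_seqs", str(max_target_seqs),
        "-max_hsps", str(max_hsps),
    ]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    return cmd


# Hypothetical inputs: a V4 FASTA file and a database built from the
# Sanger 16S sequences with makeblastdb.
megablast_cmd = blastn_command("v4.fasta", "sanger16s", "megablast")
blastn_cmd = blastn_command("v4.fasta", "sanger16s", "blastn", word_size=11)
```

Each list can be passed to `subprocess.run(cmd, check=True, capture_output=True, text=True)` on a machine with BLAST+ installed. Running both and diffing the tabular output shows which hits are specific to one task.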

(Given that there can be sequencing errors and genetic diversity, you should not necessarily expect 100% identity for every true match.)
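Note that pident is computed only over the aligned region of each HSP, so a short local hit can report 100% identity while covering a small part of the V4 read. That may be one reason a smaller word size yields extra pident=100.0 hits. A minimal sketch of filtering tabular output on both identity and query coverage follows; it assumes output produced with `-outfmt "6 qseqid sseqid pident length qlen qstart qend evalue bitscore"`, and the function name and the thresholds are illustrative choices.

```python
import csv
import io

FIELDS = ["qseqid", "sseqid", "pident", "length", "qlen",
          "qstart", "qend", "evalue", "bitscore"]


def confident_matches(tabular_text, min_pident=99.0, min_qcov=0.95):
    """Return {query: subject} for the best-scoring hit passing both filters."""
    best = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        # Fraction of the query spanned by this HSP.
        qcov = (int(row["qend"]) - int(row["qstart"]) + 1) / int(row["qlen"])
        if float(row["pident"]) < min_pident or qcov < min_qcov:
            continue
        score = float(row["bitscore"])
        query = row["qseqid"]
        # Keep only the highest bitscore hit per query.
        if query not in best or score > best[query][1]:
            best[query] = (row["sseqid"], score)
    return {query: subject for query, (subject, _) in best.items()}
```

With this filter, a 30 bp hit at 100% identity on a 253 bp read is discarded, while a full-length hit at 99.6% is kept. BLAST+ also offers `-perc_identity` and `-qcov_hsp_perc` to apply similar cut-offs at search time.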
