Question: How to force blastn to align full subject
0
gravatar for Bastien Hervé
18 months ago by
Bastien Hervé4.4k
Limoges, CBRS, France
Bastien Hervé4.4k wrote:

My aim is to dispatch query sequences from fasta file using blastn according to 2 subject sequences.

Here, an example of a query sequence :

>M02945:227:000000000-BHPRH:1:2114:12657:28629 1:N:0:1

GAGAGCTCTATTACTGCACTGGAGGGTAGC

Here, my 2 subjects sequences, they are both 30bp long :

>ESR

GAGAGCTCTATTTCTTCGGTGTTACCAGCT

>Cercle

GAGAGCTCTATTACTGCACTGGAGGGAAGC

By look, the sequence M02945:227:000000000-BHPRH:1:2114:12657:28629 1:N:0:1 will go to Cercle.fa file (29 first bases match with 1 mismatch position 27).

And using blastn :

blastn -query test_sequence_for_blast.fa -subject ref.fa -culling_limit 1 -task blastn-short -outfmt 6

M02945:227:000000000-BHPRH:1:2114:12657:28629 ESR 100.00 8 0 0 2 9 9 2 0.008 16.4

M02945:227:000000000-BHPRH:1:2114:12657:28629 Cercle 100.00 26 0 0 1 26 1 26 1e-13 52.0

My problem is, I would like to know how many mismatch do blastn found in the full length subject sequence. So, first try I decrease the mismatch penalty to -1 (because here I just have the 1->26 information).

blastn -query test_sequence_for_blast.fa -subject ref.fa -culling_limit 1 -penalty -1 -task blastn-short -outfmt 6

M02945:227:000000000-BHPRH:1:2114:12657:28629 ESR 100.00 8 0 0 2 9 9 2 0.027 14.3

M02945:227:000000000-BHPRH:1:2114:12657:28629 Cercle 96.67 30 1 0 1 30 1 30 8e-12 46.0

Nice, I got the 1->30 information, ok that's great.

I inserted a second mismatch position 29

>M02945:227:000000000-BHPRH:1:2114:12657:28629 1:N:0:1

GAGAGCTCTATTACTGCACTGGAGGGTACC

blastn -query test_sequence_for_blast.fa -subject ref.fa -culling_limit 1 -penalty -1 -task blastn-short -outfmt 6

M02945:227:000000000-BHPRH:1:2114:12657:28629 ESR 100.00 8 0 0 2 9 9 2 0.008 16.4

M02945:227:000000000-BHPRH:1:2114:12657:28629 Cercle 100.00 26 0 0 1 26 1 26 1e-13 52.0

And again I lost the info from 26->30... Because at the end I want to totally send to trash sequences with more than 2 mismatches

How to force blastn to align 1->30 straigth and yield the number of mismatch ?

I also tried to modify gap open penalty and gap extend penalty but that didn't do the trick...

Thanks for helping !

blastn • 787 views
ADD COMMENTlink modified 16 months ago by Biostar ♦♦ 20 • written 18 months ago by Bastien Hervé4.4k
4
gravatar for Santosh Anand
18 months ago by
Santosh Anand4.9k
Santosh Anand4.9k wrote:

BLAST, by definition, does only LOCAL alignments. Maybe you are looking for a global alignment tool https://en.wikipedia.org/wiki/List_of_sequence_alignment_software#Pairwise_alignment

ADD COMMENTlink written 18 months ago by Santosh Anand4.9k

Thanks for this help, actually my query sequences are 100-150 bp long, I just trimmed the example sequence here for a better understanding. I think i'll substring the 30 first bases of each sequences and use bowtie2 --end-to-end on them.

ADD REPLYlink written 18 months ago by Bastien Hervé4.4k

Why substring the first 30 bases? Bowtie2 or bwa mem should be supporting longer reads

ADD REPLYlink written 18 months ago by Santosh Anand4.9k

I'm just interested in the first 30 bases

ADD REPLYlink written 18 months ago by Bastien Hervé4.4k

Nevertheless, don’t use a local aligner if you want a full sequence alignment.

You could also look at just calculating Hamming or Levenshtein distances if your sequences are pretty similar and don’t need aligning first

ADD REPLYlink written 16 months ago by jrj.healey13k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1640 users visited in the last hour