Question

Blastn/megablast, obtain alignments of more than 100 bp and 99% identity

9.0 years ago

jlerga • 0

I have a problem using blastn or megablast. The thing is as follows:

I have several sequences that I want to align against each other. I want to keep only the pairs whose alignment has more than 99% identity over more than 100 bp.

However, imagine we have the following two sequences:

>seq_a
AGCTGACTGACCAGTGACTGCATGACTGCATGGGCCCGAGCGCGCGCGTATTATGCTGCTAGATGCTGTAATGCTCTACTATTAGAGAGAGACTGTGATGATTTGACGTACGTCGTAGCGATCGATAGCATCGATCGAGCTATGCATCGATCGATCGATCGACTAGCATGCATGCTAGTACTGACGTACATGCGTACGTCGTCATGAGTGACGACACACTGATGCAGTCATGTGTTGTGACTGACTCTTTATACTCAAGCTACACATCTCATTTTACGACGTAGCTCAAGACTCTCAGACTGGACTGACGACATC
>seq_b
AGCTGACTGACCAGTGACTGCATGACTGCATGGGCCCGCGCGCGCGCGTATTATGCTGCTAGATGCTGTATTGGTGTGTGATTAGAGAGAGTAGTGATGATTTAGACGTACGTCGTAGCGATCGATAGCATCGATCGAGCTATGCATCGATCGATCGATCGACTAGCATGCATGCTAGTACTGACGTACATGCGTACGTCGTCATGAGTGACGACACACTGATGCAGTCATGTGTTGTGACTGACTCTTTATACTCAAGCTACACATCTCATATTACGACGTAGCTCAAGATAGTA

They can be aligned, and the result shows an alignment of 96% identity covering 282 nucleotides. However, there is a core of 186 nucleotides with 99% identity, and that core is the part I want to detect.

If I use blastn or megablast with default parameters (and many others I have tried), only the overall alignment is reported, so I never find out whether my sequences contain a region of high identity within it.

Do you know which parameters I should use, or which programs can do what I am describing?

blast alignment • 2.2k views

updated 20 months ago by Ram 43k • written 9.0 years ago by jlerga • 0

Do you want to find conserved cores between any pair of sequences in your dataset, or a single core that is conserved across all sequences? Do your sequences align well over their whole length, or do conserved regions alternate with divergent ones?

updated 20 months ago by Ram 43k • written 9.0 years ago by h.mon 35k

Not a core conserved in all sequences,
nor conserved regions alternating with divergent regions.

I want to find any pair of sequences that has, in general, a good alignment (over the whole length or over a relatively large area) and, within it, a very conserved core (more than 100 bp at 99% identity).

The problem is that two sequences can be very similar overall, for example with an aligned region of 1000 bp at 90% identity, and still not contain any core of more than 100 bp at 99% identity. That's the point.

I am thinking of using blast with a word size (the length of the seed that initiates an alignment) of 100 bp. That way, I would obtain the pairs of similar sequences that share a core of 100 bp that is completely conserved. It is not exactly the same, but it's something.
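To make the idea concrete without depending on any particular BLAST setting, here is a minimal Python sketch of what a 100 bp exact seed amounts to: checking whether two sequences share at least one identical stretch of `k` bases. The function name `shared_exact_core` is hypothetical, and the sketch assumes same-strand comparison only (no reverse complement).

```python
def shared_exact_core(seq_a: str, seq_b: str, k: int = 100) -> bool:
    """Return True if seq_a and seq_b share at least one identical k-mer.

    Only the forward strand is compared; reverse complements are ignored.
    """
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    if len(seq_a) < k or len(seq_b) < k:
        return False
    # Index every k-mer of the first sequence.
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    # Look up each k-mer of the second sequence in that index.
    return any(seq_b[j:j + k] in kmers_a for j in range(len(seq_b) - k + 1))
```

As noted above, this is stricter than the real goal: a single mismatch in the middle of an otherwise 99% identical core of just over 100 bp would make the check fail.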

Thank you for your interest! If you have any ideas, please comment.

updated 20 months ago by Ram 43k • written 9.0 years ago by jlerga • 0

score 0 · Answer 1 · 2015-05-14

9.0 years ago

h.mon 35k

Maybe you could run blast with default parameters (or tweak them until the results are to your liking), then parse the blast output to get the aligned sequences, and finally calculate percent identity over a small sliding window.
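The sliding-window step could look roughly like the Python sketch below. It assumes the two gapped alignment strings of each hit are already available, for example from tabular output requested with the `qseq` and `sseq` columns (`-outfmt "6 qseqid sseqid pident length qseq sseq"`). The function names are hypothetical, and the window is measured in alignment columns, with any column containing a gap counted as a mismatch.

```python
def best_window_identity(qseq: str, sseq: str, window: int = 100):
    """Return (start, identity) of the best window of alignment columns.

    qseq and sseq are the gapped, equal-length aligned strings of one hit.
    Returns None if the alignment is shorter than the window.
    """
    if len(qseq) != len(sseq):
        raise ValueError("aligned strings must have the same length")
    n = len(qseq)
    if n < window:
        return None
    # 1 for an identical, non-gap column; 0 for a mismatch or a gap.
    matches = [
        1 if q == s and q != "-" else 0
        for q, s in zip(qseq.upper(), sseq.upper())
    ]
    current = sum(matches[:window])
    best, best_start = current, 0
    # Slide the window one column at a time, updating the match count.
    for start in range(1, n - window + 1):
        current += matches[start + window - 1] - matches[start - 1]
        if current > best:
            best, best_start = current, start
    return best_start, best / window


def has_conserved_core(qseq, sseq, window=100, min_identity=0.99):
    """True if some window of the alignment reaches min_identity."""
    result = best_window_identity(qseq, sseq, window)
    return result is not None and result[1] >= min_identity
```

Calling `has_conserved_core(qseq, sseq)` on every reported hit would keep only the pairs that contain a window of 100 columns with at least 99% identity, regardless of the identity of the full alignment.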
