Question

BLAST Alignment vs Subject Sequence Start (sstart) and Subject Sequence End (ssend)

0

Entering edit mode

4.7 years ago

vickyzhe • 0

Hi everyone,

I am currently using blast to align metagenomic reads to a reference database. I am interpreting my blast results in the format: -outfmt "6 qseqid sseqid qlen length sstart send evalue bitscore"

I expected that |sstart - send| would be the same as the alignment length, but it differs. For example:

ERR2906227.1    Lactobacillus_fermentum_complete_genome 652 645 620872  621507  0.0 730
ERR2906227.1    Lactobacillus_fermentum_complete_genome 652 645 802698  802063  0.0 730
ERR2906227.1    Lactobacillus_fermentum_complete_genome 652 645 869271  869906  0.0 730
ERR2906227.1    Lactobacillus_fermentum_complete_genome 652 645 1278239 1277604 0.0 730
ERR2906227.1    Lactobacillus_fermentum_complete_genome 652 645 1407425 1408060 0.0 730
ERR2906227.1    Lactobacillus_fermentum_complete_genome 652 645 1467885 1467250 0.0 730
ERR2906227.2    Staphylococcus_aureus_chromosome    1076    997 2170114 2169166 1.33e-128   460
ERR2906227.3    Lactobacillus_fermentum_complete_genome 1149    1165    976005  974864  0.0 1238
ERR2906227.5    BS.pilon.polished.v3.ST170922   1060    1051    2593827 2594850 0.0 1315
ERR2906227.6    Listeria_monocytogenes_complete_genome  1126    1112    1616191 1617286 0.0 1637

In the very first alignment:

ERR2906227.1    Lactobacillus_fermentum_complete_genome 652 645 620872  621507  0.0 730

The aligned portion of the subject sequence has a length of |(620872-621507)| = 635 but blast is saying the alignment length is 645. Can anyone explain why that is?

Assembly alignment genome BLAST metagenomics • 1.2k views

ADD COMMENT • link updated 4.6 years ago by juanjo75es ▴ 130 • written 4.7 years ago by vickyzhe • 0

0

Entering edit mode

It is possible that simple stretches of A/T's are masked. Have you looked at the pair-wise alignment of query/subject?

$ efetch -db nuccore -format fasta -id NC_010610.1 -chr_start 620872 -chr_stop 621507
>NC_010610.1:620873-621508 Lactobacillus fermentum IFO 3956 DNA, complete genome
ACCAGGCGGAAGAAACGGCGAAGGGAACCTTGATTGCCGCCGAAGTTAAGGTGGCCGGAATGGACCAACT
GCGTCAAATGGCTGACGATTGGCGCCAAAAGGATTACTCCGATGTCTTGGTCTTAGCCACGGCCGCTGGT
GACAAGGCCAACCTCTTGGTAGCAGTCAAGGATGGCCTGGCCAAGACGGTCAAGGCTGGTGACTTGATCA
AGGCAATTGCCCCGGCCATTAACGGGGGCGGGGGCGGTCGCCCAACCCTGGCCCAGGCCGGCGGAAAGAA
CCCAGCCGGGATTAGCGAAGCTCTCAAGCTGGCTAGCCAATTCATGAATGATTAAGGGAAAGTCGACATT
TTCCTCCTCGGTAGACTGGATTTAGACACTAAGTCCGCGGGGAATTTCCTGGCACCATTACGAAATAAAA
CCCCTCACCGGCTTGCGCAAGCTGGTAGGGGTTTTTGTTTGGCACCATTTCCGGGGGGCTAAATTGTACT
TTTGGTGGGGATATAGTACAATTAGCAATGGTACATTTGATGGAGGTGACTGAAATGGCAACTGATGAAA
CGATGTTTTTTGACTTTGATAAGTTGCACGAAAAAAACGTCAAAGAGACCCTTAAGACGGTCTATGATGC
CCTTGA

ADD REPLY • link 4.7 years ago by GenoMax 141k

0

Entering edit mode

After looking at the blast algorithm again, I believe this difference in length might be because of gaps in the subject sequence. I originally believed the algorithm gapped the query sequence but it seems like it gaps both. And the alignment length seems to include those gaps. However, I am not 100% sure of this.

ADD REPLY • link 4.7 years ago by vickyzhe • 0

1

Entering edit mode

Did you look at the pair-wise alignment?

ADD REPLY • link 4.7 years ago by GenoMax 141k

0

Entering edit mode

Yes, I used the online BLAST available on the blast website rather than the linux interface and it turns out that there are gaps in the subject sequence. Thanks for the help!

ADD REPLY • link 4.6 years ago by vickyzhe • 0

score 0 · Answer 1 · 2019-09-02

You can also make a try with this tool. It's an alternative tool for finding alignments to reference geneomes (actually only a few of them). It finds different results than BLAST given that it's a different technology. Please feeel free to make any comment directly to my email (in the page) if you need help. I am very interested to get feedback from users.