Question: Blastn command line results -- bad alignments but high scores
jennycade (United States) wrote, 4.4 years ago:

I'm running blastn from the command line and keep getting weird results -- about half of the sequences have terrible alignments to the query, but have the same scores and E values as perfect matches. Using the -perc_identity option doesn't seem to change anything. I've tried changing just about everything except for the blast database.

If anyone has any advice I'd be very grateful.

ETA: This is using BLASTN 2.2.30+.

Command line used: blastn -db ../i20.fasta -query query1.txt -out test.txt -perc_identity 100

Example of results:

>HS3:591:C5VMVACXX:4:1101:10795:94073 1:N:0:CTAAGGTC
Length=100

 Score =   102 bits (55),  Expect = 2e-20
 Identities = 12/55 (22%), Gaps = 0/55 (0%)
 Strand=Plus/Plus

Query  1   GTATTTTTCAATTCTATTTACGCGTATAATATATCTTCGTCAACTATTGTGGAGT  55
           |    |   | |            || || | |                |    |
Sbjct  33  GGTGGTAAGATTCAATAAATACAATACAAGACACACATTATCTAATGGCTCTTTT  87

>HS3:591:C5VMVACXX:4:2316:8876:76137 1:N:0:CTAAGGTC
Length=100

 Score =   102 bits (55),  Expect = 2e-20
 Identities = 55/55 (100%), Gaps = 0/55 (0%)
 Strand=Plus/Plus

Query  1   GTATTTTTCAATTCTATTTACGCGTATAATATATCTTCGTCAACTATTGTGGAGT  55
           |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  41  GTATTTTTCAATTCTATTTACGCGTATAATATATCTTCGTCAACTATTGTGGAGT  95

If you BLAST against very big contigs, BLAST will return all the matches that fit your threshold. For instance, your query could match many times against chromosome 1, so the score isn't for any particular match but for chromosome 1 in total.

Can you provide the BLAST version and the exact command line you used? Then we can help more.

Reply by mark.ziemann, 4.4 years ago

Thanks for the reply. I added the version and command line I used to the post.

The database I'm searching is a bunch of sequencing reads, so they're not huge contigs. What really confuses me is that a sequence that matches in only 12 positions has the same score as another sequence that matches all the way across.

Reply by jennycade, 4.4 years ago
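The score line itself shows which alignment it belongs to. The minimal sketch below recomputes raw and bit scores from the Identities lines. It assumes blastn 2.2.30+ ran its default megablast task (+1 per match, -2 per mismatch, no gaps in these hits) and used the Karlin-Altschul constants megablast usually prints for that scheme (Lambda 1.28, K 0.46). Those constants are an assumption, so check the Lambda/K footer of your own report. The function names are hypothetical.

```python
import math

# Assumed megablast defaults: +1 per identity, -2 per mismatch, and the
# Karlin-Altschul constants usually reported for that scheme (check your footer).
REWARD, PENALTY = 1, -2
LAMBDA, K = 1.28, 0.46


def raw_score(identities: int, length: int) -> int:
    """Raw score of an ungapped alignment, given its identity count and length."""
    mismatches = length - identities
    return identities * REWARD + mismatches * PENALTY


def bit_score(raw: int) -> float:
    """Normalized bit score: (Lambda * S - ln K) / ln 2."""
    return (LAMBDA * raw - math.log(K)) / math.log(2)


# The two hits from the question: 55/55 and 12/55 identities over 55 bp, no gaps.
for ident in (55, 12):
    s = raw_score(ident, 55)
    print(f"{ident}/55 identities -> raw score {s}, {bit_score(s):.1f} bits")
```

Under these assumptions, the perfect match gives a raw score of 55 and about 102.7 bits, which fits the reported "102 bits (55)". The 12/55 alignment comes out strongly negative and could never have been reported with that score. So the score line belongs to a 55/55 match, and the sequence printed under it must have come from somewhere else.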

Can you post the read sequences that were the queries in your example output, and the species of interest? I don't have your database, and I'd like to recreate this locally.

Reply by pld, 4.4 years ago (edited)

Thanks. The query is just the 55 bp sequence shown in the example; the sequencing reads are in the database. This is in an Arabidopsis mutant.

(I hope I'm being clear enough. I'm a newbie to bioinformatics!)

Reply by jennycade, 4.4 years ago

Can you pull the sequencing reads from the database?

Reply by pld, 4.4 years ago
Answer: jennycade (United States) wrote, 4.4 years ago:

Mystery solved: I'm looking at paired-end sequencing reads, and it turns out that the poorly aligning sequences that showed up were the mates of the good matches. I had concatenated the two read files before making the database, so BLAST was pulling out both mates of a pair when it gave me the results.

When I blast against a database made from just one of the read files it works fine.

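You can check for this situation before building a database. Illumina 1.8+ style headers, like the ones in the question, give both mates the same read name before the space. The mate is marked only in the comment (`1:N:0:...` vs `2:N:0:...`), so concatenating the two read files puts each pair's name in the file twice. The sketch below lists read names that occur more than once in FASTA text. It assumes the read name is the first whitespace-delimited token of each header line. The function name `shared_read_names` and the toy records are hypothetical, and the sequences are made up.

```python
from collections import defaultdict
from typing import Dict, List


def shared_read_names(fasta_text: str) -> Dict[str, List[str]]:
    """Return read names (first whitespace-delimited token after '>') that
    occur more than once, mapped to every full defline carrying that name."""
    by_name: Dict[str, List[str]] = defaultdict(list)
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            defline = line[1:].strip()
            tokens = defline.split()
            name = tokens[0] if tokens else ""
            by_name[name].append(defline)
    return {name: lines for name, lines in by_name.items() if len(lines) > 1}


# Toy input (hypothetical sequences): both mates of one pair plus a single
# read, as they might look after concatenating the two read files.
toy = """>HS3:591:C5VMVACXX:4:2316:8876:76137 1:N:0:CTAAGGTC
ACGTACGTAC
>HS3:591:C5VMVACXX:4:2316:8876:76137 2:N:0:CTAAGGTC
TTGCATTGCA
>HS3:591:C5VMVACXX:4:1101:10795:94073 1:N:0:CTAAGGTC
GGGTTTAAAC
"""

for name, deflines in shared_read_names(toy).items():
    print(name, "appears", len(deflines), "times:", deflines)
```

If names repeat, building the database from a single read file, as in the answer above, avoids mixing the mates.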