Question: Blastx false positives
Asked 5.8 years ago by biobio50, United States

We are trying to identify novel viruses from sequencing data using BLAST. The idea is to sequence siRNAs from plants and use BLAST to find virus-associated sequences. We are running blastx against the viral RefSeq database. The problem is that even with stringent E-value cutoffs (10E-20), we get a lot of false positives.

By false positive, we mean that the BLAST result shows a virus hit, but when we BLAST that contig again, we get matches to a plant genome. How can we filter our results to ensure that the hits reported as viral are actually viruses?




When you blast it again, you blast it against the same database of viruses, correct? How can you get plant results then?

written 5.8 years ago by Ram32k

No, sorry. We take the interesting results and BLAST them against NR using the web interface.

written 5.8 years ago by biobio50

That's why you're seeing plant results: when the search is not filtered by organism, plant sequences are always a better fit for these queries than viral sequences.

written 5.8 years ago by Ram32k

But if the sequences are actually from viruses, shouldn't viruses be the best hit?

written 5.8 years ago by biobio50

They're not sequences from viruses; they're small plant (host) molecules that target complementary nucleotide sequences. What they complement could be either host or foreign (viral).

When you get hits against plants for a given siRNA, I can think of two reasons:

    1. You're just finding that siRNA in the plant's genome

  - or -

    2. You're finding that siRNA's target in the plant's genome

written 5.8 years ago by pld4.9k

Ah okay, that makes sense. So when running BLAST against the viral database, is it possible to remove the plant hits without also running BLAST against NR?

written 5.8 years ago by biobio50

Well, for one, I'm not sure why you're using BLASTX. siRNAs are sequence-specific and target mRNA molecules, not proteins, so BLASTX, which compares a translated nucleotide query against protein sequences, doesn't make sense here.

This is the tricky part: on one hand, you should still search against the host, but on the other hand, even if an siRNA matches a host gene, it may still have antiviral activity in vivo (either by silencing a gene the virus needs or by silencing the virus directly).

I would also narrow the search down to plant viruses; there is no sense in searching animal viruses.
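
One way to act on the "still search against the host" advice without going through NR is to run the same contigs against a local host database and compare the two result sets per query. The following is only a rough sketch, not something anyone in this thread ran. It assumes both searches were done with the same BLAST program and saved in the default tabular format (`-outfmt 6`), and the function names, the `margin` parameter, and the IDs in the example data are made up for illustration. It compares bit scores rather than E-values because E-values depend on database size.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_bitscores(handle):
    """Return {query id: highest bit score} from tabular BLAST output."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        score = float(hit["bitscore"])
        if score > best.get(hit["qseqid"], float("-inf")):
            best[hit["qseqid"]] = score
    return best


def classify_queries(viral_handle, host_handle, margin=0.0):
    """Split queries that have a viral hit into two groups.

    'viral'     : no host hit, or the viral bit score beats the host
                  bit score by more than `margin` (hypothetical knob).
    'ambiguous' : the host scores as well as or better than the virus.
    """
    viral = best_bitscores(viral_handle)
    host = best_bitscores(host_handle)
    result = {"viral": [], "ambiguous": []}
    for query, viral_score in sorted(viral.items()):
        host_score = host.get(query)
        if host_score is None or viral_score > host_score + margin:
            result["viral"].append(query)
        else:
            result["ambiguous"].append(query)
    return result


# Made-up example data: contig2 hits the host better than any virus.
viral_hits = io.StringIO(
    "contig1\tvirus_A\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t180\n"
    "contig2\tvirus_B\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-20\t90\n"
)
host_hits = io.StringIO(
    "contig2\thost_chr1\t100.0\t100\t0\t0\t1\t100\t1\t100\t1e-55\t200\n"
)
example = classify_queries(viral_hits, host_hits)
print(example)
```

As noted above, a query in the "ambiguous" group is not necessarily irrelevant, since an siRNA that matches a host gene may still matter for the infection, so it is probably safer to flag those contigs for a closer look than to discard them outright.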

modified 5.7 years ago • written 5.7 years ago by pld4.9k