Question: BLAST run - assigning which one as query and subject?
3.1 years ago by
pbigbig190
United States
pbigbig190 wrote:

Hi everyone,

I have some confusion here, and it would be great if you could clarify. Suppose I have a large set of scaffolds in FASTA format (from a genome assembly, for example, so it may contain assembly errors) and a small reference cDNA set (obtained from Ensembl, so it can be considered a high-quality reference). I was told that normally the larger set should be the subject of a BLAST search and the smaller one should be the query. So should I run makeblastdb on my large scaffold set and query the reference cDNA set against it, or the other way around? (Using the reference cDNA set as the query feels counter-intuitive to me: its role is to be the reference, so shouldn't it be the subject?)

Thank you very much for any suggestion and clarification!

blast • 1.3k views
3.1 years ago by
Bergen, Norway
Michael Dondrup46k wrote:

Using the small sequence set as the database (subject) would give falsely low E-values, because E-values depend on the database size and not on the total length or number of query sequences tested (run a given query sequence on its own, then together with a bunch of other queries: the E-values are identical). You would get a massive multiple-testing problem that is not corrected for. The assembly is also much closer to a complete reference of all sequences that occur in this organism. So build the database from the scaffolds and use the cDNA set as the query.
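To make the database-size dependence concrete, here is a minimal sketch using the standard Karlin-Altschul relation between a bit score S' and the E-value, E = m * n * 2^(-S'), where m is the query length and n the database length. The sizes and the bit score below are hypothetical numbers picked only for illustration, and the sketch ignores the effective-length (edge) corrections that BLAST applies, so it will not reproduce BLAST's reported values exactly.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Simplified Karlin-Altschul E-value: E = m * n * 2**(-S').

    Uses raw lengths; real BLAST uses effective lengths instead.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical sizes, for illustration only.
QUERY_LEN = 1_500          # length of one query sequence, in bases
SMALL_DB = 30_000_000      # total length of a small database (e.g. a cDNA set)
LARGE_DB = 1_000_000_000   # total length of a large database (e.g. an assembly)
BIT_SCORE = 40.0           # same alignment score in both searches

e_small = expected_hits(BIT_SCORE, QUERY_LEN, SMALL_DB)
e_large = expected_hits(BIT_SCORE, QUERY_LEN, LARGE_DB)

# The same alignment looks more significant against the smaller database.
print(f"E-value vs small database: {e_small:.3g}")
print(f"E-value vs large database: {e_large:.3g}")

# The number of queries never enters the formula. By linearity of
# expectation, running N queries that each report hits at E-value E
# yields about N * E chance hits in total, with no correction applied.
N_QUERIES = 50_000  # hypothetical number of query sequences
print(f"Chance hits expected over all queries: {N_QUERIES * e_small:.3g}")
```

The ratio of the two E-values equals the ratio of the database sizes, which is why the same hit looks more convincing when the small set is used as the database.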


Hi Michael, thank you very much for your clarification.

written 3.1 years ago by pbigbig190