7.0 years ago
erwan.scaon

I am using -culling_limit 1 as a parameter.

From the manual: "Delete a hit that is enveloped by at least this many higher-scoring hits."

My understanding: the culling limit can be used to remove redundant hits. In practice, it sets the number of hits returned per subject sequence.

The command line:

    $ blastn -query reads.fa -subject locus.fa -strand plus -culling_limit 1 -dust no -out result.csv -outfmt 6

One unexpected result:

    qseqid             sseqid                            pident  length  mismatch  gapopen  qstart  qend  sstart  send    evalue    bitscore
    QJLFG:08700:06611  gi|372099098:113208001-113426000  98.131  107     1         1        1       106   51978   52084   5.11E-49  185
    QJLFG:08700:06611  gi|372099098:113208001-113426000  79.167  120     14        8        103     215   217412  217527  4.15E-15  73.1
    QJLFG:08700:06611  gi|372099098:113208001-113426000  97.561  41      1         0        103     143   217437  217477  1.49E-14  71.3
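To inspect results like these programmatically, here is a minimal Python sketch that parses the 12 default -outfmt 6 columns (the same ones named in the header above). The function name parse_outfmt6 is hypothetical, and the sketch assumes the default column set, i.e. no custom format specifiers after -outfmt 6.

```python
# Default column order of BLAST+ -outfmt 6 (the same 12 columns as the header above)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(lines):
    """Turn -outfmt 6 lines into dicts keyed by column name, numeric fields converted."""
    hits = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and '#' comment lines
        if not line or line.startswith("#"):
            continue
        fields = line.split()  # real output is tab-separated; split() also accepts spaces
        if fields[0] == "qseqid":
            continue  # skip a hand-written header such as the one above
        record = {}
        for name, value in zip(COLUMNS, fields):
            if name in INT_COLUMNS:
                record[name] = int(value)
            elif name in FLOAT_COLUMNS:
                record[name] = float(value)
            else:
                record[name] = value
        hits.append(record)
    return hits


# Usage with the file written by the command above:
# with open("result.csv") as fh:
#     hits = parse_outfmt6(fh)
```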

As far as I understand, the 3rd hit is redundant with respect to the 2nd hit: it lies within the same subject region and the same part of the read, but it is a shorter alignment with a higher e-value.
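The containment can be checked directly from the table. Below is a minimal sketch using the coordinates and bit scores of the 2nd and 3rd rows; envelops is a hypothetical helper, and treating "higher-scoring" as "higher bit score" is an assumption.

```python
def envelops(outer, inner):
    """True if interval `inner` lies entirely within interval `outer` (inclusive).

    Intervals are (start, end) pairs; sorting makes the check order-insensitive,
    so minus-strand subject coordinates (sstart > send) are handled too.
    """
    o_lo, o_hi = sorted(outer)
    i_lo, i_hi = sorted(inner)
    return o_lo <= i_lo and i_hi <= o_hi


# Coordinates and bit scores copied from the 2nd and 3rd rows of the table above
hit2 = {"q": (103, 215), "s": (217412, 217527), "bitscore": 73.1}
hit3 = {"q": (103, 143), "s": (217437, 217477), "bitscore": 71.3}

query_enveloped = envelops(hit2["q"], hit3["q"])
subject_enveloped = envelops(hit2["s"], hit3["s"])
hit2_scores_higher = hit2["bitscore"] > hit3["bitscore"]

print(query_enveloped, subject_enveloped, hit2_scores_higher)  # True True True
```

Both ranges of the 3rd hit fall inside the 2nd hit, and the 2nd hit scores higher, so by the wording of the manual the 3rd hit would be expected to be culled.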

Why is it not discarded with -culling_limit 1?

I appreciate this is an old post, but I am having the same issue. Did you manage to find a solution? I am searching a large number of similar queries against about 2k genomes, and for some of these target sequences I am getting more than 50 hits with a culling limit of 1. Does the culling limit not do what I think it does?

Thanks in advance.

That is an uncommon parameter that I have not personally used, but the help for that parameter says:

culling_limit: Delete a hit that is enveloped by at least this many higher-scoring hits.

If you are getting more than 50 hits, then perhaps they are all higher-scoring hits. Have you tried setting the parameter to a larger number? Are you looking to keep only one hit?

Hi GenoMax, sorry for the late response. Yes, I am looking to keep only one hit.
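If the goal is simply one hit per target, one workaround that does not depend on -culling_limit is to filter the tabular output afterwards. Here is a minimal Python sketch, assuming the default 12-column -outfmt 6 layout and that "best" means highest bit score; best_hit_per_pair is a hypothetical helper, not a BLAST feature.

```python
def best_hit_per_pair(hits):
    """Keep only the highest-bitscore row for each (qseqid, sseqid) pair.

    `hits` is an iterable of rows, each a list of the 12 default -outfmt 6
    fields as strings. On a tie, the row seen first is kept.
    """
    best = {}
    for row in hits:
        key = (row[0], row[1])  # use row[0] alone to keep one hit per query overall
        if key not in best or float(row[11]) > float(best[key][11]):
            best[key] = row
    return list(best.values())


# Usage with the tab-separated file written by blastn:
# import csv
# with open("result.csv", newline="") as fh:
#     rows = [r for r in csv.reader(fh, delimiter="\t") if r]
# for row in best_hit_per_pair(rows):
#     print("\t".join(row))
```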

I am not sure I follow the idea that all the reported hits are higher-scoring hits. My interpretation of that definition is that the culling limit should remove any hit for which there is a higher-scoring hit. If I set it to 1, it should "delete a hit that is enveloped by at least 1 higher-scoring hit". Surely this means that only the highest-scoring hit would remain?
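To make the wording of the help text concrete, here is a toy Python model of one reading of it. It assumes, as an interpretation and not BLAST's actual code, that "enveloped" is judged on query coordinates only and that every higher-scoring hit counts. Under that reading, a limit of 1 removes hits nested inside a stronger hit but keeps hits on non-overlapping parts of the query. Note that this model would also remove the 3rd hit from the first post, so it does not explain that result.

```python
def cull(hits, limit):
    """Toy model: drop a hit whose query range is enveloped by at least
    `limit` higher-scoring hits (assumption: only query coordinates compared)."""
    kept = []
    for h in hits:
        n_enveloping = sum(
            1 for other in hits
            if other["bitscore"] > h["bitscore"]
            and other["qstart"] <= h["qstart"] and h["qend"] <= other["qend"]
        )
        if n_enveloping < limit:
            kept.append(h)
    return kept


# Hypothetical hits on one query
hits = [
    {"name": "A", "qstart": 1, "qend": 100, "bitscore": 180.0},
    {"name": "B", "qstart": 10, "qend": 60, "bitscore": 90.0},     # nested inside A
    {"name": "C", "qstart": 150, "qend": 250, "bitscore": 120.0},  # disjoint from A
]
print([h["name"] for h in cull(hits, limit=1)])  # ['A', 'C']
```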

Hmm, it's pretty hard to read, so here is a focus on the relevant information:

Format: qstart<->qend --- sstart<->send --- evalue

2nd hit: 103<->215 --- 217412<->217527 --- 4.15E-15

3rd hit: 103<->143 --- 217437<->217477 --- 1.49E-14


