Question: Filtering for Percentage of identity (sequence identity: pident, %) from tblastn results
11 months ago by
endretoth20 wrote:

Dear Bioinformaticians,

I would like to ask how to choose a sequence identity (pident, %) threshold for filtering tblastn results.

I have a table of tblastn results in Galaxy containing about 800,000 sequences. I would like to filter them by sequence identity, but if I filter at 98% I lose almost all of the sequences. I would like to know what the accepted filtering level is, considering that this is protein data. I think it should not be as strict as blastn filtering (commonly 98 or 99%). Please give me advice and point me to any publication that recommends a suitable percentage.

All answers are greatly appreciated. :)


modified 11 months ago • written 11 months ago by endretoth20

It is impossible to say without knowing any details about the project. Why do you need to filter the sequences?

Even knowing the details, there is probably no perfect threshold; it is often a trade-off between removing artifacts (which I guess is what you want to do) and not losing too much information.
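
One way to look at this trade-off before committing to a cutoff is to count how many hits survive at several candidate thresholds. The sketch below is only an illustration: it assumes the table uses the default 12-column BLAST tabular layout, in which pident is the third column, and the function name, the thresholds, and the example rows are all made up rather than recommended values.

```python
import csv
import io

# Assumption: default 12-column BLAST tabular layout
# (qseqid, sseqid, pident, ...), so pident is at index 2.
PIDENT_COL = 2


def count_hits_by_threshold(handle, thresholds):
    """Return {threshold: number of hits with pident >= threshold}."""
    counts = {t: 0 for t in thresholds}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (some tabular exports add them).
        if not row or row[0].startswith("#"):
            continue
        pident = float(row[PIDENT_COL])
        for t in thresholds:
            if pident >= t:
                counts[t] += 1
    return counts


# Hypothetical example rows, not real tblastn output.
example = (
    "q1\ts1\t99.10\t120\t1\t0\t1\t360\t1\t120\t1e-70\t250\n"
    "q2\ts2\t85.30\t110\t16\t0\t1\t330\t5\t114\t1e-50\t190\n"
    "q3\ts3\t62.00\t100\t38\t0\t1\t300\t2\t101\t1e-30\t120\n"
    "q4\ts4\t45.50\t90\t49\t0\t1\t270\t3\t92\t1e-10\t60\n"
)

# Illustrative thresholds only; they are not suggested cutoffs.
counts = count_hits_by_threshold(io.StringIO(example), [50, 70, 90, 98])
for threshold, n in counts.items():
    print(f"pident >= {threshold}%: {n} hits kept")
```

To run this on a real table, open the downloaded file and pass the file handle in place of the `io.StringIO` object. Seeing how steeply the counts drop between thresholds can help judge how much information a given cutoff would discard.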

written 11 months ago by Corentin450