Question: Filtering for Percentage of identity (sequence identity: pident, %) from tblastn results
Asked 4 weeks ago by endretoth:

Dear Bioinformaticians,

I would like to ask about defining the level of filtering by sequence identity (pident, %) from tblastn results.

I have a table of tblastn results in Galaxy containing about 800,000 sequences. I would like to filter them by sequence identity, but if I filter at 98% I lose almost all of them. I would like to know what the accepted filtering level is, considering that this is protein data. I think it should not be as strict as blastn filtering (commonly 98 or 99%). Please give me advice and point me to any publication that specifies an appropriate percentage.

All answers are greatly appreciated. :)

Thend


It is impossible to say without knowing any details about the project. Why do you need to filter the sequences?

Even with the details, there is probably no perfect threshold. It is often a trade-off between removing artifacts (I guess this is what you want to do) and not losing too much information.

Reply written 4 weeks ago by Corentin
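
One way to look at that trade-off before committing to a cutoff is to count how many hits survive at several candidate pident values. The sketch below is only an illustration and makes some assumptions: the table uses the standard 12-column BLAST tabular layout (pident in the third column), which is worth checking against the column settings of the Galaxy tool; the function names are made up for this example; and the four example rows are invented, not real results. It does not suggest which threshold is appropriate.

```python
import csv
import io

# Assumption: standard 12-column BLAST tabular layout
# (qseqid, sseqid, pident, length, ...), so pident is at index 2.
PIDENT_COL = 2


def read_tabular(handle):
    """Parse tab-separated BLAST output, skipping blank and comment lines."""
    reader = csv.reader(handle, delimiter="\t")
    return [row for row in reader if row and not row[0].startswith("#")]


def filter_by_pident(rows, min_pident):
    """Keep rows whose percent identity is at least min_pident."""
    return [row for row in rows if float(row[PIDENT_COL]) >= min_pident]


def survival_table(rows, thresholds):
    """Count how many hits remain at each candidate threshold."""
    return {t: len(filter_by_pident(rows, t)) for t in thresholds}


# Invented example hits (hypothetical IDs and values), not real data.
EXAMPLE = (
    "q1\ts1\t99.1\t120\t1\t0\t1\t120\t10\t369\t1e-70\t250\n"
    "q2\ts2\t85.0\t110\t16\t0\t1\t110\t40\t369\t1e-50\t180\n"
    "q3\ts3\t62.5\t96\t36\t0\t1\t96\t100\t387\t1e-30\t120\n"
    "q4\ts4\t41.3\t92\t54\t0\t1\t92\t7\t282\t1e-10\t60\n"
)

rows = read_tabular(io.StringIO(EXAMPLE))
counts = survival_table(rows, [40, 60, 80, 98])
for threshold, kept in counts.items():
    print(f"pident >= {threshold}: {kept} of {len(rows)} hits kept")
```

For a real table, the `io.StringIO(EXAMPLE)` handle could be replaced with an open file handle on the tabular file downloaded from Galaxy. Seeing where the counts drop sharply can help judge how much information a given cutoff would discard.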