Question

blastx output cut-off for creating gff

0

7.4 years ago

Whoknows ▴ 960

Hi friends,

I need some help selecting the right target sequences for my GFF file. I'm working on a species without a reference genome. I have scaffolds from genome sequencing, and I have run blastx against the Swiss-Prot database.

I want to convert my blastx output file to GFF. Should I define a cut-off for the blastx output based on percent identity or on E-value?
I ask because many of the target sequences have a percent identity below 50%, but their E-values are fine.

Thanks

gff blastx ngs • 2.3k views


0

I think that using scaffolds means your sequences are long enough not to be aligned by chance, so I would go with identity in this case, and here is a tool that will help you convert

7.4 years ago by Medhat 9.7k

0

I think I have a similar problem. You are trying to select the best hit to include in the GFF file, right? I am currently analysing a metatranscriptome dataset. After blasting the sequences against the nr database, I have been trying to find or develop a reasonable algorithm to select the "correct" hit. I guess that when searching against a large database like Swiss-Prot or nr, the E-value is not a bad score to work with, but what about the other scores, such as alignment length, mismatches, and bit score? I thought about combining them into a single factor, but something like (length/mismatches)*bitscore/E-value sounds over-simplified to me. Should all scores receive the same weight? Are they all equally important? If anyone knows of a tool that picks the best hit from BLAST output (preferably the standard 12-column tabular format), I would be very happy to hear about it.
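One common convention, rather than a weighted formula, is to keep the highest bit score per query and break ties on the lowest E-value. For a fixed query and database the E-value is derived from the bit score, so multiplying the two adds little information. The sketch below assumes the standard 12-column tabular output (`-outfmt 6`); the function name `best_hits` and the sample rows are made up for illustration, and the ranking rule is a choice, not the only defensible one.

```python
import csv
import io

# Standard 12-column BLAST tabular layout (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query_id: row_dict} keeping one hit per query.

    Hits are ranked by bit score (higher is better), then by
    E-value (lower is better) to break ties.
    """
    best = {}
    for fields in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines (e.g. from -outfmt 7).
        if not fields or fields[0].startswith("#"):
            continue
        row = dict(zip(COLUMNS, fields))
        key = (float(row["bitscore"]), -float(row["evalue"]))
        current = best.get(row["qseqid"])
        if current is None or key > current[0]:
            best[row["qseqid"]] = (key, row)
    return {query: row for query, (key, row) in best.items()}


# Hypothetical example rows, for illustration only.
example = io.StringIO(
    "read1\tsp|P1|A\t45.0\t200\t110\t0\t1\t600\t1\t200\t1e-40\t150\n"
    "read1\tsp|P2|B\t90.0\t50\t5\t0\t1\t150\t1\t50\t1e-20\t95.5\n"
    "read2\tsp|P3|C\t60.0\t100\t40\t0\t1\t300\t1\t100\t1e-30\t120\n"
)
for query, hit in best_hits(example).items():
    print(query, hit["sseqid"], hit["bitscore"], hit["evalue"])
```

Note that in the made-up data the longer, lower-identity alignment wins for `read1`, because bit score rewards alignment length as well as identity.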

7.4 years ago by KarberoS • 0

score 1 · Answer 1 · 2016-11-24

1

7.4 years ago

Whoknows ▴ 960

Hi

I found my answer!

The paper below defines default blastx thresholds of percent identity = 0.5 and E-value = 1e-06:

Genome Annotation and Curation Using MAKER and MAKER-P, Current Protocols in Bioinformatics, 2014.
By Mark Yandell
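For anyone who wants to apply these cut-offs by hand, here is a minimal sketch that filters tabular blastx output and writes GFF3 lines. It assumes the 12-column `-outfmt 6` format, where percent identity is reported on a 0 to 100 scale, so the 0.5 above is treated as 50%. The function names, the `blastx` source label, the `protein_match` feature type, and the sample rows are my own choices for illustration, not something taken from the paper.

```python
from urllib.parse import quote

MIN_IDENTITY = 50.0  # percent; assumes the 0.5 quoted above means 50%
MAX_EVALUE = 1e-6


def blastx_line_to_gff3(line, hit_number):
    """Convert one -outfmt 6 line to a GFF3 line, or None if it fails the cut-offs."""
    f = line.rstrip("\n").split("\t")
    qseqid, sseqid = f[0], f[1]
    pident, evalue = float(f[2]), float(f[10])
    qstart, qend, sstart, send = (int(x) for x in f[6:10])
    if pident < MIN_IDENTITY or evalue > MAX_EVALUE:
        return None
    # blastx reports reverse-strand hits with qstart > qend;
    # GFF3 requires start <= end plus an explicit strand.
    strand = "+" if qstart <= qend else "-"
    start, end = sorted((qstart, qend))
    # Percent-encode characters that are reserved in GFF3 attributes.
    target = quote(sseqid, safe="|")
    attributes = (f"ID=blastx_hit_{hit_number};Name={target};"
                  f"Target={target} {sstart} {send}")
    return "\t".join([qseqid, "blastx", "protein_match", str(start), str(end),
                      f[10], strand, ".", attributes])


def blastx_to_gff3(lines):
    """Yield a GFF3 header followed by one line per hit that passes the filter."""
    yield "##gff-version 3"
    kept = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        gff_line = blastx_line_to_gff3(line, kept + 1)
        if gff_line is not None:
            kept += 1
            yield gff_line


# Hypothetical rows, for illustration only.
rows = [
    "scaffold1\tsp|P12345|ABC_HUMAN\t62.5\t120\t45\t0\t901\t542\t10\t129\t3e-45\t160",
    "scaffold1\tsp|Q99999|XYZ_MOUSE\t41.0\t150\t88\t0\t100\t549\t1\t150\t1e-30\t118",
]
for out in blastx_to_gff3(rows):
    print(out)
```

With these thresholds the second made-up row is dropped despite its good E-value, which is exactly the situation described in the question, so it is worth checking how many hits are lost before settling on the identity cut-off.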
