Hi friends,
I need some help selecting the right target sequences for my GFF file. I'm working on a species without a reference genome. I have scaffolds from genome sequencing, and I have run blastx against the Swiss-Prot database.
I want to convert my blastx output file to GFF. Should I define a cut-off for the blastx output based on percent identity or on e-value?
I ask because many of the target sequences have a percent identity below 50%, but their e-values are fine.
Thanks
I think that because you are using scaffolds, your sequences are long enough that they are unlikely to align by chance, so I would go with percent identity in this case. Here is a tool that will help you convert the output.
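Separately from the tool mentioned above, here is a rough sketch of what the conversion itself could look like, assuming the blastx output is in the standard 12-column tabular format (`-outfmt 6`). The function name `blast6_to_gff3`, the `max_evalue` default of 1e-5, the `blastx` source label, and the `hitN` IDs are all my own illustrative choices, not anything prescribed by BLAST or by the question; swap the e-value test for a percent identity test (column 3) if you prefer that cut-off.

```python
from urllib.parse import quote


def blast6_to_gff3(lines, max_evalue=1e-5):
    """Convert blastx hits in 12-column tabular format to GFF3 lines.

    max_evalue is an illustrative cut-off, not a recommendation.
    """
    gff = ["##gff-version 3"]
    kept = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        qseqid, sseqid = f[0], f[1]
        qstart, qend = int(f[6]), int(f[7])
        sstart, send = int(f[8]), int(f[9])
        evalue = f[10]
        if float(evalue) > max_evalue:
            continue  # hit fails the (assumed) e-value cut-off
        kept += 1
        # In blastx the query is the nucleotide scaffold, so a hit on the
        # reverse strand is reported with qstart > qend.
        strand = "+" if qstart <= qend else "-"
        start, end = min(qstart, qend), max(qstart, qend)
        # Percent-encode characters that are reserved in GFF3 attributes.
        target = quote(sseqid, safe="|")
        attrs = f"ID=hit{kept};Name={target};Target={target} {sstart} {send}"
        gff.append("\t".join([
            qseqid, "blastx", "protein_match",
            str(start), str(end), evalue, strand, ".", attrs,
        ]))
    return gff
```

To use it on a file, something like `print("\n".join(blast6_to_gff3(open("hits.tsv"))))` should do, where `hits.tsv` is a placeholder name. The e-value goes into the GFF score column and the protein coordinates go into the `Target` attribute, so no information about the hit on the scaffold is lost by the filtering step.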
I think that I have a similar problem. You are trying to select the best hit to include in the GFF file, right? I am currently analysing a metatranscriptome dataset. After 'blasting' the sequences against the nr DB, I was trying to find or develop a reasonable algorithm to select the "correct hit". I guess that when mapping against a large DB like Swiss-Prot or nr, the E-value is not a bad score to work with, but what about all the other scores, such as alignment length, mismatches, and bit score? I thought about combining them into a single factor, but simply using something like (length/mismatches)*bitscore/Evalue sounds over-simplified to me. I mean, should all scores receive the same weight? Are they all equally important? If anyone knows of a tool that is meant to pick the best hit from a blast output (preferably the standard 12-column tabular format), I will be very happy to hear about it.
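For what it's worth, here is a minimal sketch of one simple alternative to a combined factor: keep one hit per query, ranked by bit score, with E-value and then percent identity used only as tie-breakers. This is an assumption on my part about what "best" should mean, not an established method, and the names `COLUMNS` and `best_hits` are made up for the example. As far as I understand, for a given query and database the E-value is computed from the bit score, so the two largely carry the same information and weighting them independently may not add much.

```python
# Column names of the standard 12-column BLAST tabular format.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(lines):
    """Return {query id: best hit as a dict of column name -> string}.

    Ranking (an assumed choice): highest bit score first, then lowest
    E-value, then highest percent identity.
    """
    best = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        # Tuples compare element by element, so negating the E-value makes
        # "smaller is better" fit into a "larger key wins" comparison.
        key = (
            float(hit["bitscore"]),
            -float(hit["evalue"]),
            float(hit["pident"]),
        )
        current = best.get(hit["qseqid"])
        if current is None or key > current[0]:
            best[hit["qseqid"]] = (key, hit)
    return {query: hit for query, (key, hit) in best.items()}
```

Calling `best_hits(open("hits.tsv"))` (placeholder file name) gives one record per query. Note that this fits short reads or transcripts, where one hit per query is reasonable; for long genomic scaffolds that contain several genes, you would want the best hit per region rather than per scaffold.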