5.7 years ago
James Reeve
I'm looking into filtering a GFF3 file I generated with GMAP. Looking online, it's clear that there isn't a universal way to filter GFF3 files, but the score (column 6) seems to be commonly used. I would like to understand what the score means before setting a threshold for filtering. However, the GFF3 specification doesn't define a universal format for the score, and the GMAP manual doesn't explain its GFF3 output.
What is the value in the score column?
My gut feeling is that it is likely some alignment-score-based value. That said, I think this column is often ignored in GFF files (as in, usually not set).
If you can explain exactly what you want to filter from the GFF file (and why), we might be able to assist you better.
Ah, and never use 'universal' in combination with GFF; the two are very nearly complete opposites. ;)
I created the GFF as the first step in identifying orthologs between two species. I want to filter out any sequences that have a poor match between species. I'm also planning to filter on coverage and % identity. I'm checking whether the score is a relevant filtering parameter.
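For the coverage and % identity part, a minimal Python sketch could look like the following. It assumes the mRNA rows carry `coverage` and `identity` keys in the attributes column (column 9); check your own file first, since that depends on the output format you asked GMAP for. The function names and the 80% default thresholds are just placeholders for illustration.

```python
def parse_attributes(field):
    """Split a GFF3 attributes column ('key=value;key=value') into a dict."""
    attrs = {}
    for part in field.strip().split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def filter_gff3(lines, min_coverage=80.0, min_identity=80.0, feature="mRNA"):
    """Return rows of type `feature` that pass both thresholds.

    Assumption: the attributes column holds numeric 'coverage' and
    'identity' values. Rows lacking either key are dropped.
    """
    kept = []
    for line in lines:
        # Skip blank lines and comment/directive lines such as '##gff-version 3'.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature row has exactly nine tab-separated columns.
        if len(cols) != 9 or cols[2] != feature:
            continue
        attrs = parse_attributes(cols[8])
        try:
            coverage = float(attrs["coverage"])
            identity = float(attrs["identity"])
        except (KeyError, ValueError):
            continue
        if coverage >= min_coverage and identity >= min_identity:
            kept.append(line.rstrip("\n"))
    return kept
```

You would call it as `filter_gff3(open("alignments.gff3"))` (hypothetical filename). Note that it deliberately ignores the score in column 6.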
Yeah, I'm quickly learning there's no instruction manual to ortholog detection, file formats included.
But you will have those values from something like BLAST output, no? Then simply filter on IDs; there is no point in filtering the GFF file itself, and especially not on the score column (e.g. a feature might have a low score but still be a valid BLAST hit).
I might have to disagree here a little; there are quite a few nice tutorials around for orthology detection. Have you looked around? Perhaps worth a look:
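To illustrate filtering on IDs, here is a rough Python sketch. It assumes you already have a set of IDs that passed your own criteria elsewhere, and that the GFF3 rows use the standard `ID` and `Parent` attributes; `filter_by_ids` is a made-up name for this example.

```python
def filter_by_ids(lines, keep_ids):
    """Keep GFF3 rows whose ID, or any of whose Parents, is in keep_ids."""
    keep_ids = set(keep_ids)
    kept = []
    for line in lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        # Parse 'key=value' pairs from the attributes column.
        attrs = dict(p.split("=", 1) for p in cols[8].split(";") if "=" in p)
        # A feature may list several parents, separated by commas.
        ids = {attrs.get("ID")} | set(attrs.get("Parent", "").split(","))
        if ids & keep_ids:
            kept.append(line.rstrip("\n"))
    return kept
```

Passing mRNA IDs keeps each mRNA row plus its child rows (exons, CDS) through `Parent`, but not the enclosing gene row, so add the gene IDs to the set as well if you need those.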
Easy way to run easily orthoMCL (Copy & paste)