Hi all, total noob question:
I have a GFF3 file of a pepper (C. annuum) plant genome that looks like this:
seqid src type start end
chr01 PROTEIN gene 29119 37617 . - . ID=CA.PGAv.1.6.scaffold567.122
chr01 PROTEIN mRNA 29119 37617 . - . ID=TC.CA.PGAv.1.6.scaffold567.122;Parent=CA.PGAv.1.6.scaffold567.122
chr01 PROTEIN exon 29119 29457 . - 0 Parent=TC.CA.PGAv.1.6.scaffold567.122
...
chr02 ABINITI gene 157637 159805 0.22 - . ID=CA.PGAv.1.6.scaffold1545.2
...
chr04 ISGAP gene 11689 14256 1096 + . ID=CA.PGAv.1.6.scaffold638.93
...
I am trying to cross-reference the features in the GFF3 with the genes from this paper, which identifies the locations with numbers such as "LOC107867643", "LOC107868281", etc. I'm assuming these are absolute coordinates in their aligned sequence.
I'm assuming the "start" column is relative to the beginning of each seqid, as the spec describes (chr04, for example, has a start smaller than chr02's).
My question is: how do I translate the chr02 start of 157637, for example, into an absolute coordinate that I can match against the LOC numbers published in the paper?
For example, if the last feature for chr01 has an "end" of 309042759 and the first feature for chr02 has a "start" of 157637, can I just do 309042759 + 157637 = 309200396 to get the whole-genome coordinate for that feature?
I found this Biostars question, which noted that if the chromosome itself were listed in the file it would start at 1, but I do not have any such entries in this file.
Any help would be great, thanks.
Numbering restarts at 1 for each chromosome, so that is not a problem. It just looks like chr04 has a feature annotated closer to its start than chr02 does.
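To make the per-chromosome numbering concrete, here is a minimal Python sketch that reads GFF3 feature lines and reports the smallest "start" seen for each seqid. It assumes the file is tab-separated with the nine standard GFF3 columns (the excerpt in the question looks space-separated only because of how it was pasted), and the function name `first_start_per_seqid` and the `SAMPLE` list are made up for illustration.

```python
def first_start_per_seqid(gff3_lines):
    """Map each seqid to the smallest start coordinate seen for it."""
    first_start = {}
    for line in gff3_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed GFF3 feature line
        seqid, start = cols[0], int(cols[3])
        if seqid not in first_start or start < first_start[seqid]:
            first_start[seqid] = start
    return first_start


# Gene lines taken from the question, rewritten with tabs (assumed layout).
SAMPLE = [
    "chr01\tPROTEIN\tgene\t29119\t37617\t.\t-\t.\tID=CA.PGAv.1.6.scaffold567.122",
    "chr02\tABINITI\tgene\t157637\t159805\t0.22\t-\t.\tID=CA.PGAv.1.6.scaffold1545.2",
    "chr04\tISGAP\tgene\t11689\t14256\t1096\t+\t.\tID=CA.PGAv.1.6.scaffold638.93",
]

print(first_start_per_seqid(SAMPLE))
# {'chr01': 29119, 'chr02': 157637, 'chr04': 11689}
```

Each start only makes sense together with its seqid, so a feature is identified by the pair (chromosome, position) and there is no need to add up chromosome lengths.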
Thanks for the reply, Max. So when a paper quotes a gene at "LOC107867643", for example, is that usually the coordinate from the beginning of the entire alignment, i.e. position 1 of chr01? Or is it from the beginning of a chromosome, so that I also need to know which chromosome it is on?
AFAIK, LOC IDs at NCBI have no relation to the chromosome. They are IDs assigned to genes of unknown function.
Ahhh, don't know how I missed that. Thanks Max, this is what I needed to know.
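For anyone landing here with the same confusion: as far as I know, NCBI builds these placeholder symbols as "LOC" followed by the numeric GeneID, so the digits are a database identifier and not a position. A small Python sketch for pulling the GeneID out of such a symbol (the helper name `loc_to_gene_id` is made up for illustration):

```python
import re

# "LOC" followed only by digits, e.g. "LOC107867643" (assumed symbol format).
LOC_PATTERN = re.compile(r"^LOC(\d+)$")


def loc_to_gene_id(symbol):
    """Return the numeric part of a LOC symbol as an int, or None if it is not one."""
    match = LOC_PATTERN.match(symbol.strip())
    return int(match.group(1)) if match else None


for symbol in ["LOC107867643", "LOC107868281", "chr01"]:
    print(symbol, loc_to_gene_id(symbol))
```

The resulting number can be looked up in NCBI Gene to find the chromosome and coordinates in NCBI's own annotation. Matching that back to the CA.PGAv IDs in the GFF3 above would presumably still need a mapping between the two annotations, or a comparison by chromosome and position if both use the same assembly.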