Hi, I just downloaded the hg38.ncbiRefSeq.gtf annotation file from UCSC and I'm looking at its first few lines:
chr1 ncbiRefSeq exon 11874 12227 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; exon_number "1"; exon_id "NR_046018.2.1"; gene_name "DDX11L1";
chr1 ncbiRefSeq exon 12613 12721 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; exon_number "2"; exon_id "NR_046018.2.2"; gene_name "DDX11L1";
chr1 ncbiRefSeq exon 13221 14409 . + . gene_id "DDX11L1"; transcript_id "NR_046018.2"; exon_number "3"; exon_id "NR_046018.2.3"; gene_name "DDX11L1";
So, since the gene DDX11L1 is on the plus strand, I interpret this to mean that exon 1 spans positions 11874-12227, exon 2 spans 12613-12721, and exon 3 spans 13221-14409, correct? That would mean the bases in between are intron 1 (12228-12612) and intron 2 (12722-13220), right?
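To show how I arrived at the intron coordinates, here is a minimal Python sketch. The helper name `introns_between` is made up for illustration, and it assumes the exon coordinates are 1-based and inclusive on both ends.

```python
# Exon coordinates copied from the DDX11L1 lines above
# (assumed 1-based, inclusive on both ends).
exons = [(11874, 12227), (12613, 12721), (13221, 14409)]

def introns_between(exons):
    """Return the gaps between consecutive exons as (start, end) pairs."""
    ordered = sorted(exons)  # sort by genomic start position
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]

introns = introns_between(exons)
print(introns)  # [(12228, 12612), (12722, 13220)]
```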
So far, so good. But now when I look at a gene that is located on the minus strand, for example WASH7P (the very next gene in the file):
chr1 ncbiRefSeq exon 14362 14829 . - . gene_id "WASH7P"; transcript_id "NR_024540.1"; exon_number "1"; exon_id "NR_024540.1.1"; gene_name "WASH7P";
chr1 ncbiRefSeq exon 14970 15038 . - . gene_id "WASH7P"; transcript_id "NR_024540.1"; exon_number "2"; exon_id "NR_024540.1.2"; gene_name "WASH7P";
chr1 ncbiRefSeq exon 15796 15947 . - . gene_id "WASH7P"; transcript_id "NR_024540.1"; exon_number "3"; exon_id "NR_024540.1.3"; gene_name "WASH7P";
(I'm only showing the first 3 exons here.)
I understand that the positions shown here actually give the end and then the start of the exon, because it's on the minus strand, right? So, position 14362 is the LAST base of exon1 and 14829 is the FIRST base of exon1, correct? For exon2 the FIRST base is 15038 and the LAST one is 14970, right?
So, in the mRNA resulting from joining all the exons here, wouldn't what is called "exon1" in WASH7P actually be the LAST exon in the mRNA? Not the first one? Why is it called exon ONE in the file? Wouldn't the exon order be reversed for genes located on the minus strand if their START and END positions are reversed? On the minus strand, I would expect exon n+1 to be located UPSTREAM of exon n, not downstream? This is confusing af.
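To make the question concrete, here is a minimal Python sketch of the ordering I would expect. The helper `expected_transcript_order` is hypothetical, and it rests on my assumption that a minus-strand transcript is read from high to low genomic coordinates. It only uses the three WASH7P exons listed above, not the full transcript.

```python
# First three WASH7P exons as listed above: (start, end, exon_number).
wash7p_exons = [(14362, 14829, 1), (14970, 15038, 2), (15796, 15947, 3)]

def expected_transcript_order(exons, strand):
    """Sort exons 5'->3', ASSUMING that a minus-strand transcript
    is read from high to low genomic coordinates."""
    return sorted(exons, key=lambda e: e[0], reverse=(strand == "-"))

ordered = expected_transcript_order(wash7p_exons, "-")
for start, end, number in ordered:
    print(f"exon_number {number}: {start}-{end}")
```

Under that assumption, the exon labelled exon_number "3" comes out before the one labelled "1", which is the mismatch I'm asking about.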
Am I misinterpreting something?