Let's say I have a file that contains all the exon starts and stops of a particular gene, along with the strand (+ or -). I have a second file that contains a particular position within that gene, also with a strand identifier. I want to compare the two files to determine where in the transcript that position lies. If the strands match in both files, can I compare my position directly with the exons? In other words, I don't need to correct for strand if both the exon starts/stops and the position I'm testing are reported on the same strand, right? I'm trying to determine whether particular positions are near the front or the end of a transcript, and I don't want to mess up my results because of strand errors.
Here is what my files currently look like, starting with the exons:
chr20 60718876 60718945 NM_198935_cds_0_0_chr20_60718877_f 0 +
chr20 60733727 60733804 NM_198935_cds_1_0_chr20_60733728_f 0 +
chr20 60734932 60735017 NM_198935_cds_2_0_chr20_60734933_f 0 +
chr20 60736491 60736636 NM_198935_cds_3_0_chr20_60736492_f 0 +
chr20 60737807 60737987 NM_198935_cds_4_0_chr20_60737808_f 0 +
chr20 60738513 60738678 NM_198935_cds_5_0_chr20_60738514_f 0 +
chr20 60739200 60739302 NM_198935_cds_6_0_chr20_60739201_f 0 +
chr20 60740477 60740570 NM_198935_cds_7_0_chr20_60740478_f 0 +
chr20 60747737 60747857 NM_198935_cds_8_0_chr20_60747738_f 0 +
chr20 60749572 60749700 NM_198935_cds_9_0_chr20_60749573_f 0 +
chr20 60754237 60754264 NM_198935_cds_10_0_chr20_60754238_f 0 +
chr2 86067266 86067515 NM_003896_cds_0_0_chr2_86067267_r 0 -
chr2 86071518 86071677 NM_003896_cds_1_0_chr2_86071519_r 0 -
chr2 86073499 86073686 NM_003896_cds_2_0_chr2_86073500_r 0 -
chr2 86074983 86075327 NM_003896_cds_3_0_chr2_86074984_r 0 -
chr2 86088303 86088415 NM_003896_cds_4_0_chr2_86088304_r 0 -
chr2 86090484 86090608 NM_003896_cds_5_0_chr2_86090485_r 0 -
chr2 86115946 86116028 NM_003896_cds_6_0_chr2_86115947_r 0 -
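A minimal Python sketch for reading the exon file, assuming it is standard BED6 (chrom, 0-based start, exclusive end, name, score, strand) and that the transcript ID is everything before `_cds_` in the name. `Exon`, `parse_bed6`, and `transcript_id` are hypothetical names chosen for illustration. The coordinate embedded in each name (e.g. `60718877` for a BED start of `60718876`) is the start plus one, which is consistent with BED's 0-based starts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Exon(NamedTuple):
    chrom: str
    start: int   # BED start: 0-based, inclusive
    end: int     # BED end: exclusive, i.e. the 1-based position of the last base
    name: str
    strand: str


def transcript_id(name: str) -> str:
    # Hypothetical helper; assumes names look like "<transcript>_cds_<n>_..."
    return name.split("_cds_")[0]


def parse_bed6(lines: Iterable[str]) -> Dict[str, List[Exon]]:
    """Group BED6 exon lines by transcript, each group sorted by genomic start."""
    groups: Dict[str, List[Exon]] = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or truncated lines
        chrom, start, end, name, _score, strand = fields[:6]
        groups[transcript_id(name)].append(Exon(chrom, int(start), int(end), name, strand))
    # Ascending genomic order; on the (-) strand this is NOT 5'->3' transcript order
    return {tid: sorted(exons, key=lambda e: e.start) for tid, exons in groups.items()}


if __name__ == "__main__":
    sample = [
        "chr20 60733727 60733804 NM_198935_cds_1_0_chr20_60733728_f 0 +",
        "chr20 60718876 60718945 NM_198935_cds_0_0_chr20_60718877_f 0 +",
        "chr2 86067266 86067515 NM_003896_cds_0_0_chr2_86067267_r 0 -",
    ]
    for tid, exons in parse_bed6(sample).items():
        print(tid, exons[0].strand, [(e.start, e.end) for e in exons])
```

Sorting by start only puts the exons in genomic order; by itself it says nothing about which end of the transcript each exon belongs to.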
And the positions to test:
NM_198935 chr20 60749698 +
NM_003896 chr2 86071665 -
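The comparison described in the question can be sketched as follows: confirm that chromosome and strand agree between the two files, then find which exon contains the position, counting along ascending genomic coordinates. This assumes the test positions are 1-based (the files don't say), and `EXONS` and `genomic_rank` are hypothetical names; the intervals are copied from the exon file.

```python
# (chrom, strand, BED-style (start, end) intervals) per transcript, copied from the exon file
EXONS = {
    "NM_198935": ("chr20", "+", [
        (60718876, 60718945), (60733727, 60733804), (60734932, 60735017),
        (60736491, 60736636), (60737807, 60737987), (60738513, 60738678),
        (60739200, 60739302), (60740477, 60740570), (60747737, 60747857),
        (60749572, 60749700), (60754237, 60754264),
    ]),
    "NM_003896": ("chr2", "-", [
        (86067266, 86067515), (86071518, 86071677), (86073499, 86073686),
        (86074983, 86075327), (86088303, 86088415), (86090484, 86090608),
        (86115946, 86116028),
    ]),
}


def genomic_rank(test_line: str):
    """Return (rank, n_exons) for the exon holding the position, counted in
    ascending genomic order, or None if the position is not in any exon."""
    tid, chrom, pos_text, strand = test_line.split()
    exon_chrom, exon_strand, intervals = EXONS[tid]
    if (chrom, strand) != (exon_chrom, exon_strand):
        raise ValueError(f"{tid}: chromosome or strand differs between the two files")
    pos = int(pos_text)  # assumed 1-based
    for rank, (start, end) in enumerate(sorted(intervals), start=1):
        if start < pos <= end:  # a BED interval covers 1-based positions start+1..end
            return rank, len(intervals)
    return None


if __name__ == "__main__":
    for line in ["NM_198935 chr20 60749698 +", "NM_003896 chr2 86071665 -"]:
        print(line, "->", genomic_rank(line))
```

Matching strands only confirm that the two records describe the same transcript. On a (-) strand gene, rank 2 of 7 in genomic order is not the second exon of the transcript, because the transcript's 5' end sits at the highest coordinate.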
Right now, I'm getting that the test position on chr20 is toward the end of its transcript and the position on chr2 is toward the beginning. That is true with respect to the transcript, right?
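Not quite: both positions fall in the second-to-last exon of their respective transcripts, so both are toward the end. Genomic coordinates are always given relative to the (+) reference strand, so having the same strand in both files does not let you skip the correction. When sorted ascending by genomic coordinate, exons of a (+) strand gene appear first exon -> last exon, while exons of a (-) strand gene appear last exon -> first exon. For a (-) strand gene you have to reverse the exon order (equivalently, count from the highest coordinate) to get the position relative to the transcript.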
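A strand-aware version of the lookup, as a sketch under the same assumptions (BED-style intervals, 1-based test positions): sort the exons, reverse them for a (-) strand gene so they run 5'->3', and accumulate exon lengths until reaching the exon that contains the position. `transcript_position` is a hypothetical name. Because the file lists CDS intervals (`_cds_` in the names), the offset is measured along the spliced CDS, not the full mRNA with its UTRs.

```python
from typing import List, Tuple

# BED-style (start, end) CDS intervals copied from the exon file
NM_198935 = [
    (60718876, 60718945), (60733727, 60733804), (60734932, 60735017),
    (60736491, 60736636), (60737807, 60737987), (60738513, 60738678),
    (60739200, 60739302), (60740477, 60740570), (60747737, 60747857),
    (60749572, 60749700), (60754237, 60754264),
]
NM_003896 = [
    (86067266, 86067515), (86071518, 86071677), (86073499, 86073686),
    (86074983, 86075327), (86088303, 86088415), (86090484, 86090608),
    (86115946, 86116028),
]


def transcript_position(intervals: List[Tuple[int, int]], strand: str, pos: int):
    """Locate a 1-based genomic position in 5'->3' transcript order.

    Returns (exon_number, n_exons, offset, cds_length): exon_number counts
    from the 5' end starting at 1, and offset is the 0-based distance of pos
    from the 5' end of the spliced intervals.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand {strand!r}")
    ordered = sorted(intervals)  # ascending genomic order
    if strand == "-":
        ordered.reverse()  # a (-) strand transcript starts at its highest coordinate
    cds_length = sum(end - start for start, end in ordered)
    offset = 0
    for number, (start, end) in enumerate(ordered, start=1):
        if start < pos <= end:
            # The 5'-most base of this exon is start+1 on (+) and end on (-)
            offset += (pos - (start + 1)) if strand == "+" else (end - pos)
            return number, len(ordered), offset, cds_length
        offset += end - start  # the whole exon lies 5' of pos
    raise ValueError("position is not inside any exon")


if __name__ == "__main__":
    print(transcript_position(NM_198935, "+", 60749698))  # (10, 11, 1161, 1191)
    print(transcript_position(NM_003896, "-", 86071665))  # (6, 7, 861, 1257)
```

Both sites come out in exon n-1 of n. Dividing `offset` by `cds_length` (1161/1191 for the chr20 site, 861/1257 for the chr2 site) gives a relative position if you need a number rather than an exon rank.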