Relative Positions +/- strand
7.6 years ago

Let's say I have a file that contains all the exon starts and stops of a particular gene, along with the strand (+ or -). I have a second file that contains a particular position within that gene, also with a strand identifier. I want to compare the two files to determine where in the transcript that position lies. If the strands match in both files, can I compare my position directly with the exon coordinates? In other words, I don't need to correct for the strand if both the exon starts/stops and the position being tested are reported on the same strand, right? I'm trying to determine whether particular positions are near the front or the end of a transcript, and I don't want to mess up my results because of strand errors.

This is what my files look like at the moment:

chr20    60718876    60718945    NM_198935_cds_0_0_chr20_60718877_f    0    +
chr20    60733727    60733804    NM_198935_cds_1_0_chr20_60733728_f    0    +
chr20    60734932    60735017    NM_198935_cds_2_0_chr20_60734933_f    0    +
chr20    60736491    60736636    NM_198935_cds_3_0_chr20_60736492_f    0    +
chr20    60737807    60737987    NM_198935_cds_4_0_chr20_60737808_f    0    +
chr20    60738513    60738678    NM_198935_cds_5_0_chr20_60738514_f    0    +
chr20    60739200    60739302    NM_198935_cds_6_0_chr20_60739201_f    0    +
chr20    60740477    60740570    NM_198935_cds_7_0_chr20_60740478_f    0    +
chr20    60747737    60747857    NM_198935_cds_8_0_chr20_60747738_f    0    +
chr20    60749572    60749700    NM_198935_cds_9_0_chr20_60749573_f    0    +
chr20    60754237    60754264    NM_198935_cds_10_0_chr20_60754238_f    0    +
chr2    86067266    86067515    NM_003896_cds_0_0_chr2_86067267_r    0    -
chr2    86071518    86071677    NM_003896_cds_1_0_chr2_86071519_r    0    -
chr2    86073499    86073686    NM_003896_cds_2_0_chr2_86073500_r    0    -
chr2    86074983    86075327    NM_003896_cds_3_0_chr2_86074984_r    0    -
chr2    86088303    86088415    NM_003896_cds_4_0_chr2_86088304_r    0    -
chr2    86090484    86090608    NM_003896_cds_5_0_chr2_86090485_r    0    -
chr2    86115946    86116028    NM_003896_cds_6_0_chr2_86115947_r    0    -


and the particular positions for testing:

NM_198935    chr20    60749698    +
NM_003896    chr2    86071665    -


Right now, I'm getting that the position for testing on chr20 is toward the end and the position on chr2 is toward the beginning. This is true with regard to the transcript, right?


Not true, actually: both positions fall in the second-to-last exon of their respective transcripts! When sorted ascending by genomic coordinate, the exons of a gene on the (+) strand appear in order from first to last, while the exons of a gene on the (-) strand appear from last to first.
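To make the ordering concrete, here is a minimal Python sketch that ranks the exon containing a position in transcript order. The function name `exon_rank` is hypothetical, the intervals are copied from the question, and it assumes the exon rows are BED-style (0-based start, exclusive end) while the test positions are 1-based; adjust the comparison if your positions use another convention.

```python
# CDS intervals copied from the question (assumed BED-style: 0-based start, exclusive end).
NM_198935 = [(60718876, 60718945), (60733727, 60733804), (60734932, 60735017),
             (60736491, 60736636), (60737807, 60737987), (60738513, 60738678),
             (60739200, 60739302), (60740477, 60740570), (60747737, 60747857),
             (60749572, 60749700), (60754237, 60754264)]
NM_003896 = [(86067266, 86067515), (86071518, 86071677), (86073499, 86073686),
             (86074983, 86075327), (86088303, 86088415), (86090484, 86090608),
             (86115946, 86116028)]


def exon_rank(exons, pos, strand):
    """Return (rank, total): 1-based rank, in transcript order, of the exon holding pos.

    pos is assumed to be a 1-based genomic coordinate.
    Returns (None, total) if pos is not inside any interval.
    """
    # Transcript order is ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    for rank, (start, end) in enumerate(ordered, start=1):
        if start < pos <= end:  # 1-based pos inside a 0-based half-open interval
            return rank, len(ordered)
    return None, len(ordered)


print(exon_rank(NM_198935, 60749698, "+"))
print(exon_rank(NM_003896, 86071665, "-"))
```

Under those assumptions the chr20 position is in exon 10 of 11 and the chr2 position is in exon 6 of 7, i.e. the second-to-last exon in both cases.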

7.6 years ago
Asaf 9.5k

You do have to consider the strand. On the + strand, the distance into an exon is measured from the exon's first coordinate (its start); on the - strand it is measured from the second coordinate (its end), because the transcript runs in the opposite direction to the genomic coordinates.

If you have doubts about your calculation, it might be a good idea to run the analysis separately on the two strands and then check whether the results agree or show off-by-one, reversed, or similar differences between the two.
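As a rough illustration of the strand-aware distance described above, the following Python sketch computes how far into the spliced coding sequence a position lies. The function name `cds_offset` is hypothetical, and the same assumptions apply: exon rows are BED-style (0-based start, exclusive end) and test positions are 1-based.

```python
# CDS intervals copied from the question (assumed BED-style: 0-based start, exclusive end).
NM_198935 = [(60718876, 60718945), (60733727, 60733804), (60734932, 60735017),
             (60736491, 60736636), (60737807, 60737987), (60738513, 60738678),
             (60739200, 60739302), (60740477, 60740570), (60747737, 60747857),
             (60749572, 60749700), (60754237, 60754264)]
NM_003896 = [(86067266, 86067515), (86071518, 86071677), (86073499, 86073686),
             (86074983, 86075327), (86088303, 86088415), (86090484, 86090608),
             (86115946, 86116028)]


def cds_offset(exons, pos, strand):
    """Return (offset, total): 1-based offset of pos within the spliced CDS, and CDS length.

    pos is assumed to be a 1-based genomic coordinate.
    Returns (None, total) if pos is not inside any interval.
    """
    total = sum(end - start for start, end in exons)
    # Walk the exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    upstream = 0  # bases in exons that come earlier in the transcript
    for start, end in ordered:
        if start < pos <= end:
            # '+' counts from the exon start, '-' counts back from the exon end.
            within = pos - start if strand == "+" else end - pos + 1
            return upstream + within, total
        upstream += end - start
    return None, total


for name, exons, pos, strand in [("NM_198935", NM_198935, 60749698, "+"),
                                 ("NM_003896", NM_003896, 86071665, "-")]:
    offset, total = cds_offset(exons, pos, strand)
    print(name, offset, total, round(offset / total, 3))
```

Under those assumptions the chr20 position is base 1162 of 1191 and the chr2 position is base 862 of 1257, so both lie in the latter part of their coding sequences. Ignoring the strand for NM_003896 would instead place it near the start, which is the error the question ran into.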