Hi,
I am trying to represent how RNA-binding proteins align to the transcript regions of mRNA (5' UTR, CDS, 3' UTR) using data from ENCODE. To do this, I am comparing the data against an annotation file (borrowed from someone else), and I am confused by the strand directionality and therefore by which region each protein's reads fall in.
For example, here are 2 lines from the annotation:
#name       chrom  strand  txStart  txEnd   cdsStart  cdsEnd
uc031pju.2  chr1   +       925740   944581  925941    944153
uc001abz.5  chr1   -       944203   959290  944693    959240
So a read from the CDS would fall anywhere between 925941 and 944153 for the plus-strand transcript, and between 944693 and 959240 for the minus-strand transcript.
But are the following statements correct? If not, can you correct them?
In the case of the plus strand:
1) a read positioned after txStart but before cdsStart is in the 5' UTR
2) a read positioned after cdsEnd but before txEnd is in the 3' UTR
In the case of the minus strand:
1) a read positioned after txStart but before cdsStart is in the 3' UTR
2) a read positioned after cdsEnd but before txEnd is in the 5' UTR
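To make the rule I am asking about concrete, here is a minimal Python sketch of my current understanding. It is what I would like confirmed, not a verified implementation. It assumes that all coordinates are given on the plus (reference) strand regardless of the transcript's strand, that intervals are half-open (start inclusive, end exclusive), and that introns can be ignored because my annotation lines have no exon columns. The names `Transcript` and `classify` are my own, made up for illustration.

```python
from typing import NamedTuple


class Transcript(NamedTuple):
    """One annotation row; field order follows the header line above."""
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int


def classify(pos: int, tx: Transcript) -> str:
    """Label a genomic position relative to one transcript.

    Assumptions (not confirmed): coordinates are plus-strand genomic
    positions, intervals are half-open, and introns are ignored.
    """
    if not (tx.tx_start <= pos < tx.tx_end):
        return "outside"
    if tx.cds_start <= pos < tx.cds_end:
        return "CDS"
    # The position is in a UTR; which one depends on the strand.
    left_of_cds = pos < tx.cds_start
    if tx.strand == "+":
        return "5' UTR" if left_of_cds else "3' UTR"
    # Minus strand: the transcript reads right-to-left on the genome,
    # so the lower-coordinate side is the 3' end.
    return "3' UTR" if left_of_cds else "5' UTR"


# The two rows from the annotation shown above.
PLUS = Transcript("uc031pju.2", "chr1", "+", 925740, 944581, 925941, 944153)
MINUS = Transcript("uc001abz.5", "chr1", "-", 944203, 959290, 944693, 959240)

for pos, tx in [(925800, PLUS), (944300, PLUS), (944300, MINUS), (959260, MINUS)]:
    print(tx.name, tx.strand, pos, classify(pos, tx))
```

Note that position 944300 falls inside both transcripts here, which is part of why the strand of the read matters for my analysis.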
I guess I am confused about what the position numbers in the annotation really mean. Where do they come from?
Thanks in advance.