I have RNA-seq data and I am trying to count the reads that map within 50 nt upstream and downstream of the CDS start site. To do so, I aligned the FASTQ files to the transcriptome. To count the reads that map to this 100 nt window (50 nt upstream and 50 nt downstream of the CDS start site), I got the GTF file from GENCODE and made a new one containing only the "start_codon" feature type. Then I changed the coordinates: the new start was "start_codon" - 50 and the new end was "start_codon" + 50.
But when I checked the counts for this 100 nt window, the read count was not correct for many of them. The problem appears whenever there is an intron inside the 100 nt window, because I need the window to cover only the exons.
BTW, I used HTSeq to count the reads that map to the 100 nt window.
Do you know how to solve this problem?