counting the reads that map to a special window
1
0
Entering edit mode
6.0 years ago

I have RNAseq data and trying to count the reads that map to 50 nt upstream and downstream of CDS start site. to do so, I aligned the fastq files to the transcriptome and to count the reads that map to the window of 100 nt (50 nt upstream and downstream of the CDS start site) I got GTF file from gencode and made a new one only for the "start_codon" (as feature type). and then I changed coordinates (start was "start_codon"-50 and end was "start_codon"+50). but when I checked the counts for the mentioned window (100 nt). for many of them the read count is not correct. in fact if there is any intron in the window of 100 nt that would make the problem because I need only the exons. BTW, I used htseq to count the reads that map to the 100 nt window. do you know how to solve this problem?

RNA-Seq • 1.0k views
ADD COMMENT
0
Entering edit mode
6.0 years ago

Here's a set-operation based approach:

$ gtf2bed < annotations.gtf | grep -w CDS | awk -vpad=50 -vOFS="\t" '($6=="+"){ print $1, ($2-pad), ($2+pad) } ($6=="-"){ print $1, ($3-pad), ($3+pad) }' > CDS.100nt.bed
$ bedmap --echo --count --delim '\t' CDS.100nt.bed <(bam2bed < reads.bam) > counts.bed
$ bedops -n 1 counts.bed introns.bed > counts.filtered.bed

You'd need to generate introns.bed from exon and gene annotations. There are other questions on Biostars about this calculation.

ADD COMMENT

Login before adding your answer.

Traffic: 1242 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6