Question: counting the reads that map to a special window
0
gravatar for Christleemans
24 months ago by
Christleemans0 wrote:

I have RNAseq data and trying to count the reads that map to 50 nt upstream and downstream of CDS start site. to do so, I aligned the fastq files to the transcriptome and to count the reads that map to the window of 100 nt (50 nt upstream and downstream of the CDS start site) I got GTF file from gencode and made a new one only for the "start_codon" (as feature type). and then I changed coordinates (start was "start_codon"-50 and end was "start_codon"+50). but when I checked the counts for the mentioned window (100 nt). for many of them the read count is not correct. in fact if there is any intron in the window of 100 nt that would make the problem because I need only the exons. BTW, I used htseq to count the reads that map to the 100 nt window. do you know how to solve this problem?

rna-seq • 543 views
ADD COMMENTlink modified 24 months ago by Alex Reynolds30k • written 24 months ago by Christleemans0
0
gravatar for Alex Reynolds
24 months ago by
Alex Reynolds30k
Seattle, WA USA
Alex Reynolds30k wrote:

Here's a set-operation based approach:

$ gtf2bed < annotations.gtf | grep -w CDS | awk -vpad=50 -vOFS="\t" '($6=="+"){ print $1, ($2-pad), ($2+pad) } ($6=="-"){ print $1, ($3-pad), ($3+pad) }' > CDS.100nt.bed
$ bedmap --echo --count --delim '\t' CDS.100nt.bed <(bam2bed < reads.bam) > counts.bed
$ bedops -n 1 counts.bed introns.bed > counts.filtered.bed

You'd need to generate introns.bed from exon and gene annotations. There are other questions on Biostars about this calculation.

ADD COMMENTlink modified 24 months ago • written 24 months ago by Alex Reynolds30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1197 users visited in the last hour