2.8 years ago
lma
Hello,
I need to remove reads of intronic and intergenic origin from my BAM files. I used the split_bam.py script (RSeQC) with a genome_annotation.gft2bed.bed file to filter out these reads. Although most intronic and intergenic reads were removed, Qualimap still reports some reads from these regions (see the output below). Why are some intronic and intergenic reads still left after filtering? Also, is there another way to remove these reads completely, and how can I get a BED file that filters them out?
FYI, I removed rRNA reads from these samples in a previous step.
Qualimap output before and after filtering:
Sample 1:
Before:
exonic = 26,416,935 (90.23%),
intronic = 1,082,069 (3.7%),
intergenic = 1,779,920 (6.08%),
overlapping exon = 1,036,788 (3.54%).
After:
exonic = 26,416,653 (95.54%),
intronic = 680,971 (2.46%),
intergenic = 551,391 (1.99%),
overlapping exon = 918,605 (3.32%).
Sample 2:
Before:
exonic = 1,069,866 (30.54%),
intronic = 139,044 (3.97%),
intergenic = 2,294,436 (65.49%),
overlapping exon = 201,608 (5.75%).
After:
exonic = 1,069,850 (76.3%),
intronic = 53,733 (3.83%),
intergenic = 278,517 (19.86%),
overlapping exon = 128,756 (9.18%).
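For context on the BED file question, here is a minimal sketch of how an exon-only BED file could be derived from a GTF annotation. The function name `gtf_exons_to_bed` is hypothetical, and the sketch assumes a standard tab-separated GTF with a `gene_id` attribute. It writes simple 6-column BED records and does not merge overlapping exons; whether split_bam.py accepts this layout rather than a 12-column BED is an assumption I have not verified.

```python
def gtf_exons_to_bed(gtf_lines):
    """Yield one BED6 line per 'exon' feature found in GTF text lines.

    Hypothetical helper for illustration only; not part of RSeQC or Qualimap.
    """
    for line in gtf_lines:
        # Skip blank lines and GTF header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "exon":
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        # Use the gene_id attribute as the BED name when it is present.
        name = "."
        for attr in fields[8].split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "gene_id":
                name = value.strip().strip('"')
                break
        # GTF coordinates are 1-based inclusive; BED is 0-based half-open,
        # so only the start coordinate shifts by one.
        yield f"{chrom}\t{start - 1}\t{end}\t{name}\t0\t{strand}"


if __name__ == "__main__":
    example = [
        "# example annotation\n",
        'chr1\tsrc\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n',
        'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    ]
    for bed_line in gtf_exons_to_bed(example):
        print(bed_line)
```

Only the `exon` record is kept in the example; the `gene` record, which also spans introns, is skipped.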
Thanks!