How To Remove The Intronic Reads Before Counting
10.9 years ago
camelbbs ▴ 710

I have RNA-seq data from several samples. I checked them with FastQC and the read quality looks good (HiSeq 2000). The problem is that many reads map to intronic regions, and those regions contain no annotated exons in any reference (RefSeq, Ensembl, GENCODE). We don't know what they are. We guess the problem happened in library preparation, the concentration was low. The data are already in and we can't re-sequence, so we want to remove the reads mapped to intronic regions. Is there a method to do that? Or does anyone have an idea what the intronic reads might be? Thanks.

rnaseq rna-seq • 4.5k views

"We don't know what they are."

Ask yourself: "why should I remove intronic reads?" Do you want to remove results that you do not understand until your experiment fits your expectations?

"We guess the problem happened in library preparation, the concentration was low."

What does low concentration have to do with getting unwanted reads? What would "make up" sequences that are not real when the concentration is low? See also "Why are there many RNA-seq hits to intronic regions?". Intronic reads might come from novel transcripts, remnants of nascent RNA, lincRNA, antisense RNA, or, if they lie close to exons, wrong exon boundaries in the annotation.

10.9 years ago

If you have a BED file of exonic regions, or a GTF or something similar, you can use BEDTools to filter your .bam for reads that fall within the desired coordinates, using intersectBed.
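
As a rough sketch of that approach, and not a definitive recipe: a full GTF can also carry gene and transcript records that span introns, so it may help to reduce it to exon records first. The file names (annotation.gtf, s1.bam, s1.exonic.bam) are placeholders and both helper functions are hypothetical. The -u option asks bedtools to report each alignment once, however many exons it overlaps.

```shell
# Hypothetical helper: turn the exon records of a 9-column GTF into
# 0-based, half-open BED6 intervals (GTF coordinates are 1-based, inclusive).
gtf_exons_to_bed() {
    tab=$(printf '\t')
    awk 'BEGIN { FS = OFS = "\t" }
         $0 !~ /^#/ && $3 == "exon" { print $1, $4 - 1, $5, "exon", 0, $7 }' "$1" |
        LC_ALL=C sort -t "$tab" -k1,1 -k2,2n -k3,3n |
        uniq
}

# Hypothetical helper: keep each alignment once (-u) if it overlaps any exon.
# $1 = input BAM, $2 = exon BED, $3 = output BAM. Requires bedtools on the PATH.
filter_exonic_reads() {
    bedtools intersect -u -abam "$1" -b "$2" > "$3"
}

# Usage (not run here):
#   gtf_exons_to_bed annotation.gtf > exons.bed
#   filter_exonic_reads s1.bam exons.bed s1.exonic.bam
```

Note that this keeps any read touching an exon, including reads that run from an exon into the neighbouring intron.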


Thanks. Would it be like this?

intersectBed -abam s1.bam -b hg19ensembl.gtf > s1.filter.bam
10.9 years ago

You can use BEDOPS to solve this problem quickly. The toolkit includes the bedops tool and various conversion scripts for putting data into BED format, which bedops can process.

Assuming your reads are in BAM format:

$ bam2bed < reads.bam \
    | bedops --not-element-of -1 - introns.bed \
    > reads-not-in-introns.bed

The file reads-not-in-introns.bed is a sorted BED file containing all reads that do not overlap intronic elements.

You can then pass this result to bedmap to do counting of reads over other region sets (whole-genome or subsets).

Note that we assume your introns are in BED format and are sorted, e.g.:

$ sort-bed unsorted-introns.bed > introns.bed

Alternatively, if your introns are in some other format (say, GTF), then BEDOPS conversion scripts will losslessly turn them into sorted BED, e.g.:

$ gtf2bed < introns.gtf > introns.bed
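
The counting step mentioned above could look roughly like the following. This is only a sketch: the wrapper function and the file names exons.bed and exon-counts.bed are assumptions for illustration, and both inputs are assumed to be sorted with sort-bed.

```shell
# Hypothetical wrapper around bedmap: for each region in a sorted BED file,
# print the region followed by the number of reads overlapping it.
# $1 = sorted regions (e.g. exons.bed)
# $2 = sorted reads   (e.g. reads-not-in-introns.bed)
count_reads_over_regions() {
    bedmap --echo --count --delim '\t' "$1" "$2"
}

# Usage (not run here; requires BEDOPS on the PATH):
#   count_reads_over_regions exons.bed reads-not-in-introns.bed > exon-counts.bed
```

Here --echo prints each region, --count appends the number of overlapping reads, and --delim separates the two with a tab.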

A very useful tool. Thanks a lot.
