Question: How To Remove The Intronic Reads Before Counting
5.3 years ago by
camelbbs630
China
camelbbs630 wrote:

I have RNA-seq data from several samples. I checked them with FastQC, and the read quality seems good (HiSeq 2000). The problem is that many reads map to intronic regions, and those regions contain no annotated exons in any reference (RefSeq, Ensembl, GENCODE). We don't know what they are. We guess the problem happened in library preparation, the concentration was low. The data are already out and we can't re-sequence, so we want to remove the reads mapped to intronic regions. Is there a method to do that? Or does anyone have an idea what the intronic reads are? Thanks.

rnaseq rna-seq • 2.2k views
modified 5.3 years ago by Alex Reynolds25k • written 5.3 years ago by camelbbs630

"We don't know what they are."

Ask yourself: "Why should I remove intronic reads?" Do you want to remove results that you do not understand until your experiment fits your expectations?

"We guess the problem happened in library preparation, the concentration was low."

What does low concentration have to do with getting unwanted reads? What would 'make up' sequences that are not real when the concentration is low? See also the thread "Why are there many RNA-seq hits to intronic regions?". Intronic reads might come from novel transcripts, remnants of nascent RNA, lincRNA, antisense RNA, or, if they lie close to exons, wrong exon boundaries in the annotation.

modified 5.3 years ago • written 5.3 years ago by Michael Dondrup44k
5.3 years ago by
swbarnes24.0k
United States
swbarnes24.0k wrote:

If you have a BED file of exonic regions, or a GTF or something similar, you can use BEDTools to filter your .bam for reads that fall within the desired coordinates, using intersectBed.
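One thing to watch for when the annotation is a GTF: some GTF files carry gene- and transcript-level records that span introns, so intersecting against the whole file may keep intronic reads. A minimal sketch of pulling out only the exon records first, using plain awk; the function name gtf_exons_to_bed and the output file names are assumptions for illustration, not part of BEDTools:

```shell
# Hypothetical helper: read a GTF on stdin, write exon records as BED6.
# GTF coordinates are 1-based inclusive; BED is 0-based half-open,
# so the start is shifted by one and the end is kept as-is.
gtf_exons_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                       # skip header/comment lines
         $3 == "exon" { print $1, $4 - 1, $5, "exon", ".", $7 }'
}

# Possible usage (file names are placeholders):
#   gtf_exons_to_bed < hg19ensembl.gtf > exons.bed
#   intersectBed -u -abam s1.bam -b exons.bed > s1.exonic.bam
```

The -u option asks intersectBed to write each read once if it overlaps any exon, which may help avoid duplicated records when exons from several transcripts overlap.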

written 5.3 years ago by swbarnes24.0k

Thanks. Would it look like this?

$ intersectBed -abam s1.bam -b hg19ensembl.gtf > s1.filter.bam

written 5.3 years ago by camelbbs630
5.3 years ago by
Alex Reynolds25k
Seattle, WA USA
Alex Reynolds25k wrote:

You can use BEDOPS to solve this problem quickly. It includes bedops and various conversion scripts for putting data into BED format, which bedops can process.

Assuming your reads are in BAM format:

$ bam2bed < reads.bam \
    | bedops --not-element-of -1 - introns.bed \
    > reads-not-in-introns.bed

The file reads-not-in-introns.bed is a sorted BED file containing all reads that do not overlap intronic elements.

You can then pass this result to bedmap to count reads over other region sets (whole-genome or subsets).

Note that we assume your introns are in BED format and are sorted, e.g.:

$ sort-bed unsorted-introns.bed > introns.bed

Alternatively, if your introns are in some other format (say, GTF), then the BEDOPS conversion scripts will losslessly turn them into sorted BED, e.g.:

$ gtf2bed < introns.gtf > introns.bed
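
The commands above assume that an intron file already exists. If only exon coordinates are at hand, one possible way to derive per-transcript introns is to take the gaps between consecutive exons of the same transcript. The sketch below uses only sort and awk; the function name exons_to_introns and the four-column input layout (chromosome, start, end, transcript ID) are assumptions for illustration, not part of BEDOPS:

```shell
# Hypothetical helper: read exon intervals (chrom, start, end, transcript ID;
# tab-separated, BED coordinates) on stdin and print the gaps between
# consecutive exons of the same transcript, i.e. that transcript's introns.
exons_to_introns() {
    tab=$(printf '\t')
    # Group exons by transcript ID, then order them by start position.
    LC_ALL=C sort -t "$tab" -k4,4 -k2,2n |
    awk 'BEGIN { FS = OFS = "\t" }
         # Same transcript as the previous exon and a gap before this one.
         $4 == prev_name && $2 + 0 > prev_end { print $1, prev_end, $2 + 0, $4 }
         { prev_name = $4; prev_end = $3 + 0 }'
}

# Possible usage (file names are placeholders):
#   exons_to_introns < exons.bed | sort-bed - > introns.bed
```

Note that an intron of one isoform can overlap an exon of another, so removing every read that touches such an interval may also discard genuinely exonic reads.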
written 5.3 years ago by Alex Reynolds25k

A very useful tool. Thanks a lot.

written 5.3 years ago by camelbbs630