Create Interval List for Picard CollectRnaSeqMetrics
9.4 years ago
Harshal ▴ 60

Hi,

How do I create a ribosomal RNA interval list to use with Picard CollectRnaSeqMetrics?

I have mapped reads to the Drosophila melanogaster Ensembl genome with the Ensembl GTF. I need to identify the percentage of reads mapping to ribosomal RNA. Is there a way to create an interval list from a GTF file?

Thanks!

next-gen RNA-Seq • 11k views
9.4 years ago
Dan D 7.4k

The documentation for the IntervalList format is somewhat hard to find. From the linked page:

Represents a list of intervals against a reference sequence that can be written to and read from a file. The file format is relatively simple and reflects the SAM alignment format to a degree. A SAM style header must be present in the file which lists the sequence records against which the intervals are described. After the header the file then contains records one per line in text format with the following values tab-separated: Sequence name, Start position (1-based), End position (1-based, end inclusive), Strand (either + or -), Interval name (an, ideally unique, name for the interval).

So the first thing you need to do is get the header from your SAM/BAM file:

samtools view -H [your.bam] > intervalList.txt

If your GTF file is standard and we assume that it contains only ribosomal intervals, then we need the first, fourth, fifth, seventh, and ninth fields from the file. We can append them onto our text file which contains the header:

cut -s -f 1,4,5,7,9 [your.gtf] >> intervalList.txt

This is a very basic approach and you'll probably want to modify it somewhat for your specific needs, but hopefully it's a good start.


Thanks Deedee !! It worked!

9.4 years ago
Kamil ★ 2.3k

You can see my ribosomal intervals file and a simple script I used to create it here:


8.5 years ago
lhaiyan3 ▴ 80

Hi, Kamil:

Thanks for the post. For the rRNA.interval file, you use fields 1, 4, 5, 7, and 9 from the file. Could you also tell me how to get the genes.interval and exons.interval files? I want gene, exon, and rRNA intervals for both human and mouse. I downloaded the GTF files from Ensembl. Thanks very much.

HY


You'll have to modify line 43 to say "exon" or "gene" instead of "transcript".
