The documentation for the IntervalList format is somewhat hard to find. From the linked page:
Represents a list of intervals against a reference sequence that can be written to and read from a file. The file format is relatively simple and reflects the SAM alignment format to a degree. A SAM style header must be present in the file which lists the sequence records against which the intervals are described. After the header the file then contains records one per line in text format with the following values tab-separated: Sequence name, Start position (1-based), End position (1-based, end inclusive), Strand (either + or -), Interval name (an, ideally unique, name for the interval).
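To make that description concrete, here is a rough sketch of a sanity check built only from the rules quoted above: header lines starting with @, then five tab-separated fields with integer coordinates and a + or - strand. The function name check_interval_list is made up for illustration; it is not a Picard or samtools command, and the real parser may enforce more (or less) than this.

```shell
# Sketch: sanity-check an interval list read from stdin.
# check_interval_list is a hypothetical helper name, not part of any toolkit.
# Problems are printed to stdout; the exit status is 1 if any were found.
check_interval_list() {
  awk -F '\t' '
    /^@/ { header++; next }   # SAM-style header lines
    NF != 5 {
      bad = 1; print "line " NR ": expected 5 tab-separated fields, got " NF; next
    }
    $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
      bad = 1; print "line " NR ": start and end must be integers"; next
    }
    $4 != "+" && $4 != "-" {
      bad = 1; print "line " NR ": strand must be + or -"
    }
    END {
      if (header == 0) { bad = 1; print "no SAM-style header lines found" }
      exit (bad ? 1 : 0)
    }'
}

# Usage, once the file has been assembled:
#   check_interval_list < intervalList.txt
```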
So the first thing you need to do is get the header from your SAM/BAM file:
samtools view -H [your.bam] > intervalList.txt
If your GTF file is standard and we assume that it contains only ribosomal intervals, then we need the first, fourth, fifth, seventh, and ninth fields from the file (sequence name, start, end, strand, and attributes). We can append them to the text file that already contains the header:
cut -s -f 1,4,5,7,9 [your.gtf] >> intervalList.txt
This is a very basic approach and you'll probably want to modify it somewhat for your specific needs, but hopefully it's a good start.
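One modification you might want: the ninth GTF field is the whole attribute string, so cut copies all of it into the interval name column. Below is a minimal sketch that pulls out just the gene_id value instead, assuming your attributes contain a gene_id "..." entry (it falls back to the full attribute string when they don't). The function name gtf_to_intervals is made up for illustration and is not part of samtools or Picard.

```shell
# Sketch: convert GTF rows on stdin into interval-list body rows on stdout.
# gtf_to_intervals is a hypothetical helper name.
gtf_to_intervals() {
  awk -F '\t' -v OFS='\t' '
    /^#/ || NF < 9 { next }   # skip comment lines and rows with too few fields
    {
      name = $9               # fallback: the whole attribute column
      if (match($9, /gene_id "[^"]+"/)) {
        # drop the 9-character prefix (gene_id, space, quote) and the closing quote
        name = substr($9, RSTART + 9, RLENGTH - 10)
      }
      print $1, $4, $5, $7, name
    }'
}

# Usage, in place of the cut command (same bracketed placeholders as above):
#   samtools view -H [your.bam] > intervalList.txt
#   gtf_to_intervals < [your.gtf] >> intervalList.txt
```

Note that GTF allows "." in the strand column, while the format description above lists only + and -, so rows like that may need to be dropped or given a strand before the file is used.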