Get A Graphical Representation Of Number Of Reads In Bam File For A Certain Genomic Region
4
2
Entering edit mode
9.6 years ago
Whetting ★ 1.6k

Hi guys and girls,

One of my coworkers is scrambling to redo a figure for a manuscript under review. The reviewers want to see a graphical representation of the number of reads from an RNA seq experiment. My coworkers is a biologists and computers are not his strong suit. We received a BAM file from the company who did the sequeuncing. However, they are not very responsive about getting us the data needed for the figure. I am probably the most computer literate in the lab, but I have never dealt with next-gen sequencing data, so I ma completely at a loss.

From some of the other questions on the forum, I gathered that Samtools could to this. We are interested in plotting the number of reads in "Chr2: 212,000,000 – 213,800,000". I assume, you would do some kind of sliding window in order to bin the reads?

Any pointers would be greatly appreciated!

example of desired output

EDIT: thanks for all the help. Each of the proposed options seems to come with its own little problems. I am sure I'll figure it out though!

bam samtools • 6.0k views
1
Entering edit mode

But see the question Converting Bam To Bedgraph For Viewing On Ucsc? for some possibly helpful ideas.

4
Entering edit mode
9.6 years ago

upload your BAM (or a sub part of the BAM) in the UCSC as a custom track and download the figure(s) as PDF/PS.

1
Entering edit mode

I would go with this method, but I would first convert to a coverage plot to reduce the file sizes. I would recommend bedtools genomecov, using the -ibam flag to import the BAM file. With the resulting bedGraph output you can convert it to bigWig (search 'UCSC bigwig'). It is often problematic trying to directly view a BAM file in UCSC, plus a coverage plot is more aesthetically pleasing in a figure.

3
Entering edit mode
9.6 years ago

Try IGV, from the Broad institute. You can import indexed .bams to it, and zoom in on the gene, and see the reads pileup up on exons.

0
Entering edit mode

Could be a bit tricky given a 1.8Mb region. My IGV stops displaying alignments at regions > 0.1Mb roughly.

4
Entering edit mode

That setting can be changed under View -> Preferences -> Alignments -> Visibility range threshold. It is set at 30kb by default. With enough memory available you might be able to increase to 2000kb to encompass this region? I tried it on my laptop (for a bam with ~50X coverage) with 2MB of ram allocated for IGV and it was insufficient though. I'll try again later on a bigger memory machine and see if this is possible. Increasing downsampling might help also? But, given the challenges some of the other solutions suggested in this thread might be better.

0
Entering edit mode

Good point, thanks! That works on my workstation (one 5GB RNA-Seq BAM file using ~500MB main memory).

0
Entering edit mode

How would I get the image out of IGV?

0
Entering edit mode

File -> save image in IGV will save a snapshot of what every you are looking at in IGV. I often just take snapshots using mac's built in screenshot options.

3
Entering edit mode
9.6 years ago
Christof Winter ★ 1.0k

If you are interested in the number of reads per position to then plot them with a tool of your choice, samtools is indeed your friend:

samtools depth -r chr2:212,000,000-213,800,000 accepted_hits.bam > reads-per-position.txt


will produce

chr2    212107123    1
chr2    212107124    1
chr2    212107125    1
chr2    212107126    2
chr2    212107127    2
chr2    212107128    2
...


The last column contains the number of reads covering the given position. Note that positions with zero coverage are omitted. You could easily go ahead and turn the output into a BED file and then use bedtools intersect to restrict your positions exons only (using e.g. a GFF or GTF file containing the exon coordinates).

0
Entering edit mode

is this faster than samtools mpileup? That command gives depth at each position along with some additional information

0
Entering edit mode

For RNA-Seq BAM files that have N's in the CIGAR string to represent introns, samtools mpileup unfortunately also counts the intronic gaps in reads spanning two exons as read depth (which is normally not what you want). Also, by default it ignores reads with an unmapped mate (to get these, add the -A option).

3
Entering edit mode
9.6 years ago

what we normally do in our NGS pipeline is to generate a bigwig coverage file for each BAM file, which can be visualized directly on IGV. here is all the code needed:

genomeCoverageBed -bg -ibam input.bam > coverage.bedgraph
bedGraphToBigWig coverage.bedgraph human_hg19.fa.fai coverage.bigwig


you will have to have installed bedtools and the UCSC's bedGraphToBigWig, plus the index of the reference used (human_hg19.fa.fai in our case)

0
Entering edit mode

cool, how do you get the image out of IGV?

0
Entering edit mode

File -> save image. Or, take a screenshot.

1
Entering edit mode

true. IGV natively provides a PNG image as output, although it won't be as good as the PDF/PS image that Pierre mentioned.

1
Entering edit mode

You can save as SVG as well, so you'll get a vector graphic.

1
Entering edit mode

Yes. You can choose between .png, .jpeg, and .svg (under 'file format') when you save image.