Tutorial: Plotting the coverage and depth (Y-axis) statistics of a bam file along the genome (X-axis), using Samtools, awk and R.
3
gravatar for kirannbishwa01
7 months ago by
United States
kirannbishwa01590 wrote:

This is the same problem I recently had, but I was not able to find a good solution. I was finally able to work a method which I am sharing here. Specially for empirical biologists who know what to do but not what to do exactly this solution might be an opener. I hope this will be helpful for others.

Identification of the depth of coverage is quite useful in 1) identifying the regions that might have potential paralogous alignment 2) finding the coverage at your regions of interest.

Here is a step by step guide to the problem/question.

Step #1) First identify the depth at each locus from a bam file.

I have found samtools depth option more useful in this regard, when coverage at each locus is desired.

samtools depth deduped_MA605.bam > deduped_MA605.coverage

The output file 'deduped_MA605.coverage' file will have 3 columns (Chr#, position and depth at that position) like below.

Chr  position depth (this header will be absent though)
1       3980        66
1       3981        68
1      3982     67
1      3983        67
1      3984     68

Step #2) Now, select the coverage (depth) by locus for each chromosome and/or regions

We can use the coverage file to plot it in R. But, the file is so large that it will suck up almost all the memory. It better to split the coverage by chromosome (or region of the chromosome if required).

To select the coverage for a particular chromosome (Chr#1 in my case)

awk '$1 == 1 {print $0}' deduped_MA605.coverage > chr1_MA605.coverage

To select coverage from chr #2

awk '$1 == 2 {print $0}' deduped_MA605.coverage > chr2_MA605.coverage

If the chrosomosome has string characters it can be adjusted as

awk '$1 == "chr2" {print $0}' deduped_MA605.coverage > chr2_MA605.coverage

Step #3) To plot the data in R this coverage file will need to be imported and the headers need to be added.

MA605.chr2 <- read.table("/media/everestial007/SeagateBackup4.0TB/New_Alignment_Set/05-B-deDupedReads-forScanIndel/chr2_MA605.coverage",
         header=FALSE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)`

Note: The header of the column are automatically set as V1, V2 and V3.

To rename the headers

library(reshape) # loads the library to rename the column names

MA605.chr2<-rename(MA605.chr2,c(V1="Chr", V2="locus", V3="depth")) # renames the header

Now, plot the coverage by depth:

plot(MA605.chr2$locus, MA605.chr2$depth)

to get the wider/cleaner view of the plot use

library(lattice, pos=10) xyplot(depth ~ locus, type="p", pch=16, auto.key=list(border=TRUE), par.settings=simpleTheme(pch=16), scales=list(x=list(relation='same'), y=list(relation='same')), data=MA605.chr2, main="depth by locus - Chr2 (SampleMA605)")

The output plot looks like this: coverage depth by locus

Note:

  • We can see that there is a region around the middle and next to it that has a very high coverage. This likely suggests paralogous alignment.

  • Also, there is a gap in alignment. The region with no alignment is near (at) centromere where no consensus sequence has been found.

  • Reads may be later filtered by coverage.

I will post the details on how to filter by coverage in a week or more.

Thanks,

ADD COMMENTlink modified 7 months ago by Pierre Lindenbaum101k • written 7 months ago by kirannbishwa01590
3
gravatar for Pierre Lindenbaum
7 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum101k wrote:

a one-liner:

samtools depth -b <(echo -e '1\t231408092\t231408119') input.bam | cut -f 2,3 |\
gnuplot -e "set terminal png; set output 'out.png'; set title 'Coverage'; set xlabel 'position' ; set xtics rotate by 90 right ; set ylabel 'depth' ; plot '-'  using 2:xticlabels(1)  with lines notitle;"
ADD COMMENTlink modified 7 months ago • written 7 months ago by Pierre Lindenbaum101k

Hi Pierre, You script looks interesting. If you have some spare time, could you add some details about your script. Just requesting.

Thanks,

ADD REPLYlink written 7 months ago by kirannbishwa01590
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1102 users visited in the last hour