Question: Picard CollectWgsMetrics for each chromosome? Standard deviation of coverage per chromosome
4.6 years ago by

Hi II'm running Picard on some BAM files and noticed that the default output for CollectWgsMetrics is for the whole genome. I would like to find the standard deviation of the coverage for each chromosome.


Does anyone know how to make this possible? Even using a different program. I have the output from bedtools genomeCov but I do not know how to infer the standard deviation of read depth for each chromosome. 


ADD COMMENTlink modified 4.5 years ago by travcollier150 • written 4.6 years ago by QVINTVS_FABIVS_MAXIMVS2.2k

This would be a really nice feature for all the picard Collect* commands.  In our case, looking at the whole genome together isn't appropriate since some of the contigs are repeated elements (and some may well be contamination or other cruft).  A few select contigs would provide a better characterization.  Also, the current "all the 'genome'" approach doesn't make sense if you are competitively mapping to multiple genomes at once (doing WGS metagenomics for example).

IMO, the ideal implementation would be a command-line argument (lets call it CHROMS) where you give a list of chroms/contigs to process together as a single unit.  The default (null) would process all the chroms/contigs.

To generate stats for each chrom/contig, you could either run picard multiple times, specifying a single chrom/contig each time.  Probably not ideal, but it would be trivial to implement.
Alternatively, if the framework allows it, the CHROMS argument could be given multiple times, and the stats for each one would be output separately (and perhaps overall stats too).

Anyway, let me see if I can find the right place to send a feature request to.  You might want to do the same.

ADD REPLYlink written 4.6 years ago by travcollier150
gravatar for travcollier
4.5 years ago by
UC Davis
travcollier150 wrote:

I added INTERVAL_RANGE (taking an interval file) and INTERVAL_RANGE_STRING (taking a whitespace separated list of samtools-like chrom[:start-[end]] regions) to CollectWgsMetrics.  Hopefully these changes will get merged into the picard distribution soon.

In the meantime, the source is on github:
I've also uploaded CollectWgsMetrics.jar:

To generate per-chromosome output using my version of CollectWgsMetrics.jar (assuming you have the command parallel avaialbe):

ref.fa is your indexed reference sequence (index file is ref.fa.fai)
in.bam is your bam file... indexed if you want the commands to be fast

parallel --gnu --colsep '\t' -a ref.fa.fai java -Xmx4g -jar CollectWgsMetrics.jar INPUT=in.bam REFERENCE_SEQUENCE=ref.fa  OUTPUT="WGS_{1}.out" INTERVAL_LIST_STRING="{1}"  

You might need to change -Xmx4g if more or less memory is needed.

ADD COMMENTlink written 4.5 years ago by travcollier150

That would be a very useful addition to the tool.

ADD REPLYlink written 4.5 years ago by Dan D6.7k
gravatar for Dan D
4.6 years ago by
Dan D6.7k
Dan D6.7k wrote:

I would split the BAM by chromosome:

bamtools split -in [bamfile] -reference

And then run the Picard CollectWGSMetrics tool on each of the BAM files. The output of the tool is highly regular so it would be easy to then combine the results into a single text file afterwards.

ADD COMMENTlink written 4.6 years ago by Dan D6.7k
gravatar for Pierre Lindenbaum
4.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:

I would use GATK DepthOfCoverage with a BED file containing one chromosome per line :  chrom/0/leng(chrom)

ADD COMMENTlink written 4.6 years ago by Pierre Lindenbaum119k

The last time I used DepthOfCoverage, it was really slow in comparison to Picard's complement. Also, I don't think it does the extensive filtering and reporting for duplicates, pass-filter reads, etc that CollectWgsMetrics performs. it's been a while since I've used the GATK tool though--perhaps it's been updated.

ADD REPLYlink written 4.6 years ago by Dan D6.7k
gravatar for travcollier
4.6 years ago by
UC Davis
travcollier150 wrote:

Apologies... I cannot seem to get this to actually work myself.  In theory it should work, but there is nastiness with the reference and bam header which I just can't seem to sort out.

You can stream individual regions to CollectWgsMetrics (thanks to Nils Homer for reminding me of that).

samtools view -b <in.bam> <region> | java -jar CollectWgsMetrics.jar I=/dev/stdin ...

You could just do that for each chromosome.

ADD COMMENTlink modified 4.5 years ago • written 4.6 years ago by travcollier150
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 845 users visited in the last hour