Question: Running GATK DepthOfCoverage on BAM files with multiple RG's
0
gravatar for stefano.iantorno
4.7 years ago by
United Kingdom
stefano.iantorno70 wrote:

I am trying to run GATK DepthOfCoverage on some BAM files that I have merged from two original files (the same sample was sequenced on two lanes to maximize the number of reads). I realized after the fact that my merged file has reads with different read groups (as reflected by the RG field of each read), and that the header of my two original files differed in their @RG fields.

I have tried running samtools reheader adding a new @RG field in the header, but when I merge the two files each read group is based on the name of the two BAM files, not on the name of the @RG in the headers of the two BAM files.

For example, my two starting samples are:

27163.pe.markdup.bam
27091.pe.markdup.bam

but when I merge them using samtools merge

samtools merge merged.bam 27163.pe.markdup.bam 27091.pe.markdup.bam 

The resulting merged.bam has the same @RG field in the header as only one of the two, and each of the reads has a read name based on the name of the file it came from as such:

Read 1 will have RG:Z:27091.pe.markdup

Read 2 will have RG:Z:27163.pe.markdup

etcetera for the rest of the reads in the BAM

Am I doing something wrong? Should I rehead each of the original files before merging? Or simply rehead after merging to something that is compatible with GATK? It seems like no matter what the @RG field in the header is before merging, the merged file will always have reads with different RGs based on the name of the two input files.

I am also not sure what does GATK DepthOfCoverage want as input in terms of read groups. Does it want a single RG for all reads? In that case, should I use something different than samtools merge?

Thanks in advance for any help you can give me.

ADD COMMENTlink modified 4.7 years ago by Pierre Lindenbaum118k • written 4.7 years ago by stefano.iantorno70
1

For reference:

What Pierre suggested actually works.

I used Picard MergeSamFiles followed by AddOrReplaceReadGroups, specifying a new RGPL, RGLB, RGSM, and RGPU flag in the merged file. This will replace the @RG field for all reads in the merged BAM file with what I specified in AddOrReplaceReadGroups.

The output of this is compatible with GATK.

Hope this clears it up for whoever else finds themselves in my same situation.

ADD REPLYlink written 4.6 years ago by stefano.iantorno70
2
gravatar for Pierre Lindenbaum
4.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

don't merge with samtools but picard : http://picard.sourceforge.net/command-line-overview.shtml#MergeSamFiles

$ samtools  merge


Usage:   samtools merge [-nr] 
(...)

Note: Samtools' merge does not reconstruct the @RG dictionary in the header. Users
      must provide the correct header with -h, or uses Picard which properly maintains
      the header dictionary in merging.
ADD COMMENTlink modified 4.7 years ago • written 4.7 years ago by Pierre Lindenbaum118k

Thanks, will give it a try. Are you referring to MergeSamFiles or MergeBamAlignment? I read somewhere that MergeSamFile is the one I should use, despite the name.

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by stefano.iantorno70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2000 users visited in the last hour