Running GATK DepthOfCoverage on BAM files with multiple RG's
1
0
Entering edit mode
9.8 years ago

I am trying to run GATK DepthOfCoverage on some BAM files that I have merged from two original files (the same sample was sequenced on two lanes to maximize the number of reads). I realized after the fact that my merged file has reads with different read groups (as reflected by the RG field of each read), and that the header of my two original files differed in their @RG fields.

I have tried running samtools reheader adding a new @RG field in the header, but when I merge the two files each read group is based on the name of the two BAM files, not on the name of the @RG in the headers of the two BAM files.

For example, my two starting samples are:

27163.pe.markdup.bam
27091.pe.markdup.bam

but when I merge them using samtools merge

samtools merge merged.bam 27163.pe.markdup.bam 27091.pe.markdup.bam 

The resulting merged.bam has the same @RG field in the header as only one of the two, and each of the reads has a read name based on the name of the file it came from as such:

Read 1 will have RG:Z:27091.pe.markdup

Read 2 will have RG:Z:27163.pe.markdup

etcetera for the rest of the reads in the BAM

Am I doing something wrong? Should I rehead each of the original files before merging? Or simply rehead after merging to something that is compatible with GATK? It seems like no matter what the @RG field in the header is before merging, the merged file will always have reads with different RGs based on the name of the two input files.

I am also not sure what does GATK DepthOfCoverage want as input in terms of read groups. Does it want a single RG for all reads? In that case, should I use something different than samtools merge?

Thanks in advance for any help you can give me.

next-gen sequence software error • 8.4k views
ADD COMMENT
1
Entering edit mode

For reference:

What Pierre suggested actually works.

I used Picard MergeSamFiles followed by AddOrReplaceReadGroups, specifying a new RGPL, RGLB, RGSM, and RGPU flag in the merged file. This will replace the @RG field for all reads in the merged BAM file with what I specified in AddOrReplaceReadGroups.

The output of this is compatible with GATK.

Hope this clears it up for whoever else finds themselves in my same situation.

ADD REPLY
2
Entering edit mode
9.8 years ago

Don't merge with samtools but picard

$ samtools  merge


Usage:   samtools merge [-nr] 
(...)

Note: Samtools' merge does not reconstruct the @RG dictionary in the header. Users
      must provide the correct header with -h, or uses Picard which properly maintains
      the header dictionary in merging.
ADD COMMENT
0
Entering edit mode

Thanks, will give it a try. Are you referring to MergeSamFiles or MergeBamAlignment? I read somewhere that MergeSamFile is the one I should use, despite the name.

ADD REPLY

Login before adding your answer.

Traffic: 2580 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6