Question

How to treat a BAM file with multiple read groups (RG)


6.4 years ago

zamani_2012 • 0

Hi everyone, I am pretty new to NGS data analysis. I have downloaded a WES dataset in BAM format from the SRA database. I tried to mark duplicates in the BAM file using the MarkDuplicates command in Picard. However, I encountered an error in the terminal: "error parsing sam header. @rg line missing sm tag". I thought that there was an error in the @RG tag, so I ran the following command in the terminal to see the RG tags: "samtools view -H SRR1693634_NC_000005.9.sorted.bam | grep '@RG'". The output is below:

@RG ID:FGC0630.4.ACTGAT
@RG ID:FGC0639.8.ACTGAT
@RG ID:FGC0639.7.ACTGAT
@RG ID:FGC0639.4.ACTGAT
@RG ID:FGC0639.6.ACTGAT
@RG ID:FGC0639.5.ACTGAT

Now I have two questions:

What is the meaning of several read groups for a single sample? Are MarkDuplicates results for a sample with a single RG similar to those for the same sample with multiple RGs?
The RGs in my sample lack some necessary information such as RGID, RGLB, RGPL, RGPU, RGSM. How can I obtain this information and then add it to the BAM file for each RG? As you know, AddOrReplaceReadGroups in Picard only handles a sample with a single RG.

next-gen BAM GATK picard read group • 4.1k views

updated 6.4 years ago by Pierre Lindenbaum 166k • written 6.4 years ago by zamani_2012 • 0

score 1 · Answer 1 · 2019-03-06

What is the meaning of several read groups for a single sample?

https://software.broadinstitute.org/gatk/documentation/article?id=6472

Are MarkDuplicates results for a sample with a single RG similar to those for the same sample with multiple RGs?

It depends: imagine two reads for the same sample that come from two distinct DNA libraries. They cannot be considered duplicates of each other.
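To see whether the read groups in a file actually belong to different libraries, one option is to list the LB tag of each @RG line. The following is a minimal sketch, not part of samtools or Picard: the function name rg_libraries is made up for illustration, and it assumes the header is tab-delimited as the SAM format specifies.

```shell
# Print "ID<TAB>LB" for every @RG header line read from stdin.
# "NA" is printed where a tag is absent (as in the header shown in the question).
rg_libraries() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^@RG/ {
            id = "NA"; lb = "NA"
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) id = substr($i, 4)
                if ($i ~ /^LB:/) lb = substr($i, 4)
            }
            print id, lb
        }'
}

# Usage on a real file (requires samtools):
#   samtools view -H SRR1693634_NC_000005.9.sorted.bam | rg_libraries
```

If every read group reports the same LB, the reads are compared as one library. If LB is missing everywhere, the header alone cannot tell you how many libraries were sequenced.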

RGPL, RGPU, RGSM

Adding read group to bam files from multiplexed samples

Most of the time you only need SM (sample name), but it is always better to fill in all the information if you can. See https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups
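Before editing anything, it can help to report which tags each read group is missing. This is a rough sketch under assumptions: the function name missing_rg_tags is hypothetical, and the checklist (SM, LB, PL, PU) simply mirrors the tags named in the question, not an official list of required tags.

```shell
# For each @RG line on stdin, print "ID<TAB>missing,tags".
# Read groups that already carry SM, LB, PL and PU produce no output.
missing_rg_tags() {
    awk 'BEGIN { FS = "\t"; n = split("SM LB PL PU", want, " ") }
        /^@RG/ {
            id = "NA"; split("", have)          # reset the per-line tag set
            for (i = 2; i <= NF; i++) {
                tag = substr($i, 1, 2)
                if (tag == "ID") id = substr($i, 4)
                have[tag] = 1
            }
            miss = ""
            for (j = 1; j <= n; j++)
                if (!(want[j] in have))
                    miss = miss (miss == "" ? "" : ",") want[j]
            if (miss != "") print id "\t" miss
        }'
}

# Usage on a real file (requires samtools):
#   samtools view -H SRR1693634_NC_000005.9.sorted.bam | missing_rg_tags
```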

How can I obtain and then add these information to the bam file for each RG?

Extract the SAM header with samtools view -H.

Use sed to add the information to the header (once for each @RG line), something like:

sed '/^@RG/s/FGC0630.4.ACTGAT/FGC0630.4.ACTGAT\tSM:sample1/'

Use samtools reheader to update the BAM.
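The three steps above can be put together roughly as follows. This is a sketch under assumptions, not a tested pipeline: it assumes all read groups belong to one sample, it uses the placeholder name sample1 from the sed example, and the helper add_sm_tag and the output file names are made up for illustration. It uses awk instead of one sed call per read group so that every @RG line without an SM tag is patched in a single pass.

```shell
# Append an SM tag to every @RG line that does not already have one.
# Reads a SAM header on stdin and writes the patched header to stdout.
add_sm_tag() {
    awk -v sm="$1" 'BEGIN { FS = OFS = "\t" }
        /^@RG/ && $0 !~ /\tSM:/ { $0 = $0 OFS "SM:" sm }
        { print }'
}

bam="SRR1693634_NC_000005.9.sorted.bam"
# Only run the samtools steps when samtools and the BAM file are available.
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    # 1. extract the header and 2. add the missing SM tags
    samtools view -H "$bam" | add_sm_tag sample1 > new_header.sam
    # 3. write a new BAM that carries the patched header
    samtools reheader new_header.sam "$bam" > "${bam%.bam}.rg.bam"
    # check the result
    samtools view -H "${bam%.bam}.rg.bam" | grep '^@RG'
fi
```

Only the header changes here. The reads keep their existing RG:Z: values, which is fine because the read group IDs themselves are not modified.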