Question: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None
Covux10 wrote:

Hey,

I downloaded a BAM file for chr20 from NCBI (SRR1976036). This is a NA12878 sample.

I wanted to do some variant calling with freebayes and got the following error.

could not find SM: in @RG tag

After some investigation I found that the @RG line in my BAM header has no SM tag:

@HD    VN:1.2  SO:coordinate
@SQ SN:CM000663.1   LN:249250621
@SQ SN:CM000664.1   LN:243199373
@SQ SN:CM000665.1   LN:198022430

......
@SQ SN:GL000248.1   LN:39786
@SQ SN:GL000249.1   LN:38502
@RG ID:None
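
A quick way to confirm this kind of problem is to filter the header for @RG lines that lack an SM field. The sketch below is only an illustration: the function name rg_missing_sm is made up here, and it assumes the header fields are tab-separated, as in a real SAM header (the spaces shown in the post are a display artifact).

```shell
# Print every @RG header line that has no SM: field.
# Reads SAM header text on stdin, so it can be fed from `samtools view -H`.
# rg_missing_sm is a hypothetical helper name, not part of samtools.
rg_missing_sm() {
  awk -F'\t' '
    /^@RG/ {
      has_sm = 0
      for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) has_sm = 1
      if (!has_sm) print
    }'
}

# Typical use (requires samtools; file name taken from the question):
# samtools view -H SRR1976036_chr20.bam | rg_missing_sm
```

Any line it prints is a read group that tools requiring SM are likely to reject.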

I looked around the internet and thought I had found an answer in the Picard tool AddOrReplaceReadGroups.

So I tried the following:

java -jar /home/user/Downloads/picard.jar AddOrReplaceReadGroups I=SRR1976036_chr20.bam O=036_RG.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=test

However, I got the following message:

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None; File /home/user/NA12878/SRR1976036_chr20.bam; Line number 95

I looked around and couldn't find an answer.

This is my first time working with a BAM file directly. I have done variant calling from FASTQ to VCF before and never ran into this problem.

Could someone tell me what I can do to add the information properly?

Kind regards, Covux

na12878 picard • 3.5k views
ADD COMMENTlink modified 23 months ago by Pierre Lindenbaum117k • written 23 months ago by Covux10
Santosh Anand4.6k wrote:

The @RG line in the BAM header looks malformed. Remove it and then try Picard.

ADD COMMENTlink written 23 months ago by Santosh Anand4.6k

How do I remove the @RG line?

ADD REPLYlink written 23 months ago by Covux10

You can remove it from the header, provided it appears only there (and not in the reads).

samtools view -H your.bam | grep -v "^@RG" | samtools reheader - your.bam > your.new.bam

Then run Picard again on your.new.bam.

Note: This will fail if each read also carries the malformed RG tag. In that case, post some of the initial reads.
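
To check up front whether the reads carry RG tags, one option is to tally the RG:Z: values in the first few thousand records. This is a rough sketch, not part of the original answer: read_rg_ids is a made-up helper name, and it assumes standard tab-separated SAM records with optional tags starting at column 12.

```shell
# Count the distinct RG:Z: values carried by SAM records read from stdin.
# Output: one line per read-group ID, as "ID<TAB>count".
# read_rg_ids is a hypothetical helper name, not part of samtools.
read_rg_ids() {
  awk -F'\t' '
    !/^@/ {
      for (i = 12; i <= NF; i++)
        if ($i ~ /^RG:Z:/) count[substr($i, 6)]++
    }
    END { for (id in count) printf "%s\t%d\n", id, count[id] }' | sort
}

# Typical use (requires samtools):
# samtools view your.bam | head -n 10000 | read_rg_ids
```

If this prints any ID, deleting the @RG line from the header leaves the reads pointing at a read group that no longer exists.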

ADD REPLYlink written 23 months ago by Santosh Anand4.6k

I just ran your command line and now I get a different error.

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1, Read name SRR1976036.630013850.None, RG ID on SAMRecord not found in header: None

Here are some of the reads I have in my file:

 SRR1976036.630013850.None  163 CM000682.1  59993   60  88M =   60001   96  GTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAG    B<B7BBFFFIFBFBF<B<FBB7BFFFFFBBF'B<BFFFBF<<BB<F<BFF7<BFBBFFB<<BBBB<7<B<<<<BBB<<<'77BB'<B<    RG:Z:None   BX:Z:TATGCGAGGCTGTG-1   NH:i:1  NM:i:8

   SRR1976036.630013848.None    83  CM000682.1  59999   60  88M =   60000   88  CAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAGGTGGTA    BFFFFFFFFFFFFFFFBFFFFFFFFFFFFFIFFB0IIIIIIIFIIIIIIIIIIFIIIFFFBIIIFIIIIIIFFBFFFFFIIIFIIFFF    RG:Z:None   BX:Z:GCTTGACAAAGATC-1   NH:i:1  NM:i:2

    SRR1976036.630013849.None   16  CM000682.1  60000   60  4S84M   *   0   0   ACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAGGTG    BFFFFFFFFFFFFFFFFFFFFFFFFFFIIIIIIIIFB<IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIFIIIIIIIIIFFF    RG:Z:None   BX:Z:CAACGTGTTCCGTA-1   NH:i:1  NM:i:1

    SRR1976036.697073837.None   81  CM000682.1  60000   60  8S80M   CM000685.1  137107070   0   GGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACA    BFFFFFFFFFFFFFFFFFFFFFFFFFFFFIIIIIIFIFFFB<IIIIIIIIIIIIIIIFIIIFFIIIIIIIIIIIIIFFFFFFFIIFFF    RG:Z:None   BX:Z:TGACCAGTATACAG-1   NH:i:1  NM:i:1

    SRR1976036.630013888.None   163 CM000682.1  60000   60  10S78M  =   60259   347 GAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAA    FFFFFIFIIIIIIIIIIIIII0BFFFFFIIFIIFIIIFIIIIIIIIIIIIIIFIBFIFFFFFBFFFFFFFFFFFFBFFFFBFFFFFFF    RG:Z:None   NH:i:1  NM:i:1
    SRR1976036.697073835.None   81  CM000682.1  60000   60  11S77M  CM000685.1  137107070   0   AGAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAA    BFFFFFFBFFFFFFFFFFFFFFFFFFFFIIIIIIIIIIIIIIFFBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFF    RG:Z:None   BX:Z:CTGAACAGGCTCGA-1   NH:i:1  NM:i:1

    SRR1976036.630013848.None   163 CM000682.1  60000   60  16S72M  =   59999   -88 AATAGAGAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGC    FFFIIFFFIBFFFIFFIIFIIIIIIIF'<FFFFIIIIIIIIFFFFIIIIIIIIIFFFFFFBFFFFFFF<BBFFBBBBBFFFFFFFFFF    RG:Z:None   BX:Z:GCTTGACAAAGATC-1   NH:i:1  NM:i:1
ADD REPLYlink modified 22 months ago • written 22 months ago by Covux10

Try this on the original BAM:

samtools view -H your.bam | sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' |  samtools reheader - your.bam > your.new.bam

Then do the variant calling directly on your.new.bam (DON'T run Picard).

ADD REPLYlink written 22 months ago by Santosh Anand4.6k

It worked!

Would you mind telling me what the command line does? :)

ADD REPLYlink written 22 months ago by Covux10

Your read groups were malformed. In the reads, the RG:Z:None tag means that each read belongs to the read group whose ID is "None". However, the @RG line in your header contains only the ID tag: @RG ID:None. Picard and freebayes also expect an SM (sample) tag there, which is why both complained. To reconcile the two, I kept ID:None so it still matches the reads and added the sample info plus some other default fields (SM=sample, LB=library, PL=platform=Illumina). You can see all the details of @RG here: https://software.broadinstitute.org/gatk/documentation/article.php?id=6472

ADD REPLYlink written 22 months ago by Santosh Anand4.6k

samtools view -H your.bam => print only the header of the BAM

sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' => replace every line starting with @RG with @RG\tID:None\tSM:None\tLB:None\tPL:Illumina (\t is a tab)

samtools reheader - your.bam => read the edited header from stdin (-) and write a copy of your.bam that uses it
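
One caveat with the sed step: not every sed implementation turns \t in the replacement into a tab, so on some systems the new @RG line may come out wrong. A possible alternative is sketched below with awk. The helper name fix_rg_header and its argument order are made up for illustration, and NA12878 as the sample name is only an assumption based on the question. The ID has to stay "None" so that it still matches RG:Z:None on the reads.

```shell
# Replace every @RG header line with one fully specified read group.
# Arguments: ID, SM, LB, PL (defaults mirror the sed command above).
# fix_rg_header is a hypothetical helper name, not part of samtools.
# If the header holds several @RG lines, each one is replaced, so the
# result would contain duplicates; the file discussed here has only one.
fix_rg_header() {
  awk -v id="${1:-None}" -v sm="${2:-None}" -v lb="${3:-None}" -v pl="${4:-Illumina}" '
    BEGIN { OFS = "\t" }
    /^@RG/ { print "@RG", "ID:" id, "SM:" sm, "LB:" lb, "PL:" pl; next }
    { print }'
}

# Typical use (requires samtools):
# samtools view -H your.bam | fix_rg_header None NA12878 | samtools reheader - your.bam > your.new.bam
```

All other header lines pass through unchanged, so the @HD and @SQ records stay exactly as they were.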

ADD REPLYlink written 22 months ago by Santosh Anand4.6k

Many thanks for your explanation!

ADD REPLYlink written 22 months ago by Covux10
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum117k wrote:
 However i got the following message:

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None; File /home/user/NA12878/SRR1976036_chr20.bam; Line number 95

Of course, because from what you've done:

java -jar /home/user/Downloads/picard.jar AddOrReplaceReadGroups I=SRR1976036_chr20.bam O=036_RG.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=test

the new bam carrying the read groups is now called 036_RG.bam

ADD COMMENTlink written 23 months ago by Pierre Lindenbaum117k

So you are telling me it did succeed, and that a new file called 036_RG.bam has now been created (which was my intention)?

But I cannot find a file called 036_RG.bam, which makes me believe I did something wrong.

ADD REPLYlink written 23 months ago by Covux10