Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None
4.7 years ago
Covux ▴ 10

Hey,

I downloaded a BAM file for chr20 from NCBI (SRR1976036). This is a NA12878 sample.

I wanted to do some variant calling with freebayes and got the following error.

could not find SM: in @RG tag

After some investigation I found that the @RG line in my BAM header is incomplete: it has only an ID and no SM tag.

@HD    VN:1.2  SO:coordinate
@SQ SN:CM000663.1   LN:249250621
@SQ SN:CM000664.1   LN:243199373
@SQ SN:CM000665.1   LN:198022430

......
@SQ SN:GL000248.1   LN:39786
@SQ SN:GL000249.1   LN:38502
@RG ID:None
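
A quick way to confirm this kind of problem is to list the @RG header lines that have no SM field. The sketch below is only an illustration: the function name rg_lines_missing_sm is made up, and the demo input is a two-line stand-in for a real header, not the actual file.

```shell
# List @RG header lines that carry no SM field.
# Reads SAM header text (tab-separated) on stdin and prints the offending lines.
# rg_lines_missing_sm is a hypothetical helper name, not part of samtools.
rg_lines_missing_sm() {
    awk -F'\t' '
        $1 == "@RG" {
            has_sm = 0
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SM:/) has_sm = 1
            if (!has_sm) print
        }
    '
}

# Demo on a made-up header: the @RG line is printed because it has no SM field.
printf '@HD\tVN:1.2\tSO:coordinate\n@RG\tID:None\n' | rg_lines_missing_sm

# On a real file the header would come from samtools, for example:
#   samtools view -H SRR1976036_chr20.bam | rg_lines_missing_sm
```

If nothing is printed, every @RG line already has a sample name and the problem lies elsewhere.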

I looked around the internet and thought I had found an answer in Picard's AddOrReplaceReadGroups tool.

So I tried the following:

java -jar /home/user/Downloads/picard.jar AddOrReplaceReadGroups I=SRR1976036_chr20.bam O=036_RG.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=test

However, I got the following message:

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None; File /home/user/NA12878/SRR1976036_chr20.bam; Line number 95

I looked around and couldn't find an answer.

This is my first time working with a BAM file directly. I have done variant calling from FASTQ to VCF before and never ran into this problem.

Could someone tell me what I can do to add the information properly?

Kind regards, Covux

Picard NA12878 • 9.8k views
ADD COMMENT
4.7 years ago

The @RG line in the BAM header looks malformed. Remove it and then try Picard again.

ADD COMMENT

How do I remove the @RG line?

ADD REPLY

You can remove it from the header if it is only there (and not in the reads).

samtools view -H your.bam | grep -v "^@RG" | samtools reheader - your.bam > your.new.bam

Then run Picard again on your.new.bam.

Note: It will fail if the reads themselves also carry an RG tag that points to the removed read group. In that case, post some of the initial reads.
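
To check up front whether the reads carry RG tags, one option is to tally the RG:Z values in the alignment records. This is a rough sketch under assumptions: count_read_groups is a made-up helper name, and the demo records are minimal fabricated reads, not data from the file in question.

```shell
# Count reads per RG:Z value.
# Reads SAM records on stdin; header lines (starting with @) are skipped.
# Reads without an RG tag are counted under "(no RG tag)".
# count_read_groups is a hypothetical helper name, not part of samtools.
count_read_groups() {
    awk -F'\t' '
        /^@/ { next }
        {
            rg = "(no RG tag)"
            # Optional tags start at column 12 of a SAM record.
            for (i = 12; i <= NF; i++)
                if ($i ~ /^RG:Z:/) rg = substr($i, 6)
            n[rg]++
        }
        END { for (rg in n) print n[rg] "\t" rg }
    ' | sort
}

# Demo on three fabricated records: two tagged RG:Z:None, one untagged.
printf 'r1\t0\tc\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF\tRG:Z:None\n' >  demo_reads.sam
printf 'r2\t0\tc\t5\t60\t4M\t*\t0\t0\tACGT\tFFFF\tRG:Z:None\tNM:i:1\n' >> demo_reads.sam
printf 'r3\t0\tc\t9\t60\t4M\t*\t0\t0\tACGT\tFFFF\tNM:i:0\n' >> demo_reads.sam
count_read_groups < demo_reads.sam
rm -f demo_reads.sam

# On a real file, sampling the first reads is usually enough:
#   samtools view your.bam | head -n 100000 | count_read_groups
```

If any RG value shows up here, deleting the @RG header line alone will leave the reads pointing at a read group that no longer exists.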

ADD REPLY

I just ran your command and now I get a different error.

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 1, Read name SRR1976036.630013850.None, RG ID on SAMRecord not found in header: None

Here are some of the reads I have in my file:

 SRR1976036.630013850.None  163 CM000682.1  59993   60  88M =   60001   96  GTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAG    B<B7BBFFFIFBFBF<B<FBB7BFFFFFBBF'B<BFFFBF<<BB<F<BFF7<BFBBFFB<<BBBB<7<B<<<<BBB<<<'77BB'<B<    RG:Z:None   BX:Z:TATGCGAGGCTGTG-1   NH:i:1  NM:i:8

   SRR1976036.630013848.None    83  CM000682.1  59999   60  88M =   60000   88  CAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAGGTGGTA    BFFFFFFFFFFFFFFFBFFFFFFFFFFFFFIFFB0IIIIIIIFIIIIIIIIIIFIIIFFFBIIIFIIIIIIFFBFFFFFIIIFIIFFF    RG:Z:None   BX:Z:GCTTGACAAAGATC-1   NH:i:1  NM:i:2

    SRR1976036.630013849.None   16  CM000682.1  60000   60  4S84M   *   0   0   ACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACAGGTG    BFFFFFFFFFFFFFFFFFFFFFFFFFFIIIIIIIIFB<IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFIIIFIIIIIIIIIFFF    RG:Z:None   BX:Z:CAACGTGTTCCGTA-1   NH:i:1  NM:i:1

    SRR1976036.697073837.None   81  CM000682.1  60000   60  8S80M   CM000685.1  137107070   0   GGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAACA    BFFFFFFFFFFFFFFFFFFFFFFFFFFFFIIIIIIFIFFFB<IIIIIIIIIIIIIIIFIIIFFIIIIIIIIIIIIIFFFFFFFIIFFF    RG:Z:None   BX:Z:TGACCAGTATACAG-1   NH:i:1  NM:i:1

    SRR1976036.630013888.None   163 CM000682.1  60000   60  10S78M  =   60259   347 GAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAAA    FFFFFIFIIIIIIIIIIIIII0BFFFFFIIFIIFIIIFIIIIIIIIIIIIIIFIBFIFFFFFBFFFFFFFFFFFFBFFFFBFFFFFFF    RG:Z:None   NH:i:1  NM:i:1
    SRR1976036.697073835.None   81  CM000682.1  60000   60  11S77M  CM000685.1  137107070   0   AGAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGCCCTAA    BFFFFFFBFFFFFFFFFFFFFFFFFFFFIIIIIIIIIIIIIIFFBIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFFFFFF    RG:Z:None   BX:Z:CTGAACAGGCTCGA-1   NH:i:1  NM:i:1

    SRR1976036.630013848.None   163 CM000682.1  60000   60  16S72M  =   59999   -88 AATAGAGAGGTGACTCAGATCCAGAGGTGGAAGAGGAAGGAAGCTTGGAACCCTATAGAGTTGCTGAGTGCCAGGACCAGATCCTGGC    FFFIIFFFIBFFFIFFIIFIIIIIIIF'<FFFFIIIIIIIIFFFFIIIIIIIIIFFFFFFBFFFFFFF<BBFFBBBBBFFFFFFFFFF    RG:Z:None   BX:Z:GCTTGACAAAGATC-1   NH:i:1  NM:i:1
ADD REPLY

Try this on the original BAM:

samtools view -H your.bam | sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' |  samtools reheader - your.bam > your.new.bam

Then do the variant calling directly on your.new.bam (DON'T run Picard).

ADD REPLY

It worked!

Would you mind telling me what the command line does? :)

ADD REPLY

Your read groups were malformed. In the actual reads, the RG:Z:None tag means that each read is assigned to the read group whose ID is "None". However, the matching header line contains only the ID tag: @RG ID:None. freebayes and Picard also need the SM (sample) tag there, which is why both complained. To reconcile the two, I kept ID:None so the header still matches the reads, and added the sample info plus some placeholder values (SM = sample, LB = library, PL = platform = Illumina). You can see all details of @RG here: https://software.broadinstitute.org/gatk/documentation/article.php?id=6472

ADD REPLY

samtools view -H your.bam => print only the header of the BAM

sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' => replace every line starting with @RG with @RG\tID:None\tSM:None\tLB:None\tPL:Illumina (\t is a tab)

samtools reheader - your.bam => write a copy of your.bam that uses the modified header read from stdin (-)
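
For anyone who wants to see the header rewrite in isolation, here is a small sketch of the same idea. It is not the exact command from above: it uses awk instead of sed, because \t in a sed replacement is a GNU extension and may not work on other systems, and it only appends the missing fields instead of overwriting the whole @RG line. The function name complete_rg_lines and the default values are assumptions for illustration.

```shell
# Append SM/LB/PL fields to any @RG line that lacks them, leaving the
# existing ID and all other header lines untouched.
# Reads SAM header text on stdin. Optional arguments: sample, library, platform.
# complete_rg_lines is a hypothetical helper; the defaults are placeholders.
complete_rg_lines() {
    awk -F'\t' -v OFS='\t' \
        -v sm="${1:-None}" -v lb="${2:-None}" -v pl="${3:-Illumina}" '
        $1 == "@RG" {
            has["SM"] = has["LB"] = has["PL"] = 0
            # Each field looks like XX:value; record which two-letter tags exist.
            for (i = 2; i <= NF; i++) has[substr($i, 1, 2)] = 1
            if (!has["SM"]) $0 = $0 OFS "SM:" sm
            if (!has["LB"]) $0 = $0 OFS "LB:" lb
            if (!has["PL"]) $0 = $0 OFS "PL:" pl
        }
        { print }
    '
}

# Demo on a made-up header: @HD passes through unchanged, @RG gains three fields.
printf '@HD\tVN:1.2\tSO:coordinate\n@RG\tID:None\n' | complete_rg_lines

# With samtools, the same pipeline shape as in the answer would be:
#   samtools view -H your.bam | complete_rg_lines NA12878 | samtools reheader - your.bam > your.new.bam
```

Keeping the ID untouched matters here, because the reads refer to the read group by that ID (RG:Z:None).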

ADD REPLY

Many thanks for your explanation!

ADD REPLY
4.7 years ago
 However i got the following message:

Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None; File /home/user/NA12878/SRR1976036_chr20.bam; Line number 95

Of course, because of what you've done:

java -jar /home/user/Downloads/picard.jar AddOrReplaceReadGroups I=SRR1976036_chr20.bam O=036_RG.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=test

The new BAM carrying the read groups is now called 036_RG.bam.

ADD COMMENT

So you are telling me it did succeed, and a new file called 036_RG.bam has now been created (which was my intention)?

But I cannot find a file called 036_RG.bam, which makes me believe I did something wrong.

ADD REPLY
