I downloaded a BAM file for chr20 from NCBI (SRR1976036). This is a NA12878 sample.
I wanted to do some variant calling with freebayes and got the following error.
could not find SM: in @RG tag
After some investigation I found that my BAM file does not have a proper RG tag. This is the header:
@HD VN:1.2 SO:coordinate
@SQ SN:CM000663.1 LN:249250621
@SQ SN:CM000664.1 LN:243199373
@SQ SN:CM000665.1 LN:198022430
......
@SQ SN:GL000248.1 LN:39786
@SQ SN:GL000249.1 LN:38502
@RG ID:None
I looked around the internet and thought I had found an answer in the Picard tool AddOrReplaceReadGroups.
So I tried the following:
java -jar /home/user/Downloads/picard.jar AddOrReplaceReadGroups I=SRR1976036_chr20.bam O=036_RG.bam RGID=4 RGLB=lib1 RGPL=illumina RGPU=unit1 RGSM=test
However, I got the following message:
Exception in thread "main" htsjdk.samtools.SAMFormatException: Error parsing SAM header. @RG line missing SM tag. Line: @RG ID:None; File /home/user/NA12878/SRR1976036_chr20.bam; Line number 95
I looked around and couldn't find an answer.
This is my first time working with a BAM file directly. I have done variant calling from FASTQ to VCF before and never had this problem.
Could someone tell me what I can do to add the information properly?
Kind regards Covux
How do I remove the @RG line?
You can remove it from the header if it is only there (and not in the reads).
Then run Picard again on
Note: It will fail if each read carries a malformed RG tag. In that case, post some of the initial reads.
I just ran your command line and now I get a different error.
Here are some of the reads I have in my file:
Try this on original BAM
Then do the variant calling directly on your.new.bam (DON'T run Picard).
Would you mind telling me what the command line does? :)
Your read groups were malformed. In the actual reads, the RG:Z:None tag means that each read belongs to the read group with ID "None". However, the @RG line in your header contains only the ID tag: @RG ID:None (freebayes and Picard both require at least the SM=Sample tag as well). To reconcile the two, I added the sample info (SM) and some other default info (LB=Library, PL=Platform=Illumina). You can see all the details of @RG here: https://software.broadinstitute.org/gatk/documentation/article.php?id=6472
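If you want to see this kind of mismatch for yourself, you can compare the read group IDs declared in the header with the RG:Z values carried by the reads. The following is only a rough sketch: the helper names header_ids and read_ids are made up for illustration, and the toy header and read are invented stand-ins so the example runs without a BAM file.

```shell
#!/bin/sh
# Hypothetical helpers (not part of samtools): both read SAM text on stdin.

# Print the ID of every @RG line in a SAM header.
header_ids() { grep '^@RG' | tr '\t' '\n' | sed -n 's/^ID://p' | sort -u; }

# Print the read group IDs (RG:Z:...) carried by SAM records.
read_ids() { tr '\t' '\n' | sed -n 's/^RG:Z://p' | sort -u; }

# Toy data (invented): a header with a bare @RG line and one tagged read.
toy_header=$(printf '@HD\tVN:1.2\tSO:coordinate\n@RG\tID:None')
toy_reads=$(printf 'read1\t0\tCM000682.1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:None')

declared=$(printf '%s\n' "$toy_header" | header_ids)
used=$(printf '%s\n' "$toy_reads" | read_ids)
echo "declared in header: $declared"
echo "used by reads:      $used"

# On a real file (requires samtools; your.bam is a placeholder name):
# samtools view -H your.bam | header_ids
# samtools view your.bam | head -n 1000 | read_ids
```

If both lists agree, the reads already point at a declared read group and only the missing header fields (such as SM) need to be filled in.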
samtools view -H your.bam => print the header of the BAM
sed 's,^@RG.*,@RG\tID:None\tSM:None\tLB:None\tPL:Illumina,g' => replace the line starting with @RG with @RG\tID:None\tSM:None\tLB:None\tPL:Illumina
samtools reheader - your.bam => write the BAM again with the modified header, which is read from standard input
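Put together, the three steps above chain into a single pipeline. Here is a minimal sketch of that chain. The function name fix_rg_line is made up for illustration, and it inserts literal tab characters instead of relying on \t in the sed replacement, since not every sed understands that escape. The toy header is invented so the substitution can be tried without a BAM file.

```shell
#!/bin/sh
# Hypothetical helper: rewrite any @RG header line so that it carries
# ID, SM, LB and PL, separated by real tab characters.
fix_rg_line() {
    tab=$(printf '\t')
    sed "s,^@RG.*,@RG${tab}ID:None${tab}SM:None${tab}LB:None${tab}PL:Illumina,"
}

# Dry run on a toy header (invented data, no samtools needed).
toy_header=$(printf '@HD\tVN:1.2\tSO:coordinate\n@SQ\tSN:CM000663.1\tLN:249250621\n@RG\tID:None')
fixed_header=$(printf '%s\n' "$toy_header" | fix_rg_line)
printf '%s\n' "$fixed_header"

# On the real file (requires samtools; file names are placeholders):
# samtools view -H your.bam | fix_rg_line | samtools reheader - your.bam > your.new.bam
# samtools view -H your.new.bam | grep '^@RG'   # check the new @RG line
```

Only the header is rewritten; the reads keep their RG:Z:None tags, which now match a complete @RG line.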
Many thanks for your explanation!