I'm analyzing whole exome sequencing (WES) data and using Picard to perform some QC on my aligned, sorted, duplicate-removed .bam files. When running picard CollectSequencingArtifactMetrics, I receive the following error:
Exception in thread "main" picard.PicardException: Record contains library that is missing from header: UnknownLibrary
WES data processing and analysis is not my main area of expertise, and I'm not very familiar with the .sam / .bam formats. Does anyone have any idea what could be causing this? I've run other picard functions, such as MarkDuplicates, without errors, so I'm pretty confused.
So I ran picard ValidateSamFile and did receive the error:
The .bam file was generated automatically as output from bwa mem, first in .sam format and then converted to .bam using picard SamFormatConverter. So I'm guessing that I didn't declare the read group in the header (unless it's done automatically), and I'm not sure how to check if I have a read without the
You should/must have specified some read groups. See https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups
Specified them at what step? I ran samtools view -H | grep '^@RG' and got nothing in return, which I'm guessing means I failed to specify them at some point when I was supposed to.
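For reference, here is a minimal sketch of that check. It assumes samtools is on the PATH; sample.bam is a placeholder file name and count_read_groups is a made-up helper name. It counts the @RG lines in a header passed on standard input, so a result of 0 means no read groups were declared.

```shell
# Hypothetical helper: count @RG lines in a SAM header read from standard input.
count_read_groups() {
    # grep -c prints the number of matching lines; it exits non-zero when the
    # count is 0, so "|| true" keeps that case from looking like a failure.
    grep -c '^@RG' || true
}

# Typical use (sample.bam is a placeholder):
#   samtools view -H sample.bam | count_read_groups
```

A count of 0 would be consistent with Picard reporting an unknown library, since the library name normally comes from the LB field of an @RG line.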
EDIT: might this have something to do with how I combined the R[1-2]_L001.fastq & R[1-2]_L002.fastq files? I used:
to combine the lanes for both paired reads.
bwa can use gzipped fastq files
Furthermore, you can parallelize things by mapping each fastq separately and merging the bam files later.
At the time of the original alignment. You could add them now; see: Adding Read Groups To Bam Files
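As a rough sketch of adding them after the fact, Picard's AddOrReplaceReadGroups tool can write a new .bam with a read group attached. The helper below only prints the command so it can be reviewed before running. The helper name, the file names, and the choice to reuse the sample name for every field are assumptions for illustration; RGPL=ILLUMINA assumes Illumina data.

```shell
# Hypothetical helper: print a Picard AddOrReplaceReadGroups command for one sample.
# Arguments: SAMPLE_NAME INPUT_BAM OUTPUT_BAM
add_rg_command() {
    sample="$1"; in_bam="$2"; out_bam="$3"
    # The sample name is reused for ID, library, platform unit and sample;
    # replace these with real values where they are known.
    printf 'picard AddOrReplaceReadGroups I=%s O=%s RGID=%s RGLB=%s RGPL=ILLUMINA RGPU=%s RGSM=%s\n' \
        "$in_bam" "$out_bam" "$sample" "$sample" "$sample" "$sample"
}

# Placeholder file names; review the printed command, then run it.
add_rg_command sample1 sample1.bam sample1.rg.bam
```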
cat works for combining lane-specific files.
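A small sketch of what that could look like, with made-up file names and a made-up helper name. Concatenated gzip files form a valid gzip stream, so this should also work on .fastq.gz files without decompressing them first. The lane files for R1 and R2 need to be given in the same order so the read pairs stay in sync.

```shell
# Hypothetical helper: concatenate per-lane FASTQ files (plain or gzipped) into one file.
# Usage: combine_lanes OUTPUT LANE_FILE...
combine_lanes() {
    out="$1"; shift
    # Files are joined in the order given on the command line.
    cat "$@" > "$out"
}

# Placeholder file names:
#   combine_lanes sample_R1.fastq.gz sample_R1_L001.fastq.gz sample_R1_L002.fastq.gz
#   combine_lanes sample_R2.fastq.gz sample_R2_L001.fastq.gz sample_R2_L002.fastq.gz
```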
Thank you so much. I am a little confused about what string I should use as the read group argument. Can it be an arbitrary name, or does it need to follow a certain naming convention?
It can be any string. You can use the sample name.
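One way to build the string for the -R flag of bwa mem is with printf, so that the sample name is filled in by the shell while the \t sequences reach bwa as written (bwa converts them to tabs itself). This is only a sketch: make_rg is a made-up helper, the file names are placeholders, and reusing the sample name for ID, SM and LB with PL:ILLUMINA is an assumption. Keep in mind that a shell variable inside single quotes is not expanded.

```shell
# Hypothetical helper: build an @RG header string for "bwa mem -R" from a sample name.
make_rg() {
    # %s inserts the sample name; \\t in the printf format emits a literal
    # backslash followed by t, which bwa later turns into a tab.
    printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:ILLUMINA' "$1" "$1" "$1"
}

rg=$(make_rg sample1)
# Placeholder reference and FASTQ names:
#   bwa mem -R "$rg" ref.fa sample1_R1.fastq.gz sample1_R2.fastq.gz > sample1.sam
```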
This worked. I would add, though, that to use the -R flag, one needs to enclose the string in single quotes, like '@RG\tID:$samplename', instead of double quotes, like "@RG\tID:$samplename"; otherwise it will not work.