Question

Picard CollectSequencingArtifactMetrics error

0

5.1 years ago

jrleary ▴ 220

I'm analyzing whole-exome sequencing (WES) data and using Picard to perform some QC on my aligned / sorted / duplicate-removed .bam files. When running Picard CollectSequencingArtifactMetrics, I receive the following error:

Exception in thread "main" picard.PicardException: Record contains library that is missing from header: UnknownLibrary

WES data processing / analysis is not my main area of expertise at all, and I'm not super familiar with the .sam / .bam formats. Does anyone have any idea what could be causing this? I've run other picard functions, such as SortSam and MarkDuplicates without errors, so I'm pretty confused.

picard WES • 2.6k views

score 1 · Answer 1 · 2020-05-22

1

5.1 years ago

Pierre Lindenbaum 166k

This error happens when a read is not associated with a read group.

1) Did you declare the read group in the header (an @RG line)?

2) Do you have any reads without the RG attribute?

0

So I ran picard ValidateSamFile and did receive the error:

ERROR:MISSING_READ_GROUP

The .bam file was generated automatically as output from bwa mem, first in .sam format and then converted to .bam using picard SamFormatConverter. So, I'm guessing that I didn't declare the read group in the header (unless it's done automatically) and I'm not sure how to check if I have a read without the RG attribute.
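Both checks can be sketched on plain SAM text, as below. This is only an illustration: the toy file and its contents are made up so the snippet runs without samtools. For a real BAM, the same grep could presumably be fed from `samtools view -H` (header only) and the awk filter from `samtools view` (reads only).

```shell
# Build a tiny, made-up SAM file: one @RG header line and two reads,
# only the first of which carries an RG:Z: tag.
workdir=$(mktemp -d)
toy_sam="$workdir/toy.sam"   # hypothetical file name
{
  printf '@HD\tVN:1.6\tSO:coordinate\n'
  printf '@SQ\tSN:chr1\tLN:1000\n'
  printf '@RG\tID:sample1\tSM:sample1\n'
  printf 'read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tRG:Z:sample1\n'
  printf 'read2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tACGT\tIIII\n'
} > "$toy_sam"

# 1) Is any read group declared in the header?
n_rg_header=$(grep -c '^@RG' "$toy_sam")

# 2) How many reads lack an RG tag? Optional tags start at column 12.
n_reads_without_rg=$(awk -F'\t' '
  /^@/ { next }                       # skip header lines
  { has_rg = 0
    for (i = 12; i <= NF; i++) if ($i ~ /^RG:Z:/) has_rg = 1
    if (!has_rg) n++ }
  END { print n + 0 }' "$toy_sam")

echo "@RG header lines: $n_rg_header"
echo "reads without RG tag: $n_reads_without_rg"
```

A count of zero @RG header lines, or any non-zero count of reads without the tag, would be consistent with the MISSING_READ_GROUP error.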

ADD REPLY • link 5.1 years ago by jrleary ▴ 220

0

You should (in fact, must) have specified some read groups: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups

ADD REPLY • link 5.1 years ago by Pierre Lindenbaum 166k

0

Specified them at what step? I ran samtools view -H | grep '^@RG' and got nothing in return, which I'm guessing means I failed to specify them at some point when I was supposed to.

EDIT: might this have something to do with how I combined the R[1-2]_L001.fastq & R[1-2]_L002.fastq files? I used:

zcat sample_R1_L001.fastq.gz sample_R1_L002.fastq.gz >  sample_R1.fastq

to combine the lanes for both paired reads.

ADD REPLY • link 5.1 years ago by jrleary ▴ 220

1

better/faster:

cat sample_R1_L001.fastq.gz sample_R1_L002.fastq.gz >  sample_R1.fastq.gz

bwa can use gzipped fastq files
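A quick way to convince yourself that concatenating gzipped files with plain cat gives a valid gzipped file is sketched below. The two single-record FASTQ files are made up for the test; only the file naming follows the thread.

```shell
# Create two tiny, made-up gzipped FASTQ files (one record each).
workdir=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/sample_R1_L001.fastq.gz"
printf '@r2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/sample_R1_L002.fastq.gz"

# Concatenate the compressed files directly, without decompressing.
cat "$workdir/sample_R1_L001.fastq.gz" "$workdir/sample_R1_L002.fastq.gz" \
  > "$workdir/sample_R1.fastq.gz"

# Decompress the result and count lines: 2 records x 4 lines each.
n_lines=$(gzip -dc "$workdir/sample_R1.fastq.gz" | wc -l | tr -d ' ')
echo "lines after concatenation: $n_lines"
```

This works because a gzip stream may consist of several members back to back, and decompression reads them all in order.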

Furthermore, you can parallelize things by mapping each fastq pair separately and merging the BAMs later.
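A rough sketch of what per-lane mapping could look like is below. It only prints the commands (a dry run), since bwa and samtools may not be installed where this is tried. The reference name `ref.fa`, the sample name, the output names and the choice of one read group ID per lane are all assumptions for illustration.

```shell
sample="sample"   # hypothetical sample name
ref="ref.fa"      # hypothetical reference FASTA

# One bwa mem command per lane, each with its own read group ID but the
# same sample (SM). printf '%s' keeps the \t sequences literal so that
# bwa itself can interpret them.
cmds=$(
  for lane in L001 L002; do
    printf '%s\n' "bwa mem -R '@RG\tID:${sample}_${lane}\tSM:${sample}' $ref ${sample}_R1_${lane}.fastq.gz ${sample}_R2_${lane}.fastq.gz | samtools sort -o ${sample}_${lane}.bam -"
  done
  # Merge the per-lane BAMs once both alignments have finished.
  printf '%s\n' "samtools merge ${sample}.bam ${sample}_L001.bam ${sample}_L002.bam"
)

printf '%s\n' "$cmds"
```

The two bwa lines are independent of each other, so they could be run at the same time on separate cores or nodes before the merge.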

ADD REPLY • link 5.1 years ago by Pierre Lindenbaum 166k

0

At the time of the original alignment. You could also add them now; see: Adding Read Groups To Bam Files
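For adding them to an existing BAM, one option is Picard's AddOrReplaceReadGroups. The sketch below only builds and prints the command rather than running it; the file names, the `picard.jar` path and all read group values (including `unit1` for the platform unit) are placeholders, not values from this thread.

```shell
sample="sample1"             # hypothetical sample name
in_bam="${sample}.bam"       # hypothetical input BAM without read groups
out_bam="${sample}.rg.bam"   # hypothetical output BAM

# RGLB sets the library; the error in the question complains about a
# library missing from the header, so it matters that this is filled in.
picard_cmd="java -jar picard.jar AddOrReplaceReadGroups \
I=${in_bam} O=${out_bam} \
RGID=${sample} RGLB=${sample} RGPL=ILLUMINA RGPU=unit1 RGSM=${sample}"

printf '%s\n' "$picard_cmd"
```

After running the real command, the header of the output should contain an @RG line and every read should carry a matching RG tag.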

Plain cat works for combining lane-specific files.

ADD REPLY • link 5.1 years ago by GenoMax 152k

0

Thank you so much. I am a little confused about what string I should use as the read group argument. Can it be an arbitrary name, or does it need to follow a certain naming convention?

ADD REPLY • link 5.1 years ago by jrleary ▴ 220

0

it can be any string. You can use the sample name.

ADD REPLY • link 5.1 years ago by Pierre Lindenbaum 166k

0

This worked. I would add, though, that when using the -R flag I had to enclose the string in single quotes, like '@RG\tID:$samplename', rather than double quotes, like "@RG\tID:$samplename"; otherwise it did not work for me.
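What each quoting style actually hands to the program can be checked with a small sketch like the one below. It assumes a POSIX-style shell such as bash, and `sample1` is a made-up value. One thing to watch: inside single quotes `$samplename` is not expanded, so the read group ID would be the literal text `$samplename`.

```shell
samplename="sample1"   # hypothetical sample name

single='@RG\tID:$samplename'      # nothing is expanded inside single quotes
double="@RG\tID:$samplename"      # variable expanded; \t stays as backslash + t
mixed='@RG\tID:'"$samplename"     # literal part single-quoted, variable double-quoted

# printf '%s' prints the strings exactly as a program would receive them.
printf 'single quotes: %s\n' "$single"
printf 'double quotes: %s\n' "$double"
printf 'mixed quotes:  %s\n' "$mixed"
```

Other shells, or wrappers that re-interpret the string (for example a job scheduler script or `echo` in some shells), may treat the backslash differently, which could explain different experiences with double quotes.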

ADD REPLY • link 5.1 years ago by jrleary ▴ 220