Question: Picard CollectSequencingArtifactMetrics error
jrleary (Lineberger Comprehensive Cancer Center) wrote, 5 days ago:

I'm analyzing whole exome sequencing data and am using Picard to perform some QC on my aligned, sorted, duplicate-removed .bam files. When running picard CollectSequencingArtifactMetrics, I receive the following error:

Exception in thread "main" picard.PicardException: Record contains library that is missing from header: UnknownLibrary

WES data processing / analysis is not my main area of expertise at all, and I'm not super familiar with the .sam / .bam formats. Does anyone have any idea what could be causing this? I've run other picard functions, such as SortSam and MarkDuplicates without errors, so I'm pretty confused.

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 5 days ago:

This error happens when a read is not associated with a read group.

1) Did you declare the read group in the header (an @RG line)?

2) Do you have any reads without the RG attribute?
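Both questions can be checked on the SAM text itself. The following is only a minimal sketch using awk: the function name check_read_groups and the file example.sam are made up for illustration, and for a BAM the text would have to be piped in through samtools view -h (the file name in the comment is a placeholder).

```shell
# Report @RG header lines and alignment records that carry no RG:Z: tag.
# Reads SAM text on stdin; for a BAM, pipe it in, e.g.
#   samtools view -h sample.bam | check_read_groups   (sample.bam is a placeholder)
check_read_groups() {
  awk -F '\t' '
    /^@RG/ { rg_headers++; next }      # read group declared in the header
    /^@/   { next }                    # any other header line
    {
      reads++
      tagged = 0
      for (i = 12; i <= NF; i++)       # optional tags start at column 12
        if ($i ~ /^RG:Z:/) tagged = 1
      if (!tagged) untagged++
    }
    END {
      printf "rg_header_lines=%d reads=%d reads_without_rg=%d\n", rg_headers, reads, untagged
    }'
}

# Hypothetical SAM with two header lines, one read, and no read group anywhere.
printf '@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n' > example.sam
printf 'read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> example.sam
check_read_groups < example.sam
```

A result with rg_header_lines=0 corresponds to question 1, and a non-zero reads_without_rg corresponds to question 2.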


So I ran picard ValidateSamFile and did receive the error:

ERROR:MISSING_READ_GROUP

The .bam file was generated automatically as output from bwa mem, first in .sam format and then converted to .bam using picard SamFormatConverter. So, I'm guessing that I didn't declare the read group in the header (unless it's done automatically) and I'm not sure how to check if I have a read without the RG attribute.

Reply written 5 days ago by jrleary

You should (in fact, must) specify read groups. See: https://gatk.broadinstitute.org/hc/en-us/articles/360035890671-Read-groups

Reply written 5 days ago by Pierre Lindenbaum

Specified them at what step? I ran samtools view -H sample.bam | grep '^@RG' and got nothing in return, which I'm guessing means I failed to specify them at some point when I was supposed to.

EDIT: might this have something to do with how I combined the R[1-2]_L001.fastq & R[1-2]_L002.fastq files? I used:

zcat sample_R1_L001.fastq.gz sample_R1_L002.fastq.gz >  sample_R1.fastq

to combine the lanes for both paired reads.

Reply written 5 days ago (modified 5 days ago) by jrleary

better/faster:

cat sample_R1_L001.fastq.gz sample_R1_L002.fastq.gz >  sample_R1.fastq.gz

bwa can use gzipped fastq files
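Plain cat works here because the gzip format allows several compressed members to be concatenated into one valid file. A small sketch with made-up stand-ins for the lane files (the file names and read contents are hypothetical):

```shell
# Two tiny stand-ins for per-lane FASTQ files (contents are made up).
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/lane1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/lane2.fastq.gz"

# Concatenating the compressed files yields a valid multi-member gzip file.
cat "$workdir/lane1.fastq.gz" "$workdir/lane2.fastq.gz" > "$workdir/combined.fastq.gz"

# Decompressing gives the records of both lanes, in order.
combined=$(gzip -dc "$workdir/combined.fastq.gz")
printf '%s\n' "$combined"
```

This skips the decompress step of zcat and keeps the combined file compressed.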

Furthermore, you can parallelize things by mapping each FASTQ pair separately and merging the BAM files later.
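A rough sketch of what per-lane mapping followed by a merge could look like. It is a dry run that only prints the commands, since bwa, samtools and a reference are not assumed to be installed here. The function name plan_per_lane, the reference ref.fa, the library name lib1 and the platform ILLUMINA are all assumptions for illustration; the FASTQ names follow the pattern used earlier in this thread.

```shell
# Dry run: print the commands instead of running them.
plan_per_lane() {
  sample=$1   # sample prefix, e.g. "sample"
  ref=$2      # reference FASTA (hypothetical path)
  shift 2     # remaining arguments are lane labels
  bams=""
  for lane in "$@"; do
    # One read group per lane; \\t becomes a literal backslash-t,
    # which bwa mem converts to a TAB in the output header.
    rg="@RG\\tID:${sample}_${lane}\\tSM:${sample}\\tLB:lib1\\tPL:ILLUMINA"
    printf "bwa mem -R '%s' %s %s_R1_%s.fastq.gz %s_R2_%s.fastq.gz | samtools sort -o %s_%s.bam -\n" \
      "$rg" "$ref" "$sample" "$lane" "$sample" "$lane" "$sample" "$lane"
    bams="$bams ${sample}_${lane}.bam"
  done
  # Merge the per-lane BAMs into one file per sample.
  printf 'samtools merge %s.bam%s\n' "$sample" "$bams"
}

plan_per_lane sample ref.fa L001 L002
```

Giving each lane its own read group ID while keeping the same SM value is one common arrangement; the printed bwa commands can run in parallel before the merge.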

Reply written 5 days ago (modified 5 days ago) by Pierre Lindenbaum

They should be specified at the time of the original alignment. You could add them now; see "Adding Read Groups To Bam Files".
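One way to add read groups to an existing BAM is Picard's AddOrReplaceReadGroups. The sketch below only prints the command (a dry run). The function name add_read_groups_cmd, the file names, and the values lib1, ILLUMINA and unit1 are placeholders, and the I=/O= argument style assumes the same picard wrapper used elsewhere in this thread.

```shell
# Print a Picard AddOrReplaceReadGroups command for one BAM (dry run).
add_read_groups_cmd() {
  # $1 = input BAM, $2 = output BAM, $3 = sample name (reused as read group ID)
  printf 'picard AddOrReplaceReadGroups I=%s O=%s RGID=%s RGSM=%s RGLB=%s RGPL=%s RGPU=%s\n' \
    "$1" "$2" "$3" "$3" "lib1" "ILLUMINA" "unit1"
}

cmd=$(add_read_groups_cmd sample.bam sample.rg.bam sampleA)
printf '%s\n' "$cmd"
```

Running the printed command writes a new BAM in which every read carries the given read group, and the header gains the matching @RG line.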

Plain cat works for combining lane-specific files.

Reply written 5 days ago (modified 5 days ago) by genomax

Thank you so much. I am a little confused about what string I should use as the read group argument. Can it be an arbitrary name, or does it need to follow a certain naming convention?

Reply written 5 days ago by jrleary

It can be any string. You can use the sample name.
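As a sketch of what such a string might look like in full: the ID can be any unique label, and fields such as SM (sample), LB (library) and PL (platform) are commonly added alongside it. The helper name make_rg_line and the values sampleA, lib1 and ILLUMINA are made up for illustration.

```shell
# Build an @RG string with literal "\t" separators, the form bwa mem -R expects.
make_rg_line() {
  # $1 = read group ID, $2 = sample, $3 = library, $4 = platform
  printf '@RG\\tID:%s\\tSM:%s\\tLB:%s\\tPL:%s\n' "$1" "$2" "$3" "$4"
}

rg=$(make_rg_line sampleA sampleA lib1 ILLUMINA)
printf '%s\n' "$rg"

# Hypothetical use at alignment time (file names are placeholders):
#   bwa mem -R "$rg" ref.fa sample_R1.fastq.gz sample_R2.fastq.gz > sample.sam
```

Here the sample name doubles as the read group ID, which is fine as long as each ID is unique within the file.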

Reply written 5 days ago by Pierre Lindenbaum

This worked; I would add though that to use the -R flag, one needs to enclose the string in single quotes, like '@RG\tID:$samplename' instead of double quotes, like "@RG\tID:$samplename", otherwise it will not work.
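For anyone comparing the two forms, here is a small check of what a POSIX shell or bash actually passes along in each case (the sample name sampleA is made up). Single quotes keep $samplename as literal text, while double quotes expand the variable; in both cases \t stays as a backslash followed by t. So if the double-quoted form fails, the cause may lie in how the command was launched rather than in the quotes themselves.

```shell
samplename="sampleA"   # hypothetical sample name

single='@RG\tID:$samplename'   # single quotes: nothing is expanded
double="@RG\tID:$samplename"   # double quotes: the variable is expanded

printf '%s\n' "$single"   # prints @RG\tID:$samplename
printf '%s\n' "$double"   # prints @RG\tID:sampleA
```

printf '%s' is used instead of echo so that the shell does not interpret the backslash when displaying the strings.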

Reply written 14 hours ago (modified 14 hours ago) by jrleary