Adding read group to bam files from multiplexed samples
5.7 years ago
serpalma.v

Hello

I have 60 samples (samp1...samp60). Each one was barcoded, and the samples were then pooled (10 samples per pool, 6 pools).

Each pool was sequenced in 9 lanes.

This gives 1080 FASTQ files (60 samples × 9 lanes × 2 mates, paired-end) and 540 BAM files (one per sample per lane).
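The counts follow directly from the design. A quick arithmetic check in Python, using only the numbers given in the post:

```python
# Design as described: 6 pools of 10 barcoded samples, each pool on 9 lanes, paired-end.
N_POOLS = 6
SAMPLES_PER_POOL = 10
N_LANES = 9
MATES = 2  # R1 and R2

n_samples = N_POOLS * SAMPLES_PER_POOL          # 60 samples in total
fastq_files = n_samples * N_LANES * MATES       # one FASTQ per sample, lane and mate
bam_files = n_samples * N_LANES                 # one BAM per sample and lane (mates aligned together)

print(n_samples, fastq_files, bam_files)  # 60 1080 540
```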

I want to do variant calling with GATK.

I went through these two very informative posts:

https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

Read Group In Sam/Bam Files: What Do They Exactly Describe?

Accordingly, I am defining the read group for each BAM file as follows:

  • ID: flowcell ID and lane ID (e.g. HNTW5BBXX_1)
  • SM: sample name (e.g. samp31)
  • PL: ILLUMINA
  • LB: library name (e.g. lib_samp31)
  • PI: predicted median insert size (e.g. 200)
  • PU: flowcell ID, lane ID and sample ID (e.g. HNTW5BBXX_1_samp31)
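A minimal Python sketch of the per-lane scheme listed above, which simply assembles those fields into a tab-separated SAM `@RG` header line. The function name `read_group_line` and the `lib_` prefix rule are illustrative assumptions; the flowcell ID, lane, sample and insert size are the example values from the list.

```python
def read_group_line(flowcell: str, lane: int, sample: str,
                    insert_size: int, platform: str = "ILLUMINA") -> str:
    """Build a tab-separated @RG header line for one sample on one lane.

    Hypothetical helper: it reproduces the field choices listed in the post,
    it does not validate them against GATK's requirements.
    """
    fields = [
        ("ID", f"{flowcell}_{lane}"),          # flowcell + lane
        ("SM", sample),                        # sample name
        ("PL", platform),                      # sequencing platform
        ("LB", f"lib_{sample}"),               # library name (assumed prefix rule)
        ("PI", str(insert_size)),              # predicted median insert size
        ("PU", f"{flowcell}_{lane}_{sample}"), # flowcell + lane + sample
    ]
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)


print(read_group_line("HNTW5BBXX", 1, "samp31", 200))
```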

I would like to clarify the following:

  • Did I misinterpret any of the fields?
  • Could I exclude PU, since it is not required by GATK according to the link above? Do you usually include it anyway?

Thanks in advance!

bam picard gatk • 2.7k views

Unless you have QC reasons to believe that a lane performed poorly, you should concatenate all 9 lanes together for each sample. Keeping them separate is doing you no favors. Merge the BAMs now, before you go any further.
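One way to follow this advice at the FASTQ level, sketched in Python under a hypothetical naming scheme (`<sample>_L<lane>_R<mate>.fastq.gz`; real file names will differ), is to group the per-lane files by sample and mate and emit one `cat` command per group. Concatenated gzip files are still valid gzip, and sorting both mates by lane keeps R1 and R2 in the same read order. Merging the existing per-lane BAMs (for example with `samtools merge`) is the other route.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: <sample>_L<lane>_R<mate>.fastq.gz
FASTQ_RE = re.compile(r"^(?P<sample>[^_]+)_L(?P<lane>\d+)_R(?P<mate>[12])\.fastq\.gz$")


def concat_commands(filenames):
    """Return one `cat` command per (sample, mate), joining all lanes in lane order."""
    groups = defaultdict(list)
    for name in filenames:
        m = FASTQ_RE.match(name)
        if m is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[(m["sample"], m["mate"])].append((int(m["lane"]), name))

    commands = []
    for (sample, mate), lanes in sorted(groups.items()):
        # Sorting by lane number gives R1 and R2 the same lane order,
        # so read pairs stay in sync after concatenation.
        inputs = " ".join(name for _, name in sorted(lanes))
        commands.append(f"cat {inputs} > {sample}_R{mate}.fastq.gz")
    return commands


files = [
    "samp31_L002_R1.fastq.gz", "samp31_L001_R1.fastq.gz",
    "samp31_L001_R2.fastq.gz", "samp31_L002_R2.fastq.gz",
]
for cmd in concat_commands(files):
    print(cmd)
```

The commands are only printed here, so they can be reviewed before being run in a shell.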


I read here that keeping BAMs separate during pre-processing is reasonable. Also, as I understood it, each of a sample's BAM files corresponds to a different read group, because they are derived from reads produced on different lanes.


Five-year-old recommendations are no longer relevant; just concatenate the lanes together.


So the read groups should then be as follows:

  • ID: samp31
  • SM: samp31
  • PL: ILLUMINA
  • LB: samp31
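With one read group per merged sample, these four fields can be supplied at alignment time. A minimal Python sketch, assuming alignment with `bwa mem`, whose `-R` option takes the complete `@RG` line and converts literal `\t` sequences into tabs; the helper name `bwa_rg_arg` is hypothetical.

```python
def bwa_rg_arg(sample: str, platform: str = "ILLUMINA") -> str:
    """Read-group string for `bwa mem -R`, one read group per merged sample.

    ID, SM and LB all use the sample name, matching the scheme listed above.
    The separators are literal backslash-t sequences, which bwa turns into tabs.
    """
    fields = {"ID": sample, "SM": sample, "PL": platform, "LB": sample}
    return "@RG\\t" + "\\t".join(f"{tag}:{value}" for tag, value in fields.items())


print(bwa_rg_arg("samp31"))  # @RG\tID:samp31\tSM:samp31\tPL:ILLUMINA\tLB:samp31
```

In a shell the result would be single-quoted, e.g. `bwa mem -R '@RG\tID:samp31\tSM:samp31\tPL:ILLUMINA\tLB:samp31' ref.fa samp31_R1.fastq.gz samp31_R2.fastq.gz`, where the reference and FASTQ names are placeholders.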

I am not sure whether to keep PI and PU now.

Is that correct?
