Question: Adding read group to bam files from multiplexed samples
Asked 14 months ago by serpalma.v (Germany):

Hello

I have 60 samples (samp1...samp60); each one was barcoded and then pooled (10 samples per pool, 6 pools).

Each pool was sequenced in 9 lanes.

This leads to 1080 fastq files (60 samples * 9 lanes * 2 for paired-end) and 540 bam files.
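The file counts above can be checked with a few lines of arithmetic. This is only a sketch; it assumes one bam file per sample per lane, which is what the numbers in the post imply, and the variable names are illustrative.

```python
# Numbers taken from the post; variable names are illustrative.
samples_per_pool = 10
pools = 6
lanes_per_pool = 9
mates_per_pair = 2  # paired-end: one R1 and one R2 file

samples = samples_per_pool * pools          # total barcoded samples
bam_files = samples * lanes_per_pool        # assumes one bam per sample per lane
fastq_files = bam_files * mates_per_pair    # two fastq files per bam

print(samples, bam_files, fastq_files)
```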

I want to do variant calling with GATK.

I went through these two very informative posts:

https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

Read Group In Sam/Bam Files: What Do They Exactly Describe?

Accordingly, I am trying to define the read groups for each bam file, as follows.

  • ID: flowcell ID and lane ID (e.g. HNTW5BBXX_1)
  • SM: the name of the sample (e.g. samp31)
  • PL: ILLUMINA
  • LB: lib_samp31
  • PI: insert size (e.g. 200)
  • PU: flowcell ID and lane ID and sample ID (e.g. HNTW5BBXX_1_samp31)
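
As a rough illustration, the scheme in the list above could be turned into a tab-separated `@RG` header line with a small helper like the one below. The function name `make_read_group` and its parameters are hypothetical; the sketch simply reproduces the field values as proposed, without judging whether they are the right choice.

```python
def make_read_group(flowcell, lane, sample, platform="ILLUMINA", insert_size=None):
    """Build a SAM @RG header line using the field scheme proposed in the post.

    Hypothetical helper for illustration only.
    """
    fields = [
        ("ID", f"{flowcell}_{lane}"),   # flowcell ID and lane ID
        ("SM", sample),                 # sample name
        ("PL", platform),               # sequencing platform
        ("LB", f"lib_{sample}"),        # library name, as in the post
    ]
    if insert_size is not None:
        fields.append(("PI", str(insert_size)))  # predicted insert size
    fields.append(("PU", f"{flowcell}_{lane}_{sample}"))  # platform unit
    # @RG fields are separated by tabs, each written as TAG:value
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)


if __name__ == "__main__":
    print(make_read_group("HNTW5BBXX", 1, "samp31", insert_size=200))
```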

I would like to clarify the following:

  • Did I get something wrong interpreting the fields?
  • Could I exclude PU, since it is not required by GATK according to the link above? Do you usually include it anyway?

Thanks in advance!

bam picard gatk • 790 views
modified 12 months ago by Biostar • written 14 months ago by serpalma.v

Unless you have QC reasons to say that a lane did poorly, you should concatenate all 9 lanes together for each sample. Keeping them separate is doing you no favors. Merge the bams now before you do more.

written 14 months ago by swbarnes2

I read here that keeping bams separated during pre-processing is reasonable. And also, the way I understood it, for each sample, every bam file corresponds to a different read group, as they are derived from reads produced by different lanes.

modified 14 months ago • written 14 months ago by serpalma.v

Five-year-old recommendations are no longer relevant; just concatenate the lanes together.

written 14 months ago by Devon Ryan
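
One possible way to act on the merge advice above is to group the per-lane bam files by sample and build one `samtools merge` command per sample. The sketch below only constructs and prints the commands. The file naming scheme (`<sample>_<flowcell>_<lane>.bam`), the output name, and the helper names are assumptions for illustration; `samtools merge` also expects its inputs to share the same sort order.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme: <sample>_<flowcell>_<lane>.bam
BAM_NAME = re.compile(r"^(?P<sample>samp\d+)_(?P<flowcell>[A-Z0-9]+)_(?P<lane>\d+)\.bam$")


def group_by_sample(bam_names):
    """Map each sample name to the sorted list of its per-lane bam files."""
    groups = defaultdict(list)
    for name in bam_names:
        match = BAM_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected file name: {name}")
        groups[match.group("sample")].append(name)
    return {sample: sorted(names) for sample, names in groups.items()}


def merge_commands(bam_names):
    """Return one `samtools merge <out.bam> <in.bam>...` argument list per sample."""
    return [
        ["samtools", "merge", f"{sample}.merged.bam", *names]
        for sample, names in sorted(group_by_sample(bam_names).items())
    ]


if __name__ == "__main__":
    example = [f"samp31_HNTW5BBXX_{lane}.bam" for lane in range(1, 4)]
    for command in merge_commands(example):
        # Printed only; pass the list to subprocess.run to actually execute it.
        print(" ".join(command))
```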

So then the read groups should be as follows:

  • ID: samp31
  • SM: samp31
  • PL: ILLUMINA
  • LB: samp31

Not sure about keeping PI and PU now...

Correct?

modified 14 months ago • written 14 months ago by serpalma.v