Question: Adding read group to bam files from multiplexed samples
2.0 years ago by
serpalma.v40
Germany
serpalma.v40 wrote:

Hello

I have 60 samples (samp1...samp60), each one was barcoded and then pooled (10 samples/pool, 6 pools).

Each pool was sequenced in 9 lanes.

This leads to 1080 fastq files (60 samples * 9 lanes * 2 for paired-end) and 540 bam files.
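The counts follow directly from the design described above. A quick check, with variable names chosen only for illustration:

```python
# Sequencing design as described in the post.
samples = 60          # samp1 ... samp60
lanes_per_pool = 9    # every pool was sequenced on 9 lanes
reads_per_pair = 2    # paired-end: R1 and R2

fastq_files = samples * lanes_per_pool * reads_per_pair
bam_files = samples * lanes_per_pool  # one BAM per sample per lane

print(fastq_files, bam_files)
```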

I want to do variant calling with GATK.

I went through these two very informative posts:

https://gatkforums.broadinstitute.org/gatk/discussion/6472/read-groups

Read Group In Sam/Bam Files: What Do They Exactly Describe?

Accordingly, I am trying to define the read groups for each bam file, as follows.

  • ID: flowcell ID and lane ID (e.g. HNTW5BBXX_1)
  • SM: the name of the sample (e.g. samp31)
  • PL: ILLUMINA
  • LB: lib_samp31
  • PI: insert size (e.g. 200)
  • PU: flowcell ID, lane ID and sample ID (e.g. HNTW5BBXX_1_samp31)
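As a minimal sketch, the scheme above can be written out as a SAM `@RG` header line. The function name `read_group_line` and its parameters are hypothetical, and the `lib_<sample>` library naming simply mirrors the example in the list; this is an illustration of the proposed fields, not a recommendation.

```python
def read_group_line(flowcell, lane, sample, platform="ILLUMINA", insert_size=None):
    """Build a SAM @RG header line for one sample on one lane.

    Field layout follows the scheme proposed in the post (hypothetical helper).
    """
    fields = [
        ("ID", f"{flowcell}_{lane}"),
        ("SM", sample),
        ("PL", platform),
        ("LB", f"lib_{sample}"),              # assumes one library per sample
        ("PU", f"{flowcell}_{lane}_{sample}"),
    ]
    if insert_size is not None:
        fields.append(("PI", str(insert_size)))
    # @RG fields are tab-separated TAG:VALUE pairs.
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in fields)


line = read_group_line("HNTW5BBXX", 1, "samp31", insert_size=200)
print(line)
```

With this scheme the ten samples pooled on one lane share the same ID, so the ID is only unique within a single-sample bam file.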

I would like to clarify the following:

  • Did I get something wrong interpreting the fields?
  • Could I exclude PU, since it is not required by GATK according to the link above? Do you usually include it anyway?

Thanks in advance!

bam picard gatk • 1.2k views
modified 22 months ago by Biostar ♦♦ 20 • written 2.0 years ago by serpalma.v40

Unless you have QC reasons to say that a lane did poorly, you should concatenate all 9 lanes together for each sample. Keeping them separate is doing you no favors. Merge the bams now before you do more.
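A rough sketch of the per-sample merge suggested here: group the per-lane bam files by sample and build one `samtools merge` command for each. The file naming pattern `<sample>_<flowcell>_<lane>.bam`, the output name, and the helper `merge_commands` are assumptions made for illustration; the commands are only printed, not executed.

```python
from collections import defaultdict
from pathlib import Path


def merge_commands(bam_paths):
    """Group per-lane BAMs by sample and build one `samtools merge` argv per sample.

    Assumes hypothetical file names of the form <sample>_<flowcell>_<lane>.bam.
    """
    by_sample = defaultdict(list)
    for path in bam_paths:
        # The sample name is everything before the first underscore.
        sample = Path(path).name.split("_", 1)[0]
        by_sample[sample].append(str(path))
    return {
        sample: ["samtools", "merge", f"{sample}.merged.bam", *sorted(paths)]
        for sample, paths in sorted(by_sample.items())
    }


# Nine lanes for one sample, using made-up file names.
lane_bams = [f"samp31_HNTW5BBXX_{lane}.bam" for lane in range(1, 10)]
cmds = merge_commands(lane_bams)
for argv in cmds.values():
    print(" ".join(argv))
```

`samtools merge` expects its inputs to be sorted in the same order, so the per-lane files would need to be sorted before this step.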

written 2.0 years ago by swbarnes28.1k

I read here that keeping bams separated during pre-processing is reasonable. And also, the way I understood it, for each sample, every bam file corresponds to a different read group, as they are derived from reads produced by different lanes.

modified 2.0 years ago • written 2.0 years ago by serpalma.v40

Five-year-old recommendations are no longer relevant; just concatenate the lanes together.

written 2.0 years ago by Devon Ryan96k

So then the read groups should be as follows:

  • ID: samp31
  • SM: samp31
  • PL: ILLUMINA
  • LB: samp31

Not sure about keeping PI and PU now...

Correct?

modified 2.0 years ago • written 2.0 years ago by serpalma.v40