AddOrReplaceReadGroups.jar --RGLB (unique or the same label for different libraries sequenced on the same lane?)
9.1 years ago

I have 16 libraries that were multiplexed and sequenced on the same lane. When using AddOrReplaceReadGroups.jar, should I create a unique RGLB label for each library (e.g., lib1, lib2, lib3, etc.)? Apologies if this sounds like a silly question, but I have read that some folks use the same name (e.g., lib1) for the RGLB flag when libraries were sequenced on the same lane.

next-gen SNP alignment • 6.7k views

Oh well, I decided to give each library a unique ID. I did, however, keep the sample and read group IDs (RGSM and RGID) the same within a given library, though obviously different between libraries. Is that acceptable?

Giving each one a unique ID is perfectly fine :) As long as each sample has its own RGSM and RGID (assuming you only sequenced each sample once, which is likely) then you're good to go.

This is what a piece of my batch file looks like (paired and unpaired):

java -jar AddOrReplaceReadGroups.jar \
I=sample1.bam \
O=sample1-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib1 \
RGID=sample1 \
RGSM=sample1 \
VALIDATION_STRINGENCY=SILENT

java -jar AddOrReplaceReadGroups.jar \
I=sample1-singles.bam \
O=sample1-singles-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib1 \
RGID=sample1-singles \
RGSM=sample1-singles \
VALIDATION_STRINGENCY=SILENT

#####
java -jar AddOrReplaceReadGroups.jar \
I=sample2.bam \
O=sample2-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib2 \
RGID=sample2 \
RGSM=sample2 \
VALIDATION_STRINGENCY=SILENT

java -jar AddOrReplaceReadGroups.jar \
I=sample2-singles.bam \
O=sample2-singles-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib2 \
RGID=sample2-singles \
RGSM=sample2-singles \
VALIDATION_STRINGENCY=SILENT

Are we on the same page?

We're now on neighbouring pages, but in the same chapter. Are the "-singles" (A) just orphans from trimming or did you (B) run the same libraries as single and then paired-end or (C) make different libraries, one for a single-end and the other for a paired-end run?

The proper method for each:

A. These should be in the same BAM file, with the same ID/SM/etc.

B. They should have the same RGSM and RGLB, but a different RGID.

C. They should have different RGLB and RGID, but the same RGSM.

In any case, your usage of RGLB is fine in the cases you showed.
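
The three cases above can be sketched as a small shell helper. This is only an illustration: the function name `rg_args` and the `run1`/`run2`, `-se`/`-pe` labels are made up here, and the helper just prints the read-group arguments rather than calling Picard.

```shell
#!/bin/sh
# Hypothetical helper: print the read-group arguments for one BAM file.
# Arguments: RGID, RGSM, RGLB (the naming scheme is an assumption).
rg_args() {
    printf 'RGID=%s RGSM=%s RGLB=%s RGPL=illumina\n' "$1" "$2" "$3"
}

# Case A: orphans from trimming -> same ID/SM/LB as the paired reads.
rg_args sample1 sample1 Lib1      # paired
rg_args sample1 sample1 Lib1      # orphans

# Case B: same library sequenced twice -> same SM and LB, different ID.
rg_args sample1-run1 sample1 Lib1
rg_args sample1-run2 sample1 Lib1

# Case C: two libraries from one sample -> same SM, different LB and ID.
rg_args sample1-se sample1 Lib1-SE
rg_args sample1-pe sample1 Lib1-PE
```

The printed strings could be pasted into an AddOrReplaceReadGroups command in place of the RGID/RGSM/RGLB/RGPL lines.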

Oh, these are just orphans from trimming (A.). For clarity, this is how I am doing things (please correct me if I am wrong):

  1. Mapped trimmed reads to the reference, paired and orphans separately, with the bwa aln, sampe and samse sub-commands; then converted SAM to BAM, resulting in sample1-singles.bam, sample1.bam,...,sample16-singles.bam, sample16.bam.
  2. Ran AddOrReplaceReadGroups.jar as mentioned.
  3. Mark and remove duplicates via MarkDuplicates on sample1-singles.bam, sample1.bam,...,sample16-singles.bam, sample16.bam.
  4. Merge sample1-singles.bam, sample1.bam,...,sample16-singles.bam, sample16.bam via MergeSamFiles.
  5. Realign on merged.bam, call SNPs, recalibrate, etc.
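
Steps 3 and 4 above could look roughly like the dry run below. It only prints the commands, in the same one-jar-per-tool style used in this thread. The output names (`*-dedup.bam`, `*.metrics.txt`) and the helper name `dedup_cmd` are assumptions for illustration; check the option names against your Picard version before removing the `echo`.

```shell
#!/bin/sh
# Dry run of steps 3 and 4: print the MarkDuplicates command for each
# read-group-tagged BAM, then one MergeSamFiles command over the results.
# Output names (*-dedup.bam, *.metrics.txt) are assumptions.
dedup_cmd() {
    echo java -jar MarkDuplicates.jar \
        "I=$1-RG.bam" \
        "O=$1-dedup.bam" \
        "METRICS_FILE=$1.metrics.txt" \
        REMOVE_DUPLICATES=true \
        VALIDATION_STRINGENCY=SILENT
}

merge_inputs=""
i=1
while [ "$i" -le 16 ]; do
    for name in "sample${i}" "sample${i}-singles"; do
        dedup_cmd "$name"
        merge_inputs="$merge_inputs I=${name}-dedup.bam"
    done
    i=$((i + 1))
done

# $merge_inputs is deliberately unquoted so each I=... becomes its own argument.
echo java -jar MergeSamFiles.jar $merge_inputs O=merged.bam
```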

Looks good, happy variant calling!

Ah, now I am confused. Basically I need help with this part of the command:

I=sample1.bam \
O=sample1-RG.bam \
RGLB=Lib1 \
RGID=sample1 \
RGSM=sample1 \

I=sample1-singles.bam \
O=sample1-singles-RG.bam \
RGLB=Lib1 \
RGID=sample1-singles \
RGSM=sample1-singles \

What should be different or the same? As mentioned, “singles” refers to orphans (reads that lost their mate). Whatever I do to this particular library, I will obviously carry out with the other 15 libraries.

For paired-end and orphans from the same sample and library, RGLB and RGSM should be identical. I would also make RGID the same, but that's because I'd put them in the same file. Obviously if they're in different files then they should have different RGIDs.

Awesome, thanks for clarifying. For paired-end reads and orphans from the same sample and library (as in the example above), I will make RGLB and RGSM the same (i.e., RGSM=sample1 and RGLB=Lib1 for both files), but keep RGID different (i.e., RGID=sample1 and RGID=sample1-singles).
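
Applied to all 16 libraries, that scheme might be generated with a loop like the one below. It is a dry run that prints the commands instead of running them, and it assumes the sampleN.bam / sampleN-singles.bam naming used in this thread; the function name `print_rg_cmds` is made up. Recent Picard releases also appear to expect an RGPU value, which the commands in this thread omit, so that is worth checking for your version.

```shell
#!/bin/sh
# Dry run: print the AddOrReplaceReadGroups commands for one library number.
# Paired and orphan files share RGSM and RGLB; only RGID differs.
# Remove the "echo" to actually run Picard.
print_rg_cmds() {
    n=$1
    for suffix in "" "-singles"; do
        echo java -jar AddOrReplaceReadGroups.jar \
            "I=sample${n}${suffix}.bam" \
            "O=sample${n}${suffix}-RG.bam" \
            SORT_ORDER=coordinate \
            RGPL=illumina \
            "RGLB=Lib${n}" \
            "RGID=sample${n}${suffix}" \
            "RGSM=sample${n}" \
            VALIDATION_STRINGENCY=SILENT
    done
}

# Libraries 1 through 16.
i=1
while [ "$i" -le 16 ]; do
    print_rg_cmds "$i"
    i=$((i + 1))
done
```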

9.1 years ago

I assume that they were multiplexed rather than being pooled. In any case, it won't matter much whether or not you give them the same library ID (I would recommend something that includes the date of library construction), since the sample IDs differ. Should you ever run a single sample multiple times, then the library ID starts to become important (e.g., PCR duplicates can only exist within, but not between, libraries).
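
To see at a glance which libraries each sample ended up with, the @RG header lines can be summarized with a little awk. This is a sketch under assumptions: the function name `libs_per_sample` is made up, the two header lines are invented examples of one sample run twice from the same library, and in practice the input would be the header text of your BAM (for instance as dumped by `samtools view -H`).

```shell
#!/bin/sh
# List the distinct sample (SM) / library (LB) pairs found in @RG header lines
# read from standard input. Fields in @RG lines are tab-separated.
libs_per_sample() {
    awk -F '\t' '$1 == "@RG" {
        sm = ""; lb = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SM:/) sm = substr($i, 4)
            if ($i ~ /^LB:/) lb = substr($i, 4)
        }
        print sm "\t" lb
    }' | sort -u
}

# Made-up example: one sample, one library, two sequencing runs.
printf '@RG\tID:sample1-run1\tSM:sample1\tLB:Lib1\tPL:illumina\n@RG\tID:sample1-run2\tSM:sample1\tLB:Lib1\tPL:illumina\n' | libs_per_sample
```

A sample that lists more than one library here was built from separate library preps, which is the situation where the library ID affects duplicate marking.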

Ah, please see my reply above (I must have moved your answer to a comment as you realized you'd meant it to be a comment!).
