AddOrReplaceReadGroups.jar --RGLB (unique or the same label for different libraries sequenced on the same lane?)
7.4 years ago

I have 16 libraries that were multiplexed and sequenced on the same lane. When using AddOrReplaceReadGroups.jar, should I create a unique RGLB label for each library (e.g., lib1, lib2, lib3, etc.)? Apologies if this sounds like a silly question, but I have read that some folks use the same name (e.g., lib1) for the RGLB flag when libraries were sequenced on the same lane.

next-gen SNP alignment • 6.0k views

Oh well, I decided to give each library a unique ID. I did, however, keep the sample and read group IDs (RGSM and RGID) the same within a given library, though obviously different between libraries. Is that acceptable?


Giving each one a unique ID is perfectly fine :) As long as each sample has its own RGSM and RGID (assuming you only sequenced each sample once, which is likely) then you're good to go.
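
As a rough illustration of the one-ID-per-library scheme, the sketch below prints (rather than runs) one AddOrReplaceReadGroups command for each of the 16 libraries. The sampleN.bam and LibN names follow the pattern used in this thread. RGPU=lane1 is a placeholder that does not appear in the thread: as far as I know Picard treats RGPU as a required argument, so check that against your version.

```shell
#!/bin/sh
# Dry run: print one AddOrReplaceReadGroups command per library.
# Assumptions: inputs are named sample1.bam ... sample16.bam, and
# RGPU=lane1 is a placeholder value (not taken from the thread).

print_rg_command() {
    n=$1
    printf 'java -jar AddOrReplaceReadGroups.jar I=sample%s.bam O=sample%s-RG.bam SORT_ORDER=coordinate RGPL=illumina RGPU=lane1 RGLB=Lib%s RGID=sample%s RGSM=sample%s VALIDATION_STRINGENCY=SILENT\n' \
        "$n" "$n" "$n" "$n" "$n"
}

print_all_rg_commands() {
    n=1
    while [ "$n" -le 16 ]; do
        print_rg_command "$n"
        n=$((n + 1))
    done
}

print_all_rg_commands
```

Once the printed commands look right, they could be piped to `sh` to run them.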


This is what a piece of my batch file looks like (paired and unpaired):

java -jar AddOrReplaceReadGroups.jar \
I=sample1.bam \
O=sample1-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib1 \
RGID=sample1 \
RGSM=sample1 \
VALIDATION_STRINGENCY=SILENT

java -jar AddOrReplaceReadGroups.jar \
I=sample1-singles.bam \
O=sample1-singles-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib1 \
RGID=sample1-singles \
RGSM=sample1-singles \
VALIDATION_STRINGENCY=SILENT

#####
java -jar AddOrReplaceReadGroups.jar \
I=sample2.bam \
O=sample2-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib2 \
RGID=sample2 \
RGSM=sample2 \
VALIDATION_STRINGENCY=SILENT

java -jar AddOrReplaceReadGroups.jar \
I=sample2-singles.bam \
O=sample2-singles-RG.bam \
SORT_ORDER=coordinate \
RGPL=illumina \
RGLB=Lib2 \
RGID=sample2-singles \
RGSM=sample2-singles \
VALIDATION_STRINGENCY=SILENT


Are we on the same page?


We're now on neighbouring pages, but in the same chapter. Are the "-singles" (A) just orphans from trimming or did you (B) run the same libraries as single and then paired-end or (C) make different libraries, one for a single-end and the other for a paired-end run?

The proper method for each:

A. These should be in the same BAM file, with the same ID/SM/etc.

B. They should have the same RGSM and RGLB, but a different RGID.

C. They should have different RGLB and RGID, but the same RGSM.

In any case, your usage of RGLB is fine in the cases you showed.


Oh, these are just orphans from trimming (A.). For clarity, this is how I am doing things (please correct me if I am wrong):

1. Mapped trimmed reads to the reference, paired and orphans separately, with the bwa aln, sampe and samse sub-commands; then converted SAM to BAM, resulting in sample1-singles.bam, sample1.bam,...,sample16-singles.bam, sample16.bam
3. Mark and remove duplicates via MarkDuplicates on sample1-singles.bam, sample1.bam,...,sample16-singles.bam, sample16.bam
4. Merge sample1-singles.bam, sample1.bam,...,sample16-singles.bam, sample16.bam via MergeSamFiles
5. Realign on merged.bam, call snps, recalibrate, etc.
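
For orientation, here is a dry-run sketch of steps 1, 3 and 4 that only prints the commands. Everything not named in this thread is a hypothetical placeholder: ref.fa, the *_R1.fq / *_R2.fq / *-singles.fq read files, and the *-dedup.bam and *-metrics.txt outputs. It also assumes that read groups are added, with coordinate sorting, between mapping and duplicate marking, which the list above does not state. Step 5 is left out.

```shell
#!/bin/sh
# Dry run: print the commands for steps 1, 3 and 4 instead of executing them.
# Hypothetical names: ref.fa, *_R1.fq, *_R2.fq, *-singles.fq, *-dedup.bam, *-metrics.txt.

print_sample_steps() {
    s=$1      # sample name, e.g. sample1
    ref=$2    # reference FASTA, indexed beforehand with "bwa index"
    cat <<EOF
bwa aln $ref ${s}_R1.fq > ${s}_R1.sai
bwa aln $ref ${s}_R2.fq > ${s}_R2.sai
bwa sampe $ref ${s}_R1.sai ${s}_R2.sai ${s}_R1.fq ${s}_R2.fq > ${s}.sam
bwa aln $ref ${s}-singles.fq > ${s}-singles.sai
bwa samse $ref ${s}-singles.sai ${s}-singles.fq > ${s}-singles.sam
samtools view -bS ${s}.sam > ${s}.bam
samtools view -bS ${s}-singles.sam > ${s}-singles.bam
# assumed step: add read groups, ${s}.bam to ${s}-RG.bam (coordinate-sorted)
java -jar MarkDuplicates.jar I=${s}-RG.bam O=${s}-dedup.bam M=${s}-metrics.txt REMOVE_DUPLICATES=true
java -jar MarkDuplicates.jar I=${s}-singles-RG.bam O=${s}-singles-dedup.bam M=${s}-singles-metrics.txt REMOVE_DUPLICATES=true
EOF
}

# Step 4: one MergeSamFiles call with an I= argument for every de-duplicated file.
print_merge() {
    inputs=""
    for s in "$@"; do
        inputs="$inputs I=${s}-dedup.bam I=${s}-singles-dedup.bam"
    done
    echo "java -jar MergeSamFiles.jar$inputs O=merged.bam"
}

print_sample_steps sample1 ref.fa
print_merge sample1 sample2
```
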

Looks good, happy variant calling!


Ah, now I am confused. Basically I need help with this part of the command:

I=sample1.bam \
O=sample1-RG.bam \
RGLB=Lib1 \
RGID=sample1 \
RGSM=sample1 \

I=sample1-singles.bam \
O=sample1-singles-RG.bam \
RGLB=Lib1 \
RGID=sample1-singles \
RGSM=sample1-singles \

What should be different or the same? As mentioned, “singles” refers to orphans (reads that lost their mate). Whatever I do to this particular library, I will obviously carry out with the other 15 libraries.


For paired-end and orphans from the same sample and library, RGLB and RGSM should be identical. I would also make RGID the same, but that's because I'd put them in the same file. Obviously if they're in different files then they should have different RGIDs.


Awesome, thanks for clarifying. For paired-end reads and orphans from the same sample and library (like the example above), I will make RGLB and RGSM the same (i.e., RGSM=sample1 and RGLB=Lib1 for both files), but keep RGID different (i.e., RGID=sample1 and RGID=sample1-singles).
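
One way to check that the tags ended up as intended is to look at the @RG lines in the BAM header. The sketch below defines a small helper, rg_values (a made-up name), that pulls one tag out of every @RG line on standard input. The two example header lines are written by hand to match the scheme described here; they are not output from a real file.

```shell
#!/bin/sh
# Print the value of one tag (e.g. SM) from every @RG header line on stdin.
rg_values() {
    tag=$1
    awk -F '\t' -v tag="$tag" '
        $1 == "@RG" {
            for (i = 2; i <= NF; i++)
                if (substr($i, 1, 3) == (tag ":"))
                    print substr($i, 4)
        }'
}

# Hand-written example of the header lines this scheme should produce.
example_header() {
    printf '@RG\tID:sample1\tPL:illumina\tLB:Lib1\tSM:sample1\n'
    printf '@RG\tID:sample1-singles\tPL:illumina\tLB:Lib1\tSM:sample1\n'
}

example_header | rg_values SM | sort -u   # a single sample name
example_header | rg_values LB | sort -u   # a single library name
example_header | rg_values ID | sort -u   # two distinct read group IDs
```

On a real file the same helper could be fed from `samtools view -H merged.bam`.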

7.4 years ago

I assume that they were multiplexed rather than being pooled. In any case, it won't matter much whether or not you give them the same library ID (I would recommend one that includes the date of library construction), since the sample IDs differ. Should you ever run a single sample multiple times, then the library ID starts to become important (e.g., PCR duplicates can only exist within, but not between, libraries).


-----------


Ah, please see my reply above (I must have moved your answer to a comment as you realized you'd meant it to be a comment!).