Question

Read group info

2.7 years ago

priya.bmg ▴ 60

Hello

I am trying to run the following command to align reads to a reference using bwa-mem2, but I keep getting this error:

 [E::bwa_set_rg] no ID at the read group line

This is the header of my fastq file:

@A00684:110:H2TYCDMXY:1:1101:2790:1000 1:N:0:TGAAGGTGAA+AACGAGGCGT

This is the command used to run the alignment:

bwa-mem2 mem -t 8 -R @RG\tID:A00684\tLB:110\tPL:ILLUMINA\tSM:H2TYCDMXY\ /scicore/home/cichon/GROUP/bwa-mem2/gch38.fa DE98NGSUKBD117612_1_1.fq DE98NGSUKBD117612_1_2.fq > aligned.sam

I am a bit lost. What are the tID, tLB, tPL and tSM in the fastq file? How do I get the read group info from the fastq file?

Thanks

Priya

info read BWA group bwa-mem2 • 1.6k views

ADD COMMENT • link updated 2.7 years ago by GenoMax 141k • written 2.7 years ago by priya.bmg ▴ 60

Read group information is described in the GATK help pages. Read group values are not fixed; you may need to make up some of the strings yourself. What is critical is that replicates are clearly identifiable (e.g. via the SM field).
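
As a side note on the error itself: the `-R` string in the question is not quoted, so the shell most likely removes each backslash before bwa-mem2 sees it. `\t` then arrives as a plain `t` (hence "tID", "tLB"), and no ID field is found. The trailing `\ ` would also glue the reference path onto the read group string. A minimal sketch of the difference, using only `printf` so it runs without bwa-mem2 installed:

```shell
# Unquoted: the shell strips each backslash, so "\tID:" becomes "tID:".
unquoted=$(printf '%s' @RG\tID:A00684\tLB:110\tPL:ILLUMINA\tSM:H2TYCDMXY)

# Single-quoted: the backslashes survive, and the aligner can turn
# each \t into a real tab when it parses the -R argument.
quoted=$(printf '%s' '@RG\tID:A00684\tLB:110\tPL:ILLUMINA\tSM:H2TYCDMXY')

printf '%s\n' "$unquoted"
printf '%s\n' "$quoted"
```

Wrapping the whole `-R` value in single quotes, and dropping the trailing backslash, should be enough to get past this particular error.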

ADD REPLY • link 2.7 years ago by GenoMax 141k

Thank you. I referred to the GATK page. Could you provide an example of how to get the SM group info? How do I get the DNA library preparation identifier, LB?

Priya

ADD REPLY • link 2.7 years ago by priya.bmg ▴ 60

You know which samples you are working with, so use the right names. This is clearly described in the SM section of the link above.

GATK tools treat all read groups with the same SM value as containing sequencing data for the same sample, and this is also the name that will be used for the sample column in the VCF file. Therefore it is critical that the SM field be specified correctly. When sequencing pools of samples, use a pool name instead of an individual sample name.

For LB, if the same library was run on multiple lanes, then the name you use just needs to be identical across those lanes.

MarkDuplicates uses the LB field to determine which read groups might contain molecular duplicates, in case the same DNA library was sequenced on multiple lanes.
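
To make this concrete, here is a rough sketch of assembling a read group string from the FASTQ header shown in the question. It assumes the usual Illumina header layout (instrument:run:flowcell:lane:...) and the common convention of ID = flowcell.lane and PU = flowcell.lane.barcode. The `sample` and `library` values are hypothetical placeholders: they cannot be read from the FASTQ file and must come from your own records.

```shell
# First header line of the FASTQ, copied from the question.
# In real use: header=$(head -n 1 DE98NGSUKBD117612_1_1.fq)
header='@A00684:110:H2TYCDMXY:1:1101:2790:1000 1:N:0:TGAAGGTGAA+AACGAGGCGT'

# Colon-separated fields: 3 = flowcell, 4 = lane.
flowcell=$(printf '%s\n' "$header" | cut -d: -f3)
lane=$(printf '%s\n' "$header" | cut -d: -f4)
# Everything after the last colon is the index (barcode) sequence.
barcode=${header##*:}

# Hypothetical values: choose names that match your own sample sheet.
sample='DE98NGSUKBD117612'
library="${sample}_lib1"

# Inside double quotes the shell keeps "\t" as a literal backslash + t,
# which is the form the aligner expects in -R.
rg="@RG\tID:${flowcell}.${lane}\tPU:${flowcell}.${lane}.${barcode}\tSM:${sample}\tLB:${library}\tPL:ILLUMINA"

# Print the alignment command instead of running it.
printf '%s\n' "bwa-mem2 mem -t 8 -R '$rg' /scicore/home/cichon/GROUP/bwa-mem2/gch38.fa DE98NGSUKBD117612_1_1.fq DE98NGSUKBD117612_1_2.fq > aligned.sam"
```

If the same library is later sequenced on another lane, only ID and PU would change; SM and LB stay identical so that downstream tools group the data correctly.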

ADD REPLY • link 2.7 years ago by GenoMax 141k