13 months ago
priya.bmg ▴ 20

Hello

I am trying to align reads to a reference using bwa-mem2, but I keep getting this error:

 [E::bwa_set_rg] no ID at the read group line


This is the first header line of my FASTQ file:

@A00684:110:H2TYCDMXY:1:1101:2790:1000 1:N:0:TGAAGGTGAA+AACGAGGCGT


This is the command I used to run the alignment:

bwa-mem2 mem -t 8 -R @RG\tID:A00684\tLB:110\tPL:ILLUMINA\tSM:H2TYCDMXY\ /scicore/home/cichon/GROUP/bwa-mem2/gch38.fa DE98NGSUKBD117612_1_1.fq DE98NGSUKBD117612_1_2.fq > aligned.sam


I am a bit lost. What are the tID, tLB, tPL and tSM in the FASTQ file? How do I get the read group info from the FASTQ file?

Thanks

Priya


Read group information is described in the GATK help pages. The values are not fixed: you may need to make up some of the strings yourself. What is critical is that replicates are clearly identifiable (e.g. via the SM field).
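
Whatever values you choose, the "no ID at the read group line" error is consistent with the -R string being passed without quotes: the shell removes the backslashes, so bwa receives @RGtID:... and cannot find a tab-separated ID field. Here is a minimal sketch of the difference, using plain printf so it runs without bwa installed. The shortened read-group values are for illustration only, and the corrected command at the end is an untested suggestion, not something verified against your data.

```shell
# Unquoted: the shell strips each backslash, so "\t" reaches the program as a plain "t".
unquoted=$(printf '%s' @RG\tID:A00684\tSM:H2TYCDMXY)
printf '%s\n' "$unquoted"

# Single-quoted: the two characters backslash + t are passed through unchanged,
# which is the form the -R option expects.
quoted=$(printf '%s' '@RG\tID:A00684\tSM:H2TYCDMXY')
printf '%s\n' "$quoted"

# The command from the question with the -R value quoted and the stray trailing
# backslash removed (not run here; it needs bwa-mem2 and the input files):
# bwa-mem2 mem -t 8 -R '@RG\tID:A00684\tLB:110\tPL:ILLUMINA\tSM:H2TYCDMXY' \
#     /scicore/home/cichon/GROUP/bwa-mem2/gch38.fa \
#     DE98NGSUKBD117612_1_1.fq DE98NGSUKBD117612_1_2.fq > aligned.sam
```

The first printf shows the string with the backslashes gone, the second shows it intact.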


Thank you. I referred to the GATK page. Could you provide an example of how to get the SM group info? And how do I get the DNA library preparation identifier, LB?

Priya


You know which samples you are working with, so use the right names. This is clearly described in the SM section of the link above.

GATK tools treat all read groups with the same SM value as containing sequencing data for the same sample, and this is also the name that will be used for the sample column in the VCF file. Therefore it is critical that the SM field be specified correctly. When sequencing pools of samples, use a pool name instead of an individual sample name.

For LB, if the same library was run on multiple lanes, the name you use just needs to be identical across them.

MarkDuplicates uses the LB field to determine which read groups might contain molecular duplicates, in case the same DNA library was sequenced on multiple lanes.
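
As a rough sketch of how this can look in practice: an Illumina FASTQ header is laid out as instrument:run:flowcell:lane:tile:x:y, so the flowcell and lane can be read from it, while SM and LB have to come from your own sample records. Using flowcell.lane as the ID is one common convention, not a requirement, and the names sample1 and lib1 below are hypothetical placeholders.

```shell
# Header line from the question; on real data it could be read with
# something like: header=$(head -n 1 DE98NGSUKBD117612_1_1.fq)
header='@A00684:110:H2TYCDMXY:1:1101:2790:1000 1:N:0:TGAAGGTGAA+AACGAGGCGT'

# Colon-separated fields: 1=instrument, 2=run number, 3=flowcell, 4=lane.
flowcell=$(printf '%s' "$header" | cut -d: -f3)
lane=$(printf '%s' "$header" | cut -d: -f4)

# Hypothetical placeholders: SM and LB are not stored in the FASTQ file.
sample='sample1'
library='lib1'

# Build the read-group string; "\t" stays a literal backslash + t inside double quotes.
rg="@RG\tID:${flowcell}.${lane}\tSM:${sample}\tLB:${library}\tPL:ILLUMINA"
printf '%s\n' "$rg"
```

The same library sequenced on a second lane would get a different ID (for example H2TYCDMXY.2) but the same SM and LB values. The string can then be passed to the aligner as -R "$rg".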