Question: Obtaining read group information from the Fastq file
sktbanerjee1 wrote (3.4 years ago):

Hello everyone, I am new to computational biology and am working with a few paired-end FASTQ files, with the aim of prioritizing genomic variants. I am finding it very hard to understand how to get the read group information from the FASTQ header. Here are the FASTQ headers from the two files of one paired-end sample (whole-exome sequencing, Illumina):

@SN963:294:C847FACXX:1:1106:1077:2087 1:N:0:AGGCAGAA (file name: DYP26_blood_S3_L001_R1_001.fastq)
@SN963:294:C847FACXX:1:1106:1077:2087 2:N:0:AGGCAGAA (file name: DYP26_blood_S3_L001_R2_001.fastq)

It would be really great if anyone could explain how to obtain the read group information.


Actually, after QC I aligned the reads using BWA-MEM. Now I want to call variants using GATK HaplotypeCaller, but first I need to recalibrate the base quality scores using GATK BQSR. When I try to do that, I get the error "ERROR: ReadGroup information in the BAM header is not present". I think I need the read group information to resolve this issue. If you can tell me how to obtain the read group information for this purpose, it would be really helpful.

written 3.4 years ago by sktbanerjee1

I see. That is a different issue from the one you posted as your original question.

Take a look at this thread for solutions that use Picard to add read group information to your BAM files: "GATK, SAM file doesn't have any read groups defined in the header".
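As a rough illustration of the Picard route, here is a minimal Python sketch that assembles an AddOrReplaceReadGroups command line using Picard's KEY=value argument style. The BAM file names, the picard.jar path, and the placeholder values (lib1, unit1, ID 1) are assumptions for illustration only, not values taken from the data; substitute real library and run details if you have them.

```python
import shlex
from typing import List


def picard_add_read_groups_cmd(
    in_bam: str,
    out_bam: str,
    sample: str,
    rg_id: str = "1",            # placeholder read group ID
    library: str = "lib1",       # placeholder library name
    platform: str = "ILLUMINA",
    platform_unit: str = "unit1",  # placeholder platform unit
    picard_jar: str = "picard.jar",  # hypothetical path to the Picard jar
) -> List[str]:
    """Build the argument list for Picard AddOrReplaceReadGroups."""
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}",
        f"O={out_bam}",
        f"RGID={rg_id}",
        f"RGLB={library}",
        f"RGPL={platform}",
        f"RGPU={platform_unit}",
        f"RGSM={sample}",
    ]


# Hypothetical file names; the sample name is borrowed from the FASTQ file name.
cmd = picard_add_read_groups_cmd(
    "DYP26_blood.bam", "DYP26_blood.rg.bam", sample="DYP26_blood"
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run(cmd, check=True) once the paths point at real files.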

modified 3.4 years ago • written 3.4 years ago by genomax

Thanks for the help. I was wondering whether including the read group information in the BWA-MEM step would fix this. If yes, how do I find out the read group information?

written 3.4 years ago by sktbanerjee1

It would. But you can also add that information to the existing BAM files. Ask the people you are analyzing the data for to give you the relevant details to include in the read groups. If no real information is available, you could use dummy fields, as indicated in the thread above.
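For the alignment-time route, a minimal sketch of building the read group line that bwa mem accepts through its -R option is shown here in Python. The reference path (reference.fa), the library name (lib1), and the choice of ID and PU values are assumptions for illustration; composing them from the flowcell, lane, and index seen in the FASTQ header is a common convention, not a requirement.

```python
import shlex
from typing import List


def build_rg_line(rg_id: str, sample: str, library: str,
                  platform: str, platform_unit: str) -> str:
    """Return an @RG header line with literal '\\t' separators.

    bwa mem converts the two-character sequence backslash-t into a real
    TAB when it writes the SAM header, so no actual tabs are needed here.
    """
    parts = [
        "@RG",
        f"ID:{rg_id}",
        f"SM:{sample}",
        f"LB:{library}",
        f"PL:{platform}",
        f"PU:{platform_unit}",
    ]
    return "\\t".join(parts)


def bwa_mem_cmd(reference: str, fastq_r1: str, fastq_r2: str,
                rg_line: str) -> List[str]:
    """Build the argument list for a paired-end bwa mem run."""
    return ["bwa", "mem", "-R", rg_line, reference, fastq_r1, fastq_r2]


rg_line = build_rg_line(
    rg_id="C847FACXX.1",                  # assumed convention: flowcell.lane
    sample="DYP26_blood",                 # taken from the FASTQ file name
    library="lib1",                       # placeholder; ask the data provider
    platform="ILLUMINA",
    platform_unit="C847FACXX.1.AGGCAGAA",  # assumed: flowcell.lane.index
)
cmd = bwa_mem_cmd(
    "reference.fa",                        # hypothetical reference path
    "DYP26_blood_S3_L001_R1_001.fastq",
    "DYP26_blood_S3_L001_R2_001.fastq",
    rg_line,
)
print(shlex.join(cmd))
```

bwa mem writes SAM to standard output, so in practice the command would be redirected to a file or piped into a sorting step.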

written 3.4 years ago by genomax

Thanks a lot for your replies.

written 3.4 years ago by sktbanerjee1
genomax wrote (3.4 years ago):

There is no read group information in the FASTQ header (if you are thinking of SAM-format read groups).

Illumina FASTQ headers are explained in the Wikipedia entry on the FASTQ format.
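To make the header layout concrete, here is a small Python sketch that splits a header of the kind shown in the question into named fields. The field names follow the usual description of Illumina (Casava 1.8 and later) headers; the function name and dictionary keys are my own choices for illustration.

```python
from typing import Dict

# Colon-separated fields before and after the space in the header line.
LOCATION_FIELDS = ["instrument", "run_number", "flowcell_id",
                   "lane", "tile", "x_pos", "y_pos"]
METADATA_FIELDS = ["read_number", "is_filtered",
                   "control_number", "index_sequence"]


def parse_illumina_header(header: str) -> Dict[str, str]:
    """Split a Casava 1.8+ style FASTQ header line into named fields."""
    if not header.startswith("@"):
        raise ValueError("FASTQ header lines start with '@'")
    # Drop the leading '@', then split on the first space.
    location, _, metadata = header[1:].strip().partition(" ")
    loc_values = location.split(":")
    meta_values = metadata.split(":")
    if (len(loc_values) != len(LOCATION_FIELDS)
            or len(meta_values) != len(METADATA_FIELDS)):
        raise ValueError(f"unexpected header layout: {header!r}")
    fields = dict(zip(LOCATION_FIELDS, loc_values))
    fields.update(zip(METADATA_FIELDS, meta_values))
    return fields


fields = parse_illumina_header(
    "@SN963:294:C847FACXX:1:1106:1077:2087 1:N:0:AGGCAGAA"
)
print(fields["flowcell_id"], fields["lane"], fields["index_sequence"])
# prints: C847FACXX 1 AGGCAGAA
```

None of these fields is a read group in the SAM sense. The flowcell, lane, and index are often reused when composing a read group ID or platform unit, but the sample and library names have to come from whoever prepared the data.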
