Using GATK If I Have No Read Group Information
12.1 years ago
Bioscientist ★ 1.7k

To use GATK for SNP-calling, we need read group info in the header. But what if we have no such info?

I've got several BAM files from a collaborator, with no information about, say, which lanes or libraries the sequences come from. In such circumstances, do we simply treat all alignments in a BAM file as coming from the same lane of the same library?

And can we then just insert a line like this into the header?

@RG ID:filename SM:filename LB:filename PL:Illumina

Thanks.

gatk • 6.8k views
12.1 years ago

Use Picard's AddOrReplaceReadGroups ( http://picard.sourceforge.net/command-line-overview.shtml#AddOrReplaceReadGroups ) to add a simple read group to your BAM.
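For what it's worth, here is a rough sketch of what such a call could look like, wrapped in Python so the arguments are easy to script over several BAMs. It uses Picard's older KEY=value argument style; the exact syntax may differ between Picard releases, so check the help output of your version. The helper name, file names, the picard.jar path and the "unknown" platform unit are all placeholders I made up, and reusing the sample name for ID and LB is just the "one library per BAM" assumption from the question.

```python
import shlex


def picard_add_read_group_cmd(in_bam, out_bam, sample,
                              rg_id=None, library=None,
                              platform="ILLUMINA", unit="unknown",
                              picard_jar="picard.jar"):
    """Build (but do not run) a Picard AddOrReplaceReadGroups command.

    Hypothetical helper: it only assembles the argument list.
    """
    rg_id = rg_id or sample        # assumption: one read group per BAM
    library = library or sample    # assumption: one library per BAM
    return [
        "java", "-jar", picard_jar, "AddOrReplaceReadGroups",
        f"I={in_bam}", f"O={out_bam}",
        f"RGID={rg_id}", f"RGLB={library}", f"RGPL={platform}",
        f"RGPU={unit}", f"RGSM={sample}",
    ]


# Placeholder file names; print the command so it can be inspected first.
cmd = picard_add_read_group_cmd("sample1.bam", "sample1.rg.bam", "sample1")
print(shlex.join(cmd))
```

To actually run it, pass the list to subprocess.run(cmd, check=True) once the printed command looks right.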


The read group information has to be added to each read (RG:Z:your_RG_ID), not just to the header, so the tool needs to reprocess the entire file. Per-read read groups allow a BAM file to contain multiple RGs, but it's extra work when fixing up BAMs with missing information.
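To make that concrete, here is a toy illustration on SAM text (not a replacement for Picard, and not how Picard is implemented). It adds one @RG header line and then has to touch every alignment line to append the RG:Z tag, which is why the whole file gets rewritten. The function name and the example values are made up for the sketch.

```python
def add_read_group(sam_lines, rg_id, sample, library, platform="ILLUMINA"):
    """Yield SAM lines with one @RG header line and an RG:Z tag on every read.

    Illustrative only: works on uncompressed SAM text, one line per item.
    """
    rg_header = "\t".join(["@RG", f"ID:{rg_id}", f"SM:{sample}",
                           f"LB:{library}", f"PL:{platform}"])
    header_written = False
    for line in sam_lines:
        line = line.rstrip("\n")
        if not header_written and line.startswith("@"):
            yield line                      # pass existing header lines through
            continue
        if not header_written:
            yield rg_header                 # @RG goes at the end of the header
            header_written = True
        fields = line.split("\t")
        # The first 11 columns are mandatory; drop any old RG tag after them.
        tags = [t for t in fields[11:] if not t.startswith("RG:Z:")]
        yield "\t".join(fields[:11] + tags + [f"RG:Z:{rg_id}"])
    if not header_written:
        yield rg_header                     # input had no alignment lines


example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0",
]
for out_line in add_read_group(example, "grp1", "sample1", "lib1"):
    print(out_line)
```

The header edit alone is one line, but the loop over every alignment is the expensive part on a large BAM.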


Yeah, it's working. But my BAM file is 100 GB, and it seems to be rebuilding the whole file, which takes a long time.

12.1 years ago
Andreas ★ 2.5k

You can use GATK's --default_read_group flag and set it to anything you'd like.

Andreas


Sorry, I cannot find it.


Oops, it seems that option is gone from newer versions of GATK. No idea why they would delete it.
