GATK best practices for Broad-produced NGS data
3.0 years ago

This is a more generalised question. I wish to discover variants in my raw WGS data, which was produced by studies carried out by the Broad Institute. What modifications would you recommend to the standard GATK best practices workflow to better suit Broad-produced data?

For example, while GATK/Broad strongly recommends recalibrating base qualities, this workshop says at the end that:

All recent Broad‐produced data is already recalibrated

GATK4 NGS BroadInstitute VariantCalling • 745 views

I think the GATK forum would be a more appropriate place for this question, but I have asked someone from GATK to take a look here via Twitter.


Thank you! The GATK forum seems to be having some issues with enabling posting and commenting because of recent spam reports, but I will retry posting there.

3.0 years ago
vdauwera ★ 1.1k

It depends on what form of the data you’re starting from. If you’re starting from true raw WGS data, i.e. unmapped reads (in FASTQ or uBAM), then you should follow the best practices as laid out in the GATK documentation. However, if you’re starting from an aligned BAM or CRAM file you received from the Broad’s Genomic Services, then you don’t need to do the pre-processing part and you can go straight to the variant calling part.

In general, you can check what processing has been applied to the data in a BAM file by looking at the @PG lines in the header.
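
As a rough illustration of the header check, here is a minimal Python sketch that pulls the @PG records out of header text such as the output of `samtools view -H your.bam`. The function names and the example header are made up for illustration; the program names and command lines recorded in a real Broad-produced file may well look different, so treat the output as a starting point rather than a definitive test of what pre-processing was done.

```python
def parse_pg_lines(header_text):
    """Return one dict per @PG header line, keyed by the two-letter SAM tag."""
    records = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        # Header fields are tab-separated TAG:VALUE pairs; split on the first colon only.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            fields[tag] = value
        records.append(fields)
    return records


def programs_applied(header_text):
    """List program names (PN, falling back to ID) in the order they appear."""
    return [pg.get("PN", pg.get("ID", "?")) for pg in parse_pg_lines(header_text)]


# Hypothetical header for illustration only; a real file will differ.
EXAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:chr1\tLN:248956422",
    "@PG\tID:bwamem\tPN:bwamem\tVN:0.7.15\tCL:bwa mem -K 100000000 ref.fasta",
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\tCL:MarkDuplicates INPUT=in.bam",
    "@PG\tID:GATK ApplyBQSR\tPN:GATK ApplyBQSR\tCL:ApplyBQSR --input in.bam",
])

if __name__ == "__main__":
    for name in programs_applied(EXAMPLE_HEADER):
        print(name)
```

If the list includes an alignment step, duplicate marking and a base recalibration step, that would suggest the pre-processing has already been applied, though the CL field of each record is worth reading to confirm exactly what was run.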

If you need any additional info, please ask on the GATK forum and thank you for your patience while we deal with the spam issues.
