Question

Whole Genome GATK with uBAM files

7.8 years ago

abhishek.maj08 ▴ 10

Hi,

I am trying to run GATK on a whole genome. However, my 8 files are in unmapped BAM (uBAM) format. Do I have to merge them into a single file first (Picard MergeSamFiles) before running bwa mem and then MergeBamAlignment?

Also according to this tutorial (https://software.broadinstitute.org/gatk/documentation/article?id=6483) even though I have a uBam, I still have to convert it to fastq at an intermediate step. Is this because of Bwa Mem's input constraints?

Thanks

GATK ubam SNP • 3.3k views

ADD COMMENT • link updated 7.8 years ago by d-cameron ★ 2.9k • written 7.8 years ago by abhishek.maj08 ▴ 10

Are those 8 files one sample or 8 samples?

Note that you can pipe the SamToFastq step directly to bwa mem: http://gatkforums.broadinstitute.org/gatk/discussion/6483/how-to-map-and-clean-up-short-read-sequence-data-efficiently#step3D
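As a rough illustration of the piping idea, here is a minimal Python sketch that only assembles the command line for a SamToFastq | bwa mem | MergeBamAlignment pipeline; it does not run anything. The Picard and bwa options are a reduced subset of those in the linked tutorial. The helper name `piped_alignment_command`, the file names, the `picard.jar` location, and the choice to build one pipeline per flow-cell uBAM are assumptions for illustration, not something this thread settles.

```python
import shlex


def piped_alignment_command(ubam, reference, out_bam, threads=4):
    """Build a shell pipeline string: SamToFastq | bwa mem | MergeBamAlignment.

    Hypothetical helper: it only formats the command, it does not execute it.
    The reference is assumed to be bwa-indexed already.
    """
    sam_to_fastq = [
        "java", "-jar", "picard.jar", "SamToFastq",
        f"INPUT={ubam}",
        "FASTQ=/dev/stdout",   # stream FASTQ to the pipe instead of a file
        "INTERLEAVE=true",     # both mates of a pair alternate in one stream
    ]
    bwa_mem = [
        "bwa", "mem", "-M", "-t", str(threads),
        "-p",                  # tell bwa mem the input is interleaved pairs
        reference, "/dev/stdin",
    ]
    merge_bam_alignment = [
        "java", "-jar", "picard.jar", "MergeBamAlignment",
        "ALIGNED_BAM=/dev/stdin",   # alignments arriving from bwa mem
        f"UNMAPPED_BAM={ubam}",     # original uBAM, source of read metadata
        f"OUTPUT={out_bam}",
        f"R={reference}",
    ]
    stages = (sam_to_fastq, bwa_mem, merge_bam_alignment)
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    # Hypothetical file names: one uBAM per flow cell, all from one sample.
    for i in range(1, 9):
        print(piped_alignment_command(
            f"flowcell{i}.unmapped.bam",
            "reference.fasta",
            f"flowcell{i}.aligned.bam",
        ))
```

Each printed line can be inspected and then pasted into a shell. With this arrangement the FASTQ data only ever exists inside the pipe, so no intermediate FASTQ file is written to disk.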

ADD REPLY • link 7.8 years ago by WouterDeCoster 47k

All the eight files belong to the same animal, so one sample, different flow cells.

ADD REPLY • link 7.8 years ago by abhishek.maj08 ▴ 10

score 0 · Answer 1 · 2017-01-04

"Also according to this tutorial (https://software.broadinstitute.org/gatk/documentation/article?id=6483) even though I have a uBam, I still have to convert it to fastq at an intermediate step. Is this because of Bwa Mem's input constraints?"

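bwa aln can read BAM input directly (its -b option), but bwa mem accepts only FASTA/FASTQ, so with bwa mem the reads still have to be converted. The conversion can be streamed through a pipe, as in the comment above, so no intermediate FASTQ file needs to be written, and the Broad Institute stores the unmapped reads for their own samples as uBAM rather than FASTQ[1]. That said, their documentation does assume a FASTQ-based pipeline: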

"In case you're wondering, we still show the FASTQ-based workflow as the default in most of our documentation because it is by far the most commonly-used workflow, and we want to keep the documentation accessible for our more novice users."

[1] http://gatkforums.broadinstitute.org/gatk/discussion/5990/what-is-ubam-and-why-is-it-better-than-fastq-for-storing-unmapped-sequence-data