Question: Whole Genome GATK with uBAM files
22 months ago, abhishek.maj08 wrote:

Hi,

I am trying to run GATK on a whole genome. However, my eight input files are in unmapped BAM (uBAM) format. Do I have to merge them into a single file first (Picard MergeSamFiles) before running BWA-MEM and then MergeBamAlignment?

Also, according to this tutorial (https://software.broadinstitute.org/gatk/documentation/article?id=6483), even though I have a uBAM, I still have to convert it to FASTQ at an intermediate step. Is this because of BWA-MEM's input constraints?

Thanks

Tags: snp, ubam, gatk

Are those 8 files one sample or 8 samples?

Note that you can pipe the SamToFastq step directly to bwa mem: http://gatkforums.broadinstitute.org/gatk/discussion/6483/how-to-map-and-clean-up-short-read-sequence-data-efficiently#step3D

Reply written 22 months ago by WouterDeCoster
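
The piping mentioned in the reply above could look roughly like the sketch below. It only builds the shell pipeline as a string (SamToFastq, then bwa mem, then MergeBamAlignment) so the command can be inspected before it is run. The `picard.jar` location, the file names, and the thread count are assumptions for illustration, and the linked tutorial passes several further options that are left out here.

```python
import shlex


def build_ubam_alignment_pipeline(ubam, reference, out_bam, threads=4):
    """Return a shell pipeline: SamToFastq | bwa mem | MergeBamAlignment.

    Paths and the picard.jar location are placeholders (assumptions),
    not values taken from the thread.
    """
    q = shlex.quote
    # Stream the uBAM out as interleaved FASTQ on stdout; no FASTQ file is written.
    sam_to_fastq = (
        "java -jar picard.jar SamToFastq "
        f"I={q(ubam)} FASTQ=/dev/stdout INTERLEAVE=true NON_PF=true"
    )
    # -p tells bwa mem that the single input stream holds interleaved read pairs.
    bwa_mem = f"bwa mem -M -t {int(threads)} -p {q(reference)} /dev/stdin"
    # Merge the alignments back with the original uBAM to restore its metadata.
    merge = (
        "java -jar picard.jar MergeBamAlignment "
        f"ALIGNED_BAM=/dev/stdin UNMAPPED_BAM={q(ubam)} "
        f"OUTPUT={q(out_bam)} R={q(reference)}"
    )
    return " | ".join([sam_to_fastq, bwa_mem, merge])


if __name__ == "__main__":
    # Hypothetical file names for one flow cell.
    print(build_ubam_alignment_pipeline(
        "flowcell1.unmapped.bam", "reference.fasta", "flowcell1.aligned.bam"
    ))
```

The function could be called once per uBAM file, which keeps each flow cell's reads in their own output BAM.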

All eight files belong to the same animal, so it is one sample sequenced on different flow cells.

Reply written 22 months ago by abhishek.maj08
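
For reference, the Picard MergeSamFiles step named in the question takes one `I=` argument per input file. The sketch below only assembles such a command for a list of files. The thread does not settle whether the merge belongs before or after alignment, so this is not a recommendation of either order. The file names and the `picard.jar` location are assumptions, and sort-order options are deliberately omitted and would need checking against the Picard documentation.

```python
import shlex


def build_merge_command(input_bams, out_bam):
    """Return a Picard MergeSamFiles command with one I= argument per input."""
    if not input_bams:
        raise ValueError("need at least one input BAM")
    inputs = " ".join(f"I={shlex.quote(path)}" for path in input_bams)
    return f"java -jar picard.jar MergeSamFiles {inputs} O={shlex.quote(out_bam)}"


if __name__ == "__main__":
    # Hypothetical names for the eight per-flow-cell files.
    bams = [f"flowcell{i}.bam" for i in range(1, 9)]
    print(build_merge_command(bams, "sample.merged.bam"))
```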
22 months ago, d-cameron (Australia) wrote:

Also, according to this tutorial (https://software.broadinstitute.org/gatk/documentation/article?id=6483), even though I have a uBAM, I still have to convert it to FASTQ at an intermediate step. Is this because of BWA-MEM's input constraints?

BWA supports uBAM as an input file format, so FASTQ generation is not required and is, in fact, not done by the Broad Institute for their samples [1]. That said, their documentation does assume a FASTQ-based pipeline:

In case you're wondering, we still show the FASTQ-based workflow as the default in most of our documentation because it is by far the most commonly-used workflow, and we want to keep the documentation accessible for our more novice users.

[1] http://gatkforums.broadinstitute.org/gatk/discussion/5990/what-is-ubam-and-why-is-it-better-than-fastq-for-storing-unmapped-sequence-data


Oh, Broad. I wish they would give up on their ill-fated uBam format and start to care about efficiency.

Reply written 22 months ago by Brian Bushnell