Question: Whole Genome GATK with uBAM files
2.1 years ago, abhishek.maj08 wrote:

Hi,

I am trying to run GATK on a whole genome. However, my 8 files are in unmapped BAM (uBAM) format. Do I have to merge them into a single file first (Picard MergeSamFiles) before running bwa mem and, subsequently, MergeBamAlignment?

Also, according to this tutorial (https://software.broadinstitute.org/gatk/documentation/article?id=6483), even though I have a uBAM, I still have to convert it to FASTQ at an intermediate step. Is this because of bwa mem's input constraints?

Thanks


Are those 8 files one sample or 8 samples?

Note that you can pipe the SamToFastq step directly to bwa mem: http://gatkforums.broadinstitute.org/gatk/discussion/6483/how-to-map-and-clean-up-short-read-sequence-data-efficiently#step3D
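
The piping mentioned above could be assembled roughly as follows. This is only a sketch: the function name, the file names, and the thread count are hypothetical, and it assumes paired-end reads, a `picard.jar` in the working directory, and `bwa` on the PATH. The Picard and bwa options used (`INTERLEAVE`, `NON_PF`, `-p`, `-M`, `ALIGNED_BAM`, `UNMAPPED_BAM`) follow the linked tutorial's general pattern, and the tutorial itself remains the reference for the full option set. The code only builds the command string and does not run anything.

```python
import shlex


def build_pipe_command(ubam, reference, out_bam, picard_jar="picard.jar", threads=4):
    """Build a shell pipeline: SamToFastq | bwa mem | MergeBamAlignment.

    All paths are placeholders supplied by the caller (hypothetical names).
    """
    # Stream interleaved FASTQ to stdout instead of writing it to disk.
    sam_to_fastq = [
        "java", "-jar", picard_jar, "SamToFastq",
        f"I={ubam}", "FASTQ=/dev/stdout", "INTERLEAVE=true", "NON_PF=true",
    ]
    # -p tells bwa mem that the single input stream holds interleaved pairs.
    bwa_mem = ["bwa", "mem", "-M", "-t", str(threads), "-p", reference, "/dev/stdin"]
    # Merge the alignments back with the metadata kept in the original uBAM.
    merge = [
        "java", "-jar", picard_jar, "MergeBamAlignment",
        "ALIGNED_BAM=/dev/stdin", f"UNMAPPED_BAM={ubam}",
        f"OUTPUT={out_bam}", f"R={reference}",
    ]
    return " | ".join(shlex.join(stage) for stage in (sam_to_fastq, bwa_mem, merge))


if __name__ == "__main__":
    print(build_pipe_command("sample.unmapped.bam", "ref.fasta", "sample.merged.bam"))
```

Printing the string first makes it easy to inspect the pipeline before handing it to a shell.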

Comment written 2.1 years ago by WouterDeCoster

All eight files belong to the same animal, so it is one sample sequenced on different flow cells.
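
The thread does not settle whether to merge before or after mapping. One possible arrangement, shown purely as a sketch, is to map each flow cell's uBAM on its own and combine the aligned BAMs afterwards with Picard MergeSamFiles. The function name, the `.aligned.bam` naming scheme, and the file names are hypothetical. The code only plans the output names and builds the merge command, it does not run anything.

```python
import shlex
from pathlib import Path


def plan_merge(ubams, merged_bam, picard_jar="picard.jar"):
    """Return per-file aligned BAM names and a MergeSamFiles command string."""
    # Hypothetical naming scheme: fc1.bam -> fc1.aligned.bam
    aligned = [str(Path(u).with_name(Path(u).stem + ".aligned.bam")) for u in ubams]
    # MergeSamFiles accepts one I= argument per input file.
    merge_cmd = (
        ["java", "-jar", picard_jar, "MergeSamFiles"]
        + [f"I={a}" for a in aligned]
        + [f"O={merged_bam}"]
    )
    return aligned, shlex.join(merge_cmd)


if __name__ == "__main__":
    names, cmd = plan_merge([f"fc{i}.bam" for i in range(1, 9)], "sample.bam")
    print(names)
    print(cmd)
```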

Reply written 2.1 years ago by abhishek.maj08
2.1 years ago, d-cameron (Australia) wrote:

"Also, according to this tutorial (https://software.broadinstitute.org/gatk/documentation/article?id=6483), even though I have a uBAM, I still have to convert it to FASTQ at an intermediate step. Is this because of bwa mem's input constraints?"

bwa supports uBAM as an input file format, so FASTQ generation is not required and is, in fact, not done by the Broad Institute for their own samples [1]. That said, their documentation does assume a FASTQ-based pipeline:

"In case you're wondering, we still show the FASTQ-based workflow as the default in most of our documentation because it is by far the most commonly-used workflow, and we want to keep the documentation accessible for our more novice users."

[1] http://gatkforums.broadinstitute.org/gatk/discussion/5990/what-is-ubam-and-why-is-it-better-than-fastq-for-storing-unmapped-sequence-data


Oh, Broad. I wish they would give up on their ill-fated uBam format and start to care about efficiency.

Comment written 2.1 years ago by Brian Bushnell