About GATK data-preprocessing workflow
2.5 years ago
9521ljh ▴ 40

I have FASTQ files that I want to convert to BAM files.

In the GATK pre-processing workflow, a uBAM (unmapped BAM) file is required because it holds the metadata.

So I did the following:

Fastq -> BWA - mapped BAM

Fastq -> Picard - uBAM

uBAM + mapped BAM -> Picard - Merge
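The three steps above can be sketched as command lines. This is a minimal sketch, not the official GATK pipeline: it only builds and prints the commands, and the file names, the `picard.jar` path, the read-group values, and the helper name `build_preprocessing_commands` are all hypothetical. It assumes paired-end reads, Picard's `FastqToSam` and `MergeBamAlignment` tools, and a reference that has already been indexed for BWA and has a sequence dictionary.

```python
import shlex


def build_preprocessing_commands(fq1, fq2, reference, sample, picard_jar="picard.jar"):
    """Build (but do not run) the three commands of the uBAM workflow.

    All paths and read-group values are placeholders for illustration.
    """
    ubam = f"{sample}.unmapped.bam"
    aligned = f"{sample}.aligned.sam"

    # Fastq -> Picard -> uBAM: stores the reads plus read-group metadata.
    fastq_to_ubam = [
        "java", "-jar", picard_jar, "FastqToSam",
        f"FASTQ={fq1}", f"FASTQ2={fq2}",
        f"OUTPUT={ubam}",
        f"SAMPLE_NAME={sample}",
        f"READ_GROUP_NAME={sample}_rg1",   # hypothetical read-group ID
        f"LIBRARY_NAME={sample}_lib1",     # hypothetical library name
        "PLATFORM=ILLUMINA",
    ]

    # Fastq -> BWA -> mapped reads. bwa mem writes SAM to stdout, so the
    # caller must redirect it into `aligned` when actually running this.
    bwa_align = ["bwa", "mem", reference, fq1, fq2]

    # uBAM + mapped reads -> Picard MergeBamAlignment.
    merge = [
        "java", "-jar", picard_jar, "MergeBamAlignment",
        f"ALIGNED_BAM={aligned}",
        f"UNMAPPED_BAM={ubam}",
        f"REFERENCE_SEQUENCE={reference}",
        f"OUTPUT={sample}.merged.bam",
    ]

    return {"ubam": fastq_to_ubam, "align": bwa_align, "merge": merge}


if __name__ == "__main__":
    commands = build_preprocessing_commands(
        "sample_R1.fastq.gz", "sample_R2.fastq.gz", "ref.fasta", "sample"
    )
    for step, argv in commands.items():
        print(f"# {step}")
        print(shlex.join(argv))
```

Keeping each command as an argument list makes it easy to pass to `subprocess.run` later; the alignment step would additionally need its standard output written to the SAM file named in the merge step.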

However, I really don't understand why this process is needed, because we can add metadata to a BAM with Picard (AddOrReplaceReadGroups) instead of using a uBAM.

I have already read this thread: https://gatkforums.broadinstitute.org/gatk/discussion/11694/why-is-converting-from-fastq-to-ubam-nesessary-before-preprocessing#latest

assembly next-gen GATK Preprocessing uBAM • 1.0k views
2.5 years ago
benformatics ★ 2.6k

The metadata is not related to the read groups.

As the skywarrior person said in the post you linked:

BWA hardclips reads if there is a significant discordance between the best matching kmer and the read. These hardclips may end up costing you a particular structural variant or a true indel call. Merging unmapped bam and initial alignment restores the hardclips which I know of no solution for that in BWA parameters.

So you are not really losing metadata; you are potentially losing actual sequence data from your original reads. This step may be unnecessary depending on the type of dataset you have (exome vs. whole genome), or if you don't care about certain structural variants or already know that they aren't present in your dataset.
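To make the "losing actual data" point concrete, here is a small sketch of how clipping shows up in a SAM CIGAR string. Per the SAM format, soft-clipped bases (`S`) are still stored in the record's SEQ field, while hard-clipped bases (`H`) are not stored at all. The function name and the example CIGAR strings are made up for illustration; they are not taken from a real alignment.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def stored_and_hard_clipped(cigar):
    """Return (bases stored in SEQ, bases removed by hard clipping)."""
    stored = 0
    hard_clipped = 0
    for length, op in CIGAR_ELEMENT.findall(cigar):
        n = int(length)
        if op in "MIS=X":      # these operations consume read bases kept in SEQ
            stored += n
        elif op == "H":        # hard-clipped bases are absent from SEQ
            hard_clipped += n
        # D, N and P consume no read bases, so they are ignored here.
    return stored, hard_clipped


if __name__ == "__main__":
    # Hypothetical 150 bp read, clipped two different ways.
    for cigar in ("60S90M", "60H90M"):
        stored, lost = stored_and_hard_clipped(cigar)
        print(f"{cigar}: {stored} bases in SEQ, {lost} bases hard-clipped away")
```

For the soft-clipped record all 150 bases remain in the file; for the hard-clipped record only 90 do, and the other 60 can only be recovered from the original reads, which is what the uBAM supplies during the merge.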


Thank you for the reply.

Could you give an example of the metadata? I thought it meant things like platform (Illumina), library, and sample name,

but all of these are covered by the AddOrReplaceReadGroups options.


Yes, those are examples of metadata, but the issue here is that you are excluding the core of your data (i.e. the nucleotide sequence) because of an underlying behaviour of the BWA software. This is completely independent of any metadata.

