Question: Is it okay to realign and subsequently process fastq files that were converted from processed BAM files?
3.0 years ago, mary.a.wood.91 wrote:

A collaborator has given me BAM files that have already been processed. I want to fold this data set into a larger analysis for which I have a standard protocol that takes files from fastq through somatic variant calling (roughly similar to GATK's best practices). My plan is to convert the BAM files back to fastq using Picard's SamToFastq program and then run them through the usual protocol, but I am curious what issues this might raise, given that the quality scores may now differ from those in the original fastqs.

FYI, the specifics for the BAM files were that the original fastqs were “aligned to the hg19 human genome build using BWA (v0.7.5) [and then] subjected to mark duplication, realignment, and recalibration using the Picard tool and GATK software tools”

Unfortunately, I don't know more about the origin of these files than that, but any general insights as to how the previous processing of the BAMs might affect how the fastqs I will generate are treated would be appreciated!
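When the provenance of a BAM is unclear, its header is often the quickest place to look: the SAM format records each processing program on an `@PG` line (tags such as `ID`, `PN`, `VN`, `CL`). The following is only a minimal sketch that parses header text, for example the output of `samtools view -H file.bam`. The function name and the example header are made up for illustration.

```python
def parse_pg_lines(header_text):
    """Return one dict per @PG header line, keyed by the two-letter SAM tag."""
    programs = []
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        fields = {}
        for item in line.split("\t")[1:]:
            # Split on the first colon only, since CL values may contain colons.
            tag, _, value = item.partition(":")
            fields[tag] = value
        programs.append(fields)
    return programs


# Hypothetical header, invented for this example.
example_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@PG\tID:bwa\tPN:bwa\tVN:0.7.5\tCL:bwa mem ref.fa r1.fq r2.fq\n"
    "@PG\tID:MarkDuplicates\tPN:MarkDuplicates\n"
)

for pg in parse_pg_lines(example_header):
    print(pg.get("ID"), pg.get("VN", "?"), pg.get("CL", ""))
```

Not every tool writes a `@PG` line, so an absent entry does not prove a step was skipped.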

modified 3.0 years ago by Istvan Albert • written 3.0 years ago by mary.a.wood.91

If you have been given the files, you could ask the person who gave you the BAM files for more information, or even for the original fastq files. Or is that out of the question?

written 3.0 years ago by h.mon

It's not necessarily out of the question, but difficult. I wasn't the one directly given the data, and my supervisor, who received the files, is having difficulty reaching those who generated the data. I was hoping to move forward with the analysis, but I wanted to get a general feel for how feasible it is to work with fastqs generated from processed BAMs.

written 3.0 years ago by mary.a.wood.91

3.0 years ago, Istvan Albert (University Park, USA) wrote:

Unless the reads were hard clipped, the sequence information is unaltered and can be recovered in its original format.
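Hard clipping shows up as `H` operations in the CIGAR column, so it can be checked before converting. Below is a minimal sketch that works on SAM text lines (for example from `samtools view`). It counts only primary records, on the assumption that secondary and supplementary alignments are skipped during conversion anyway. The function names and records are hypothetical.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def hard_clipped_bases(cigar):
    """Return how many bases a CIGAR string marks as hard clipped (H)."""
    if cigar == "*":
        return 0
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "H")


def count_hard_clipped_primary(sam_lines):
    """Return (hard_clipped, total) over primary alignment records."""
    total = clipped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 0x900:  # secondary (0x100) or supplementary (0x800)
            continue
        total += 1
        if hard_clipped_bases(cigar) > 0:
            clipped += 1
    return clipped, total


# Hypothetical records, invented for this example.
example_records = [
    "r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    "r2\t0\tchr1\t300\t60\t4H6M\t*\t0\t0\tACGTAC\tIIIIII",
    "r2\t2048\tchr2\t50\t60\t6H4M\t*\t0\t0\tACGT\tIIII",
]

print(count_hard_clipped_primary(example_records))
```

If the first number is zero, the primary records still carry their full read sequences.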

The samtools fastq command can also perform the back conversion.
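To make the back conversion concrete, here is a rough sketch of what happens for a single record. It is an illustration of the idea, not the actual samtools or Picard implementation. Reads aligned to the reverse strand are stored reverse-complemented in the BAM, so the sequence is flipped back and the qualities reversed. The function name and the record are hypothetical.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(line):
    """Return a 4-line FASTQ record for a primary SAM record, else None."""
    fields = line.rstrip("\n").split("\t")
    name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & 0x900 or seq == "*":
        # Skip secondary/supplementary records and records without sequence.
        return None
    if flag & 0x10:
        # Stored on the reverse strand: restore the orientation the read
        # had when it came off the instrument.
        seq = seq.translate(COMPLEMENT)[::-1]
        qual = qual[::-1]
    if flag & 0x40:
        name += "/1"  # first read of the pair
    elif flag & 0x80:
        name += "/2"  # second read of the pair
    return f"@{name}\n{seq}\n+\n{qual}\n"


# Hypothetical record: flag 83 = paired, proper pair, reverse strand, first in pair.
example = "r1\t83\tchr1\t100\t60\t6M\t=\t50\t-56\tAACCGT\tABCDEF"
print(sam_record_to_fastq(example), end="")
```

For paired-end data the real tools also need mates to be adjacent, so a coordinate-sorted BAM is usually grouped by read name first (for example with `samtools collate` or `samtools sort -n`) before running `samtools fastq`.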

written 3.0 years ago by Istvan Albert

To add to this: the BAM file may contain only a subset of the original data, for example if unmapped or filtered reads were dropped.
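A quick way to get a feel for whether reads are missing is to look at how many paired reads still have both mates and whether any unmapped reads are present at all. The sketch below works on SAM text lines. The function name, the summary keys, and the records are hypothetical, and the result is only a hint, not proof of what was removed.

```python
from collections import defaultdict


def pairing_summary(sam_lines):
    """Summarise mate completeness and unmapped reads among primary records."""
    mates_seen = defaultdict(set)
    unmapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x900:  # skip secondary and supplementary records
            continue
        if flag & 0x4:
            unmapped += 1
        if flag & 0x1:  # read is part of a pair
            mates_seen[name].add(1 if flag & 0x40 else 2)
    complete = sum(1 for mates in mates_seen.values() if mates == {1, 2})
    return {
        "complete_pairs": complete,
        "orphans": len(mates_seen) - complete,
        "unmapped": unmapped,
    }


# Hypothetical records: r1 has both mates, r2 has lost its (unmapped) mate.
example_records = [
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r1\t147\tchr1\t200\t60\t4M\t=\t100\t-104\tTTGA\tIIII",
    "r2\t73\tchr1\t500\t60\t4M\t=\t500\t0\tGGCA\tIIII",
]

print(pairing_summary(example_records))
```

Many orphans, or no unmapped reads at all in a whole-genome or exome BAM, would suggest that the file was filtered at some point.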

written 3.0 years ago by Istvan Albert

Thanks Istvan! You don't think there will be any issues with the quality scores, for example when it comes time to call variants?

written 3.0 years ago by mary.a.wood.91

It will reconstitute the quality scores as well.

That FASTQ file will be just as fresh, fluffy and untouched as if it just rolled off of an instrument.
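One thing that may be worth checking before relying on this: base quality recalibration rewrites the QUAL column, and the pre-recalibration scores are kept only if the recalibration step was asked to store them in the `OQ` tag. Where `OQ` is present, `samtools fastq` has a `-O` option to use it, and Picard's RevertSam can restore original qualities. The sketch below only counts how many records carry the tag. The function name and the records are hypothetical.

```python
def original_quality_summary(sam_lines):
    """Return (records_with_OQ, total_records) for SAM text lines."""
    total = with_oq = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Optional tags start at column 12; OQ is a string (Z) tag.
        if any(tag.startswith("OQ:Z:") for tag in fields[11:]):
            with_oq += 1
    return with_oq, total


# Hypothetical records: the first keeps its original qualities, the second does not.
example_records = [
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t5555\tNM:i:0\tOQ:Z:IIII",
    "r2\t0\tchr1\t200\t60\t4M\t*\t0\t0\tTTGA\t5555\tNM:i:0",
]

print(original_quality_summary(example_records))
```

If no record has an `OQ` tag, the regenerated fastq will carry the recalibrated scores, and a second recalibration would then start from those.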

written 3.0 years ago by Istvan Albert

Okay, thanks so much for your help!

written 3.0 years ago by mary.a.wood.91