Question

From FASTQ to clean BAM using GATK tutorial #6484

6.6 years ago

lamteva.vera ▴ 220

Hi!

I'm trying to build an efficient pipeline for processing amplicon sequencing data. The problem is that ValidateSamFile reports many errors in the BAM files after running BamClipper, whereas the same BAMs were free of errors before. Example output of ValidateSamFile (MODE=SUMMARY):

HISTOGRAM   java.lang.String
Error Type  Count
ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT  138
ERROR:INVALID_MAPPING_QUALITY   315
ERROR:MISMATCH_FLAG_MATE_UNMAPPED   217
ERROR:MISMATCH_MATE_ALIGNMENT_START 8775
ERROR:MISMATCH_MATE_CIGAR_STRING    2385125
WARNING:MISSING_TAG_NM  2387464
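To track these counts across samples, the summary can be parsed into a small dictionary. This is only a sketch: the function name `parse_validation_summary` is made up for illustration, and it assumes each data row is a `SEVERITY:TYPE` label followed by whitespace and an integer count, as in the output above.

```python
from typing import Dict, Iterable

# The summary pasted above, used here as sample input.
SUMMARY = """\
HISTOGRAM   java.lang.String
Error Type  Count
ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT  138
ERROR:INVALID_MAPPING_QUALITY   315
ERROR:MISMATCH_FLAG_MATE_UNMAPPED   217
ERROR:MISMATCH_MATE_ALIGNMENT_START 8775
ERROR:MISMATCH_MATE_CIGAR_STRING    2385125
WARNING:MISSING_TAG_NM  2387464
"""


def parse_validation_summary(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Group ValidateSamFile SUMMARY counts by severity (ERROR / WARNING)."""
    counts: Dict[str, Dict[str, int]] = {"ERROR": {}, "WARNING": {}}
    for line in lines:
        # Split the trailing count off the label; tabs or spaces both work.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue
        label, count = parts
        severity, _, name = label.partition(":")
        # Header rows have no ERROR:/WARNING: prefix and are skipped here.
        if severity in counts and name and count.isdigit():
            counts[severity][name] = int(count)
    return counts


result = parse_validation_summary(SUMMARY.splitlines())
total_errors = sum(result["ERROR"].values())
```

Here `result["ERROR"]` holds the five error types and `result["WARNING"]` the single `MISSING_TAG_NM` entry, so it is easy to check whether a later step removed the errors or only changed their mix.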

I've read that MergeBamAlignment is a powerful tool for cleaning BAM files while preserving original read information and base quality scores, so I decided to incorporate GATK tutorial #6484 into my analysis pipeline to get rid of the errors.
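For concreteness, here is a rough sketch of how the MergeBamAlignment call could be assembled from a Python pipeline script. The file names, the `picard.jar` path, and the helper name `merge_bam_alignment_cmd` are placeholders, and the option set is a minimal assumption rather than the tutorial's full command. The tool expects the unmapped BAM to be queryname-sorted and the reference to have a sequence dictionary.

```python
import shlex
from typing import List


def merge_bam_alignment_cmd(aligned_bam: str, unmapped_bam: str,
                            reference: str, output: str,
                            picard_jar: str = "picard.jar") -> List[str]:
    """Build (but do not run) a Picard MergeBamAlignment command line."""
    return [
        "java", "-jar", picard_jar, "MergeBamAlignment",
        f"ALIGNED_BAM={aligned_bam}",        # output of the aligner
        f"UNMAPPED_BAM={unmapped_bam}",      # uBAM holding the original reads
        f"REFERENCE_SEQUENCE={reference}",   # FASTA with a .dict alongside it
        f"OUTPUT={output}",
        "CREATE_INDEX=true",
        "ADD_MATE_CIGAR=true",               # writes the MC (mate CIGAR) tag
    ]


# Placeholder file names, for illustration only.
cmd = merge_bam_alignment_cmd("sample.aligned.bam", "sample.unmapped.bam",
                              "reference.fasta", "sample.merged.bam")
command_line = shlex.join(cmd)
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it; building the list first makes it easy to log `command_line` and to see exactly which options each sample was run with.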

I'd like the community's opinion on the following workflow:

[Image: workflow diagram]

I could have missed something. Any critical thoughts are welcome.

gatk MergeBamAlignment uBAM bamclipper • 3.1k views

updated 11 months ago by Ram 43k • written 6.6 years ago by lamteva.vera ▴ 220

If I am reading the flow diagram right, why are you adding the unaligned BAM data back into the final BAM? Wouldn't that duplicate many reads (an aligned copy plus the original copy)?

6.6 years ago by GenoMax 142k

GATK claims that

Broadly, the tool [MergeBamAlignment] merges defined information from the unmapped BAM (uBAM, step 1) with that of the aligned BAM (step 3) to conserve read data, e.g. original read information and base quality scores.

6.6 years ago by lamteva.vera ▴ 220

I see. Have you compared the merged BAM with the aligned BAM to see what MergeBamAlignment did?
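One way to make that comparison, sketched under the assumption that both BAMs have been exported to SAM text (for example with `samtools view`): index the primary records by read name and mate number, then list which mandatory fields and optional tags differ. The function names below are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple

SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
Key = Tuple[str, int]


def index_records(sam_lines: Iterable[str]) -> Dict[Key, Tuple[dict, dict]]:
    """Index primary SAM records by (read name, mate number)."""
    records = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # blank or header line
            continue
        cols = line.rstrip("\n").split("\t")
        flag = int(cols[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800)
            continue
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        fields = dict(zip(SAM_FIELDS, cols[:11]))
        tags = {tag[:2]: tag for tag in cols[11:]}
        records[(cols[0], mate)] = (fields, tags)
    return records


def diff_records(before: Iterable[str], after: Iterable[str]) -> Dict[Key, List[str]]:
    """Return, per read present in both inputs, the fields and tags that differ."""
    a, b = index_records(before), index_records(after)
    changes = {}
    for key in sorted(a.keys() & b.keys()):
        (fields_a, tags_a), (fields_b, tags_b) = a[key], b[key]
        changed = [name for name in SAM_FIELDS if fields_a[name] != fields_b[name]]
        changed += sorted(t for t in tags_a.keys() | tags_b.keys()
                          if tags_a.get(t) != tags_b.get(t))
        if changed:
            changes[key] = changed
    return changes


# Tiny made-up records: the "after" copy only gains NM and MC tags.
aligned = ["r1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII"]
merged = [aligned[0] + "\tNM:i:0\tMC:Z:10M"]
changes = diff_records(aligned, merged)
```

Tallying the values in `changes` over a whole file would show whether the merge only touched tags or also altered flags, positions, or CIGAR strings.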

ADD REPLY • link 6.6 years ago by GenoMax 142k