From FASTQ to clean BAM using GATK tutorial #6484
7.1 years ago
lamteva.vera ▴ 220

Hi!

I'm trying to build an efficient pipeline for processing amplicon sequencing data. The problem is that ValidateSamFile reports a number of errors in the BAM files after running BamClipper, whereas the same BAMs were error-free before. Example output of ValidateSamFile (MODE=SUMMARY):

HISTOGRAM   java.lang.String
Error Type  Count
ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT  138
ERROR:INVALID_MAPPING_QUALITY   315
ERROR:MISMATCH_FLAG_MATE_UNMAPPED   217
ERROR:MISMATCH_MATE_ALIGNMENT_START 8775
ERROR:MISMATCH_MATE_CIGAR_STRING    2385125
WARNING:MISSING_TAG_NM  2387464
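
To track these counts before and after each pipeline step, the summary can be turned into a dictionary. This is a minimal sketch that assumes the SUMMARY text looks like the output above (one `SEVERITY:TYPE  count` row per line); `parse_validation_summary` is a hypothetical helper name, not part of Picard or GATK.

```python
def parse_validation_summary(text):
    """Return {(severity, error_type): count} from ValidateSamFile SUMMARY text."""
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        # Only rows such as "ERROR:SOME_TYPE  123" carry counts;
        # the histogram header lines are skipped.
        if not line.startswith(("ERROR:", "WARNING:")):
            continue
        label, count = line.rsplit(None, 1)
        severity, error_type = label.split(":", 1)
        counts[(severity, error_type.strip())] = int(count)
    return counts


summary = """HISTOGRAM   java.lang.String
Error Type  Count
ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT  138
ERROR:INVALID_MAPPING_QUALITY   315
ERROR:MISMATCH_FLAG_MATE_UNMAPPED   217
ERROR:MISMATCH_MATE_ALIGNMENT_START 8775
ERROR:MISMATCH_MATE_CIGAR_STRING    2385125
WARNING:MISSING_TAG_NM  2387464
"""

counts = parse_validation_summary(summary)
# Keep only the rows reported as errors, dropping warnings.
errors_only = {etype: n for (sev, etype), n in counts.items() if sev == "ERROR"}
```

Comparing `errors_only` from the pre-BamClipper and post-BamClipper BAMs would show which error types the clipping step introduces.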

I've read that MergeBamAlignment is a powerful tool for cleaning BAM files while preserving original read information and base quality scores, so I decided to incorporate GATK tutorial #6484 into my analysis pipeline to get rid of the errors.

I just want to ask the community's opinion about the following workflow:

[Image: workflow diagram]

I could have missed something. Any critical thoughts are welcome.

gatk MergeBamAlignment uBAM bamclipper • 3.2k views

If I am reading the flow diagram right, why are you adding the unaligned BAM data back into the final BAM? Doesn't that duplicate many reads (an aligned copy plus the original copy)?


GATK claims that

Broadly, the tool [MergeBamAlignment] merges defined information from the unmapped BAM (uBAM, step 1) with that of the aligned BAM (step 3) to conserve read data, e.g. original read information and base quality scores.
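
For reference, the merge step described in that quote can be sketched as a small Python helper that only assembles the Picard command lines (it does not run anything). The option names are MergeBamAlignment and ValidateSamFile arguments in Picard's KEY=value syntax; the jar path and all file names are hypothetical placeholders, and the option set is a minimal assumption rather than the tutorial's full recommended list.

```python
import shlex


def merge_bam_alignment_cmd(picard_jar, unmapped_bam, aligned_bam, reference, output_bam):
    """Build a Picard MergeBamAlignment command as an argument list."""
    return [
        "java", "-jar", picard_jar, "MergeBamAlignment",
        f"UNMAPPED_BAM={unmapped_bam}",    # uBAM with the original read data
        f"ALIGNED_BAM={aligned_bam}",      # aligner (or clipper) output
        f"REFERENCE_SEQUENCE={reference}",
        f"OUTPUT={output_bam}",
        "ADD_MATE_CIGAR=true",             # write the MC (mate CIGAR) tag
        "CREATE_INDEX=true",
    ]


def validate_sam_cmd(picard_jar, bam):
    """Build the ValidateSamFile command used to re-check the merged BAM."""
    return ["java", "-jar", picard_jar, "ValidateSamFile",
            f"INPUT={bam}", "MODE=SUMMARY"]


# Hypothetical paths, for illustration only.
merge_cmd = merge_bam_alignment_cmd(
    "picard.jar", "sample.unmapped.bam", "sample.clipped.bam",
    "reference.fasta", "sample.merged.bam",
)
validate_cmd = validate_sam_cmd("picard.jar", "sample.merged.bam")
print(shlex.join(merge_cmd))
print(shlex.join(validate_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Note that MergeBamAlignment expects the unmapped BAM to be queryname-sorted and the reference to have a sequence dictionary alongside it.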


I see. Have you compared the merged BAM with the aligned BAM to see what MergeBamAlignment did?
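
One rough way to make that comparison is to dump both BAMs to SAM text (for example with `samtools view`) and diff the records read by read. The sketch below assumes plain tab-separated SAM lines as input and only looks at primary records; `index_primary_records` and `diff_records` are hypothetical helper names, and the two example records are made up for illustration.

```python
CORE_FIELDS = ("FLAG", "RNAME", "POS", "MAPQ", "CIGAR", "RNEXT", "PNEXT", "TLEN")


def index_primary_records(sam_lines):
    """Map (read name, first/second-in-pair bits) to (core fields, tag names)."""
    records = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        cols = line.rstrip("\n").split("\t")
        flag = int(cols[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        key = (cols[0], flag & 0xC0)
        core = dict(zip(CORE_FIELDS, cols[1:9]))
        tags = {col[:2] for col in cols[11:]}
        records[key] = (core, tags)
    return records


def diff_records(aligned_lines, merged_lines):
    """Report changed core fields and newly added tags for reads in both files."""
    aligned = index_primary_records(aligned_lines)
    merged = index_primary_records(merged_lines)
    report = {}
    for key in aligned.keys() & merged.keys():
        a_core, a_tags = aligned[key]
        m_core, m_tags = merged[key]
        changed = {f: (a_core[f], m_core[f])
                   for f in CORE_FIELDS if a_core[f] != m_core[f]}
        added = m_tags - a_tags
        if changed or added:
            report[key] = {"changed": changed, "tags_added": sorted(added)}
    return report


# Made-up records: the merged copy gains MC and RG tags.
aligned_sam = ["r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tNM:i:0"]
merged_sam = ["r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tFFFF\tNM:i:0\tMC:Z:4M\tRG:Z:grp1"]
report = diff_records(aligned_sam, merged_sam)
```

Reads present in only one of the two files are ignored here; counting those separately would answer the duplication question raised above.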
