I'm trying to build an efficient pipeline for processing amplicon sequencing data. The problem is that ValidateSamFile reports many errors in my BAM files after running BamClipper, whereas the same BAMs validated without errors beforehand. Example output of ValidateSamFile (MODE=SUMMARY):
    HISTOGRAM java.lang.String
    Error Type                                    Count
    ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT    138
    ERROR:INVALID_MAPPING_QUALITY                 315
    ERROR:MISMATCH_FLAG_MATE_UNMAPPED             217
    ERROR:MISMATCH_MATE_ALIGNMENT_START           8775
    ERROR:MISMATCH_MATE_CIGAR_STRING              2385125
    WARNING:MISSING_TAG_NM                        2387464
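To compare validation results before and after a pipeline step, the summary can be turned into a dictionary of counts. This is only a minimal sketch: it assumes each data row is an `ERROR:` or `WARNING:` label followed by an integer count, as in the output above. The function names `parse_validation_summary` and `new_problems` are my own and not part of Picard or GATK.

```python
# Sample text copied from the ValidateSamFile summary above.
SUMMARY = """\
HISTOGRAM java.lang.String
Error Type Count
ERROR:INVALID_FLAG_SUPPLEMENTARY_ALIGNMENT 138
ERROR:INVALID_MAPPING_QUALITY 315
ERROR:MISMATCH_FLAG_MATE_UNMAPPED 217
ERROR:MISMATCH_MATE_ALIGNMENT_START 8775
ERROR:MISMATCH_MATE_CIGAR_STRING 2385125
WARNING:MISSING_TAG_NM 2387464
"""


def parse_validation_summary(text):
    """Return {problem_type: count} for every ERROR:/WARNING: row in the text.

    Hypothetical helper: assumes whitespace-separated "<type> <count>" rows.
    Header lines and anything else that does not match are skipped.
    """
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        if (
            len(parts) == 2
            and parts[0].startswith(("ERROR:", "WARNING:"))
            and parts[1].isdigit()
        ):
            counts[parts[0]] = int(parts[1])
    return counts


def new_problems(before, after):
    """Return the problem types whose count increased, with the size of the increase."""
    return {
        kind: count - before.get(kind, 0)
        for kind, count in after.items()
        if count > before.get(kind, 0)
    }


counts = parse_validation_summary(SUMMARY)
# An empty "before" dict stands in for a BAM that validated without errors.
introduced = new_problems({}, counts)
for kind in sorted(introduced, key=introduced.get, reverse=True):
    print(kind, introduced[kind])
```

Running the same parser on the summaries from before and after each step would show which step introduces which class of error.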
I've read that MergeBamAlignment is a powerful tool for cleaning BAM files while preserving the original read information and base quality scores, so I decided to incorporate GATK tutorial #6484 into my analysis pipeline to get rid of these errors.
I just want to ask the community's opinion about the following workflow:
I could have missed something. Any critical thoughts are welcome.