MuTect2 still discards many reads. How can I fix this?
6.2 years ago
Sharon

Hello everyone

MuTect2 from GATK3 discarded 87.33% of the reads while I was preparing a panel of normals (PON). I will use sp1.vcf to create the PON!

sp1.vcf: sp1.bam
        java -jar ${GATK}/GenomeAnalysisTK.jar \
        -T MuTect2 \
        -R ${hg38}.fasta \
        -I:tumor sp1.bam \
        --dbsnp ${DBSNP} \
        --cosmic ${COSMIC} \
        --artifact_detection_mode \
        -o sp1.vcf

Does this look OK to you? Is there anything I can do about the non-primary reads (42.77%) and duplicate reads (44.56%)?

INFO  00:57:17,188 MicroScheduler - 158885515 reads were filtered out during the traversal out of approximately 181931389 total reads (87.33%) 
INFO  00:57:17,190 MicroScheduler -   -> 9210 reads (0.01% of total) failing BadCigarFilter 
INFO  00:57:17,192 MicroScheduler -   -> 81066454 reads (44.56% of total) failing DuplicateReadFilter 
INFO  00:57:17,193 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO  00:57:17,195 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter 
INFO  00:57:17,197 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO  00:57:17,198 MicroScheduler -   -> 77809851 reads (42.77% of total) failing NotPrimaryAlignmentFilter 
INFO  00:57:17,200 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter 
------------------------------------------------------------------------------------------
Done.
------------------------------------------------------------------------------------------
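A quick way to check the arithmetic in this summary is to parse it. The minimal Python sketch below assumes the log lines have been copied into a string with the "INFO ... MicroScheduler -" prefixes dropped. It shows that the three non-zero filters add up exactly to the 158,885,515 filtered reads, so the 87.33% is almost entirely duplicates plus non-primary alignments (BadCigarFilter contributes only 9,210 reads).

```python
import re

# Summary lines copied from the MicroScheduler log above (prefixes removed).
LOG = """\
158885515 reads were filtered out during the traversal out of approximately 181931389 total reads (87.33%)
-> 9210 reads (0.01% of total) failing BadCigarFilter
-> 81066454 reads (44.56% of total) failing DuplicateReadFilter
-> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter
-> 0 reads (0.00% of total) failing MalformedReadFilter
-> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter
-> 77809851 reads (42.77% of total) failing NotPrimaryAlignmentFilter
-> 0 reads (0.00% of total) failing UnmappedReadFilter
"""

TOTAL_RE = re.compile(r"(\d+) reads were filtered out .* approximately (\d+) total reads")
FILTER_RE = re.compile(r"-> (\d+) reads .* failing (\w+)")


def summarize(log):
    """Return (filtered, total, {filter_name: count}) parsed from the log text."""
    filtered, total = map(int, TOTAL_RE.search(log).groups())
    per_filter = {name: int(n) for n, name in FILTER_RE.findall(log)}
    return filtered, total, per_filter


filtered, total, per_filter = summarize(LOG)
print(f"filtered {filtered}/{total} = {100 * filtered / total:.2f}%")
# List only the filters that removed something, largest first.
for name, n in sorted(per_filter.items(), key=lambda kv: -kv[1]):
    if n:
        print(f"  {name:28s} {n:>11,d} ({100 * n / total:.2f}%)")
print("per-filter counts sum to total filtered:", sum(per_filter.values()) == filtered)
```

Because each filtered read is reported under a single filter here and the counts sum exactly, there is no hidden third cause to chase: the question really is only about duplicates and non-primary alignments.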

This is what the STAR log summary looks like:

            UNIQUE READS:
               Uniquely mapped reads number |       29463720
                    Uniquely mapped reads % |       72.35%
                      Average mapped length |       271.00
                   Number of splices: Total |       21158459
        Number of splices: Annotated (sjdb) |       21067563
                   Number of splices: GT/AG |       20952149
                   Number of splices: GC/AG |       126813
                   Number of splices: AT/AC |       5622
           Number of splices: Non-canonical |       73875
                  Mismatch rate per base, % |       0.48%
                     Deletion rate per base |       0.02%
                    Deletion average length |       1.27
                    Insertion rate per base |       0.01%
                   Insertion average length |       1.84
                         MULTI-MAPPING READS:
    Number of reads mapped to multiple loci |       9507878
         % of reads mapped to multiple loci |       23.35%
    Number of reads mapped to too many loci |       355263
         % of reads mapped to too many loci |       0.87%
                              UNMAPPED READS:
   % of reads unmapped: too many mismatches |       0.00%
             % of reads unmapped: too short |       2.76%
                 % of reads unmapped: other |       0.67%
                              CHIMERIC READS:
                   Number of chimeric reads |       0
                        % of chimeric reads |       0.00%
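The STAR numbers are consistent with the large NotPrimaryAlignmentFilter count. The sketch below only redoes the arithmetic on the summary above: the categories add up to 100%, about 95.7% of reads aligned, and roughly 24% of the aligned reads are multi-mappers. As far as I know, with default settings STAR reports every alignment of a multi-mapping read (up to --outFilterMultimapNmax loci) and flags all but one as secondary, which GATK then counts as non-primary; treat that as an assumption to verify against the STAR manual. Also note that STAR counts a read pair as one read while GATK counts individual records, so the two sets of numbers are not directly comparable.

```python
# Percentages copied from the STAR summary above.
star_pct = {
    "uniquely mapped": 72.35,
    "mapped to multiple loci": 23.35,
    "mapped to too many loci": 0.87,
    "unmapped: too many mismatches": 0.00,
    "unmapped: too short": 2.76,
    "unmapped: other": 0.67,
}

# Read counts copied from the same summary.
unique_n = 29_463_720
multi_n = 9_507_878

total_pct = round(sum(star_pct.values()), 2)
aligned_pct = star_pct["uniquely mapped"] + star_pct["mapped to multiple loci"]
multi_share = 100 * multi_n / (unique_n + multi_n)

print(f"categories sum to {total_pct:.2f}%")
print(f"reads with reported alignments: {aligned_pct:.2f}%")
print(f"multi-mappers among aligned reads: {multi_share:.1f}%")
```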
RNA-Seq Variant Calling
6.2 years ago

If you are sure you want to retain non-primary and duplicate reads, I think you can add the options --disable_read_filter NotPrimaryAlignmentFilter --disable_read_filter DuplicateReadFilter to MuTect2 (not tested). See some docs here.
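Before disabling any filter, it can help to confirm how many records in sp1.bam actually carry the relevant SAM FLAG bits: 0x100 (secondary), 0x400 (duplicate) and 0x800 (supplementary). The minimal sketch below assumes that GATK3's NotPrimaryAlignmentFilter and DuplicateReadFilter key on the secondary and duplicate bits respectively; that mapping is my understanding, not something confirmed in this thread. It reads SAM text lines such as those produced by samtools view; samtools flagstat gives similar totals, and the sketch only makes the bit tests explicit. The three toy records are hypothetical.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification.
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def count_flags(sam_lines):
    """Count SAM records by the secondary, duplicate and supplementary bits."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        counts["records"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        if flag & DUPLICATE:
            counts["duplicate"] += 1
        if flag & SUPPLEMENTARY:
            counts["supplementary"] += 1
        if (flag & (SECONDARY | DUPLICATE)) == 0:
            counts["neither secondary nor duplicate"] += 1
    return counts


# Hypothetical toy records: 99 = properly paired read, 355 = 256 + 99, 1123 = 1024 + 99.
TOY = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t255\t50M\t=\t200\t150\t*\t*",
    "r2\t355\tchr1\t500\t3\t50M\t=\t600\t150\t*\t*",
    "r3\t1123\tchr1\t100\t255\t50M\t=\t200\t150\t*\t*",
]

# On real data (assumption): samtools view sp1.bam | python -c "..." passing sys.stdin.
for key, n in count_flags(TOY).items():
    print(key, n)
```

For the toy input this reports 3 records, 1 secondary, 1 duplicate and 1 record that neither filter would touch; on the real BAM the same counts can be compared with the MicroScheduler percentages.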

Thanks, dariober. I don't know whether I should retain them or not!


If you don't know, then you should probably not retain them. There is a reason why they are filtered out by default. You can consult the GATK Best Practices for more info: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11146
