Filtering significant somatic variants from a VCF file
2.2 years ago
gprashant17 ▴ 70

I ran GATK's RNA-seq pipeline and obtained a VCF file using Mutect2 and FilterMutectCalls. Filters such as PASS, Clustered_events, germline and weak-evidence were added to the variants. Before annotating the file with dbSNP, COSMIC and ANNOVAR, I would like to extract the significant somatic mutations into a separate file to make the analysis easier. Is it good practice to exclude every variant whose filter is not 'PASS'? Also, when deciding whether a variant is germline, is the presence of the 'germline' filter alone sufficient, or is it better to set a threshold on the GERMQ score?

Or is it better to filter out variants after annotating the files?
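To make the two options in the question concrete, here is a minimal sketch in plain Python (no VCF library) that keeps only PASS records and, optionally, also applies a GERMQ cutoff. The function names, the example records and the cutoff value of 30 are all hypothetical. The sketch assumes that a higher GERMQ means stronger evidence that the variant is not germline; check the ##INFO line for GERMQ in your own header before relying on that direction.

```python
def parse_info(info):
    """Turn an INFO string such as 'DP=50;GERMQ=93;FLAG' into a dict.

    Flag-type keys (no '=') are mapped to True.
    """
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def keep_record(line, germq_min=None):
    """Return True if a VCF data line should be kept.

    A record is kept when its FILTER column is exactly 'PASS' and, if
    germq_min is given, its GERMQ value is not below that cutoff.
    Records without a GERMQ annotation are not rejected by the cutoff
    (an assumption made for this sketch).
    """
    fields = line.rstrip("\n").split("\t")
    if fields[6] != "PASS":
        return False
    if germq_min is not None:
        germq = parse_info(fields[7]).get("GERMQ")
        if germq is not None and float(germq) < germq_min:
            return False
    return True


def filter_vcf(lines, germq_min=None):
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#"):
            yield line
        elif line.strip() and keep_record(line, germq_min):
            yield line


# Hypothetical records, made up for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tT\t.\tPASS\tDP=50;GERMQ=93\n",
    "chr1\t200\t.\tG\tC\t.\tgermline\tDP=40;GERMQ=5\n",
    "chr2\t300\t.\tC\tT\t.\tPASS\tDP=30;GERMQ=12\n",
]

pass_only = list(filter_vcf(example))
pass_and_germq = list(filter_vcf(example, germq_min=30))
print(len(pass_only), len(pass_and_germq))
```

For a real file, the same generator could be fed an open file handle and its output written to a second file. Running it before or after annotation gives the same set of records, since it only reads the FILTER and INFO columns.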

vcf alignment RNA-Seq mutations ngs • 2.4k views

Many of the terms you use here are not universal; they are specific to the tool that produced your file. Can you please edit your question and explain what these terms mean? You can look them up in the VCF header: filter labels such as PASS are defined in the ##FILTER lines, and annotations in the ##INFO lines.

• PASS
• Clustered_events
• germline
• weak-evidence
• GERMQ

Also, please add some details on your experimental design, e.g. whether there were matched normals, a panel of normals, etc.
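The header lookup suggested in this comment can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and the header lines in the example (including their descriptions and the exact spelling of the IDs) are placeholders rather than the text GATK actually writes, so run it on your own file to see the real definitions.

```python
import re

# Matches lines like ##FILTER=<ID=PASS,Description="..."> or
# ##INFO=<ID=X,Number=1,Type=Integer,Description="...">
HEADER_RE = re.compile(r'^##(FILTER|INFO)=<ID=([^,>]+).*?Description="([^"]*)"')


def header_definitions(lines):
    """Collect ID -> description for ##FILTER and ##INFO header lines."""
    defs = {"FILTER": {}, "INFO": {}}
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = HEADER_RE.match(line)
        if match:
            kind, ident, description = match.groups()
            defs[kind][ident] = description
    return defs


# Placeholder header, made up for illustration only.
example_header = [
    "##fileformat=VCFv4.2\n",
    '##FILTER=<ID=PASS,Description="All filters passed">\n',
    '##FILTER=<ID=germline,Description="placeholder description">\n',
    '##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="placeholder description">\n',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
]

definitions = header_definitions(example_header)
for kind, entries in definitions.items():
    for ident, description in entries.items():
        print(kind, ident, description, sep="\t")
```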


I have aligned RNA-seq tumour samples using STAR and I am following GATK's best practices for variant calling in order to identify somatic mutations that cause cancer. Since I do not have a matched normal sample, I ran Mutect2 on the tumour samples alone, with only the required arguments.

I am encountering these filters (PASS, Clustered_events, weak-evidence) for the first time and I am reading the following paper about Mutect filtering for reference:

Apparently, FilterMutectCalls labels each variant it considers a likely false positive with the list of filters it failed, and labels the variants it considers true positives with PASS.
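Since a failing record can carry several filter labels at once, it can help to tally the labels before deciding what to discard. Below is a minimal sketch in plain Python, assuming the standard VCF layout in which FILTER is the seventh tab-separated column and multiple labels are joined with semicolons. The function name and the example records are hypothetical, and the label spellings should be taken from your own file.

```python
from collections import Counter


def count_filters(lines):
    """Count how often each FILTER label occurs across VCF data lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        filter_column = line.rstrip("\n").split("\t")[6]
        counts.update(filter_column.split(";"))
    return counts


# Hypothetical records, made up for illustration only.
example_records = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tT\t.\tPASS\tDP=50\n",
    "chr1\t200\t.\tG\tC\t.\tgermline;weak_evidence\tDP=40\n",
    "chr2\t300\t.\tC\tT\t.\tweak_evidence\tDP=30\n",
    "chr2\t400\t.\tT\tG\t.\tPASS\tDP=61\n",
]

filter_counts = count_filters(example_records)
for label, n in filter_counts.most_common():
    print(label, n, sep="\t")
```

A tally like this shows at a glance which filters remove most of the calls, which is useful context when judging whether a PASS-only subset is too strict for a tumour-only RNA-seq run.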