Filetring in vcf files
2
0
Entering edit mode
7.0 years ago
Gene_MMP8 ▴ 240

So I have a vcf file (myvar.vcf) and I want to do the following:-
1. Remove all mutations tagged as INDELS.
2. Consider mutaions that have a read depth greater than 100
3.To get all the mutations that occur in the tumor genome with respect to the control genome, we select all the mutations that have a genotype of 0/0 for the control genome and have a genotype of 1/1 or 0/1 for the tumor genome.

I have been trying all the three for quite a long time, but with no result. So can anyone help me with the steps? Thanks.

alignment • 2.3k views
ADD COMMENT
2
Entering edit mode

Have you tried doing it with SNPeff/SNPsift? The manual is well written and should guide you to do that.

ADD REPLY
0
Entering edit mode

Thanks. Wil try that

ADD REPLY
2
Entering edit mode

"read depth greater than 100" in both samples? What if one has read depth 100 and the other 99? Total read depth? What if a mutation appears somatic as your tumour has read depth 100 and your normal 0?

ADD REPLY
0
Entering edit mode

i have 12 samples in all. So read depth > 100 in all of them. I would filter out these sites because I dont want to trust mutations at sites with very high coverage, because they might represent variation between variable copy number repeats, i.e., the reads that map to this location in the reference are actually from duplicated sites in the sample.

ADD REPLY
2
Entering edit mode

Your genotyping filter assumes diploid. This condition is frequently violated in tumours. When you also consider tumour purity and sub-clonality, variant allele frequencies of 0.33/0.66 are perfectly normal even for diploid tumours.

ADD REPLY
2
Entering edit mode
7.0 years ago

using vcffilterjs http://lindenb.github.io/jvarkit/VCFFilterJS.html

java -jar dist/vcffilterjs.jar -f script.js input.vcf

and the following script:

var sample2status={
    "Sample1":0,
    "Sample2":0,
    "Sample3":1
    };

function accept(v)
    {
    if(v.isIndel()) return false;
    if(v.hasAttribute("DP") && v.getAttributeAsInt("DP",0) <100) return false;
    for(var sampleName in sample2status)
        {
        var status = sample2status[sampleName];
        var genotype= v.getGenotype(sampleName);
        if(genotype==null || genotype.isNoCall()) continue;
        if( status == 0 && !genotype.isHomRef()) return false;
        if( status == 1 && !(genotype.isHomVar() || genotype.isHet())) return false;
        }
    return true;
    }
 accept(variant);
ADD COMMENT
0
Entering edit mode
7.0 years ago
d-cameron ★ 2.9k

What program generated the VCF? A somatic variant caller (which you should be using if you are doing somatic variant analysis) will typically use the SOMATIC flag to identify tumour-only mutations.

Have you looked at the VCF headers in your file? The R VariantAnnotation package will allow you to perform each of the filters you want to do each in 1 line of code.

ADD COMMENT
0
Entering edit mode

I have tried Variant Annotation package. But it takes a very long time. Each tumor and control file is 100GB in size on an average. Any suggestions?

ADD REPLY

Login before adding your answer.

Traffic: 2628 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6