Filetring in vcf files
2
0
Entering edit mode
5.7 years ago
Gene_MMP8 ▴ 210

So I have a vcf file (myvar.vcf) and I want to do the following:-
1. Remove all mutations tagged as INDELS.
2. Consider mutaions that have a read depth greater than 100
3.To get all the mutations that occur in the tumor genome with respect to the control genome, we select all the mutations that have a genotype of 0/0 for the control genome and have a genotype of 1/1 or 0/1 for the tumor genome.

I have been trying all the three for quite a long time, but with no result. So can anyone help me with the steps? Thanks.

alignment • 1.7k views
2
Entering edit mode

Have you tried doing it with SNPeff/SNPsift? The manual is well written and should guide you to do that.

0
Entering edit mode

Thanks. Wil try that

2
Entering edit mode

"read depth greater than 100" in both samples? What if one has read depth 100 and the other 99? Total read depth? What if a mutation appears somatic as your tumour has read depth 100 and your normal 0?

0
Entering edit mode

i have 12 samples in all. So read depth > 100 in all of them. I would filter out these sites because I dont want to trust mutations at sites with very high coverage, because they might represent variation between variable copy number repeats, i.e., the reads that map to this location in the reference are actually from duplicated sites in the sample.

2
Entering edit mode

Your genotyping filter assumes diploid. This condition is frequently violated in tumours. When you also consider tumour purity and sub-clonality, variant allele frequencies of 0.33/0.66 are perfectly normal even for diploid tumours.

2
Entering edit mode
5.7 years ago

using vcffilterjs http://lindenb.github.io/jvarkit/VCFFilterJS.html

java -jar dist/vcffilterjs.jar -f script.js input.vcf


and the following script:

var sample2status={
"Sample1":0,
"Sample2":0,
"Sample3":1
};

function accept(v)
{
if(v.isIndel()) return false;
if(v.hasAttribute("DP") && v.getAttributeAsInt("DP",0) <100) return false;
for(var sampleName in sample2status)
{
var status = sample2status[sampleName];
var genotype= v.getGenotype(sampleName);
if(genotype==null || genotype.isNoCall()) continue;
if( status == 0 && !genotype.isHomRef()) return false;
if( status == 1 && !(genotype.isHomVar() || genotype.isHet())) return false;
}
return true;
}
accept(variant);

0
Entering edit mode
5.7 years ago
d-cameron ★ 2.8k

What program generated the VCF? A somatic variant caller (which you should be using if you are doing somatic variant analysis) will typically use the SOMATIC flag to identify tumour-only mutations.

Have you looked at the VCF headers in your file? The R VariantAnnotation package will allow you to perform each of the filters you want to do each in 1 line of code.

0
Entering edit mode

I have tried Variant Annotation package. But it takes a very long time. Each tumor and control file is 100GB in size on an average. Any suggestions?