LOH and CNV data from VCF files
6.4 years ago
GarF ▴ 20

I was asked to extract both CNV and LOH data from 3 VCF files (all from the same patient, but 2 of them are from different tumors). Here's one of the rows:

chr1 843352 . T C 11.3 . DP=39;VDB=0.0412;AF1=0.5;AC1=1;DP4=13,11,5,5;MQ=42;FQ=14.2;PV4=1,1,1.2e-05,1 GT:PL:GQ 0/1:41,0,255:43

where

##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">

Since I'm new to this, I did some research and, considering that all I have are these VCFs, it seems that I lack both the BAM/SAM files and the INFO fields I would need to get this data.
I kept searching anyway and came across a Bioconductor package, SomatiCA, that seems to deliver what I need from, among other things, Lesser Allele Frequency (LAF) values. I did some more research, but now I'm struggling to figure out if and how I can calculate the LAFs from just what I have.

Any help regarding the whole situation will be appreciated, thanks in advance.

LOH CNV VCF • 3.3k views
6.1 years ago
ivivek_ngs ★ 5.1k

From the VCF file you can always extract the allele frequency. The LAF is the frequency of the less-covered allele at a site, so it is at most 50%. At each position, calculate the lesser allele frequency from the read depths (the DP4 field in your files gives the ref and alt base counts), write the results to a tab-delimited file in the format SomatiCA requires, and then run the required algorithm. You should be familiar with Unix and shell commands.
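As a rough sketch of the calculation (in Python rather than shell, purely for illustration), the script below sums the DP4 counts into ref and alt depths and takes the smaller share as the LAF. The function names and the output columns (`chrom`, `pos`, `ref_count`, `alt_count`, `LAF`) are my own assumptions, not SomatiCA's actual input layout, so check the package documentation and rename or reorder the columns accordingly.

```python
import csv
import sys


def parse_info(info_field):
    """Turn 'DP=39;DP4=13,11,5,5;...' into a dict; flag-only keys map to True."""
    out = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def laf_from_vcf_line(line):
    """Return (chrom, pos, ref_count, alt_count, laf), or None if DP4 is unusable."""
    # Splitting on any whitespace handles both real tab-separated VCFs and
    # rows pasted with spaces, as in the question.
    fields = line.split()
    if len(fields) < 8:
        return None
    dp4 = parse_info(fields[7]).get("DP4")
    if not isinstance(dp4, str):
        return None
    parts = dp4.split(",")
    if len(parts) != 4:
        return None
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in parts)
    ref_count = ref_fwd + ref_rev
    alt_count = alt_fwd + alt_rev
    total = ref_count + alt_count
    if total == 0:
        return None
    # Lesser allele frequency: share of the less-covered allele, so <= 0.5.
    laf = min(ref_count, alt_count) / total
    return fields[0], int(fields[1]), ref_count, alt_count, laf


def write_laf_table(vcf_lines, out_handle):
    """Write one tab-delimited row per usable record (column names are assumed)."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "pos", "ref_count", "alt_count", "LAF"])
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        record = laf_from_vcf_line(line)
        if record is not None:
            chrom, pos, ref_count, alt_count, laf = record
            writer.writerow([chrom, pos, ref_count, alt_count, f"{laf:.4f}"])


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as vcf:
        write_laf_table(vcf, sys.stdout)
```

For the row in the question, DP4=13,11,5,5 gives 24 ref and 10 alt bases, so LAF = 10/34 ≈ 0.294. Note that DP4 counts only high-quality bases, which is why the total (34) is lower than the raw DP of 39.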