LOH and CNV data from VCF files
9.4 years ago
GarF ▴ 20

I was asked to derive both CNV and LOH data from 3 VCF files (all from the same patient, but 2 of them are from different tumors). Here is one of the rows:

chr1 843352 . T C 11.3 . DP=39;VDB=0.0412;AF1=0.5;AC1=1;DP4=13,11,5,5;MQ=42;FQ=14.2;PV4=1,1,1.2e-05,1 GT:PL:GQ 0/1:41,0,255:43

where

##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">

Since I'm new to this, I did some research and, given that all I have are these VCFs, it seems that I lack both the BAM/SAM files and the INFO fields in the VCFs that I would need to get this data.

I kept searching anyway and came across a Bioconductor package, SomatiCA, that seems to deliver what I need using, among other things, lesser allele frequency (LAF) information. I did some more research, but I'm now struggling to figure out whether, and how, I can calculate LAFs from just what I have.

Any help with this would be appreciated. Thanks in advance.

9.1 years ago
ivivek_ngs ★ 5.2k

You can always extract allele frequencies from a VCF file. The LAF is the frequency of the less-supported allele at a site, so it is at most 50%. At each position, calculate the lesser allele frequency from the read depth (in your files, the DP4 field gives the reference and alternate read counts), then write a tab-delimited file in the format SomatiCA requires and run the algorithm on it. Some familiarity with Unix and shell commands will help.
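As a rough illustration of the per-position calculation, here is a minimal Python sketch that takes the LAF from the DP4 field shown in the question. It assumes tab-delimited VCF lines and biallelic sites (DP4 does not separate multiple ALT alleles), and it counts only the high-quality bases reported in DP4, so the total can be lower than DP. The function names and the output columns are made up for this example; they are not SomatiCA's input format, which you should take from the package documentation. For LOH you would probably also want to restrict the output to sites that are heterozygous in the normal sample.

```python
def parse_info(info):
    """Turn 'DP=39;DP4=13,11,5,5;...' into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def laf_from_vcf_line(line):
    """Return (chrom, pos, ref_reads, alt_reads, laf), or None if unusable.

    Hypothetical helper: relies on the samtools-style DP4 INFO field
    (ref-forward, ref-reverse, alt-forward, alt-reverse).
    """
    cols = line.rstrip("\n").split("\t")
    info = parse_info(cols[7])  # INFO is the 8th VCF column
    if "DP4" not in info:
        return None
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    ref = ref_fwd + ref_rev
    alt = alt_fwd + alt_rev
    total = ref + alt
    if total == 0:
        return None
    # The lesser allele is whichever of ref/alt has fewer supporting reads.
    return (cols[0], int(cols[1]), ref, alt, min(ref, alt) / total)


def laf_rows(lines):
    """Yield LAF rows for every data line, skipping headers and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        row = laf_from_vcf_line(line)
        if row is not None:
            yield row


# The row from the question, rebuilt with tabs between columns.
example = "\t".join([
    "chr1", "843352", ".", "T", "C", "11.3", ".",
    "DP=39;VDB=0.0412;AF1=0.5;AC1=1;DP4=13,11,5,5;MQ=42;FQ=14.2;PV4=1,1,1.2e-05,1",
    "GT:PL:GQ", "0/1:41,0,255:43",
])

for chrom, pos, ref, alt, laf in laf_rows([example]):
    # Placeholder tab-delimited layout; adapt to whatever the downstream tool expects.
    print(f"{chrom}\t{pos}\t{ref}\t{alt}\t{laf:.3f}")
```

For the example row, DP4=13,11,5,5 gives 24 reference reads and 10 alternate reads, so the LAF is 10/34, roughly 0.29. To process a whole file, pass an open file handle to `laf_rows` in place of the one-element list.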
