Question: VCF and Copy Number Variations: How Can These Be Displayed?
7.0 years ago by
United States
user56290 wrote:

This is a newbie question. Assume I have a VCF file with 69 people. I can see all short indels in addition to SNPs (a nice improvement over genotyping).

But the genomes of the patients also have CNVs. The genome is huge, and the VCF only shows differences from the reference genome. Can the VCF file somehow also show that patient X has 69 copies of a certain CNV region? How are CNVs addressed in the reference genome build?

vcf cnv • 3.8k views
7.0 years ago by
Ryan D3.3k wrote:

CNVs are listed as SVs (structural variants) in VCF files. I believe a VT=SV record is anything over 50 bp, while indels (VT=INDEL) cover sizes under that. As to your question about copy number: I think sequencing technologies typically catch deletions, so most deletion SVs will appear in a VCF file, while duplications are less likely to be well measured by sequencing. I am relatively new to the VCF format, though, since I work mostly with array data. You can pull data for your region from 1000 Genomes and check which SVs fall in it with the following code:

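# $VCF: path or URL of the bgzipped, tabix-indexed 1000 Genomes chr4 VCF (not given in the post)
tabix -fh "$VCF" 4:152890286-153092741 > genotypes.vcf
grep 'VT=SV' genotypes.vcf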

This will pull out all of the SVs in that region across the 1092 1000G samples.
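
If you would rather inspect the records than just grep them, the filtering step above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested pipeline: the helper names `parse_info` and `sv_records` are hypothetical, and the two-sample VCF is made up. The VCF specification reserves symbolic ALT alleles such as `<DEL>`, `<DUP>` and `<CNV>`, the INFO keys `SVTYPE` and `END`, and a per-sample `CN` FORMAT key for copy number, but whether a particular file actually carries them depends on the caller that produced it.

```python
def parse_info(info):
    """Turn 'A=1;B;C=x' into {'A': '1', 'B': True, 'C': 'x'}."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def sv_records(lines):
    """Yield one dict per SV record (INFO has VT=SV or an SVTYPE key)."""
    samples = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        info = parse_info(fields[7])
        if info.get("VT") != "SV" and "SVTYPE" not in info:
            continue
        # Collect per-sample copy number if the FORMAT column lists CN
        copy_number = {}
        if len(fields) > 9:
            keys = fields[8].split(":")
            if "CN" in keys:
                idx = keys.index("CN")
                for name, call in zip(samples, fields[9:]):
                    values = call.split(":")
                    if idx < len(values) and values[idx] != ".":
                        copy_number[name] = int(values[idx])
        yield {
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(info["END"]) if "END" in info else None,
            "alt": fields[4],
            "svtype": info.get("SVTYPE"),
            "copy_number": copy_number,
        }


# Made-up two-sample VCF, for illustration only
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "4\t152900000\t.\tA\tG\t.\tPASS\tVT=SNP\tGT\t0|1\t0|0",
    "4\t152950000\t.\tT\t<CNV>\t.\tPASS\tVT=SV;SVTYPE=CNV;END=152960000\tGT:CN\t0/1:3\t0/0:2",
]

for rec in sv_records(example):
    print(rec["chrom"], rec["start"], rec["end"], rec["svtype"], rec["copy_number"])
```

On a real file you could pass `open("genotypes.vcf")` instead of the `example` list. Records without a `CN` field simply come back with an empty `copy_number` dict.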

What you're asking, though, is how to display copy-number differences relative to the reference sequence. I'm not sure the VCF file is the best way to do this. Some kind of read depth analysis might be more helpful and give you a semi-quantitative measure of copy number. People doing more sequencing may have better insight.
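
To make the read-depth idea concrete, here is a minimal sketch, assuming you already have per-base depths for the region of interest and for a baseline set of positions believed to be at normal copy number. The function name `estimate_copy_number` and all the numbers are hypothetical. It only scales the depth ratio by the expected ploidy (2 for autosomes) and ignores GC bias, mappability and noise, all of which real CNV callers try to correct for.

```python
from statistics import median


def estimate_copy_number(region_depths, baseline_depths, ploidy=2):
    """Rough copy number = ploidy * (median region depth / median baseline depth)."""
    baseline = median(baseline_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    return ploidy * median(region_depths) / baseline


# Made-up per-base depths, for illustration only
baseline = [30, 31, 29, 30, 32, 28, 30]   # positions assumed to be at 2 copies
duplicated = [44, 46, 45, 47, 45]         # region with elevated coverage
deleted = [15, 14, 16, 15, 15]            # region with reduced coverage

print(estimate_copy_number(duplicated, baseline))  # 3.0
print(estimate_copy_number(deleted, baseline))     # 1.0
```

The median is used rather than the mean so that a few spuriously high-coverage positions do not dominate the estimate.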
