VCF and Copy Number Variations - How Could These Be Displayed?
11.6 years ago
user56 ▴ 300

This is a newbie question. Assume I have a VCF file with 69 people. I can see all the short indels in addition to SNPs (a nice improvement over genotyping).

But the patients' genomes also have CNVs. The DNA is huge, and the VCF only shows where it differs from the reference genome. Can the VCF file somehow also show that patient X has 69 copies of a certain CNV region? And how are CNVs addressed in the reference genome build?

vcf cnv
11.6 years ago
Ryan D ★ 3.4k

CNVs are listed as SVs (structural variants) in VCF files. I believe a VT=SV record is anything over 50 bp, while indels (VT=INDEL) are smaller than this. As to your question about copy number: I think sequencing technologies typically catch deletions, so most deletion SVs will appear in a VCF file, while duplications are less likely to be measured well by sequencing. I am relatively new to the VCF format, though, since I work mostly with array data. You can pull data for your region from 1000 Genomes and check which SVs are in it with the following code:

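# fetch the region (with the VCF header) from the remote, tabix-indexed file
tabix -fh ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr4.phase1_integrated_calls.20101123.snps_indels_svs.genotypes.vcf.gz 4:152890286-153092741 > genotypes.vcf
# keep only structural-variant records: skip header lines, match VT=SV in INFO
grep -v '^#' genotypes.vcf | grep 'VT=SV'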

This pulls out all of the SVs in that region across the 1092 1000 Genomes samples.
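If you want a quick table of those SV records rather than the raw lines, a minimal awk sketch could look like the following. It assumes, as above, that SV records carry VT=SV in the INFO column, and it additionally assumes they may carry SVTYPE and END tags (keys the VCF specification reserves for structural variants); any tag that is missing is printed as ".". The function name list_svs is hypothetical.

```shell
# Hypothetical helper: print CHROM, POS, END, SVTYPE and ID for every record
# whose INFO column contains VT=SV. Header lines (starting with #) are skipped.
# Assumption: SVTYPE= and END= may or may not be present; missing ones print ".".
list_svs() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                       # skip ## meta lines and the #CHROM header
    {
      vt = "."; svtype = "."; end = "."
      n = split($8, info, ";")          # column 8 is INFO, entries separated by ;
      for (i = 1; i <= n; i++) {
        split(info[i], kv, "=")         # key=value (flags have no value)
        if (kv[1] == "VT") vt = kv[2]
        else if (kv[1] == "SVTYPE") svtype = kv[2]
        else if (kv[1] == "END") end = kv[2]
      }
      if (vt == "SV") print $1, $2, end, svtype, $3
    }' "$1"
}
```

Usage: `list_svs genotypes.vcf`. Parsing INFO key by key avoids false matches that a plain `grep SV` would pick up (for example, the string "SV" appearing in header lines or IDs).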

What you're asking, though, is how to display differences from the reference sequence. I'm not sure a VCF file is the best way to do this. Some kind of read-depth analysis might be more helpful and would give you a semi-quantitative measure of copy number. People who do more sequencing may have better insight.
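As a rough illustration of the read-depth idea, here is a minimal sketch, not a validated CNV caller. It assumes a per-base depth file with three tab-separated columns (chrom, pos, depth), such as one written by `samtools depth -a -r 4:152890286-153092741 sample.bam > depth.txt` (sample.bam is a placeholder), and it assumes a control region on the same chromosome that is diploid. It then estimates copy number as 2 × (mean depth in the region) / (mean depth in the control). The function name estimate_cn is hypothetical, and it ignores GC bias, mappability and sample-to-sample normalization.

```shell
# Hypothetical helper: crude copy-number estimate from per-base read depth.
# Arguments: depth file, chrom, region start, region end,
#            control start, control end (control assumed diploid, same chrom).
# Prints the estimate with two decimals, or NA if either region has no data.
estimate_cn() {
  awk -F'\t' -v c="$2" -v s="$3" -v e="$4" -v cs="$5" -v ce="$6" '
    $1 == c && $2 >= s  && $2 <= e  { t += $3; nt++ }   # region of interest
    $1 == c && $2 >= cs && $2 <= ce { r += $3; nr++ }   # diploid control
    END {
      if (nt == 0 || nr == 0 || r == 0) { print "NA"; exit }
      printf "%.2f\n", 2 * (t / nt) / (r / nr)           # scale ratio to diploid
    }' "$1"
}
```

Usage: `estimate_cn depth.txt 4 <start> <end> <control_start> <control_end>`. A value near 2 suggests no change, near 1 a heterozygous deletion, and near 3 or more a gain; in practice you would compare against many samples rather than a single control window.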
