Question: Database of Human Structural Variants Enabling VCF File Creation
Asked by Travis (2.8k, USA), 7.4 years ago:

Hi all,

I am trying to obtain or create a VCF file containing known, well-validated human structural variants (SVs) so that I can introduce them into a reference genome. The end goal is to assess a range of SV detection tools using an unbiased, mutated dataset. Is anyone aware of a human SV database that offers downloads in VCF format, or in a format that is easily converted to VCF? I have done some searching, but the databases I have found do not appear to provide the 'reference' and 'mutant' sequences that would let me recreate each mutation in silico.

Thanks in advance.
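Once variants are available with explicit 'reference' and 'mutant' sequences, recreating them in silico amounts to splicing each ALT allele into the reference in place of its REF allele. Below is a minimal Python sketch of that step, assuming each variant is given as a 1-based POS plus explicit (non-symbolic) REF and ALT strings, as in many VCF data lines; `apply_variants` and the `Variant` tuple are hypothetical helpers, not part of any library. Variants are applied from the highest position down so that deletions and insertions do not shift the coordinates of variants still waiting to be applied.

```python
from typing import List, Tuple

# Hypothetical record shape: (POS, REF, ALT) with a 1-based POS and explicit
# (non-symbolic) REF/ALT sequences, as in many VCF data lines.
Variant = Tuple[int, str, str]


def apply_variants(reference: str, variants: List[Variant]) -> str:
    """Return a mutated copy of `reference` with every variant applied.

    Variants are applied from the highest POS to the lowest, so a
    length-changing edit never shifts the coordinates of a variant that
    has not been applied yet. Overlapping variants are rejected because
    their combined effect is ambiguous.
    """
    seq = reference
    next_start = len(reference) + 1  # POS of the previously applied variant
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        end = pos + len(ref) - 1  # last reference base spanned by REF
        if end >= next_start:
            raise ValueError(f"overlapping variants at POS {pos}")
        observed = seq[pos - 1:end]
        if observed != ref:
            raise ValueError(f"REF mismatch at POS {pos}: expected {ref}, found {observed}")
        seq = seq[:pos - 1] + alt + seq[end:]
        next_start = pos
    return seq


# Usage: a 3 bp deletion, a 3 bp insertion and an SNV on a toy sequence.
mutant = apply_variants(
    "ACGTACGTAC",
    [(2, "CGTA", "C"), (8, "T", "TGGG"), (10, "C", "G")],
)
print(mutant)  # ACCGTGGGAG
```

Checking REF against the sequence before each splice catches coordinate or build mismatches (for example, variants called on a different assembly) instead of silently producing a wrong mutant genome.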

Tags: vcf, sv, next-gen • 2.4k views
Answer by deanna.church (1.1k, Bethesda, MD), 7.4 years ago:

dbVar (http://www.ncbi.nlm.nih.gov/dbvar) is a database of structural variants and provides FTP files in GVF format. You can get data by organism/assembly or by organism/study: ftp://ftp.ncbi.nlm.nih.gov/pub/dbVar/data/Homo_sapiens/
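GVF is based on GFF3: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), with `key=value` attributes separated by `;`. Converting a GVF feature into a VCF record with a symbolic ALT allele is therefore mostly column mapping. The Python sketch below assumes that columns 4 and 5 give the 1-based, inclusive extent of the event and that the feature types in the `SV_TYPES` table (a hypothetical mapping) are the ones used in the files; both are assumptions to check against the actual dbVar downloads. REF is left as `N` because the padding base is not in the GVF line, and a VCF header (`##fileformat`, `##ALT`, `##INFO` lines) still has to be written separately.

```python
from typing import Dict, Optional

# Hypothetical mapping from GVF feature types (Sequence Ontology terms) to VCF
# symbolic ALT alleles; check which types your dbVar files actually use.
SV_TYPES: Dict[str, str] = {
    "deletion": "DEL",
    "copy_number_loss": "DEL",
    "duplication": "DUP",
    "copy_number_gain": "DUP",
    "inversion": "INV",
}


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GFF3/GVF column-9 string such as 'ID=nsv1;Name=x'."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gvf_to_vcf(line: str) -> Optional[str]:
    """Convert one GVF feature line into a VCF data line with a symbolic ALT.

    Returns None for blank/comment lines and for feature types not in SV_TYPES.
    Assumes columns 4-5 give the 1-based, inclusive extent of the event.
    REF is written as 'N' because the GVF line does not carry the padding base;
    fill it in from the reference FASTA before using the record.
    """
    if not line.strip() or line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated GVF columns")
    svtype = SV_TYPES.get(cols[2])
    if svtype is None:
        return None
    seqid, start, end = cols[0], int(cols[3]), int(cols[4])
    var_id = parse_attributes(cols[8]).get("ID", ".")
    length = end - start + 1
    pos = start - 1  # VCF puts POS on the base preceding a symbolic SV
    svlen = -length if svtype == "DEL" else length  # VCF 4.2 sign convention
    info = f"SVTYPE={svtype};END={end};SVLEN={svlen}"
    # Columns: CHROM POS ID REF ALT QUAL FILTER INFO
    return "\t".join([seqid, str(pos), var_id, "N", f"<{svtype}>", ".", ".", info])


# Usage on a made-up GVF line (not real dbVar data):
example = "chr1\tdbVar\tcopy_number_loss\t1001\t1200\t.\t+\t.\tID=nsv1;Name=x"
print(gvf_to_vcf(example))
```

Note that symbolic records like these do not carry the mutant sequence, so for in silico simulation the deleted, duplicated or inverted bases still have to be taken from the reference.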

Answer by matted (7.2k, Boston, United States), 7.4 years ago:

I'm not sure exactly what you need, but have you looked at the 1000 Genomes structural variation dataset?

"The pilot paper data directory contains vcf files for different types of structural variants both for the low coverage and trio pilot studies"

Data here: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/paper_data_sets/a_map_of_human_variation/

It looks like the files are organized by sequencing strategy and variant type, which might be what you need. They're already in VCF, too.
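To check whether a given VCF actually contains structural variants rather than only SNVs and small indels, one can scan its records with a simple heuristic. The Python sketch below treats a record as structural if it has a symbolic (`<DEL>`, `<DUP>`, ...) or breakend ALT allele, an `SVLEN` of at least a cut-off, or REF/ALT sequences whose lengths differ by at least that cut-off. The 50 bp threshold is an often-used but arbitrary convention, and the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Often-used (but arbitrary) size cut-off separating SVs from small indels.
MIN_SV_LENGTH = 50

# Matches the first value of an SVLEN key in the INFO column.
SVLEN_RE = re.compile(r"(?:^|;)SVLEN=(-?\d+)")


def is_structural(record: str, min_len: int = MIN_SV_LENGTH) -> bool:
    """Heuristically decide whether a VCF data line describes a structural variant."""
    cols = record.rstrip("\n").split("\t")
    ref, alts, info = cols[3], cols[4].split(","), cols[7]
    # Symbolic (<DEL>, <DUP>, ...) or breakend (containing [ or ]) alleles.
    if any(a.startswith("<") or "[" in a or "]" in a for a in alts):
        return True
    match = SVLEN_RE.search(info)
    if match and abs(int(match.group(1))) >= min_len:
        return True
    # Explicit sequences: compare REF and ALT allele lengths.
    return any(abs(len(a) - len(ref)) >= min_len for a in alts)


def structural_records(lines: Iterable[str], min_len: int = MIN_SV_LENGTH) -> Iterator[str]:
    """Yield only the VCF data lines that pass is_structural (headers are skipped)."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        if is_structural(line, min_len):
            yield line
```

Passing an open file handle (or a `gzip.open(path, "rt")` handle for compressed VCFs) as `lines` streams the file without loading it into memory.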


Apologies, I have edited the question to add: "The end goal is to assess a range of SV detection tools using an unbiased, mutated dataset." Since the 1000 Genomes VCFs are largely unvalidated and were produced with some of the software I would like to test, they do not meet the well-validated/unbiased criteria. Furthermore, the 1000 Genomes data you linked to consist of SNVs and small indels only; I am specifically interested in larger structural variants.

(Reply by Travis, 7.4 years ago; edited.)