Question: Where can I get an exome VCF file from the 1000 Genomes Project?
kelly.wang135 (Korea, Republic Of) wrote, 4.0 years ago:

Hi, I'm trying to use 1000 Genomes data as control data for my analysis. I'm interested in exome data: where can I get exome VCF data from the 1000 Genomes Project? I could only find whole-genome data.

Does the 1000 Genomes Project provide exome VCF files? Or can I just restrict the whole-genome VCF to the target regions to get the exome data?

Tags: 1000genome, exome
Evgeniia Golovina (New Zealand) wrote, 4.0 years ago:

Please have a look at this thread: Frequency of Exome data from 1000 Genomes Project


Thanks. What I need is individual-level exome data. Using that file, I could extract the exome regions from the whole-genome data.

But I have one question. As I understand it, whole-exome and whole-genome data should go through different variant-calling processes. Can simply extracting some regions (the target exons) from genome data be considered exome data?
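Restricting a whole-genome VCF to exome target regions can be sketched in Python as below. This is a minimal sketch under stated assumptions: a BED file of exome targets (0-based, half-open coordinates), an uncompressed tab-separated VCF stream, and hypothetical helper names (`load_bed`, `in_targets`, `subset_vcf`) that do not come from any library. It only tests each record's POS, so an indel that starts just before a target but overlaps it is dropped. In practice, an indexed VCF can also be subset with `bcftools view -R targets.bed`.

```python
import bisect
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical structure: per chromosome, parallel lists of merged BED starts and ends.
Targets = Dict[str, Tuple[List[int], List[int]]]


def load_bed(lines: Iterable[str]) -> Targets:
    """Parse BED lines (0-based, half-open) into sorted, merged intervals per chromosome."""
    raw: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    targets: Targets = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # overlapping/adjacent: extend previous interval
            else:
                starts.append(start)
                ends.append(end)
        targets[chrom] = (starts, ends)
    return targets


def in_targets(targets: Targets, chrom: str, pos: int) -> bool:
    """True if 1-based VCF position pos lies in a BED interval, i.e. start < pos <= end."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    # Index of the last merged interval whose start is < pos (the only candidate).
    i = bisect.bisect_left(starts, pos) - 1
    return i >= 0 and pos <= ends[i]


def subset_vcf(vcf_lines: Iterable[str], targets: Targets) -> Iterator[str]:
    """Yield all header lines plus the records whose POS falls inside a target."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep ## meta-information lines and the #CHROM header line
        elif line.strip():
            chrom, pos = line.split("\t", 2)[:2]
            if in_targets(targets, chrom, int(pos)):
                yield line
```

With open file handles `bed`, `vcf` and `out`, usage would be `out.writelines(subset_vcf(vcf, load_bed(bed)))`; a bgzip-compressed VCF can be opened with `gzip.open(path, "rt")`. Check that chromosome names match between the BED and the VCF (e.g. `1` vs `chr1`), otherwise no records are kept. The output is still WGS data at WGS coverage, only restricted to the exome targets.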

Reply written 4.0 years ago by kelly.wang135

Hi Kelly,

I came across this question because I wanted exome VCF files. In the end I couldn't find any, so I downloaded the FASTQ data and called variants myself. I also subset the target exon regions from the WGS data. As you surmised, this is quite different from calling variants on exome data. I wrote this work up on my blog https://davetang.org/muse/2017/02/14/a-single-exome/ if you are still interested.

Cheers,

Dave

Reply written 3.1 years ago by Dave Tang

I guess not. I prefer to consider exome data to be data obtained with Whole Exome Sequencing (WES), where library preparation includes enrichment for exon targets. Target exon regions extracted from WGS data may not be adequately covered, and if you use them for variant calling you can expect some false-positive and false-negative results.
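One way to put numbers on those false positives and false negatives is a site-level comparison of two call sets for the same sample, e.g. calls from WGS restricted to the targets versus calls from WES. The sketch below is a simplified illustration with hypothetical function names: it treats one call set as the reference (an assumption, since neither is ground truth), keys variants by (CHROM, POS, REF, ALT), ignores genotypes, and does not normalize indel representation. Tools such as `bcftools isec` handle these comparisons more rigorously.

```python
from typing import Dict, Iterable, Set, Tuple

# A variant site key: (chromosome, 1-based position, REF allele, one ALT allele).
Variant = Tuple[str, int, str, str]


def variant_keys(vcf_lines: Iterable[str]) -> Set[Variant]:
    """Collect (CHROM, POS, REF, ALT) keys, splitting multi-allelic ALT fields."""
    keys: Set[Variant] = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            if allele != ".":  # "." means no ALT allele at this record
                keys.add((chrom, int(pos), ref, allele))
    return keys


def compare(query: Set[Variant], reference: Set[Variant]) -> Dict[str, float]:
    """Count shared (tp), query-only (fp) and reference-only (fn) sites."""
    tp = len(query & reference)
    fp = len(query - reference)
    fn = len(reference - query)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

For example, `compare(variant_keys(wgs_subset_lines), variant_keys(wes_lines))` would report how many WES sites the WGS subset misses (`fn`) and how many extra sites it contains (`fp`), relative to the WES calls.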

If you are interested only in variants in exons and not in non-coding regions, it's better to use WES data.

1) WES gives high coverage of the target exon regions.
2) There will always be regions that WGS does not cover sufficiently, e.g. for variant calling. WGS has its value in identifying variants in regions that exome enrichment technologies do not cover: regions where enrichment fails, non-coding regions, and regions that are absent from current exome designs.
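Whether target exons are "sufficiently covered" in a given alignment can be checked by computing the fraction of target bases that reach a minimum depth. This minimal sketch assumes per-base depth lines in the three-column `chrom<TAB>pos<TAB>depth` format that `samtools depth` prints for a single BAM (1-based positions, each position listed at most once) and a BED file of targets; the 20x threshold and the function names `merge_bed` and `fraction_covered` are illustrative assumptions. Positions missing from the depth output are treated as zero depth.

```python
import bisect
from typing import Dict, Iterable, List, Tuple


def merge_bed(lines: Iterable[str]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Parse BED lines (0-based, half-open) into sorted, merged intervals per chromosome."""
    raw: Dict[str, List[Tuple[int, int]]] = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        raw.setdefault(chrom, []).append((int(start), int(end)))
    merged: Dict[str, Tuple[List[int], List[int]]] = {}
    for chrom, intervals in raw.items():
        intervals.sort()
        starts: List[int] = []
        ends: List[int] = []
        for start, end in intervals:
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # overlapping/adjacent: extend
            else:
                starts.append(start)
                ends.append(end)
        merged[chrom] = (starts, ends)
    return merged


def fraction_covered(bed_lines: Iterable[str], depth_lines: Iterable[str],
                     min_depth: int = 20) -> float:
    """Fraction of target bases whose depth is >= min_depth (assumed threshold)."""
    targets = merge_bed(bed_lines)
    total = sum(e - s for starts, ends in targets.values() for s, e in zip(starts, ends))
    covered = 0
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos_s, depth_s = line.rstrip("\n").split("\t")[:3]
        if int(depth_s) < min_depth or chrom not in targets:
            continue
        starts, ends = targets[chrom]
        pos = int(pos_s)
        # 1-based pos lies in BED [start, end) when start < pos <= end.
        i = bisect.bisect_left(starts, pos) - 1
        if i >= 0 and pos <= ends[i]:
            covered += 1
    return covered / total if total else 0.0
```

Running this on a WES BAM and on a WGS BAM of the same sample, with the same target BED, would give a direct comparison of how well each covers the exome targets.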

You can find more information about WES and WGS experiments, as well as about different enrichment platforms, in this paper:

Clark M. J., et al. Performance comparison of exome DNA sequencing technologies. Nature Biotechnology 2011; 29(10):908-914.

Reply written 4.0 years ago by Evgeniia Golovina