Question: How To Get Non-SNP Sites Or Invariant Sites Based On 1000 Genomes Project
7.4 years ago by
Gangcai230
Berlin Or Shanghai
Gangcai230 wrote:

Hi everyone, I need to find the human genomic sites that are invariant among all individuals, that is, sites that were sequenced in every individual and carry no SNP. I know that sites not included in the 38 million SNPs identified by the 1000 Genomes Project are possible candidates. However, not all of them were necessarily sequenced or covered in all individuals. One possible way to get such invariant sites is to first find the sites that are covered in all individuals and then subtract the SNP sites. Does anybody know how or where to get this information, other than by extracting it from the raw mapping data? Thanks very much.
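The two-step idea in the question (sites covered in all individuals, minus SNP sites) can be sketched as plain set arithmetic. This is only an illustration: the function name, the `(chrom, pos)` tuples and the toy data are assumptions, and holding every site in memory would not scale to a whole genome without switching to intervals.

```python
def invariant_candidates(covered_by_sample, snp_sites):
    """Return sites covered in every sample that are not known SNP sites.

    covered_by_sample: dict mapping sample name -> set of (chrom, pos)
    snp_sites: set of (chrom, pos) for known SNPs
    Both inputs are hypothetical; how they are obtained is not shown here.
    """
    per_sample = [set(sites) for sites in covered_by_sample.values()]
    if not per_sample:
        return set()
    # Step 1: keep only the sites covered in all individuals.
    covered_in_all = set.intersection(*per_sample)
    # Step 2: subtract the SNP sites.
    return covered_in_all - set(snp_sites)


# Toy data, made up for illustration only.
covered = {
    "SAMPLE1": {("1", 100), ("1", 101), ("1", 102)},
    "SAMPLE2": {("1", 100), ("1", 101), ("1", 102), ("1", 103)},
    "SAMPLE3": {("1", 101), ("1", 102)},
}
snps = {("1", 102)}

print(sorted(invariant_candidates(covered, snps)))  # [('1', 101)]
```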

1000genomes snp • 3.2k views
modified 7.1 years ago by esha.sharma0 • written 7.4 years ago by Gangcai230

By definition, a VCF file can record monomorphic sites (sites without alternate alleles). However, the VCF files from the 1000 Genomes release (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/) seem to contain only SNPs, indels and SVs, with no monomorphic or invariant sites.
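For reference, a monomorphic record in a VCF carries the missing value `.` in its ALT column. A minimal sketch of how one might test a VCF for such records, assuming plain tab-separated text; the function names and the two example records are made up for illustration:

```python
def is_monomorphic(vcf_line):
    """True if a VCF data line has no alternate allele (ALT column is '.')."""
    fields = vcf_line.rstrip("\n").split("\t")
    return fields[4] == "."  # column 5 of a VCF record is ALT


def monomorphic_records(lines):
    """Yield the data lines without an alternate allele, skipping headers."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        if is_monomorphic(line):
            yield line


# Two invented records: the first is a SNP, the second has no ALT allele.
example = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t50\tPASS\t.\n",
    "1\t101\t.\tC\t.\t50\tPASS\t.\n",
]
print(len(list(monomorphic_records(example))))  # 1
```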

modified 7.4 years ago • written 7.4 years ago by Gangcai230
7.4 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum131k wrote:

Use GATK SelectVariants to keep only the sites where every sample has been covered (has a called genotype); something like (not tested):

-select 'vc.getGenotype("SAMPLE1").isCalled() && vc.getGenotype("SAMPLE2").isCalled() && vc.getGenotype("SAMPLE3").isCalled()  .... etc...'
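With many samples, the expression above is tedious to type by hand. A small sketch that builds the same string from the sample columns of the `#CHROM` header line; the helper names and the example header are assumptions, and the resulting expression is as untested against GATK as the one above:

```python
def sample_names(header_line):
    """Extract sample names from the '#CHROM ...' header line of a VCF."""
    fields = header_line.rstrip("\n").split("\t")
    return fields[9:]  # the first 9 columns are fixed; samples follow FORMAT


def all_called_expression(samples):
    """Join one isCalled() test per sample with '&&'."""
    return " && ".join(
        'vc.getGenotype("{}").isCalled()'.format(name) for name in samples
    )


# Invented header line with three samples, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\tSAMPLE2\tSAMPLE3\n"
print("-select '" + all_called_expression(sample_names(header)) + "'")
```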

See also: GATK multi-sample VCF VariantFiltration.

Then use vcftools to subtract the SNPs in the 1000 Genomes Project VCF: http://vcftools.sourceforge.net/docs.html#isec

modified 11 months ago by RamRS30k • written 7.4 years ago by Pierre Lindenbaum131k

Hi Pierre, thanks for your reply. For the first step, what should the input file be? I have checked the 1000 Genomes release files, and they seem to contain sequencing information only for variant sites (e.g. SNPs, indels, SVs), not for invariant or monomorphic sites.

written 7.4 years ago by Gangcai230
7.1 years ago by
esha.sharma0 wrote:

Hello Gangcai, did you find a way to get non-SNP or invariant sites based on the 1000 Genomes Project? I am trying the UnifiedGenotyper of GATK-2.7-2 for the same purpose. Will it help with finding invariant sites in exomes?

written 7.1 years ago by esha.sharma0