Question

Extracting SNP data for specific rs#s from published genome sequences

0

10.0 years ago

devenvyas ▴ 740

I have a question about a project I am conceptualizing. I have no experience yet dealing with nuclear DNA, so I am not sure where to begin.

I have SNP data on 64 samples from my population of interest (~330,000 SNPs per sample using the HumanCNV370-Quad).

I will likely be SNP typing some more samples in the near future, but I wanted to see what I can do with the existing SNP data with regard to estimating archaic introgression. I know Sánchez-Quinto et al. (2012) and Reich et al. (2011) used f4 statistics (described in depth by Patterson et al. (2012)) to estimate Neanderthal and Denisovan ancestry, respectively, from SNP data.

Basically, the ratio f4(A,O;X,C) / f4(A,O;B,C) is the estimator of Neanderthal ancestry in X when A=Denisovan, B=Neanderthal, C=YRI, O=Pan troglodytes or Pan paniscus, and X=my data and other comparative populations.
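To make the ratio concrete, here is a minimal sketch, assuming allele frequencies for each population have already been computed at the same set of SNPs and stored as equal-length lists. The function names `f4` and `f4_ratio` are illustrative, not taken from any package, and the numbers are made up. It uses the usual definition of f4(A,O;X,C) as the average over SNPs of (a - o)(x - c); a real analysis would also need standard errors (for example from a block jackknife), which this omits.

```python
def f4(p_a, p_o, p_x, p_c):
    """Average over SNPs of (a - o) * (x - c), given allele frequencies per population."""
    n = len(p_a)
    if not (n == len(p_o) == len(p_x) == len(p_c)) or n == 0:
        raise ValueError("all frequency lists must be non-empty and the same length")
    return sum((a - o) * (x - c) for a, o, x, c in zip(p_a, p_o, p_x, p_c)) / n


def f4_ratio(p_a, p_o, p_x, p_b, p_c):
    """f4(A,O;X,C) / f4(A,O;B,C): the ancestry estimator described above."""
    return f4(p_a, p_o, p_x, p_c) / f4(p_a, p_o, p_b, p_c)


# Toy frequencies (hypothetical): A=Denisovan, O=chimp, B=Neanderthal, C=YRI.
den = [1.0, 0.0, 1.0, 0.5]
chimp = [0.0, 0.0, 0.0, 0.0]
nea = [1.0, 0.0, 0.5, 1.0]
yri = [0.0, 0.5, 0.0, 0.0]
# Build X as an exact 10% Neanderthal / 90% YRI mixture to sanity-check the estimator.
test_pop = [0.1 * b + 0.9 * c for b, c in zip(nea, yri)]

print(f4_ratio(den, chimp, test_pop, nea, yri))
```

Because X is constructed as an exact mixture, the ratio recovers the mixing proportion up to floating-point error, which is a quick check that the arguments are in the right order.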

I need to be able to align a Pan genome to the high-coverage Altai Neanderthal and Denisovan genomes and the YRI genomes, extract polymorphism data for the ~330,000 rs #s the array typed, and then filter out C-T/G-A (Modern-Archaic) sites. I have no idea how to begin. Does anyone here have a suggestion for where I should start? Thanks!
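For the last step, a minimal sketch of the site filter might look like the following. It assumes each site has already been reduced to one modern allele and one archaic allele; the function name `is_damage_like` and the tuple layout are hypothetical. It flags exactly the pairs named above (modern C with archaic T, modern G with archaic A).

```python
# Pairs written as (modern allele, archaic allele), as described in the question.
DAMAGE_LIKE = {("C", "T"), ("G", "A")}


def is_damage_like(modern, archaic):
    """True for C-T or G-A (Modern-Archaic) sites."""
    return (modern.upper(), archaic.upper()) in DAMAGE_LIKE


# Hypothetical records: (rs ID, modern allele, archaic allele).
sites = [("rs1", "C", "T"), ("rs2", "A", "C"), ("rs3", "G", "A"), ("rs4", "T", "C")]
kept = [s for s in sites if not is_damage_like(s[1], s[2])]
print([s[0] for s in kept])
```

A stricter variant would drop every transition regardless of direction by also adding ("T", "C") and ("A", "G") to the set.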

snp genome • 2.5k views

updated 2.3 years ago by Ram 43k • written 10.0 years ago by devenvyas ▴ 740

1

Just guessing here, but how about extracting a short sequence around each of your SNPs, say 150 bp that covers the SNP at some random position within the 150 bp, aligning those sequences to the other genomes, and calling SNPs on the alignments?
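A rough sketch of the window-extraction step, assuming the chromosome sequence is already loaded as a Python string and the SNP position is 0-based. The function name `window_around_snp` is made up for illustration; reading the FASTA and running the aligner are left out.

```python
import random


def window_around_snp(chrom_seq, snp_pos, length=150, rng=random):
    """Return (start, window) where window is up to `length` bases and contains snp_pos.

    The SNP lands at a random offset inside the window; windows are clipped
    so they never run off either end of the chromosome.
    """
    if not 0 <= snp_pos < len(chrom_seq):
        raise ValueError("snp_pos is outside the sequence")
    length = min(length, len(chrom_seq))
    lowest_start = max(0, snp_pos - length + 1)
    highest_start = min(snp_pos, len(chrom_seq) - length)
    start = rng.randint(lowest_start, highest_start)
    return start, chrom_seq[start:start + length]


# Toy example on a made-up 1 kb sequence.
rng = random.Random(0)
seq = "".join(rng.choice("ACGT") for _ in range(1000))
start, window = window_around_snp(seq, snp_pos=500, length=150, rng=rng)
offset = 500 - start  # position of the SNP inside the window
print(len(window), window[offset] == seq[500])
```

Keeping the returned start coordinate makes it possible to map a called variant in the window back to its position on the original chromosome.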

10.0 years ago by Istvan Albert 100k

0

I unfortunately do not know how to do that. Beyond assembling mitogenomes or getting BEAST to run, I'm still very new to computational stuff.

I have managed to find the Altai Neanderthal VCF files and have started downloading them to the cluster. I am trying to filter the SNPs by these rs numbers, but I keep getting error messages, as shown below:

VCFtools - v0.1.11
(C) Adam Auton 2009

Parameters as interpreted:
        --gzvcf AltaiNea.hg19_1000g.1.mod.vcf.gz
        --out filtered_AltaiNea.hg19_1000g.1_
        --snps 330k.txt

Using zlib version: 1.2.3
Versions of zlib >= 1.2.4 will be *much* faster when reading zipped VCF files.
Reading Index file.
Building new index file.
        Scanning Chromosome: 1
        Warning - file contains entries with the same position. These entries will be processed separately.

        Scanning Chromosome: ;GAnc=C;OAnc=C;bSC=640;mSC=0.001;pSC=0.138;GRP=0.28;Map20=1
        Scanning Chromosome: 1
Error: VCF file is not sorted at position 1:3.
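One index-free way to attempt the same filter is to stream the gzipped VCF and keep only records whose ID column matches the list. The sketch below uses only the Python standard library. It assumes the rs numbers are present in the ID column (column 3) of the VCF and that 330k.txt holds one ID per line; the function names and the output path are illustrative. Lines with fewer than eight tab-separated fields are skipped, which may or may not be the right way to handle whatever is producing the odd "Scanning Chromosome" line above.

```python
import gzip


def load_ids(path):
    """Read one ID per line into a set, ignoring blank lines."""
    with open(path) as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_vcf_lines(lines, wanted_ids):
    """Yield header lines plus records whose ID column contains a wanted ID."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record: fewer than the 8 fixed VCF columns
        # The ID column may hold several IDs separated by semicolons.
        if any(rsid in wanted_ids for rsid in fields[2].split(";")):
            yield line


def filter_vcf(vcf_gz_path, ids_path, out_path):
    """Stream a gzipped VCF and write the matching records to a plain-text VCF."""
    wanted = load_ids(ids_path)
    with gzip.open(vcf_gz_path, "rt") as src, open(out_path, "w") as dst:
        dst.writelines(filter_vcf_lines(src, wanted))


# Example call (output file name is hypothetical):
# filter_vcf("AltaiNea.hg19_1000g.1.mod.vcf.gz", "330k.txt", "filtered_chr1.vcf")
```

Since this never builds an index, it does not depend on the file being sorted, at the cost of reading the whole file once per chromosome.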

updated 4.3 years ago by Ram 43k • written 10.0 years ago by devenvyas ▴ 740