How to process 1000GP data
9.1 years ago
kumbarov ▴ 10

I would like to bulk-process the 1000GP Y-SNP data for some projects of mine. I've downloaded the ALL.chrY.phase3_integrated.20130502.genotypes.vcf file and would like to either extract the data for only some samples or remove some samples. What is the easiest way to do this? Even better, is there a readily available script for importing this sort of data into a database? I am new to this, so any advice on working with this kind of data is welcome.

next-gen SNP snp • 2.4k views
9.1 years ago

Using bcftools:

$ curl -s  "ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz" | gunzip -c | bcftools view --samples NA19455,NA20291 -

(...)
#CHROM    POS    ID    REF    ALT    QUAL    FILTER    INFO    FORMAT    NA19455    NA20291
Y    2655180    rs11575897    G    A    100    PASS    AA=G;AC=0;AF=0.0178427;AN=2;DP=84761;NS=1233;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;EAS_AF=0.0451;VT=SNP;EX_TARGET    GT    0    0
Y    2655471    .    A    C    100    PASS    AA=A;AC=0;AF=0.00405515;AN=2;DP=72067;NS=1233;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;EAS_AF=0.0102;VT=SNP;EX_TARGET    GT    0    0

(...)
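The question also asks about removing samples. The bcftools documentation describes a `^` prefix on the sample list for exclusion (for example `--samples ^NA19455,NA20291`). If no working bcftools build is at hand, the column selection can be roughly approximated with awk. The following is only a sketch: `vcf_select_samples` is a hypothetical helper, not part of any tool, it assumes an uncompressed, tab-separated VCF on stdin, and unlike bcftools it does not recompute INFO fields such as AC and AN.

```shell
# Hypothetical helper (not part of bcftools): keep or drop sample columns of a
# VCF read from stdin. MODE is "keep" or "drop"; LIST is comma-separated sample IDs.
# Caveat: INFO fields such as AC and AN are NOT recomputed.
vcf_select_samples() {
  awk -v mode="$1" -v list="$2" '
    BEGIN {
      FS = OFS = "\t"
      n = split(list, names, ",")
      for (i = 1; i <= n; i++) want[names[i]] = 1
    }
    /^##/ { print; next }            # meta-information lines pass through unchanged
    /^#CHROM/ {                      # header line: decide which columns survive
      for (i = 1; i <= NF; i++)
        if (i <= 9 || (($i in want) == (mode == "keep"))) cols[++m] = i
    }
    {                                # header and data lines: print surviving columns
      line = $(cols[1])
      for (j = 2; j <= m; j++) line = line OFS $(cols[j])
      print line
    }
  '
}

# Example on an uncompressed local copy (file name as given in the question):
# vcf_select_samples drop NA19455,NA20291 < ALL.chrY.phase3_integrated.20130502.genotypes.vcf
```

The first nine columns (CHROM through FORMAT) are always kept, and the surviving sample columns stay in file order rather than in the order of the list.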

Note that curl is not necessary here; bcftools can be given the URL directly:

bcftools view --samples NA19455,NA20291 ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz
curl -s "ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz" | gunzip -c | bcftools view --samples NA19455,NA20291
view: invalid option -- '-'
open: No such file or directory
Segmentation fault (core dumped)

curl -s "ftp://ftp-trace.ncbi.nih.gov//1000genomes/ftp/release/20130502/ALL.chrY.phase3_integrated_v1a.20130502.genotypes.vcf.gz" | gunzip -c | bcftools view -samples NA19455,NA20291
open: No such file or directory
Segmentation fault (core dumped)
bcftools -v
bcftools 1.2
Using htslib 1.2.1
Copyright (C) 2015 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Oh, and I forgot to paste the hyphen '-' at the end of the command (it tells bcftools to read from stdin).


bcftools -v
[main] Unrecognized command.

I am using the version that comes with Ubuntu 14.04. I've downloaded and compiled the htslib and samtools source code, but I don't get a bcftools binary.

The version of bcftools that ships with Ubuntu 14.04 is completely broken; I get segfaults all the time. I downloaded the sources for htslib, samtools and bcftools from GitHub and compiled them. The above command works perfectly with the upstream version of bcftools.
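
On the database part of the original question: one possible route, sketched here under assumptions, is to flatten the VCF into a long table with one row per variant and sample, which most databases can bulk-load as TSV. `vcf_to_long_tsv` is a hypothetical helper; it assumes an uncompressed, tab-separated VCF on stdin whose FORMAT column is just GT, as in the output shown above.

```shell
# Hypothetical helper: flatten a VCF (stdin) into one row per variant and sample.
# Output columns: chrom, pos, id, ref, alt, sample, genotype.
# Assumes FORMAT is GT only, so each sample field is the genotype itself.
vcf_to_long_tsv() {
  awk '
    BEGIN { FS = OFS = "\t" }
    /^##/ { next }                                   # skip meta-information lines
    /^#CHROM/ {                                      # remember sample names by column
      for (i = 10; i <= NF; i++) sample[i] = $i
      next
    }
    {                                                # one output row per sample column
      for (i = 10; i <= NF; i++) print $1, $2, $3, $4, $5, sample[i], $i
    }
  '
}

# Example on an uncompressed local copy (file name as given in the question):
# vcf_to_long_tsv < ALL.chrY.phase3_integrated.20130502.genotypes.vcf > genotypes.tsv
```

With a working bcftools, `bcftools query` with a format string along the lines of `'[%CHROM\t%POS\t%SAMPLE\t%GT\n]'` should give similar per-sample rows.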
