Question: Extraction of SNPs from the 1000G vcf file for a set of regions
0
gravatar for SGMS
3.2 years ago by
SGMS60
European Union
SGMS60 wrote:

Hi all,

I am trying to extract SNPs from the vcf file of the 1000G using bcftools. I managed to extract the SNPs in a separate file, but I want to get the SNPs only in the regions which I'm interested in. I found that -R option provides me with this but I tried it in several ways and I didn't make it work. So the following works:

bcftools query -f'%CHROM\t%POS\t%REF,%ALT\n' ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz > extraction.txt 

But then the following does not:

bcftools query -f'%CHROM\t%POS\t%ID\t%REF,%ALT\n' -R regions.bed ALL.wgs.phase3_shapeit2_mvncall_integrated_v5b.20130502.sites.vcf.gz > extraction.txt

I'm suspecting that the problem is the bed file which initially was a .txt file. The original format was:

1:150748683-150800917
1:151758546-151824348
1:154520256-154572353

and now:

1:150718683-150830917   150718682       150830917
1:151728546-151854348   151728545       151854348
1:154490256-154602353   154490255       154602353

If someone could give me a hint, that would be great.

Thank you!

snp bcftools 1000g • 1.3k views
ADD COMMENTlink written 3.2 years ago by SGMS60

Try tab-delimited chrom, start, end and possibly identifier.

1   150718682       150830917 1:150718683-150830917
1   151728545       151854348 1:151728546-151854348
1   154490255       154602353 1:154490256-154602353

ADD REPLYlink written 3.2 years ago by trausch1.2k

Did you look at tabix?

Usage:   tabix [OPTIONS] [FILE] [REGION [...]]

tabix -h file.vcf.gz 1:150718683-150830917 

Loop over a set of regions in your bed file.

ADD REPLYlink written 3.2 years ago by venu6.1k

Sorry, I'm not sure whether I got this.. My initial file has the chr:start-end format and I don't know how to turn the bed file into a format accepted by bcftools..  

ADD REPLYlink written 3.2 years ago by SGMS60

@StatGen Its not for bcftools solution. Its is for your entire task to create a new VCF file with a set of regions in bed file.

ADD REPLYlink written 3.2 years ago by venu6.1k

Hi venu,

Thanks.. The task though is not to create a new vcf with a set of regions in a bed file.. I want to extract the SNPs from the 1000G vcf, whose positions overlap with the regions I will specify.. My regions though are in a text format which I turned into a bed format which is most probably not acceptable by the bcftools.. 

ADD REPLYlink written 3.2 years ago by SGMS60

Can you just show how you are expecting the output file?

ADD REPLYlink written 3.2 years ago by venu6.1k

After I use bcftools I'm expecting:

1       10177   rs367896724     A,AC
1       10235   rs540431307     T,TA
1       10352   rs555500075     T,TA
1       10505   rs548419688     A,T
1       10506   rs568405545     C,G
1       10511   rs534229142     G,A

i.e CHR POS ID REF,ALT

Thanks

 

ADD REPLYlink written 3.2 years ago by SGMS60
tabix file.vcf.gz 1-10177-10177 | cut -f 1-5 > out.vcf

Select required fields with cut.

ADD REPLYlink written 3.2 years ago by venu6.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1700 users visited in the last hour