Picking random SNPs from 1000 Genomes using Vcftools
6.6 years ago
pifferdavide ▴ 100

I need to pick random sets of SNPs using Vcftools from 1000 Genomes variant set files. Is there a command to do this?

What kind of output are you looking for? A smaller vcf with random lines from 1000 Genomes vcfs, or just a list of SNPs (rs ids, or list of chr,position,ref,alt)?

A list of SNPs (rs ids)

6.6 years ago

I wrote a simple tool to downsample VCF files: https://github.com/lindenb/jvarkit/wiki/DownSampleVcf

$ curl -skL "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/ALL.wgs.phase3_shapeit2_mvncall_integrated_v5a.20130502.sites.vcf.gz" |\
  gunzip -c | java -jar downsamplevcf.jar -n 100 > out.vcf

I installed jvarkit but it won't let me install downsamplevcf. I get the following error message: curl command not found. How do I install curl?

Awesome. But now it doesn't recognize the command "ant". What do I need to install next?

Hi Pierre, using downsamplevcf.jar, is there any way to get random SNPs with LD and allele frequency similar to those of the SNPs under study?

6.6 years ago

To sample without replacement with sample:

$ N=1234
$ sample --sample-size=${N} foo.vcf > sample.${N}.vcf
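
Since the question asks for a list of rs IDs rather than a smaller VCF, here is a minimal sketch of one way to get that with standard Unix tools. It is not vcftools and not part of either tool mentioned above. The function name random_rsids is made up for illustration, and it assumes GNU shuf is available and that the ID column holds a single rs ID (records with "." or several IDs joined by ";" are skipped).

```shell
# Hypothetical helper (name is an assumption, not from any tool in this thread):
# print N random rs IDs from a VCF file, plain or gzipped.
# Usage: random_rsids in.vcf[.gz] N
random_rsids() {
    local vcf="$1" n="$2"
    # Decompress only when the file name ends in .gz
    case "$vcf" in
        *.gz) gzip -dc "$vcf" ;;
        *)    cat "$vcf" ;;
    esac |
        # Skip header lines; keep column 3 (ID) only when it is a single rs ID
        awk -F'\t' '!/^#/ && $3 ~ /^rs[0-9]+$/ { print $3 }' |
        # shuf -n draws up to N lines at random, without replacement
        shuf -n "$n"
}
```

For example, `random_rsids sites.vcf.gz 100 > random_rsids.txt` would write 100 IDs to a file, where sites.vcf.gz stands for whichever 1000 Genomes sites file you downloaded. shuf reads its whole input first, so on a whole-genome file this takes a while.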

It looks like sample is not a Vcftools command

Alex clearly pointed to a tool that is not vcftools.

Reread my question "Picking random SNPs from 1000 Genomes using Vcftools". Wrong answer since I asked how to do that job using vcftools!

To paraphrase the great English philosopher Mick Jagger, "You can't always get what you want. But if you ask some time, then you might find, there's a different tool that will actually do what you want."

Sure. I tried to install your downsamplevcf, but there are too many previous steps. I installed jvarkit but it still won't work. The ant command isn't recognized. I suppose I'll have to install Apache Ant too? Sorry for these newbie questions...

Sorry, which answer do you mean? Please let me know if you have any suggestions.

6.6 years ago

There's no simple way of doing this directly in vcftools (although using 'sample' seems a good suggestion). However, perhaps you could use the --thin option, which thins sites so that no two are within a given distance of each other, to achieve what you need?
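
To make clear what thinning gives you, here is a rough sketch of the idea in awk. It is an illustration only, not vcftools' actual implementation. The function name thin_sites is hypothetical, and it assumes the input is sorted by chromosome and position. Note that the result is an evenly spaced subset rather than a uniform random sample.

```shell
# Hypothetical helper illustrating distance-based thinning (not vcftools code):
# keep a site only if it is at least D bp past the last kept site
# on the same chromosome. Assumes position-sorted input.
# Usage: thin_sites in.vcf D
thin_sites() {
    awk -F'\t' -v d="$2" '
        # Pass header lines through unchanged
        /^#/ { print; next }
        # Always keep the first site seen on each chromosome
        $1 != chrom { chrom = $1; last = $2; print; next }
        # Keep a site only when it is far enough from the last kept one
        $2 - last >= d { last = $2; print }
    ' "$1"
}
```

With vcftools itself, the corresponding call would look something like `vcftools --gzvcf in.vcf.gz --thin 100000 --recode --out thinned`, where the file names and the 100000 bp distance are placeholders.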

