Code to select randomly 1 SNP per scaffold
1
1
Entering edit mode
3.4 years ago
angelaparody ▴ 80

Hi,

I would like/need to pick 1 SNP per scaffold randomly from a .vcf file and generate a new .vcf file with those SNPs. The input file I have has SNPs from scaffolds of different length and with different number of SNPs (i.e. there are scaffolds with 5 SNPs and scaffolds with 200 SNPs). What I need is similar to --thin-count (PLINK) which removes variants at random until only n remains, but I want to include the fact that I want just 1 SNP per scaffold (well, in this case, remove all SNPs of each scaffold leaving just one).

Second step would be doing this re-sampling several times. Specifically I am looking for a code that produces X number of .vcf files, and each .vcf file has a randomly selection of SNPs, 1 per scaffold.

Would this be possible? (Nothing is impossible, right!? ;) ) or suggestions?

Kind regards,

'Angela

PS: Let me know if you need more specifications.

SNP vcf filtering selection random • 1.7k views
1
Entering edit mode
3.4 years ago
JC 12k

#!/usr/bin/perl
use strict;
use warnings;
my %snps_per_seq = ();
while (<>) {
if (/^#/) { # print headers as the original
print;
}
else {
my ($seq_id) = split (/\t/,$_);
push @{ $snps_per_seq{$seq_id } }, $_; # store a hash per sequence, with possitions as array } } foreach my$seq_id (sort keys %snps_per_seq) {
my @snps = @{ $snps_per_seq{$seq_id } };
print \$snps[ int rand @snps ]; # grab one random position from the array
}


usage:

perl randSnps.pl < input.vcf > output.vcf