Code to select randomly 1 SNP per scaffold
1
1
Entering edit mode
3.4 years ago
angelaparody ▴ 80

Hi,

I would like/need to pick 1 SNP per scaffold randomly from a .vcf file and generate a new .vcf file with those SNPs. The input file I have has SNPs from scaffolds of different length and with different number of SNPs (i.e. there are scaffolds with 5 SNPs and scaffolds with 200 SNPs). What I need is similar to --thin-count (PLINK) which removes variants at random until only n remains, but I want to include the fact that I want just 1 SNP per scaffold (well, in this case, remove all SNPs of each scaffold leaving just one).

Second step would be doing this re-sampling several times. Specifically I am looking for a code that produces X number of .vcf files, and each .vcf file has a randomly selection of SNPs, 1 per scaffold.

Would this be possible? (Nothing is impossible, right!? ;) ) or suggestions?

Thanks in advance,

Kind regards,

'Angela

PS: Let me know if you need more specifications.

SNP vcf filtering selection random • 1.7k views
ADD COMMENT
1
Entering edit mode
3.4 years ago
JC 12k

Perl is your friend:

#!/usr/bin/perl
use strict;
use warnings;
my %snps_per_seq = ();
while (<>) {
    if (/^#/) { # print headers as the original
        print; 
    }
    else {
        my ($seq_id) = split (/\t/, $_);
        push @{ $snps_per_seq{ $seq_id } }, $_; # store a hash per sequence, with possitions as array
    }
}
foreach my $seq_id (sort keys %snps_per_seq) {
    my @snps = @{ $snps_per_seq{ $seq_id } };
    print $snps[ int rand @snps ]; # grab one random position from the array
}

usage:

perl randSnps.pl < input.vcf > output.vcf
ADD COMMENT

Login before adding your answer.

Traffic: 1430 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6