Parsing A Vcf File On An Ftp Server Using Single-Line Perl
11.1 years ago
soosus ▴ 90

Sanger recently released mouse SNPs (in VCF format) from next-generation sequencing of 18 strains. I need to fetch and parse the file (without downloading the huge file) and extract all SNPs in the gene Impact where strain 129S1 is homozygous for the alternative allele, using a Perl one-liner.

The site: ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v3.snps.rsIDdbSNPv137.vcf.gz

So far, I know I need to do this, but after that I'm fairly lost:

curl ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v3.snps.rsIDdbSNPv137.vcf.gz | gunzip - | perl ...

And yes, this needs to be a Perl one-liner, not VCFtools.

perl vcf • 3.1k views

How is that not downloading the file?

11.1 years ago

using awk:

curl -s "ftp://ftp-mouse.sanger.ac.uk/current_snps/mgp.v3.snps.rsIDdbSNPv137.vcf.gz" | gunzip -c |\
awk -F '\t' '($0 ~ /^#/ || $11 ~ /1\/1:/)'

Now, if I use a2p to convert it to Perl:

#!/usr/bin/perl
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
    if $running_under_some_shell;
            # this emulates #! processing on NIH machines.
            # (remove #! line above if indigestible)

eval '$'.$1.'$2;' while $ARGV[0] =~ /^([A-Za-z_0-9]+=)(.*)/ && shift;
            # process any FOO=bar switches

while (<>) {
    chomp;    # strip record separator
    @Fld = split(/\t/, $_, -1);    # VCF columns are tab-separated
    # chomp removed the newline, so add it back when printing
    print "$_\n" if ($_ =~ /^#/ || $Fld[(11)-1] =~ /1\/1:/);
}
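
Since the question asks for a one-liner, the a2p script above can probably be collapsed using Perl's -n, -a, -F and -l switches. This is only a sketch: it assumes, as the awk command does, that 129S1 is the 11th column. The function name alt_hom_col11 and the sample records are made up for illustration and are not taken from the Sanger file.

```shell
# Hypothetical helper: keep header lines and records whose 11th
# tab-separated column contains a 1/1: genotype (same test as the awk above).
alt_hom_col11() {
    perl -F'\t' -lane 'print if /^#/ || (defined $F[10] && $F[10] =~ m{1/1:})'
}

# Made-up three-line sample (header plus two records), 11 columns each.
sample=$(printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t129P2\t129S1\n18\t12972300\t.\tA\tG\t99\tPASS\t.\tGT:GQ\t0/0:50\t1/1:60\n18\t12972400\t.\tC\tT\t99\tPASS\t.\tGT:GQ\t1/1:50\t0/1:60\n')

# Run the filter on the sample; in practice the input would be the
# output of the curl | gunzip -c pipeline instead.
printf '%s\n' "$sample" | alt_hom_col11
```

Here -a splits each line into @F, -F'\t' sets the split pattern to a tab, and -l strips the newline on input and restores it on print.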
11.0 years ago
soosus ▴ 90
lftp -c 'open -e "zcat mgp.v3.snps.rsIDdbSNPv137.vcf.gz" ftp-mouse.sanger.ac.uk/current_snps/' |
perl -ne 'chomp; if (/^#CHROM/) { print "$_\n" } else { @a = split /\t/; print "$_\n" if $a[0] == 18 and $a[1] >= 12972252 and $a[1] <= 12992948 and $a[10] =~ /^1\/1|^1\|1/ }' > mm10.129S1.Impact.altHom &
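
The command above hard-codes the 129S1 column as $a[10]. A variant that looks the column up by name in the #CHROM header line might be a little more robust if the column order ever differs. This is a rough sketch only: the region 18:12972252-12992948 is the one used above, while the function name impact_alt_hom and the sample records are invented for illustration.

```shell
# Hypothetical helper: print the #CHROM header plus records on chr18 within
# 12972252-12992948 where the 129S1 genotype starts with 1/1 or 1|1.
impact_alt_hom() {
    perl -F'\t' -lane '
        if (/^#CHROM/) {
            # $col is a package variable on purpose, so it persists across lines
            ($col) = grep { $F[$_] eq "129S1" } 0 .. $#F;
            print; next;
        }
        next if /^#/ || !defined $col;   # skip ## meta lines; no 129S1 column found
        print if $F[0] eq "18" && $F[1] >= 12972252 && $F[1] <= 12992948
              && $F[$col] =~ m{^1[/|]1};
    '
}

# Made-up sample: one meta line, the header, and four records.
vcf=$(printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t129P2\t129S1\n18\t12972300\t.\tA\tG\t99\tPASS\t.\tGT:GQ\t0/0:50\t1/1:60\n18\t12980000\t.\tC\tT\t99\tPASS\t.\tGT:GQ\t1/1:50\t0/1:60\n18\t13000000\t.\tG\tA\t99\tPASS\t.\tGT:GQ\t0/0:50\t1/1:60\n1\t12975000\t.\tT\tC\t99\tPASS\t.\tGT:GQ\t0/0:50\t1/1:60\n')

# In practice the input would come from the lftp (or curl | gunzip -c) stream.
printf '%s\n' "$vcf" | impact_alt_hom
```

Comparing the chromosome with eq rather than == avoids numeric conversion of names such as X or MT.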