vcftools and multiple alternative allelles
1
0
Entering edit mode
8.2 years ago
oddmundn • 0

I am trying to extract only variants which as an ALT allele frequency above a certain value, using vcftools and the parameter --non-ref-af-any. Several of my variants have several alternative alleles, but when using the --non-ref-af-any it doesn't filter at all. Every variant is included, despite having AF=0 for the only ALT allele. When using the --non-ref-af option, only single ALT alleles with AF > x is extracted.

Is this a bug in vcftools? I am using VCF files from Ion Torrent Suite.

Thanks for any help!

vcftools • 3.6k views
ADD COMMENT
0
Entering edit mode
8.2 years ago
abascalfederico ★ 1.2k

I think it is not a bug. The documentation says that the site is not filtered if "any" of the variants meet the specified criteria. I had a similar problem and after struggling with different programs finally wrote my own perl script to remove AF=0 variants. It works with multi-allelic variants - only the affected allele will be removed. But it only works for the specific case of alleles with AF=0 (after Beagle refinement in my case). Here it is in case you want to try it.

#!/usr/bin/perl -w

use strict;
$|=1;

while(<>) {
    if(/^#/) {
        print;
        next;
    }
    chomp;
    my @tmp = split(/\t/,$_);
    my %allele_counts;
    my($chr,$pos,$id,$ref,$alt,$qual,$filter,$info,$format) = (@tmp)[0..8];
    my $num_alt;
    my @alts;
    $alts[0] = $ref;
    if($alt =~ /,/) {
        my @tmp2 = split(/,/,$alt);
        $num_alt = scalar(@tmp2)+1;
        push(@alts,@tmp2);
    } else {
        $num_alt = 2;
        $alts[1] = $alt;
    }
    for(my $i=9; $i<=$#tmp; $i++) {
        my $genotype = (split(/:/,$tmp[$i]))[0];
        my($a,$b) = (split(/\|/,$genotype))[0,1];
        $allele_counts{$a}++;
        $allele_counts{$b}++;
    }
    if(scalar(keys(%allele_counts)) == 1) {
        print STDERR "REMOVED, AF=0: $_\n";
        next;
    }
    if(!exists($allele_counts{'0'})) {
        print STDERR "REMOVED, REF ALLELE HAS AF=0: $_\n";
        next;
    }
    if(scalar(keys(%allele_counts)) != $num_alt) {
        print STDERR "REMOVED: There are multiple alleles but at least one has to be removed: $_\n";
        for(my $i=1; $i<=$#alts; $i++) {
            if(!exists($allele_counts{$i})) {
                if($i==1) {
                    $tmp[4] =~ s/$alts[$i],//;
                    for(my $j=9; $j<=$#tmp; $j++) {
                        my $recode = $i+1; #al siguiente alelo tenemos que bajarle la cuenta
                        $tmp[$j] =~ s/\|$recode/\|1/;
                        $tmp[$j] =~ s/$recode\|/1\|/;
                    }
                } else {
                    $tmp[4] =~ s/,$alts[$i]//;
                    print STDERR "\tRemoving allele $i=$alts[$i]\n";
                }
            }
        }
    }
    print "@tmp\n";
}    


__END__
ADD COMMENT
0
Entering edit mode

Thank you very much!

Unfortunately, your filter removed all entries in the VCF file, even though some of them have AF > 0. Used this way: scriptname filename.vcf > outfile.vcf

The VCF file is generated by Ion Torrent Suite software (Life Technologies/Thermo).

Oddmund

ADD REPLY
0
Entering edit mode

Could you send me the first 1,000 or 10,000 lines of your vcf to test the script? You can email me to fa8 - at - sanger.ac.uk

ADD REPLY
0
Entering edit mode

Earlier today I wrote my own python script to filter NOCALLS and AF=0 from my files, which works fine on my VCF files. That means; I don't need help anymore. Please tell me if you would like the "head" of my Ion Torrent VCF file anyway?

ADD REPLY
0
Entering edit mode

Glad to hear that. I don't need it, but thanks

ADD REPLY

Login before adding your answer.

Traffic: 1513 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6