Extract Private Snps From Multi-Sample Vcf File
2
0
Entering edit mode
8.4 years ago
William ★ 4.8k

Is there a tool / script already available to filter private SNPs from a multi-sample vcf file? I looked but couldn't find a option in VCF tools.

What I mean with a private SNP is a SNP were an alternative genotype is unique to one sample.

I am also not interested in shared alternative genotypes. So I am not interested in SNPs where multiple samples share a 0/1 or 1/1 or 0/2 or 1/2 etc genotype. Only complete private / unique genotypes.

vcf snp • 8.4k views
ADD COMMENT
2
Entering edit mode
8.4 years ago
matted 7.5k

I believe VCFtools can do that. It's an example in vcf-annotate for designing a custom filter for use with the --filter option (click the "Read even more" link on the documentation). The filter:

# Annotate INFO field with SINGLETON flag when one and only one sample is different from the reference
{
    header   => [
        qq[key=INFO,ID=SINGLETON,Number=0,Type=Flag,Description="Only one non-ref sample"],
    ],
    tag      => 'FORMAT/GT',
    name     => 'Dummy',
    desc     => 'Dummy',
    test     => sub {
        my $nalt = 0;
        for my $gt (@$MATCH)
        {
            my @gt = $VCF->split_gt($gt);
            for my $allele (@gt)
            {
                if ( $allele ne 0 && $allele ne '.' ) { $nalt++; last; }
            }
            if ( $nalt>1 ) { last; }
        }
        if ( $nalt==1 ) { $$RECORD[7] = $VCF->add_info_field($$RECORD[7],'SINGLETON'=>''); }
        return $PASS;
    },
},
ADD COMMENT
1
Entering edit mode
8.4 years ago

I would use awk to find one sample having a genotype!="0/0"

 gunzip -c my.vcf.gz |
awk -F ' ' '/^#/{print; next;} {n=0; for(i=10;i<=NF;++i) { if($i!="." &&  index($i,"0/0")==0) n++;} if(n==1) print; } '

change the test according to your needs.

ADD COMMENT
0
Entering edit mode

The thing is that I am also not interested in shared alternative genotypes. So I am not interested in SNPs where multiple samples share a 0/1 or 1/1 or 0/2 or 1/2 etc genotype. Only complete private / unique genotypes.

ADD REPLY

Login before adding your answer.

Traffic: 2376 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6