Question: Extract Private SNPs From Multi-Sample VCF File
6.5 years ago by
William4.4k
Europe
William4.4k wrote:

Is there a tool or script already available to filter private SNPs from a multi-sample VCF file? I looked but couldn't find an option for this in VCFtools.

What I mean by a private SNP is a SNP where an alternative genotype is unique to one sample.

I am also not interested in shared alternative genotypes, that is, SNPs where multiple samples share a 0/1, 1/1, 0/2, 1/2, etc. genotype. I only want completely private (unique) genotypes.

vcf snp • 7.2k views
modified 6.5 years ago by matted7.0k • written 6.5 years ago by William4.4k
6.5 years ago by
matted7.0k
Boston, United States
matted7.0k wrote:

I believe VCFtools can do that. The vcf-annotate documentation includes an example of a custom filter for use with the --filter option (click the "Read even more" link in the documentation). The filter:

# Annotate INFO field with SINGLETON flag when one and only one sample is different from the reference
{
    header   => [
        qq[key=INFO,ID=SINGLETON,Number=0,Type=Flag,Description="Only one non-ref sample"],
    ],
    tag      => 'FORMAT/GT',
    name     => 'Dummy',
    desc     => 'Dummy',
    test     => sub {
        my $nalt = 0;
        for my $gt (@$MATCH)
        {
            my @gt = $VCF->split_gt($gt);
            for my $allele (@gt)
            {
                if ( $allele ne 0 && $allele ne '.' ) { $nalt++; last; }
            }
            if ( $nalt>1 ) { last; }
        }
        if ( $nalt==1 ) { $$RECORD[7] = $VCF->add_info_field($$RECORD[7],'SINGLETON'=>''); }
        return $PASS;
    },
},
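
For readers who want to check the counting logic outside of vcf-annotate, here is a minimal Python sketch of the same test: a site is a singleton when exactly one sample carries at least one non-reference allele. The function names `split_gt` and `is_singleton` are hypothetical and not part of VCFtools; the sketch assumes GT is the first subfield of each sample column.

```python
import re


def split_gt(sample_field):
    """Return the alleles of the GT subfield, e.g. '0/1:35:99' -> ['0', '1']."""
    gt = sample_field.split(":", 1)[0]  # assumption: GT is the first subfield
    return re.split(r"[/|]", gt)        # handles unphased (/) and phased (|) calls


def is_singleton(sample_fields):
    """True when exactly one sample carries at least one non-reference allele."""
    n_alt = 0
    for field in sample_fields:
        if any(allele not in ("0", ".") for allele in split_gt(field)):
            n_alt += 1
            if n_alt > 1:  # a second non-ref sample: stop early
                return False
    return n_alt == 1


print(is_singleton(["0/0", "0/1", "./."]))  # True
print(is_singleton(["0/1", "1/1", "0/0"]))  # False
```

Note that under this test a site with one 0/1 sample and one 1/1 sample is not a singleton, even though each of those genotypes occurs only once.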
written 6.5 years ago by matted7.0k
6.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum120k wrote:

I would use awk to keep the sites where exactly one sample has a genotype other than "0/0":

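gunzip -c my.vcf.gz |
awk -F '\t' '/^#/ {print; next}
  # count samples whose GT (first subfield) is neither hom-ref nor missing, phased or unphased
  {n=0; for (i=10; i<=NF; ++i) {split($i, f, ":"); if (f[1] !~ "^[0.]([/|][0.])*$") n++} if (n==1) print}'

Change the test according to your needs.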

written 6.5 years ago by Pierre Lindenbaum120k

The thing is that I am also not interested in shared alternative genotypes, that is, SNPs where multiple samples share a 0/1, 1/1, 0/2, 1/2, etc. genotype. I only want completely private (unique) genotypes.
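
One possible reading of this requirement, sketched in Python: at each site, count how often each non-reference genotype occurs and report the samples whose genotype occurs exactly once. This is only an illustration under stated assumptions, not an existing tool: `normalize_gt` and `private_genotypes` are hypothetical names, phased and unphased calls are treated as the same genotype (1|0 equals 0/1), and calls with any missing allele are ignored.

```python
import re
from collections import Counter


def normalize_gt(sample_field):
    """'1|0:35' -> '0/1'; returns None for hom-ref or (partly) missing calls."""
    gt = sample_field.split(":", 1)[0]  # assumption: GT is the first subfield
    alleles = re.split(r"[/|]", gt)
    if "." in alleles or all(a == "0" for a in alleles):
        return None
    return "/".join(sorted(alleles, key=int))  # ignore phase and allele order


def private_genotypes(vcf_line, sample_names):
    """Map sample name -> genotype for non-ref genotypes seen in exactly one sample."""
    fields = vcf_line.rstrip("\n").split("\t")
    gts = [normalize_gt(f) for f in fields[9:]]  # sample columns start at column 10
    counts = Counter(g for g in gts if g is not None)
    return {name: g for name, g in zip(sample_names, gts)
            if g is not None and counts[g] == 1}


samples = ["s1", "s2", "s3", "s4"]
line_a = "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0"
line_b = "chr1\t200\t.\tA\tG,T\t50\tPASS\t.\tGT\t0/1\t1|0\t0/2\t0/0"
print(private_genotypes(line_a, samples))  # {'s2': '0/1', 's3': '1/1'}
print(private_genotypes(line_b, samples))  # {'s3': '0/2'}
```

To keep only sites where no alternative genotype is shared at all, one could additionally require that every count in `counts` equals 1 before reporting the site.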

written 6.5 years ago by William4.4k