Extract unphased genotypes from vcf file and convert to 012 matrix
2
1
Entering edit mode
5.6 years ago
Nev ▴ 20

I have a vcf file in which I want to extract the genotypes and convert them into a 0,1, 2 matrix. Is there a way to extract only the unphased genotypes using Plink or VCF tools or maybe with grep? 

 

Thank you in advance.

 

vcf SNP • 3.6k views
ADD COMMENT
0
Entering edit mode

not clear. what is the result of "I've already extracted all of the genotypes,"

ADD REPLY
0
Entering edit mode

Sorry, I rephrased. Thanks for pointing that out.

ADD REPLY
3
Entering edit mode
5.6 years ago

Using my tool bioalcidae https://github.com/lindenb/jvarkit/wiki/BioAlcidae

while(iter.hasNext())
    {
    var ctx = iter.next();
    out.print(ctx.contig);
    out.print("\t");
    out.print(ctx.start);
    for(var i = 0;i< ctx.getNSamples();++i)
        {
        var g = ctx.getGenotype(i);
        out.print("\t");
        if(g.isCalled() && !g.isPhased())
            {
            if(g.isHomVar())
                {
                out.print("2");
                }
            else if(g.isHomRef())
                {
                out.print("0");
                }
            else if(g.isHet() && !g.isHetNonRef())
                {
                out.print("1");
                }
            else
                {
                out.print("9");
                }
            }
        else
            {
            out.print("9");
            }
        }
    out.println();
    }

usage:

java -jar dist-1.133/bioalcidae.jar  -F VCF -f filter.js your.vcf

 

ADD COMMENT
0
Entering edit mode

Ok, great. Thank you Pierre. Is there citation info for your tool? I will give this a try now!

ADD REPLY
0
Entering edit mode

Is there any simple way to modify the script to print sample names at the top of the output?

ADD REPLY
0
Entering edit mode

there is no sample name at the top of the output of this script.

ADD REPLY
0
Entering edit mode

It's cool, I figured out how to print the names with the example VCF script on the bioalcidae homepage. :) On a different note, though, I think there's a typo in the script above - I believe the "g.isHomRef()" clause should print a 0, not a 1, and vice-versa with the "g.isHet() && !g.isHetNonRef()" statement.

ADD REPLY
0
Entering edit mode

I believe the "g.isHomRef()" clause should print a 0, not a 1, and vice-versa with the "g.isHet() && !g.isHetNonRef()"

ah yes, you're right

ADD REPLY
0
Entering edit mode

:) Is there any way to make the script above allele-specific? I'd love to see the exact same output, just with an individual comparison to each potential homozygous A,T,C, & G allele - instead of just the reference allele.

ADD REPLY
0
Entering edit mode
5 months ago

how can I get the filter.js file?

ADD COMMENT

Login before adding your answer.

Traffic: 2729 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6