Convert VCF 012 format to raw genotype data
7.1 years ago
mike1234 ▴ 10

I would like to convert unphased VCF genotypes into raw genotype data. For example, if my REF is C and ALT is T, I would convert 0/0 to CC, 0/1 to CT, 1/0 to TC and 1/1 to TT. Bonus question: would it also be possible to convert both 0/1 and 1/0 to the same CT, instead of converting 1/0 to TC? Does vcftools do this, or possibly another tool? Thanks for your help.
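To make the requested mapping concrete, here is a minimal sketch in plain JavaScript. The function `genotypeToAlleles` and its `sortAlleles` flag are hypothetical names made up for illustration; they are not part of vcftools or any other tool. It assumes GT values are allele indexes separated by `/` or `|`, with index 0 meaning REF and 1..n meaning the comma-separated ALT alleles.

```javascript
// Hypothetical helper for illustration only (not a vcftools or bioalcidae API).
// Converts a VCF GT string such as "0/1" into allele letters such as "CT".
function genotypeToAlleles(ref, alt, gt, sortAlleles) {
    // Index 0 is REF; indexes 1..n are the comma-separated ALT alleles.
    var alleles = [ref].concat(alt.split(","));
    // GT separators are "/" (unphased) or "|" (phased).
    var calls = gt.split(/[\/|]/).map(function (idx) {
        // "." marks a missing call and is passed through unchanged.
        return idx === "." ? "." : alleles[parseInt(idx, 10)];
    });
    // Sorting makes 0/1 and 1/0 produce the same string (the "bonus question").
    if (sortAlleles) {
        calls.sort();
    }
    return calls.join("");
}

console.log(genotypeToAlleles("C", "T", "0/0", false)); // CC
console.log(genotypeToAlleles("C", "T", "1/0", false)); // TC
console.log(genotypeToAlleles("C", "T", "1/0", true));  // CT
```

With `sortAlleles` set to `true`, heterozygous calls always come out in alphabetical allele order, whichever way round the indexes were written.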

vcftools snps genotypes • 3.4k views
7.1 years ago

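Using bioalcidae, with a script that prints the alleles of each genotype in sorted order (so 0/1 and 1/0 give the same output):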

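// `iter` (the variant iterator) and `out` (the output stream) are supplied by bioalcidae.
while(iter.hasNext())
    {
    var ctx= iter.next();
    // one output row per variant: contig, position, then one column per sample
    out.print(ctx.getContig());
    out.print("\t");
    out.print(ctx.getStart());
    for(var i=0;i< ctx.getNSamples();++i)
        {
        out.print("\t");
        var g=ctx.getGenotype(i);
        if(g.isNoCall())
            {
            // missing genotype
            out.print("-");
            }
        else
            {
            // collect the allele bases of this genotype
            var alleles=[];
            for(var k=0;k< g.getAlleles().size();++k)
                {
                alleles.push(g.getAlleles().get(k).getDisplayString());
                }
            // sort so that 0/1 and 1/0 print identically
            alleles.sort();
            out.print(alleles.join("/"));
            }
        }
    out.println();
    }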

Example:

$ java -jar dist/bioalcidae.jar -f script.js input.vcf

rotavirus   51  A/A A/A A/A G/G
rotavirus   91  A/A A/T A/A A/A
rotavirus   130 C/T T/T T/T T/T
rotavirus   232 A/T T/T T/T T/T
rotavirus   267 C/G C/C C/C C/C
rotavirus   424 A/A A/G A/A A/A
rotavirus   520 T/T T/T A/T T/T
rotavirus   536 T/T A/A A/A A/A
rotavirus   562 A/A A/A A/A A/G
rotavirus   583 G/G G/G G/G C/G
rotavirus   661 T/T A/T T/T T/T
rotavirus   693 G/G T/T T/T T/T
rotavirus   738 T/T T/T A/T T/T
rotavirus   799 A/A A/A C/C A/A
rotavirus   812 G/G G/G T/T G/G
rotavirus   833 A/A G/G G/G G/G
rotavirus   916 T/T A/A A/A A/A
rotavirus   946 C/C A/C C/C C/C
rotavirus   961 A/T T/T T/T T/T
rotavirus   1044    A/A T/T A/A A/A
rotavirus   1045    C/C C/C G/G C/C
rotavirus   1054    C/C G/G C/C C/C
rotavirus   1064    G/G G/G G/G A/A

Hi Pierre Lindenbaum, thanks a lot for the script. When I run it on a file like the one below, it gives an error. Any suggestion on how the script could be modified to also work with this format? Thanks.

#CHROM  POS REF ALT QUAL    FILTER  INFO    FORMAT  atcc2355    cd01    cd02    cd03    cd04    cd05
atcc2355    530 T   C   40  PASS    NA  GT  0   0   0   0   0   0   0   0   
atcc2355    531 A   T   40  PASS    NA  GT  0   0   0   0   0   0   0   0
atcc2355    533 A   C,T 40  PASS    NA  GT  0   0   1   2   0   0   0   0
atcc2355    569 G   A   40  PASS    NA  GT  0   0   0   0   0   0   0   0
atcc2355    573 T   C,A 40  PASS    NA  GT  0   1   1   1   1   1   2   1
atcc2355    599 A   G   40  PASS    NA  GT  0   0   0   0   0   0   0   0
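Note that the rows above do not follow the standard VCF column order (`#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT`): there is no ID column, and each genotype field holds a single allele index, so a VCF parser may reject the file. As a rough workaround sketch outside bioalcidae, the plain JavaScript below converts one such row. It assumes whitespace-separated columns in exactly the layout shown and treats every genotype column as one haploid call; `haploidRowToAlleles` is a hypothetical name made up for this example.

```javascript
// Hypothetical helper for the non-standard layout shown above:
// CHROM POS REF ALT QUAL FILTER INFO FORMAT call1 call2 ...
// Assumptions: no ID column, and each call is a single allele index.
function haploidRowToAlleles(line) {
    var fields = line.trim().split(/\s+/);
    // Index 0 is REF (column 3); indexes 1..n are the ALT alleles (column 4).
    var alleles = [fields[2]].concat(fields[3].split(","));
    // Everything after the FORMAT column is taken as a genotype call.
    var calls = fields.slice(8).map(function (idx) {
        // "." marks a missing call; print it as "-" like the script above.
        return idx === "." ? "-" : alleles[parseInt(idx, 10)];
    });
    return [fields[0], fields[1]].concat(calls).join("\t");
}

var row = "atcc2355 533 A C,T 40 PASS NA GT 0 0 1 2 0 0 0 0";
console.log(haploidRowToAlleles(row));
// prints (tab-separated): atcc2355 533 A A C T A A A A
```

The sketch ignores the header line, so it does not attempt to reconcile the number of sample names with the number of genotype columns.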