Question: Convert VCF 012 format to raw genotype data
2.5 years ago by
mike123410
mike123410 wrote:

I would like to convert unphased VCF genotypes into raw genotype data. For example, if my REF is C and ALT is T, I would convert 0/0 to CC, 0/1 to CT, 1/0 to TC, and 1/1 to TT. Bonus question: would it also be possible to convert both 0/1 and 1/0 to the same CT, instead of converting 1/0 to TC? Does vcftools do this, or possibly another tool? Thanks for your help.

genotypes vcftools snps • 1.4k views
modified 2.5 years ago • written 2.5 years ago by mike123410

2.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum123k wrote:

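Using BioAlcidae (from jvarkit), with the following JavaScript saved as script.js. The alleles of each genotype are sorted before printing, so 0/1 and 1/0 both give the same output (e.g. C/T):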

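// 'iter' (the variant iterator) and 'out' (the output stream) are provided by bioalcidae
while(iter.hasNext())
    {
    var ctx= iter.next();
    // first two columns: contig and position
    out.print(ctx.getContig());
    out.print("\t");
    out.print(ctx.getStart());
    // one column per sample
    for(var i=0;i< ctx.getNSamples();++i)
        {
        out.print("\t");
        var g=ctx.getGenotype(i);
        if(g.isNoCall())
            {
            // missing genotype
            out.print("-");
            }
        else
            {
            // collect the allele bases (REF/ALT strings rather than 0/1 indexes)
            var alleles=[];
            for(var k=0;k< g.getAlleles().size();++k)
                {
                alleles.push(g.getAlleles().get(k).getDisplayString());
                }
            // sort so that 0/1 and 1/0 are printed the same way
            alleles.sort();
            out.print(alleles.join("/"));
            }
        }
    out.println();
    }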

Example:

$ java -jar dist/bioalcidae.jar -f script.js input.vcf

rotavirus   51  A/A A/A A/A G/G
rotavirus   91  A/A A/T A/A A/A
rotavirus   130 C/T T/T T/T T/T
rotavirus   232 A/T T/T T/T T/T
rotavirus   267 C/G C/C C/C C/C
rotavirus   424 A/A A/G A/A A/A
rotavirus   520 T/T T/T A/T T/T
rotavirus   536 T/T A/A A/A A/A
rotavirus   562 A/A A/A A/A A/G
rotavirus   583 G/G G/G G/G C/G
rotavirus   661 T/T A/T T/T T/T
rotavirus   693 G/G T/T T/T T/T
rotavirus   738 T/T T/T A/T T/T
rotavirus   799 A/A A/A C/C A/A
rotavirus   812 G/G G/G T/T G/G
rotavirus   833 A/A G/G G/G G/G
rotavirus   916 T/T A/A A/A A/A
rotavirus   946 C/C A/C C/C C/C
rotavirus   961 A/T T/T T/T T/T
rotavirus   1044    A/A T/T A/A A/A
rotavirus   1045    C/C C/C G/G C/C
rotavirus   1054    C/C G/G C/C C/C
rotavirus   1064    G/G G/G G/G A/A
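
For readers who want to see the underlying mapping without BioAlcidae, here is a minimal sketch in plain JavaScript of what happens to a single genotype: each index in the GT field is looked up in the list [REF, ALT1, ALT2, ...], and the resulting bases are sorted. The function name gtToAlleles and its sep parameter are hypothetical, not part of BioAlcidae, htsjdk or vcftools. Unlike the script above, this sketch treats any genotype containing a missing allele as a no-call.

```javascript
// Hypothetical helper (an illustration only, not a BioAlcidae or vcftools API):
// convert one GT string to allele bases, given REF and ALT from the same line.
function gtToAlleles(ref, alt, gt, sep) {
    if (sep === undefined) sep = "";
    // Index 0 is REF; indexes 1..n are the comma-separated ALT alleles.
    var alleles = [ref].concat(alt.split(","));
    // Accept both unphased (/) and phased (|) separators.
    var idx = gt.split(/[\/|]/);
    var out = [];
    for (var k = 0; k < idx.length; ++k) {
        // Assumption: any missing allele makes the whole genotype a no-call.
        if (idx[k] === ".") return "-";
        out.push(alleles[parseInt(idx[k], 10)]);
    }
    // Sorting makes 0/1 and 1/0 come out identically.
    out.sort();
    return out.join(sep);
}

console.log(gtToAlleles("C", "T", "0/0")); // CC
console.log(gtToAlleles("C", "T", "0/1")); // CT
console.log(gtToAlleles("C", "T", "1/0")); // CT
console.log(gtToAlleles("C", "T", "1/1")); // TT
```

Dropping the out.sort() line would keep the original allele order, so 1/0 would give TC as in the first part of the question.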
written 2.5 years ago by Pierre Lindenbaum123k

Thanks a lot for your help! That works great :D

written 2.5 years ago by mike123410

Hi Pierre Lindenbaum, thanks a lot for the script. When I run it on a file like the one below, it gives an error. Any suggestions on how the script could be modified to work with this format as well? Thanks.

#CHROM  POS REF ALT QUAL    FILTER  INFO    FORMAT  atcc2355    cd01    cd02    cd03    cd04    cd05
atcc2355    530 T   C   40  PASS    NA  GT  0   0   0   0   0   0   0   0   
atcc2355    531 A   T   40  PASS    NA  GT  0   0   0   0   0   0   0   0
atcc2355    533 A   C,T 40  PASS    NA  GT  0   0   1   2   0   0   0   0
atcc2355    569 G   A   40  PASS    NA  GT  0   0   0   0   0   0   0   0
atcc2355    573 T   C,A 40  PASS    NA  GT  0   1   1   1   1   1   2   1
atcc2355    599 A   G   40  PASS    NA  GT  0   0   0   0   0   0   0   0
written 14 months ago by mmyoussef100
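
The table in the comment above is not standard VCF as pasted: the header has no ID column, and each row carries eight haploid genotype values for six sample names. That may be why a VCF parser rejects it. One possible workaround, sketched below in plain JavaScript, is to treat the file as a simple whitespace-delimited table instead of a VCF. The function name convertHaploidTable is hypothetical, and the column layout (CHROM POS REF ALT QUAL FILTER INFO FORMAT, then one allele index per column) is an assumption taken from the pasted rows.

```javascript
// Hypothetical converter for the non-standard table above (not a VCF parser).
// Assumed columns: CHROM POS REF ALT QUAL FILTER INFO FORMAT,
// followed by one haploid allele index per remaining column.
function convertHaploidTable(text) {
    var rows = [];
    var lines = text.split(/\r?\n/);
    for (var i = 0; i < lines.length; ++i) {
        var line = lines[i].trim();
        // Skip blank lines and the header line.
        if (line === "" || line.charAt(0) === "#") continue;
        var f = line.split(/\s+/);
        // Index 0 is REF; indexes 1..n are the comma-separated ALT alleles.
        var alleles = [f[2]].concat(f[3].split(","));
        var out = [f[0], f[1]];
        for (var k = 8; k < f.length; ++k) {
            var a = alleles[parseInt(f[k], 10)];
            // "." or an index with no matching allele is printed as "-".
            out.push(a === undefined ? "-" : a);
        }
        rows.push(out.join("\t"));
    }
    return rows.join("\n");
}

// Two rows copied from the comment above, with single-space separators.
var sample = [
    "#CHROM POS REF ALT QUAL FILTER INFO FORMAT atcc2355 cd01 cd02 cd03 cd04 cd05",
    "atcc2355 533 A C,T 40 PASS NA GT 0 0 1 2 0 0 0 0",
    "atcc2355 573 T C,A 40 PASS NA GT 0 1 1 1 1 1 2 1"
].join("\n");
console.log(convertHaploidTable(sample));
```

Under Node.js, the text could be read from disk with require("fs").readFileSync(path, "utf8") before calling the function. Since the genotypes are haploid, there is nothing to sort and each column is a single base.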