Hi,
I have recently been working with data from the Personal Genome Project, and I downloaded the raw data for one individual's whole genome.
My main concern is the data format. Complete Genomics releases each individual's genome in its own format, called masterVar, which looks like this:
#ASSEMBLY_ID    GS000014558-ASM
#COSMIC    COSMIC v48
#DBSNP_BUILD    dbSNP build 132
#GENOME_REFERENCE    NCBI build 37
#SAMPLE    GS01669-DNA_D02
#GENERATED_BY    cgatools
#GENERATED_AT    2012-Sep-28 19:43:38.251270
#SOFTWARE_VERSION    2.0.4.14
#FORMAT_VERSION    2.0
#GENERATED_BY    dbsnptool
#TYPE    VAR-ANNOTATION
>locus    ploidy    allele    chromosome    begin    end    varType    reference    alleleSeq    varScoreVAF    varScoreEAF    varQuality    hapLink    xRef
17    2    all    chr1    11365    11370    ref    =    =                    
302    2    1    chr1    21579    21580    snp    C    T    123    123    VQHIGH        dbsnp.83:rs526642
302    2    2    chr1    21579    21580    snp    C    T    153    153    VQHIGH        dbsnp.83:rs526642
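To make the layout of this excerpt concrete, here is a minimal Python sketch that reads it. It assumes the file is tab-separated (the spacing above looks like rendered tabs), that "#KEY value" lines carry metadata, that the single ">" line names the columns, and that every later line is one data row. read_mastervar is a hypothetical helper for illustration, not part of the Complete Genomics tools.

```python
import io

def read_mastervar(handle):
    """Return (metadata, rows) for a masterVar-style, tab-separated file.

    metadata maps each "#KEY" to a list of values (keys such as
    GENERATED_BY can repeat); each row is a dict keyed by column name.
    """
    metadata = {}
    columns = None
    rows = []
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Metadata line, e.g. "#GENOME_REFERENCE<TAB>NCBI build 37".
            key, _, value = line[1:].partition("\t")
            metadata.setdefault(key, []).append(value)
        elif line.startswith(">"):
            # Column header line, e.g. ">locus<TAB>ploidy<TAB>allele...".
            columns = line[1:].split("\t")
        else:
            if columns is None:
                raise ValueError("data row found before the '>' header line")
            fields = line.split("\t")
            # Pad short rows so trailing empty columns still get a key.
            fields += [""] * (len(columns) - len(fields))
            rows.append(dict(zip(columns, fields)))
    return metadata, rows

# The rows below are copied from the excerpt, with tabs written explicitly.
sample = "\n".join([
    "#GENOME_REFERENCE\tNCBI build 37",
    "#GENERATED_BY\tcgatools",
    "#GENERATED_BY\tdbsnptool",
    ">locus\tploidy\tallele\tchromosome\tbegin\tend\tvarType\treference"
    "\talleleSeq\tvarScoreVAF\tvarScoreEAF\tvarQuality\thapLink\txRef",
    "17\t2\tall\tchr1\t11365\t11370\tref\t=\t=",
    "302\t2\t1\tchr1\t21579\t21580\tsnp\tC\tT\t123\t123\tVQHIGH\t\tdbsnp.83:rs526642",
    "302\t2\t2\tchr1\t21579\t21580\tsnp\tC\tT\t153\t153\tVQHIGH\t\tdbsnp.83:rs526642",
])
meta, rows = read_mastervar(io.StringIO(sample))
print(meta["GENERATED_BY"])
print(len(rows), rows[1]["allele"], rows[1]["alleleSeq"])
```

Note that locus 302 comes back as two rows, one for allele 1 and one for allele 2, while locus 17 is a single "all" row covering a reference-matching region.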
They provide some tools for working with it, and I tried converting the file to VCF with their tool, but what I get is a strange-looking VCF with duplicated entries and inconsistent information.
Has anyone dealt with this before?
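One possible source of the duplicated entries, offered as a guess rather than a diagnosis: masterVar writes each allele of a diploid locus on its own row (locus 302 above appears twice), so any conversion that emits one VCF line per row instead of one per locus would duplicate sites. The sketch below merges the allele rows of each locus into a single VCF-style line with a GT genotype. It is a hypothetical illustration, not the logic of the Complete Genomics converter, and it rests on stated assumptions: coordinates are 0-based and half-open (so VCF POS = begin + 1), only diploid loci made of "snp" and reference rows are handled, and everything else (no-calls, indels, "all" rows) is skipped.

```python
from collections import OrderedDict

def is_ref_like(row):
    # Assumption: an allele matches the reference if it is typed "ref" or
    # its sequence equals the reference ("=" marks an elided sequence).
    return row["varType"] == "ref" or row["alleleSeq"] in ("=", row["reference"])

def snp_loci_to_vcf(rows):
    """Return one VCF-style data line per diploid SNP locus (hypothetical)."""
    by_locus = OrderedDict()
    for row in rows:
        by_locus.setdefault(row["locus"], []).append(row)

    lines = []
    for alleles in by_locus.values():
        # Keep only loci written as two separate allele rows, "1" and "2".
        if sorted(a["allele"] for a in alleles) != ["1", "2"]:
            continue
        snps = [a for a in alleles if a["varType"] == "snp"]
        if not snps:
            continue
        # Skip loci containing anything other than SNP or reference rows.
        if any(a["varType"] != "snp" and not is_ref_like(a) for a in alleles):
            continue
        # All SNP rows must describe the same base.
        if len({(a["begin"], a["end"], a["reference"]) for a in snps}) != 1:
            continue
        first = snps[0]
        alts = []
        for a in snps:
            if a["alleleSeq"] not in alts:
                alts.append(a["alleleSeq"])
        # Genotype: 0 for a reference allele, else the 1-based ALT index.
        gt = ["0" if is_ref_like(a) else str(alts.index(a["alleleSeq"]) + 1)
              for a in sorted(alleles, key=lambda r: r["allele"])]
        pos = int(first["begin"]) + 1  # assumed 0-based begin -> 1-based POS
        lines.append("\t".join([first["chromosome"], str(pos), ".",
                                first["reference"], ",".join(alts),
                                ".", ".", ".", "GT", "/".join(gt)]))
    return lines

# First nine columns of the three excerpt rows, tabs written explicitly.
columns = ["locus", "ploidy", "allele", "chromosome", "begin", "end",
           "varType", "reference", "alleleSeq"]
excerpt = [
    "17\t2\tall\tchr1\t11365\t11370\tref\t=\t=",
    "302\t2\t1\tchr1\t21579\t21580\tsnp\tC\tT",
    "302\t2\t2\tchr1\t21579\t21580\tsnp\tC\tT",
]
rows = [dict(zip(columns, line.split("\t"))) for line in excerpt]
for vcf_line in snp_loci_to_vcf(rows):
    print(vcf_line)
```

With these rows the two allele lines of locus 302 collapse into a single homozygous record (C to T at position 21580, GT 1/1), and the "all" row for locus 17 is skipped.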
Thanks in advance!
P.
Hi, I'm dealing with the same issue. Did you find any way to convert Complete Genomics data to VCF or to PLINK PED format without bugs?
Thank you.
Hi, unfortunately I was not able to make it work and have given up for now. As I said, I tried several conversion tools, but all of them returned a very odd file with obvious errors compared to the original. I am really surprised that I could not find any further information explaining this issue in more detail... Anyway, if you learn anything else, please let me know. Best,