VCF to 23andMe format and changing the reference assembly: help needed understanding VCF
3 months ago

Hello, I am trying to convert my Nebula Genomics report to 23andMe format. I have two problems: Nebula uses human reference assembly 38 and 23andMe uses assembly 37. I was thinking of writing a Python script, but I have some doubts:

My plan was to build the genotype in 23andMe format (the two copies of each allele) from the genotype given in the VCF (0/1, etc.). 0 means the reference allele and i means the i-th alternate allele, right?
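A minimal sketch of that reading of the GT field, in Python. The function name `decode_genotype` is made up for illustration, and it assumes the caller has already split the GT subfield out of the sample column (for example, the part before the first `:`).

```python
def decode_genotype(ref, alt_field, gt_field):
    """Return the allele strings that a VCF GT value points at.

    ref       -- REF column, e.g. "A"
    alt_field -- ALT column, comma-separated, e.g. "G,T"
    gt_field  -- GT value only, e.g. "0/1" or "1|2"; "." marks a missing call
    """
    # Index 0 is REF, index i is the i-th ALT allele.
    alleles = [ref] + alt_field.split(",")
    # "/" (unphased) and "|" (phased) both separate the allele indices.
    indices = gt_field.replace("|", "/").split("/")
    return [None if i == "." else alleles[int(i)] for i in indices]
```

Under this reading, `decode_genotype("A", "G", "0/1")` gives one reference copy and one alternate copy.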

But I don't understand this record, for example:

chr4 169311085 rs199775492 ATTT A,AT 1/2

The reference is ATTT? And the sample genotype would be A and AT, right?

But the 23andMe format does not allow that. As far as I understand, the genotype in 23andMe format would be DD, right, since both alleles have deletions? Am I understanding this correctly?
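A rough sketch of the rule I have in mind, in Python. It assumes that D and I can be decided purely by comparing each allele's length with REF, which I have not verified against how 23andMe defines its indel markers. The names `allele_code` and `to_23andme_call` are hypothetical, and anything the sketch cannot decide is written as the no-call `--`.

```python
BASES = set("ACGT")

def allele_code(ref, allele):
    """Map one VCF allele string to a single 23andMe-style character."""
    if not allele or not set(allele) <= BASES:
        return None              # missing, "*", or a symbolic allele like <DEL>
    if len(ref) == 1 and len(allele) == 1:
        return allele            # plain SNP base
    if len(allele) < len(ref):
        return "D"               # shorter than REF: treated as a deletion
    if len(allele) > len(ref):
        return "I"               # longer than REF: treated as an insertion
    return None                  # same length as a multi-base REF: undecided

def to_23andme_call(ref, alleles):
    """Collapse two VCF alleles into a two-character call, or "--"."""
    codes = [allele_code(ref, a) for a in alleles]
    if len(codes) != 2 or None in codes:
        return "--"
    # A base mixed with D/I is not something this sketch knows how to encode.
    if len({c in ("D", "I") for c in codes}) > 1:
        return "--"
    return "".join(codes)
```

For the record above, `to_23andme_call("ATTT", ["A", "AT"])` treats both alleles as shorter than the reference.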

After that is done, to change the assembly I need to change all the IDs and physical coordinates, right? Where can I get such a map? I am thinking of using MongoDB or sqlite3; I don't know which database would be better.
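To make the database idea concrete, here is a small sketch using Python's built-in `sqlite3` module. It assumes I somehow obtain rows of (rsID, chromosome, assembly 37 position); where those rows come from is exactly what I am asking. The table name `build37`, the helper names, and the example row are all made up, and the position in it is a placeholder, not a real coordinate. The output line assumes the 23andMe raw layout of tab-separated rsid, chromosome, position, genotype.

```python
import sqlite3

def build_map(rows):
    """Load (rsid, chrom, pos) rows into an in-memory lookup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE build37 (rsid TEXT PRIMARY KEY, chrom TEXT, pos INTEGER)"
    )
    conn.executemany("INSERT INTO build37 VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def lookup(conn, rsid):
    """Return (chrom, pos) for an rsID, or None if it is not in the map."""
    cur = conn.execute(
        "SELECT chrom, pos FROM build37 WHERE rsid = ?", (rsid,)
    )
    return cur.fetchone()

def format_line(rsid, chrom, pos, call):
    """One tab-separated output row: rsid, chromosome, position, genotype."""
    return f"{rsid}\t{chrom}\t{pos}\t{call}"

# Placeholder row for illustration only; not a real variant or position.
conn = build_map([("rs_example", "4", 12345)])
hit = lookup(conn, "rs_example")
line = format_line("rs_example", hit[0], hit[1], "DD")
```

A single indexed table like this seems enough for a one-off conversion, which is why I lean towards sqlite3 over running a MongoDB server.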

If there is some software that does all of that for me I would be very happy, but I haven't found any. I found this script

but I don't know why it does not work. It's pretty old, so I guess it may assume the input is already on assembly 37, whereas my data was aligned to assembly 38 (I think so, since my VCF file says ##reference=file:///mnt/ssd/MegaBOLT_scheduler/reference/hg38.fa)
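For reference, this is roughly how I read that header line, as a small Python sketch. The function name `reference_from_header` is made up, and it assumes the meta-information lines (those starting with `##`) all come before the `#CHROM` column header.

```python
def reference_from_header(lines):
    """Return the value of the ##reference= meta line, or None if absent."""
    for line in lines:
        if not line.startswith("##"):
            break                # reached #CHROM or the data rows: stop
        if line.startswith("##reference="):
            return line.strip().split("=", 1)[1]
    return None
```

It can be called with an open file handle, for example `reference_from_header(open("my.vcf"))`, where `my.vcf` stands in for the uncompressed VCF.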

Am I understanding things correctly?

