Set random genotypes to missing in VCF file by individuals and sites
21 months ago
selplat21 ▴ 20

I'm trying to assess imputation accuracy in a population sample.

I have a multi-sample VCF file and was thinking of setting some genotypes across different individuals to missing, so that I can impute them and compare the imputed calls to the original file. I want the genotypes to be set to missing at random with respect to both SNP_ID and individual.

I have seen many posts about this, but they only mention how to subset sites, which does not help me in this case.

imputation • 755 views
21 months ago
raphael.B ▴ 520

Hello, you can use the PyVCF Python module to do this. Something like this should do the trick (the function deleteGT works, but I haven't tested the rest of the code):

import vcf
from collections import namedtuple
from random import random, randrange

def deleteGT(record, VCFReader, sample_name):

    """ deleteGT(record, VCFReader, sample_name) ----> ModifiedRecord
    VCFReader: a VCF reader object from the vcf module
    record: a record read from VCFReader
    sample_name: name of a sample contained in VCFReader
    ModifiedRecord: record with the genotype of sample_name set to missing """

    # FORMAT is a colon-separated string such as "GT:DP:GQ"
    fields = record.FORMAT.split(':')
    new_gt = './.'
    new_CallData = namedtuple('CallData', fields)
    # GT comes first in FORMAT; None marks every other field as missing
    calldata = [new_gt] + [None]*(len(fields)-1)
    record.samples[VCFReader.samples.index(sample_name)].data = new_CallData(*calldata)
    return record

VCF = vcf.Reader(filename="path_to_your_vcf")
OUT = vcf.Writer(open("out_path", "w"), VCF)
Samples = VCF.samples
N_samps = len(Samples)
for record in VCF:
    # At roughly 5% of the sites, blank the genotype of one random sample
    if random() < 0.05:
        # randrange excludes N_samps, so the index is always valid
        samp_to_del = Samples[randrange(N_samps)]
        record = deleteGT(record, VCF, samp_to_del)
    OUT.write_record(record)
OUT.close()
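
Note that the loop above blanks at most one sample per selected site. If you would rather mask each genotype independently, so that missingness is random with respect to both site and individual, something like the following might work. It is only a rough sketch that uses plain text handling instead of PyVCF; the function names mask_genotypes and concordance, the rate and seed parameters, and the (chrom, pos, sample) keys are my own assumptions, and it expects an uncompressed, tab-separated VCF with GT listed first in FORMAT.

```python
import random
import re

def mask_genotypes(vcf_lines, rate=0.05, seed=0):
    """Mask each called genotype independently with probability `rate`.

    Returns (masked_lines, truth), where truth maps
    (chrom, pos, sample) to the original GT string.
    """
    rng = random.Random(seed)  # seeded so the masking is reproducible
    samples, masked, truth = [], [], {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            masked.append(line)  # meta-information lines pass through
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]  # sample names follow the FORMAT column
            masked.append(line)
            continue
        fmt = cols[8].split(":")
        if "GT" in fmt:
            gt_idx = fmt.index("GT")
            for i, name in enumerate(samples):
                gt = cols[9 + i].split(":")[gt_idx]
                if "." in gt:
                    continue  # already missing, nothing to mask
                if rng.random() < rate:
                    truth[(cols[0], int(cols[1]), name)] = gt
                    # Blank GT and every other FORMAT field for this sample
                    cols[9 + i] = ":".join(
                        "./." if f == "GT" else "." for f in fmt
                    )
        masked.append("\t".join(cols))
    return masked, truth

def concordance(truth, imputed):
    """Fraction of masked genotypes recovered with the same alleles.

    Phase and allele order are ignored, so 0|1 matches 1/0.
    """
    def alleles(gt):
        return sorted(re.split(r"[/|]", gt))
    if not truth:
        return float("nan")
    hits = sum(
        1 for key, gt in truth.items()
        if key in imputed and alleles(imputed[key]) == alleles(gt)
    )
    return hits / len(truth)
```

You would pass the lines of the original file to mask_genotypes, write the returned lines to a new file for imputation, and keep the truth dictionary. After imputation, build a dictionary with the same keys from the imputed VCF and pass both to concordance to get a simple accuracy figure at the masked positions.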

This is excellent! Thank you so much!
