Hi everyone,
I'd appreciate some help from someone familiar with R or Python. I have very minimal programming skills and am trying to write a simple piece of code that does the following:
I have a VCF with missing data. I want a loop to go down the SNP vectors of each pair of individuals and, at every locus where neither individual has missing data, report whether their alleles are identical. The output would include two numbers for each pair of individuals: 1) At how many loci do both individuals have non-missing data? 2) Of those, at how many loci do the individuals have identical alleles?
Any help appreciated!
Take a look at cyvcf2 for parsing VCF files; it can also do what you want.
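If you'd rather see the logic spelled out first, here is a minimal sketch in plain Python. It does not use cyvcf2; it reads the VCF text directly with the standard library. The names parse_gt and pairwise_counts and the demo data are made up for illustration. It assumes an uncompressed, tab-separated VCF where GT is the first FORMAT subfield, treats a genotype as missing if any of its alleles is ".", and ignores phase (so 0/1 and 1|0 count as identical). Adjust those choices if your definition of "identical" differs.

```python
from itertools import combinations

def parse_gt(field):
    """Return the genotype as a sorted tuple of allele codes, or None if missing."""
    gt = field.split(":")[0]                   # assumes GT is the first subfield
    alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
    if "." in alleles or "" in alleles:
        return None                            # any missing allele -> missing genotype
    return tuple(sorted(alleles))

def pairwise_counts(lines):
    """Map (sample_a, sample_b) -> [loci with both present, loci with identical genotypes]."""
    samples = []
    counts = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue                           # skip blank lines and meta-information
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            samples = cols[9:]                 # sample names start at column 10
            counts = {pair: [0, 0] for pair in combinations(samples, 2)}
            continue
        gts = [parse_gt(f) for f in cols[9:]]
        for (i, a), (j, b) in combinations(enumerate(samples), 2):
            if gts[i] is None or gts[j] is None:
                continue                       # at least one individual is missing here
            counts[(a, b)][0] += 1
            if gts[i] == gts[j]:
                counts[(a, b)][1] += 1
    return counts

# Tiny made-up VCF, for illustration only.
demo = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tind1\tind2\tind3",
    "1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/0\t0/0\t./.",
    "1\t200\t.\tC\tT\t.\t.\t.\tGT\t0/1\t1/0\t1/1",
    "1\t300\t.\tG\tA\t.\t.\t.\tGT\t./.\t0/1\t0/1",
]

for (a, b), (shared, identical) in pairwise_counts(demo).items():
    print(a, b, shared, identical)
```

To run it on a real file, pass an open file handle instead of the demo list, e.g. `with open("my.vcf") as fh: counts = pairwise_counts(fh)` (the filename is a placeholder). For a .vcf.gz file, `gzip.open(path, "rt")` from the standard library should work the same way. Note that the number of pairs grows quadratically with the number of individuals, so this simple loop may be slow for large cohorts.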