Question

Genotype Info From 1000 Genomes - Confused By Forward Strand Naming

11.9 years ago

winners • 0

Hi, I'm trying to calculate the genotype frequencies of two CYP2C19 variants (rs4244285 and rs12248560) using the 1000 Genomes data.

However, I'm confused by the "forward strand" naming convention.

I want to determine the percentage of people with *2 / *17, but every list I generate seems to contain the same person twice.

For example, I get this in a list for rs12248560:

HG00099 (F) A|A
HG00099 (F) T|C

It seems to be the same individual, but which genotype is it: A|T, A|C, or A|A? How can I tell what the "typical" genotype would be?

I want to simply note the diplotype.

Many thanks for your help!

1000genomes strand genotyping • 2.3k views

updated 9.5 years ago by Biostar 20 • written 11.9 years ago by winners • 0

score 0 · Answer 1 · 2012-05-29

Can you describe the steps you took to get the genotypes?

It seems that there is a deletion close to this SNP:

# note: you need the latest tabix version from svn for this to work, 
# otherwise you will get an error complaining that the index is too old.
$ tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr10.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 10:96521657-96521657 | cut -f 1,2,3,4,5,12
10  96497183    MERGED_DEL_2_60889  A   <DEL>   0|0:0.000:0,0,0
10  96521657    rs12248560  C   T   1|0:1.000:-3.85,-0.00,-5.00

Maybe your script is confused by the deletion. Try to filter out anything that is not a SNP.
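
As a rough illustration of that filter, here is a minimal Python sketch. It assumes each input line has already been cut down to six whitespace-separated columns, as in the tabix output above (CHROM, POS, ID, REF, ALT, plus one sample column). The function names `is_snp`, `decode_genotype` and `snp_genotypes` are made up for this example and are not part of tabix or any 1000 Genomes tool.

```python
def is_snp(ref, alt):
    """True when REF and every ALT allele are a single A/C/G/T base."""
    bases = {"A", "C", "G", "T"}
    return ref in bases and all(a in bases for a in alt.split(","))


def decode_genotype(ref, alt, sample_field):
    """Turn a VCF sample field such as '1|0:1.000:...' into bases, e.g. 'T|C'."""
    alleles = [ref] + alt.split(",")      # index 0 = REF, 1.. = ALT alleles
    gt = sample_field.split(":")[0]       # GT is the first colon-separated subfield
    sep = "|" if "|" in gt else "/"       # '|' = phased, '/' = unphased
    return sep.join("." if i == "." else alleles[int(i)] for i in gt.split(sep))


def snp_genotypes(lines, rsid):
    """Yield decoded genotypes for SNP records whose ID column equals rsid.

    Assumption: each line holds exactly the six columns described above.
    """
    for line in lines:
        if line.startswith("#"):          # skip VCF header lines
            continue
        chrom, pos, vid, ref, alt, sample = line.split()[:6]
        if vid == rsid and is_snp(ref, alt):
            yield decode_genotype(ref, alt, sample)


# The two records from the tabix output above.
records = [
    "10\t96497183\tMERGED_DEL_2_60889\tA\t<DEL>\t0|0:0.000:0,0,0",
    "10\t96521657\trs12248560\tC\tT\t1|0:1.000:-3.85,-0.00,-5.00",
]
print(list(snp_genotypes(records, "rs12248560")))  # ['T|C']
```

On these two records only the rs12248560 line survives, and it decodes to T|C. The deletion record has REF A and genotype 0|0, so a script that decodes it without checking the variant type would report A|A, which may be where the extra line in the question comes from.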