Genotype Info From 1000 Genomes - Confused By Forward Strand Naming
11.9 years ago
winners • 0

Hi, I'm trying to calculate the genotype frequencies of two CYP2C19 variants (rs4244285 and rs12248560) using the 1000 Genomes data.

However, I'm confused by the "forward strand" naming convention.

I want to determine the percentage of people who carry *2 / *17, but every list I generate seems to contain the same person twice.

For example, I get this in a list for rs12248560:

  • HG00099 (F) A|A
  • HG00099 (F) T|C

It seems to be the same individual, but which genotype is it: A|T, A|C, or A|A? How can I tell what the "typical" genotype would be?

I want to simply note the diplotype.

Many thanks for your help!

1000genomes strand genotyping • 2.3k views
11.9 years ago

Can you describe which steps you do to get the genotypes?

It seems that there is a deletion close to this SNP:

# note: you need the latest tabix version from svn for this to work, 
# otherwise you will get an error complaining that the index is too old.
$: tabix ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20110521/ALL.chr10.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz 10:96521657-96521657 | cut -f 1,2,3,4,5,12
10  96497183    MERGED_DEL_2_60889  A   <DEL>   0|0:0.000:0,0,0
10  96521657    rs12248560  C   T   1|0:1.000:-3.85,-0.00,-5.00

Maybe your script is confused by the deletion. Try to filter out anything that is not a SNP.
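As a rough illustration of that filter, here is a minimal Python sketch. It assumes the tabix output (plus the `#CHROM` header line) is available as plain VCF text lines; the function names, the sample names, and the toy records are all made up for the example and are not taken from the 1000 Genomes file. It keeps only records whose REF and ALT are single bases, then translates the GT indices into bases. Note that a `0|0` call on the deletion record (REF `A`) would decode to `A|A`, while `1|0` on the C/T SNP decodes to `T|C`, which may be where the two lines in the question come from.

```python
from collections import Counter

BASES = {"A", "C", "G", "T"}


def is_snp(ref, alt):
    """True when REF and every ALT allele are single bases.

    This rejects indels and symbolic alleles such as <DEL>.
    """
    return ref in BASES and all(a in BASES for a in alt.split(","))


def snp_genotypes(vcf_lines, rsid):
    """Yield (sample, genotype as bases) for SNP records with the given ID."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at column 10
            continue
        vid, ref, alt = fields[2], fields[3], fields[4]
        if vid != rsid or not is_snp(ref, alt):
            continue  # drop other variants and anything that is not a SNP
        alleles = [ref] + alt.split(",")
        for sample, call in zip(samples, fields[9:]):
            gt = call.split(":")[0]  # GT is the first FORMAT subfield
            sep = "|" if "|" in gt else "/"
            indices = gt.split(sep)
            if "." in indices:
                continue  # missing call
            yield sample, sep.join(alleles[int(i)] for i in indices)


def genotype_counts(vcf_lines, rsid):
    """Count genotypes per SNP, ignoring phase (T|C and C|T both become C/T)."""
    counts = Counter()
    for _, gt in snp_genotypes(vcf_lines, rsid):
        unphased = "/".join(sorted(gt.replace("|", "/").split("/")))
        counts[unphased] += 1
    return counts


# Toy input: hypothetical sample names and genotype columns, for illustration only.
TOY_VCF = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE_A\tSAMPLE_B",
    "10\t96497183\tMERGED_DEL_2_60889\tA\t<DEL>\t.\tPASS\t.\tGT\t0|0\t0|0",
    "10\t96521657\trs12248560\tC\tT\t.\tPASS\t.\tGT\t1|0\t0|0",
]

if __name__ == "__main__":
    for sample, gt in snp_genotypes(TOY_VCF, "rs12248560"):
        print(sample, gt)
    print(dict(genotype_counts(TOY_VCF, "rs12248560")))
```

With one genotype per sample per SNP, the genotype frequency is just each count divided by the number of samples with a call.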


A description of how you are generating your list (even a code example) would help a lot in figuring out what might be going on.
