Question: Discrepancy between NCBI/VCF and NCBI/HTML
4.5 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

In the VCF for dbSNP build 137, rs11412589 is reported at chr9:74300311:

$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz" |gunzip -c |  grep BUILD
##dbSNP_BUILD_ID=137

$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/00-All.vcf.gz" | gunzip -c | cut -f 1-5 | awk -F '\t' '($3=="rs11412589")'
9    74300311    rs11412589    T    TA,TAA

But on its dbSNP page the position is shifted by 18 bases (I know it's an indel, but...): chr9:74300329:74300330.

The UCSC MySQL database reports the same shifted position:

+-------+------------+----------+
| chrom | chromStart | chromEnd |
+-------+------------+----------+
| chr9  |   74300329 | 74300329 |
+-------+------------+----------+
1 row in set (0.19 sec)

Is it me, or is it a bug?

Pierre

Tags: ncbi, vcf, dbsnp, error

It seems to me to be a bug in the VCF. The base at chr9:74300311 is "A", not "T". Actually, there is no "T" anywhere in this region.

written 4.5 years ago by lh3

I just sent a mail to the NCBI...

written 4.5 years ago by Pierre Lindenbaum
4.5 years ago, Pierre Lindenbaum wrote:

Here is the answer from the NCBI:

Hi,

The reason for this difference is that the VCF format checks for repeats. For variations happening within a repeat region, it will left justify the position to the base that is NOT a repeat unit.

For this variation, the flanking clearly indicates it is within a poly A region:

rs11412589 [Homo sapiens] 
TACTCCCTAAAAAAAAAAAAAAAAAA[-/A/AA]GAAAAAGAAAAAAAATCAATTTTTA

The BLAST alignment shows this more directly:

SNP  1         TACTCCCTAAAAAAAAAAAAAAAAAAWGAAAAAGAAAAAAAATCAATTTTTA  52
               |||||||||||||||||||||||||| |||||||||||||||||||||||||
ch9  74300304  TACTCCCTAAAAAAAAAAAAAAAAAA-GAAAAAGAAAAAAAATCAATTTTTA  74300354

in which the T is at 74300311.

Regards,
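
To make the left-justification rule concrete, here is a minimal sketch of it in shell/awk. This is not NCBI's actual code: the function name left_align_insertion and its four arguments are made up for illustration, and it only handles a pure insertion into a reference sequence that extends to the left of the repeat. It walks the insertion point leftwards while the base on its left equals the last inserted base, then reports the anchor base in VCF style.

```shell
# left_align_insertion SEQ START POS INS   (hypothetical helper, illustration only)
#   SEQ    reference sequence covering the region
#   START  1-based genomic coordinate of the first base of SEQ
#   POS    1-based coordinate of the base immediately left of the insertion
#   INS    inserted bases
# Prints the anchor position, REF and ALT, tab-separated, as in a VCF line.
left_align_insertion() {
  awk -v seq="$1" -v start="$2" -v pos="$3" -v ins="$4" 'BEGIN {
    n = length(ins)
    # While the base left of the insertion equals the last inserted base,
    # rotate the inserted string and move the insertion one base to the left.
    while (pos >= start && substr(seq, pos - start + 1, 1) == substr(ins, n, 1)) {
      ins = substr(ins, n, 1) substr(ins, 1, n - 1)
      pos--
    }
    # The base now left of the insertion is the non-repeat anchor base.
    ref = substr(seq, pos - start + 1, 1)
    printf "%d\t%s\t%s\n", pos, ref, ref ins
  }'
}

# Genomic sequence from the BLAST alignment above (gap removed), starting at chr9:74300304.
flank="TACTCCCTAAAAAAAAAAAAAAAAAAGAAAAAGAAAAAAAATCAATTTTTA"

# dbSNP/UCSC place the insertion between 74300329 and 74300330.
left_align_insertion "$flank" 74300304 74300329 A
left_align_insertion "$flank" 74300304 74300329 AA
```

With the flank from the alignment, both calls end on the T at 74300311 and print T/TA and T/TAA, which is what the VCF line shows. The 18 A's between 74300312 and 74300329 account for the 18-base shift.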


I misread 74300311 in your original post as 74300331. No wonder I did not find any T... Thanks.

written 4.5 years ago by lh3