Multiple rsIDs at chromosomal location?
4.9 years ago
rrbutleriii

In the VCF format, the ID field may contain multiple semicolon-separated values. In theory, a single line could therefore carry two dbSNP rsIDs (e.g. two indels at the same chr:pos), but for programming purposes that should not happen, correct? Has dbSNP merged all variants at a given position into a common rsID?

SNP annotation vcf
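For reference, here is a minimal sketch of how a parser might read the ID column under the rule described in the question: the column is either `.` (no identifier) or a semicolon-separated list. The function name `vcf_ids` is hypothetical, and the sketch assumes tab-delimited VCF text on standard input.

```shell
# Hypothetical helper (not from any library): print every identifier in the
# ID column (column 3) of VCF data lines, one per line, prefixed by CHROM and POS.
vcf_ids() {
  awk -F'\t' '
    /^#/ { next }          # skip meta-information and header lines
    $3 == "." { next }     # "." means no identifier for this record
    {
      n = split($3, ids, ";")   # the ID column may hold several IDs separated by ";"
      for (i = 1; i <= n; i++) print $1 "\t" $2 "\t" ids[i]
    }'
}
```

Applied to the hypothetical record `1 10051 rs1052373574;rs1326880612 A G,AC`, it emits one line per rsID, so downstream code never has to special-case the semicolon.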
4.9 years ago

dbSNP has merged all variants for a given position to a common rsID?

I'm afraid not:

$ wget -q -O - "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.vcf.gz" | gunzip -c | grep -v "^#" | cut -f 1,2 | uniq -d | head
1   10051
1   10055
1   10108
1   10109
1   10128
1   10132
1   10177
1   10228
1   10229
1   10235

.

$ wget -q -O - "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.vcf.gz" | gunzip -c | grep -v "^#" | cut -f 1-5 | grep -w 10051 -m2
1   10051   rs1052373574    A   G
1   10051   rs1326880612    A   AC
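The two commands above can also be folded into a single pass. This is a minimal sketch, not the answerer's method: the hypothetical function `multi_record_positions` reports every position that has more than one record, together with the record count and the semicolon-joined IDs. Like `uniq -d`, it only compares adjacent lines, so it assumes position-sorted input such as the dbSNP VCF.

```shell
# Hypothetical helper: for position-sorted VCF data, report positions that carry
# more than one record, printing CHROM, POS, the record count and all IDs joined by ";".
multi_record_positions() {
  awk -F'\t' -v OFS='\t' '
    function flush() { if (n > 1) print chrom, pos, n, ids }
    /^#/ { next }                        # skip header lines
    {
      if ($1 == chrom && $2 == pos) {    # same position as the previous record
        n++; ids = ids ";" $3
      } else {                           # new position: report the previous group
        flush()
        chrom = $1; pos = $2; ids = $3; n = 1
      }
    }
    END { flush() }                      # report the last group
  '
}
```

It slots into the same pipeline, e.g. `wget -q -O - "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF/All_20180418.vcf.gz" | gunzip -c | multi_record_positions | head`.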

Follow-up: so when parsing a VCF, should I expect some variant callers to give me a single record like this?

1   10051   rs1052373574;rs1326880612   A   G,AC

I have never seen that, but I don't see anything in the spec that prohibits it.


Correct, nothing prohibits it; however, such records can cause problems for downstream analysis tools, many of which do not handle multi-allelic records like this well.
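If a downstream tool cannot cope with such records, they can be split into one record per ALT allele. Below is a deliberately simplified sketch (the function `split_multiallelic` is hypothetical): it only splits the ALT column and copies every other column, so per-allele INFO/FORMAT values and genotypes are not adjusted; `bcftools norm -m -any` is the usual tool for doing this properly. The ID column is copied unchanged to every output record, because VCF does not tie individual IDs to individual ALT alleles. Note that this bends the spec's rule that an identifier should not appear in more than one data record.

```shell
# Hypothetical helper: split multi-allelic VCF records into one record per ALT allele.
# Simplification: only ALT (column 5) is split; ID, INFO and sample columns are copied
# as-is, so per-allele fields (Number=A/R) and genotypes are NOT adjusted.
split_multiallelic() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { print; next }             # pass header lines through unchanged
    {
      n = split($5, alts, ",")       # ALT alleles are comma-separated
      for (i = 1; i <= n; i++) {
        $5 = alts[i]                 # rewrite ALT; awk rebuilds the line with tabs
        print
      }
    }'
}
```

For example, `gunzip -c calls.vcf.gz | split_multiallelic > calls.split.vcf`, where `calls.vcf.gz` is a placeholder file name.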
