Question: dbSNP SNPHistory file: what does each column correspond to?
3.5 years ago, Nolwenn Lavielle (Paris, France) wrote:

Hello,

I am looking for information about the SNPHistory.bcp.gz file provided by dbSNP (NCBI).

I found some short (too short?) documentation for the SNPHistory table: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_db_table_description.cgi?t=SNPHistory
The problem is that it does not say much about the columns.
In the file I downloaded, if I print the first 10 lines, I can see 5 columns:

$ zhead SNPHistory.bcp.gz
311    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191    
332    2000-09-19 17:02:00.0    2011-01-11 17:12:00.0    2011-05-20 17:31:00.0        
471    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191    
668    2000-09-19 17:02:00.0    2014-08-21 18:14:00.0    2014-08-26 00:20:00.0    SNP-6860    
730    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191    
743    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191    
744    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191    
745    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191    
799    2000-09-19 17:02:00.0    2012-11-29 09:31:00.0    2013-11-18 14:52:00.0    SNP-6191    
840        2000-08-22 15:29:00.0    2000-09-19 14:28:00.0       

But if I count the number of fields, I get 6 columns on every line:

$ zcat SNPHistory.bcp.gz | awk -F"\t" '{print NF}' | sort | uniq -c
17390806 6

And if I print the first 10 lines again, showing all characters:

$ zhead SNPHistory.bcp.gz | cat -A
311^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
332^I2000-09-19 17:02:00.0^I2011-01-11 17:12:00.0^I2011-05-20 17:31:00.0^I^I$
471^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
668^I2000-09-19 17:02:00.0^I2014-08-21 18:14:00.0^I2014-08-26 00:20:00.0^ISNP-6860^I$
730^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
743^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
744^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
745^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
799^I2000-09-19 17:02:00.0^I2012-11-29 09:31:00.0^I2013-11-18 14:52:00.0^ISNP-6191^I$
840^I^I2000-08-22 15:29:00.0^I2000-09-19 14:28:00.0^I^I$

So I wonder: what are the last 2 columns? Could anyone help me?

I need to use this file to know which SNPs are suppressed but I don't know how to interpret those columns...

Thanks in advance.

3.5 years ago, Max Ivon (Russian Federation) wrote:

According to the database schema (available at the dbSNP FTP site):

CREATE TABLE [SNPHistory]
(
[snp_id] [int] NOT NULL ,
[create_time] [smalldatetime] NULL ,
[last_updated_time] [smalldatetime] NOT NULL ,
[history_create_time] [smalldatetime] NULL ,
[comment] [varchar](255) NULL ,
[reactivated_time] [smalldatetime] NULL
)
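
These six columns line up with the six tab-separated fields counted in the question. A minimal Python sketch of that mapping, assuming the .bcp dump writes its fields in the same order as the CREATE TABLE statement and that an empty field stands for NULL (the helper name parse_history_line is made up for illustration):

```python
from typing import Dict, Optional

# Column names taken from the CREATE TABLE statement; the assumption is that
# the .bcp dump writes its fields in this same order.
COLUMNS = [
    "snp_id",
    "create_time",
    "last_updated_time",
    "history_create_time",
    "comment",
    "reactivated_time",
]


def parse_history_line(line: str) -> Dict[str, Optional[str]]:
    """Split one tab-separated line into a dict keyed by column name.

    Empty fields are returned as None (assumed to stand for NULL).
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return {
        name: (value if value.strip() else None)
        for name, value in zip(COLUMNS, fields)
    }


# First line of the question's `cat -A` output (^I is a tab, $ the line end).
row = parse_history_line(
    "311\t2000-09-19 17:02:00.0\t2012-11-29 09:31:00.0\t2013-11-18 14:52:00.0\tSNP-6191\t\n"
)
print(row["comment"], row["reactivated_time"])  # SNP-6191 None
```

Read this way, the fifth field is comment and the trailing empty sixth field is reactivated_time, which is why only five columns are visible when the lines are printed.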

I think you can find more information about this file in this FAQ: http://www.ncbi.nlm.nih.gov/books/NBK44468/. As stated there, SNPHistory contains only deleted mutations, so if you want the suppressed ones, just take the lines where reactivated_time (the sixth and last field) is not defined.
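
A rough Python sketch of that filter, assuming reactivated_time is the sixth tab-separated field and that an empty field means it is not defined. The function names are hypothetical, and the second sample row carries an invented reactivated_time purely to show a row being dropped:

```python
import gzip
from typing import Iterable, Iterator

# Sixth column in the CREATE TABLE order (assumed to match the dump order).
REACTIVATED_TIME_INDEX = 5


def suppressed_snp_ids(lines: Iterable[str]) -> Iterator[int]:
    """Yield snp_id for every row whose reactivated_time field is empty."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= REACTIVATED_TIME_INDEX:
            continue  # skip blank or malformed lines
        if not fields[REACTIVATED_TIME_INDEX].strip():
            yield int(fields[0])


def suppressed_from_file(path: str) -> Iterator[int]:
    """Stream a gzip-compressed dump such as SNPHistory.bcp.gz."""
    with gzip.open(path, "rt") as handle:
        yield from suppressed_snp_ids(handle)


# Small in-memory example. The first row is copied from the question; the
# second row is invented (made-up id and reactivated_time) for illustration.
sample = [
    "311\t2000-09-19 17:02:00.0\t2012-11-29 09:31:00.0\t2013-11-18 14:52:00.0\tSNP-6191\t\n",
    "999\t2000-09-19 17:02:00.0\t2012-11-29 09:31:00.0\t2013-11-18 14:52:00.0\t\t2014-01-01 00:00:00.0\n",
]
print(list(suppressed_snp_ids(sample)))  # [311]
```

On the real file, something like `for snp_id in suppressed_from_file("SNPHistory.bcp.gz"): ...` would stream the rows without loading all of them into memory.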

 
