Question: Importing SNP and phenotype data from dbGaP into R
rdlady0 wrote (6 months ago):

I am working on a project that consists of finding associations between SNPs and certain phenotypes using data sets from the dbGaP database. I found some interesting data sets, downloaded them from dbGaP, and then decrypted and extracted them.

This resulted in some folders with idat format files, gtc.txt files, and some phenotype files in xml format.

I would like to use this data as input for analysis in R with packages like SNPassoc, snpMatrix, or GenABEL.

The problem is that these R packages seem to expect a tab-delimited plain-text table containing the sample ID, phenotype data, SNP content, etc. This format is very different from the idat, gtc.txt, and xml formats that I found in the dbGaP data.

Is there an R package or any script/program that can take all the dbGaP data (idat, gtc.txt, and phenotype info in xml) and generate summary tables like the one required by these R packages?

Here are some examples of the files found in the dbGaP data that I have extracted:

gtc.txt:

SNP Name  GC Score    Allele1 - Top  Allele2 - Top  Allele1 - AB  Allele2 - AB  X                     Y                    Raw X  Raw Y
200003    0.9226053   A              A              A             A             0.934740661177471     0.0394069163635861   7614   1009
200006    0.80280876  G              G              B             B             0.03840060068975691   1.5842950219375036   788    19290
200047    0.7352572   A              A              A             A             0.42971193949905434   0.03922872128858323  3636   953
200050    0.789192    G              G              B             B             0.020351741593668694  1.0929231320570174   545    9315
200052    0.9563731   T              T              B             B             0.01696443095800867   0.9911898858364148   945    12561
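A minimal sketch of how one such gtc.txt file could be turned into per-SNP genotype calls. It assumes the real files are tab-delimited with exactly the header shown above and hold one sample per file, neither of which is confirmed here. The function name `read_gtc_txt` and the `min_gc_score` cutoff are hypothetical, and Python is used only for the reshaping step, since the result can be written out as the plain-text table the R packages read.

```python
import csv
import io

# First two rows of the excerpt above, written with tabs.
# Assumption: the real gtc.txt files are tab-delimited with this header.
SAMPLE = (
    "SNP Name\tGC Score\tAllele1 - Top\tAllele2 - Top\t"
    "Allele1 - AB\tAllele2 - AB\tX\tY\tRaw X\tRaw Y\n"
    "200003\t0.9226053\tA\tA\tA\tA\t0.934740661177471\t0.0394069163635861\t7614\t1009\n"
    "200006\t0.80280876\tG\tG\tB\tB\t0.03840060068975691\t1.5842950219375036\t788\t19290\n"
)


def read_gtc_txt(handle, min_gc_score=0.0):
    """Return {snp_name: "allele1/allele2"} for one sample.

    Calls whose GC Score is below min_gc_score (a hypothetical quality
    cutoff) are skipped.
    """
    calls = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["GC Score"]) < min_gc_score:
            continue
        calls[row["SNP Name"]] = row["Allele1 - Top"] + "/" + row["Allele2 - Top"]
    return calls


calls = read_gtc_txt(io.StringIO(SAMPLE))
print(calls)
```

The slash-separated form is only one choice; the separator can be changed to whatever the downstream package expects.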

phenotype xml:

<?xml-stylesheet type="text/xsl" href="varreports_v3.xsl"?>
<data_table name="MEC_XXXXXX_Subject" dataset_id="XXXXXX" study_name="A Multiethnic GWAS of XXXXXX" study_id="phs000306.v4" participant_set="1" date_created="04/10/2014">
  <variable id="XXXXXXX.v2.p1" var_name="SUBJID" calculated_type="string" reported_type="integer">
    <description>XXXXX ID</description>
    <total>
      <subject_profile><sex><male>9454</male><female>13</female></sex></subject_profile>
      <stats><stat n="9482" nulls="0"/></stats>
    </total>
  </variable>
  <variable id="XXXXX.v2.p1.c1" var_name="SUBJID" calculated_type="string" reported_type="integer">
    <description>XXXX ID</description>
    <total>
      <subject_profile><sex><male>2467</male></sex></subject_profile>
      <stats>
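Judging from the excerpt alone, this XML looks like a per-variable summary (counts by sex, number of values, number of nulls) rather than a table of per-subject values, which would make it unsuitable as a direct phenotype source. A minimal sketch for listing what such a file describes, using a closed-off stand-in for the truncated excerpt; the function name `summarize_variables` is hypothetical and the XXXX values are the placeholders from the post.

```python
import xml.etree.ElementTree as ET

# Closed-off stand-in for the truncated excerpt above. The second variable
# has no <stats> because that part of the excerpt is cut off.
REPORT = """<data_table name="MEC_XXXXXX_Subject" dataset_id="XXXXXX" study_id="phs000306.v4">
  <variable id="XXXXXXX.v2.p1" var_name="SUBJID" calculated_type="string" reported_type="integer">
    <description>XXXXX ID</description>
    <total>
      <stats><stat n="9482" nulls="0"/></stats>
    </total>
  </variable>
  <variable id="XXXXX.v2.p1.c1" var_name="SUBJID" calculated_type="string" reported_type="integer">
    <description>XXXX ID</description>
  </variable>
</data_table>"""


def summarize_variables(xml_text):
    """List each <variable> with its name, type, and value counts (if present)."""
    root = ET.fromstring(xml_text)
    rows = []
    for var in root.iter("variable"):
        stat = var.find("./total/stats/stat")
        rows.append({
            "var_name": var.get("var_name"),
            "type": var.get("calculated_type"),
            "n": int(stat.get("n")) if stat is not None else None,
            "nulls": int(stat.get("nulls")) if stat is not None else None,
        })
    return rows


summary = summarize_variables(REPORT)
for row in summary:
    print(row)
```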

idat is a binary format and can't be read as plain text.

rdlady0 wrote (6 months ago):

Does anyone know how to analyze dbGaP data in R?

rdlady0 wrote (6 months ago):

I have decrypted the dbGaP files, but now the problem is that I can't map the phenotype files to the genotype files. I have a lot of information on SNPs, but I don't know whom it belongs to (cases or controls, male or female, age, etc.). Does anyone know how to map the genotypes to phenotypes in the dbGaP data sets?
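The mapping usually comes down to a join on identifiers: genotype files are keyed by sample, phenotype tables by subject, and a sample-to-subject table links the two. A minimal sketch of that join, assuming such a mapping table is part of the download; the column names `SUBJID` and `SAMPID`, the function names, and all IDs and phenotype values below are made up for illustration and will differ by study.

```python
import csv
import io


def join_by_sample(genotypes, sample_to_subject, phenotypes):
    """Build one wide row per genotyped sample.

    genotypes:         {sample_id: {snp_name: call}}
    sample_to_subject: {sample_id: subject_id}
    phenotypes:        {subject_id: {variable: value}}
    Returns (rows, unmatched_sample_ids).
    """
    rows, unmatched = [], []
    for sample_id, calls in genotypes.items():
        subject_id = sample_to_subject.get(sample_id)
        pheno = phenotypes.get(subject_id)
        if pheno is None:
            unmatched.append(sample_id)  # no subject or no phenotype record
            continue
        row = {"SUBJID": subject_id, "SAMPID": sample_id}
        row.update(pheno)
        row.update(calls)
        rows.append(row)
    return rows, unmatched


def to_tsv(rows):
    """Write rows as a tab-delimited table; assumes every row has the same keys."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up example data.
genotypes = {"SAMP_A": {"200003": "A/A"}, "SAMP_B": {"200003": "A/A"}}
sample_to_subject = {"SAMP_A": "SUBJ_1"}
phenotypes = {"SUBJ_1": {"sex": "M", "status": "case"}}

rows, unmatched = join_by_sample(genotypes, sample_to_subject, phenotypes)
table = to_tsv(rows)
print(table)
print("unmatched samples:", unmatched)
```

Keeping the list of unmatched samples makes it easy to see how many genotype files have no phenotype record before running any association test.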


Sorry, but I have a question: do I need a special account to access the dbGaP database?

Thank you so much!

Reply written 6 months ago by 49652710

You will probably need to request an account to access all of the dbGaP content, because some data sets are not open to the public. In my case I had to request one because I needed access to these closed data sets.

Reply written 4 months ago by rdlady0

rdlady0 wrote (4 months ago):

I never managed to use the XML files as a source of phenotype data, but when I requested the download of my dbGaP data set again, I found phenotype data files in a very simple tabular text format. For some reason, only the XML files were available when I made the first download request. With the tabular text files I was able to extract the phenotype data very easily using R's GenABEL package.
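A minimal sketch of reading such a tabular phenotype file outside of GenABEL, for example to check it before analysis. It assumes the file is tab-delimited with a single header row and that any leading comment lines start with `#`; the column names and values in the example are made up, and `read_phenotype_table` is a hypothetical helper.

```python
import csv
import io

# Made-up stand-in for a tabular phenotype file.
# Assumption: comment lines start with "#", then one header row, then data.
PHENO = (
    "# hypothetical comment line\n"
    "#\n"
    "SUBJID\tsex\tage\n"
    "SUBJ_1\tM\t61\n"
    "SUBJ_2\tF\t58\n"
)


def read_phenotype_table(handle):
    """Return the data rows as a list of dicts, skipping comments and blank lines."""
    lines = (line for line in handle
             if line.strip() and not line.startswith("#"))
    return list(csv.DictReader(lines, delimiter="\t"))


table = read_phenotype_table(io.StringIO(PHENO))
by_subject = {row["SUBJID"]: row for row in table}  # index for later joins
print(by_subject["SUBJ_2"])
```

All values come back as strings, so numeric variables such as age still need converting before use.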
