Parsing XML file from dbGaP
6 months ago
CTLong ▴ 110

Hi all,

I have downloaded a series of phenotype and genotype data from dbGaP, but I am having trouble opening the data dictionary files with the .xml extension. What is the recommended way to parse these files?

Furthermore, is there any valuable information contained in these files? Without parsing, all I can see is the study accession and an XSL stylesheet. There is no information about the individual samples in the dataset.
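For anyone landing here: a data dictionary can be read without Excel using Python's built-in `xml.etree.ElementTree`. This is a minimal sketch that assumes the layout these `*.datadict.xml` files appear to use: a `<data_table>` root carrying `id` and `study_id` attributes, with one `<variable>` element per phenotype variable holding `<name>`, `<description>`, `<type>`, an optional `<unit>` and coded `<value code="...">` children. Check those tag names against your own file. The embedded XML is a made-up example, and `parse_datadict` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Made-up snippet mirroring the ASSUMED dbGaP data-dictionary layout (not real study data).
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="./datadict.xsl"?>
<data_table id="pht000000.v1" study_id="phs000000.v1">
  <description>Example phenotype table</description>
  <variable id="phv00000001.v1">
    <name>AGE</name>
    <description>Age at enrollment</description>
    <type>integer</type>
    <unit>years</unit>
  </variable>
  <variable id="phv00000002.v1">
    <name>SEX</name>
    <description>Participant sex</description>
    <type>encoded value</type>
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
</data_table>
"""


def parse_datadict(xml_text):
    """Return the table-level attributes and one dict per <variable> element."""
    root = ET.fromstring(xml_text)  # the xml-stylesheet instruction is ignored here
    variables = []
    for var in root.iter("variable"):
        variables.append({
            "id": var.get("id"),
            "name": var.findtext("name", default=""),
            "description": var.findtext("description", default=""),
            "type": var.findtext("type", default=""),
            "unit": var.findtext("unit", default=""),
            # Coded values, e.g. {"1": "Male", "2": "Female"}
            "codes": {v.get("code"): (v.text or "") for v in var.findall("value")},
        })
    return dict(root.attrib), variables


if __name__ == "__main__":
    table, variables = parse_datadict(SAMPLE)
    print(table["study_id"], table["id"])
    for v in variables:
        print(v["name"], v["type"], v["description"], sep="\t")
```

For a downloaded file, passing the raw bytes lets the parser honour the file's declared encoding, e.g. `parse_datadict(Path("phs000001.AREDS.pht000001.v1.p1.datadict.xml").read_bytes())` after `from pathlib import Path`. The XSL stylesheet line only tells a browser how to render the file; it carries no data of its own.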


What kind of information do you need from those files?


Hi Pierre, thanks for the reply. That is what I'm trying to figure out. I suspect most of the metadata can be found in the associated text files; I'm just not sure whether these XML files contain any valuable information about individual samples.


Well, you tell us; we don't know which XML file you're looking at. For example, I only see phenotype variables in this random XML file: view-source:https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs000001/phs000001.v1.p1/archive/phs000001.AREDS.pht000001.v1.p1.datadict.xml
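Since the data dictionary describes the phenotype variables rather than the samples, its most practical use is decoding the columns of the matching phenotype text file. A sketch under stated assumptions: the phenotype file is tab-delimited with `#` comment lines before the header row, and coded variables list their labels as `<value code="...">` elements. Both embedded snippets are made up, and `code_maps` / `decode_phenotypes` are hypothetical helper names.

```python
import csv
import xml.etree.ElementTree as ET

# Made-up data dictionary with one coded variable (ASSUMED layout).
DATADICT = """<data_table id="pht000000.v1" study_id="phs000000.v1">
  <variable id="phv00000002.v1">
    <name>SEX</name>
    <type>encoded value</type>
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
</data_table>"""

# Made-up phenotype table: ASSUMED '#' comment lines, then a tab-delimited header and rows.
PHENOTYPES = "# Study accession: phs000000.v1\n\ndbGaP_Subject_ID\tSEX\n1001\t1\n1002\t2\n"


def code_maps(xml_text):
    """Map variable name -> {code: label} for variables that have <value> children."""
    root = ET.fromstring(xml_text)
    maps = {}
    for var in root.iter("variable"):
        codes = {v.get("code"): (v.text or "") for v in var.findall("value")}
        if codes:
            maps[var.findtext("name")] = codes
    return maps


def decode_phenotypes(tsv_text, maps):
    """Yield one dict per sample row, replacing coded values with their labels."""
    # Skip comment lines and blank lines so the first remaining line is the header.
    lines = [ln for ln in tsv_text.splitlines() if ln.strip() and not ln.startswith("#")]
    for row in csv.DictReader(lines, delimiter="\t"):
        # Columns without a code map (or unknown codes) are passed through unchanged.
        yield {col: maps.get(col, {}).get(val, val) for col, val in row.items()}


if __name__ == "__main__":
    for row in decode_phenotypes(PHENOTYPES, code_maps(DATADICT)):
        print(row)
```

Unknown codes are passed through rather than dropped, so a mismatch between the dictionary and the data file stays visible instead of silently disappearing.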


I think I successfully opened the XML file with Excel yesterday. There isn't much information to extract from these files in my case. Thanks!


Excel?? XML is just a text file. Why not something like cat or more??


I tried using cat and more, but the formatting looked unfamiliar, so I tried opening the file with Excel. I guess the information is the same, just better presented?
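Excel shows the same text that cat prints; re-indenting the XML usually makes it readable in a terminal as well. A minimal sketch using the standard-library `xml.dom.minidom` (`pretty_xml` is a hypothetical helper name):

```python
import xml.dom.minidom


def pretty_xml(xml_text):
    """Re-indent XML text (str or bytes) so each element sits on its own indented line."""
    dom = xml.dom.minidom.parseString(xml_text)
    # toprettyxml turns existing indentation into whitespace-only lines; drop those.
    lines = dom.toprettyxml(indent="  ").splitlines()
    return "\n".join(ln for ln in lines if ln.strip())
```

Usage: `print(pretty_xml(open("file.datadict.xml", "rb").read()))`, piped into `less` if the file is long. If libxml2 is installed, `xmllint --format file.datadict.xml` does the same from the command line.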
