Parsing XML file from dbGaP


11 months ago

CTLong ▴ 120

Hi all,

I have downloaded a set of phenotype and genotype data from dbGaP, but I am having trouble opening the data dictionary files, which have an .xml extension. What is the recommended way to parse these files?

Furthermore, do these files contain any valuable information? Without parsing them, all I can see is the study accession and a reference to an XSL stylesheet. There is nothing about the individual samples in the dataset.

XML • 1.0k views



What kind of information do you need from those files?

11 months ago by Pierre Lindenbaum 164k


Hi Pierre, thanks for the reply. That is what I'm trying to figure out. I suspect most of the metadata can be found in the associated text files. I'm just not sure whether these XML files hold any valuable information about individual samples.

11 months ago by CTLong ▴ 120


Well, you tell us. We don't know which XML file you're looking at. For example, I only see phenotypes in this random XML file: view-source:https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs000001/phs000001.v1.p1/archive/phs000001.AREDS.pht000001.v1.p1.datadict.xml

11 months ago by Pierre Lindenbaum 164k


I think I successfully opened the XML file with Excel yesterday. Not much information to take out of these files in my case. Thanks!

11 months ago by CTLong ▴ 120


Excel?? XML is just a text file. Why not something like cat or more?

11 months ago by Pierre Lindenbaum 164k


I tried using cat and more, but the formatting looked unfamiliar, so I tried opening the file with Excel. I guess the information is the same, just better presented?

11 months ago by CTLong ▴ 120
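
For anyone landing here with the same question: since the XML is plain text, a few lines of Python can flatten a data dictionary into rows without going through Excel. The following is only a minimal sketch. The sample document is hand-written and hypothetical, and the element names it relies on (data_table, variable, name, description, type, and value with a code attribute) are an assumption about the data dictionary layout, so check them against your own file first.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample. The element and attribute names below
# are an ASSUMPTION about the data dictionary layout, not copied from dbGaP.
SAMPLE = """<data_table id="pht000000.v1" study_id="phs000000.v1">
  <variable id="phv00000001.v1">
    <name>SEX</name>
    <description>Sex of participant</description>
    <type>encoded value</type>
    <value code="1">Male</value>
    <value code="2">Female</value>
  </variable>
  <variable id="phv00000002.v1">
    <name>AGE</name>
    <description>Age at enrollment</description>
    <type>integer</type>
  </variable>
</data_table>
"""


def parse_data_dict(xml_text):
    """Return one dict per <variable> element found in the XML text."""
    root = ET.fromstring(xml_text)
    rows = []
    for var in root.iter("variable"):
        rows.append({
            "id": var.get("id"),
            "name": var.findtext("name", default=""),
            "description": var.findtext("description", default=""),
            "type": var.findtext("type", default=""),
            # Map each coded value to its label, e.g. {"1": "Male"}.
            "codes": {v.get("code"): (v.text or "") for v in var.findall("value")},
        })
    return rows


if __name__ == "__main__":
    # Print a tab-separated summary, one line per variable.
    for row in parse_data_dict(SAMPLE):
        print(row["id"], row["name"], row["type"], row["codes"], sep="\t")
```

To run this on a downloaded file instead of the inline sample, ET.parse(path).getroot() gives the same kind of root element. Under these assumptions the output describes variables (names, types, coded values) rather than individual samples, which matches what was observed above.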
