Question: Where can I upload SNP array data for public accessibility?
devenvyas (Stony Brook) wrote, 2.8 years ago:

I have SNP data from the Affymetrix Human Origins array in Plink format. I am finishing up a manuscript detailing my first round of findings from these data. Any journal I try to publish in will require me to make the data accessible. As the array name implies, these are modern human genotyping data from already-known SNPs. There are absolutely no phenotyping data, just genotypes.

I am looking for a place where I can upload my data for public access. For ease of use, I need to keep these data in Plink format. Using a service that requires me to convert my data would just make my data harder for others to use.

I was wondering if anyone has any suggestions. The binary files total less than 30 MB.
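For a deposit like this, it may help to check that the three files of a PLINK binary fileset (`.bed`, `.bim`, `.fam`) are consistent, and to publish a checksum manifest alongside them so downstream users can confirm their download is intact. This is an illustrative sketch, not something proposed in the thread. It assumes the PLINK 1 binary format in SNP-major mode: the `.bed` file starts with the bytes `0x6C 0x1B 0x01`, followed by ceil(samples / 4) bytes per variant. The function name `check_plink_fileset` and the file prefix are hypothetical.

```python
import hashlib
import math
from pathlib import Path

# Assumption: PLINK 1 binary fileset in SNP-major mode.
# .bed = genotypes, .bim = one line per variant, .fam = one line per sample.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])  # two magic bytes + SNP-major mode flag


def count_lines(path: Path) -> int:
    """Count non-blank lines (variants in .bim, samples in .fam)."""
    with path.open("rb") as fh:
        return sum(1 for line in fh if line.strip())


def sha256sum(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def check_plink_fileset(prefix: str) -> dict:
    """Hypothetical helper: verify <prefix>.bed/.bim/.fam exist and agree,
    then return counts, total size in bytes, and a SHA-256 per file."""
    files = {ext: Path(prefix + ext) for ext in (".bed", ".bim", ".fam")}
    missing = [str(p) for p in files.values() if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing files: {missing}")

    n_variants = count_lines(files[".bim"])
    n_samples = count_lines(files[".fam"])

    with files[".bed"].open("rb") as fh:
        if fh.read(3) != BED_MAGIC:
            raise ValueError("not a SNP-major PLINK 1 .bed file")

    # 2 bits per genotype, so each variant takes ceil(n_samples / 4) bytes.
    expected = 3 + n_variants * math.ceil(n_samples / 4)
    actual = files[".bed"].stat().st_size
    if actual != expected:
        raise ValueError(f".bed is {actual} bytes, expected {expected}")

    return {
        "n_variants": n_variants,
        "n_samples": n_samples,
        "total_bytes": sum(p.stat().st_size for p in files.values()),
        "sha256": {p.name: sha256sum(p) for p in files.values()},
    }
```

For example, `check_plink_fileset("human_origins_study")` (a hypothetical prefix) returns the variant and sample counts, the combined size in bytes, and one SHA-256 digest per file, which could be listed in a README next to the uploaded files.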


I was going to say GEO, but that was recommended (along with several other repositories) when you posted this question previously. Were none of those options acceptable?

written 2.8 years ago by harold.smith.tarheel

These are only genotypic data. There are no phenotypes or expression data, so dbGaP and GEO don't seem to make any sense. Moreover, they are genotypic data from an array designed specifically for population genetic analyses. They are meant for "DNA as a history book" analyses rather than "DNA as an instruction manual" analyses, so those repositories feel like an odd place to upload them.

I also know that dbGaP and EGA tend to lock up data and make researchers jump through a ton of hoops to get to the actual data (e.g., there are some Ethiopian genomes on both services that I want(ed) to analyze, but both datasets are locked behind a mountain of bureaucracy). If I want my data to be cited, they need to be easy to access.

written 2.8 years ago by devenvyas (modified 2.8 years ago)

GEO accepts SNP data from the Affy platform, including custom arrays (see here). There's an email link on that webpage where you can contact the curators with questions about your data and GEO.

written 2.8 years ago by harold.smith.tarheel

I've seen that page before. GEO will not work; they want more than I want to give. I wanted to provide a Plink file with genotypes for just the SNPs I analyzed from my samples. GEO wants the complete data and the files I got from the core lab (and loaded into Genotyping Console).

written 2.8 years ago by devenvyas

Data repositories do not accept Plink files because they are not data; they are an interpretation of the data (i.e., analysis). Most journals have the same requirement as GEO: deposition of the raw data files (CEL files in your case). This requirement allows other investigators to replicate your results and/or perform independent analyses unrelated to yours.

Note that, if the research was supported by public funds (e.g., NIH grant), there are data sharing policies that may also dictate public deposition of the raw data.

written 2.8 years ago by harold.smith.tarheel

I am not sure that applies in this case. The samples were collected and genotyped on different NSF grants. Previous genotyping data from the same samples were distributed as Plink files on DataDryad (but that costs money).

It's still data. Most people with Affy Human Origins array data distribute them as called genotypes (either in Plink or some Eigensoft format), not as raw data.

written 2.8 years ago by devenvyas

NSF policy on data sharing is available here. It explicitly states that primary data are to be shared.

written 2.8 years ago by harold.smith.tarheel

Yes, but in the population genetics realm, CEL files are not what is considered the primary data. As I have said, whenever anyone using this array (or doing similar analyses with other arrays) with NIH/NSF funding has shared their data, it was never as CEL files. It was always as called genotypes in formats such as Plink or Eigensoft.

written 2.8 years ago by devenvyas

If that's the custom in population genetics, then it would probably be more productive to ask those colleagues for recommendations.

written 2.8 years ago by harold.smith.tarheel

It's still bioinformatics, so it is still relevant here.

From what I have noticed, most of the time the data get put on the PI's website, but my advisor is wary of doing that. Unless someone has other suggestions, I am guessing DataDryad is probably the place.

written 2.8 years ago by devenvyas

I didn't say that it wasn't bioinformatics or that it was irrelevant here. What gave you that impression?

Several members of this forum have made suggestions that you've found objectionable for a variety of reasons. Since your query here has not generated a response that meets your criteria, I was simply suggesting an alternative means of finding the answer that you seek.

written 2.8 years ago by harold.smith.tarheel

Powered by Biostar version 2.3.0