Dear Biostars,

Most students and researchers use PubMed to find publications relevant to their work or field of interest. Although PubMed usually links to the journal in which each publication appeared, it doesn't link to the data.

Why isn't a direct link to GEO/ArrayExpress/SRA included in each publication entry, to give direct access to the data?

• Do you believe that it should be added?

So far I have always found a link in the publications I read that redirected me to the respective entries at GEO/ArrayExpress etc. Exceptions are sometimes older array data from the pre-NGS era, but even then things were typically available as analyzed data rather than raw intensity files. You have to open the publication itself and then simply search for keywords such as GSE or GEO. Some journals such as Cell now even have the accession number on the publicly available front page (abstract page) of the paper.
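The keyword search described above can be roughly automated once you have the text of a paper. This is only a minimal sketch: the accession patterns below are assumptions based on common prefixes (GSE for GEO series, E-XXXX-n for ArrayExpress, SRP/ERP/DRP for SRA studies, PRJNA/PRJEB/PRJDB for BioProjects) and are not exhaustive, and `find_accessions` is a hypothetical helper name.

```python
import re

# Assumed accession formats; common prefixes only, not an exhaustive list.
ACCESSION_PATTERNS = {
    "GEO series": r"\bGSE\d+\b",
    "ArrayExpress": r"\bE-[A-Z]{4}-\d+\b",
    "SRA study": r"\b[SED]RP\d+\b",
    "BioProject": r"\bPRJ(?:NA|EB|DB)\d+\b",
}


def find_accessions(text):
    """Return a dict mapping each repository label to the sorted,
    de-duplicated accession strings found in `text`."""
    found = {}
    for label, pattern in ACCESSION_PATTERNS.items():
        hits = sorted(set(re.findall(pattern, text)))
        if hits:
            found[label] = hits
    return found


if __name__ == "__main__":
    # Made-up sentence for illustration; the accessions are not real records.
    sample = (
        "Data are available at GEO under GSE12345 and at ArrayExpress "
        "(E-MTAB-678). Raw reads: PRJNA123456, SRP098765."
    )
    print(find_accessions(sample))
```

A match only says that a string looks like an accession, so it is still worth checking that the record exists and belongs to the paper.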

Some journals such as Cell now even have the accession number on the publicly available front page (abstract page) of the paper.

Exactly, that's what I would expect for every publication entry in PubMed. Since every entry has the abstract, it could also have a link to the data.

While logical, what you are asking for is not a trivial thing. Machine learning etc. can help to an extent, but back-filling this information for millions of abstracts in PubMed would be a big undertaking.

One avenue would be to check the SRA. Many, but not all, BioProjects list the DOI of the paper they're associated with, so you could get the BioProject associated with the abstract without scanning the paper!
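Where such links have been deposited, they can also be queried in the other direction, from a PubMed ID to linked records, through NCBI's E-utilities `elink` endpoint. The sketch below only builds the request URL and parses a response; it makes no network call. The helper names `elink_url` and `linked_ids` are hypothetical, the sample XML is hand-written to mimic the usual `eLinkResult` layout rather than a real response, and whether a given paper returns anything depends on what the authors or curators linked.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def elink_url(pmid, target_db="gds"):
    """Build an elink URL from a PubMed ID to a target Entrez database
    (e.g. "gds" for GEO DataSets, "sra", or "bioproject")."""
    params = {"dbfrom": "pubmed", "db": target_db, "id": pmid}
    return EUTILS_ELINK + "?" + urlencode(params)


def linked_ids(elink_xml):
    """Extract the linked record IDs from an elink XML response string."""
    root = ET.fromstring(elink_xml)
    return [el.text for el in root.findall(".//LinkSetDb/Link/Id")]


if __name__ == "__main__":
    # Hand-written sample for illustration only; the IDs are made up.
    sample_xml = """<eLinkResult><LinkSet>
      <DbFrom>pubmed</DbFrom><IdList><Id>12345</Id></IdList>
      <LinkSetDb><DbTo>gds</DbTo><LinkName>pubmed_gds</LinkName>
        <Link><Id>200012345</Id></Link>
      </LinkSetDb>
    </LinkSet></eLinkResult>"""
    print(elink_url("12345"))
    print(linked_ids(sample_xml))
    # To query NCBI for real, fetch elink_url(...) with
    # urllib.request.urlopen and pass the decoded body to linked_ids.
```

An empty list means no link was recorded, not that no data exist, so the manual search in the paper remains the fallback.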

But it isn't the case. You often have to manually look for it. Nothing you can do about it :)

@ATpoint But wouldn't it be possible to improve this in the future?

@genomax I understand that it's not a trivial task, but why wasn't it included from the start? Why haven't entries been linked directly ever since GEO and ArrayExpress became available?

I totally agree that this would be desirable, but you'll have to ask the NCBI folks for the why. Still, it is possible to get accessions for the vast majority of studies by checking the paper itself rather than the PubMed entry. It might be a little inconvenient, but so far it has pretty much always worked for me.

I believe accessions for datasets were not always required in the distant past. There is always a difference between good intentions and practical considerations. MIAME standards for arrays were formulated a long while ago, but I am not sure how many studies comply with them (or how many journals require this info).

@genomax What would the practical considerations regarding direct access to the data have been? (In the first place, of course; now it's late, as you have already said that it would take a lot of computational work, time, and manpower.) Is there any data on how many studies comply with the MIAME standards?

@ATpoint Of course, that is the way I follow, and probably most people do, but it gets quite complicated when there is no access to the paper. Having to pay for access to the publication is a different thing from having access to the data. There are always other ways to find the data (emailing the authors directly, buying the publication, etc.), but that doesn't change the need for more transparency and availability.

Are there any members of Biostars who are also "NCBI folks"?

buy the publication...ehem...s-c-i-h-u-b-.....emhm...

@ATpoint Well someone bought it in the first place :)

Yes, sure, but no individual person or small institute should need to buy publications. The research is often funded with public money, so a preprint or non-edited manuscript should always be available for free...should...