Question

How Easy Would Be For Google To Find Unpublished Interesting Research Results?

10.5 years ago

biotech ▴ 570

I'm a bit worried about the metadata we generate daily when using services such as Google. We researchers put a lot of effort into getting interesting results, so it seems unfair for services like Google to have this fresh, clean data ready to use. At the same time, it would be very difficult to achieve some of these results without Google's help.

What do you think about this new phenomenon in biological research? Do you use Google? Do you use alternative tools to hide your findings from metadata mining?

Imagine you are working on a vaccine against an important bacterial pathogen, and Google knows there is a need for this vaccine, so it could patent it and make money. With the appropriate mining tools, this would be extremely easy.

Imagine you have a vaccine candidate and want to patent it. In your email you will have keywords like:

-vaccine

-the name of the pathogen, for example AIDS, Salmonella, Haemophilus...

-patent

From all the email accounts Google has access to, it could filter out yours and dig further. Now it knows you have a 'vaccine' for the pathogen 'Salmonella' and want to 'patent' it.

What would Google still need in order to patent it:

-the gene locus tag?

What I mean is that, by this simple argument, if Google discovered a link between 'locus tag', 'vaccine', 'Salmonella' and 'patent', it would only have to test the new vaccine candidate to see whether it protects, and then patent it. It would have saved years of research investment.

Too paranoid, right?

Do you think this is improbable? Would you try to do this if you worked for Google, had the required knowledge, and had access to all email accounts?

Thanks for your help. Please let me know if you find the question inappropriate for this forum.

Updated 10.5 years ago by Mary 11k • written 10.5 years ago by biotech ▴ 570

score 8 · Answer 1 · 2013-11-13

10.5 years ago

Istvan Albert 100k

I think you are looking at the wrong level of abstraction here: you are worrying about security at the wrong level.

If you work for an organization, how easy is it for your system administrator to see everything you do? Very easy. All traffic passes through the organization's connection, and they have access to your computer, so they can read off the encryption keys and decrypt the information you send. They could push out an "update" to your system that logs all keystrokes, etc.

Have you ever installed a program that you did not write? Have you ever compiled a bioinformatics tool and run it? Have you ever inserted a thumb drive into your computer? Have you ever left your computer unattended for a short period of time? Each of these is a far easier attack vector than Google reading your email and acting on it.

Also, imagine how much risk they would be taking on if they did this and, given how rich they are, how large the punitive damages would be. Google! If you are out there reading this, please steal from me.

As for your original question: if you are involved with any extremely sensitive information, you shouldn't be using Gmail to begin with.

Ken Thompson has us all owned :P (see his "Reflections on Trusting Trust")

Reply by Devon Ryan 104k, 10.5 years ago

I read that some years ago but forgot who wrote it.

Reply by Istvan Albert 100k, 10.5 years ago

Istvan, your answer has been really helpful, giving me a point of view different from the one I previously had. Definitely, there are no secrets today, and if you need to keep one, I think you should avoid using Google services.

Reply by biotech ▴ 570, 10.5 years ago

score 5 · Answer 2 · 2013-11-13

A funny story. CAPRI is a community effort to evaluate the state of the art in the docking field, pose new challenges, and spur the development of better algorithms. Researchers provide the CAPRI "jury" with the unreleased, unpublished, and therefore unknown 3D coordinates of a new protein-protein complex (or protein-DNA, protein-sugar, etc.) they are working on. CAPRI participants are given only partial information and have to reconstruct (dock) the complex. The committee later evaluates the results against the original coordinate set. Researchers often provide these challenges when they are about to submit a paper, and since everybody signs a sort of "NDA", it's not dangerous at all.

A few rounds ago, a new CAPRI target (i.e. complex) came up, and our lab was participating. One of our members was tasked with googling the complex. We do this routinely to find papers, sequences, mutational analyses, etc. that can help us in the docking (we develop a data-driven docking approach). After an hour or so of work, this colleague of mine came jumping into the office. He had found not only the unreleased PDB structures but also drafts of the manuscript, all with Google and a really precise query string. The CAPRI round was cancelled and the authors were notified.

Who's the culprit? My colleague and his amazing Google-fu, the sysadmin who left that FTP server open to everybody, or the PhD student or postdoc who deposited the data on such an insecure medium?

score 2 · Answer 3 · 2013-11-13

I've actually seen a number of cases of unpublished work turning up in Google Scholar, even with alerts.

http://topsy.com/s?q=google%20scholar%20unpublished

I don't know what the outcomes of or reasons for these were, but I found them unsettling. I could swear that in a recent conversation someone found an unpublished patent submission, but I can't find that tweet right now...

EDIT: Casey Bergman on Twitter reminded me of the story Keith Bradnam told recently about this:

The scary indexing power of Google Scholar