Question: Automated Literature Search For List Of Genes
5.8 years ago by
Davy360
United States
Davy360 wrote:

I have a smallish list of genes that I need to do some literature searching on. There are about 80 of them so individually searching for each one and all their aliases would be much too time consuming. Does anyone know of a tool I could use to search pubmed (or other database) with a list of genes, other than building a very long query by hand?
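
One way to avoid writing the long query by hand is to generate it. Here is a minimal sketch in Python, assuming the aliases for each gene are already available in a dictionary. The `GENES` mapping, the function names and the choice of restricting the search to title/abstract with `[tiab]` are illustrative assumptions, not part of the question.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical input: gene symbol -> list of aliases (replace with your own list).
GENES = {
    "NOTCH2": [],
    "PRKCB1": ["PRKCB"],
}

def gene_query(symbol, aliases, field="tiab"):
    """OR together a symbol and its aliases, each restricted to one PubMed field."""
    names = [symbol] + list(aliases)
    return "(" + " OR ".join(f'"{name}"[{field}]' for name in names) + ")"

def esearch_url(term, retmax=20):
    """Build an ESearch URL; urlencode escapes spaces, quotes and brackets."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": term, "retmax": retmax})

# Print one search URL per gene instead of assembling them manually.
for symbol, aliases in GENES.items():
    print(symbol, esearch_url(gene_query(symbol, aliases)))
```

Keeping one query per gene (rather than one giant OR over all 80 genes) makes it possible to tell which gene each hit belongs to.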

literature genetics • 4.0k views
modified 5.3 years ago by reachtoskumar0 • written 5.8 years ago by Davy360
5.8 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

a one-liner :-)

$ echo -e "NOTCH2\nPRKCB1" | while read G; do curl -s "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=${G}" | xsltproc <(echo "<x:stylesheet xmlns:x='http://www.w3.org/1999/XSL/Transform' version='1.0'><x:output method='text'/><x:template match='/'>http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&amp;retmode=text&amp;rettype=abstract&amp;id=<x:for-each select='eSearchResult/IdList/Id'><x:value-of select='.'/>,</x:for-each>
</x:template></x:stylesheet>") -  | xargs curl -s ; done

1. J Immunol. 2013 Jun 5. [Epub ahead of print]

Intrinsic Molecular Factors Cause Aberrant Expansion of the Splenic Marginal Zone
B Cell Population in Nonobese Diabetic Mice.

Stolp J, Mariño E, Batten M, Sierro F, Cox SL, Grey ST, Silveira PA.

Garvan Institute of Medical Research, Immunology Program, Darlinghurst, New South
Wales 2010, Australia.
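
For readers who prefer not to use XSLT, the same two steps (esearch to get the IDs, efetch to get the abstracts) can be sketched in Python with only the standard library. This is a rough equivalent under my own assumptions, not the one-liner's author's code: the function names are made up for illustration, and `abstracts_for` performs real network requests when called.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def parse_ids(esearch_xml):
    """Return the PMIDs found under eSearchResult/IdList/Id."""
    root = ET.fromstring(esearch_xml)
    return [node.text for node in root.findall("./IdList/Id")]

def efetch_url(pmids):
    """One EFetch URL returning plain-text abstracts for all given PMIDs."""
    params = {"db": "pubmed", "retmode": "text",
              "rettype": "abstract", "id": ",".join(pmids)}
    # safe="," keeps the comma-separated ID list readable in the URL
    return f"{EUTILS}/efetch.fcgi?{urlencode(params, safe=',')}"

def abstracts_for(gene):
    """Network call: search PubMed for one gene and fetch its abstracts."""
    search = f"{EUTILS}/esearch.fcgi?{urlencode({'db': 'pubmed', 'term': gene})}"
    with urlopen(search) as response:
        pmids = parse_ids(response.read())
    if not pmids:
        return ""
    with urlopen(efetch_url(pmids)) as response:
        return response.read().decode("utf-8")

# Usage (hits the NCBI servers):
#   for gene in ["NOTCH2", "PRKCB1"]:
#       print(abstracts_for(gene))
```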
written 5.8 years ago by Pierre Lindenbaum118k

I've grouped all the Id into the same call for curl+efetch. That won't work for a large number of Id (big retmax) returned by esearch. But you could generate one URL per Id:

....  | xsltproc <(echo "<x:stylesheet xmlns:x='http://www.w3.org/1999/XSL/Transform' version='1.0'><x:output method='text'/><x:template match='/'><x:for-each select='eSearchResult/IdList/Id'>http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&amp;retmode=text&amp;rettype=abstract&amp;id=<x:value-of select='.'/>
</x:for-each></x:template></x:stylesheet>") -  | while read U; do curl -s "${U}" ; done
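
A middle ground between one call for all Ids and one call per Id is to fetch in batches. Here is a small Python sketch of that idea. The batch size of 200 and the function names are arbitrary assumptions for illustration, and the code only builds the URLs without contacting NCBI.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def batches(ids, size=200):
    """Yield successive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def efetch_urls(ids, size=200):
    """Return one EFetch URL per batch of IDs."""
    urls = []
    for batch in batches(ids, size):
        params = {"db": "pubmed", "retmode": "text",
                  "rettype": "abstract", "id": ",".join(batch)}
        urls.append(EFETCH + "?" + urlencode(params, safe=","))
    return urls
```

Each returned URL can then be passed to `curl -s` or any HTTP client, ideally with a short pause between requests.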
modified 5.8 years ago • written 5.8 years ago by Pierre Lindenbaum118k

Can the literature-search code work for other identifiers as well, e.g. rs IDs or protein IDs?

written 5.8 years ago by Nandini760

For rs IDs, use NCBI ELink. For proteins, yes, but as with genes, beware of ambiguous names.
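
As a rough illustration of the ELink route, the sketch below builds a request that links dbSNP records to PubMed. It assumes that the dbSNP UID is the rs number without the "rs" prefix; the function name is hypothetical, and the code only constructs the URL.

```python
from urllib.parse import urlencode

ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"

def rs_to_pubmed_url(rs_ids):
    """Build an ELink URL from dbSNP (dbfrom=snp) to PubMed (db=pubmed)."""
    # Assumption: dbSNP UIDs are the rs numbers with the "rs" prefix removed.
    uids = [r[2:] if r.lower().startswith("rs") else r for r in rs_ids]
    params = {"dbfrom": "snp", "db": "pubmed", "id": ",".join(uids)}
    return ELINK + "?" + urlencode(params, safe=",")
```

The XML returned by such a request lists the linked PubMed IDs, which can then be passed to efetch as in the gene example.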

written 5.8 years ago by Pierre Lindenbaum118k

As usual, Pierre to the rescue!!!

written 5.8 years ago by Davy360

Is it possible to retrieve more than 20 items? PubMed lets you see 200 items per page.

written 5.8 years ago by PoGibas4.7k

Yes, see the NCBI documentation for the esearch retmax parameter. See also my first comment.
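
To make the retmax hint concrete: ESearch returns 20 IDs by default, and `retmax` together with `retstart` lets you page through the rest. Here is a minimal Python sketch that only builds the URLs. The page size of 200 and the function name are assumptions for illustration, and `total` would come from the `<Count>` element of a first esearch response.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_pages(term, total, page_size=200):
    """Yield ESearch URLs covering `total` hits, `page_size` IDs per request."""
    for start in range(0, total, page_size):
        params = {"db": "pubmed", "term": term,
                  "retstart": start, "retmax": page_size}
        yield ESEARCH + "?" + urlencode(params)
```

For example, `list(esearch_pages("NOTCH2", 450))` gives three URLs with `retstart` set to 0, 200 and 400.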

written 5.8 years ago by Pierre Lindenbaum118k
5.8 years ago by
Tky990
Japan
Tky990 wrote:

I adapted this script from the EUtilities years ago; it does not use BioPerl.

Please check the code and modify it to fit your needs :-)

# 2010/11/29
# pubfetch.pl
# Code modified based on Entrez Programming Utilities from PubMed
# http://eutils.ncbi.nlm.nih.gov/
# Usage: perl pubfetch.pl
use 5.010;
use strict;
use warnings;
use LWP::Simple;
use URI::Escape;    # uri_escape(), so keywords with spaces survive in the URL

print "Please Enter The Keyword for Fetch: "; # ask for keyword to search
my $keyword = <STDIN>;
chomp $keyword;

# Output files: per-year counts and the fetched abstracts (written as UTF-8)
open my $count_fh,  '>', "$keyword.count.txt"
    or die "Cannot open $keyword.count.txt: $!";
open my $result_fh, '>:encoding(UTF-8)', "$keyword.result.txt"
    or die "Cannot open $keyword.result.txt: $!";

my $utils  = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils";
my $db     = "Pubmed";
my $query  = uri_escape($keyword);
my $report = "abstract";

# One search per year, from 2000 to 2011
for my $year (2000 .. 2011) {
    # usehistory=y keeps the hits on the NCBI history server for efetch
    my $esearch = "$utils/esearch.fcgi?" .
              "db=$db&retmax=1&usehistory=y&maxdate=$year&mindate=$year&term=";

    my $output = get($esearch . $query);
    unless (defined $output) {
        warn "esearch request failed for year $year\n";
        next;
    }

    # Pull the history-server handles and the hit count out of the XML
    my ($web)   = $output =~ /<WebEnv>(\S+)<\/WebEnv>/;
    my ($key)   = $output =~ /<QueryKey>(\d+)<\/QueryKey>/;
    my ($count) = $output =~ /<eSearchResult><Count>(\d+)<\/Count>/;
    $count //= 0;

    print "The total number of publications for $keyword in year $year is $count;\n";
    print {$count_fh} "$year $count\n";

    if ( $count != 0 ){
        my $efetch = "$utils/efetch.fcgi?" .
               "rettype=$report&retmode=text&retstart=0&retmax=10000&" .
               "db=$db&query_key=$key&WebEnv=$web";

        my $efetch_result = get($efetch);
        print {$result_fh} $efetch_result if defined $efetch_result;
    }
}
close $count_fh;
close $result_fh;

# Collect the PMIDs from the fetched abstracts
my $idcount = 0;
open my $id_fh, '<:encoding(UTF-8)', "$keyword.result.txt"
    or die "Cannot open $keyword.result.txt: $!";
open my $idresult_fh, '>', "$keyword.IDlist.txt"
    or die "Cannot open $keyword.IDlist.txt: $!";

while (<$id_fh>){
    if (m/^PMID:\s(\d+)/){
        print {$idresult_fh} "$1\n";
        $idcount++;
    }
}
close $id_fh;
close $idresult_fh;

print "the total number of the publication fetched is $idcount\n";
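
Since the original poster mentioned having moved away from Perl, here is a hedged Python sketch of just the last step (pulling the PMIDs out of the text abstracts), extended to handle several genes at once. The function names and the `texts` dictionary (gene name mapped to efetch text output) are assumptions for illustration.

```python
import re

# Matches lines such as "PMID: 12345678" in efetch text output
PMID_LINE = re.compile(r"^PMID:\s*(\d+)", re.MULTILINE)

def extract_pmids(abstract_text):
    """Return the PMIDs in order of appearance, without duplicates."""
    return list(dict.fromkeys(PMID_LINE.findall(abstract_text)))

def pmids_by_gene(texts):
    """Map each gene to the PMIDs found in its fetched abstracts."""
    return {gene: extract_pmids(text) for gene, text in texts.items()}
```

Comparing the resulting lists across genes is a quick way to spot papers that mention more than one gene from the list.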
modified 5.8 years ago • written 5.8 years ago by Tky990

Wow. Thanks so much! I think I should be able to modify this to do what I want.

written 5.8 years ago by Davy360
5.8 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

option 1: the Publication track in UCSC

The UCSC browser has recently added a new track, called Publications, containing literature related to a gene. Thus, you can use the UCSC APIs to get all the references for a gene. For example, the following will get you all the references for the gene "CD97":

http://genome.ucsc.edu/cgi-bin/hgc?hgsid=338677393&c=chr19&o=14491955&t=14519537&g=pubsMarkerGene&i=CD97

I guess that you can also connect to the MySQL table, but I am not 100% sure that the "articleId" field corresponds to the PubMed ID:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select * from hgFixed.pubsMarkerAnnot where markerId="CD97" limit 10'

# select only the Ids (less verbose output)
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -e 'select distinct articleId, markerId from hgFixed.pubsMarkerAnnot where markerId="CD97" limit 10'

option 2: getting citations from Uniprot

Uniprot has some well curated citations for genes. You can get all the references for a list of genes by using the "Retrieve" tool from the Uniprot main page, and then parsing the RDF file.

option 3: use the eutils, but from another tool

If you do not want to spend time learning the BioPerl (or Biopython) APIs to eutils, you can try this Taverna workflow.

modified 5.8 years ago • written 5.8 years ago by Giovanni M Dall'Olio26k
5.8 years ago by
cts1.6k
Pasadena
cts1.6k wrote:

BioPerl has this sort of functionality. I've never used it to query PubMed, but the following page contains snippets to help you on your way: http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook#Simple_database_query

I think what you're looking for is the esearch or efetch utilities.

modified 5.8 years ago • written 5.8 years ago by cts1.6k

Thanks but I was hoping to avoid having to use the BioPerl EUtilities. :( I really should get around to familiarising myself with them but I abandoned Perl a long time ago.

written 5.8 years ago by Davy360

Are you also averse to the other "bio" packages? I believe Biopython and BioRuby have similar functionality (although I've never used them).

written 5.8 years ago by cts1.6k
5.3 years ago by
reachtoskumar0 wrote:

You can also give BioGyan a try (http://www.biogyan.com/). It is a comprehensive search tool designed for biologists that supports searching, annotating and ranking scientific literature from public databases. It accepts multiple genes.

written 5.3 years ago by reachtoskumar0