Question

How Can I Obtain "Product" Features From A List Of NCBI Accessions?

12.4 years ago

Luke ▴ 240

I have a table of SNVs. Each row is a different exonic SNV. Column 4 of my table contains NCBI accession numbers. How can I append the gene product information to the end of each row (i.e. as column 5)? Thank you! Luca

mutation retrieval genbank identifiers • 2.8k views

ADD COMMENT • link updated 12.1 years ago by Rm 8.3k • written 12.4 years ago by Luke ▴ 240

Specifically, which information do you wish to retrieve and add to column 5? Are you intending to parse info from GenBank format or another source?

ADD REPLY • link 12.4 years ago by Larry_Parnell 16k

Hi Larry! I wish to retrieve this information from GenBank:

[...]
FEATURES             Location/Qualifiers
[...]
     gene            1..3285
                     /gene="CWF19L2"
                 ==> /note="CWF19-like 2, cell cycle control"
                     /db_xref="GeneID:143884"
                     /db_xref="HGNC:26508"
                     /db_xref="HPRD:13102"
[...]
     CDS             31..2715
                     /gene="CWF19L2"
                     /codon_start=1
                 ==> /product="CWF19-like protein 2"
[...]

For example those indicated by the arrows. Thank you!

ADD REPLY • link 12.4 years ago by Luke ▴ 240

Yes, Larry, from GenBank! For example /note="CWF19-like 2, cell cycle control" under the section "gene" or /product="CWF19-like protein 2" under the section "CDS" of the GenBank file.

ADD REPLY • link 12.4 years ago by Luke ▴ 240

http://www.ncbi.nlm.nih.gov/nuccore/124487290?report=fasta Is this the format you want to add? >gi|124487290|ref|NM_001081077.1| Mus musculus CWF19-like 1, cell cycle control (S. pombe) (Cwf19l1), mRNA

ADD REPLY • link 12.4 years ago by Rm 8.3k

I'm interested only in a brief description of the CDS product.

ADD REPLY • link 12.4 years ago by Luke ▴ 240

score 2 · Answer 1 · 2011-11-30

Say you have the following input:

A NM_001081077 A
B NM_001081078 B
C NM_001081079 C
D NM_001081080 D

and the following XSLT stylesheet:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0"
    >

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:text> </xsl:text>
    <xsl:value-of select="/GBSet/GBSeq/GBSeq_feature-table/GBFeature/GBFeature_quals/GBQualifier[GBQualifier_name='product']/GBQualifier_value"/>
    <xsl:text>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>

the command line would be:

$ while read -r L ; do ID=$(echo "$L" | cut -d ' ' -f 2); echo -n "$L"; curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ID}&rettype=gb&retmode=xml" | xsltproc --novalid stylesheet.xsl - ; done < input.txt

result:

A NM_001081077 A CWF19-like protein 1
B NM_001081078 B lactase-phlorizin hydrolase preproprotein
C NM_001081079 C opioid growth factor receptor-like protein 1
D NM_001081080 D PHD finger protein 3
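The table in the question has the accession in column 4 rather than column 2, and SNV tables are often tab-delimited. Below is a minimal sketch of the same loop adapted to that layout. It assumes a tab-separated file, `curl` and `xsltproc` on the PATH, and the stylesheet above saved as `stylesheet.xsl`. The function names `fetch_product` and `append_product` are made up for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the first /product value for one accession.
# Assumes stylesheet.xsl (the stylesheet above) is in the current directory.
fetch_product() {
  curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=$1&rettype=gb&retmode=xml" < /dev/null |
    xsltproc --novalid stylesheet.xsl - |
    sed 's/^ *//'   # the stylesheet emits a leading space; drop it
}

# Read a tab-separated table on stdin and append the product as a new last column.
# $1 is the 1-based column holding the accession (default 4, as in the question).
append_product() {
  local col="${1:-4}" line acc
  while IFS= read -r line; do
    acc=$(printf '%s\n' "$line" | cut -f "$col")
    printf '%s\t%s\n' "$line" "$(fetch_product "$acc")"
  done
}

# Example call (file names are placeholders):
#   append_product 4 < snv_table.tsv > snv_table_with_product.tsv
```

NCBI asks for no more than three E-utilities requests per second without an API key, so for a long table a short `sleep` inside the loop is probably a good idea.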

score 1 · Answer 2 · 2011-11-30

You can get that information using eutils with curl, then grep or awk for the pattern you are looking for.

The simplest will be:

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NM_001081077&rettype=gb" | grep "/note="

More precisely, if you are looking for "note" within the "gene" feature:

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=NM_001081077&rettype=gb" | awk '/     gene/,/note=/' | grep "/note="

Putting it into Pierre's while loop:

while read -r L ; do ID=$(echo "$L" | cut -d ' ' -f 2); echo -n "$L "; curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ID}&rettype=gb" | awk '/ CDS/,/product=/' | grep "/product=" | sed 's/ *\/product=\"// ; s/"$//' ;done < input.txt

Output:

A NM_001081077 A CWF19-like protein 1
B NM_001081078 B lactase-phlorizin hydrolase preproprotein
C NM_001081079 C opioid growth factor receptor-like protein 1
D NM_001081080 D PHD finger protein 3
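One caveat with the grep approach: in a GenBank flat file a long /product value can wrap onto the next line, and grep then returns only the first part. Below is a rough sketch of an awk-based alternative that joins the wrapped lines. It assumes the standard flat-file layout (feature keys indented by 5 spaces, qualifiers by 21). The function name `extract_cds_product` is made up here.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read a GenBank flat-file record on stdin and print the
# /product value of the first CDS feature, joining continuation lines.
extract_cds_product() {
  awk '
    # A feature key sits at column 6; remember whether we are inside a CDS.
    /^     [^ ]/ { in_cds = ($1 == "CDS") }

    # Continuation line of a wrapped value: strip the indent and append.
    collecting { sub(/^ +/, ""); value = value " " $0 }

    # First line of the /product qualifier inside the CDS feature.
    in_cds && !collecting && /^ +\/product="/ {
      sub(/^ +\/product="/, ""); value = $0; collecting = 1
    }

    # The value is complete once it ends with the closing quote.
    collecting && value ~ /"$/ { sub(/"$/, "", value); print value; exit }
  '
}

# Example call inside the loop above, replacing the awk | grep | sed chain:
#   curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ID}&rettype=gb" | extract_cds_product
```

Wrapped lines are joined with a single space, which matches how the flat file usually breaks at word boundaries. A value that was split in the middle of a long word would gain a spurious space.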