Question

Get gene summary info

0

7.9 years ago

cristina_sabiers ▴ 110

Hi!

I am wondering if someone can help me with code to fetch the summary info from NCBI or GeneCards. I assume maybe using wget?

For example, from a list of genes I want to get a document (so I can print it out) with something like this:

TAP2

The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intra-cellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the MDR/TAP subfamily. Members of the MDR/TAP subfamily are involved in multidrug resistance. This gene is located 7 kb telomeric to gene family member ABCB2. The protein encoded by this gene is involved in antigen presentation. This protein forms a heterodimer with ABCB2 in order to transport peptides from the cytoplasm to the endoplasmic reticulum. Mutations in this gene may be associated with ankylosing spondylitis, insulin-dependent diabetes mellitus, and celiac disease. Alternative splicing of this gene produces products which differ in peptide selectivity and level of restoration of surface expression of MHC class I molecules. [provided by RefSeq, Feb 2014]

IDUA

Summary This gene encodes an enzyme that hydrolyzes the terminal alpha-L-iduronic acid residues of two glycosaminoglycans, dermatan sulfate and heparan sulfate. This hydrolysis is required for the lysosomal degradation of these glycosaminoglycans. Mutations in this gene that result in enzymatic deficiency lead to the autosomal recessive disease mucopolysaccharidosis type I (MPS I). [provided by RefSeq, Jul 2008]

thanks!

gen • 2.4k views

ADD COMMENT • link updated 7.9 years ago by WouterDeCoster 47k • written 7.9 years ago by cristina_sabiers ▴ 110

score 1 · Answer 1 · 2016-08-23

1

7.9 years ago

WouterDeCoster 47k

I think you can find some ideas here: Bulk Download Of Ncbi Gene "Summary" Field or here:

It's reasonably easy to achieve using Biopython

ADD COMMENT • link 7.9 years ago by WouterDeCoster 47k

0

Thank you so much!!! I will try it tomorrow and see :)

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

0

Well, I tried yesterday with this code. I'm not a programmer... I created the .py file you gave me, and from the internet I added the modules Bio.py, Bio.Expasy.py and Bio._py3k... Hope I did it right.

I get this error message. What does it mean?

python gene_sumary.py 8.vcf
Traceback (most recent call last):
File "gen.py", line 4, in <module>
from Bio import Entrez
File "/home/cri/Desktop/GET GEN/Bio.py", line 101
from Bio._py3k import urlopen as _urlopen
^
IndentationError: unexpected indent

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

0

Could you post the changes you made to the script? The error seems easy to solve, just a tab which is incorrect. It even tells you on which line...

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

Hmm, I think I did something wrong, so I reinstalled Python and ran your script. Now, instead of all those error messages, I get:

python gen.py 8.vcf
Traceback (most recent call last):
File "gen.py", line 17, in <module>
print("%(Name)s, %(Summary)s" % result[0])
KeyError: 0

I hadn't modified anything.

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110
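
A `KeyError: 0` on `result[0]` usually means the parsed object is a dictionary rather than a list. In recent Biopython versions, `Entrez.read` on an `esummary` response from the `gene` database appears to return a dictionary whose records sit under `DocumentSummarySet` and then `DocumentSummary`; that layout is an assumption here and worth checking against the installed version. A minimal sketch of an accessor that tolerates both shapes (`first_summary` is a hypothetical helper name, and `fake` is made-up stand-in data, not real NCBI output):

```python
def first_summary(result):
    """Return the first gene record from a parsed esummary result, or None.

    Assumed shapes (check against your Biopython version):
      * newer: {"DocumentSummarySet": {"DocumentSummary": [record, ...]}}
      * older: [record, ...]
    """
    if isinstance(result, dict):
        docs = result.get("DocumentSummarySet", {}).get("DocumentSummary", [])
    else:
        docs = result
    return docs[0] if docs else None


# Stand-in for what Entrez.read(handle) might return; not real NCBI output.
fake = {"DocumentSummarySet": {"DocumentSummary": [{"Name": "TAP2", "Summary": "..."}]}}
record = first_summary(fake)
print("%(Name)s, %(Summary)s" % record)
```

With a helper like this, the failing `print` line from the traceback would format `first_summary(result)` instead of `result[0]`, after checking that it is not `None`.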

0

As a clarification, it's not my script, I just found it while googling. It looks a bit strange. Do you have your input in a list in a file? How would you like to use the script? I'll rewrite it.

Which python version do you use?

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

Oh thanks, if you don't mind... I'm not a programmer, so for me this is pretty hard.

I thought I could just read the genes from my VCF file. I have just tried with a txt file containing only two genes, and I get the same error.

I have Python 1.67

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

1

I wrote it to just read from a file with a gene name on each line. I don't think writing this for a vcf file is a lot of fun, unless you really want to... I guess this should work. Execute script as python getGeneSummary.py yourlist.txt Let me know if it doesn't work as it should or you would like a modification.

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k
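
The script itself is not reproduced in this thread, so the following is only a sketch of the approach described: read one gene symbol per line, look each one up in NCBI Gene with Biopython, and print the summary. It is not the author's code. The function names, the `[sym]` field tag (borrowed from the NCBI URLs later in the thread), and the layout of the parsed `esummary` result are assumptions.

```python
def read_gene_list(path):
    """Read one gene symbol per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def fetch_summary(symbol, email):
    """Look up one gene symbol in NCBI Gene and return its summary text.

    Returns None when no gene or no summary is found. Needs network access
    and Biopython; the import is local so read_gene_list works without it.
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="gene", term=symbol + "[sym]", retmax=1)
    ids = Entrez.read(handle)["IdList"]
    handle.close()
    if not ids:
        return None
    handle = Entrez.esummary(db="gene", id=ids[0])
    result = Entrez.read(handle)
    handle.close()
    # Assumed layout of the parsed result; older Biopython versions may differ.
    docs = result["DocumentSummarySet"]["DocumentSummary"]
    return (docs[0].get("Summary") or None) if docs else None


def main(path, email):
    """Print each gene symbol followed by its summary."""
    for symbol in read_gene_list(path):
        summary = fetch_summary(symbol, email)
        print(symbol)
        print(summary if summary else "!! No summary found")
        print()
```

To run it from the command line as in the post, one could add a call such as `main(sys.argv[1], "you@example.org")` at the bottom (with `import sys` at the top); the email address is a placeholder.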

0

ohhh

THANK YOU SOOOOOOOO MUUUUCH!!!!!!!!!! :))))))

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

0

I'm wondering if there isn't a better alternative to NCBI? Even at GeneCards I find more information. Many of the genes appear like this O_O

SLC6A5 !! No summary found
SLC16A2 !! No summary found
HRNR !! No summary found
......

Thanks

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

0

http://www.ncbi.nlm.nih.gov/gene/?term=SLC6A5%5Bsym%5D
http://www.ncbi.nlm.nih.gov/gene/?term=SLC16A2%5Bsym%5D if you need the human gene.
http://www.ncbi.nlm.nih.gov/gene/?term=HRNR%5Bsym%5D

ADD REPLY • link 7.9 years ago by GenoMax 144k

0

Hmm, then why do I get that error when I use the code he gave me?

Only a few from my list appear...

GenoMax, would you mind checking whether you get the same error, please?

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

0

For some genes (e.g. http://www.ncbi.nlm.nih.gov/gene/388697) there simply is no summary available.

As you can see in my code, the summary information is parsed from what it gets from NCBI. If something isn't properly formatted, it might give an error. I don't know if GeneCards has an API which I could access to pull the summary out. Although I must say that for the HRNR example GeneCards isn't very informative either! (http://www.genecards.org/cgi-bin/carddisp.pl?gene=HRNR)

I don't know yet why SLC genes don't work, and I will look into this. Tonight, or after the conference this week. Sorry for the inconvenience, I hope I'll find a way around this.

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k
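
Since some genes simply have no summary at NCBI, it may help to keep those genes in the printable document and flag them, so that a missing summary is not mistaken for a failed lookup. A minimal sketch, assuming the summaries have already been collected into a dictionary; `format_report` is a hypothetical helper and the example values are placeholders, not real query results:

```python
def format_report(summaries):
    """Format {symbol: summary or None} as printable text.

    Genes without a summary are kept in the output and flagged, so a
    missing summary is not mistaken for a gene that was never queried.
    """
    blocks = []
    for symbol, summary in summaries.items():
        text = summary.strip() if summary else "!! No summary found"
        blocks.append(symbol + "\n\n" + text)
    return "\n\n".join(blocks) + "\n"


# Placeholder data for illustration only.
example = {"IDUA": "This gene encodes an enzyme ...", "HRNR": None}
print(format_report(example))
```

The returned string can be written to a text file with `open(path, "w").write(...)` and printed from there.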

0

You don't need to be sorry at all, my god! I'm really happy and appreciate your help. Maybe I have something wrong with my PC and it doesn't work as it should?

Thank you so much! Just whenever you have the time ^^

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

1

Don't worry :p I'm happy to help and improve my scripting at the same time. Thanks for the feedback and challenge!

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k

0

thank you!

hahahah if you want I can give you more challenges XD

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110

1

Hi Christina, I made a first refinement and it took me just a few minutes (actually disappointingly easy). It's beyond me why this is the case, but without specifying the species apparently this gene was the top of the list with the entrez query: http://www.ncbi.nlm.nih.gov/gene/108519407

I limited to Homo sapiens (assuming that is what you're interested in) and it seems to work now.

If you have examples which don't work as they should, please let me know!

Code can be found below:

ADD REPLY • link 7.9 years ago by WouterDeCoster 47k
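
The refined code is not included in this thread. One way such a species restriction could be expressed is in the Entrez search term itself, combining the `[sym]` field seen in the NCBI URLs above with an organism filter. A minimal sketch, where `build_gene_term` is a hypothetical helper and the exact term the author used is unknown:

```python
def build_gene_term(symbol, organism="Homo sapiens"):
    """Build an Entrez Gene query for a gene symbol, optionally limited to one species."""
    term = "%s[sym]" % symbol.strip()
    if organism:
        term += ' AND "%s"[Organism]' % organism
    return term


print(build_gene_term("SLC6A5"))
# prints: SLC6A5[sym] AND "Homo sapiens"[Organism]
```

The resulting string would be passed as the `term` argument of `Entrez.esearch(db="gene", term=...)`.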

0

THANKS A LOT!

Works like a charm now!! This time some genes like CPEB1 don't appear just because there isn't a summary on NCBI. Good work!!

...but... there is always a but with me... XD The script stops a bit later and I get this error message. Could it be because my text file has over 1000 genes? ^^"

Traceback (most recent call last):
File "/home/cri/Desktop/biopython-1.68/getGeneSummary.py", line 15, in <module>
result = Entrez.read(handle)
File "/home/cri/Desktop/biopython-1.68/Bio/Entrez/__init__.py", line 450, in read
record = handler.read(handle)
File "/home/cri/Desktop/biopython-1.68/Bio/Entrez/Parser.py", line 233, in read
self.parser.ParseFile(handle)
File "/home/cri/Desktop/biopython-1.68/Bio/Entrez/Parser.py", line 390, in endElementHandler
raise RuntimeError(value)
RuntimeError: Invalid db name specified: gene

and THANK YOU SOOOOO MUCH!!!!!

ADD REPLY • link 7.9 years ago by cristina_sabiers ▴ 110
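
The thread does not establish the cause of this error. Since `gene` is a valid database name and the earlier lookups succeeded, one plausible reading is a transient server-side failure during a long run of requests; that is an assumption, not a diagnosis. Under that assumption, a small retry wrapper with a pause between attempts lets a long gene list continue past an occasional failure. A minimal sketch, where `call_with_retries` and `flaky_lookup` are hypothetical names and the flaky function only simulates the error:

```python
import time


def call_with_retries(func, attempts=3, delay=1.0, errors=(RuntimeError, OSError)):
    """Call func() and retry on the given errors, pausing between attempts.

    The last error is re-raised when every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except errors:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # wait a little longer each time


# Stand-in for a flaky lookup: fails twice, then succeeds.
calls = {"n": 0}


def flaky_lookup():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("Invalid db name specified: gene")
    return "ok"


print(call_with_retries(flaky_lookup, delay=0))
```

In a real script, the wrapped function would be the per-gene lookup, for example `call_with_retries(lambda: Entrez.read(Entrez.esummary(db="gene", id=gene_id)))`, with a delay of a second or more.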