Question: Download UniProt page source using python
dovah wrote (5.4 years ago):

Hi guys!
I'm trying to save the content of a web page to a file using Python (3.4). More specifically, my aim is to save the ID and the FT lines of UniProt pages for given proteins. I have a text file containing several URLs and I have to save every related web page.

So far, all I can do is fetch the page for a single accession code. The print function only displays the web page content in a terminal. If I try to write the result of the query to a file, I don't get the expected output (just a series of "random" numbers and letters).

So, I wondered if anyone has tested something like this before and could help me with my issue.
Thanks in advance!

#requesting webpage
import urllib.request
url = 'http://www.uniprot.org/uniprot/APBB1_HUMAN.txt'
req = urllib.request.Request(url)
page = urllib.request.urlopen(req)
src = page.readall()

#display webpage content on terminal
print(src)

#writing to file
with open("query.txt", "w") as f:
    for x in src:
        f.write(str(x))
Matt Shirley (Cambridge, MA) wrote (5.4 years ago):

You might try something like this:

Note that you don't need to make a Request object, and that you can eliminate the "with" statements and do something like urls = open('url_file.txt'), but then you would need to explicitly close the filehandles.
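As a rough illustration of that approach, here is a minimal sketch; it is not the snippet originally posted with this answer, and the function names (extract_id_and_ft, fetch_text, save_entries) are made up for the example. It assumes url_file.txt holds one URL per line and that each URL returns a UniProt flat-file (.txt) entry. The "random" numbers in the question most likely come from iterating over a bytes object, which yields integers; decoding the response to str first avoids that. The sketch also calls read() instead of readall(), since readall() does not appear to be available on the response object in newer Python 3 releases.

```python
import urllib.request


def extract_id_and_ft(text):
    """Return the ID and FT lines of a flat-file entry as a list of strings."""
    return [line for line in text.splitlines()
            if line.startswith(("ID ", "FT "))]


def fetch_text(url):
    """Download a URL and decode the response bytes to str (UTF-8 assumed)."""
    with urllib.request.urlopen(url) as page:
        return page.read().decode("utf-8")


def save_entries(url_file, out_file):
    """Fetch every URL listed in url_file and write its ID/FT lines to out_file."""
    with open(url_file) as urls, open(out_file, "w") as out:
        for url in urls:
            url = url.strip()
            if not url:
                continue  # skip blank lines in the URL list
            for line in extract_id_and_ft(fetch_text(url)):
                out.write(line + "\n")


# Example call (file names taken from the question and the note above):
# save_entries("url_file.txt", "query.txt")
```

Keeping the line filtering in its own function means it can be checked on a pasted entry without touching the network.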
