Question: How to parse entries from a text file into a python script?
17 months ago, Ming wrote:

Dear All,

I have a problem using the contents of another file as input to a Python script. I have a text file, taxID.txt, that looks like this:

28181
1979370
342108
2032654
1437059
1288970
156889
451514
2032646
2032652

I have about 20,000 entries in this text file.

I also have a script that requires me to enter the taxIDs manually. Can I pipe the entries from taxID.txt into taxids = open('taxID.txt.txt','r') to make the script run?

The python script is below:

#!/usr/bin/python

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)   
    names = ncbi.get_taxid_translator(lineage)
    lineage2ranks = ncbi.get_rank(names)
    ranks2lineage = dict((rank,taxid) for (taxid, rank) in lineage2ranks.items())
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

if __name__ == '__main__':
    taxids = open('taxID.txt.txt', 'r')
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
            else:
                results[-1].append(rank)

    #generate the header
    header = ['Original_query_taxid']
    header.extend(desired_ranks)
    print('\t'.join(header))

    #print the results
    for result in results:
        print('\t'.join(result))

I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.

written 17 months ago by WouterDeCoster

> I have problem using the input of another file in a python script.

It is unclear to me what exactly your problem is; please elaborate.

> Can I pipe the entries from taxID.txt into taxids = open('taxID.txt.txt','r') to make the script run?

I don't understand what you mean here, especially the "pipe" part.

For what it's worth, I believe your code could be more efficient. You now add items to the results list without using it afterwards, except for iterating over the list and printing the items. What you could do would be to first print the header, and then rather than adding items to results just print those directly.
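
A minimal sketch of that print-as-you-go idea. The `lookup_ranks` function is a hypothetical stand-in for the `get_desired_ranks` and `get_taxid_translator` calls in the original script, so the sketch runs without ete3 or its taxonomy database; in the real script it would be replaced by those lookups.

```python
import sys

DESIRED_RANKS = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']


def lookup_ranks(taxid, desired_ranks):
    """Hypothetical stand-in for the ete3 lookups in the original script.

    Returns one name per desired rank; here every rank is '<not present>'.
    """
    return ['<not present>' for _ in desired_ranks]


def print_table(taxids, desired_ranks=DESIRED_RANKS, out=sys.stdout):
    """Print the header first, then one tab-separated row per taxid."""
    print('\t'.join(['Original_query_taxid'] + list(desired_ranks)), file=out)
    for taxid in taxids:
        row = [str(taxid)] + lookup_ranks(taxid, desired_ranks)
        print('\t'.join(row), file=out)  # no intermediate results list
```

Each row is written as soon as it is computed, so nothing has to be held in memory for 20,000 entries.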

written 17 months ago by WouterDeCoster

Dear WouterDeCoster,

I am trying to run a Python script whose input requires me to key in the entries manually. I happen to have a file that contains approximately 20,000 entries.

This is what I am supposed to enter manually in the python script:

taxids = [1204725, 2162,  1300163, 420247]

Typing in 20,000 entries by hand would be insane. If I have a file, taxID.txt, how do I get its entries into the script?

Thank you!

written 17 months ago by Ming

Hi Ming,

This is more a pure programming question than bioinformatics. For future reference, questions like that are more appropriate at https://stackoverflow.com/

Cheers,
Wouter

written 17 months ago by WouterDeCoster

You might be interested in: https://github.com/jrjhealey/PYlogeny

Which is a WIP script which does essentially what you're doing.

written 17 months ago by Joe

@jrj.healey, thank you and will check it out!

written 17 months ago by Ming
17 months ago, mohammadhassanj wrote:
import re
taxids = re.sub("\n"," ",open(‘taxID.txt.txt’,‘r’).read()).split(" ")

Thanks @mohammadhassanj,

but I get the following error:

File "/home/tanshiming/Scripts/python/blast-taxonomy.py", line 17
    taxids = re.sub("\n"," ",open(‘taxID.txt.txt’,‘r’).read()).split(" ")
                                       ^
SyntaxError: invalid character in identifier
written 17 months ago by Ming

Make sure the quotes are real (straight) quotes: ' rather than the curly ‘ or ’.
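
For reference, a sketch of the same one-liner idea typed with straight quotes. It swaps `re.sub` for `str.split()` with no arguments, which splits on any whitespace and drops empty strings; the `read_taxids_text` helper name is made up for this example.

```python
def read_taxids_text(text):
    """Split file contents on any whitespace (spaces, newlines, tabs)."""
    return text.split()


# Usage with straight quotes around the file name and mode
# (file name as written in the thread; adjust to your own path):
# taxids = read_taxids_text(open('taxID.txt.txt', 'r').read())
```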

written 17 months ago by WouterDeCoster
17 months ago, WouterDeCoster (Belgium) wrote:

taxids = open('taxID.txt.txt','r') returns a file object which is an iterator. If you iterate over this object you'll get the lines:

for line in taxids:
    print(line)

If you absolutely need these lines in a list (I don't think that's necessary for this script) you can use the readlines() method:

lines = taxids.readlines()

One problem is that each line will still have the newline character \n at the end, so you have to trim that off, e.g. using rstrip.

for taxid in taxids:
    results.append(list())
    results[-1].append(str(taxid.rstrip('\n')))

Note that the above can also be simplified to:

for taxid in taxids:
    results.append([str(taxid.rstrip('\n'))])
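
If integer IDs are wanted, as in the hand-written `taxids = [...]` list, the stripped lines can be converted with `int()`. A small sketch, where `parse_taxids` is a hypothetical helper and blank lines are assumed to be safe to skip:

```python
def parse_taxids(lines):
    """Turn an iterable of text lines into a list of integer taxids."""
    taxids = []
    for line in lines:
        line = line.strip()      # removes '\n' and surrounding spaces
        if line:                 # skip blank lines
            taxids.append(int(line))
    return taxids
```

Because a file object iterates over its lines, the open file can be passed to `parse_taxids` directly, just like a list of strings.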

@WouterDeCoster, thank you! It worked very well! :)

written 17 months ago by Ming

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

modified 17 months ago • written 17 months ago by Devon Ryan

@Ming: Be aware that in your opening code example, you do not close() the taxids file. Consider switching to using a with open() block instead.
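
A sketch of the `with open()` pattern: the file is closed automatically when the block ends, even if an error occurs inside it. The `load_taxids` name is illustrative and not part of the original script.

```python
def load_taxids(path):
    """Read one taxid per line from path, closing the file afterwards."""
    with open(path, 'r') as handle:
        # Strip the trailing newline and skip blank lines.
        return [line.strip() for line in handle if line.strip()]
```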

written 17 months ago by Joe