How do I parse entries from a text file into a Python script?
2.6 years ago
Ming

Dear All,

I have problem using the input of another file in a python script. I have a text file, taxID.txt, that looks like this:

28181
1979370
342108
2032654
1437059
1288970
156889
451514
2032646
2032652


I have about 20,000 entries in this text file.

I also have a script that requires me to enter the taxIDs manually. Can I pipe the entries from taxID.txt into taxids = open(‘taxID.txt.txt’,‘r’) to make the script run?

The python script is below:

#!/usr/bin/python

import csv
from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)
    names = ncbi.get_taxid_translator(lineage)
    lineage2ranks = ncbi.get_rank(names)
    ranks2lineage = dict((rank, taxid) for (taxid, rank) in lineage2ranks.items())
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

if __name__ == '__main__':
    taxids = open(‘taxID.txt.txt’,‘r’)
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
            else:
                results[-1].append(rank)

    # print the results
    for result in results:
        print('\t'.join(result))
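For readers unfamiliar with the dictionary trick in get_desired_ranks: ncbi.get_rank() returns a taxid-to-rank mapping, and the dict(...) expression inverts it to rank-to-taxid so each desired rank can be looked up directly. A minimal sketch with made-up taxids (hypothetical toy data, not real NCBI output) shows the inversion and the '<not present>' fallback. If several lineage nodes share a rank (e.g. 'no rank'), only the last one survives the inversion, which is harmless here because only the named ranks are queried.

```python
# Toy stand-in for the taxid -> rank dict that ncbi.get_rank() returns
# (hypothetical taxids, not real NCBI data)
lineage2ranks = {1: 'no rank', 2: 'superkingdom', 10: 'no rank', 20: 'genus', 30: 'species'}

# Invert to rank -> taxid, exactly as get_desired_ranks does
ranks2lineage = dict((rank, taxid) for (taxid, rank) in lineage2ranks.items())

desired_ranks = ['phylum', 'genus', 'species']
ranks = {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>')
         for rank in desired_ranks}

print(ranks)                     # {'phylum_id': '<not present>', 'genus_id': 20, 'species_id': 30}
print(ranks2lineage['no rank'])  # 10: taxid 1 was overwritten by the later duplicate
```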


I added code markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which is in the toolbar when you compose or edit a post.


I have problem using the input of another file in a python script.

It is unclear to me what exactly your problem is; please elaborate.

Can I pipe the entries from taxID.txt into taxids = open(‘taxID.txt.txt’,‘r’) to make the script run?

I don't understand what you mean here, especially the "pipe" part.

For what it's worth, I believe your code could be more efficient. You currently add items to the results list without using it afterwards, except to iterate over it and print each item. Instead, you could print a header first and then, rather than appending items to results, print each row directly.
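A minimal sketch of that suggestion: print a header once, then write each row as soon as it is computed instead of collecting everything in results. To keep it runnable without ete3, the per-taxid lookup is passed in as a function; lookup_names and the 'taxid' header label are hypothetical, and in the real script the lookup would wrap get_desired_ranks and ncbi.get_taxid_translator the way the original loop does.

```python
import io
import sys

DESIRED_RANKS = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']

def write_table(taxids, lookup_names, out=sys.stdout):
    """Print a header, then one tab-separated row per taxid, without buffering rows."""
    print('\t'.join(['taxid'] + DESIRED_RANKS), file=out)
    for taxid in taxids:
        names = lookup_names(taxid)  # one name (or '<not present>') per desired rank
        print('\t'.join([str(taxid)] + names), file=out)

# Hypothetical lookup used only for illustration (no ete3 call)
def lookup_names(taxid):
    return ['<not present>'] * len(DESIRED_RANKS)

buf = io.StringIO()
write_table([28181, 342108], lookup_names, out=buf)
print(buf.getvalue(), end='')
```

Because each row is written immediately, memory use stays flat regardless of how many taxids the file holds.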


Dear WouterDeCoster,

I am trying to run a Python script, and the script requires me to key in the entries manually. I happen to have a file that contains approximately 20,000 entries.

This is what I am supposed to enter manually in the Python script:

taxids = [1204725, 2162, 1300163, 420247]


Entering 20,000 entries by hand would be insane. If I have a file, taxID.txt, how do I get its entries into the script?
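A minimal sketch of building the same kind of list from the file instead of typing it, assuming taxID.txt holds one integer taxid per line (blank lines are skipped). The helper name load_taxids is hypothetical, and the temporary file is created only so the example runs on its own.

```python
import os
import tempfile

def load_taxids(path):
    """Return the taxids in `path` as a list of ints, one per non-blank line."""
    with open(path) as handle:
        # int() ignores the surrounding whitespace, including the trailing newline
        return [int(line) for line in handle if line.strip()]

# Demo: write a small file shaped like taxID.txt, then read it back
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'taxID.txt')
    with open(path, 'w') as handle:
        handle.write('28181\n1979370\n342108\n\n')
    taxids = load_taxids(path)

print(taxids)  # [28181, 1979370, 342108]
```

With a helper like this, the hard-coded list line becomes taxids = load_taxids('taxID.txt').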

Thank you!


Hi Ming,

This is more of a pure programming question than a bioinformatics one. For future reference, questions like that are more appropriate at https://stackoverflow.com/

Cheers,
Wouter


You might be interested in: https://github.com/jrjhealey/PYlogeny

It is a work-in-progress (WIP) script that does essentially what you're doing.


@jrj.healey, thank you and will check it out!

2.6 years ago
import re


but I have the following errors:

File "/home/tanshiming/Scripts/python/blast-taxonomy.py", line 17
^
SyntaxError: invalid character in identifier


Make sure the quotes are straight ASCII quotes (') rather than curly typographic quotes (‘ ’).
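Curly quotes often sneak in when code is copied from a word processor or a web page, and Python only accepts them inside string literals, not as string delimiters. A small sketch that reports every line containing typographic quotes (‘ ’ “ ”); the function name find_curly_quotes is hypothetical, and the demo scans an in-memory string rather than a real file.

```python
CURLY_QUOTES = '\u2018\u2019\u201c\u201d'  # left/right single and double typographic quotes

def find_curly_quotes(source):
    """Return (line_number, line) pairs for lines containing typographic quotes."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(ch in line for ch in CURLY_QUOTES):
            hits.append((number, line))
    return hits

script = "import csv\ntaxids = open(\u2018taxID.txt.txt\u2019,\u2018r\u2019)\n"
for number, _ in find_curly_quotes(script):
    print('curly quote on line', number)  # curly quote on line 2
```

To check the actual script, read it with open('blast-taxonomy.py', encoding='utf-8') and pass the text to the function.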

2.6 years ago

taxids = open('taxID.txt.txt','r') returns a file object which is an iterator. If you iterate over this object you'll get the lines:

for line in taxids:
    print(line)


If you absolutely need these lines in a list (I don't think that's necessary for this script), you can use the readlines() method:

lines = taxids.readlines()
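For comparison, a small sketch on an in-memory file (io.StringIO stands in for the real taxID.txt here): readlines() keeps the trailing newline on every line, while read().splitlines() returns the lines without it.

```python
import io

data = '28181\n1979370\n342108\n'

# io.StringIO stands in for the file object returned by open()
lines = io.StringIO(data).readlines()
print(lines)     # ['28181\n', '1979370\n', '342108\n']

stripped = io.StringIO(data).read().splitlines()
print(stripped)  # ['28181', '1979370', '342108']
```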

One problem is that each line will still have the newline character \n at the end, so you have to trim that off, e.g. using rstrip.

for taxid in taxids:
    results.append(list())
    results[-1].append(str(taxid.rstrip('\n')))


Note that the above can also be simplified to:

for taxid in taxids:
    results.append([str(taxid.rstrip('\n'))])
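Both versions above remove only the trailing newline. If the file might contain stray spaces or tabs at the end of a line (an assumption about the input, not something shown in the question), those would end up in the printed first column; strip() with no argument removes all leading and trailing whitespace instead, as this sketch with made-up lines shows.

```python
# Hypothetical lines with stray trailing whitespace before the newline
lines = ['28181\n', '1979370 \n', '342108\t\n']

print([line.rstrip('\n') for line in lines])  # ['28181', '1979370 ', '342108\t']
print([line.strip() for line in lines])       # ['28181', '1979370', '342108']
```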


@WouterDeCoster, thank you! It worked very well! :)


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.


@Ming: Be aware that in your opening code example, you never call close() on the taxids file. Consider using a with open(...) block instead, which closes the file automatically.
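A minimal sketch of that pattern: the with block closes the file when it exits, even if an exception is raised inside it. The temporary file only exists so the example runs on its own; in the real script the path would be taxID.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'taxID.txt')
    with open(path, 'w') as out:
        out.write('28181\n342108\n')

    # Normal use: the file is closed as soon as the block ends
    with open(path) as taxids:
        ids = [line.rstrip('\n') for line in taxids]
    print(ids, taxids.closed)  # ['28181', '342108'] True

    # Even if an error is raised inside the block, the file is still closed
    try:
        with open(path) as failing:
            raise ValueError('simulated error while processing')
    except ValueError:
        pass
    print(failing.closed)  # True
```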