Question: Scan Through Txt, Append Certain Data To An Empty List In Python
7.5 years ago by
hicsuntdrac0nis220 wrote:

I have a text file that I am reading in Python. I'm trying to extract certain elements that follow keywords in the text file and append them to empty lists. The file looks like this:

[image of the input file, not preserved]

So I want to make two empty lists:

- The 1st list will hold the sequence names.

- The 2nd list will be a list of lists, each in the format [Bacteria, Phylum, Class, Order, Family, Genus, Species].

Most of the organisms will be Uncultured bacterium. I am trying to add the Uncultured bacterium along with the IDs that follow it, which are separated by ';'.

Is there any way to scan for a certain word and, when the word is found, take the word that comes after it (separated by a '\t')?

I need this to create a dictionary that translates each sequence name to its taxonomic data.

I know I will need an empty list to append the names to:

seq_names = []

a second list to put the taxonomy lists into:

taxonomy = []

and a 3rd list that will be reset after every iteration:

temp = []

I'm sure it can be done in Biopython, but I'm working on my Python skills.

python programming • 16k views

Hi, I am sure you will get an answer here, but you will benefit more from spending a bit longer learning Python. A good place to start is (http://www.diveintopython.net/). This problem is good practice for your Python skills!

written 7.5 years ago by Haibao Tang3.0k
7.5 years ago by
Chris1.6k
Munich
Chris1.6k wrote:

In general, this is a typical problem that could be approached by iterating over each line while splitting each line into tokens.

Try something like

seq_names = []
taxonomy = []
with open('/path/to/file') as handle:  # file() was removed in Python 3; open() works in both
  for line in handle:
    if line.startswith('query name'): continue  # omit the header
    tokens = line.rstrip('\n').split('\t')  # tokens is a list of the fields separated by '\t'
    # Store specific tokens in your lists, e.g.
    seq_names.append(tokens[0])
    taxonomy.append(tokens[9])
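
The same token list can also answer the original question about taking the word that follows a keyword. Below is a minimal sketch of that idea, not a tested solution for the real file: the helper name value_after and the sample line are made up for illustration, since the actual file layout is not shown in the thread.

```python
def value_after(line, keyword, sep="\t"):
    """Return the field that follows `keyword` in a delimited line, or None."""
    tokens = line.rstrip("\n").split(sep)
    # Stop one short of the end: the last token has nothing after it.
    for i, token in enumerate(tokens[:-1]):
        if token == keyword:
            return tokens[i + 1]
    return None


# Hypothetical line; the real column layout is an assumption.
sample = "seq1\tUncultured bacterium\tBacteria;Firmicutes;Clostridia\n"
print(value_after(sample, "Uncultured bacterium"))
# prints: Bacteria;Firmicutes;Clostridia
```

If the keyword is missing, or is the last field on the line, the helper returns None, so the caller can skip that line.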
7.5 years ago by
London, UK
Giovanni M Dall'Olio26k wrote:

You can also use the csv module.

Example tab-delimited file (sample_csv.txt):

seq     id      strand  taxon
seq1    1       +       Bacteria
seq2    2       -       Bacteria
seq3    3       +       Archaea
seq4    4       +       Archaea

Extract the sequence names of all the 'bacteria' rows:

>>> import csv
>>> import re
>>> reader = csv.reader(open("sample_csv.txt", "r"), delimiter='\t')
>>> [row[0] for row in reader if re.match('bacteria', row[3], re.IGNORECASE)]
['seq1', 'seq2']
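
A variant of the same idea, sketched with csv.DictReader so that columns are addressed by header name rather than by position. The in-memory io.StringIO object is only a stand-in for the sample file above, so the snippet runs without a file on disk.

```python
import csv
import io

# In-memory stand-in for the tab-delimited sample file shown above.
sample = io.StringIO(
    "seq\tid\tstrand\ttaxon\n"
    "seq1\t1\t+\tBacteria\n"
    "seq2\t2\t-\tBacteria\n"
    "seq3\t3\t+\tArchaea\n"
    "seq4\t4\t+\tArchaea\n"
)

# DictReader takes the first row as field names, so no header skipping is needed.
reader = csv.DictReader(sample, delimiter="\t")
bacteria = [row["seq"] for row in reader if row["taxon"].lower() == "bacteria"]
print(bacteria)  # ['seq1', 'seq2']
```

With a real file, replace the StringIO object with open("sample_csv.txt", newline="").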
7.5 years ago by
Damian Kao15k
USA
Damian Kao15k wrote:

You can do all of this with Python's basic string-parsing functions. Here is a basic template for parsing a tab-delimited file:

seq_names = []
inFile = open('yourFile', 'r')

headers = next(inFile)  # skip your header line (inFile.next() only works in Python 2)
for line in inFile:
   data = line.strip().split('\t')  # data is now a list of your columns
   if data[0] == "query":  # if the first column is equal to something
      seq_names.append(data[0])  # add the first column to the seq_names list
inFile.close()

But you should really study up on the python language itself. There are probably better data structures to store the information you want.
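
One such structure is the dictionary the question asks for, mapping each sequence name to its list of taxonomic ranks. This is only a rough sketch under assumptions: the function name build_taxonomy_map, the column positions, and the sample lines are all hypothetical, because the real file layout is not shown in the thread.

```python
def build_taxonomy_map(lines, name_col=0, tax_col=1):
    """Map each sequence name to a list of ranks split on ';'.

    name_col and tax_col are assumed column positions; adjust them
    to match the real file.
    """
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(name_col, tax_col):
            continue  # skip blank or short lines
        ranks = [rank.strip() for rank in fields[tax_col].split(";")]
        mapping[fields[name_col]] = ranks
    return mapping


# Hypothetical sample lines: name, then a ';'-separated taxonomy string.
sample_lines = [
    "seq1\tBacteria;Firmicutes;Clostridia\n",
    "seq2\tBacteria;Proteobacteria\n",
]
tax_by_name = build_taxonomy_map(sample_lines)
print(tax_by_name["seq1"])  # ['Bacteria', 'Firmicutes', 'Clostridia']
```

A dictionary keeps each name and its taxonomy together, so there is no need to keep two parallel lists in step or to reset a temporary list on every iteration.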

7.5 years ago by
hicsuntdrac0nis220 wrote:

http://stackoverflow.com/questions/9577830/scan-through-txt-append-certain-data-to-an-empty-list-in-python
