Question: Write lists of subject ID numbers to new text files
3.3 years ago, incensefrenzie2006 wrote:

I have a function that reads the first line of multiple blastp result files. I use a for loop to iterate over the proteins and print the subject ID numbers taken from the first line of each file. How can I write each list to its own file?

import os, glob, sys

# Read column index [1] (the subject ID) from the first line of each
# protein result file and collect the values in a list.
def readProteinBlastFiles(name):
    # Build the glob pattern for my directory: protein name + database wildcard
    path = '/Users/sueparks/' + name + 'L*'

    Files_P = []  # empty list for subject accession numbers
    for file in glob.glob(path):
        with open(file, 'r') as f:
            line = f.readline()  # read first line of file
            if len(line) > 0:
                # field_0 = line.strip().split('\t')[0]
                field_1 = line.strip().split('\t')[1]

                # Files_P.append(field_0)  # Query ID
                Files_P.append(field_1)  # Subject ID
    return Files_P

# Removed all P_17 files and removed all files from L_CTV-05 database
Bacillus_Proteins = ['P_1', 'P_2', 'P_3', 'P_4', 'P_5', 'P_6', 'P_7', 'P_8', 'P_9', 'P_10',
                     'P_11', 'P_12', 'P_13', 'P_14', 'P_15', 'P_16', 'P_18', 'P_19',
                     'P_20']

for prot in Bacillus_Proteins:
    list_SubjectAccession_Numbers = readProteinBlastFiles(prot)
    print(prot)
    print("############################")
    print(list_SubjectAccession_Numbers)

Results in console:

P_19 
['NP_391972.1', 'EEQ68114.1', 'NP_391972.1', 'EEQ25921.1', 'NP_391972.1', 'EFD99688.1', 'NP_391972.1', 'EFB61660.1', 'NP_391972.1', 'EEQ25318.1', 'NP_391972.1', 'EEJ40542.1', 'NP_391972.1', 'EEW51848.1', 'NP_391972.1', 'ADZ08087.1', 'NP_391972.1', 'EEJ68837.1', 'NP_391972.1', 'EFJ68832.1', 'NP_391972.1', 'EFH30349.1', 'NP_391972.1', 'EFO69387.1', 'NP_391972.1', 'EEU28530.1', 'NP_391972.1', 'EFQ45573.1', 'NP_391972.1', 'EGG32092.1', 'NP_391972.1', 'WP_013086961.1', 'NP_391972.1', 'EGC80044.1']

P_20
['EEQ68452.2', 'EEQ26185.1', 'EFD99988.1', 'EFB62008.1', 'EEQ24617.1', 'EEJ40165.1', 'EEW51876.1', 'ADZ06473.1', 'EEJ68783.1', 'EFJ69255.1', 'EFH29970.1', 'EFO68531.1', 'EEU28950.1', 'EFQ46733.1', 'EGG32019.1', 'WP_005720329.1', 'EGC80018.1']
3.3 years ago, st.ph.n wrote:

From what I understand, changing this portion of your script will write the accession numbers for each 'prot' in Bacillus_Proteins to its own file:

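for prot in Bacillus_Proteins:
    list_SubjectAccession_Numbers = readProteinBlastFiles(prot)
    print(prot)
    print("############################")
    # One output file per protein, e.g. P_19_results.txt
    with open(prot + '_results.txt', 'w') as out:
        for i in list_SubjectAccession_Numbers:
            # Add a newline so each accession number lands on its own line
            # instead of running together
            out.write(i + '\n')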
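
If you want to try the file-writing step on its own, without the BLAST files on disk, here is a minimal self-contained sketch. The helper name write_ids, its out_dir parameter and the results dictionary are my own inventions for illustration; the IDs are a few copied from the console output in the question, and the files go into a throwaway temporary directory.

```python
import os
import tempfile


def write_ids(prot, ids, out_dir="."):
    """Write one subject ID per line to <out_dir>/<prot>_results.txt; return the path."""
    out_path = os.path.join(out_dir, prot + "_results.txt")
    with open(out_path, "w") as out:
        for subject_id in ids:
            out.write(subject_id + "\n")
    return out_path


# Hypothetical stand-in for the lists returned by readProteinBlastFiles();
# the IDs are taken from the console output shown in the question.
results = {
    "P_19": ["NP_391972.1", "EEQ68114.1"],
    "P_20": ["EEQ68452.2", "EEQ26185.1", "EFD99988.1"],
}

# Temporary directory so the demo does not leave files in the working directory
demo_dir = tempfile.mkdtemp()
written = {prot: write_ids(prot, ids, demo_dir) for prot, ids in results.items()}

# Read each file back to confirm one ID per line
for prot, path in written.items():
    with open(path) as fh:
        print(prot, fh.read().splitlines())
```

In the real script you would presumably call write_ids(prot, readProteinBlastFiles(prot)) inside the loop over Bacillus_Proteins, and pass out_dir if the result files should go somewhere other than the current directory.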