Looking for matching names in fasta file and genbank records
19 months ago
AveryB • 0

Is there a way in Python to look at GenBank files in a different directory and check whether the names of those files are listed in a FASTA file?

biopython • 761 views
find DIR1 DIR2 -type f -name "*.gb" | grep -F -f <(grep "^>" in.fa | cut -c 2-)
16 months ago
Alban Nabla ▴ 30

In Python you can do this:

import fnmatch
import os
from Bio import SeqIO

filenames = [f for f in os.listdir("your_folder") if fnmatch.fnmatch(f, '*.gb')]

records = SeqIO.parse('records.fasta', 'fasta')
for rec in records:
    for title in filenames:
        # Report each GenBank filename that appears in the FASTA header line
        if title in rec.description:
            print('Match for ' + str(title))
            print('In: ', rec.id)
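
Note that the loop above checks for the whole filename, extension included, so "ABC.gb" only matches if the FASTA header literally contains "ABC.gb". If the headers hold just the name without the extension (an assumption, the question does not say), a rough sketch along these lines might be closer to what is needed. It uses only the standard library rather than Biopython, and the function names fasta_ids and match_genbank_files are made up for illustration:

```python
import os


def fasta_ids(fasta_path):
    """Return the set of record IDs (first word after '>') in a FASTA file."""
    ids = set()
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                header = line[1:].split()
                if header:
                    ids.add(header[0])
    return ids


def match_genbank_files(folder, fasta_path, extension=".gb"):
    """Split the GenBank files in `folder` into (matched, unmatched) lists.

    Assumption: a file matches when its name without the extension is
    exactly equal to a FASTA record ID.
    """
    ids = fasta_ids(fasta_path)
    matched, unmatched = [], []
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        if ext != extension:
            continue  # skip anything that is not a GenBank file
        (matched if stem in ids else unmatched).append(name)
    return matched, unmatched
```

Calling match_genbank_files("your_folder", "records.fasta") would then give two lists, the files that have a FASTA entry and the ones that do not. Looking IDs up in a set also avoids re-scanning the whole filename list for every record.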

Alternatively, to build the filenames list, use Python's glob module to collect the full paths of the matching files in the specified folder, then apply os.path.basename() to keep just the filename:

import glob
import os
filenames = [os.path.basename(x) for x in glob.glob(os.path.join('your_folder','*.gb'))]
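
If the GenBank files might also sit in subfolders, something like the following pathlib sketch could be used to build the same list. The helper name genbank_filenames and its recursive flag are hypothetical, chosen here only for the example:

```python
from pathlib import Path


def genbank_filenames(folder, pattern="*.gb", recursive=False):
    """Return the sorted names of files in `folder` that match `pattern`.

    With recursive=True, subfolders are searched as well (Path.rglob).
    """
    root = Path(folder)
    paths = root.rglob(pattern) if recursive else root.glob(pattern)
    # Keep only the final path component, as os.path.basename() would
    return sorted(p.name for p in paths if p.is_file())
```

For example, filenames = genbank_filenames("your_folder", recursive=True) would drop into the loop from the answer above unchanged.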