Hi all, I need a trained python eye for this :)
I need to remove hundreds of genes from a proteome file that contains thousands of genes. Obviously I do not want to do it manually. I pulled the Python code pasted below from somewhere; it is a few years old. It is supposed to do what I want, but it does not. It just copies all the records from the original file to the output file, ignoring the remove.txt file. The code requires three files, which I supplied. File 1: "123.fasta", my original unedited proteome. File 2: "remove.txt", the list of gene IDs to be removed. File 3: "new.fasta", the output file with the edited proteome minus the genes listed in remove.txt. Ideally, I would like the code to identify the genes in "123.fasta" by the FASTA sequence ID (e.g. >sequence1, >sequence2, etc.).
This is the code:
import Bio
from Bio import SeqIO
import sys

fasta_file = "123.fasta"
remove_file = "remove.txt"
result_file = "new.fasta"

remove = set(">")
with open(remove_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            remove.add(line)

fasta_sequences = SeqIO.parse(open(fasta_file), "fasta")
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        nam = str()
        nam = nam.strip(seq.id)
        nuc = str(seq.seq)
        SeqIO.write([seq], f, "fasta")
As I said, no matter what I tweak, it just copies and pastes all of the 123.fasta file into the output file, with no deletions. Can any of the Python people see what the problem may be? I am not a trained Python programmer, I just use it for my work.
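Looking at the posted loop, it never consults the `remove` set: every record is passed to `SeqIO.write` unconditionally, so the output is always a full copy. With Biopython, the usual fix would be to put a check such as `if seq.id not in remove:` in front of the write. Note also that `seq.id` does not include the leading `>`, so IDs listed as `>sequence1` in remove.txt would never match unless the `>` is stripped first. Below is a minimal sketch of the same filter using only the standard library rather than Biopython. The function names `load_ids` and `filter_fasta` are made up for illustration, and it assumes remove.txt holds one ID per line (with or without `>`) and that the ID is the first whitespace-delimited token of each header line.

```python
def load_ids(path):
    """Read one sequence ID per line, tolerating a leading '>' and blank lines."""
    ids = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip().lstrip(">")
            if line:
                # Keep only the first token, mirroring how SeqIO derives record.id.
                ids.add(line.split()[0])
    return ids


def filter_fasta(fasta_path, remove_path, result_path):
    """Copy a FASTA file, skipping records whose ID is listed in remove_path.

    Returns a (kept, dropped) tuple of record counts.
    """
    remove = load_ids(remove_path)
    kept = dropped = 0
    keep_record = False  # whether the record currently being read is written out
    with open(fasta_path) as src, open(result_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Header line: decide once per record whether to keep it.
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                keep_record = seq_id not in remove
                if keep_record:
                    kept += 1
                else:
                    dropped += 1
            if keep_record:
                dst.write(line)
    return kept, dropped
```

Calling `filter_fasta("123.fasta", "remove.txt", "new.fasta")` should then write only the records whose IDs are not in remove.txt. If the dropped count comes back as zero, the IDs in remove.txt probably do not match the header IDs exactly (extra whitespace, version suffixes, or a different part of the header).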
Since this question is not about Python code you wrote, consider the faSomeRecords utility from Jim Kent (LINK). After downloading the file, add execute permissions (chmod u+x faSomeRecords). Use as follows