Help with record.id
5 months ago
Maya • 0

hello!

i'm using pyrosetta and biopython to write code that folds proteins. i want to make the protein name a variable because i use it to label input/output files throughout the code.

for fastafile in fastafiles:
    for record in SeqIO.parse(fastafile, 'fasta'):
        protein = record.id
        sequence = str(record.seq)

        pose = pose_from_sequence(sequence)
        pose.pdb_info().name('%s' % protein)
        dump_pdb(pose, '%s_input_518.pdb' % protein)
        input_pose = pose_from_pdb('%s_input_518.pdb' % protein)


but when i run this, the record.id comes out like this

print(record.id)
>6Q21_1|Chains


the same thing happens when i try record.name. how do i fix it so that it doesn't include the "_1|Chains" part?

i'm new to biopython and coding in general so any tips are greatly appreciated!

5 months ago
seidel 8.3k

If all of your IDs have a uniform format, you could address it by splitting the string, and removing the ">" character. For instance, this gives you the first element of the fasta header:

protein = ">6Q21_1|Chains"
# split the string by pipe char and take only the first element
protein = protein.split("|")[0]
protein = protein.replace(">","")
# get rid of the underscore
protein = protein.split("_")[0]

print(protein)


If all your IDs have underscores, you could split on the underscore in the first place and skip the pipe split (you would still need to remove the ">").
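Since the name is reused for several file names, it may be convenient to wrap those steps in a small function. This is only a sketch: `clean_protein_id` is a made-up helper name, not something Biopython provides, and it assumes every ID looks like "6Q21_1|Chains", with or without a leading ">".

```python
def clean_protein_id(header):
    """Return the part of a FASTA ID before the first '|' or '_'.

    Hypothetical helper (not part of Biopython). Assumes IDs shaped like
    '6Q21_1|Chains', optionally with a leading '>'.
    """
    # Drop any leading '>' characters.
    core = header.lstrip(">")
    # Keep only the text before the first pipe, then before the first underscore.
    core = core.split("|")[0]
    return core.split("_")[0]


print(clean_protein_id(">6Q21_1|Chains"))  # prints 6Q21
```

Inside the loop from the question, this would replace the assignment with `protein = clean_protein_id(record.id)`. An ID that contains neither "|" nor "_" is returned unchanged, apart from the stripped ">".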

This fixed it! Thanks so much

