Question

Help with record.id

4.1 years ago

Maya • 0

Hello!

I'm using PyRosetta and Biopython to write code that folds proteins. I want to store the protein name in a variable because I use it to label input/output files throughout the code.

for fastafile in fastafiles:
    for record in SeqIO.parse(fastafile, 'fasta'):
        protein = record.id
        sequence = str(record.seq)

pose = pose_from_sequence(sequence)
pose.pdb_info().name('%s' % protein)
dump_pdb(pose, '%s_input_518.pdb' % protein)
input_pose = pose_from_pdb('%s_input_518.pdb' % protein)

But when I run this, record.id comes out like this:

print(record.id)
>6Q21_1|Chains

The same thing happens when I try record.name. How do I fix it so that it doesn't include the "_1|Chains" part?

I'm new to Biopython and coding in general, so any tips are greatly appreciated!

pyrosetta python biopython • 1.1k views

ADD COMMENT • link updated 2.3 years ago by Ram 45k • written 4.1 years ago by Maya • 0

score 1 · Answer 1 · 2021-05-19

4.1 years ago

seidel 11k

If all of your IDs have a uniform format, you could handle this by splitting the string and removing the ">" character. For instance, this gives you the first element of the FASTA header:

protein = ">6Q21_1|Chains"
# split the string by pipe char and take only the first element
protein = protein.split("|")[0]
protein = protein.replace(">","")
# get rid of the underscore
protein = protein.split("_")[0]

print(protein)

If all your IDs have underscores, you could split on that in the first place.
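The same steps can be wrapped in a small helper so the cleanup happens in one place. This is only a sketch: the function name `pdb_code_from_id` is made up for illustration, and it assumes every ID looks like the "6Q21_1|Chains" example from the question.

```python
def pdb_code_from_id(record_id: str) -> str:
    """Return the bare entry code from an ID such as '6Q21_1|Chains'.

    Hypothetical helper; assumes the code is everything before the
    first '|' and the first '_'.
    """
    # Drop a leading '>' in case the ID still carries one.
    cleaned = record_id.lstrip(">")
    # Keep the text before the first pipe, then before the first underscore.
    return cleaned.split("|", 1)[0].split("_", 1)[0]


# IDs without a pipe or underscore pass through unchanged.
for example in (">6Q21_1|Chains", "6Q21_1|Chains", "6Q21"):
    print(pdb_code_from_id(example))
```

Inside the parsing loop this would be used as protein = pdb_code_from_id(record.id). Note that the number after the underscore is thrown away, so if one FASTA file holds several records for the same entry (for example 6Q21_1 and 6Q21_2), they would all get the same name and the output files would overwrite each other.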

ADD COMMENT • link 4.1 years ago by seidel 11k

This fixed it! Thanks so much

ADD REPLY • link 4.1 years ago by Maya • 0

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

ADD REPLY • link 4.1 years ago by GenoMax 152k