Help with record.id
5 months ago
Maya • 0

hello!

i'm using pyrosetta and biopython to write code that folds proteins. i want to make the protein name a variable because i use it to label input/output files throughout the code.

for fastafile in fastafiles:
    for record in SeqIO.parse(fastafile, 'fasta'):
        protein = record.id
        sequence = str(record.seq)

        # build and save a pose for each record, labelled with the protein name
        pose = pose_from_sequence(sequence)
        pose.pdb_info().name('%s' % protein)
        dump_pdb(pose, '%s_input_518.pdb' % protein)
        input_pose = pose_from_pdb('%s_input_518.pdb' % protein)

but when i run this, the record.id comes out like this

print(record.id)
>6Q21_1|Chains

the same thing happens when i try record.name. how do i fix it so that it doesn't include the "_1|Chains" part?

i'm new to biopython and coding in general so any tips are greatly appreciated!

pyrosetta biopython record.id bio.seqrecord python • 226 views
5 months ago
seidel 8.3k

If all of your IDs have a uniform format, you could address it by splitting the string and removing the ">" character. For instance, this reduces the FASTA header to just the entry name (6Q21):

protein = ">6Q21_1|Chains"
# split the string by pipe char and take only the first element
protein = protein.split("|")[0]
protein = protein.replace(">","")
# get rid of the underscore
protein = protein.split("_")[0]

print(protein)

If all your IDs have underscores, you could split on that in the first place.
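
To reuse this inside the loop from the question, the same steps could be wrapped in a small function. This is only a sketch: the name clean_protein_id is made up for illustration, and it assumes every ID looks like the example above (an optional ">", the entry name, an underscore suffix, then a pipe). An entry name that itself contains an underscore would be cut short.

```python
def clean_protein_id(raw_id):
    """Return the bare entry name from an ID such as '>6Q21_1|Chains'.

    Hypothetical helper; assumes the uniform ID format shown in the question.
    """
    # drop a leading '>' if present, then keep the text before the first pipe
    first_field = raw_id.lstrip(">").split("|")[0]
    # keep the text before the first underscore (drops the '_1' suffix)
    return first_field.split("_")[0]


print(clean_protein_id(">6Q21_1|Chains"))  # 6Q21
```

In the loop, the assignment would then read protein = clean_protein_id(record.id), so every output file name uses the short form.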


This fixed it! Thanks so much


If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.
