Question

Issue with PDB+chain to Pfam mapping and domain filter

3.0 years ago

jmungar2 ▴ 10

Hello,

I have two pandas DataFrames.

df1 is a list of PDB chains that contain a particular substructure, with the columns:

    PDBCH  RESIDUE_1  RESIDUE_N  SUBSTRUCTURE_SEQUENCE

df2 is the PDB-to-Pfam mapping file from http://ftp.ebi.ac.uk/pub/databases/Pfam/mappings/, with the columns:

    PDBCH  PDB_START  PDB_END  PFAM_ACCESSION  PFAM_NAME

In both, PDBCH is the 4-character PDB code followed by the 1-character chain ID, so every entry is 5 characters long.
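If the PDB code and the chain ID start out in separate columns, the 5-character key can be built explicitly. A small sketch, assuming hypothetical columns named PDB and CHAIN: PDB codes are case-insensitive, so lower-casing them avoids silent mismatches between two files, while chain IDs are case-sensitive and are left as they are.

```python
import pandas as pd

# Hypothetical table with the PDB code and chain ID in separate columns;
# the column names "PDB" and "CHAIN" are assumptions for this sketch.
raw = pd.DataFrame({"PDB": ["1ABC", "2xyz "], "CHAIN": ["A", "B"]})

# PDB codes are case-insensitive, so normalise them to lower case (and strip
# stray whitespace); chain IDs are case-sensitive and are kept as they are.
raw["PDBCH"] = raw["PDB"].str.strip().str.lower() + raw["CHAIN"].str.strip()

print(raw["PDBCH"].tolist())  # ['1abcA', '2xyzB']
```

Applying the same normalisation to both df1 and df2 before matching avoids chains that fail to map only because one file writes "1ABC" and the other "1abc".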

To map the PDBCH entries of df1 onto df2, and so get the Pfam families for each row of df1, I do this:

    import pandas as pd

    df1_to_pfam_list = []
    for value in df1.PDBCH:
        # rows of df2 (the pdb2pfam mapping) for this PDB chain;
        # a boolean mask with .loc avoids mixing index labels with .iloc positions
        df3 = df2.loc[df2['PDBCH'] == value]
        df1_to_pfam_list.append(df3)

    df1_to_pfam_df = pd.concat(df1_to_pfam_list)
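The loop can also be written as a single merge. A sketch on small made-up frames (PF_A, PF_B and PF_C are placeholders, not real Pfam accessions, and index_df2 is a column name chosen here): exposing df2's index as a column and left-joining on PDBCH keeps df1's row order, and dropping chains without a hit should reproduce the loop's output.

```python
import pandas as pd

# Made-up minimal frames: only the key and one Pfam column are shown, and
# PF_A/PF_B/PF_C are placeholders rather than real Pfam accessions.
df1 = pd.DataFrame({"PDBCH": ["1abcA", "2xyzB", "9zzzQ"]})
df2 = pd.DataFrame({
    "PDBCH": ["1abcA", "1abcA", "2xyzB"],
    "PFAM_ACCESSION": ["PF_A", "PF_B", "PF_C"],
})

# Loop version (boolean mask + .loc keeps df2's index labels).
loop_df = pd.concat([df2.loc[df2["PDBCH"] == v] for v in df1["PDBCH"]])

# Merge version: expose df2's index as the column "index_df2", left-join on
# PDBCH (rows come out in df1's order), then drop df1 chains with no Pfam hit.
merged = (
    df1[["PDBCH"]]
    .merge(df2.rename_axis("index_df2").reset_index(), on="PDBCH", how="left")
    .dropna(subset=["index_df2"])
    .astype({"index_df2": int})
    .set_index("index_df2")
)

print(merged)
```

A left merge is used so that chains with no Pfam hit show up as NaN rows before the dropna step; leaving that step out keeps them, which can help spot chains missing from the mapping file.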

As a result, df1_to_pfam_df has the same columns as df2, but its rows follow the order of df1 and keep their original df2 index values:

    index (from df2)  PDBCH (in df1 order)  PDB_START  PDB_END  PFAM_ACCESSION  PFAM_NAME

Now I need to merge this new DataFrame (df1_to_pfam_df) with df1 so that I can check whether the segment RESIDUE_1 to RESIDUE_N in df1 lies inside the Pfam domains (PDB_START to PDB_END in df2). The problem is that df1_to_pfam_df has a different number of rows than df1, because some PDBCH entries map to more than one Pfam family.
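One way around the size mismatch is to keep a row id for df1, do the one-to-many merge, test each (substructure, domain) pair, and then collapse back to one row per df1 entry with groupby. A sketch on made-up data, assuming RESIDUE_1/RESIDUE_N and PDB_START/PDB_END are integers in the same residue numbering (if either side uses insertion codes or a different numbering scheme, they would need converting first). The names df1_row, inside, overlaps, in_pfam_domain, overlaps_pfam_domain and pfam_hits are illustrative, not part of the original data.

```python
import pandas as pd

# Made-up data using the question's column names; PF_A/PF_B/PF_C are placeholders.
df1 = pd.DataFrame({
    "PDBCH": ["1abcA", "2xyzB", "9zzzQ"],
    "RESIDUE_1": [10, 5, 1],
    "RESIDUE_N": [20, 15, 8],
})
df2 = pd.DataFrame({
    "PDBCH": ["1abcA", "1abcA", "2xyzB"],
    "PDB_START": [1, 50, 1],
    "PDB_END": [40, 90, 10],
    "PFAM_ACCESSION": ["PF_A", "PF_B", "PF_C"],
})

# 1) Keep a stable id for every df1 row so the many-row merge can be collapsed later.
left = df1.rename_axis("df1_row").reset_index()

# 2) One-to-many merge: each df1 row appears once per Pfam domain on its chain.
pairs = left.merge(df2, on="PDBCH", how="inner")

# 3) Per-pair interval tests (inclusive residue ranges, assumed integer numbering).
#    inside:   the substructure lies completely within the domain
#    overlaps: the substructure and the domain share at least one residue
pairs["inside"] = (pairs["RESIDUE_1"] >= pairs["PDB_START"]) & (
    pairs["RESIDUE_N"] <= pairs["PDB_END"]
)
pairs["overlaps"] = (pairs["RESIDUE_1"] <= pairs["PDB_END"]) & (
    pairs["RESIDUE_N"] >= pairs["PDB_START"]
)

# 4) Collapse back to exactly one row per df1 entry. Chains with no Pfam mapping
#    get False for both flags; pfam_hits lists only the domains that fully
#    contain the substructure (NaN if there are none).
by_row = pairs.groupby("df1_row")
hits = pairs[pairs["inside"]].groupby("df1_row")["PFAM_ACCESSION"].agg(", ".join)
summary = df1.assign(
    in_pfam_domain=by_row["inside"].any().reindex(df1.index, fill_value=False),
    overlaps_pfam_domain=by_row["overlaps"].any().reindex(df1.index, fill_value=False),
    pfam_hits=hits.reindex(df1.index),
)

print(summary)
```

For this made-up data, the first chain's substructure (10 to 20) sits inside PF_A (1 to 40), the second only partly overlaps PF_C (1 to 10), and the third chain has no Pfam entry, so summary keeps exactly the three rows of df1.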

I'm quite stuck at this point. Any suggestions?

Thank you,
Juan

Pfam mapping PDB python