Question

Iteratively deleting the first word in a dataframe column

0

21 months ago

pramirez ▴ 10

I want to modify a dataframe (here is a fragment of the dataframe):

            taxonomy    coverage
            ko04141,map04141  338.586386
            ko00970,map00970  338.586386
            ko00970,map00970  338.586386

I am trying to iteratively delete the first word (the one that contains ko) in a column where the words are separated by a comma. I am running the following loop:

for index,row in df51.iterrows():
    word_list = row.taxonomy.split(',')
    df51.taxonomy = word_list[1]

When I run the loop, instead of obtaining the full table, I obtain this:

    taxonomy   coverage            
map04112  279917.029342

Do you know why it doesn't assign the second word to each row, and instead keeps only the first entry?

Thank you.

python dataframe • 1.2k views

ADD COMMENT • link updated 21 months ago by zorbax ▴ 610 • written 21 months ago by pramirez ▴ 10

0

The line df51.taxonomy = word_list[1] is doing exactly what you wrote: it assigns a single value to the whole taxonomy column. Nothing in that line refers to the row you are currently iterating on.

Do you actually just want to drop the first column? For efficiency you wouldn't do that iteratively.

Also, it is a bad idea to modify something while you are iterating over it.
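
To make the point about the assignment concrete, here is a minimal sketch. The small frame is a hypothetical stand-in built from the fragment in the question, and the names broadcast, second_words and fixed are only illustrative. It shows that assigning one string to a column sets every row to that string, and one possible way to compute all the new values first and assign them in a single step.

```python
import pandas as pd

# Hypothetical sample mirroring the fragment shown in the question.
df = pd.DataFrame({
    "taxonomy": ["ko04141,map04141", "ko00970,map00970", "ko00970,map00970"],
    "coverage": [338.586386, 338.586386, 338.586386],
})

# Assigning a single string to a column sets every row to that string.
broadcast = df.copy()
broadcast["taxonomy"] = "map04141"
print(broadcast["taxonomy"].tolist())

# Building the new values first and assigning once avoids modifying
# the frame while iterating over it.
second_words = [value.split(",")[1] for value in df["taxonomy"]]
fixed = df.copy()
fixed["taxonomy"] = second_words
print(fixed["taxonomy"].tolist())
```

This is only one way to do it; the answers further down use apply and the .str accessor for the same job.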

ADD REPLY • link 21 months ago by Wayne ★ 1.9k

0

Thanks for your kind reply, Wayne! I don't need to drop the first column. The first column contains two words separated by a comma, and I need to keep the words containing "map". When I print word_list[1] inside the loop, Python prints the list of words that contain 'map'. However, as you mentioned, the line needs to know which row I am iterating on. I have not found a way of making that clear in the code, and I was hoping to get some help. Thanks again for your time!

ADD REPLY • link 21 months ago by pramirez ▴ 10

0

21 months ago

Wayne ★ 1.9k

You can use pandas' apply to apply a function to each row or column of a dataframe, or, when called on a single column as here, to each value in that column.

Spelled out:

#--DEMO PREPARATION------------------------------
import io
import pandas as pd

data = '''taxonomy    coverage
ko04141,map04141    338.586386
ko00970,map00970    338.586386
ko00970,map00970    338.586386'''
# The columns above are separated by runs of whitespace, so split on that
# (use sep="\t" instead if your own file is truly tab-delimited).
df51 = pd.read_csv(io.StringIO(data), sep=r"\s+")
#---END OF DEMO PREPARATION----------------------

def split_at_comma_n_get_2nd(item):
    parts = item.split(",")
    return parts[1]
df51.taxonomy = df51.taxonomy.apply(split_at_comma_n_get_2nd)

Streamlined using lambda and skipping preparation steps:

df51.taxonomy = df51.taxonomy.apply(lambda x: x.split(',')[1])

(Future-you will find the non-lambda approach easier to expand on now and easier to read six months from now. I'm only including the lambda version because .apply() is one of the few places where I feel a lambda makes sense, and even then only when the function being implemented is super simple, such as this one.)
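
As an illustration of how the named-function version can be expanded, here is a rough sketch that selects the part starting with "map" instead of relying on its position. The function name keep_map_id and the sample values are hypothetical, and returning the value unchanged when no "map" part exists is an assumption about the desired behaviour, not something stated in the question.

```python
import pandas as pd

def keep_map_id(item):
    """Return the first comma-separated part that starts with 'map'."""
    for part in item.split(","):
        if part.startswith("map"):
            return part
    # Assumption: leave the value unchanged if no part starts with 'map'.
    return item

# Hypothetical values, including a reversed pair and one without a comma.
taxonomy = pd.Series(["ko04141,map04141", "map00970,ko00970", "ko00970"])
result = taxonomy.apply(keep_map_id)
print(result.tolist())
```

Unlike x.split(',')[1], this does not raise an IndexError on a value that has no comma.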

A Jupyter notebook demonstrating this can be run by clicking here to get a session served via MyBinder.org. When the notebook shows up, select 'Run' > 'Run All Cells' from the toolbar to run it all fresh. Alternatively, click on the first code cell and hit shift-enter to run it, then keep hitting shift-enter to step through each cell and follow along.
The notebook in static form can be viewed here.

ADD COMMENT • link 21 months ago by Wayne ★ 1.9k

score 2 · Accepted Answer · 2022-07-18

2

21 months ago

zorbax ▴ 610

I think you can use split and select the second field:

 df['taxonomy'] = df['taxonomy'].str.split(',', expand=True)[1]
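
For anyone wondering what expand=True does in that one-liner, here is a small sketch on a hypothetical two-row frame. The intermediate names pieces, via_expand and via_str_index are only for illustration. It also shows a closely related spelling that indexes into the split lists with .str[1].

```python
import pandas as pd

# Hypothetical sample based on the fragment in the question.
df = pd.DataFrame({
    "taxonomy": ["ko04141,map04141", "ko00970,map00970"],
    "coverage": [338.586386, 338.586386],
})

# expand=True returns a DataFrame with one column per piece (0, 1, ...).
pieces = df["taxonomy"].str.split(",", expand=True)
via_expand = pieces[1]

# Without expand, each cell holds a list; .str[1] picks the second item.
via_str_index = df["taxonomy"].str.split(",").str[1]

# Either result can be assigned back to the column.
df["taxonomy"] = via_expand
print(df)
```

Both forms work on the whole column at once, so no explicit loop is needed.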

ADD COMMENT • link 21 months ago by zorbax ▴ 610