Question

Deleting all rows in which ANY column contains a keyword

0

22 months ago

pramirez ▴ 10

I'm writing a script that deletes every row of a data frame in which ANY column contains the word "Eukaryota". Note that a cell only needs to contain the word "Eukaryota"; it does not have to be equal to "Eukaryota". The columns of the data frame do not have header names.

I am trying the following command:

import numpy as np
import pandas as pd
df = pd.read_csv('output.emapper.annotations_1_10.txt', sep='\t')
df.drop(df[df.apply(lambda row: 'Eukaryota' in row.to_string(header=False), axis=1)].index, inplace=True)
df.to_csv('sin_euk.csv', sep='\t')

The script runs, but the file "sin_euk.csv" still contains the entries with the word "Eukaryota".

I have also tried the following strategy and did not obtain the desired result:

df = df[~df.isin(['Eukaryota']).any(axis=1)]

Do you know of any other strategies?

Thank you!

python pandas data-frame • 699 views

ADD COMMENT • link updated 21 months ago by Ram 43k • written 22 months ago by pramirez ▴ 10

0

sed can also do the job for you

sed '/Eukaryota/d' your_in_file > your_out_file
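
If you would rather keep everything in Python, the same line-level filter can be sketched without pandas. This is only an illustration: `drop_matching_lines` is a hypothetical helper name, and the in-memory `StringIO` objects stand in for your_in_file and your_out_file (with real files you would pass handles from `open(...)` instead).

```python
import io

def drop_matching_lines(src, dst, word="Eukaryota"):
    """Copy lines from src to dst, skipping any line that contains word."""
    for line in src:
        if word not in line:
            dst.write(line)

# Hypothetical in-memory stand-ins for your_in_file and your_out_file.
src = io.StringIO("q1\tEukaryota\nq2\tBacteria\n")
dst = io.StringIO()
drop_matching_lines(src, dst)
result = dst.getvalue()
```

Like the sed one-liner, this matches the word anywhere on the line, so it does not care which column the word is in.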

ADD REPLY • link 22 months ago by brunobsouzaa ▴ 830

score 2 · Accepted Answer · 2022-06-23

2

22 months ago

4galaxy77 2.8k

This sounds like a perfect task for grep (run on the command line, not in Python):

grep --invert-match 'Eukaryota' output.emapper.annotations_1_10.txt
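
If you do want to stay in pandas, here is a minimal sketch of a substring-based filter. Note that `isin` only matches cells that are exactly equal to "Eukaryota", which is why the second attempt in the question leaves those rows in. The sample data below is made up for illustration, and two things are assumptions on my part: that the file should be read with `header=None` (the question says there are no header names) and that reading every column as text with `dtype=str` is acceptable.

```python
import io
import pandas as pd

# Hypothetical stand-in for the tab-separated annotations file.
sample = io.StringIO(
    "q1\t2759|Eukaryota\tfoo\n"
    "q2\t2|Bacteria\tbar\n"
    "q3\tbaz\tEukaryota;Fungi\n"
)

# header=None: there is no header row, so the first line is kept as data.
# dtype=str: read every cell as text so the substring search works on all columns.
df = pd.read_csv(sample, sep="\t", header=None, dtype=str)

# True for each row in which at least one cell contains the substring.
has_euk = df.apply(
    lambda col: col.str.contains("Eukaryota", regex=False, na=False)
).any(axis=1)

# Keep only the rows without a match.
filtered = df[~has_euk]

# With no path given, to_csv returns the text; pass a file name to write to disk.
out = filtered.to_csv(sep="\t", header=False, index=False)
```

With a real file you would replace `sample` with the path to the annotations file and pass the output file name as the first argument to `to_csv`.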