Deleting all rows that contain a keyword in ANY column
3 months ago
pramirez ▴ 10

I'm writing a script that deletes every row of a data frame in which ANY column contains the word "Eukaryota". Note that the cell only needs to contain the word "Eukaryota", not be equal to "Eukaryota". The columns of the data frame do not have header names.

I am trying the following command:

import numpy as np
import pandas as pd
df.drop(df[df.apply(lambda row: 'Eukaryota' in row.to_string(header=False), axis=1)].index, inplace=True)
df.to_csv('sin_euk.csv', sep='\t')


The script runs, but the file "sin_euk.csv" still contains the entries with the word "Eukaryota".

I have also tried the following strategy and did not obtain the desired result:

df = df[~df.isin(['Eukaryota']).any(axis=1)]


Do you know of any other strategies?

Thank you!

python pandas data-frame • 275 views
sed can also do the job for you:

sed '/Eukaryota/d' your_in_file > your_out_file

3 months ago
4galaxy77 2.3k

This sounds like a perfect task for grep (run on the command line, not in Python):

grep --invert-match 'Eukaryota' output.emapper.annotations_1_10.txt
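
If you would rather stay in pandas: the `isin` attempt most likely kept the rows because `isin` tests whether a whole cell equals "Eukaryota", not whether the cell contains it. Below is a minimal sketch of a substring-based filter. It assumes the table is already loaded into `df`; the helper name `drop_rows_containing` and the toy values are made up for illustration.

```python
import pandas as pd


def drop_rows_containing(df: pd.DataFrame, word: str) -> pd.DataFrame:
    """Return df without the rows in which any cell contains `word`."""
    # Cast every cell to str so numeric and missing values can be searched too.
    as_text = df.astype(str)
    # Literal (non-regex) substring test, applied column by column.
    mask = as_text.apply(lambda col: col.str.contains(word, regex=False))
    # Keep only the rows where no column matched.
    return df[~mask.any(axis=1)]


# Toy data standing in for the headerless table (hypothetical values).
df = pd.DataFrame([
    ["q1", "Bacteria;Proteobacteria", 0.9],
    ["q2", "root;Eukaryota;Fungi", 0.8],
    ["q3", "Archaea", 0.7],
])

filtered = drop_rows_containing(df, "Eukaryota")
```

The result can then be written out as in the question, e.g. `filtered.to_csv('sin_euk.csv', sep='\t', header=False, index=False)`.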

Heeeeey! It worked!!! Thanks, 4galaxy77.