Deleting all rows in which ANY column contains a keyword
3 months ago
pramirez ▴ 10

I'm writing a script that deletes every row of a data frame in which ANY column contains the word "Eukaryota". Note that the cell only needs to contain the word "Eukaryota"; it does not have to be equal to "Eukaryota". The columns of the data frame do not have header names.

I am trying the following command:

import numpy as np
import pandas as pd
df = pd.read_csv('output.emapper.annotations_1_10.txt', sep='\t')
df.drop(df[df.apply(lambda row: 'Eukaryota' in row.to_string(header=False), axis=1)].index, inplace=True)
df.to_csv('sin_euk.csv', sep='\t')

The script runs, but the file "sin_euk.csv" still contains the entries with the word "Eukaryota".

I have also tried the following strategy and did not obtain the desired result:

df = df[~df.isin(['Eukaryota']).any(axis=1)]

Do you know of any other strategies?

Thank you!

python pandas data-frame • 275 views

sed can also do the job for you:

sed '/Eukaryota/d' your_in_file > your_out_file
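
For anyone who wants the same line-level deletion without leaving Python, here is a minimal sketch using only the standard library. The function name `drop_matching_lines` and the sample records are made up for illustration; like the sed command, it drops any line that contains the keyword anywhere.

```python
import io


def drop_matching_lines(src, dst, keyword: str) -> int:
    """Copy lines from src to dst, skipping lines that contain keyword.

    Returns the number of lines that were dropped.
    """
    dropped = 0
    for line in src:
        if keyword in line:  # substring test, not equality
            dropped += 1
            continue
        dst.write(line)
    return dropped


# Made-up stand-in for the annotation file; real use would pass open file handles.
src = io.StringIO("g1\tBacteria\ng2\tEukaryota|Fungi\ng3\tArchaea\n")
dst = io.StringIO()
n = drop_matching_lines(src, dst, "Eukaryota")
print(n)
print(dst.getvalue(), end="")
```

With real files, `src` and `dst` would be the handles returned by `open(...)` for the input and output paths.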
3 months ago
4galaxy77 2.3k

This sounds like a perfect task for grep (run on the command line, not in Python):

grep --invert-match 'Eukaryota' output.emapper.annotations_1_10.txt
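
If the filtering has to stay in pandas, a substring test is needed, because `isin` only matches cells whose whole value equals "Eukaryota". The sketch below is one way to do that under a few assumptions: the helper name `drop_rows_containing` and the sample records are invented for illustration, and the file is read with `header=None` since it has no header row (by default `read_csv` treats the first line as column names).

```python
import io

import pandas as pd


def drop_rows_containing(df: pd.DataFrame, keyword: str) -> pd.DataFrame:
    """Return df without the rows where any cell contains keyword as a substring."""
    # Cast every cell to str so numeric and missing values can be searched too.
    as_text = df.astype(str)
    # Flag cells containing the keyword; regex=False means a literal match.
    hits = as_text.apply(lambda col: col.str.contains(keyword, regex=False))
    # Keep only the rows with no hit in any column.
    return df[~hits.any(axis=1)]


# Made-up stand-in for the tab-separated annotation file.
sample = (
    "g1\t2|Bacteria,1224|Proteobacteria\tK001\n"
    "g2\t2759|Eukaryota,33154|Opisthokonta\tK002\n"
    "g3\t2157|Archaea\tK003\n"
)
df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)
filtered = drop_rows_containing(df, "Eukaryota")
# header=False and index=False keep the output in the same shape as the input.
print(filtered.to_csv(sep="\t", header=False, index=False), end="")
```

For the real file, the `io.StringIO(sample)` argument would be replaced by the file path, and `to_csv` would be given an output path.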

Heeeeey! It worked!!! Thanks, 4galaxy77.
