Question: Filtering out lines (rows) based on repeated zero values with awk in Linux
rgescudero • 30 wrote:
I want to filter out lines that have zero values in 70% or more of the columns. Imagine I have the following “test_awk.txt” file:
id sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
gene1 1 2 3 4 5 6 0 0 0 0
gene2 0 0 0 0 0 0 0 0 0 0
gene3 0 0 0 0 1 0 0 0 0 0
gene4 0 0 0 10 0 10 0 0 0 0
gene5 0 0 0 0 0 0 0 0 0 0
gene6 10 10 9 9 9 9 9 9 9 9
gene7 8 8 8 8 8 8 8 8 8 8
gene8 0 0 0 0 1 1 1 0 0 0
gene9 0 0 0 0 0 1 1 1 1 1
I would like to remove lines like “gene2”, “gene3”, “gene4”, “gene5”, and “gene8” because they have zero values in 7 or more of the 10 sample columns. My real-life file is too big to process in R, so I’m trying to use "awk", but I’m getting stuck. Any help would be much appreciated.
Ramon
modified 12 months ago by shenwei356 ♦ 5.8k • written 12 months ago by rgescudero • 30
To make sure that a 0 in a gene name is not counted, I added a gene10 entry by copying the gene9 values and changing the name from gene9 to gene10.
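One possible way to do the filtering, as a minimal sketch rather than a definitive solution. It assumes whitespace-separated fields, a single header line, and that the threshold means "70% or more of the sample columns are zero", which matches the genes listed in the question. The variable name `pct` and the output file name `filtered.txt` are illustrative choices, not anything from the original post.

```shell
# Recreate the example file from the question, plus the extra gene10 row
# (a copy of the gene9 values) used to check that the ID column is ignored.
cat > test_awk.txt <<'EOF'
id sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9 sample10
gene1 1 2 3 4 5 6 0 0 0 0
gene2 0 0 0 0 0 0 0 0 0 0
gene3 0 0 0 0 1 0 0 0 0 0
gene4 0 0 0 10 0 10 0 0 0 0
gene5 0 0 0 0 0 0 0 0 0 0
gene6 10 10 9 9 9 9 9 9 9 9
gene7 8 8 8 8 8 8 8 8 8 8
gene8 0 0 0 0 1 1 1 0 0 0
gene9 0 0 0 0 0 1 1 1 1 1
gene10 0 0 0 0 0 1 1 1 1 1
EOF

# pct is the percentage of zero columns at which a row is dropped.
awk -v pct=70 '
NR == 1 { print; next }              # always keep the header line
{
    zeros = 0
    for (i = 2; i <= NF; i++)        # start at field 2 to skip the ID column
        if ($i == 0) zeros++         # numeric comparison on each sample value
    # keep the row only if the share of zeros is below pct percent;
    # integer arithmetic avoids floating-point rounding at the boundary
    if (zeros * 100 < pct * (NF - 1)) print
}' test_awk.txt > filtered.txt

cat filtered.txt
```

Because the loop starts at field 2, the zero in the name gene10 is never examined, so gene9 and gene10 are treated the same. awk reads the file one line at a time, so memory use should stay small even for a very large input. To drop rows only when strictly more than 70% of the columns are zero, change the `<` in the last condition to `<=`.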