Terminal command for extracting and filtering rows from large RNA-seq TXT files
6.1 years ago
Genosa

Hi, I am relatively new to terminal commands and my knowledge is very basic.

I would like to ask for your advice on how to use a terminal command to extract specific rows from a large cuffdiff output file.

The header of each column looks like this:

test_id gene_id gene    locus   sample_1    sample_2    status  value_1 value_2 log2(fold_change)   test_stat   p_value q_value significant


I would like to extract the rows where EITHER the sample_1 or the sample_2 FPKM is 5 or greater, and then write those rows to a new file.

May I know what the command for this is, please?

Thank you!

linux terminal rnaseq
6.1 years ago
awk '($5>5.0 ||$6>5.0)' in.txt > out.txt

Thank you. May I know what " || " means and where I can learn these commands? Do you have any good awk tutorial resources that you can recommend?

"||" means the logical "or": a row is printed when either of the two conditions is true.

Actually, it means

FPKM of SAMPLE_1 (5th column in .txt) OR SAMPLE_2 (6th column in .txt) is greater than 5
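
A hedged variant of the command above, offered as a sketch rather than a definitive answer. In the header shown in the question, sample_1 and sample_2 are columns 5 and 6, and in cuffdiff output those appear to hold sample labels, while the numeric values sit in value_1 and value_2 (columns 8 and 9). Assuming those are the FPKM columns and the file is tab-separated, the version below uses >= to match "5 or greater" and keeps the header line. The file names in.txt and out.txt and the three data rows are made-up placeholders for illustration only.

```shell
# Made-up, tab-separated sample standing in for the real cuffdiff file (assumption).
printf 'test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\tvalue_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n' > in.txt
printf 'T1\tG1\tgeneA\tchr1:1-100\tq1\tq2\tOK\t5\t0.2\t-4.6\t-3.1\t0.001\t0.01\tyes\n' >> in.txt
printf 'T2\tG2\tgeneB\tchr1:200-300\tq1\tq2\tOK\t1.5\t2.5\t0.7\t0.9\t0.4\t0.6\tno\n' >> in.txt
printf 'T3\tG3\tgeneC\tchr2:1-100\tq1\tq2\tOK\t0.1\t12.3\t6.9\t4.2\t0.0001\t0.002\tyes\n' >> in.txt

# -F'\t'   : split fields on tabs
# NR == 1  : always keep the header line
# $8, $9   : value_1 and value_2 (assumed to be the FPKM columns)
# >= 5     : "5 or greater"; || means a row passes if either test is true
awk -F'\t' 'NR == 1 || $8 >= 5 || $9 >= 5' in.txt > out.txt
```

On the real file only the awk line is needed. Replacing the last || with && would instead require both samples to reach the threshold.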