Extracting genus abundance from a Kraken 2 report file
14 months ago

Hello

I have a Kraken 2 output file (k2report.txt) that looks like this:

What I want to do is extract only the number of reads at the genus level, i.e. the rows marked G, such as taxids 2316020, 2719313, 207244 and 572511. I am not interested in D, O, R, etc.

I have a large file with many hundreds of genera. Does anyone have a shell/Python script that I could use to extract only the genus abundance (number of reads) for sample1, sample2 and sample3?

Do you mean you want to subset your matrix where lvl_type == "G"?

If yes, then you can use grep "\tG\t" input-file.

Yes, that's correct: I want the subset of the matrix that contains only the "G" rows.

I tried your command, but it produced nothing...

Actually, I assumed that tab (\t) is the field separator.
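
One quick way to test that assumption is to check whether the file contains any literal tab characters at all. This is only a rough sketch; has_tabs is a made-up helper name, and it says nothing about whether tabs are used consistently on every line.

```shell
# Sketch: succeed (exit 0) if the file contains at least one literal tab.
# has_tabs is a hypothetical helper name, not part of grep or Kraken 2.
has_tabs() {
    # printf '\t' emits a real tab character, so the pattern does not rely
    # on grep itself understanding the "\t" escape.
    grep -q "$(printf '\t')" "$1"
}
```

It could be called as: has_tabs input-file && echo "tab-delimited"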

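If it is the field separator and it is still not working, you should probably add -P to the grep command. With GNU grep, -P switches to Perl-compatible regular expressions, in which \t matches a tab character; without it, \t is not interpreted as a tab. Something like this: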

grep -P "\tG\t" input-file-name

Ok this one works. Thanks a lot!
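
As an alternative to grep, the same subset can be sketched with awk by comparing one column exactly to "G", which avoids matching a stray G elsewhere on a line. This assumes the file is tab-separated and that its first line is a header containing a column literally named lvl_type (the name used in the reply above); neither is confirmed in the thread, and extract_genus is a made-up function name.

```shell
# Sketch: print the header line plus every row whose lvl_type column is "G".
# Assumptions: tab-separated input; first line is a header that contains a
# column named "lvl_type". extract_genus is a hypothetical helper name.
extract_genus() {
    awk -F'\t' '
        NR == 1 {
            # Find the lvl_type column by name rather than hard-coding its position.
            for (i = 1; i <= NF; i++) if ($i == "lvl_type") col = i
            print
            next
        }
        # Keep a data row only if the column was found and equals exactly "G".
        col && $col == "G"
    ' "$1"
}
```

For example: extract_genus input-file > genus_only.tsv. If the header has no lvl_type column, only the header line is printed, which is a hint that the column name needs adjusting.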