Extracting genus abundance info from a Kraken 2 output file

2.7 years ago

arshad1292 ▴ 100

Hello

I have a Kraken 2 output file (k2report.txt) that looks like this:

[Image: screenshot of the Kraken 2 report]

What I want to do is extract only the number of reads for genus-level entries, i.e. rows marked G, such as taxids 2316020, 2719313, 207244, 572511 and so on. I am not interested in D, O, R, etc.

I have a large file with several hundred genera. Does anyone have a shell/Python script that I could use to extract only the genus abundance (number of reads) for sample1, sample2 and sample3?

I would really appreciate your help.

Many thanks,

kraken2 • 1.1k views

updated 2.7 years ago by Nitin Narwade ★ 1.6k • written 2.7 years ago by arshad1292 ▴ 100

Do you mean you want to subset your matrix where lvl_type == "G"?

If yes then you can use grep "\tG\t" input-file.

2.7 years ago by Nitin Narwade ★ 1.6k

Yes, that's correct: I want the subset of the matrix that contains only the "G" rows.

I tried your script but it produced nothing...

2.7 years ago by arshad1292 ▴ 100

Actually, I assumed tab (\t) as the field separator.

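If it is the field separator and it is still not working, you should probably add -P to the grep command, so that \t is interpreted as a tab character (-P switches GNU grep to Perl-compatible regular expressions). Something like this: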

grep -P "\tG\t" input-file-name
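
If -P is not available (it is a GNU grep option), an awk filter is one possible alternative. The following is only a minimal sketch: the real column layout of k2report.txt is not shown in this thread, so it assumes a tab-separated file with a header row and a column literally named lvl_type. The function name extract_genus, the placeholder names genus_A and genus_B, and all read counts below are made up for illustration.

```shell
# Build a small hypothetical report (assumed layout, made-up read counts).
report=$(mktemp)
printf 'name\ttaxid\tlvl_type\tsample1\tsample2\tsample3\n' >  "$report"
printf 'root\t1\tR\t900\t800\t700\n'                        >> "$report"
printf 'Bacteria\t2\tD\t850\t790\t650\n'                    >> "$report"
printf 'genus_A\t2316020\tG\t120\t0\t35\n'                  >> "$report"
printf 'genus_B\t572511\tG\t40\t310\t5\n'                   >> "$report"

# Print the header plus every row whose lvl_type column is exactly "G".
extract_genus() {
  awk -F'\t' '
    # Header row: find the position of the lvl_type column, then print it.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "lvl_type") col = i; print; next }
    # Data rows: keep only genus-level entries.
    col && $col == "G"
  ' "$1"
}

extract_genus "$report"
```

Matching on a named column rather than on the pattern "tab G tab" avoids accidental hits in other columns. For a real file, the call would be extract_genus k2report.txt > genus_only.tsv, provided the header name matches.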