How can I extract the lines that contain 'g__' at their end, along with the first two header lines? (i.e. only the genus-level OTUs from MetaPhlAn output)
3.7 years ago
dpc ▴ 240

I want to extract the lines that contain g__something, but only where it is not followed by s__something (i.e. the genus-level rows).

type    obese   obese   obese   obese   obese   obese   obese   obese   obese   obese   obese
clade_name  ERR011272_profile   ERR011271_profile   ERR011270_profile   ERR011269_profile   ERR011268_profile   ERR011267_profile   ERR011266_profile   ERR567265_profile   ERR008764_profile   ERR03263_profile    ERR234562_profile
k__Bacteria 100 100 100 100 100 100 100 100 100 100 100
k__Bacteria|p__Actinobacteria   8.84082 9.08752 0.5741  0.89084 0.14307 0.13286 2.37624 1.45006 0.41332 0.9011  0.6917
k__Bacteria|p__Actinobacteria|c__Actinobacteria 5.17984 4.86388 0.14615 0.20425 0.05402 0.04654 1.47203 0.88991 0.2423  0.50227 0.33122
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales  0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae  0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces   0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus  0.00902 0.01206 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_HMSC035G02  0.01454 0.00469 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_HPA0247 0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_ICM47   0.0089  0.02162 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_S6_Spd3 0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_oral_taxon_181  0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales    5.14507 4.80951 0.14615 0.20425 0.05402 0.04654 1.47203 0.88991 0.2423  0.50227 0.33122
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae  5.14507 4.80951 0.14615 0.20425 0.05402 0.04654 1.47203 0.88991 0.2423  0.50227 0.33122
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Aeriscardovia 0.01509 0.02915 0   0   0   0   0.01514 0.00328 0.00187 0   0.0037
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Aeriscardovia|s__Aeriscardovia_aeriphila  0.01509 0.02915 0   0   0   0   0.01514 0.00328 0.00187 0   0.0037
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium   5.12998 4.78036 0.14615 0.20425 0.05402 0.04654 1.45689 0.88663 0.24043 0.50227 0.32752
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_adolescentis   1.83735 1.8711  0.05485 0.05771 0.05402 0.04654 0.97666 0.57204 0.20912 0.42861 0.10834
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_angulatum  0   0   0   0   0   0   0   0   0   0   0.01811
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_bifidum    0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_catenulatum    0   0   0   0   0   0   0   0   0   0   0.12072
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_dentium    0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_longum 2.40681 2.17987 0.0913  0.14654 0   0   0.48024 0.31459 0.03131 0.07367 0.08035
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_pseudocatenulatum  0.88581 0.72939 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_pullorum   0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Gardnerella   0   0   0   0   0   0   0   0   0   0   0

Thanks

Since the clade names use a 7-level nomenclature (k__ to s__), genus is the 6th level, and levels are separated by |, you can split each line on | and keep the lines with exactly 6 fields. This prints the first two header lines, followed by the genus-level rows:

$ awk -F "|" 'NR<3 {print}; NR>3 && NF==6 {print}' example.txt
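A self-contained sketch of the same field-count idea, run on a trimmed copy of the table above (its first six rows and first two sample columns). The file name sample.txt is hypothetical. The condition is written as NR < 3 || NF == 6, which selects the same rows on this table; an awk pattern without an action simply prints the matching line.

```bash
#!/usr/bin/env bash
# Trimmed copy of the MetaPhlAn table from the question
# (hypothetical file name: sample.txt).
cat > sample.txt <<'EOF'
type obese obese
clade_name ERR011272_profile ERR011271_profile
k__Bacteria 100 100
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae 0.03246 0.03837
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces 0.03246 0.03837
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus 0.00902 0.01206
EOF

# With "|" as the separator, a clade name ending at g__ gives 6 fields
# (k, p, c, o, f, g); the abundance columns stay inside field 6.
# Species rows give 7 fields. The header lines contain no "|",
# so they are kept by line number instead.
awk -F '|' 'NR < 3 || NF == 6' sample.txt > genus_fieldcount.txt
cat genus_fieldcount.txt
```

Counting fields assumes every row spells out the full k__ to g__ path and that nothing else in the row contains a |.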

Thanks cpad0112. It helps.

grep 'g__[[:alnum:]]*$'

This should work.
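The $ in this pattern anchors to the end of the whole line. In the table, every clade name is followed by abundance columns, so on the data as posted it matches no rows. Below is a sketch that applies the same "ends with g__" test to the clade-name column only. It assumes (as holds for all rows shown) that clade names contain no spaces, so awk's default whitespace splitting puts the name in $1; sample.txt is a hypothetical trimmed copy of the table.

```bash
#!/usr/bin/env bash
# Trimmed copy of the table from the question (hypothetical file name).
cat > sample.txt <<'EOF'
type obese obese
clade_name ERR011272_profile ERR011271_profile
k__Bacteria 100 100
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae 0.03246 0.03837
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces 0.03246 0.03837
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus 0.00902 0.01206
EOF

# Whole-line anchor: the abundance columns follow g__..., so this
# prints a count of 0 (grep exits 1 when nothing matches, hence || true).
grep -c 'g__[[:alnum:]]*$' sample.txt || true

# Same idea, anchored on the clade name ($1) instead of the whole line:
# [|] is a literal pipe, and [^|]* rules out any deeper level such as s__.
awk 'NR <= 2 || $1 ~ /[|]g__[^|]*$/' sample.txt > genus_anchored.txt
cat genus_anchored.txt
```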

Sorry, I worded the question wrongly by mistake. Please allow me to edit it, and reply if you know the answer. Thanks @Hood.

This should work then

grep 'g__[[:alnum:]]*[[:space:]]'

Result

k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces   0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Aeriscardovia 0.01509 0.02915 0   0   0   0   0.01514 0.00328 0.00187 0   0.0037
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium   5.12998 4.78036 0.14615 0.20425 0.05402 0.04654 1.45689 0.88663 0.24043 0.50227 0.32752
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Gardnerella   0   0   0   0   0   0   0   0   0   0   0
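The result above does not include the two header lines the question asks for. The sketch below prints them with head and then applies the question's own rule (the row contains g__ but has no s__ level) to the remaining rows with two grep passes, again on a hypothetical trimmed copy, sample.txt. It also does not depend on which characters appear in the genus name; [[:alnum:]] does not match _, so a genus name containing an underscore would be missed by the pattern above.

```bash
#!/usr/bin/env bash
# Trimmed copy of the table from the question (hypothetical file name).
cat > sample.txt <<'EOF'
type obese obese
clade_name ERR011272_profile ERR011271_profile
k__Bacteria 100 100
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae 0.03246 0.03837
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces 0.03246 0.03837
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus 0.00902 0.01206
EOF

{
  # The "type" and "clade_name" header lines.
  head -n 2 sample.txt
  # Remaining rows: keep lines with a genus level, drop those with a species level.
  tail -n +3 sample.txt | grep 'g__' | grep -v 's__'
} > genus_grep.txt
cat genus_grep.txt
```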