Question: How can I extract lines that contain 'g__' at their end, along with the first two header lines? (i.e. only the genus-level OTUs from MetaPhlAn output)
dpc wrote:

I want to extract the lines that contain g__something within the line but not followed by s__something.

type    obese   obese   obese   obese   obese   obese   obese   obese   obese   obese   obese
clade_name  ERR011272_profile   ERR011271_profile   ERR011270_profile   ERR011269_profile   ERR011268_profile   ERR011267_profile   ERR011266_profile   ERR567265_profile   ERR008764_profile   ERR03263_profile    ERR234562_profile
k__Bacteria 100 100 100 100 100 100 100 100 100 100 100
k__Bacteria|p__Actinobacteria   8.84082 9.08752 0.5741  0.89084 0.14307 0.13286 2.37624 1.45006 0.41332 0.9011  0.6917
k__Bacteria|p__Actinobacteria|c__Actinobacteria 5.17984 4.86388 0.14615 0.20425 0.05402 0.04654 1.47203 0.88991 0.2423  0.50227 0.33122
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales  0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae  0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces   0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_odontolyticus  0.00902 0.01206 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_HMSC035G02  0.01454 0.00469 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_HPA0247 0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_ICM47   0.0089  0.02162 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_S6_Spd3 0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_sp_oral_taxon_181  0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales    5.14507 4.80951 0.14615 0.20425 0.05402 0.04654 1.47203 0.88991 0.2423  0.50227 0.33122
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae  5.14507 4.80951 0.14615 0.20425 0.05402 0.04654 1.47203 0.88991 0.2423  0.50227 0.33122
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Aeriscardovia 0.01509 0.02915 0   0   0   0   0.01514 0.00328 0.00187 0   0.0037
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Aeriscardovia|s__Aeriscardovia_aeriphila  0.01509 0.02915 0   0   0   0   0.01514 0.00328 0.00187 0   0.0037
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium   5.12998 4.78036 0.14615 0.20425 0.05402 0.04654 1.45689 0.88663 0.24043 0.50227 0.32752
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_adolescentis   1.83735 1.8711  0.05485 0.05771 0.05402 0.04654 0.97666 0.57204 0.20912 0.42861 0.10834
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_angulatum  0   0   0   0   0   0   0   0   0   0   0.01811
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_bifidum    0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_catenulatum    0   0   0   0   0   0   0   0   0   0   0.12072
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_dentium    0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_longum 2.40681 2.17987 0.0913  0.14654 0   0   0.48024 0.31459 0.03131 0.07367 0.08035
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_pseudocatenulatum  0.88581 0.72939 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium|s__Bifidobacterium_pullorum   0   0   0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Gardnerella   0   0   0   0   0   0   0   0   0   0   0

Thanks

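MetaPhlAn clade names use a 7-level nomenclature with the levels separated by |, so genus is the 6th level. Splitting each line on | therefore gives exactly 6 fields for genus-level rows (the abundance columns contain no |). You can try this, which prints the first two header lines followed by the genus-level rows:

$ awk -F "|" 'NR<3 {print}; NR>=3 && NF==6 {print}' example.txt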
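To see why counting fields works, here is a minimal sketch that prints the line number and the number of |-separated fields for each line. The sample file and its values are made up for illustration (only the layout follows the question), and levels_per_line is a hypothetical helper name.

```shell
# Build a tiny tab-separated sample (hypothetical values, same layout as the question).
sample=$(mktemp)
printf 'type\tobese\tobese\n' > "$sample"
printf 'clade_name\tS1_profile\tS2_profile\n' >> "$sample"
printf 'k__Bacteria\t100\t100\n' >> "$sample"
printf 'k__Bacteria|p__A|c__A|o__A|f__A\t0.03\t0.04\n' >> "$sample"
printf 'k__Bacteria|p__A|c__A|o__A|f__A|g__Actinomyces\t0.03\t0.04\n' >> "$sample"
printf 'k__Bacteria|p__A|c__A|o__A|f__A|g__Actinomyces|s__Actinomyces_sp\t0.01\t0.01\n' >> "$sample"

# Print "line_number field_count" for every line, splitting on "|".
levels_per_line() {
    awk -F '|' '{ print NR, NF }' "$1"
}

levels_per_line "$sample"
```

In this sample only line 5 (the genus row) reports 6 fields; the headers and the kingdom row report 1, and the species row reports 7.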
written 11 days ago by cpad0112

Thanks cpad0112. It helps.

written 11 days ago by dpc

grep 'g__[[:alnum:]]*$'

This should work.

written 11 days ago by Hood

Sorry, I phrased the question wrongly by mistake. Please allow me to edit it, and reply if you know the answer. Thanks @Hood.

written 11 days ago by dpc

This should work then

grep 'g__[[:alnum:]]*[[:space:]]'

Result

k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces   0.03246 0.03837 0   0   0   0   0   0   0   0   0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Aeriscardovia 0.01509 0.02915 0   0   0   0   0.01514 0.00328 0.00187 0   0.0037
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Bifidobacterium   5.12998 4.78036 0.14615 0.20425 0.05402 0.04654 1.45689 0.88663 0.24043 0.50227 0.32752
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Bifidobacteriales|f__Bifidobacteriaceae|g__Gardnerella   0   0   0   0   0   0   0   0   0   0   0
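The grep above returns the genus rows but not the two header lines, and [[:alnum:]] would stop at an underscore inside a genus name. A possible variant, sketched here with awk, keeps the first two lines and matches on the clade column only. It assumes the clade name is the first whitespace-separated column and contains no spaces; genus_only and the sample data (including the name g__Genus_one) are hypothetical.

```shell
# Tiny tab-separated sample (made-up values, same layout as the question).
demo=$(mktemp)
printf 'type\tobese\tobese\n' > "$demo"
printf 'clade_name\tS1_profile\tS2_profile\n' >> "$demo"
printf 'k__Bacteria\t100\t100\n' >> "$demo"
printf 'k__Bacteria|p__A|c__A|o__A|f__A|g__Genus_one\t0.03\t0.04\n' >> "$demo"
printf 'k__Bacteria|p__A|c__A|o__A|f__A|g__Genus_one|s__Genus_one_sp\t0.01\t0.01\n' >> "$demo"

# Keep lines 1-2 (headers) plus rows whose clade name ends at the genus level.
# $1 is the clade column; [^|]* allows any characters except "|" after g__,
# so a trailing |s__... level makes the match fail.
genus_only() {
    awk 'NR < 3 || $1 ~ /[|]g__[^|]*$/' "$1"
}

genus_only "$demo"
```

For a real table, something like `genus_only example.txt > genus_level.txt` would write the filtered rows to a new file (the output name is only an example).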
written 11 days ago by Hood