Filter metagenomics abundance data by species
16 months ago

I have a metagenomics dataset:

k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Lachnospiraceae_unclassified|s__Lachnospiraceae_bacterium_NSJ_46|t__SGB47656     0.00074 0.00786 0.0047  0.0 0.0      0.0     0.0     0.0     0.03584 0.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Lachnospiraceae|g__Lachnospiraceae_unclassified|s__Lachnospiraceae_unclassified_SGB4890|t__SGB4890  0.00065 0.0     0.0     0.0 0.0      0.0     0.0     0.0     0.0     0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Propionibacteriales|f__Propionibacteriaceae|g__Arachnia|s__Arachnia_SGB15898|t__SGB15898     0.00061 0.0     0.00367 0.00472 0.0     0.0 0.0      0.0     0.00098 0.00064
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Ruminococcaceae|g__Ruminococcaceae_unclassified|s__Ruminococcaceae_bacterium|t__SGB15196    0.00056 0.0     0.0     0.0     0.0 0.0      0.0     0.0691  0.0     0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Trueperella|s__Trueperella_pyogenes|t__SGB17137       0.00053 0.0     0.0     0.00204 0.0     0.0032       0.00049 0.00996 0.0     0.00189
k__Bacteria|p__Firmicutes|c__CFGB16911|o__OFGB16911|f__FGB16911|g__GGB49418|s__GGB49418_SGB69331|t__SGB69331    0.00047 0.01372 0.0     0.0     0.0     0.01526 0.0     0.02567 0.0     0.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Peptostreptococcaceae|g__Intestinibacter|s__Intestinibacter_bartlettii|t__SGB6140   0.00037 0.00717 1.01727 0.03621 0.00633 0.00663      0.17413 0.14154 0.04688 0.00647
k__Bacteria|p__Firmicutes|c__Bacilli|o__Bacilli_unclassified|f__Bacilli_unclassified|g__Bacilli_unclassified|s__Bacilli_bacterium|t__SGB6421    0.00037 0.0     0.13388 0.0     0.0     0.0 0.0      0.0     0.0     0.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Eubacteriaceae|g__Eubacteriaceae_unclassified|s__Eubacteriaceae_bacterium|t__SGB3958        0.0003  0.00372 0.03418 0.0     0.0 0.05297  0.08944 0.04638 0.01594 0.0
k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales|f__Porphyromonadaceae|g__Porphyromonas|s__Porphyromonas_bennonis|t__SGB1985        0.00029 0.0     0.0     0.0     0.0     0.0 0.0021   0.0     0.0     0.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridia_unclassified|f__Clostridia_unclassified|g__Clostridia_unclassified|s__Clostridia_bacterium|t__SGB4342     0.00027 0.03727 0.05458 0.0 0.0      0.00094 0.0     0.01194 0.0     0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Pauljensenia|s__Pauljensenia_hongkongensis|t__SGB17148        0.0002  0.0     0.0     0.0     0.0 0.0      0.00483 0.0     0.00023 0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_israelii|t__SGB15875       0.00019 0.0     0.0     0.0     0.0     0.0 0.00262  0.0     0.00038 0.0
k__Bacteria|p__Firmicutes|c__Clostridia|o__Clostridiales|f__Christensenellaceae|g__Christensenellaceae_unclassified|s__Christensenellaceae_bacterium|t__SGB14128        6e-05   0.00293 0.0 0.0      0.0     0.0     0.0     0.0     0.0     0.0
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_durum|t__SGB17008      5e-05   0.0     0.00114 0.0     0.0 0.0      0.00838 0.0     0.0041  0.00335

I want to filter the rows based on the first column: keep only the rows whose lineage contains a given s__ (species) entry, and keep the lineage only up to and including the s__ level. For example, I want the rows that contain k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_durum. Can anyone help me with this? Thanks.

16 months ago
JC 13k

Something like this?

$ perl -lane 'if ($F[0]=~/s__Corynebacterium_durum/) { $F[0]=~s/\|t__.*//; print join "\t", @F }' < data.tsv 
k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_durum  5e-05   0.0 0.00114 0.0 0.0 0.0 0.00838 0.0 0.0041  0.00335

Thanks for helping me. It worked, but the header lines are missing from the output. How can I keep the headers as they are?


What do the headers look like? Can you post an example?


These are the header lines I want to keep as they are:

#mpa_vJan21_CHOCOPhlAnSGB_202103
clade_name      B075Md_output_file      B090Md_output_file      B219Md_output_file      B447Md_output_file      B449Md_output_file      B478Md_output_file      B651Md_output_file      B671Md_output_file   B816Md_output_file      B825Md_output_file
To keep the header lines, print them before applying the species filter:

$ perl -lane 'print if /^(?:#|clade_name)/; if ($F[0]=~/s__Corynebacterium_durum/) { $F[0]=~s/\|t__.*//; print join "\t", @F }' < data.tsv

Thanks, JC. It worked really well.
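For anyone who prefers a script to a one-liner, here is a rough Python sketch of the same idea: pass the header lines through unchanged, keep only the rows for one species, and trim the lineage at the s__ level. The function name `filter_species` and its parameters are made up for illustration and are not part of any tool. Unlike the Perl regex above, this sketch matches the species name exactly rather than as a substring, which is an assumption about what is wanted. The sample rows are copied from the question, limited to the first two sample columns.

```python
def filter_species(lines, species, header_prefixes=("#", "clade_name")):
    """Yield header lines unchanged, plus rows for `species` trimmed at s__.

    `filter_species` and `header_prefixes` are hypothetical names used
    only for this sketch.
    """
    target = "s__" + species
    for raw in lines:
        line = raw.rstrip("\n")
        # Header lines (version comment and column names) pass through as-is.
        if line.startswith(header_prefixes):
            yield line
            continue
        fields = line.split()  # split on any whitespace, like perl -a
        if not fields:
            continue
        levels = fields[0].split("|")
        # Exact match on the species level (assumption: no substring matching).
        if target in levels:
            # Drop everything after the species level, e.g. the t__SGB part.
            fields[0] = "|".join(levels[: levels.index(target) + 1])
            yield "\t".join(fields)


# Rows taken from the question, first two sample columns only.
sample = [
    "#mpa_vJan21_CHOCOPhlAnSGB_202103",
    "clade_name\tB075Md_output_file\tB090Md_output_file",
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Actinomycetales|"
    "f__Actinomycetaceae|g__Actinomyces|s__Actinomyces_israelii|t__SGB15875"
    "\t0.00019\t0.0",
    "k__Bacteria|p__Actinobacteria|c__Actinobacteria|o__Corynebacteriales|"
    "f__Corynebacteriaceae|g__Corynebacterium|s__Corynebacterium_durum|"
    "t__SGB17008\t5e-05\t0.0",
]

for row in filter_species(sample, "Corynebacterium_durum"):
    print(row)
```

To run it on a real file, open data.tsv and pass the file handle in place of `sample`; a file object yields lines one at a time, so the whole table is never loaded into memory.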

16 months ago
Zuber • 0

You can use Excel: Data --> Text to Columns.
