to grep pattern
10 weeks ago

I want to grep only the lines that have the word "gene" in column 3 of the following file, but this word is also present in column 9 of every line. Any suggestions on how to use grep in bash/Linux to select only the lines that have "gene" in column 3? Thanks.

HiC_scaffold_5  maker   gene    42689882        42692012        .       +       .       ID "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11"; Name "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11"; 0
HiC_scaffold_5  maker   mRNA    42689882        42692012        .       +       .       ID "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1"; Name "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1"; Parent "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11"; _AED "0.36"; _QI "0|0|0|0.5|1|1|2|0|160"; _eAED "1.00";        0
HiC_scaffold_5  maker   exon    42689882        42690273        .       +       .       ID "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1:1"; Parent "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1";        0
HiC_scaffold_5  maker   exon    42691922        42692012        .       +       .       ID "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1:2"; Parent "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1";        0
HiC_scaffold_5  maker   CDS     42689882        42690273        .       +       0       ID "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1:cds"; Parent "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1";      0
HiC_scaffold_5  maker   CDS     42691922        42692012        .       +       1       ID "IDmodified-cds-15717"; Parent "augustus_masked-@000000F|arrow|arrow-processed-gene-0.11-mRNA-1";    0
HiC_scaffold_5  maker   gene    42938430        42944293        .       +       .       ID "maker-@000000F|arrow|arrow-snap-gene-4.61"; Name "maker-@000000F|arrow|arrow-snap-gene-4.61";       0
HiC_scaffold_5  maker   mRNA    42938430        42944293        .       +       .       ID "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1"; Name "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1"; Parent "maker-@000000F|arrow|arrow-snap-gene-4.61"; _AED "0.34"; _QI "0|0|0|0.66|0|0.33|3|0|256"; _eAED "0.83"; 0
HiC_scaffold_5  maker   exon    42938430        42938991        .       +       .       ID "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1:1"; Parent "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1";     0
HiC_scaffold_5  maker   exon    42939695        42939894        .       +       .       ID "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1:2"; Parent "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1";     0
HiC_scaffold_5  maker   exon    42944285        42944293        .       +       .       ID "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1:3"; Parent "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1";     0
HiC_scaffold_5  maker   CDS     42938430        42938991        .       +       0       ID "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1:cds"; Parent "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1";   0
HiC_scaffold_5  maker   CDS     42939695        42939894        .       +       2       ID "IDmodified-cds-21469"; Parent "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1";   0
HiC_scaffold_5  maker   CDS     42944285        42944293        .       +       0       ID "IDmodified-cds-21470"; Parent "maker-@000000F|arrow|arrow-snap-gene-4.61-mRNA-1";   0
10 weeks ago

I would use awk instead so you can be explicit about selecting column 3:

awk -v FS='\t' -v OFS='\t' '$3 == "gene"' in.gff

With grep I would probably do:

grep -P '\tgene\t' in.gff
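
One caveat on the grep version: `-P` is a GNU grep option, and `'\tgene\t'` matches a tab-delimited field equal to "gene" in any middle column, not only column 3. If that matters, the pattern can be anchored to the start of the line. Below is a minimal sketch assuming a tab-separated file; `sample.gff`, `genes_grep.gff` and `genes_awk.gff` are made-up names, and the two records are invented just to show the behaviour.

```shell
# Hypothetical two-line sample: both lines contain "gene" in column 9,
# but only the first has "gene" as its third field.
printf 'chr1\tmaker\tgene\t1\t10\t.\t+\t.\tID "a-gene-0.1";\n' > sample.gff
printf 'chr1\tmaker\tmRNA\t1\t10\t.\t+\t.\tID "a-gene-0.1-mRNA-1";\n' >> sample.gff

# Store a literal tab so the pattern works without grep -P.
tab=$(printf '\t')

# Anchor at line start: two tab-free fields, then "gene" as the whole third field.
grep "^[^${tab}]*${tab}[^${tab}]*${tab}gene${tab}" sample.gff > genes_grep.gff

# The awk filter from above, for comparison.
awk -F'\t' '$3 == "gene"' sample.gff > genes_awk.gff
```

Both output files should contain only the first record. The awk form is still easier to read and to adapt to another column.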