grep command to find genes listed in one file (txt) in another file (gff), but it did not work?
2.6 years ago
grep  -w -f upregulated_genes_in_BEg GCA_900659725.1_ASM90065972v1_genomic.gff

Gene list file:

LOCUS10095
LOCUS10108
LOCUS10129
LOCUS10130
LOCUS10152
LOCUS10161
LOCUS10225
LOCUS10252
LOCUS10266
LOCUS10268
LOCUS10294
LOCUS10313
LOCUS10351
LOCUS10355
LOCUS10358

Gff file:

CAACVG010000011.1   EMBL    exon    112774  112954  .   +   .   ID=exon-CALMAC_LOCUS4-2-3;Parent=rna-CALMAC_LOCUS4-2;Note=source:maker%3B~ID:CALMACT00000015207;gbkey=mRNA;locus_tag=CALMAC_LOCUS4;product=DNA-directed RNA polymerase II 16 kDa polypeptide;standard_name=rpb4tada2a_iso3
CAACVG010000011.1   EMBL    exon    121098  121359  .   +   .   ID=exon-CALMAC_LOCUS4-2-4;Parent=rna-CALMAC_LOCUS4-2;Note=source:maker%3B~ID:CALMACT00000015207;gbkey=mRNA;locus_tag=CALMAC_LOCUS4;product=DNA-directed RNA polymerase II 16 kDa polypeptide;standard_name=rpb4tada2a_iso3
CAACVG010000011.1   EMBL    CDS 83091   83151   .   +   0   ID=cds-VEN33461.1;Parent=rna-CALMAC_LOCUS4-2;Dbxref=NCBI_GP:VEN33461.1;Name=VEN33461.1;Note=source:maker%3B~ID:CALMACC00000015207;gbkey=CDS;locus_tag=CALMAC_LOCUS4;product=VEN33461.1;protein_id=VEN33461.1
CAACVG010000011.1   EMBL    CDS 112774  112954  .   +   2   ID=cds-VEN33461.1;Parent=rna-CALMAC_LOCUS4-2;Dbxref=NCBI_GP:VEN33461.1;Name=VEN33461.1;Note=source:maker%3B~ID:CALMACC00000015207;gbkey=CDS;locus_tag=CALMAC_LOCUS4;product=VEN33461.1;protein_id=VEN33461.1
CAACVG010000011.1   EMBL    CDS 121098  121272  .   +   1   ID=cds-VEN33461.1;Parent=rna-CALMAC_LOCUS4-2;Dbxref=NCBI_GP:VEN33461.1;Name=VEN33461.1;Note=source:maker%3B~ID:CALMACC00000015207;gbkey=CDS;locus_tag=CALMAC_LOCUS4;product=VEN33461.1;protein_id=VEN33461.1
CAACVG010000011.1   EMBL    mRNA    82779   101822  .   +   .   ID=rna-CALMAC_LOCUS4-3;Parent=gene-CALMAC_LOCUS4;Note=source:maker%3B~ID:CALMACT00000015206;gbkey=mRNA;locus_tag=CA
grep regex • 1.3k views

It probably did not work because your search strings (patterns) do not appear in the example posted above.


I think LOCUS* is present in the gff file. Is that not enough, or is it not a complete pattern?


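Entries in the file have the prefix CALMAC_ before LOCUS. With -w, grep only reports a match that is not adjacent to a letter, digit, or underscore, so LOCUS10355 does not count as a whole word inside CALMAC_LOCUS10355. You will need to either adjust your search patterns to include the prefix or drop the -w option. Note that without -w a shorter ID such as LOCUS1035 would also match LOCUS10355.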

$ zgrep "LOCUS10355" GCA_900659725.1_ASM90065972v1_genomic.gff.gz | head -5
CAACVG010008278.1       EMBL    gene    32249   125186  .       -       .       ID=gene-CALMAC_LOCUS10355;Name=CALMAC_LOCUS10355;gbkey=Gene;gene_biotype=protein_coding;locus_tag=CALMAC_LOCUS10355
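
One way to adjust the patterns, as suggested above, is to prepend the prefix to every ID in the list and keep -w. The following is only a sketch on toy data: the file names genes.txt, genes_prefixed.txt and demo.gff, the contig names, and the two demo records are made up for illustration, and it assumes every ID in your list really carries the CALMAC_ prefix in the GFF.

```shell
# Scratch directory with tiny stand-in files (hypothetical names and contents).
tmp=$(mktemp -d)
printf 'LOCUS10355\nLOCUS4\n' > "$tmp/genes.txt"
printf 'ctg1\tEMBL\tgene\t1\t100\t.\t-\t.\tID=gene-CALMAC_LOCUS10355;locus_tag=CALMAC_LOCUS10355\n' > "$tmp/demo.gff"
printf 'ctg2\tEMBL\tgene\t1\t100\t.\t+\t.\tID=gene-CALMAC_LOCUS40;locus_tag=CALMAC_LOCUS40\n' >> "$tmp/demo.gff"

# Original approach: with -w the bare IDs do not match, because "_" is a
# word character and so "LOCUS10355" is not a whole word in "CALMAC_LOCUS10355".
# grep -c exits non-zero when the count is 0, hence the "|| true".
plain_hits=$(grep -c -w -f "$tmp/genes.txt" "$tmp/demo.gff" || true)

# Adjusted approach: add the prefix to each pattern, then keep -w so that
# CALMAC_LOCUS4 does not also match CALMAC_LOCUS40. -F treats patterns as fixed strings.
sed 's/^/CALMAC_/' "$tmp/genes.txt" > "$tmp/genes_prefixed.txt"
prefixed_hits=$(grep -c -w -F -f "$tmp/genes_prefixed.txt" "$tmp/demo.gff" || true)

echo "without prefix: $plain_hits; with prefix: $prefixed_hits"
```

On the real data the same two steps would be run against upregulated_genes_in_BEg and the downloaded GFF instead of the demo files; dropping -c prints the matching lines rather than a count.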