Question: print the lines of a GTF file that do not match a list of gene IDs
11 weeks ago, Sam wrote:

Dear Biostars

I have a GTF file and a list file of gene_ids. I want to exclude the lines that contain a gene_id from the list file.

Any help?

Thanks

GTF file:

    Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

list file:

    MSTRG.26714
    MSTRG.26717
    MSTRG.26704

output:

    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

Tags: awk, bash, grep

Try this:

    grep -v -w -f list_file GTF_file
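
For reference, a minimal sketch of what each flag does, run on the sample data from the question. The scratch directory and the `-F` flag are additions that are not in the command above: `-F` treats each ID as a fixed string, so the `.` in `MSTRG.26714` is not read as a regex wildcard.

```shell
# Scratch directory for the demo (an assumption, so no real files are touched).
workdir=$(mktemp -d)

# Sample GTF lines from the question; printf reuses the 9-field,
# tab-separated format once per group of nine arguments.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  Chr08 StringTie exon 58908449 58908806 1000 - . 'gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";' \
  Chr08 StringTie exon 58917751 58917790 1000 - . 'gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";' \
  Chr05 StringTie exon 61586279 61586326 1000 + . 'gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";' \
  > "$workdir/GTF_file"

# Gene IDs to exclude, one per line.
printf '%s\n' MSTRG.26714 MSTRG.26717 MSTRG.26704 > "$workdir/list_file"

# -v  print only the lines that do NOT match any pattern
# -w  match whole words only, so a shorter ID cannot hit a longer one
# -F  treat patterns as fixed strings (added here; not in the original command)
# -f  read one pattern per line from list_file
kept=$(grep -v -w -F -f "$workdir/list_file" "$workdir/GTF_file")
printf '%s\n' "$kept"
```

On this sample only the `MSTRG.26714` line is dropped, because the other two IDs in the list do not occur in the GTF file.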
written 11 weeks ago by SMK
11 weeks ago, Prakash wrote:

Did the above command work? It didn't work for me. You can try using awk:

    awk -F'"' 'NR==FNR{a[$1]++;next}!a[$2]' list_file GTF_file
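
The one-liner is dense, so here is the same logic spread out with comments, as a sketch on the sample data from the question. It assumes `gene_id` is the first quoted attribute on every GTF line, since splitting on `"` makes `$2` the first quoted value. The scratch directory and the array name `skip` are only for this demo.

```shell
# Scratch directory for the demo (an assumption, so no real files are touched).
workdir=$(mktemp -d)

# Sample GTF lines from the question, tab-separated (9 fields per line).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  Chr08 StringTie exon 58908449 58908806 1000 - . 'gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";' \
  Chr08 StringTie exon 58917751 58917790 1000 - . 'gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";' \
  Chr05 StringTie exon 61586279 61586326 1000 + . 'gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";' \
  > "$workdir/GTF_file"

# Gene IDs to exclude, one per line.
printf '%s\n' MSTRG.26714 MSTRG.26717 MSTRG.26704 > "$workdir/list_file"

kept=$(awk -F'"' '
    # First file (list_file): NR equals FNR only while it is being read.
    # These lines contain no quotes, so $1 is the whole line: the gene ID.
    NR == FNR { skip[$1] = 1; next }

    # Second file (GTF_file): $2 is the text inside the first pair of quotes.
    # Print the line only when that value is not in the list.
    !($2 in skip)
' "$workdir/list_file" "$workdir/GTF_file")
printf '%s\n' "$kept"
```

The `$2 in skip` test is used in place of `!a[$2]` so that lookups do not create empty array entries; the filtering result is the same.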

I used egrep instead of grep and it worked!

written 11 weeks ago by Sam

What did you get? Check if you have an empty line in list_file...

For me it was:

    $ cat GTF_file
    Chr08   StringTie   exon    58908449    58908806    1000    -   .   gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";
    $ cat list_file
    MSTRG.26714
    MSTRG.26717
    MSTRG.26704
    $ grep -v -w -f list_file GTF_file
    Chr08   StringTie   exon    58917751    58917790    1000    -   .   gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";
    Chr05   StringTie   exon    61586279    61586326    1000    +   .   gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";

written 11 weeks ago by SMK

You are right, SMK: there was actually an empty line in the file. It's working now. :)
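
An empty line in `list_file` is read by `grep -f` as an empty pattern, which can match every line, so with `-v` nothing is printed. A minimal sketch of one way to guard against that is to drop blank lines from the list before filtering. The `clean_list` file name and the scratch directory are assumptions for illustration, and `-F` is an addition to the original command.

```shell
# Scratch directory for the demo (an assumption, so no real files are touched).
workdir=$(mktemp -d)

# Sample GTF lines from the question, tab-separated (9 fields per line).
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  Chr08 StringTie exon 58908449 58908806 1000 - . 'gene_id "MSTRG.26714"; transcript_id "MSTRG.26714.1"; exon_number "1";' \
  Chr08 StringTie exon 58917751 58917790 1000 - . 'gene_id "MSTRG.26718"; transcript_id "MSTRG.26718.1"; exon_number "2";' \
  Chr05 StringTie exon 61586279 61586326 1000 + . 'gene_id "MSTRG.15742"; transcript_id "MSTRG.15742.1"; exon_number "1";' \
  > "$workdir/GTF_file"

# A list with stray blank lines in the middle and at the end.
printf '%s\n' MSTRG.26714 '' MSTRG.26717 MSTRG.26704 '' > "$workdir/list_file"

# Keep only lines that contain something other than whitespace.
grep -v '^[[:space:]]*$' "$workdir/list_file" > "$workdir/clean_list"

# Filter the GTF file with the cleaned list.
kept=$(grep -v -w -F -f "$workdir/clean_list" "$workdir/GTF_file")
printf '%s\n' "$kept"
```

The original `list_file` is left untouched, so the cleaned copy can be compared against it if the result looks wrong.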

written 11 weeks ago by Prakash

Great!... Thanks for reporting. :-)

written 11 weeks ago by SMK