grep invert match not working as expected
6.3 years ago
waqasnayab ▴ 250

Hi,

This post is a continuation of my previous post: Compare two columns of same file as paired rows

So I am skipping the introduction of the file. Kevin replied with two commands. The second command was:

awk '!index($3, $4) { print }' test.tsv | grep -v -f matches.list

Here grep -v is supposed to invert the match, and it does, but the output still seems to be missing records that are not present in matches.list.

For example, my output should have this record:

CHR POS AACHANGE EFFECT
chr15 41342365 T35A D
chr15 41342366 T35N D

because matches.list has no T35, T35A or T35N. But this record is still not present, and I am unable to understand why. I checked my file for proper formatting, invisible characters etc., and all seems well. Is this a grep error or something else? I am using Ubuntu 16.04 LTS.

Any help appreciated,

Thanks in advance,

Waqas.

SNP software error genome • 2.4k views

Try

awk '!index($3, $4) { print }' test.tsv | grep T35 | grep -f matches.list --color

to see what happens.
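If the colors are hard to read in a long line, a similar check can be done with grep -o, which prints only the matched parts. This is just a sketch: the one-line input and the two patterns below are made up for illustration (the real matches.list is not shown in this thread), and -o is a GNU/BSD grep option.

```shell
# Hypothetical miniature inputs; not the real matches.list or test.tsv.
tmp=$(mktemp -d)
printf 'A97\nK12\n' > "$tmp/matches.list"
line='chr15 41342365 NUSAP1:NM_016359:exon2:c.A97G:p.T35A T35A D'

# -o prints each matched substring on its own line, so the output
# shows which pattern(s) from the list are hitting this record.
hits=$(printf '%s\n' "$line" | grep -o -f "$tmp/matches.list" | sort -u)
echo "$hits"
```

Whatever is printed is the pattern that would make grep -v drop the record.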


I tried your command. Instead of getting a colored T35, I got the following line:

chr15   41342365    rs7178634   A   G   NUSAP1  exonic  NUSAP1  .   nonsynonymous_SNV   NUSAP1:NM_001243142:exon2:c.A97G:p.T35A,NUSAP1:NM_001243143:exon2:c.A97G:p.T35A,NUSAP1:NM_001301136:exon2:c.A97G:p.T35A,NUSAP1:NM_016359:exon2:c.A97G:p.T35A,NUSAP1:NM_018454:exon2:c.A97G:p.T35A   A/G A/G A/G A/A 0/1 0/1 0/1 0/0 MISSENSE    T35A    Acc Gcc GAc D   .   .

in which A97 is colored. I have now changed my test.tsv to test_annotated.

Second, 41342366 is also present if I run:

awk '!index($3, $4) { print }' test.tsv | grep T35

but it disappears when I run your whole command (as you wrote it above).


"in which A97 is colored"

You have something like 'A97' in your matches.list, and grep matches it as a substring of c.A97G. That's why the line is removed.
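A small sketch of that effect, assuming a matches.list that contains only A97 and a shortened, space-separated stand-in for the real records (both hypothetical):

```shell
# Hypothetical miniature inputs; not the real matches.list or test.tsv.
tmp=$(mktemp -d)
printf 'A97\n' > "$tmp/matches.list"
printf '%s\n' \
  'chr15 41342365 NUSAP1:NM_016359:exon2:c.A97G:p.T35A T35A D' \
  'chr1 1000 GENE1:NM_000001:exon1:c.C10T:p.R4W R4W D' > "$tmp/demo.txt"

# Without -w, a pattern matches anywhere in the line, so A97 hits
# inside c.A97G and -v drops the chr15 record.
kept=$(grep -v -f "$tmp/matches.list" "$tmp/demo.txt")
echo "$kept"
```

Only the chr1 line survives, even though the list has no T35A.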


Yes, Pierre, there is an A97 in matches.list; that's why the desired record was removed.

I changed Kevin's command:

awk '!index($3, $4) { print }' test.tsv | grep -v -f matches.list

to:

awk '!index($3, $4) { print }' test.tsv | grep -w -v -f matches.list

and now the output contains all the records that had been removed because of that.
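For anyone reading later, here is a sketch of what -w changes, again on made-up data (the patterns, the file name demo.txt and the column positions are assumptions, not the real files). Note that -w treats '.' and ':' as word boundaries, so a pattern such as A97G would still match inside c.A97G:p.T35A. If the list is meant to be compared against one column only, an exact field comparison in awk is a possible alternative; column 4 below is just where the amino-acid change sits in this demo.

```shell
# Hypothetical miniature inputs; not the real matches.list or test.tsv.
tmp=$(mktemp -d)
printf 'A97\nR4W\n' > "$tmp/matches.list"
printf '%s\n' \
  'chr15 41342365 NUSAP1:NM_016359:exon2:c.A97G:p.T35A T35A D' \
  'chr1 1000 GENE1:NM_000001:exon1:c.C10T:p.R4W R4W D' > "$tmp/demo.txt"

# -w: a pattern must match a whole word. A97 inside c.A97G is followed
# by G, so it no longer matches; R4W stands alone and still matches.
kept_w=$(grep -w -v -f "$tmp/matches.list" "$tmp/demo.txt")
echo "$kept_w"

# Alternative sketch: load the list into an awk array, then drop a row
# only when its 4th field is exactly one of the listed values.
kept_col=$(awk 'NR == FNR { drop[$1]; next } !($4 in drop)' \
  "$tmp/matches.list" "$tmp/demo.txt")
echo "$kept_col"
```

Both commands keep the chr15 record and drop the chr1 record in this demo.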

Thanksssss!!!!!


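"Instead of getting colored T35"

There is no reason to expect a colored T35 from my command line: --color is only on the last grep, so it highlights what the patterns in matches.list matched.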
