Question: Print rows only if number matches
waqasnayab (Pakistan) wrote, 4 weeks ago:

Hi,

Dear Community,

I have a column like this:

D309
E308
G296
T297A
P415T
P415T
V457I
V457
A214G
A214
T418
I419V
P259
P259L
L191
A190
R478
R478H

. .. ...

In other words, this column is column number 19 of a very big file. I want to keep only the lines whose number matches the number on the neighbouring line, so the output should look like this:

P415T
P415T
V457I
V457
A214G
A214
T418
I419V
P259
P259L
R478
R478H

I tried this command:

cut -f19 mycolumnfile.txt | uniq -d

I got this output:

P415T

This is because uniq -d compares whole lines. I want the rows in which only the number matches.

Thanks,

Waqas.


"I want only those lines in which the number matches only with the next line"

This is not clear.

Reply written 4 weeks ago by Pierre Lindenbaum

For example, if my input is like this:

P415T
P415T
V457I
V457
A214G
A214
T418
I419V

Whatever character comes first, whether P (in the first two lines) or V (in lines three and four), and so on, I want to print the rows in which the number is repeated. For example, 415 is repeated in the first two lines and 457 is repeated in lines three and four, so the output should be like this:

P415T
P415T
V457I
V457
A214G
A214
Reply written 4 weeks ago by waqasnayab
guillaume.rbt (France) wrote, 4 weeks ago:

Hi,

I would do it in Python, with your list of IDs in the file "list" (beware, not carefully tested):

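import re

with open("./list") as f1:
    first = True        # True until the previous line has been printed
    last_int = None     # number found on the previous line
    last_line = ""
    for line in f1:
        line = line.rstrip("\n")
        match = re.search(r"\d+", line)
        # a line without digits never matches its neighbours
        cur_int = int(match.group()) if match else None
        if cur_int is not None and cur_int == last_int:
            if first:
                # start of a run: the previous line belongs to it too
                print(last_line)
                first = False
            print(line)
        else:
            first = True
        last_int = cur_int
        last_line = line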
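The adjacent-run logic can also be sketched with itertools.groupby, which groups consecutive items that share a key. This is only an illustrative alternative, not the script posted above. The helper names number_key and adjacent_number_runs are made up for this example, and it assumes that a line without any digits should never match.

```python
import re
from itertools import groupby


def number_key(text):
    """Return the first run of digits in text as an int, or None if absent."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


def adjacent_number_runs(lines):
    """Yield lines whose number equals that of a directly neighbouring line."""
    for key, group in groupby(lines, key=number_key):
        run = list(group)
        # Keep only runs of two or more consecutive lines with the same number.
        if key is not None and len(run) > 1:
            yield from run


if __name__ == "__main__":
    sample = ["D309", "P415T", "P415T", "V457I", "V457", "T418", "I419V"]
    for line in adjacent_number_runs(sample):
        print(line)
```

With this sample, D309, T418 and I419V are dropped because none of them shares its number with a neighbouring line.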

I checked the output manually against your Python solution, and it works perfectly fine.

What if I have a multi-column file in which the same values are in column 19, and I need to do the same task? How do I specify the column number so that the filtering is based on column 19?

Thanks,

Waqas.

Reply written 4 weeks ago by waqasnayab

Given a tab-delimited file "table" with your list of IDs in column 19:

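import re

with open("./table") as f1:
    first = True        # True until the previous line has been printed
    last_int = None     # number found in column 19 of the previous line
    last_line = ""
    for line in f1:
        line = line.rstrip("\n")
        fields = line.split("\t")
        # column 19 is index 18; a short line or a field without digits never matches
        match = re.search(r"\d+", fields[18]) if len(fields) > 18 else None
        cur_int = int(match.group()) if match else None
        if cur_int is not None and cur_int == last_int:
            if first:
                # start of a run: the previous line belongs to it too
                print(last_line)
                first = False
            print(line)
        else:
            first = True
        last_int = cur_int
        last_line = line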
Reply written 4 weeks ago by guillaume.rbt
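If the column number should be a parameter rather than a hard-coded index, one possible sketch is shown here. It is not the script from the reply above: make_column_key and filter_table are hypothetical names, the column argument is 1-based (so column 19 becomes list index 18), and the csv module is used with quoting disabled on the assumption that quote characters in the table are plain data.

```python
import csv
import io
import re
from itertools import groupby


def make_column_key(column):
    """Build a key function reading the first integer in a 1-based column."""
    index = column - 1  # column 19 -> list index 18

    def key(row):
        if index >= len(row):
            return None  # row is too short to have this column
        match = re.search(r"\d+", row[index])
        return int(match.group()) if match else None

    return key


def filter_table(handle, column=19):
    """Yield tab-separated rows whose column number matches an adjacent row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for number, rows in groupby(reader, key=make_column_key(column)):
        rows = list(rows)
        # Keep only runs of two or more consecutive rows with the same number.
        if number is not None and len(rows) > 1:
            yield from rows


if __name__ == "__main__":
    # Tiny in-memory table; the IDs sit in column 2 here instead of 19.
    demo = io.StringIO("a\tP259\tx\nb\tP259L\ty\nc\tL191\tz\n")
    for row in filter_table(demo, column=2):
        print("\t".join(row))
```

For a real file, the handle would come from open("table", newline="") and the default column=19 would apply. Rows are read one at a time, so only the current run of matching rows is held in memory.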

Yes, it works fine. If I make any changes to the script, I will get back to you.

Reply written 4 weeks ago by waqasnayab
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4 weeks ago:

The idea is to use awk to insert a new normalized column in each of your two files:

$ echo -e "a\tP415T\tb\na\tP415X\tb\na\tP415Y\tb" |\
awk -F '\t' '{key=$2; gsub(/[A-Z]$/,"",key); printf("%s\t%s\n",key,$0);}' |\
sort -t$'\t' -k1,1

P415    a   P415T   b
P415    a   P415X   b
P415    a   P415Y   b

Then sort both files on this column and use join to join them.
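For readers following the thread's Python examples, the normalization step of this awk idea could be sketched as below. This is an illustration only: normalized_key and add_key_column are made-up names, and, like the gsub(/[A-Z]$/,"",key) call, the key keeps the leading letter and drops just one trailing uppercase letter. The join step is not covered.

```python
import re


def normalized_key(value):
    """Drop one trailing uppercase letter, e.g. 'P415T' -> 'P415'."""
    return re.sub(r"[A-Z]$", "", value)


def add_key_column(lines, column=2, sep="\t"):
    """Prefix each line with the normalized key of its 1-based column."""
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split(sep)
        yield normalized_key(fields[column - 1]) + sep + line


if __name__ == "__main__":
    rows = ["a\tP415T\tb", "a\tP415X\tb", "a\tP415Y\tb"]
    # Sorting the keyed lines puts rows with the same key next to each other.
    for keyed in sorted(add_key_column(rows)):
        print(keyed)
```

Because the key keeps the leading letter, this groups P415T with P415X but would not group two rows that share only the number.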
