Question: Print rows only if number matches
waqasnayab (Pakistan) wrote:

Hi,

Dear Community,

I have a column like this:

D309
E308
G296
T297A
P415T
P415T
V457I
V457
A214G
A214
T418
I419V
P259
P259L
L191
A190
R478
R478H

. .. ...

In other words, this is column 19 of a very large file. I want to keep only those lines whose number matches the number on an adjacent line, so the output should look like this:

P415T
P415T
V457I
V457
A214G
A214
T418
I419V
P259
P259L
R478
R478H

I tried this command:

cut -f19 mycolumnfile.txt | uniq -d

I got this output:

P415T

This is because uniq -d compares whole lines. I want only the rows in which the number alone matches.

Thanks,

Waqas.

Tags: sequencing, snp, next-gen
written 6 months ago by waqasnayab (modified 6 months ago by Pierre Lindenbaum)

"I want only those lines in which the number matches only with the next line"

This is not clear.

written 6 months ago by Pierre Lindenbaum

For example, if my input is like this:

P415T
P415T
V457I
V457
A214G
A214
T418
I419V

Whatever character comes first, whether P (in the first two lines) or V (in lines three and four), and so on, I want to print the rows in which the number is repeated on consecutive lines. For example, 415 is repeated in the first two lines and 457 in lines three and four, so the output should be like this:

P415T
P415T
V457I
V457
A214G
A214
written 6 months ago by waqasnayab
guillaume.rbt (France) wrote:

Hi,

I would do it in Python, with your list of IDs in a file named "list" (beware, not carefully tested):

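import re

with open("./list") as f1:
    first = True      # True until the first line of the current run has been printed
    last_int = None   # number found on the previous line (None before the first line)
    last_line = ""
    for line in f1:
        line = line.rstrip("\n")  # print() adds its own newline
        match = re.search(r"\d+", line)
        cur_int = int(match.group()) if match else None  # lines without digits never match
        if cur_int is not None and cur_int == last_int:
            if first:
                print(last_line)  # first line of the run, held back until now
                first = False
            print(line)
        else:
            first = True
        last_int = cur_int
        last_line = line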
written 6 months ago by guillaume.rbt

I checked the result manually against the output of your Python solution, and it works perfectly.

What if I have a multi-column file in which these values are in column 19, and I need to do the same task? How do I specify the column number so that the filtering is based on column 19?

Thanks,

Waqas.

written 6 months ago by waqasnayab

Given a tab-delimited file "table" with your list of IDs in column 19:

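import re

with open("./table") as f1:
    first = True      # True until the first line of the current run has been printed
    last_int = None   # number from column 19 of the previous line
    last_line = ""
    for line in f1:
        line = line.rstrip("\n")  # print() adds its own newline
        fields = line.split("\t")
        # Column 19 is index 18; short lines or fields without digits never match.
        match = re.search(r"\d+", fields[18]) if len(fields) > 18 else None
        cur_int = int(match.group()) if match else None
        if cur_int is not None and cur_int == last_int:
            if first:
                print(last_line)  # first line of the run, held back until now
                first = False
            print(line)
        else:
            first = True
        last_int = cur_int
        last_line = line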
written 6 months ago by guillaume.rbt
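
The same adjacent-run logic can also be expressed with itertools.groupby, which groups consecutive items that share a key. The sketch below is not from the thread. The function name rows_with_repeated_number and its column parameter are hypothetical, and it assumes tab-delimited input in which the first run of digits in the chosen column is the number to compare.

```python
import re
from itertools import groupby

def rows_with_repeated_number(lines, column=0):
    """Return lines whose number (first digit run in `column`) repeats on an adjacent line.

    Hypothetical helper; `column` is a 0-based index into tab-separated fields.
    """
    def key(line):
        fields = line.rstrip("\n").split("\t")
        match = re.search(r"\d+", fields[column]) if column < len(fields) else None
        # Lines without a number get a unique key so they never group together.
        return int(match.group()) if match else object()

    kept = []
    for _, run in groupby(lines, key=key):
        run = list(run)
        if len(run) > 1:  # keep only runs of two or more adjacent lines
            kept.extend(run)
    return kept

sample = ["P415T", "P415T", "V457I", "V457", "A214G", "A214", "T418", "I419V"]
for row in rows_with_repeated_number(sample):
    print(row)
```

For the multi-column case, passing column=18 would select column 19, for example rows_with_repeated_number(open("table"), column=18).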

Yes, it works fine. If I need to make changes to the script, I will come back to you.

written 6 months ago by waqasnayab
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

The idea is to use awk to insert a new normalized column into your two files:

$ echo -e "a\tP415T\tb\na\tP415X\tb\na\tP415Y\tb" |\
awk -F '\t' '{key=$2; gsub(/[A-Z]$/,"",key); printf("%s\t%s\n",key,$0);}' |\
sort -t$'\t' -k1,1

P415    a   P415T   b
P415    a   P415X   b
P415    a   P415Y   b

Then sort both files on this column and use join to join them.
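
For readers who prefer Python, the normalization step can be sketched as follows. This is only an illustration under assumptions: the function names normalized_key and add_key_column are hypothetical, the ID is read from a 1-based tab-separated column, and, like the awk gsub, only a single trailing uppercase letter is stripped.

```python
import re

def normalized_key(identifier):
    """Strip one trailing uppercase letter, e.g. 'P415T' -> 'P415' (mirrors the awk gsub)."""
    return re.sub(r"[A-Z]$", "", identifier)

def add_key_column(lines, column=2):
    """Prefix each tab-separated line with its normalized key, then sort on that key.

    `column` is 1-based, like awk's $2. Hypothetical helper for illustration.
    """
    keyed = []
    for line in lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        keyed.append((normalized_key(fields[column - 1]), line))
    keyed.sort(key=lambda pair: pair[0])  # sort on the key only
    return [f"{key}\t{line}" for key, line in keyed]

rows = ["a\tP415T\tb", "a\tP415X\tb", "a\tP415Y\tb"]
for out in add_key_column(rows):
    print(out)
```

Rows that share a normalized key end up next to each other after sorting, which is what makes the later join on that column possible.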

written 6 months ago by Pierre Lindenbaum (modified 6 months ago)