Question: Print rows only if number matches
waqasnayab wrote:

Hi,

Dear Community,

I have a column like this:

D309
E308
G296
T297A
P415T
P415T
V457I
V457
A214G
A214
T418
I419V
P259
P259L
L191
A190
R478
R478H

...

In other words, this column is column 19 of a very big file. I want only the lines whose number matches the number on the adjacent line (ignoring the letters), so the output should look like this:

P415T
P415T
V457I
V457
A214G
A214
P259
P259L
R478
R478H

I tried this command:

cut -f19 mycolumnfile.txt | uniq -d

I got this output:

P415T

That is because uniq -d compares whole lines, so only the identical pair P415T/P415T is reported. I want the rows in which just the number matches.

Thanks,

Waqas.

Tags: sequencing snp next-gen
(written 4 days ago by waqasnayab; modified 4 days ago by Pierre Lindenbaum)
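The result above follows from how uniq -d works: it reports one copy of each run of adjacent lines that are identical as whole strings. A minimal Python sketch of that behaviour, for illustration only (uniq_d is a hypothetical name; the list is the sample column from the question):

```python
from itertools import groupby


def uniq_d(lines):
    """Mimic `uniq -d`: one copy of every run of identical adjacent lines."""
    return [value for value, run in groupby(lines) if len(list(run)) > 1]


column = ["D309", "E308", "G296", "T297A", "P415T", "P415T", "V457I", "V457",
          "A214G", "A214", "T418", "I419V", "P259", "P259L", "L191", "A190",
          "R478", "R478H"]

# Only P415T/P415T are equal as whole strings, so V457I/V457 etc. are missed.
print(uniq_d(column))  # ['P415T']
```

V457I and V457 share the number 457 but differ as strings, so a whole-line comparison can never pair them; the comparison key has to be the number alone.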

"I want only those lines in which the number matches only with the next line"

Not clear.

(reply by Pierre Lindenbaum, 4 days ago)

For example, if my input is:

P415T
P415T
V457I
V457
A214G
A214
T418
I419V

Whatever letter is in the first position (P in the first two lines, V in lines three and four, and so on), I want to print the rows in which the number is repeated: 415 is repeated in the first two lines and 457 in lines three and four. So the output should be:

P415T
P415T
V457I
V457
A214G
A214

(reply by waqasnayab, 4 days ago)
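The rule described in this reply (compare only the number, ignore the letters, keep every line of a run of two or more adjacent lines) can be sketched with itertools.groupby, using the first integer on each line as the grouping key. This is an illustration, not code from the thread: runs_by_number and number_key are hypothetical names, and lines without any digits are assumed never to be printed.

```python
import re
from itertools import groupby


def number_key(line):
    """Return the first integer in the line, e.g. 'V457I' -> 457 (None if absent)."""
    match = re.search(r"\d+", line)
    return int(match.group()) if match else None


def runs_by_number(lines):
    """Yield every line that belongs to a run of >= 2 adjacent lines sharing a number."""
    for key, run in groupby(lines, key=number_key):
        run = list(run)
        if key is not None and len(run) > 1:
            yield from run


sample = ["P415T", "P415T", "V457I", "V457", "A214G", "A214", "T418", "I419V"]
for line in runs_by_number(sample):
    print(line)
```

groupby only holds one run in memory at a time, so the same function can be fed a file line by line (after stripping the newlines) without loading a very big file at once.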
guillaume.rbt wrote:

Hi,

I would do it in Python, with your list of IDs in the file "list" (beware: not carefully tested):

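import re

# Python 3. Print every run of consecutive lines that share the same
# first number (e.g. P415T/P415T, V457I/V457), ignoring the letters.
with open("./list") as f1:
    first = True        # True when the next match starts a new run
    last_int = None     # number found on the previous line
    last_line = ""
    for line in f1:
        line = line.rstrip("\n")
        found = re.findall(r"\d+", line)
        if not found:   # blank line or no number: break the current run
            last_int, first = None, True
            continue
        current_int = int(found[0])
        if last_int == current_int:
            if first:   # start of a run: also print the line before it
                print(last_line)
                first = False
            print(line)
        else:
            first = True
        last_int = current_int
        last_line = line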

(written 4 days ago by guillaume.rbt)

I checked it manually and with your Python solution; it works perfectly.

What if I have a multi-column file in which this column is column 19, and I need to do the same task? How do I specify the column number so that the filtering is based on column 19?

Thanks,

Waqas.

(reply by waqasnayab, 4 days ago)

Given a tab-separated table "table" with your IDs in column 19 (index 18 in Python, which counts from 0):

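import re

# Same logic, but the number is read from column 19 of a tab-separated
# file (index 18, because Python lists start at 0).
with open("./table") as f1:
    first = True
    last_int = None
    last_line = ""
    for line in f1:
        line = line.rstrip("\n")
        fields = line.split("\t")
        found = re.findall(r"\d+", fields[18]) if len(fields) > 18 else []
        if not found:   # short row or no number in column 19: break the run
            last_int, first = None, True
            continue
        current_int = int(found[0])
        if last_int == current_int:
            if first:
                print(last_line)
                first = False
            print(line)
        else:
            first = True
        last_int = current_int
        last_line = line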

(reply by guillaume.rbt, 4 days ago)
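For the multi-column case, here is a variant of the same idea that reads the tab-separated file with the csv module and takes the key from a configurable 1-based column (19 by default). This is a sketch under stated assumptions: filter_column_runs and the demo data are hypothetical, and rows with too few columns or no digits in that column are assumed to break a run and are never printed.

```python
import csv
import io
import re
from itertools import groupby


def filter_column_runs(handle, column=19):
    """Yield rows whose 1-based `column` shares its first number with an adjacent row."""
    index = column - 1  # column 19 is index 18 in a 0-based Python list

    def key(row):
        # Rows that are too short, or have no digits in the column, get None:
        # they are never printed and they separate runs.
        if len(row) <= index:
            return None
        match = re.search(r"\d+", row[index])
        return int(match.group()) if match else None

    # QUOTE_NONE makes the reader split on tabs only, like line.split("\t").
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for number, run in groupby(reader, key=key):
        run = list(run)
        if number is not None and len(run) > 1:
            yield from run


# Hypothetical demo data: 18 filler columns, then the variant in column 19.
variants = ["P415T", "V457I", "V457", "T418"]
demo = "".join("\t".join(["x"] * 18 + [v]) + "\n" for v in variants)
for row in filter_column_runs(io.StringIO(demo)):
    print("\t".join(row))
```

For a real file, pass open("table", newline="") as the handle, which is what the csv module documentation recommends for files read with csv.reader.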

Yes, it works fine. If I make changes to the script, I will get back to you.

(reply by waqasnayab, 3 days ago)
Pierre Lindenbaum wrote:

The idea is to use awk to insert a new normalized column (the variant with its trailing letter removed) into your two files:

$ echo -e "a\tP415T\tb\na\tP415X\tb\na\tP415Y\tb" |\
awk -F '\t' '{key=$2; gsub(/[A-Z]$/,"",key); printf("%s\t%s\n",key,$0);}' |\
sort -t$'\t' -k1,1

P415    a   P415T   b
P415    a   P415X   b
P415    a   P415Y   b

Then sort both files on this column and use join to join them.

(written 4 days ago by Pierre Lindenbaum; modified 4 days ago)
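Pierre's idea (build a key by stripping a trailing capital letter, then sort both files and join them on it) can be mirrored in Python for illustration. This sketch swaps sort plus join for a dictionary lookup, which yields the same inner-join pairs without sorting; the function names and the two small tables are hypothetical.

```python
import re
from collections import defaultdict


def normalized_key(variant):
    """Drop one trailing capital letter, like the awk gsub(/[A-Z]$/,"",key)."""
    return re.sub(r"[A-Z]$", "", variant)


def join_on_key(left, right, column=1):
    """Inner-join two lists of rows on normalized_key(row[column]).

    column is 0-based; 1 corresponds to awk's $2 in the pipeline above.
    """
    by_key = defaultdict(list)
    for right_row in right:
        by_key[normalized_key(right_row[column])].append(right_row)
    joined = []
    for left_row in left:
        key = normalized_key(left_row[column])
        for right_row in by_key.get(key, []):
            joined.append((key, left_row, right_row))
    return joined


# Hypothetical tables with the variant in the second column.
left = [["a", "P415T", "b"], ["a", "V457I", "b"]]
right = [["c", "P415", "d"], ["c", "A214G", "d"]]
for key, left_row, right_row in join_on_key(left, right):
    print(key, left_row, right_row)
```

Like the awk command, this key keeps the leading letter, so P415T and A415T would not be joined; matching on the number alone needs a different key.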