Grepping from a specific column with pattern list
6.4 years ago
waqasnayab ▴ 250

Hi,

I have a pattern file with IDs:

IID
1
10
1098
1099
11
1130
12
121
127

I want to grep lines from a big file using this pattern file. In the big file, the IDs are in column two. The dummy format of the big file is as follows:

FID IID some_more_columns
fam1 1
fam2 10
fam3 1098
fam4 256
fam5 1099

The desired output should be:

FID IID some_more_columns
fam1 1
fam2 10
fam3 1098
fam5 1099

I tried with this solution: http://www.linuxforums.org/forum/programming-scripting/130889-grepping-something-out-specific-column-file-using-pattern-another-file.html

but no luck. Any advice is appreciated.

Thanks,

Waqas.

sequence next-gen • 4.6k views
6.4 years ago
cat lookup.list 
IID
1
10
1098
1099
11
1130
12
121
127

cat BigFile.list 
FID   IID   moredata  masdados
fam1  1     moredata  masdados
fam2  10    moredata  masdados
fam3  1098  moredata  masdados
fam4  256   moredata  masdados
fam5  1099  moredata  masdados


awk 'BEGIN {FS=" "} FNR==NR {key=$1; arrayLookup[key]=$1; next} {key=$2; if (key in arrayLookup) print $0}' lookup.list BigFile.list 
FID   IID   moredata  masdados
fam1  1     moredata  masdados
fam2  10    moredata  masdados
fam3  1098  moredata  masdados
fam5  1099  moredata  masdados

Hi Kevin,

I tried it but nothing happened. Then I tried changing {FS=" "} to {FS="\t"}, but got the same result.

It worked well on the test file, though.


Ensure that your files are delimited properly. You can collapse runs of multiple spaces into a single space by running sed 's/ \+/ /g' (GNU sed syntax) on each file before using awk.
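
One way to check what the delimiters and line endings really are is to look at the raw bytes. This is only a sketch: sample_crlf.txt is a made-up file created here for illustration, not one of the files from the question.

```shell
# Hypothetical sample file with Windows (CRLF) line endings, for illustration only.
printf 'FID IID\r\nfam1 1\r\n' > sample_crlf.txt

# od -c prints every byte; a carriage return shows up as \r before \n,
# and tabs show up as \t, so the real delimiter becomes visible.
head -n 2 sample_crlf.txt | od -c

# Count the lines that end in a carriage return (0 means the file is clean).
cr_lines=$(awk '/\r$/ {n++} END {print n+0}' sample_crlf.txt)
echo "lines ending in CR: $cr_lines"
```

If the count is not zero, the last field on each line carries an invisible extra character, so an exact ID lookup will not match it.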


Yes, my file contained the special character ^M (carriage return), the classic DOS line-ending problem. After running dos2unix, your command worked for me. Now it's all OK!

Perfect,

Thanks,

Waqas,


Yes, I have encountered that problem before with ^M. dos2unix and unix2dos are useful tools to have.
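
If dos2unix is not installed, the carriage return can probably also be stripped inside awk itself. A minimal sketch, assuming the only problem is a trailing ^M on each line; lookup_crlf.list and big_crlf.list are hypothetical stand-ins for the real files:

```shell
# Hypothetical inputs with CRLF line endings, created for illustration only.
printf 'IID\r\n1\r\n10\r\n1098\r\n1099\r\n' > lookup_crlf.list
printf 'FID IID\r\nfam1 1\r\nfam2 10\r\nfam3 1098\r\nfam4 256\r\nfam5 1099\r\n' > big_crlf.list

result=$(awk '
    { sub(/\r$/, "") }            # drop a trailing carriage return, if any
    FNR == NR { ids[$1]; next }   # first file: remember each ID
    $2 in ids                     # second file: print rows whose column 2 is a known ID
' lookup_crlf.list big_crlf.list)
printf '%s\n' "$result"
```

The header line is kept only because IID is also the first line of the lookup file, as in the original example.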

Good luck


Which OS are you using?

6.4 years ago
michael.ante ★ 3.8k

Hi Waqas,

You can use the join command to get all lines that share a common field. Let t1.txt be your ID list and t2.txt your big file (I just added some dummy values in column 3):

join -1 1 -2 2 -o 2.1,2.2,2.3  <(sort t1.txt) <(sort -k2,2 t2.txt )
fam1 1 A
fam2 10 A
fam3 1098 A
fam5 1099 G
FID IID some_more_columns

Since join needs its inputs sorted on the join field, you can produce sorted temporary streams with <(sort ...) (bash process substitution). The parameter -1 selects the join field of file 1, and -2 the join field of file 2. With -o you control the output: for each field x of file 2 that you want in your result, add 2.x to the list.

Afterwards, you can re-sort the results.
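
If the header should stay on top, one option is to sort and join only the data rows and print the header separately. This is a sketch under assumptions: the files below are small made-up copies of t1.txt and t2.txt (the value C for fam4 is invented), and temporary files are used instead of <(...) so it also runs in plain sh.

```shell
# Hypothetical copies of the ID list and the big file, for illustration only.
printf 'IID\n1\n10\n1098\n1099\n11\n' > t1.txt
printf 'FID IID some_more_columns\nfam1 1 A\nfam2 10 A\nfam3 1098 A\nfam4 256 C\nfam5 1099 G\n' > t2.txt

# Sort the data rows only (tail -n +2 skips the header).
# LC_ALL=C makes sort and join agree on the ordering.
tail -n +2 t1.txt | LC_ALL=C sort > t1.sorted
tail -n +2 t2.txt | LC_ALL=C sort -k2,2 > t2.sorted

# Print the header first, then the joined data rows.
joined=$( head -n 1 t2.txt
          LC_ALL=C join -1 1 -2 2 -o 2.1,2.2,2.3 t1.sorted t2.sorted )
printf '%s\n' "$joined"
```

The rows come out in sorted ID order (as strings), not necessarily in the order of the big file.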

Cheers,

Michael


Thanks Michael for that, I will keep this solution on my list for later.

6.4 years ago

output:

$  awk 'FNR==NR{a[$1]++;next}a[$2]' test1.txt test.txt 

FID IID
fam1    1
fam2    10
fam3    1098
fam5    1099

Input:

$ cat test1.txt 
IID
1
10
1098
1099
11
1130
12
121
127

$ cat test.txt 
FID IID
fam1    1
fam2    10
fam3    1098
fam4    256
fam5    1099