Question: Extract lines by matching a specific column against a pattern file
8 months ago, flogin (FioCruz/Brazil) wrote:

I have a file with two columns: the first column holds a read ID from a FASTQ file and the second holds an NCBI taxonomy ID (taxid), like this:

ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19451:10468  14230
ST-E00243:601:HWJFHCCXY:8:1101:19471:10468  10468
ST-E00243:601:HWJFHCCXY:8:1101:19492:10468  1512
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512
ST-E00243:601:HWJFHCCXY:8:1101:19532:10468  1421067

and I have a second file with the list of taxids for a specific taxon, like this:

2485233
2485231
2059665
2029516
2022430
2022429
1987726
1980986
1738445
1737346

I want to extract the lines whose taxid is present in the second file. However, some of the taxids also occur as numbers inside the read IDs. I tried to extract only the correct lines with fgrep, but fgrep cannot restrict the match to a single column (column 2 in this case).

Can anyone help me? In short, I need to extract whole lines by matching column 2 against a file of patterns.
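
To make the problem concrete, here is a minimal sketch using two of the sample reads and a hypothetical one-line pattern file containing 10468 (a number that also ends every read ID). The file names and the scratch directory are assumptions for illustration only. `grep -F -f` counts both lines because it matches anywhere in the line, while a comparison restricted to column 2 counts only one.

```shell
# Scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical pattern file with a single taxid.
printf '%s\n' 10468 > "$workdir/patterns.txt"

# Two tab-separated sample lines; only the second has 10468 in column 2.
printf '%s\t%s\n' \
    ST-E00243:601:HWJFHCCXY:8:1101:19431:10468 9606 \
    ST-E00243:601:HWJFHCCXY:8:1101:19471:10468 10468 \
    > "$workdir/reads.txt"

# Fixed-string search over the whole line: the read ID also matches.
grep_hits=$(grep -c -F -f "$workdir/patterns.txt" "$workdir/reads.txt")

# Comparison limited to column 2: only the real taxid matches.
col2_hits=$(awk -v id=10468 '$2 == id { n++ } END { print n + 0 }' "$workdir/reads.txt")

echo "grep -F hits: $grep_hits, column-2 hits: $col2_hits"
rm -r "$workdir"
```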

Thanks !


output:

$ join -1 1 -2 2 <(sort -k1 test1.txt) <(sort -k2 test.txt) -o 2.1,2.2
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468 512
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468 9606

input:

$ cat test1.txt 
2485233
2485231
2059665
2029516
512
2022430
2022429
9606
1987726
1980986
1738445
1737346

$ cat test.txt 
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19451:10468  14230
ST-E00243:601:HWJFHCCXY:8:1101:19471:10468  10468
ST-E00243:601:HWJFHCCXY:8:1101:19492:10468  1512
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512
ST-E00243:601:HWJFHCCXY:8:1101:19532:10468  1421067
Reply written 8 months ago by cpad0112
8 months ago, Kevin Blighe wrote:

Assuming tab-delimited files:

cat file1.txt 
2485233
2485231
2059665
2029516
512
2022430
2022429
9606
1987726
1980986
1738445
1737346

cat file2.txt 
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19451:10468  14230
ST-E00243:601:HWJFHCCXY:8:1101:19471:10468  10468
ST-E00243:601:HWJFHCCXY:8:1101:19492:10468  1512
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512
ST-E00243:601:HWJFHCCXY:8:1101:19532:10468  1421067


awk 'FNR==NR {a[$1]=$1; next} {if ($2 in a) print $0}' file1.txt file2.txt
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512

This worked, thanks a lot Kevin !!

Reply written 8 months ago by flogin

You are welcome. Goodnight / Buenas noches (if you want an explanation of what the command is doing, then let me know)

Reply written 8 months ago by Kevin Blighe
$ awk 'FNR==NR {a[$0]++; next} {if ($2 in a) print $0}' file1.txt file2.txt
Reply written 8 months ago by cpad0112
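
The awk one-liners in this thread are never explained, so here is a commented sketch of the two-file idiom they rely on. It is not taken from any of the posters: the file names (`taxids.txt`, `reads.txt`), the array name `wanted` and the scratch directory are assumptions for illustration, and the data is a subset of the samples above. The key point is that `$2 in wanted` tests whether the whole of column 2 is a stored key, so a taxid that merely appears inside a read ID (10468 here) is not selected.

```shell
# Scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical taxid list, one ID per line.
printf '%s\n' 2485233 512 9606 1737346 > "$workdir/taxids.txt"

# Tab-separated read ID / taxid pairs.
printf '%s\t%s\n' \
    ST-E00243:601:HWJFHCCXY:8:1101:19431:10468 9606 \
    ST-E00243:601:HWJFHCCXY:8:1101:19471:10468 10468 \
    ST-E00243:601:HWJFHCCXY:8:1101:19512:10468 512 \
    > "$workdir/reads.txt"

result=$(awk '
    # FNR == NR holds only while the first file is being read:
    # store each taxid as an array key, then skip to the next line.
    FNR == NR { wanted[$1]; next }
    # Second file: print the line when column 2 is exactly a stored key.
    $2 in wanted
' "$workdir/taxids.txt" "$workdir/reads.txt")

printf '%s\n' "$result"
rm -r "$workdir"
```

Lines come out in the order of the second file, and neither input needs to be sorted, unlike with `join`. One caveat: if the first file is empty, `FNR == NR` stays true while the second file is read, so nothing is printed.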