Question: Extract lines matching a specific column using a pattern file
flogin wrote, 7 days ago:

I have a file with two columns: the first column holds a read ID from a FASTQ file, and the second holds an NCBI taxonomy ID (taxid), like this:

ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19451:10468  14230
ST-E00243:601:HWJFHCCXY:8:1101:19471:10468  10468
ST-E00243:601:HWJFHCCXY:8:1101:19492:10468  1512
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512
ST-E00243:601:HWJFHCCXY:8:1101:19532:10468  1421067

I also have a second file with the list of taxids belonging to a specific taxon, like this:

2485233
2485231
2059665
2029516
2022430
2022429
1987726
1980986
1738445
1737346

I want to extract the lines whose taxid is present in the second file. However, some of the taxids also occur as numbers inside the read IDs. I tried to extract only the correct lines with fgrep, but fgrep cannot restrict the match to one column (here, column 2).

Can anyone help me? In short, I need to extract whole lines by matching column 2 against a file of patterns.

Thanks !
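
The false positives described above can be reproduced with a small sketch. It assumes a pattern list containing 10468, which is also the last field of every read ID in the sample. The file variables `patterns` and `reads` and the counters `whole_line` and `column_two` are illustrative names only, and the data is written to temporary files so the snippet runs on its own.

```shell
#!/bin/sh
# Illustrative sketch: sample data in temporary files.
patterns=$(mktemp)
reads=$(mktemp)
printf '%s\n' 10468 512 > "$patterns"
printf '%s\t%s\n' \
  ST-E00243:601:HWJFHCCXY:8:1101:19431:10468 9606 \
  ST-E00243:601:HWJFHCCXY:8:1101:19471:10468 10468 \
  ST-E00243:601:HWJFHCCXY:8:1101:19492:10468 1512 \
  ST-E00243:601:HWJFHCCXY:8:1101:19512:10468 512 > "$reads"

# grep -F (the same as fgrep) searches the whole line, so the read ID
# matches as well as the taxid column.
whole_line=$(grep -c -F -f "$patterns" "$reads")

# Comparing only column 2 against the pattern list counts the true hits.
column_two=$(awk -F'\t' '
  FNR == NR { wanted[$1]; next }
  $2 in wanted { n++ }
  END { print n + 0 }
' "$patterns" "$reads")

echo "grep -F matched $whole_line lines; column 2 matched $column_two lines"
rm -f "$patterns" "$reads"
```

Here every line is a whole-line hit because each read ID ends in 10468, while only the rows whose second column is 10468 or 512 are genuine matches.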


output:

$ join -1 1 -2 2 <(sort -k1 test1.txt) <(sort -k2 test.txt) -o 2.1,2.2
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468 512
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468 9606

input:

$ cat test1.txt 
2485233
2485231
2059665
2029516
512
2022430
2022429
9606
1987726
1980986
1738445
1737346

$ cat test.txt 
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19451:10468  14230
ST-E00243:601:HWJFHCCXY:8:1101:19471:10468  10468
ST-E00243:601:HWJFHCCXY:8:1101:19492:10468  1512
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512
ST-E00243:601:HWJFHCCXY:8:1101:19532:10468  1421067
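
A note on the join approach: join expects both inputs to be sorted on the join field under the same collation. The sketch below is a more explicit variant of the command above, not a replacement for it. It pins the locale with LC_ALL=C, limits the sort key to column 2, and uses temporary files instead of process substitution so that it also runs in plain sh. All variable names are illustrative.

```shell
#!/bin/sh
# Illustrative sketch: sample data in temporary files.
ids=$(mktemp)
reads=$(mktemp)
ids_sorted=$(mktemp)
reads_sorted=$(mktemp)
printf '%s\n' 2485233 512 9606 1737346 > "$ids"
printf '%s\t%s\n' \
  ST-E00243:601:HWJFHCCXY:8:1101:19431:10468 9606 \
  ST-E00243:601:HWJFHCCXY:8:1101:19471:10468 10468 \
  ST-E00243:601:HWJFHCCXY:8:1101:19492:10468 1512 \
  ST-E00243:601:HWJFHCCXY:8:1101:19512:10468 512 > "$reads"

# Sort both inputs on the field that join will compare, in the same locale.
LC_ALL=C sort "$ids" > "$ids_sorted"
LC_ALL=C sort -k2,2 "$reads" > "$reads_sorted"

# Join column 1 of the ID list with column 2 of the read table and
# print both columns of the read table (-o 2.1,2.2).
result=$(LC_ALL=C join -1 1 -2 2 -o 2.1,2.2 "$ids_sorted" "$reads_sorted")

printf '%s\n' "$result"
rm -f "$ids" "$reads" "$ids_sorted" "$reads_sorted"
```

The rows come out in sorted taxid order rather than in the original file order, and the output columns are separated by a single space.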
(comment written 7 days ago by cpad0112)
Kevin Blighe (Republic of Ireland) wrote, 7 days ago:

Assuming tab-delimited files:

cat file1.txt 
2485233
2485231
2059665
2029516
512
2022430
2022429
9606
1987726
1980986
1738445
1737346

cat file2.txt 
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19451:10468  14230
ST-E00243:601:HWJFHCCXY:8:1101:19471:10468  10468
ST-E00243:601:HWJFHCCXY:8:1101:19492:10468  1512
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512
ST-E00243:601:HWJFHCCXY:8:1101:19532:10468  1421067


awk 'FNR==NR {a[$1]=$1; next} {if ($2 in a) print $0}' file1.txt file2.txt
ST-E00243:601:HWJFHCCXY:8:1101:19431:10468  9606
ST-E00243:601:HWJFHCCXY:8:1101:19512:10468  512
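
For readers who want to see what the one-liner is doing, here is a commented sketch of the same two-file awk idiom. It writes a shortened copy of the sample data to temporary files so it can be run as-is. The names `ids`, `reads`, `wanted`, and `result` are illustrative only, and tab-delimited input is assumed, as above.

```shell
#!/bin/sh
# Illustrative sketch: recreate part of the sample input in temporary files.
ids=$(mktemp)
reads=$(mktemp)
printf '%s\n' 2485233 512 9606 1737346 > "$ids"
printf '%s\t%s\n' \
  ST-E00243:601:HWJFHCCXY:8:1101:19431:10468 9606 \
  ST-E00243:601:HWJFHCCXY:8:1101:19471:10468 10468 \
  ST-E00243:601:HWJFHCCXY:8:1101:19492:10468 1512 \
  ST-E00243:601:HWJFHCCXY:8:1101:19512:10468 512 > "$reads"

result=$(awk -F'\t' '
  # First file: FNR (line number within the current file) equals NR
  # (line number across all files), so store the taxid as an array key
  # and move on to the next line.
  FNR == NR { wanted[$1]; next }
  # Second file: print the line only if column 2 is one of the stored keys.
  $2 in wanted
' "$ids" "$reads")

printf '%s\n' "$result"
rm -f "$ids" "$reads"
```

Referencing `wanted[$1]` is enough to create the key, so no value needs to be assigned, and a pattern with no action prints the whole line. One caveat: if the ID file is empty, FNR == NR also holds for the second file, so nothing is printed.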

This worked, thanks a lot Kevin !!

(comment written 7 days ago by flogin)

You are welcome. Goodnight / Buenas noches (if you want an explanation of what the command is doing, then let me know)

(comment written 7 days ago by Kevin Blighe)

$ awk 'FNR==NR {a[$0]++; next} {if ($2 in a) print $0}' file1.txt file2.txt
(comment written 7 days ago, modified 6 days ago, by cpad0112)