Question

Unix manipulation of blast output: find and replace between two files

13 months ago

benjamin.pyenson • 0

Hi all,

I know there's a way to do this in Unix, but I cannot figure out how with the tools I know (grep, sed, awk, cut, paste). I am working with BLAST output, so I thought I would ask whether anyone in the bioinformatics community has run into this issue and has a better solution.

I want to take the values from column 2 of f2.txt (e.g. nAChRalpha3) and substitute them for the matching identifiers in column 1 of f1.txt, leaving identifiers without a match unchanged. The first 6 lines of each file are shown below.

f2.txt

Ccalc.v3.01697  nAChRalpha3     1.63e-04        52.8
Ccalc.v3.01745  mam     2.79e-04        52.8
Ccalc.v3.01914  HisCl1  2.05e-31        141
Ccalc.v3.01935  AdamTS-B        1.37e-04        54.7
Ccalc.v3.01861  dsf     7.55e-05        52.8
Ccalc.v3.01870  Cyp301a1        2.57e-05        54.7

f1.txt

Ccalc.v3.01697
Ccalc.v3.01698
Ccalc.v3.01699
Ccalc.v3.01700
Ccalc.v3.01701
Ccalc.v3.01702

Below is one attempt using awk, but it fails because I don't know how to look up values across two different files.

awk '{sub(/'{if $1 == f2.txt$1)}'/, f1.txt$2); print}' f1.txt > f3.txt

The intended output in this case should look like:

f3.txt:

nAChRalpha3
Ccalc.v3.01698
Ccalc.v3.01699
Ccalc.v3.01700
Ccalc.v3.01701
Ccalc.v3.01702

I am open to solutions. Thanks!

blastn • 1.3k views

ADD COMMENT • link updated 10 months ago by GenoMax 142k • written 13 months ago by benjamin.pyenson • 0

I saved the identifiers in a file called f1.txt and then the first set of data as f2.txt. I also changed a couple of identifiers in f2 so they matched ones in f1.

Here is one way (if I understand what you want):

$ more f1.txt 
Ccalc.v3.21177
Ccalc.v3.21598
Ccalc.v3.21599
Ccalc.v3.20672
Ccalc.v3.01542
Ccalc.v3.01545

$ grep -f f1.txt -w f2.txt | awk -F " " '{OFS="\t"}{print $1,$2}'
Ccalc.v3.21598  nAChRalpha3
Ccalc.v3.20672  AdamTS-B

ADD REPLY • link 13 months ago by GenoMax 142k

GenoMax This is somewhat helpful, as it gives me something to work with. I have modified my original query to include the intended output. Let me know if you (or anyone else) have a more specific solution.

ADD REPLY • link 13 months ago by benjamin.pyenson • 0

GenoMax (or anyone else reading this), thanks so much for your help. Unfortunately, I am now running into an issue with slightly differently labelled genes, and I need help again: how do I match "$pattern" against only the first column of f2.txt? Appending something like $1 to $pattern does not seem to work:

$ while read pattern; do if grep -q "$pattern$1" f2.txt; then grep "$pattern$1" f2.txt | awk -F " " '{OFS="\t"}{print $2}'; else echo "$pattern"; fi; done < f1.txt

In addition, matching the complete 'word' from f1.txt might also help, but the -w flag doesn't seem to work with while read.

Again, getting the results of this search in the order of f1 is critical. Thanks!

ADD REPLY • link updated 10 months ago by GenoMax 142k • written 10 months ago by benjamin.pyenson • 0
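
A note on why "$pattern$1" changes nothing: inside double quotes the shell expands $1 to its own first positional parameter, which is normally unset at an interactive prompt, so grep still receives just the ID. grep also matches anywhere on the line and treats "." as "any character". A minimal sketch of one possible fix follows, letting awk compare the ID against column 1 only. The sample rows and the names workdir, hit and ordered.txt are assumptions made for illustration, not taken from the poster's real files.

```shell
# Hypothetical sample data; Ccalc.v3.0169 is a deliberate "near miss" ID.
workdir=$(mktemp -d)
printf '%s\n' Ccalc.v3.0169 Ccalc.v3.01697 Ccalc.v3.01698 > "$workdir/f1.txt"
printf '%s\t%s\t%s\t%s\n' \
    Ccalc.v3.01697 nAChRalpha3 1.63e-04 52.8 \
    Ccalc.v3.01745 mam 2.79e-04 52.8 > "$workdir/f2.txt"

while IFS= read -r pattern; do
    # Inside awk, $1 really is column 1; the ID is passed in with -v and
    # compared for equality, so substrings and regex characters do not match.
    hit=$(awk -v id="$pattern" '$1 == id { print $2; exit }' "$workdir/f2.txt")
    # Fall back to the ID itself when f2.txt has no row for it.
    printf '%s\n' "${hit:-$pattern}"
done < "$workdir/f1.txt" > "$workdir/ordered.txt"

cat "$workdir/ordered.txt"
```

Because the loop reads f1.txt line by line, the output keeps the order of f1.txt. Ccalc.v3.0169 is printed unchanged here, whereas a plain grep "$pattern" would have matched it against the Ccalc.v3.01697 row.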

Please provide a few example lines from each of the two files.

ADD REPLY • link 10 months ago by GenoMax 142k

score 1 · Accepted Answer · 2023-04-17

13 months ago

GenoMax 142k

Using my example files from the comment above:

$ while read pattern; do if grep -q "$pattern" f2.txt; then grep "$pattern" f2.txt | awk -F " " '{OFS="\t"}{print $2}'; else echo "$pattern"; fi; done < f1.txt


Ccalc.v3.21177
nAChRalpha3
Ccalc.v3.21599
AdamTS-B
Ccalc.v3.01542
Ccalc.v3.01545

ADD COMMENT • link 13 months ago by GenoMax 142k
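
An alternative that is not part of the answer above: the same lookup can probably be done in a single awk call, which avoids running grep once per ID and matches column 1 exactly instead of anywhere on the line. This is a sketch under assumptions: the sample rows mirror the example files in this thread, f2.txt is assumed to be non-empty and whitespace-delimited, and workdir and name are illustrative names.

```shell
# Hypothetical sample data mirroring the example files in this thread.
workdir=$(mktemp -d)
printf '%s\n' Ccalc.v3.21177 Ccalc.v3.21598 Ccalc.v3.21599 \
              Ccalc.v3.20672 Ccalc.v3.01542 Ccalc.v3.01545 > "$workdir/f1.txt"
printf '%s\t%s\t%s\t%s\n' \
    Ccalc.v3.21598 nAChRalpha3 1.63e-04 52.8 \
    Ccalc.v3.01745 mam 2.79e-04 52.8 \
    Ccalc.v3.20672 AdamTS-B 1.37e-04 54.7 > "$workdir/f2.txt"

# First file (NR == FNR, i.e. f2.txt): store the first gene name seen per ID.
# Second file (f1.txt): print the stored name if the ID is known, else the ID.
awk 'NR == FNR { if (!($1 in name)) name[$1] = $2; next }
     { print (($1 in name) ? name[$1] : $1) }' \
    "$workdir/f2.txt" "$workdir/f1.txt" > "$workdir/f3.txt"

cat "$workdir/f3.txt"
```

With these sample files the result is the same six lines as the loop above, in f1.txt order. If an ID has several BLAST hits in f2.txt, this sketch keeps only the first one; whether that is the right choice depends on how the hits were sorted.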

GenoMax Thank you so much! This is perfect.

ADD REPLY • link 13 months ago by benjamin.pyenson • 0

Please consider accepting the answer (green check mark) to provide closure to this thread.

ADD REPLY • link 13 months ago by GenoMax 142k