Unix manipulation of blast output: find and replace between two files
13 months ago

Hi all,

I know there's a way to do this within Unix, but I cannot figure out how to do it with the functions that I know (grep, sed, awk, cut, paste). I am dealing with output from blast, so I thought I would try to see if anyone in the bioinformatics community has also run into this issue and might have a better solution.

I want to take the values from column 2 of f2.txt (e.g. nAChRalpha3) and use them to replace the matching identifiers in column 1 of f1.txt. The first 6 lines of each file are shown below.

f2.txt

Ccalc.v3.01697  nAChRalpha3     1.63e-04        52.8

Ccalc.v3.01745  mam     2.79e-04        52.8

Ccalc.v3.01914  HisCl1  2.05e-31        141

Ccalc.v3.01935  AdamTS-B        1.37e-04        54.7

Ccalc.v3.01861  dsf     7.55e-05        52.8

Ccalc.v3.01870  Cyp301a1        2.57e-05        54.7

f1.txt

Ccalc.v3.01697

Ccalc.v3.01698

Ccalc.v3.01699

Ccalc.v3.01700

Ccalc.v3.01701

Ccalc.v3.01702

Below is one attempt using awk, but it fails because I don't know how to perform this kind of lookup between lists in two different files.

awk '{sub(/'{if $1 == f2.txt$1)}'/, f1.txt$2); print}' f1.txt > f3.txt

The intended output in this case should look like:

f3.txt:

nAChRalpha3

Ccalc.v3.01698

Ccalc.v3.01699

Ccalc.v3.01700

Ccalc.v3.01701

Ccalc.v3.01702

I am open to solutions. Thanks!

blastn • 1.3k views

I saved the identifiers in a file called f1.txt and then the first set of data as f2.txt. I also changed a couple of identifiers in f2 so they matched ones in f1.

Here is one way (if I understand what you want):

$ more f1.txt 
Ccalc.v3.21177
Ccalc.v3.21598
Ccalc.v3.21599
Ccalc.v3.20672
Ccalc.v3.01542
Ccalc.v3.01545

$ grep -f f1.txt -w f2.txt | awk -F " " '{OFS="\t"}{print $1,$2}'
Ccalc.v3.21598  nAChRalpha3
Ccalc.v3.20672  AdamTS-B

GenoMax, this is somewhat helpful as it gives me something to work with. I have modified my original query to include the intended output. Let me know if you (or anyone else) have a more specific solution.


GenoMax (or anyone else reading this), thanks so much for your help. Unfortunately, I am now running into an issue with slightly differently labelled genes and need help again: how do I match '$pattern' against only the first column of f2? Appending something like $1 to $pattern does not seem to work:

$ while read pattern; do if grep -q "$pattern$1" f2.txt; then grep "$pattern$1" f2.txt | awk -F " " '{OFS="\t"}{print $2}'; else echo "$pattern"; fi; done < f1.txt

In addition, matching the complete 'word' from f1 might also help, but the -w flag doesn't seem to work with while read.

Again, getting the results of this search in the order of f1 is critical. Thanks!
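
A note on why "$pattern$1" does not restrict the match: inside double quotes the shell expands $1 to its own first positional parameter (usually empty at an interactive prompt), so grep still receives only the identifier and matches it anywhere on the line. One possible sketch, assuming whitespace-separated columns in f2.txt, keeps the loop but lets awk compare the first column exactly. The scratch directory and the sample rows (including the deliberately truncated identifier Ccalc.v3.0169) are made up for illustration.

```shell
# Hypothetical sample data in a scratch directory, so real files are untouched.
tmp=$(mktemp -d)
printf '%s\n' 'Ccalc.v3.01697' 'Ccalc.v3.0169' 'Ccalc.v3.01745' > "$tmp/f1.txt"
printf '%s\t%s\t%s\t%s\n' \
  'Ccalc.v3.01697' 'nAChRalpha3' '1.63e-04' '52.8' \
  'Ccalc.v3.01745' 'mam' '2.79e-04' '52.8' > "$tmp/f2.txt"

# For each identifier in f1.txt (in f1 order), print column 2 of the f2.txt
# line whose column 1 equals it exactly; otherwise print the identifier itself.
while IFS= read -r pattern; do
  name=$(awk -v id="$pattern" '$1 == id { print $2; exit }' "$tmp/f2.txt")
  if [ -n "$name" ]; then
    printf '%s\n' "$name"
  else
    printf '%s\n' "$pattern"
  fi
done < "$tmp/f1.txt" > "$tmp/f3.txt"

cat "$tmp/f3.txt"
```

Because the comparison is string equality on $1, the truncated identifier Ccalc.v3.0169 is passed through unchanged rather than picking up the gene name of Ccalc.v3.01697, and the dots are not treated as regular-expression wildcards.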


Please provide a few example lines from the two files.

13 months ago
GenoMax 142k

Using my example files from the comment above:

$ while read pattern; do if grep -q "$pattern" f2.txt; then grep "$pattern" f2.txt | awk -F " " '{OFS="\t"}{print $2}'; else echo "$pattern"; fi; done < f1.txt


Ccalc.v3.21177
nAChRalpha3
Ccalc.v3.21599
AdamTS-B
Ccalc.v3.01542
Ccalc.v3.01545
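
The loop above rescans f2.txt for every identifier and treats each identifier as a regular expression that may match anywhere on the line. As an alternative sketch (not part of the original answer), awk's two-file idiom can load f2.txt into a lookup table once and then walk f1.txt in its own order, comparing the first column exactly. The scratch directory and sample rows are assumptions for illustration, and the sketch assumes f2.txt is non-empty and whitespace-separated.

```shell
# Hypothetical sample data in a scratch directory, so real files are untouched.
tmp2=$(mktemp -d)
printf '%s\n' 'Ccalc.v3.21177' 'Ccalc.v3.21598' 'Ccalc.v3.21599' \
  'Ccalc.v3.20672' > "$tmp2/f1.txt"
printf '%s\t%s\t%s\t%s\n' \
  'Ccalc.v3.21598' 'nAChRalpha3' '1.63e-04' '52.8' \
  'Ccalc.v3.20672' 'AdamTS-B' '1.37e-04' '54.7' > "$tmp2/f2.txt"

# First file (f2.txt): remember column 2, keyed by column 1.
# Second file (f1.txt): print the stored name if the ID is known, else the ID.
awk 'NR == FNR { name[$1] = $2; next }
     { print (($1 in name) ? name[$1] : $1) }' \
  "$tmp2/f2.txt" "$tmp2/f1.txt" > "$tmp2/f3.txt"

cat "$tmp2/f3.txt"
```

The condition NR == FNR is true only while awk is reading the first file named on the command line, which is why f2.txt is listed before f1.txt. If an identifier occurs more than once in f2.txt, this sketch keeps the last gene name seen.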

GenoMax, thank you so much! This is perfect.


Please consider accepting the answer (green check mark) to provide closure to this thread.
