Question: sequence extract problem
mxlsherry1992 wrote (5 months ago):

Dear all, I have some IDs in file 1, and I want to extract the corresponding lines from file 2, but the IDs in the two files do not match completely. Does anyone know a command line I could use for that?

I have two command lines here, but neither seems to work.

grep -Fwf file1.txt file2.txt > results

awk 'NR==FNR{x[$0];next}{for(i in x)if($0~i)print}' file1.txt file2.txt

Here are the IDs from file 1:

TRINITY_DN100263_c0_g1_i13
TRINITY_DN100263_c1_g1_i1
TRINITY_DN100330_c0_g1_i1
TRINITY_DN100330_c0_g2_i14
TRINITY_DN100529_c0_g1_i3
TRINITY_DN100620_c0_g1_i2

Here is file 2:

TRINITY_DN132010_c5_g4  0   0   0   0   0.18    0.93    0.67    0.61    0   0.45    00.25   0   0
TRINITY_DN100263_c1_g1  0.08    0.06    0.06    0.09    0.1 0.07    0.43    0.2 0.16    0.36    0.06    0.42    0   0
TRINITY_DN50647_c0_g1   0   0   0   0.9 0   0   0   0   0   0   00
TRINITY_DN100330_c0_g2  0   0   0   0   0   0   0   0   0   0   01.06   0   0
TRINITY_DN137407_c4_g1  0   0   0.19    0   0   0   0.17    0.15    0   0.12    0.
JC (Mexico) wrote (5 months ago):

You need to remove the non-matching part of the IDs in the first file (the trailing `_i<number>` suffix) before doing your search, for example:

perl -pe 's/_i\d+$//' < file1.txt > file1_mod

Then you can search file 2 with the modified IDs using grep, awk or perl.
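
The two steps above (strip the suffix, then search) could be combined roughly as follows. This is only a sketch: the function name `extract_by_gene_id` is made up for illustration, and it assumes that the ID is the first whitespace-separated field of each line in file 2 and that the only difference between the two files is a trailing `_i<number>` on the IDs in file 1.

```shell
# Hypothetical helper (not from the thread): print the lines of a table
# whose first field matches an ID from the list once the trailing
# _i<number> suffix has been removed from the list.
extract_by_gene_id() {
    ids_file=$1
    table_file=$2
    # Strip the suffix and drop duplicate IDs, then let awk keep the
    # table lines whose first field is in that set. "-" makes awk read
    # the stripped IDs from the pipe as its first input.
    sed 's/_i[0-9][0-9]*$//' "$ids_file" | sort -u |
        awk 'NR == FNR { ids[$1]; next } $1 in ids' - "$table_file"
}
```

It would be called as `extract_by_gene_id file1.txt file2.txt > results`. Comparing the first field exactly, rather than matching the ID anywhere in the line, avoids hits where one ID is merely a prefix of another (for example `..._g1` inside `..._g12`).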

Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote (5 months ago):

join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.txt) <(sort -t $'\t' -k1,1 file2.txt) > results
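
Note that `join` only pairs lines whose keys are identical, so with the sample data from the question the `_i<number>` suffix would presumably have to be removed from file1.txt first; the answer does not say this, so treat it as an assumption. A sketch of that variant, using a hypothetical function name and temporary files instead of process substitution so that it also runs in a plain POSIX shell:

```shell
# Hypothetical helper (not from the thread): join a list of IDs against
# a tab-separated table on the first column, after removing the trailing
# _i<number> suffix from the IDs.
join_by_gene_id() {
    ids_file=$1
    table_file=$2
    tab=$(printf '\t')
    stripped=$(mktemp)
    sorted=$(mktemp)
    # join needs both inputs sorted on the key in the same collation order,
    # so LC_ALL=C is set for sort and join alike.
    sed 's/_i[0-9][0-9]*$//' "$ids_file" | LC_ALL=C sort -u > "$stripped"
    LC_ALL=C sort -t "$tab" -k1,1 "$table_file" > "$sorted"
    LC_ALL=C join -t "$tab" -1 1 -2 1 "$stripped" "$sorted"
    rm -f "$stripped" "$sorted"
}
```

It would be called as `join_by_gene_id file1.txt file2.txt > results`. Unlike a grep or awk filter, the output comes back sorted by ID rather than in the original order of file 2.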