Question: sequence extract problem
11 months ago by
mxlsherry199230 wrote:

Dear all, I have some IDs in file 1, and I want to extract the corresponding lines from file 2, but the IDs in the two files do not match completely. Does anyone know a command line I could use for that?

I have two command lines here, but they don't seem to work.

grep -Fwf file1.txt file2.txt > results

awk 'NR==FNR{x[$0];next}{for(i in x)if($0~i)print}' file1.txt file2.txt

Here are the IDs from file 1:

TRINITY_DN100263_c0_g1_i13
TRINITY_DN100263_c1_g1_i1
TRINITY_DN100330_c0_g1_i1
TRINITY_DN100330_c0_g2_i14
TRINITY_DN100529_c0_g1_i3
TRINITY_DN100620_c0_g1_i2

Here is file 2:

TRINITY_DN132010_c5_g4  0   0   0   0   0.18    0.93    0.67    0.61    0   0.45    00.25   0   0
TRINITY_DN100263_c1_g1  0.08    0.06    0.06    0.09    0.1 0.07    0.43    0.2 0.16    0.36    0.06    0.42    0   0
TRINITY_DN50647_c0_g1   0   0   0   0.9 0   0   0   0   0   0   00
TRINITY_DN100330_c0_g2  0   0   0   0   0   0   0   0   0   0   01.06   0   0
TRINITY_DN137407_c4_g1  0   0   0.19    0   0   0   0.17    0.15    0   0.12    0.
rna-seq • 229 views
11 months ago by
JC11k
Mexico
JC11k wrote:

You need to remove the non-matching part of the IDs in the first file (the trailing _i<number> isoform suffix) before doing your search, for example:

perl -pe 's/_i\d+$//' file1.txt > file1_mod

Then you can search with grep, awk, or perl.
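As a rough sketch of how the search step could look once the suffix is stripped: the block below rebuilds a small sample of the two files from the question (only the first two value columns of file 2 are kept, and a temporary directory is used so nothing is overwritten), then tries both a grep and an awk lookup. The output names results_grep.txt and results_awk.txt are made up for illustration, and the awk variant assumes file 2 is tab-separated with the ID in the first column.

```shell
# Work in a scratch directory so real files are not overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A few IDs from file 1 in the question.
printf '%s\n' \
    TRINITY_DN100263_c0_g1_i13 \
    TRINITY_DN100263_c1_g1_i1 \
    TRINITY_DN100330_c0_g1_i1 \
    TRINITY_DN100330_c0_g2_i14 > file1.txt

# A few rows from file 2, truncated to the ID plus the first two value columns.
printf '%s\t%s\t%s\n' \
    TRINITY_DN132010_c5_g4 0 0 \
    TRINITY_DN100263_c1_g1 0.08 0.06 \
    TRINITY_DN50647_c0_g1 0 0 \
    TRINITY_DN100330_c0_g2 0 0 > file2.txt

# Strip the trailing _i<number> suffix and drop duplicate gene IDs.
sed 's/_i[0-9][0-9]*$//' file1.txt | sort -u > file1_mod

# Option 1: fixed-string, whole-word match of each gene ID anywhere in a line.
grep -Fwf file1_mod file2.txt > results_grep.txt

# Option 2: load the gene IDs into an awk array, then keep the rows of
# file 2 whose first column is exactly one of those IDs.
awk -F'\t' 'NR==FNR { ids[$1]; next } $1 in ids' file1_mod file2.txt > results_awk.txt

cat results_grep.txt
```

The awk version only compares the first column, so it is less likely to pick up an ID that happens to appear elsewhere in a line; the grep version is shorter but matches anywhere in the line.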

11 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum130k wrote:
join -t $'\t' -1 1 -2 1 <(sort -t $'\t' -k1,1 file1.txt) <(sort -t $'\t' -k1,1 file2.txt) > results
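Note that join only pairs lines whose key fields are identical, so the IDs from file 1 would still need their _i<number> suffix removed first. A possible variant, assuming tab-separated input and using a small sample rebuilt from the question (only the first two value columns of file 2 are kept; the intermediate names file1.sorted and file2.sorted are made up for illustration):

```shell
# Work in a scratch directory so real files are not overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
tab=$(printf '\t')

# A few IDs from file 1 in the question.
printf '%s\n' \
    TRINITY_DN100263_c0_g1_i13 \
    TRINITY_DN100263_c1_g1_i1 \
    TRINITY_DN100330_c0_g1_i1 \
    TRINITY_DN100330_c0_g2_i14 > file1.txt

# A few rows from file 2, truncated to the ID plus the first two value columns.
printf '%s\t%s\t%s\n' \
    TRINITY_DN132010_c5_g4 0 0 \
    TRINITY_DN100263_c1_g1 0.08 0.06 \
    TRINITY_DN50647_c0_g1 0 0 \
    TRINITY_DN100330_c0_g2 0 0 > file2.txt

# Strip the isoform suffix, then sort and de-duplicate the gene IDs.
# LC_ALL=C keeps the sort order consistent with what join expects.
sed 's/_i[0-9][0-9]*$//' file1.txt | LC_ALL=C sort -u > file1.sorted

# Sort file 2 on its first (ID) column with the same collation.
LC_ALL=C sort -t "$tab" -k1,1 file2.txt > file2.sorted

# Keep the rows of file 2 whose ID also appears in the stripped ID list.
LC_ALL=C join -t "$tab" -1 1 -2 1 file1.sorted file2.sorted > results

cat results
```

The output comes back in sorted ID order rather than in the original order of file 2.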