Question: Grep A Pattern From File
federico.gaiti70 (Brisbane) wrote, 5.1 years ago:

I am trying to use grep to pull out eXpress values.

I ran eXpress and have the resulting tab-separated .xprs file, which looks like this:

bundle_id   target_id   length  eff_length  tot_counts  uniq_counts est_counts  eff_counts  ambig_distr_alpha   ambig_distr_beta    fpkm    fpkm_conf_low   fpkm_conf_high  solvable
1   Contig14365 310 106.787904  85  85  85.000000   246.750792  0.000000e+00    0.000000e+00    147.370523  147.370523  147.370523  T
2   Singlet_45262   346 232.432874  109 37  89.933541   133.875234  1.998601e+00    7.198885e-01    71.637085   51.273440   92.000730   T
2   Singlet_68764   236 119.092916  74  2   21.066459   41.746263   6.254955e+00    1.736541e+01    32.750608   0.142967    65.358248   T
3   Contig1270  736 500.694431  50  0   0.125252    0.184116    1.000000e+00    1.000000e+00    0.046316    0.000000    0.759071    F
3   Contig1271  851 628.717767  57  9   43.657462   59.092492   4.701649e-01    1.810055e-01    12.856315   4.051524    21.661106   T
3   Singlet_69558   790 555.880836  50  0   15.217286   21.626318   1.000000e+00    1.000000e+00    5.068381    0.000000    12.670313   F

I want to get the eXpress values for non-coding RNAs (ncRNAs) only, so I thought I could use:

grep -f <list of ncRNAs contigs> <express file>

I made a file of ncRNA contig IDs, one per line, which looks like this:

Singlet_51268
Singlet_63946
Singlet_70630
Singlet_72272
Singlet_60543
Contig11105
Singlet_18043
Singlet_64779
Singlet_50335
Singlet_39678
Singlet_21655
Singlet_5438
Singlet_6400
Contig4197
Singlet_17193
Singlet_55710
Singlet_70948
Singlet_25172
Singlet_65515
Singlet_30239
Singlet_54617
Singlet_11188
Contig14540

Since I have 577 ncRNAs, I expected to end up with a .xprs file of 577 rows, but I ended up with 701.

So 124 of the returned contigs do not correspond to my ncRNAs.
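A tiny toy reproduction may help show where extra rows like these come from. The file names, IDs and the three-column file below are made up for illustration (they are not the real data). The last command is one way, using cut, sort and comm, to list the target IDs that grep returned but that are not in the ID list; it needs bash because of the <(...) process substitution.

```shell
#!/usr/bin/env bash
# Toy reproduction with hypothetical file names and IDs (not the real data).
set -euo pipefail

dir=$(mktemp -d)

# Minimal stand-in for a tab-separated .xprs file (only 3 of its columns).
printf 'bundle_id\ttarget_id\tlength\n' > "$dir/results.xprs"
printf '1\tSinglet_5438\t310\n2\tSinglet_54380\t346\n' >> "$dir/results.xprs"
printf '3\tContig1270\t736\n3\tContig12700\t851\n' >> "$dir/results.xprs"

# ID list: two IDs, one per line.
printf 'Singlet_5438\nContig1270\n' > "$dir/ids.txt"

# Each ID is used as a regex that may match anywhere in a line, so
# Singlet_5438 also hits Singlet_54380 and Contig1270 also hits Contig12700.
grep -f "$dir/ids.txt" "$dir/results.xprs" > "$dir/plain.out"
wc -l < "$dir/plain.out"   # 4 matching rows for only 2 IDs

# Diagnostic: target_ids that were returned but are not in the ID list.
cut -f2 "$dir/plain.out" | sort -u | comm -23 - <(sort -u "$dir/ids.txt")
```

On the toy data the diagnostic prints Contig12700 and Singlet_54380, the two longer IDs that merely start with a listed ID.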

How can I pull out only the ncRNA-specific values? I tried playing around with grep but couldn't fix it.

Any suggestions?

Thanks

rbagnall (Australia) answered, 5.1 years ago:

I think you need to add -w, so that grep only matches whole words.

Without it, the pattern Singlet_51268 will also match Singlet_512681, Singlet_512682, Singlet_512683, etc.

Try:

grep -w -f <list of ncRNAs contigs> <express file>
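Applied to the same kind of made-up toy data (hypothetical names, not the real files), -w drops the longer IDs and keeps only the listed ones. This sketch assumes GNU grep, where letters, digits and the underscore count as word characters, so "Singlet_5438" followed by "0" is not a whole-word match.

```shell
#!/usr/bin/env bash
# Hypothetical toy data, repeated here so the block runs on its own.
set -euo pipefail

dir=$(mktemp -d)
printf 'bundle_id\ttarget_id\tlength\n' > "$dir/results.xprs"
printf '1\tSinglet_5438\t310\n2\tSinglet_54380\t346\n' >> "$dir/results.xprs"
printf '3\tContig1270\t736\n3\tContig12700\t851\n' >> "$dir/results.xprs"
printf 'Singlet_5438\nContig1270\n' > "$dir/ids.txt"

# -w: a match must be a whole word. Letters, digits and underscore are word
# characters, so Singlet_54380 and Contig12700 are no longer matched.
grep -w -f "$dir/ids.txt" "$dir/results.xprs" > "$dir/word.out"
cut -f2 "$dir/word.out"   # Singlet_5438, then Contig1270
```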


It worked perfectly. Thanks for the right and fast answer!

(reply by federico.gaiti70, 5.1 years ago)

Please use grep -wFf <list>. With -F, grep treats each line of the list as a fixed string instead of a regular expression, which is much faster when the list is long.

(reply by lh3, 5.1 years ago)
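A sketch of the -wFf version on hypothetical toy data, followed by an awk alternative that is not from this thread: instead of searching whole lines, it compares only the second (target_id) column against the ID list, so an ID cannot accidentally match in some other column, and it keeps the header row. The awk command assumes the ID file has one ID per line with no trailing spaces or Windows line endings.

```shell
#!/usr/bin/env bash
# Hypothetical toy data again, so the block runs on its own.
set -euo pipefail

dir=$(mktemp -d)
printf 'bundle_id\ttarget_id\tlength\n' > "$dir/results.xprs"
printf '1\tSinglet_5438\t310\n2\tSinglet_54380\t346\n' >> "$dir/results.xprs"
printf '3\tContig1270\t736\n3\tContig12700\t851\n' >> "$dir/results.xprs"
printf 'Singlet_5438\nContig1270\n' > "$dir/ids.txt"

# -F: patterns are fixed strings, not regexes; -w: whole words only;
# -f: read the patterns from a file.
grep -wFf "$dir/ids.txt" "$dir/results.xprs" > "$dir/fixed.out"

# Alternative (not from the thread): exact match on column 2 (target_id) only.
# The first file fills a lookup table of IDs; for the second file, keep the
# header line (FNR == 1) and every row whose target_id is in the table.
awk -F'\t' 'NR == FNR { want[$1]; next } FNR == 1 || ($2 in want)' \
    "$dir/ids.txt" "$dir/results.xprs" > "$dir/awk.out"

cat "$dir/fixed.out" "$dir/awk.out"
```

The awk version also runs in a single pass with a hash lookup per row, so it stays fast with long ID lists.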