Question: compare two columns of two files
0
gravatar for Sam
4 weeks ago by
Sam100
Sam100 wrote:

Dear All

I have two files of gene list in tab format, I want to compare 2nd column if 1st file file with 1st column of 2nd file and report all columns in both files for each match

any help ?

Thanks

file 1 
1st column        2nd column
chr10_17026 VIT_202s0033g00190
chr18_37402 VIT_210s0003g03400
chr19_39195 VIT_200s0313g00080


files 2
1st column                                                  2nd column 3rd column
VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22
VIT_202s0033g00190  8.48        6.49
VIT_210s0003g03400  8.18        6.05
VIT_210s0116g01020  8.67        5.82
VIT_213s0064g01330  9.02        5.84

output :
chr10_17026 VIT_202s0033g00190     8.48        6.49
chr18_37402 VIT_210s0003g03400     8.18        6.05
chr19_39195 VIT_200s0313g00080     7.86         6.22
bash awk • 175 views
ADD COMMENTlink modified 4 weeks ago by brett.vanderwerff100 • written 4 weeks ago by Sam100

Hello Sam ,

what have you tried so far?

fin swimmer

ADD REPLYlink written 4 weeks ago by finswimmer11k
1

another tip is to use sort and join commands.

ADD REPLYlink written 4 weeks ago by husensofteng70

One tip, use R. Import the files into R and use subsetting. There are many tutorials, for example here, so try to do it yourself.

ADD REPLYlink written 4 weeks ago by Benn6.8k
2
gravatar for Kevin Blighe
4 weeks ago by
Kevin Blighe42k
Kevin Blighe42k wrote:

The most simple way to do it with awk is like this:

awk 'FNR==NR {lookup[$2]; next} {if ($1 in lookup) print}' file1.tsv file2.tsv 
VIT_202s0033g00190  8.48    6.49
VIT_210s0003g03400  8.18    6.05

You may want to tidy up this type of line, though, as its formatting is inconsistent:

VIT_200s0313g00072 VIT_200s0313g00080 VIT_200s0313g00066    7.86            6.22
ADD COMMENTlink written 4 weeks ago by Kevin Blighe42k
1
gravatar for bioguy24
4 weeks ago by
bioguy24190
Chicago
bioguy24190 wrote:

Another awk:

awk 'NR==FNR{c[$2]++;next};c[$1] > 0' file1 file2
ADD COMMENTlink written 4 weeks ago by bioguy24190
1
gravatar for brett.vanderwerff
4 weeks ago by
brett.vanderwerff100 wrote:

If you know python you can look at the pandas package. You probably want to perform an inner merge. Learning SQL-like processes can take a long time to get used to, but after a while become second nature. It might not help you now depending on your situation, but you eventually might want to look into pandas. It is my daily driver for working with tab separated data like this. It is incredibly powerful

https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by brett.vanderwerff100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 794 users visited in the last hour