Question: How to extract gene IDs from a tabular file based on a list of locus tags?
Asked 5 months ago by majeedaasim (United States):

I have a file like:

GeneID Locus tag Protein name
839580 AT1G01010 NAC domain containing protein 1
839569 AT1G01020 ARV1 family protein
839569 AT1G01020 ARV1 family protein
839569 AT1G01020 ARV1 family protein

I also have a list of locus tags (around 5000) whose entries I want to extract from the full file.

For example, if I want to extract the gene ID and protein name for

AT1G01010
AT1G01020

I should get

839580 NAC domain containing protein 1
839569 ARV1 family protein

Tags: R
Answer by Prakash (India), 5 months ago:

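If both the locus-tag list and the annotation table are loaded as data frames, you can simply use the merge function. Here file1 and file2 are the two data frames, both with a column named "Locus"; all.x = TRUE keeps every row of file1 even when it has no match in file2.

merge(file1, file2, by.x = "Locus", by.y = "Locus", all.x = TRUE)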
Answer by Alex Reynolds (Seattle, WA, USA), 5 months ago:

One approach using Unix tools:

$ grep -Fwf locusTags.txt geneAnnotations.txt > answer.txt
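
The grep command above returns whole matching lines, so the output still contains the locus tag column and any repeated rows. If only the gene ID and protein name are wanted, as in the question, a minimal awk sketch could look like the following. It assumes the annotation file is tab-separated (the post does not say which delimiter is used) and reuses the file names from the grep command; the sample inputs are just the rows shown in the question.

```shell
# Build small sample inputs.
# Assumption: the annotation file is tab-separated; adjust -F below if not.
printf 'GeneID\tLocus tag\tProtein name\n' > geneAnnotations.txt
printf '839580\tAT1G01010\tNAC domain containing protein 1\n' >> geneAnnotations.txt
printf '839569\tAT1G01020\tARV1 family protein\n' >> geneAnnotations.txt
printf '839569\tAT1G01020\tARV1 family protein\n' >> geneAnnotations.txt
printf 'AT1G01010\nAT1G01020\n' > locusTags.txt

# First file (NR == FNR): remember every locus tag from the list.
# Second file: for rows whose second column is a wanted tag, print the
# gene ID and protein name, skipping rows that were already printed.
awk -F'\t' '
    NR == FNR { wanted[$1]; next }
    ($2 in wanted) && !seen[$0]++ { print $1, $3 }
' locusTags.txt geneAnnotations.txt > answer.txt

cat answer.txt
```

Because the match is an exact comparison on the second column, a tag such as AT1G01010 cannot accidentally match text in the protein name column. The output columns are separated by a single space; setting OFS to a tab in awk would make the result easier to parse downstream.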