Question: How to extract gene IDs from a tabular file based on a list of locus tags?
majeedaasim (United States) wrote, 10 months ago:

I have a file like:

GeneID Locus tag Protein name
839580 AT1G01010 NAC domain containing protein 1
839569 AT1G01020 ARV1 family protein
839569 AT1G01020 ARV1 family protein
839569 AT1G01020 ARV1 family protein

I also have a list of locus tags (around 5000) whose entries I want to extract from the full file.

For example, if I want to extract the gene ID and protein name for

AT1G01010

AT1G01020

I should get

839580 NAC domain containing protein 1

839569 ARV1 family protein

Prakash (India) wrote, 10 months ago:

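If the data are in data frames, you can simply use the merge function. This assumes both data frames share a column named "Locus"; all.x = TRUE keeps every row of file1 and fills unmatched columns with NA.

merge(file1, file2, by = "Locus", all.x = TRUE)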

Alex Reynolds (Seattle, WA USA) wrote, 10 months ago:

One approach using Unix tools:

$ grep -Fwf locusTags.txt geneAnnotations.txt > answer.txt
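
Note that grep prints each matching line in full, so the locus tag column and any repeated rows stay in the output. If only the gene ID and protein name are wanted, one row per gene, something like the following awk sketch may help. It assumes the annotation file is tab-separated (the question does not say) and reuses the file names from the grep command; the AT9G99999 row is made up here only to show that unlisted tags are dropped.

```shell
# Rebuild the question's example as tab-separated files (the delimiter is an assumption).
printf '%s\t%s\t%s\n' \
    'GeneID' 'Locus tag' 'Protein name' \
    '839580' 'AT1G01010' 'NAC domain containing protein 1' \
    '839569' 'AT1G01020' 'ARV1 family protein' \
    '839569' 'AT1G01020' 'ARV1 family protein' \
    '839569' 'AT1G01020' 'ARV1 family protein' \
    '000000' 'AT9G99999' 'made-up row that is not in the tag list' \
    > geneAnnotations.txt
printf '%s\n' 'AT1G01010' 'AT1G01020' > locusTags.txt

# Pass 1 (locusTags.txt): remember every wanted locus tag.
# Pass 2 (geneAnnotations.txt): keep rows whose 2nd column is a wanted tag,
# skip exact duplicate rows, and print only GeneID and protein name.
awk -F'\t' '
    NR == FNR { wanted[$1]; next }
    ($2 in wanted) && !seen[$0]++ { print $1 "\t" $3 }
' locusTags.txt geneAnnotations.txt > answer.txt

cat answer.txt
```

Unlike grep -w, this compares the tag against the second column only, so a tag that happens to appear inside a protein name would not cause a match.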