Question: Script for changing genes nomenclature
5 weeks ago, Francisco Muñoz wrote:

Hi everyone! I have performed a differential expression analysis of some RNA-seq data. Now I want to use Bedtools to measure the distance between these sequences and a large set of target genes (~10,000). I have the target genes' names in a single-column CSV file, but they are written in an old nomenclature. I also have a two-column CSV file listing every gene name of the organism alongside its updated counterpart. How can I use this second file to produce a new file with the target genes in the new nomenclature? I thought about writing a bash script, but that seemed too inefficient. Is there perhaps an R package that could help? Thanks in advance.

rna-seq scripting bedtools

Show us an example of what the two files look like. You may be able to use the comm command to identify rows that are common between the two files (sorted on column 1).

Written 5 weeks ago by genomax

First file:

gene00041
gene00094
gene00127
gene00130
gene00140
gene00142
gene00150
gene00154
gene00156
gene00168
gene00207
gene00215
gene00216
gene00226
gene00231
...

Second file:

Gene old,Gene 4.0
gene00502,FvH4_2g22400
gene10171,FvH4_1g00010
gene10170,FvH4_1g00020
gene10169,FvH4_1g00030
gene10168,FvH4_1g00040
gene10167,FvH4_1g00050
gene10166,FvH4_1g00060
gene10165,FvH4_1g00070
gene10164,FvH4_1g00080
gene10163,FvH4_1g00090
...

I'd like to export the names from the right column for every gene whose old name appears both in the left column and in the first file. I will try comm. Thank you!
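For what it's worth, a minimal sketch of how the comm idea could look on files shaped like the two above. The sample rows and the output file names (targets.sorted, old_ids.sorted, common_ids.txt, missing_ids.txt) are made up for illustration, and gene99999 is a hypothetical ID with no counterpart. Note that comm only reports which old names are shared; it does not pull in the right-hand column by itself.

```shell
# Hypothetical sample inputs, shaped like the two files in the question.
printf '%s\n' gene10170 gene00502 gene99999 > file1.txt
printf '%s\n' 'Gene old,Gene 4.0' 'gene00502,FvH4_2g22400' \
    'gene10171,FvH4_1g00010' 'gene10170,FvH4_1g00020' > file2.txt

# comm compares whole lines, so reduce file2.txt to its first column
# (dropping the header row) and sort both lists the same way first.
sort file1.txt > targets.sorted
tail -n +2 file2.txt | cut -d ',' -f 1 | sort > old_ids.sorted

# -12 keeps lines present in both lists; -23 keeps lines only in the first.
comm -12 targets.sorted old_ids.sorted > common_ids.txt
comm -23 targets.sorted old_ids.sorted > missing_ids.txt
```

With this sample, common_ids.txt lists the target genes that have an entry in the mapping table, and missing_ids.txt lists the ones that do not.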

Written 5 weeks ago by Francisco Muñoz (modified 5 weeks ago)
5 weeks ago, vkkodali (United States) wrote:
You can use the Linux join command for this, as shown below. Here, file1.txt is the first file, with just the gene### identifiers; file2.txt is the second file, with the comma-separated identifiers; and the output is written to file3.txt as comma-separated values. I am assuming that the gene identifiers are unique within each file.

# Requires bash (process substitution); join expects both inputs sorted on the join field.
join -t ',' -1 1 -2 1 <(sort file1.txt) <(sort -t ',' -k1,1 file2.txt) > file3.txt
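
If the original order of file1.txt matters, or you want a list of genes that have no updated name, a lookup in awk is one possible alternative to join. This is only a sketch under a few assumptions: the sample rows are made up for illustration (gene99999 is a hypothetical ID with no counterpart), unmapped.txt is a name I chose, file2.txt has a single header row, and neither file has Windows line endings.

```shell
# Hypothetical sample inputs, shaped like the two files in the question.
printf '%s\n' gene10170 gene00502 gene99999 > file1.txt
printf '%s\n' 'Gene old,Gene 4.0' 'gene00502,FvH4_2g22400' \
    'gene10171,FvH4_1g00010' 'gene10170,FvH4_1g00020' > file2.txt

# First pass (NR == FNR) loads the old -> new table from file2.txt;
# second pass walks file1.txt in its original order, no sorting needed.
awk -F ',' '
    NR == FNR { if (FNR > 1) new_id[$1] = $2; next }  # skip the header row
    $1 in new_id { print $1 "," new_id[$1]; next }    # mapped: old,new
    { print $1 > "unmapped.txt" }                     # no counterpart found
' file2.txt file1.txt > file3.txt
```

Here file3.txt holds the old,new pairs in the same order as file1.txt, and unmapped.txt collects any identifiers that are missing from the mapping table.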

It worked. Thanks a lot!

Written 5 weeks ago by Francisco Muñoz

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.

Written 5 weeks ago by genomax

Done! Thanks for the advice.

Written 4 weeks ago by Francisco Muñoz