Question

Script for changing gene nomenclature

6.6 years ago

Francisco Muñoz ▴ 10

Hi everyone! I have performed a differential expression analysis of some RNA-seq data. Now I want to use Bedtools to measure the distance between these sequences and a large set of target genes (~10000). I have these genes' names in a CSV file (just one column), but they are written in an old nomenclature. I also have a CSV file with two columns: all of the organism's gene names and their updated counterparts. The question is: how do I use this file to obtain a new file with the target genes in the new nomenclature? I thought about writing a bash script, but it seemed too inefficient. Is there perhaps an R package that could help? Thanks in advance.

scripting RNA-Seq bedtools • 1.9k views

ADD COMMENT • link updated 6.6 years ago by vkkodali_ncbi ★ 3.8k • written 6.6 years ago by Francisco Muñoz ▴ 10

Show us an example of what the two files look like. You may be able to use the comm command to identify rows that are common between the two files (sorted on column 1).
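A minimal sketch of that comm idea, for illustration only. It assumes the target list is in file1.txt and the two-column mapping (with a header row) is in file2.txt; the scratch directory, the toy contents, and the intermediate file names are made up for the example. Note that comm only reports which old identifiers occur in both files; it does not attach the new names.

```sh
set -eu
export LC_ALL=C   # comm needs both inputs sorted under the same collation

# Toy inputs in a scratch directory (hypothetical; mapping rows taken from the post's excerpt)
workdir=$(mktemp -d)
printf '%s\n' gene10169 gene00502 gene99999 > "$workdir/file1.txt"
printf '%s\n' 'Gene old,Gene 4.0' 'gene00502,FvH4_2g22400' \
  'gene10171,FvH4_1g00010' 'gene10169,FvH4_1g00030' > "$workdir/file2.txt"

# Sort the target IDs
sort "$workdir/file1.txt" > "$workdir/targets.sorted"

# Drop the header row, keep only the old-name column, and sort it
tail -n +2 "$workdir/file2.txt" | cut -d',' -f1 | sort > "$workdir/old_ids.sorted"

# -12 suppresses lines unique to either file, leaving only the shared IDs
comm -12 "$workdir/targets.sorted" "$workdir/old_ids.sorted" > "$workdir/common_ids.txt"

cat "$workdir/common_ids.txt"
```

Swapping -12 for -23 would instead list the target IDs that have no entry in the mapping file, which can be a useful sanity check before converting.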

ADD REPLY • link 6.6 years ago by GenoMax 152k

First file:

gene00041
gene00094
gene00127
gene00130
gene00140
gene00142
gene00150
gene00154
gene00156
gene00168
gene00207
gene00215
gene00216
gene00226
gene00231
...

Second file:

Gene old,Gene 4.0
gene00502,FvH4_2g22400
gene10171,FvH4_1g00010
gene10170,FvH4_1g00020
gene10169,FvH4_1g00030
gene10168,FvH4_1g00040
gene10167,FvH4_1g00050
gene10166,FvH4_1g00060
gene10165,FvH4_1g00070
gene10164,FvH4_1g00080
gene10163,FvH4_1g00090
...

I'd like to export the names in the right column for every entry whose left-column name also appears in the first file. I will try comm. Thank you!

ADD REPLY • link 6.6 years ago by Francisco Muñoz ▴ 10

score 2 · Accepted Answer · 2018-12-12

6.6 years ago

vkkodali_ncbi ★ 3.8k

You can use the Linux command join for this, as shown below. Here, file1.txt is the first file, with just the gene### identifiers, file2.txt is the second file, with the comma-separated identifiers, and the output is written to file3.txt as comma-separated values. I am assuming that the gene identifiers in both files are unique.

join -t ',' -1 1 -2 1 <(sort file1.txt) <(sort -t ',' -k1,1 file2.txt) > file3.txt
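As an alternative sketch (not part of the original answer), the same lookup can be done with awk, which avoids sorting and keeps the target genes in their original order. It assumes file2.txt has a header row and exactly two comma-separated columns; the scratch directory, the toy target list, and the unmatched.txt file are made up for illustration.

```sh
set -eu

# Toy inputs in a scratch directory (hypothetical; mapping rows taken from the post's excerpt)
workdir=$(mktemp -d)
printf '%s\n' gene10169 gene00502 gene99999 > "$workdir/file1.txt"
printf '%s\n' 'Gene old,Gene 4.0' 'gene00502,FvH4_2g22400' \
  'gene10171,FvH4_1g00010' 'gene10169,FvH4_1g00030' > "$workdir/file2.txt"

# First file on the command line (NR == FNR): load old -> new pairs, skipping the header.
# Second file: print "old,new" when a mapping exists, otherwise log the ID to unmatched.txt.
awk -F',' -v unmatched="$workdir/unmatched.txt" '
  NR == FNR { if (FNR > 1) map[$1] = $2; next }
  $1 in map { print $1 "," map[$1]; next }
  { print $1 > unmatched }
' "$workdir/file2.txt" "$workdir/file1.txt" > "$workdir/file3.txt"

cat "$workdir/file3.txt"
```

If the CSV files were saved with Windows line endings, the trailing carriage return would prevent identifiers from matching in either approach, so it may be worth stripping it first (for example with tr -d '\r').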

ADD COMMENT • link 6.6 years ago by vkkodali_ncbi ★ 3.8k

It worked. Thanks a lot!

ADD REPLY • link 6.6 years ago by Francisco Muñoz ▴ 10

If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one if they work.