Hi everyone! I have performed a differential expression analysis of some RNA-seq data. Now I want to use Bedtools to measure the distance between these sequences and a large number of target genes (~10000). I have these genes' names in a CSV file (just one column), but they are written in an old nomenclature. I also have a CSV file with two columns: the names of all the organism's genes and their updated counterparts. The question is, how do I use this second file to obtain a new file with the target genes in the new nomenclature? I thought about writing a bash script, but it seemed too inefficient. Is there perhaps an R package that could help? Thanks in advance.
You can use the Linux command join for this, as shown below. Here, file1.txt is the first file, containing just the gene identifiers; file2.txt is the second file, containing the comma-separated identifier pairs; and the output is written to file3.txt as comma-separated values. I am assuming that the gene identifiers within each file are unique.
join -t ',' -1 1 -2 1 <(sort file1.txt) <(sort -t ',' -k1,1 file2.txt) > file3.txt
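To see what the join produces, here is a small sketch on made-up data. The gene names and the extra files (file1.sorted, file2.sorted, new_names.txt, unmatched.txt) are hypothetical and only for illustration. It sorts into intermediate files instead of using process substitution so that it also runs under a plain sh, and it sets LC_ALL=C so that sort and join agree on the collation order. The -o and -v options are standard join options; whether you need them depends on what you want in the output.

```shell
# Hypothetical sample data; all gene names below are made up.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file1.txt: target genes in the old nomenclature, one per line
printf '%s\n' geneC_old geneX_old geneA_old > file1.txt

# file2.txt: old name,new name for every gene of the organism
printf '%s\n' geneA_old,geneA_new geneB_old,geneB_new geneC_old,geneC_new > file2.txt

# join requires both inputs to be sorted on the join field
LC_ALL=C sort file1.txt > file1.sorted
LC_ALL=C sort -t ',' -k1,1 file2.txt > file2.sorted

# Default output: the join field, then the remaining fields of file2 (old,new)
LC_ALL=C join -t ',' -1 1 -2 1 file1.sorted file2.sorted > file3.txt

# -o 2.2 keeps only the second field of file2, i.e. just the new names
LC_ALL=C join -t ',' -o 2.2 file1.sorted file2.sorted > new_names.txt

# -v 1 lists target genes that have no entry in the mapping file
LC_ALL=C join -t ',' -v 1 file1.sorted file2.sorted > unmatched.txt
```

With the default output, file3.txt holds both the old and the new name on each line. If you only need the new names for Bedtools, the -o 2.2 variant gives a one-column file, and checking unmatched.txt tells you whether any target genes were silently dropped because they are missing from the mapping.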