Removing the decimal from Ensembl gene IDs in DESeq2 output
4.1 years ago
krushnach80 ▴ 950

I want to remove the decimal (version suffix) from the Ensembl gene IDs. Because the IDs contain a decimal point, it is difficult to map them to gene names.

gene                   "nH1.bam"    "nH2.bam"   "nH3.bam"              "nH4.bam"
"ENSG00000238164.4" -0.6534833425   -0.6404869759   -0.5898568965   -0.586357257
"ENSG00000049249.6" 1.0589150487    0.2235087421    0.5028436068    0.5201173416


I want "ENSG00000049249" in my gene field instead of "ENSG00000049249.6".

I tried awk '{gsub(/\..*$/,$1)}1', but it seems to mess up the data frame, and I'm not sure what I'm doing wrong.
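A note on why the attempt above likely misbehaves: in awk, the second argument of gsub() is the replacement and the target defaults to the whole line, so gsub(/\..*$/,$1) replaces everything from the first dot to the end of the line with the first field. The following is a minimal sketch of one way to restrict the edit to the first column, assuming the file is tab-separated; the sample rows and the variable names sample and cleaned are made up for illustration.

```shell
# Hypothetical two-row sample mimicking the table above (assumed tab-separated).
sample=$(printf '"ENSG00000238164.4"\t-0.65\t-0.64\n"ENSG00000049249.6"\t1.05\t0.22\n')

# Strip the ".<version>" suffix from the first column only.
# sub() is given $1 as its target, so decimals in the numeric columns are untouched.
# Setting OFS to a tab keeps the output tab-separated after $1 is modified.
cleaned=$(printf '%s\n' "$sample" |
  awk 'BEGIN { FS = OFS = "\t" } { sub(/\.[0-9]+/, "", $1) } 1')

printf '%s\n' "$cleaned"
```

If the file is space-separated instead, the BEGIN block would need a different FS/OFS; that choice is an assumption here, not something the post states.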

Any help or suggestions would be highly appreciated.

R ensembl • 4.0k views

4.1 years ago
sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g' input.txt


Thank you very much.


Hi Pierre, your solution worked very well, but would you mind explaining the regular expression?

For example, I am not sure where the version number is being replaced with nothing. I understand that "\1" writes the captured group back to the output and that the trailing g makes the substitution global, but where exactly does the substitution happen?

Thanks.
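
One way to read the expression, offered as a sketch rather than the original poster's own explanation: the pattern matches the whole versioned ID (ENSG plus digits, a literal dot, then more digits), but only the part inside \( ... \) is captured. The replacement is just \1, so the dot and version are dropped because they belong to the match and not to the replacement. The example below uses a made-up variable id and a second, purely illustrative replacement (\1_v) to make visible which text is being replaced.

```shell
# Hypothetical single ID, used only to illustrate the substitution.
id='ENSG00000238164.4'

# Whole match:       ENSG00000238164.4
# Captured group \1: ENSG00000238164
# The replacement is only \1, so ".4" is discarded.
stripped=$(printf '%s\n' "$id" | sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g')

# For comparison: replacing with "\1_v" shows that the entire match,
# including the dot and the version, is what gets swapped out.
marked=$(printf '%s\n' "$id" | sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1_v/g')

printf '%s\n%s\n' "$stripped" "$marked"
```

The trailing g is a flag of the s command (not an escape like \g); it repeats the substitution for every match on a line.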


How would you alter the command if there are two digits after the decimal point, say ENSG00000000460.15 ? I am not able to remove the numbers after the decimal point in such cases using the above command.
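
As a hedged check of the two-digit case: in the sed pattern, [0-9]* after the escaped dot matches any number of digits, so a suffix such as .15 should be removed without changing the command. The sketch below uses made-up IDs and variable names (ids, result) to show this.

```shell
# Hypothetical IDs with one- and two-digit version suffixes.
ids=$(printf 'ENSG00000049249.6\nENSG00000000460.15\n')

# [0-9]* after the escaped dot matches any run of digits,
# so both ".6" and ".15" are removed by the same expression.
result=$(printf '%s\n' "$ids" | sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g')

printf '%s\n' "$result"
```

If the command still fails on such IDs, one possibility (an assumption, since the thread does not say) is that the backslashes before the parentheses or the dot were lost when the command was copied.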