Question: Removing the decimal version suffix from Ensembl gene IDs in DESeq2 output
19 months ago by
krushnach80480 wrote:

I want to remove the decimal suffix from the Ensembl gene IDs. Because the IDs contain a decimal point, it is difficult to map them to gene names.

gene                 "nH1.bam"      "nH2.bam"      "nH3.bam"      "nH4.bam"
"ENSG00000238164.4"  -0.6534833425  -0.6404869759  -0.5898568965  -0.586357257
"ENSG00000049249.6"  1.0589150487   0.2235087421   0.5028436068   0.5201173416

In the gene field I want "ENSG00000049249" instead of "ENSG00000049249.6".

I tried awk '{gsub(/\..*$/,$1)}1', but it seems to mess up the data frame, and I'm not sure what I'm doing wrong.

Any help or suggestions would be highly appreciated.

ensembl R • 1.5k views
19 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum119k wrote:
sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g' input.txt
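
A note on the awk attempt from the question, offered as a likely explanation rather than a definitive diagnosis: gsub(/\..*$/,$1) has no third argument, so it operates on the whole line ($0). It replaces everything from the first "." to the end of the line with the contents of $1, which discards the numeric columns. A minimal awk sketch that edits only the first column is shown below. It assumes whitespace-separated columns with the gene ID in column 1; the variable names sample and fixed are made up for the illustration, and the rows are a shortened copy of the table in the question.

```shell
# Shortened sample rows from the question (first three columns only).
sample='gene "nH1.bam" "nH2.bam"
"ENSG00000238164.4" -0.6534833425 -0.6404869759
"ENSG00000049249.6" 1.0589150487 0.2235087421'

# sub() with $1 as the target strips the first ".digits" run in column 1 only,
# so the decimal points in the numeric columns are left alone.
fixed=$(printf '%s\n' "$sample" | awk '{ sub(/\.[0-9]+/, "", $1) } 1')

printf '%s\n' "$fixed"
```

On this sample the two data rows come out starting with "ENSG00000238164" and "ENSG00000049249", and the numeric values are unchanged. Assigning to $1 makes awk rebuild the line with its output field separator (a single space by default), so for a tab-separated file something like awk -F'\t' -v OFS='\t' would be needed to keep the tabs.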

thank you very much

written 19 months ago by krushnach80480

Hi Pierre, your solution worked very well, but would you mind explaining the regular expression?

For example, I am not sure where the version number gets replaced with nothing. I understand that "\1" writes the captured match back to the output and that the trailing g makes the substitution global, but where exactly does the substitution happen?


written 13 months ago by r.t.greenblatt0
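
One way to read the sed expression, as a sketch and not necessarily how its author would explain it: in s/pattern/replacement/g the pattern \(ENSG[0-9]*\)\.[0-9]* matches the whole versioned ID, for example ENSG00000049249.6. The \( \) group captures only the ENSG-plus-digits part. The replacement is just \1, so the whole match is swapped for the captured part and the ".6" is simply never written back. The trailing g is a flag that applies this to every match on the line. The example below runs the original expression on one row from the question, then a variant with [\1] as the replacement to make the captured part visible. The variable names line, stripped, and marked are made up for the illustration.

```shell
# One row from the question (first three columns only).
line='"ENSG00000049249.6" 1.0589150487 0.2235087421'

# Original expression: the match "ENSG00000049249.6" is replaced by group 1.
stripped=$(printf '%s\n' "$line" | sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g')
printf '%s\n' "$stripped"
# → "ENSG00000049249" 1.0589150487 0.2235087421

# Variant: wrap group 1 in brackets to show exactly what was captured.
marked=$(printf '%s\n' "$line" | sed 's/\(ENSG[0-9]*\)\.[0-9]*/[\1]/g')
printf '%s\n' "$marked"
# → "[ENSG00000049249]" 1.0589150487 0.2235087421
```

The numeric columns are untouched because the pattern only matches text that starts with ENSG.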