Question: removing decimal from ENSEMBL gene ID from deseq2 output
2.4 years ago, krushnach80 wrote:

I want to remove the version suffix (the decimal part) from the Ensembl gene IDs. Because the IDs contain a decimal point, it is difficult to map them to gene names.

gene                 "nH1.bam"      "nH2.bam"      "nH3.bam"      "nH4.bam"
"ENSG00000238164.4"  -0.6534833425  -0.6404869759  -0.5898568965  -0.586357257
"ENSG00000049249.6"  1.0589150487   0.2235087421   0.5028436068   0.5201173416

I want "ENSG00000049249" in my gene field instead of "ENSG00000049249.6".

I tried awk '{gsub(/\..*$/,$1)}1', but it seems to mess up the data frame, and I'm not sure what I'm doing wrong.

Any help or suggestions would be highly appreciated.

2.4 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g' input.txt
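
To see the effect on rows like the ones in the question, the command can be wrapped in a small function and fed a sample line. This is only a sketch: `strip_version` is an illustrative name that does not appear in the answer, and the sample row is shortened from the question.

```shell
# Illustrative wrapper (the name is an assumption) around the sed command above.
# It reads from stdin, so it works in a pipe or with a redirected file.
strip_version() {
  sed 's/\(ENSG[0-9]*\)\.[0-9]*/\1/g'
}

# Only the ".4" after the ENSG ID is removed; the expression values are untouched
# because the pattern requires the literal prefix "ENSG" before the dot.
printf '%s\n' '"ENSG00000238164.4" -0.6534833425 -0.6404869759' | strip_version
# prints: "ENSG00000238164" -0.6534833425 -0.6404869759
```

With a real file this would be `strip_version < input.txt > output.txt`, leaving the original file unchanged.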

Thank you very much.

written 2.4 years ago by krushnach80

Hi Pierre, your solution worked very well, but would you mind explaining the regular expression?

For example, I am not sure where the version number gets replaced with nothing. I understand that "\1" puts the captured match back into the output and that the trailing g makes the substitution global, but where exactly does the substitution happen?


written 23 months ago by r.t.greenblatt
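
On where the substitution happens: in `s/PATTERN/REPLACEMENT/g`, sed swaps the whole text matched by PATTERN for REPLACEMENT. Here PATTERN matches the ID together with its version (for example `ENSG00000049249.6`), but only the `ENSG[0-9]*` part sits inside the `\(...\)` group; the `\.[0-9]*` that matches the dot and version number is outside it. Since REPLACEMENT is just `\1`, the captured ID is written back and the version is dropped, so nothing is replaced by a blank. The same idea with an explicit empty replacement can be sketched in awk, which also shows a working form of the attempt in the question. The function name is illustrative, and whitespace-separated columns are assumed.

```shell
# Illustrative helper (the name is an assumption): strip ".<digits>" from field 1 only.
# sub(regex, replacement, target) edits the target in place; the replacement is "".
# The question's gsub(/\..*$/,$1) instead used $1 as the replacement text and acted
# on the whole line, so everything from the first dot onward was overwritten.
strip_version_awk() {
  awk '{ sub(/\.[0-9]+/, "", $1) } 1'
}

printf '%s\n' '"ENSG00000049249.6" 1.0589150487 0.2235087421' | strip_version_awk
# prints: "ENSG00000049249" 1.0589150487 0.2235087421
```

Because awk rebuilds a line when a field changes, runs of whitespace between columns become single spaces; for a tab-delimited file, adding `-F'\t' -v OFS='\t'` to the awk call keeps the tabs.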