removing decimal from ENSEMBL gene ID
4.1 years ago
arsa ▴ 20

I want to remove the decimal suffix from Ensembl transcript IDs. The IDs contain a decimal point, which makes it difficult to map them to the genes' transcript names in R.

Ensembl gives me "ENST00000265620.11", "ENST00000371081.5". But I want "ENST00000265620" and "ENST00000371081".

R genome Assembly • 7.8k views
$ echo ENST00000265620.11 | cut -f1 -d .
ENST00000265620

$ echo ENST00000265620.11 | sed 's/\..*//g'
ENST00000265620
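
To apply the same idea to many IDs at once, here is a minimal sketch that reads one ID per line from standard input. The function name `strip_ensembl_version` is a hypothetical helper, not part of any tool, and the pattern assumes the version is a dot followed by digits at the end of the line.

```shell
# Hypothetical helper: remove a trailing ".<digits>" version from each line on stdin.
strip_ensembl_version() {
    sed 's/\.[0-9][0-9]*$//'
}

# Example with the two IDs from the question, one per line.
printf '%s\n' ENST00000265620.11 ENST00000371081.5 | strip_ensembl_version
# ENST00000265620
# ENST00000371081
```

IDs that carry no version suffix pass through unchanged, so the helper should be safe to run on mixed input.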
4.1 years ago

To learn what those numbers mean, please see here: https://www.ensembl.org/Help/Faq?id=488

Different types of 'regexes' (regular expressions) will do the job for you:

ens <- c('ENST00000265620.11', 'ENST00000371081.5')

sub('\\.[0-9]*$', '', ens)
[1] "ENST00000265620" "ENST00000371081"

Kevin

4.1 years ago
ATpoint 81k
gsub("\\..*","", transcript.list)

like in https://stackoverflow.com/questions/10617702/remove-part-of-string-after
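
The two patterns suggested in this thread (a dot plus digits anchored at the end, versus everything from the first dot onward) give the same result for the IDs in the question, but they can differ if anything follows the version number. A small sketch with `sed`, using a made-up ID (`ENST00000000001.2_suffix` is hypothetical and only there to illustrate the difference):

```shell
# Hypothetical ID with extra text after the version number.
id='ENST00000000001.2_suffix'

# Anchored pattern: only matches a dot followed by digits at the very end,
# so this ID is left unchanged.
anchored=$(echo "$id" | sed 's/\.[0-9]*$//')
echo "$anchored"   # ENST00000000001.2_suffix

# Greedy pattern: removes everything from the first dot onward.
greedy=$(echo "$id" | sed 's/\..*//')
echo "$greedy"     # ENST00000000001
```

Which behaviour is preferable depends on whether the trailing text matters for your mapping.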
