Parsing transcript version in Ensembl mouse annotation
2.6 years ago
c0tton • 0

Hi all,

I aligned some data to an Ensembl transcriptome with novel transcripts. I am trying to lift over the sites from transcriptome to genome coordinates, which I have previously done using the R package GenomicRanges.

The Ensembl FASTA headers look like this and contain a versioned transcript ID (e.g. ENSMUST00000178537.2):

>ENSMUST00000178537.2 cdna chromosome:GRCm39:6:41510135:41510146:1 gene:ENSMUSG00000095668.2 ...

However, in the actual transcriptome GTF from Ensembl, the transcript IDs look like this:

> ... transcript_id "ENSMUST00000178537"; transcript_version "2"; ...

So the transcript name is divided between two attributes: the ID itself is in "transcript_id", and the version number (the suffix after the dot) is in the separate "transcript_version" attribute.

Is there any tool or command which can append the transcript version to the transcript ID? I guess I could do it in Excel but it would be less reproducible.

Tags: Transcriptome, GTF, annotation, Genome, Ensembl

Assuming the two attributes are always directly adjacent in the GTF (transcript_version immediately after transcript_id), this should do it:

sed 's/"; transcript_version "/./'
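A slightly stricter variant of the same idea, as a minimal sketch rather than a tested pipeline: it only rewrites the separator when it directly follows a transcript_id value, so other attributes are left alone. The function name append_transcript_version is made up for illustration, and the sketch still assumes that transcript_version comes immediately after transcript_id on each line.

```shell
# Hypothetical helper (name is illustrative): reads GTF lines on stdin and
# writes them to stdout with transcript_version folded into transcript_id.
# Assumption: transcript_version directly follows transcript_id.
append_transcript_version() {
  # \1 = 'transcript_id "<ID>' without the closing quote, \2 = version digits.
  # Lines without this attribute pair pass through unchanged.
  sed -E 's/(transcript_id "[^"]+)"; transcript_version "([0-9]+)"/\1.\2"/'
}

# Demo on the attribute fragment quoted in the question.
printf '%s\n' 'transcript_id "ENSMUST00000178537"; transcript_version "2";' \
  | append_transcript_version
# prints: transcript_id "ENSMUST00000178537.2";
```

On a real file this would be used as a filter, e.g. append_transcript_version < in.gtf > out.gtf, where in.gtf and out.gtf are placeholder file names. Note that the transcript_version attribute is consumed by the substitution, so downstream tools that expect it as a separate attribute would no longer find it.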