biomaRt getSequence results of gene_exon in wrong order
5 months ago
Hematocite • 0

Hi everyone,

I have been using the biomaRt package to retrieve the exon sequences of certain genes. In particular, I am only interested in the first exon of each gene, which also includes the 5' UTR.

For this I have used the following code:

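library(biomaRt)
ensembl <- useEnsembl(biomart = "genes", dataset = "hsapiens_gene_ensembl")  # assumed: human Ensembl genes mart
ID <- "ENST00000502732.6"
sequence <- getSequence(id = ID, seqType = "gene_exon", type = "ensembl_transcript_id_version", mart = ensembl)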


The resulting data frame contains all the exon sequences, but the exons are in the wrong order: the sequence of exon 1 is not necessarily in row 1, and the rows are not labeled. Since I am particularly interested in exon 1, this is quite annoying, as I would need to check the sequences manually again.

Am I missing something here?

Any help is appreciated. Also if there is another way to retrieve sequences of one particular exon it would be great to know.

5 months ago
swbarnes2 9.8k

I don't think you are missing anything; things come out of biomart unordered. You'll have to sort and filter yourself.

Thanks for your answer. Do I have to do it manually, or can you think of a different way? I have a list of 100 genes, and doing it all manually would be quite labor-intensive.

Re-query, and this time, ask for exon rank, not just sequence.
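A minimal sketch of what that re-query could look like, assuming the attribute names "ensembl_exon_id", "rank" and "gene_exon" are available in your mart (check with listAttributes(ensembl)). Because sequence attributes and structure attributes may sit on different BioMart attribute pages, this version runs two getBM() queries and joins them on the exon ID. The function names pick_exon and fetch_exon are made up for illustration and are not part of biomaRt.

```r
# Pure helper: join exon ranks onto exon sequences and keep one rank per transcript.
# `structure` needs columns ensembl_transcript_id_version, ensembl_exon_id, rank.
# `seqs` needs columns ensembl_exon_id, gene_exon.
pick_exon <- function(structure, seqs, exon_rank = 1) {
  # Drop duplicate sequence rows so the join is many-to-one
  seqs <- unique(seqs[, c("ensembl_exon_id", "gene_exon")])
  merged <- merge(structure, seqs, by = "ensembl_exon_id")
  # Sort by transcript, then by exon rank within the transcript
  merged <- merged[order(merged$ensembl_transcript_id_version, merged$rank), ]
  out <- merged[merged$rank == exon_rank, ]
  rownames(out) <- NULL
  out
}

# Hypothetical wrapper: runs the two BioMart queries
# (requires the biomaRt package and network access).
fetch_exon <- function(ids, mart, exon_rank = 1) {
  # Query 1: which exon has which rank in each transcript
  structure <- biomaRt::getBM(
    attributes = c("ensembl_transcript_id_version", "ensembl_exon_id", "rank"),
    filters = "ensembl_transcript_id_version",
    values = ids,
    mart = mart
  )
  # Query 2: the exon sequences, keyed by exon ID
  seqs <- biomaRt::getBM(
    attributes = c("ensembl_exon_id", "gene_exon"),
    filters = "ensembl_transcript_id_version",
    values = ids,
    mart = mart
  )
  pick_exon(structure, seqs, exon_rank)
}
```

With a mart object such as the ensembl one from the question, fetch_exon(c("ENST00000502732.6"), ensembl) should return one row per transcript holding the rank-1 exon and its sequence, and the same call accepts a vector of 100 transcript IDs, so nothing has to be checked by hand.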