I am trying to retrieve the sequences of a number of genes using getSequence, but specifically I only wish to retrieve 500bp upstream and 500bp downstream of the start codon and of the stop codon, separately (1000bp each). However, it seems that biomaRt does not allow retrieval of upstream or downstream sequence using "coding".
I tried the following (I haven't worked out how to trim the sequence to the desired length, or how to choose just one of the returned sequences):
library(biomaRt)
ensembl <- useMart("ensembl")
ex <- c("ACTN4")
mart <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
gene2sequence <- getSequence (id = ex, type = "external_gene_name", seqType = "coding", upstream = "500", mart = mart)
exportFASTA(gene2sequence, file="desktop/test.fasta")
But I get an error:
> Query ERROR: caught BioMart::Exception::Usage: Filter upstream_flank NOT FOUND
Do I have to get the sequences using coding, 5utr and 3utr separately and join them together? Or is there any way I can work around this? Thanks!
Allowed seqType arguments are listed on page 10 of the biomaRt vignette. 3utr and 5utr are among the possible options for seqType.
I tried 3utr and 5utr, but I want to retrieve the 500bp sequence around the start codon, which would span part of the 5utr and part of the coding sequence. I'm not sure how to do it though; perhaps get the 5utr and coding separately and combine them? It's possible, but I am a beginner in bioinformatics so it would be difficult.
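The "get them separately and combine them" idea can be sketched in base R once the 5utr, coding and 3utr sequences for a single transcript are in hand. This is only a minimal sketch on made-up toy strings: `window_around` is a hypothetical helper, not a biomaRt function, and it assumes the three pieces come from the same transcript and that the coding sequence begins at the start codon and ends with the stop codon.

```r
# Toy sequences standing in for getSequence() results for ONE transcript
# (made-up data, not real Actn4 sequence)
utr5   <- "GGCGCTAGCT"
coding <- "ATGGCCAAGTGA"
utr3   <- "CCTTAGGACC"

# Hypothetical helper: take up to `flank` bases on each side of the junction
# between two adjacent pieces. If a piece is shorter than `flank`, all of it
# is used, so the window can be shorter than 2 * flank.
window_around <- function(left, right, flank) {
  left_part  <- substr(left, max(1, nchar(left) - flank + 1), nchar(left))
  right_part <- substr(right, 1, flank)
  paste0(left_part, right_part)
}

flank <- 4  # would be 500 for the real data

# Junction just before the start codon: end of 5utr + beginning of coding
around_start <- window_around(utr5, coding, flank)

# Junction just after the stop codon: end of coding + beginning of 3utr
around_stop <- window_around(coding, utr3, flank)

print(around_start)
print(around_stop)
```

Note that many UTRs are shorter than 500bp, in which case this approach cannot reach into genomic sequence beyond the transcript; a flank-type seqType would be needed for that.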
Does this cover your needs?
I tried
getSequence (id = ex, type = "external_gene_name", seqType = "5utr", downstream = "500", mart = mart)
but I got an error. That's where I am stuck. I guess upstream and downstream can only be used with a seqType such as 'coding_transcript_flank', which is mentioned here.
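The guess about 'coding_transcript_flank' could be tried with something like the sketch below. `fetch_flank` is a hypothetical wrapper, not part of biomaRt; it only passes the arguments through to getSequence. It assumes that upstream and downstream have to be requested in separate queries, and that they are given as numbers rather than strings. Exactly where the flank begins (at the coding boundary or at the transcript boundary) should be checked against the seqType descriptions in the getSequence help page.

```r
# Hypothetical helper (not part of biomaRt): request a flanking sequence
# for one or more gene symbols using a flank-type seqType.
fetch_flank <- function(symbols, mart, upstream = NULL, downstream = NULL,
                        seq_type = "coding_transcript_flank") {
  # Assumption: only one of upstream/downstream may be set per query
  if (is.null(upstream) == is.null(downstream)) {
    stop("supply exactly one of 'upstream' or 'downstream'")
  }
  args <- list(id = symbols, type = "external_gene_name",
               seqType = seq_type, mart = mart)
  if (!is.null(upstream))   args$upstream   <- upstream
  if (!is.null(downstream)) args$downstream <- downstream
  # biomaRt is only needed at this point, when the query is actually sent
  do.call(biomaRt::getSequence, args)
}

# Usage (needs biomaRt installed and network access, so left commented out):
# mart <- biomaRt::useMart("ensembl", dataset = "mmusculus_gene_ensembl")
# up   <- fetch_flank("Actn4", mart, upstream = 500)
# down <- fetch_flank("Actn4", mart, downstream = 500)
```

Mouse gene symbols are usually written like "Actn4" rather than "ACTN4", so the all-caps symbol may return nothing with the mouse dataset. The result is expected to contain one row per transcript, so one row would still have to be chosen afterwards.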