Hello,
I want to extract all mRNA sequences that have a RefSeq ID, together with the coding sequence and the start and end positions of the 3' and 5' UTRs. I have tried:
EM = useMart("ensembl", dataset = "hsapiens_gene_ensembl")
attr = c("ensembl_transcript_id", "cdna", "cdna_coding_start", "cdna_coding_end", "5_utr_start", "5_utr_end", "3_utr_start", "3_utr_end")
Refseq = getBM(attributes = attr, filters = "with_ox_refseq_mrna", values = TRUE, mart = EM, uniqueRows = TRUE)
and keep getting
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line x did not have y elements
(x and y differ depending on which attributes I request).
I have tried getting rid of different attributes, but even with just the ID, cDNA sequence, and coding start and end, the error is the same. What is the problem?
Secondly, I am developing an R package that needs a lot of background biological data. Is it wise to build a biomaRt query into the code to fetch all the necessary data, so the user does not have to provide it themselves? I am not sure how future-proof that would be, given the problems that even a minor change on the server side might cause.
Thanks for your help.
That solves the problems, thanks very much.
And just to clarify, if you do want to fetch sequences, use the getSequence() method - it's described in the biomaRt PDF. Then you can merge the results of getSequence() and getBM() if required.
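
For anyone reading later, here is a minimal sketch of that two-step approach, not a definitive recipe. It assumes the attribute and filter names from the question are still valid for your Ensembl release (worth checking with listAttributes() and listFilters()), and the three helper function names are made up for illustration. The idea is to request the positions with getBM(), the cDNA with getSequence(), and then join the two tables on the transcript ID.

```r
# Sketch only: needs the Bioconductor package biomaRt and a live connection.
# Attribute/filter names are copied from the question and may have changed
# in newer Ensembl releases.

# Hypothetical helper: coding and UTR positions for RefSeq-linked transcripts.
fetch_structure <- function(mart) {
  biomaRt::getBM(
    attributes = c("ensembl_transcript_id",
                   "cdna_coding_start", "cdna_coding_end",
                   "5_utr_start", "5_utr_end",
                   "3_utr_start", "3_utr_end"),
    filters = "with_ox_refseq_mrna",
    values = TRUE,
    mart = mart
  )
}

# Hypothetical helper: cDNA sequences for a vector of transcript IDs.
fetch_cdna <- function(ids, mart) {
  biomaRt::getSequence(id = ids, type = "ensembl_transcript_id",
                       seqType = "cdna", mart = mart)
}

# Hypothetical helper: join positions and sequences on the transcript ID.
combine_by_transcript <- function(structure, sequences) {
  merge(structure, sequences, by = "ensembl_transcript_id")
}

# Usage (run manually; this contacts the Ensembl server):
# mart      <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# structure <- fetch_structure(mart)
# sequences <- fetch_cdna(unique(structure$ensembl_transcript_id), mart)
# refseq    <- combine_by_transcript(structure, sequences)
```

Note that the structure query can return several rows per transcript, so after the merge the same sequence may be repeated on each of those rows.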