Biomart Refseq Data Problem
12.2 years ago
Pppp • 0

Hello,

I want to extract all mRNA sequences that have a RefSeq ID, together with the coding sequence and the start and end positions of the 3' and 5' UTRs. I have tried:

EM = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

attr = c("ensembl_transcript_id", "cdna", "cdna_coding_start", "cdna_coding_end", "5_utr_start", "5_utr_end", "3_utr_start", "3_utr_end")

Refseq = getBM(attributes = attr, filters = "with_ox_refseq_mrna", values = TRUE, mart = EM, uniqueRows = TRUE)

and keep getting

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line x did not have y elements (x and y differ depending on which attributes I request)

I have tried getting rid of different attributes, but even with just the ID, cDNA sequence, and coding start and end, the error is the same. What is the problem?

Secondly, I am developing an R package that needs a lot of background biological data. Is it wise to include biomaRt calls in the code to fetch all the necessary data, so that users do not have to provide it themselves? I am not sure how future-proof this would be, given the problems that even a minor change might cause.

Thanks for your help.

biomart • 2.7k views
12.2 years ago
Neilfws 49k

The problem is that you are trying to force sequence data ("cdna") and other types of data into a single data frame, which is not going to work.

So just omit "cdna" from attr and it will work fine (I just tried it).
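As a minimal sketch of that fix, the query can be written without the sequence attribute. The attribute and filter names below follow the question, written with the underscores that BioMart identifiers normally carry; treat them as assumptions, since names can change between Ensembl releases (listAttributes() and listFilters() show what a given mart offers). The wrapper name fetch_refseq_coords is hypothetical.

```r
# Attributes from the question, minus the sequence attribute "cdna".
coord_attrs <- c("ensembl_transcript_id",
                 "cdna_coding_start", "cdna_coding_end",
                 "5_utr_start", "5_utr_end",
                 "3_utr_start", "3_utr_end")

# Hypothetical wrapper around getBM(); calling it needs the biomaRt
# package and network access to the Ensembl BioMart server.
fetch_refseq_coords <- function(mart) {
  biomaRt::getBM(attributes = coord_attrs,
                 filters    = "with_ox_refseq_mrna",  # assumed name, as in the question
                 values     = TRUE,
                 mart       = mart,
                 uniqueRows = TRUE)
}

# Usage (not run here):
# EM     <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
# coords <- fetch_refseq_coords(EM)
```

The result should be an ordinary data frame with one column per requested attribute.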

As for the R package, you should read up on how other people do it. For example, many of the Bioconductor packages depend on one another but you don't necessarily "incorporate" code from one in another; you just ensure that it loads when required. The developers section at the Bioconductor website should have more details of best practices.

That solves the problems, thanks very much.

And just to clarify, if you do want to fetch sequences, use the getSequence() function, which is described in the biomaRt PDF. Then you can merge the results of getSequence() and getBM() if required.
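A rough sketch of that two-step approach follows. It assumes that coords is a data frame from getBM() containing an "ensembl_transcript_id" column, and that getSequence() returns a data frame with one column named after seqType and one named after type. The helper names fetch_cdna and merge_coords_and_seqs are hypothetical.

```r
# Hypothetical helper: fetch cDNA sequences for a vector of transcript IDs.
# Calling it needs the biomaRt package and network access.
fetch_cdna <- function(ids, mart) {
  biomaRt::getSequence(id      = ids,
                       type    = "ensembl_transcript_id",
                       seqType = "cdna",
                       mart    = mart)
}

# Hypothetical helper: join coordinates and sequences on the transcript ID.
# merge() performs an inner join by default, so transcripts missing from
# either table are dropped.
merge_coords_and_seqs <- function(coords, seqs, key = "ensembl_transcript_id") {
  merge(coords, seqs, by = key)
}

# Usage (not run here), with EM and coords obtained via useMart() and getBM():
# seqs   <- fetch_cdna(unique(coords$ensembl_transcript_id), EM)
# Refseq <- merge_coords_and_seqs(coords, seqs)
```

For a very large ID list it may be worth requesting sequences in batches, since a single request for every transcript can be slow.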

12.2 years ago
Biojl ★ 1.7k

Regarding the second part of your question: bear in mind that through the database API you can only access the latest version of Ensembl. Previous versions are only accessible through the website. This may be a problem, because a project rarely finishes within a single release, and you may have previous work based on another release.
