6.9 years ago
Parham ★ 1.5k

Can anybody help me find the S. pombe dataset on BioMart? I would also appreciate some help on how to use makeTranscriptDbFromBiomart to create a TranscriptDb.

6.9 years ago
Malcolm.Cook ★ 1.2k

Looks like you figured out another way of getting what you needed, but, for the record, here is the answer to your question:

S. pombe is at http://fungi.ensembl.org/index.html

The biomart is here: http://fungi.ensembl.org/biomart/martview/248a3d2deec76fa7be1e94e32b3972df

Access it using Bioconductor's GenomicFeatures as follows. Note the warnings.

library(GenomicFeatures)
library(biomaRt)

txdb <- makeTranscriptDbFromBiomart(
  biomart = "fungi_mart_22",
  dataset = "spombe_eg_gene",
  host = "fungi.ensembl.org"
)

Prepare the 'metadata' data frame ... OK
Make the TranscriptDb object ... OK
Warning messages:
1: In .normargSplicings(splicings, transcripts_tx_id) :
no CDS information for this TranscriptDb object
2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
chromosome lengths and circularity flags are not available for this TranscriptDb object

> transcriptsBy(txdb)
GRangesList of length 7017:
$SPAC1002.01
GRanges with 1 range and 2 metadata columns:
seqnames             ranges strand | tx_id       tx_name
<Rle>          <IRanges>  <Rle> | <integer>   <character>
[1]        I [1798347, 1799015]      + |   510 SPAC1002.01.1

$SPAC1002.02
GRanges with 1 range and 2 metadata columns:
seqnames             ranges strand | tx_id       tx_name
[1]        I [1799061, 1800053]      + |   511 SPAC1002.02.1

$SPAC1002.03c
GRanges with 1 range and 2 metadata columns:
seqnames             ranges strand | tx_id        tx_name
[1]        I [1799915, 1803141]      - |  2075 SPAC1002.03c.1

...
<7014 more elements>
---
seqlengths:
I       II      III       MT      MTR AB325691
NA       NA       NA       NA       NA       NA

It's strange: yesterday I tried these commands and they built the TranscriptDb, but today I am getting an error! Do you see any problem?

> txdb<-makeTranscriptDbFromBiomart(biomart="fungi_mart_22", dataset="spombe_eg_gene", host="fungi.ensembl.org")
Error in useDataset(mart = mart, dataset = dataset, verbose = verbose) :
The given dataset:  spombe_eg_gene , is not valid.  Correct dataset names can be obtained with the listDatasets function.

Try specifying the mart as:

biomart="fungal_mart"
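Since mart and dataset names on the server change between releases, it may be easier to ask the server what it currently offers, as the error message suggests. Here is a rough sketch, assuming the biomaRt package is installed and that fungi.ensembl.org is reachable (newer biomaRt versions may want a scheme prefix such as "https://" on the host). `list_pombe_datasets` is a hypothetical helper name, not part of biomaRt; only `listMarts`, `useMart` and `listDatasets` are biomaRt functions.

```r
library(biomaRt)

# Hypothetical helper: for every mart on `host`, list the datasets whose
# name matches `pattern`, together with the name of the mart they live in.
list_pombe_datasets <- function(host = "fungi.ensembl.org", pattern = "pombe") {
  marts <- listMarts(host = host)  # data frame with a 'biomart' column
  hits <- lapply(as.character(marts$biomart), function(m) {
    ds <- listDatasets(useMart(m, host = host))
    ds <- ds[grepl(pattern, ds$dataset), , drop = FALSE]
    if (nrow(ds) > 0) cbind(biomart = m, ds) else NULL
  })
  # Drop marts without a match and stack the rest into one data frame.
  do.call(rbind, Filter(Negate(is.null), hits))
}
```

Calling `list_pombe_datasets()` should then show which `biomart=` and `dataset=` values can be passed to makeTranscriptDbFromBiomart at the moment.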

6.9 years ago

I don't know that it's in BioMart, given that it's not in Ensembl. Just download the GTF or GFF file from PomBase and then use makeTranscriptDbFromGFF() from GenomicFeatures.

Edit: I take that back, it is in Ensembl. Here's an example biomart query.

Yes, I saw that, thanks! But do I need to set a lot of parameters? I am new to this field, and the parameters look very complex to me at this point. Is there a straightforward script for it, or should I go through all the arguments and choose carefully?

Do you mean parameters for makeTranscriptDbFromGFF()? It only needs the file name.

Yes, because when I checked ?makeTranscriptDbFromGFF it listed a lot of options. That's why I asked! However, when I try with the file name only, I end up with errors for both the GFF3 and GTF formats.

> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gff3")
extracting transcript information
Error in .prepareGFF3TXS(data, useGenesAsTranscripts) :
No Transcript information found in gff file
> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.21.gtf")
Error in .parse_attrCol(attrCol, file, colnames) :
Some attributes do not conform to 'tag=value' format

txdb <- makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gtf", format="gtf") works. I'd have to look into why it doesn't like the gff3 file.

Ah, the error with the GFF3 file is due to it not having any mRNA features.
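A quick way to check this yourself is to tabulate the feature types in column 3 of the file. The following is a minimal base-R sketch; `count_feature_types` is a hypothetical helper name, and it assumes a tab-delimited GFF/GTF file without an embedded FASTA section.

```r
# Hypothetical helper: count the feature types (column 3) in a GFF/GTF file.
count_feature_types <- function(path) {
  gff <- read.delim(path, header = FALSE, comment.char = "#",
                    stringsAsFactors = FALSE, quote = "")
  # Most frequent feature types first.
  sort(table(gff[[3]]), decreasing = TRUE)
}
```

For example, `count_feature_types("Schizosaccharomyces_pombe.ASM294v2.22.gff3")` lists the types present; if "mRNA" is not among them, that is consistent with the "No Transcript information found" error above.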