Question: Dataset's name in BioMart for S. pombe
4.7 years ago by Parham, Sweden
Parham wrote:

Can anybody help me find the dataset for S. pombe on BioMart? I would also appreciate some help with using makeTranscriptDbFromBiomart to create a TranscriptDb object.

Cheers,

 

4.7 years ago by Malcolm.Cook, Kansas, USA
Malcolm.Cook wrote:

Looks like you figured out another way of getting what you needed, but, for the record, here is the answer to your question:

S. pombe is in Ensembl Fungi: http://fungi.ensembl.org/index.html

The BioMart is here: http://fungi.ensembl.org/biomart/martview/248a3d2deec76fa7be1e94e32b3972df

Access it using Bioconductor's GenomicFeatures as follows. Note the warnings about missing CDS information and chromosome lengths.

 

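library(GenomicFeatures)
library(biomaRt)

# Build a TranscriptDb from the S. pombe dataset on the Ensembl Fungi BioMart.
# In later GenomicFeatures releases this function is named makeTxDbFromBiomart().
txdb <- makeTranscriptDbFromBiomart(
            biomart = "fungi_mart_22",
            dataset = "spombe_eg_gene",
            host    = "fungi.ensembl.org"
            )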

 Download and preprocess the 'transcripts' data frame ... OK
 Download and preprocess the 'splicings' data frame ... OK
 Download and preprocess the 'genes' data frame ... OK
 Prepare the 'metadata' data frame ... OK
 Make the TranscriptDb object ... OK
 Warning messages:
 1: In .normargSplicings(splicings, transcripts_tx_id) :
   no CDS information for this TranscriptDb object
 2: In .normargChrominfo(chrominfo, transcripts$tx_chrom, splicings$exon_chrom) :
   chromosome lengths and circularity flags are not available for this TranscriptDb object


> transcriptsBy(txdb)
 GRangesList of length 7017:
 $SPAC1002.01
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand |     tx_id       tx_name
          <Rle>          <IRanges>  <Rle> | <integer>   <character>
   [1]        I [1798347, 1799015]      + |       510 SPAC1002.01.1

 $SPAC1002.02
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand | tx_id       tx_name
   [1]        I [1799061, 1800053]      + |   511 SPAC1002.02.1

 $SPAC1002.03c
 GRanges with 1 range and 2 metadata columns:
       seqnames             ranges strand | tx_id        tx_name
   [1]        I [1799915, 1803141]      - |  2075 SPAC1002.03c.1

 ...
 <7014 more elements>
 ---
 seqlengths:
         I       II      III       MT      MTR AB325691
        NA       NA       NA       NA       NA       NA
 

 

 

written 4.7 years ago by Malcolm.Cook

It's strange: yesterday I tried these commands and they built the TranscriptDb, but today I am getting an error. Do you see any problem?

> txdb<-makeTranscriptDbFromBiomart(biomart="fungi_mart_22", dataset="spombe_eg_gene", host="fungi.ensembl.org")
Error in useDataset(mart = mart, dataset = dataset, verbose = verbose) : 
  The given dataset:  spombe_eg_gene , is not valid.  Correct dataset names can be obtained with the listDatasets function.

 

written 4.7 years ago by Parham

Try specifying the mart as:

biomart="fungal_mart"
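
Since the error message points to listDatasets, it may help to look the names up rather than hard-code them. The sketch below is only an illustration: `find_dataset` is a hypothetical helper, not part of biomaRt, and it simply filters a data frame shaped like the one `listDatasets()` returns (columns `dataset` and `description`). The example rows are made up, and the commented lines at the end assume network access and the mart name given above.

```r
# Hypothetical helper (not part of biomaRt): filter a data frame shaped like
# the result of biomaRt::listDatasets() for rows matching a pattern.
find_dataset <- function(datasets, pattern) {
  hit <- grepl(pattern, datasets$dataset, ignore.case = TRUE) |
         grepl(pattern, datasets$description, ignore.case = TRUE)
  datasets[hit, , drop = FALSE]
}

# Stand-in for a listDatasets() result; the rows are illustrative only.
example <- data.frame(
  dataset     = c("spombe_eg_gene", "scerevisiae_eg_gene"),
  description = c("Schizosaccharomyces pombe genes",
                  "Saccharomyces cerevisiae genes"),
  stringsAsFactors = FALSE
)
print(find_dataset(example, "pombe"))

# With network access, the live lookup might look like this:
# library(biomaRt)
# listMarts(host = "fungi.ensembl.org")                       # available mart names
# mart <- useMart("fungal_mart", host = "fungi.ensembl.org")  # mart name from above
# find_dataset(listDatasets(mart), "pombe")
```

Whatever dataset name the lookup returns can then be passed as the `dataset` argument, together with the matching `biomart` name.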

 

written 4.7 years ago by Neilfws

4.7 years ago by Devon Ryan, Freiburg, Germany
Devon Ryan wrote:

I don't think it's in BioMart, given that it's not in Ensembl. Just download the GTF or GFF file from PomBase and then use makeTranscriptDbFromGFF() from GenomicFeatures.

Edit: I take that back, it is in Ensembl. Here's an example BioMart query.

modified 4.7 years ago • written 4.7 years ago by Devon Ryan

Yes, I saw that, thanks! But does it require setting a lot of parameters? I am new to this field, and the list of parameters looks very complex to me at this point. Is there a straightforward script for it, or should I go through all the arguments and choose carefully?

written 4.7 years ago by Parham

Do you mean parameters for makeTranscriptDbFromGFF()? It only needs the file name.

written 4.7 years ago by Devon Ryan

Yes, because ?makeTranscriptDbFromGFF lists a lot of options. That's why I asked! However, when I try with the file name only, I end up with errors for both the GFF3 and GTF formats.

> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gff3")
extracting transcript information
Error in .prepareGFF3TXS(data, useGenesAsTranscripts) : 
  No Transcript information found in gff file
> makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.21.gtf")
Error in .parse_attrCol(attrCol, file, colnames) : 
  Some attributes do not conform to 'tag=value' format

written 4.7 years ago by Parham

txdb <- makeTranscriptDbFromGFF("Schizosaccharomyces_pombe.ASM294v2.22.gtf", format="gtf") works. I'd have to look into why it doesn't like the gff3 file.
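
The earlier "tag=value" error is consistent with the GTF file being read as GFF3 when no format is given: GFF3 attributes look like `ID=x`, while GTF attributes look like `gene_id "x";`. As a rough sketch, the hypothetical helper below (`guess_attr_style`, not part of GenomicFeatures) peeks at the ninth column to tell the two styles apart. The example lines are made up, reusing coordinates from the output earlier in the thread.

```r
# Hypothetical helper: guess whether column 9 of annotation lines uses
# GFF3-style 'tag=value' pairs or GTF-style 'tag "value";' pairs.
guess_attr_style <- function(lines) {
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]   # drop comments and blanks
  attrs <- vapply(strsplit(lines, "\t", fixed = TRUE),
                  function(f) if (length(f) >= 9) f[9] else NA_character_,
                  character(1))
  attrs <- attrs[!is.na(attrs)]
  if (length(attrs) == 0) return("unknown")
  if (all(grepl("^[^ =;]+=", attrs))) return("gff3")
  if (all(grepl("^[^ =;]+ \"", attrs))) return("gtf")
  "unknown"
}

# Made-up example lines (source and feature columns are illustrative).
gtf_line  <- "I\texample\texon\t1798347\t1799015\t.\t+\t.\tgene_id \"SPAC1002.01\"; transcript_id \"SPAC1002.01.1\";"
gff3_line <- "I\texample\tgene\t1798347\t1799015\t.\t+\t.\tID=SPAC1002.01"
print(guess_attr_style(gtf_line))
print(guess_attr_style(gff3_line))

# On a real file, something like:
# guess_attr_style(readLines("Schizosaccharomyces_pombe.ASM294v2.22.gtf", n = 200))
```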

written 4.7 years ago by Devon Ryan

Ah, the error with the GFF3 file is due to it not having any mRNA features.
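
One way to check this for a given file is to tabulate the feature types in column 3. The sketch below uses base R only; `count_feature_types` is a hypothetical helper and the GFF3 snippet is made up to mimic a file that has gene and exon features but no mRNA features.

```r
# Hypothetical helper: tabulate the feature types (column 3) in GFF3 text.
count_feature_types <- function(lines) {
  lines <- lines[!grepl("^#", lines) & nzchar(lines)]   # drop comments and blanks
  types <- vapply(strsplit(lines, "\t", fixed = TRUE),
                  function(f) if (length(f) >= 3) f[3] else NA_character_,
                  character(1))
  table(types)
}

# Made-up GFF3 snippet with gene and exon features but no mRNA features.
gff3 <- c(
  "##gff-version 3",
  "I\texample\tgene\t1798347\t1799015\t.\t+\t.\tID=gene1",
  "I\texample\texon\t1798347\t1799015\t.\t+\t.\tID=exon1;Parent=gene1"
)
counts <- count_feature_types(gff3)
print(counts)

# TRUE only if at least one mRNA feature is present.
has_mrna <- "mRNA" %in% names(counts)
print(has_mrna)

# On a real file, something like:
# count_feature_types(readLines("Schizosaccharomyces_pombe.ASM294v2.22.gff3"))
```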

written 4.7 years ago by Devon Ryan