Question: Is it possible to extract the 'intronic' sequences from a transcriptome obtained from Biomart using the Biostrings package (R)? I also have the exon and UTR sequences available.
13 days ago by
vladimiralan0 wrote:

Using the biomaRt R package I was able to obtain the transcriptome of an organism as well as all its exon and UTR sequences. Using the Biostrings package I managed to process these objects so that I can write them as FASTA files; however, I also need the intronic sequences, and there is no way to get those directly from biomaRt.

Assuming that whatever is present in the transcriptome FASTA file but not in the exon and UTR FASTA files is an intron, is there a way I can extract that from the transcriptome FASTA file using the Biostrings library? I can work with either the FASTA files I managed to write or with the object as it comes out of the getBM() function, so I'll go with whichever is easier.

If there is no way I can do that... what do you guys think I should do?

* Should I learn and use bedtools instead? Can I use the ranges obtained from Biomart to extract them that way?

* Should I learn and use TxDb instead? I wonder if I can use the data as I have it right now.

* Should I be using base R instead? Something like gsub to mask or delete the exons and UTRs?
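
Whichever tool is used, the ranges-based options above share one idea: within a transcript, introns are the gaps between consecutive exon ranges. In Ensembl annotation the UTRs are parts of exons, so the exon ranges alone are enough. Below is a minimal sketch of that logic in plain Python, used here only for illustration; the function names `intron_ranges` and `extract` and the toy sequence are made up for this example. It assumes 1-based inclusive coordinates and an unspliced sequence (one that still contains the introns); a spliced cDNA sequence has nothing left to extract.

```python
def intron_ranges(exons):
    """Return intron ranges (1-based, inclusive) as the gaps between exons.

    `exons` is a list of (start, end) pairs for ONE transcript.
    Overlapping or directly adjacent exon ranges are merged first.
    """
    merged = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    # Each gap between two neighbouring merged exons is one intron.
    return [(a[1] + 1, b[0] - 1) for a, b in zip(merged, merged[1:])]


def extract(seq, ranges, offset=1):
    """Slice `seq` at the given ranges.

    `offset` is the coordinate of the first base of `seq` (hypothetical
    parameter; use the transcript start when ranges are genomic).
    Minus-strand transcripts would additionally need reverse-complementing.
    """
    return [seq[s - offset:e - offset + 1] for s, e in ranges]


# Toy unspliced transcript: exon, intron, exon, intron, exon (4 bases each).
toy_seq = "ATGC" + "GTAG" + "CCCC" + "GTAG" + "TTTT"
toy_exons = [(1, 4), (9, 12), (17, 20)]

print(intron_ranges(toy_exons))                    # [(5, 8), (13, 16)]
print(extract(toy_seq, intron_ranges(toy_exons)))  # ['GTAG', 'GTAG']
```

This is also why deleting exon strings with something like gsub is fragile: short exon sequences can match elsewhere in the transcript, whereas coordinates are unambiguous.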

Thanks in advance! Any input is totally appreciated.

modified 13 days ago • written 13 days ago by vladimiralan0

Please consider shortening the title so it is more concise.

written 13 days ago by genomax67k

Are you able to get annotation or at least make a basic file from what you have?
Get intronic and intergenic sequences based on gff file
Get Introns from a gft annotation file

modified 13 days ago • written 13 days ago by genomax67k

Hey! Thanks for replying! Sorry about the title... What do you mean by a 'basic file'? I wrote my script so I can download all the transcript, exon and UTR sequences of an organism as multi-FASTA files (three files, one for each biotype). I obtain the data from Ensembl using the biomaRt package in R, so yes, I think I could get the annotation (I suppose I only need to learn how to write a GTF/GFF file from biomaRt).

So you'd say I should take any of the approaches above instead of trying to do something with Biostrings?

written 12 days ago by vladimiralan0

I am not familiar with Biostrings, but the answers above will work if you are able to get a GTF/GFF file easily. I am not sure what organism you are working with, but GTF/GFF files may already be available via the Ensembl Genomes pages.
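
For readers wondering what the GTF/GFF route involves, here is a rough sketch, again in plain Python and only as an illustration of the idea (it is not taken from the linked threads). It assumes standard 9-column GTF lines with a `transcript_id` attribute on each `exon` record; the function name `introns_from_gtf` and the demo lines are invented for this example.

```python
import re
from collections import defaultdict


def introns_from_gtf(lines):
    """Map each transcript_id to its introns as (seqname, start, end, strand).

    Coordinates are 1-based and inclusive, as in GTF. Only `exon`
    records are used; other feature types are skipped.
    """
    exons = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            exons[match.group(1)].append(
                (fields[0], int(fields[3]), int(fields[4]), fields[6])
            )

    introns = {}
    for tx, recs in exons.items():
        recs.sort(key=lambda r: r[1])  # order exons by start coordinate
        introns[tx] = [
            (a[0], a[2] + 1, b[1] - 1, a[3])
            for a, b in zip(recs, recs[1:])
            if b[1] - a[2] > 1  # keep only real gaps between exons
        ]
    return introns


# Made-up demo records: one two-exon transcript on chromosome "1".
demo = [
    '1\tdemo\tgene\t100\t400\t.\t+\t.\tgene_id "g1";',
    '1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    '1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
]
print(introns_from_gtf(demo))  # {'t1': [('1', 201, 299, '+')]}
```

The resulting intervals could then be written out as BED (which uses 0-based starts) for bedtools, or used to slice a genome sequence directly.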

written 12 days ago by genomax67k

Thank you so much for your response! And yeah, it seems like the strategies you linked are the way to go.

You see, I am just an undergraduate student starting to learn R and Bioconductor, and I only recently started using the biomaRt package. When I found that you cannot directly get the intronic sequences of an organism with this package, I decided to try to figure out a way to do it with another library I had just discovered (Biostrings), but I have had no success so far. So I don't really 'have' a particular organism; I am just trying to figure out the capabilities and limitations of these wonderful tools. This forum seems to be more focused on solving specific, real-world bioinformatics problems, so I apologize if this was not the place to ask my question.

Once again, I truly appreciate the time and consideration you took in replying to this question.

modified 12 days ago • written 12 days ago by vladimiralan0