extracting snoRNA intronic sequences
2
0
Entering edit mode
6.3 years ago

Hi bioinformaticians,

I am building an index for RNA-Seq purposes. It would be composed of the intronic sequences where snoRNA are encoded.  The problem is that, using the ensEMBL Perl API, I haven't been able to find a way to extract the sequences. The snoRNA genes are not linked to the introns. This suggests that I would be needed to generate a slice. If anyone has any advice, suggestion or alternative to this method, it would be greatly appreciated.

RNA-Seq ensEMBL • 1.4k views
1
Entering edit mode

Are you only interested in intronic snoRNAs or instead the full sequence of introns harboring snoRNAs?

0
Entering edit mode

I would be interested in the sequence harboring the snoRNA between two exon sequences. So I guess the full sequence of introns harboring snoRNAs would be it.

4
Entering edit mode
6.3 years ago

I would do something like the following:

1. Use biomart to get snoRNA coordinates (in BED format or something similar).
2. Use biomart to get the coordinates of all introns (again, in something like BED format).
3. Use bedtools to intersect the two, writing intronic coordinates wholly encompassing snoRNA coordinates.
4. Use bedtools getfasta with the resulting BED file to get a fasta file of intronic sequences harboring snoRNAs.

That shouldn't even require writing any new code :)

0
Entering edit mode

Thank you. I had not thought about using bedtools to complete this.

1
Entering edit mode
6.3 years ago
Tariq Daouda ▴ 210

Hi,

I wrote a python module that would allow you to do that easily. http://pyGeno.iric.ca

from pyGeno.Genome import *

ref = Genome(name = "GRCh37.75")
chro = Genome,get(Chromosome, number = "22")
intronSeq = chro.sequence[x1:x2]

If you know which exons you are interested in you could also do:

exon1 = chro.get(Exon, id = "ENS...")
exon2 = chro.get(Exon, id = "ENS...")
intronSeq = chro.sequence[exon1.end:exon2.start]

If you want to do it by transcripts:
trans = ref.get(Transcript, id = "ENST...") or ref.get(Transcript, name = "whatever-001")

chro = trans.chromosome
intronSeq = chro.sequence[ trans.exons[0].end : trans.exons[1].start ]

This assumes that you are working on the human genome GRCh37.75. If you need something else let me know I would be happy to show you how to import any genome made available by Ensembl onto pyGeno.

Hope that helps,

Cheers,