Question: querying 5'UTR, first exon, and flanking sequences
2015rpro (United States) wrote 3.8 years ago:

I am trying to help my PI with a project. He gave me a list of 2700 hgnc_ids and wants me to obtain, for each of the 2700 genes, the 1000 bases upstream of the TSS, the sequence of the first exon, and the 1000 bases downstream of the first exon (i.e., the start of the first intron).

I tried Ensembl through biomaRt in R/Bioconductor. However, I am only able to obtain the 1000-base upstream flanking sequences and the exons; Ensembl does not seem to have a function for retrieving intron sequences.

I also tried Biostrings and BSgenome, but it seems I can only query one gene at a time, and it didn't work with all 2700 genes at once.

Does anybody know what I could do?

 

Sukhdeep Singh (Netherlands) wrote 3.8 years ago:

Another way could be:

1) Download the whole-gene coordinates from BioMart (for all 2700 genes).

2) Repeat the same step, but with exons as the output.

3) Use subtractBed on the two files above (genes minus exons) to get the intron list.
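To make the subtraction step concrete, here is a minimal pure-Python sketch of what "genes minus exons" computes for a single gene. It is not how subtractBed is implemented; the function name `subtract_intervals` is made up for illustration, and the coordinates are assumed to be 0-based and half-open, as in BED files.

```python
def subtract_intervals(gene, exons):
    """Return the parts of `gene` that are not covered by any exon.

    `gene` is a (start, end) tuple and `exons` a list of such tuples,
    all 0-based and half-open (BED convention, assumed here).
    """
    gene_start, gene_end = gene
    introns = []
    cursor = gene_start  # leftmost position not yet accounted for
    for exon_start, exon_end in sorted(exons):
        if exon_end <= cursor:
            continue  # exon lies entirely to the left of the cursor
        if exon_start >= gene_end:
            break  # exon lies entirely to the right of the gene
        if exon_start > cursor:
            introns.append((cursor, exon_start))  # gap between exons
        cursor = max(cursor, exon_end)
    if cursor < gene_end:
        introns.append((cursor, gene_end))  # uncovered tail of the gene
    return introns


gene = (100, 1000)
exons = [(100, 200), (400, 500), (900, 1000)]
print(subtract_intervals(gene, exons))  # [(200, 400), (500, 900)]
```

For 2700 genes the same function would be called once per gene with that gene's exons; with BED files on disk, subtractBed does the equivalent in one pass.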

So now you have exon and intron coordinates, and you can write a small script in R/Perl to pull out the coordinates you need. Whatever you do, make sure you do it strand-specifically (on the -ve strand, the TSS and TTS are the other way around).
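As a rough illustration of the strand handling, the sketch below turns the coordinates of one gene's first exon into three BED records: 1000 bases upstream, the exon itself, and 1000 bases downstream. It is written in Python rather than R/Perl, and several things are assumptions: the function names are hypothetical, the input coordinates are taken to be 1-based and inclusive (as Ensembl reports them), the strand is given as "+" or "-" (BioMart reports 1 / -1, so a mapping step would be needed), and "first exon" means the exon that contains the TSS.

```python
def first_exon_regions(exon_start, exon_end, strand, flank=1000):
    """Return BED-style (start, end) pairs for upstream, first exon, downstream.

    exon_start/exon_end are 1-based inclusive forward-strand coordinates
    (assumed); the returned pairs are 0-based and half-open.
    """
    start0 = exon_start - 1  # convert to a 0-based BED start
    end0 = exon_end          # a 1-based inclusive end equals a BED end
    left = (max(0, start0 - flank), start0)
    right = (end0, end0 + flank)  # not clipped to the chromosome length
    if strand == "+":
        upstream, downstream = left, right
    elif strand == "-":
        # On the minus strand the TSS is at the right-hand end of the exon
        upstream, downstream = right, left
    else:
        raise ValueError("strand must be '+' or '-'")
    return {"upstream": upstream,
            "first_exon": (start0, end0),
            "downstream": downstream}


def to_bed_lines(gene_id, chrom, exon_start, exon_end, strand, flank=1000):
    """Format the three regions as six-column BED lines."""
    regions = first_exon_regions(exon_start, exon_end, strand, flank)
    return ["\t".join([chrom, str(s), str(e),
                       "{}_{}".format(gene_id, name), "0", strand])
            for name, (s, e) in regions.items()]


for line in to_bed_lines("GENE1", "chr1", 5001, 5200, "-"):
    print(line)
```

Note that the downstream window is a fixed 1000 bases, so it will run past the first intron whenever that intron is shorter than 1000 bases. If the lines are written to a file, `bedtools getfasta -s` can use the strand column to reverse-complement minus-strand records.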

Finally, if you want sequences (FASTA), use getFasta (bedtools getfasta).

HTH

Tariq Daouda (IRIC | Institute for Research in Immunology and Cancer) wrote 3.8 years ago:

Hi,

If you convert your hgnc_ids to Ensembl IDs using BioMart and don't mind a bit of Python, pyGeno lets you do that quite easily by querying genes/transcripts/exons by ID. Here are some example snippets.

To get the 5'UTR of a transcript:

print(trans.UTR5)

To get the first exon:

print(trans.exons[0].sequence)

To get the flanking regions:

chro = trans.chromosome
exon = trans.exons[0]

# One contiguous stretch: 1000 bases before the exon, the exon itself, and 1000 bases after it
print(chro.getSequence(exon.start - 1000, exon.end + 1000))
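If the three pieces are needed separately and in transcript orientation, something like the following sketch could be applied to the chromosome sequence. It works on a plain Python string and makes no pyGeno calls; the helper names are hypothetical, and it assumes 0-based half-open forward-strand coordinates and a strand given as "+" or "-", which may not match what pyGeno returns.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse-complement a DNA string (A/C/G/T/N, either case)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_regions(chrom_seq, exon_start, exon_end, strand, flank=1000):
    """Return (upstream, first_exon, downstream) in transcript orientation.

    exon_start/exon_end are 0-based half-open forward-strand coordinates
    (an assumption; adjust if your source uses 1-based coordinates).
    """
    left = chrom_seq[max(0, exon_start - flank):exon_start]
    exon = chrom_seq[exon_start:exon_end]
    right = chrom_seq[exon_end:exon_end + flank]
    if strand == "+":
        return left, exon, right
    # Minus strand: upstream is to the right on the forward strand,
    # and every piece has to be reverse-complemented.
    return (reverse_complement(right),
            reverse_complement(exon),
            reverse_complement(left))


toy_chrom = "AAAACCCGGTTTT"
print(split_regions(toy_chrom, 4, 7, "+", flank=2))  # ('AA', 'CCC', 'GG')
print(split_regions(toy_chrom, 4, 7, "-", flank=2))  # ('CC', 'GGG', 'TT')
```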