Question: Filter out all known genes & regulatory elements for a given genome in a local BLAST search
Proteus000 (US) wrote, 14 months ago:

I am developing a script that will count the number of times a short nucleotide sequence hits non-coding regions of the human genome. Based on Google searches, BLAST+ appears to be the tool to use. Its documentation includes a few cookbook recipes on masking a database with FASTA files, which I want to leverage.
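A minimal sketch of the counting step, assuming the short sequence has already been searched with `blastn -outfmt 6` (tabular output with the default 12 columns, where the subject ID is column 2 and `sstart`/`send` are columns 9 and 10), that subject IDs match the chromosome names used in a hypothetical `masked` dictionary of 1-based inclusive intervals, and that a hit counts only if it touches no masked interval. `overlaps_mask` and `count_unmasked_hits` are illustrative names, not part of BLAST+.

```python
# Sketch: count BLAST+ tabular hits (-outfmt 6) that fall outside masked regions.
# Assumption: default 12 columns, so the subject ID is column 2 and
# sstart/send are columns 9 and 10 (1-based, inclusive).


def overlaps_mask(chrom, sstart, send, masked):
    """Return True if the hit [sstart, send] touches any masked interval on chrom.

    `masked` is a hypothetical dict: chromosome -> list of (start, end),
    1-based inclusive. For minus-strand hits BLAST reports sstart > send,
    so the coordinates are normalised first.
    """
    lo, hi = min(sstart, send), max(sstart, send)
    return any(start <= hi and end >= lo for start, end in masked.get(chrom, []))


def count_unmasked_hits(lines, masked):
    """Count hits whose subject coordinates avoid every masked interval."""
    count = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        fields = line.rstrip("\n").split("\t")
        chrom = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        if not overlaps_mask(chrom, sstart, send, masked):
            count += 1
    return count


if __name__ == "__main__":
    masked = {"chr1": [(100, 200)]}  # toy mask, not real gene coordinates
    hits = [
        "q1\tchr1\t100.0\t20\t0\t0\t1\t20\t150\t169\t1e-05\t40.1",  # inside the mask
        "q1\tchr1\t100.0\t20\t0\t0\t1\t20\t320\t301\t1e-05\t40.1",  # minus strand, outside
        "q1\tchr2\t100.0\t20\t0\t0\t1\t20\t150\t169\t1e-05\t40.1",  # no mask on chr2
    ]
    print(count_unmasked_hits(hits, masked))  # prints 2
```

Normalising `sstart`/`send` matters because BLAST reports minus-strand subject hits with the start coordinate larger than the end.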

I want to know if there is a way to pull all known transcripts for the human genome, add a 50-100 bp buffer to the 5' and 3' ends (to avoid potential regulatory elements), and write those sequences to a file. I did not see anything on NCBI suggesting that BLAST could do this task.
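Once transcript coordinates are available (for example as a BED file), the buffer itself is simple interval arithmetic. A minimal sketch, assuming 0-based half-open BED coordinates, a hypothetical `chrom_sizes` dictionary of chromosome lengths, and the same flank on both ends (so strand does not matter); overlapping padded intervals are merged so the resulting mask has no redundant entries. `pad_and_merge` and `to_bed_lines` are illustrative names.

```python
# Sketch: extend each transcript interval by a fixed flank on both sides,
# clip to chromosome bounds, and merge overlaps into a mask.
# Assumption: 0-based, half-open coordinates (BED convention).


def pad_and_merge(intervals, flank, chrom_sizes):
    """intervals: iterable of (chrom, start, end); returns chrom -> merged list."""
    padded = {}
    for chrom, start, end in intervals:
        lo = max(0, start - flank)                 # do not run off the chromosome start
        hi = min(chrom_sizes[chrom], end + flank)  # ... or past its end
        padded.setdefault(chrom, []).append((lo, hi))

    merged = {}
    for chrom, ivs in padded.items():
        ivs.sort()
        out = [list(ivs[0])]
        for lo, hi in ivs[1:]:
            if lo <= out[-1][1]:                   # overlapping or adjacent: extend
                out[-1][1] = max(out[-1][1], hi)
            else:
                out.append([lo, hi])
        merged[chrom] = [(lo, hi) for lo, hi in out]
    return merged


def to_bed_lines(merged):
    """Format merged intervals as tab-separated BED lines."""
    return [f"{chrom}\t{lo}\t{hi}" for chrom in sorted(merged) for lo, hi in merged[chrom]]


if __name__ == "__main__":
    transcripts = [("chr1", 1000, 2000), ("chr1", 2150, 3000), ("chr1", 10, 50)]  # toy data
    mask = pad_and_merge(transcripts, flank=100, chrom_sizes={"chr1": 3050})
    print("\n".join(to_bed_lines(mask)))
    # chr1    0       150
    # chr1    900     3050
```

With a 100 bp flank the two nearby transcripts on chr1 collapse into one masked block, which is usually what a mask file needs.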

Does anyone have a suggestion on how to accomplish this task?

Thanks in advance.


You can download all cDNA sequences from Ensembl; I am not sure what you mean by the buffer sequences, though.

ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/cdna/README

##################
Fasta cDNA dumps
#################

These files hold the cDNA sequences corresponding to Ensembl gene 
predictions. cDNA consists of transcript sequences for actual and possible
genes, including pseudogenes, NMD and the like. See the file names 
explanation below for different subsets of both known and predicted 
transcripts.
(Reply by Sej Modha, 14 months ago)
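Even when the cDNA sequences themselves are not wanted, the FASTA headers carry each transcript's genomic location, and that start/end spans the whole transcribed locus, introns included. A minimal sketch, assuming the header layout of the illustrative example below (location in the third whitespace-separated field as `coord_system:assembly:seq_region:start:end:strand`, 1-based inclusive); verify this against the actual release before relying on it. `parse_ensembl_header` and `genomic_spans` are hypothetical helpers.

```python
# Sketch: recover genomic coordinates from Ensembl cDNA FASTA headers.
# Assumption: the location is the third whitespace-separated token, in the form
# coord_system:assembly:seq_region:start:end:strand (1-based, inclusive).


def parse_ensembl_header(header):
    """Return (transcript_id, chrom, start0, end, strand) with a 0-based start."""
    tokens = header.lstrip(">").split()
    transcript_id = tokens[0]
    _coord_system, _assembly, chrom, start, end, strand = tokens[2].split(":")
    return transcript_id, chrom, int(start) - 1, int(end), int(strand)


def genomic_spans(fasta_lines):
    """Yield (chrom, start0, end) for every header line in a cDNA FASTA file."""
    for line in fasta_lines:
        if line.startswith(">"):
            _tid, chrom, start0, end, _strand = parse_ensembl_header(line)
            yield chrom, start0, end


if __name__ == "__main__":
    # Illustrative header in the Ensembl layout; check against the real file.
    header = ">ENST00000456328.2 cdna chromosome:GRCh38:1:11869:14409:1 gene:ENSG00000223972.5"
    print(parse_ensembl_header(header))
    # ('ENST00000456328.2', '1', 11868, 14409, 1)
```

The start is shifted to 0-based so the spans can be written straight out as BED intervals.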

Sej already pointed out that you can download cDNA sequences directly. Still, I do not see any biological basis for this "buffer". Do you mean untranslated regions, or gene promoters? Please leave a comment with some more details.

(Reply by ATpoint, 14 months ago)

Yes. Extending the sequence beyond the annotated gene is desirable so that my mask file also subsumes any regulatory elements in the UTRs. I am trying to create a local BLAST database that represents benign DNA, where any alterations would be presumed silent. My strategy is to go through the known genes and add extra base pairs to both ends to also block regulatory elements that may be nearby. Also, cDNA is unsuitable because it lacks introns, and I would like to mask introns as well.
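One way to build such a database is to keep the genome sequence intact and lowercase (soft-mask) the padded gene regions. A minimal sketch of the lowercasing step, assuming 0-based half-open intervals per chromosome; `soft_mask` and `format_fasta` are hypothetical helpers. BLAST+ ships a `convert2blastmask` utility that reads lowercase-masked FASTA and produces masking data for `makeblastdb`; its exact options should be checked in the BLAST+ user manual. Replacing masked bases with `N` (hard masking) is an alternative.

```python
# Sketch: soft-mask (lowercase) selected regions of a chromosome sequence and
# format it as FASTA. Assumption: intervals are 0-based, half-open.


def soft_mask(seq, intervals):
    """Return seq in uppercase, with bases inside each interval lowercased."""
    buf = bytearray(seq.upper(), "ascii")
    for start, end in intervals:
        lo, hi = max(0, start), min(len(buf), end)
        if lo < hi:
            buf[lo:hi] = buf[lo:hi].lower()  # same length, so slice assignment is safe
    return buf.decode("ascii")


def format_fasta(name, seq, width=60):
    """Wrap a sequence into FASTA text with fixed-width lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    masked = soft_mask("ACGTACGTAC", [(2, 5), (8, 20)])  # toy sequence and mask
    print(format_fasta("chr_toy", masked, width=4), end="")
    # >chr_toy
    # ACgt
    # aCGT
    # ac
```

A `bytearray` is used instead of a list of characters so that a full human chromosome stays reasonably compact in memory.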

(Reply by Proteus000, 14 months ago)
genomax (United States) wrote, 14 months ago:

I want to know if there is a way to pull all known transcripts for the human genome and put a 50-100bp buffer on the 5' and 3' ends (to avoid potential regulatory elements) and write those sequences to a file.

You can do that using Ensembl BioMart (click the BioMart link at the top of the Ensembl page). A video tutorial is available here.

There are other ways of getting this information, including the UCSC Table Browser.
