Question: Filter out all known genes & regulatory elements for a given genome in a local blast search
Proteus000 (US) wrote, 7 months ago:

I am developing a script that will count the number of times a short nucleotide sequence hits non-coding regions of the human genome. Based on Google searches, BLAST+ appears to be the tool to use. It has a few cookbook recipes about masking a database with a FASTA file, which I want to leverage.

I want to know if there is a way to pull all known transcripts for the human genome and put a 50-100bp buffer on the 5' and 3' ends (to avoid potential regulatory elements) and write those sequences to a file. I did not see anything on NCBI suggesting BLAST could do this task.

Does anyone have a suggestion on how to accomplish this task?

Thanks in advance.


You can download all cDNA sequences from Ensembl. I am not sure what you mean by the buffer sequences, though.

ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/cdna/README

##################
Fasta cDNA dumps
#################

These files hold the cDNA sequences corresponding to Ensembl gene 
predictions. cDNA consists of transcript sequences for actual and possible
genes, including pseudogenes, NMD and the like. See the file names 
explanation below for different subsets of both known and predicted 
transcripts.
written 7 months ago by Sej Modha (modified 7 months ago)

Sej already pointed out that you can download cDNA sequences directly. Still, I do not see any biological basis for this "buffer". Do you mean untranslated regions, or gene promoters? Please leave a comment with some more details.

written 7 months ago by ATpoint

Yes, extending the sequence beyond the stated gene is desirable to subsume any regulatory elements in the UTR for my mask file. I am trying to create a local BLAST database that represents benign DNA, where any alterations would be presumed silent. My strategy is to go through known genes and add additional bp to both ends to also block regulatory elements that may be nearby. Also, cDNA is undesirable, since it lacks introns and I would like to avoid all introns as well.
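
The padding strategy described above can be sketched in a few lines of Python. This is only an illustration, assuming gene coordinates are already available as 0-based, half-open (chrom, start, end) tuples; the function name `pad_and_merge`, the `flank` parameter, and the example coordinates are hypothetical and not taken from any tool mentioned in this thread. Overlapping padded intervals are merged so that each base is masked only once.

```python
from collections import defaultdict


def pad_and_merge(genes, flank=100, chrom_sizes=None):
    """Extend each interval by `flank` bp on both sides, then merge overlaps.

    genes: iterable of (chrom, start, end), 0-based half-open (assumed).
    chrom_sizes: optional dict of chromosome lengths used to clamp the end.
    Returns a dict mapping chrom to a sorted list of merged (start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in genes:
        lo = max(0, start - flank)            # do not run off the left edge
        hi = end + flank
        if chrom_sizes and chrom in chrom_sizes:
            hi = min(hi, chrom_sizes[chrom])  # do not run off the right edge
        by_chrom[chrom].append((lo, hi))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for lo, hi in intervals[1:]:
            if lo <= out[-1][1]:              # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], hi)
            else:
                out.append([lo, hi])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


# Hypothetical coordinates, for illustration only.
example = [("chr1", 1000, 2000), ("chr1", 2150, 3000),
           ("chr1", 50, 120), ("chr2", 500, 900)]
mask = pad_and_merge(example, flank=100, chrom_sizes={"chr1": 10000, "chr2": 950})
```

Working from whole-gene coordinates rather than cDNA means introns fall inside the masked intervals automatically.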

written 7 months ago by Proteus000 (modified 7 months ago)
genomax (United States) wrote, 7 months ago:

I want to know if there is a way to pull all known transcripts for the human genome and put a 50-100bp buffer on the 5' and 3' ends (to avoid potential regulatory elements) and write those sequences to a file.

You can do that using BioMart (click on the BioMart link at the top of the page). A video tutorial is available here.

There are other ways of getting this information, including the UCSC Table Browser.
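
Once transcript coordinates have been exported, the buffer can be added with a short script. The sketch below assumes the export is tab-separated BED6 (chrom, start, end, name, score, strand); the function name `pad_bed6`, the flank sizes, and the sample records are hypothetical. It is strand-aware, so the 5' and 3' flanks can differ, and it only clamps at position 0 (clamping at chromosome ends would need a table of chromosome sizes).

```python
def pad_bed6(lines, upstream=100, downstream=50):
    """Yield BED6 lines extended 5' (upstream) and 3' (downstream) of each feature.

    Assumes tab-separated BED with strand in column 6; records without a
    strand column are treated as '+'. Header and comment lines are skipped.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        strand = fields[5] if len(fields) > 5 else "+"
        if strand == "-":
            # On the minus strand the 5' end is at the larger coordinate.
            left, right = downstream, upstream
        else:
            left, right = upstream, downstream
        fields[1] = str(max(0, start - left))  # clamp at chromosome start
        fields[2] = str(end + right)           # not clamped at chromosome end
        yield "\t".join(fields)


# Hypothetical records, for illustration only.
bed_text = [
    "#chrom\tstart\tend\tname\tscore\tstrand\n",
    "chr1\t1000\t2000\ttxA\t0\t+\n",
    "chr1\t5000\t6000\ttxB\t0\t-\n",
    "chr2\t30\t400\ttxC\t0\t+\n",
]
padded = list(pad_bed6(bed_text))
```

The padded intervals can then be used to extract sequence from the genome FASTA for the mask file. bedtools provides a `slop` command that performs a similar extension, if an existing tool is preferred over a custom script.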

written 7 months ago by genomax (modified 7 months ago)