Question: Filter out all known genes & regulatory elements for a given genome in a local blast search
Proteus00 (US) wrote, 2.0 years ago:

I am developing a script that will count the number of times a short nucleotide sequence hits non-coding regions of the human genome. Based on Google searches, BLAST+ appears to be the tool to use. The BLAST+ documentation has a few cookbook recipes about masking a database with a FASTA file, which I want to leverage.
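
The counting step could look roughly like the sketch below. It assumes the search was run with tabular output (-outfmt 6 with the default columns, where columns 9 and 10 are sstart and send) and that the masked intervals are 1-based, inclusive, and already merged so they do not overlap. The function names and the dictionary layout are hypothetical, not part of BLAST+.

```python
import bisect
from collections import Counter


def build_index(masked):
    """masked: dict of chrom -> list of (start, end), 1-based inclusive.

    Assumption: intervals on each chromosome do not overlap each other.
    """
    index = {}
    for chrom, intervals in masked.items():
        intervals = sorted(intervals)
        index[chrom] = ([start for start, _ in intervals], intervals)
    return index


def overlaps_masked(index, chrom, start, end):
    """Return True if [start, end] touches any masked interval on chrom."""
    if chrom not in index:
        return False
    starts, intervals = index[chrom]
    # Last masked interval whose start is <= end of the hit.
    i = bisect.bisect_right(starts, end)
    return i > 0 and intervals[i - 1][1] >= start


def count_unmasked_hits(blast_lines, index):
    """Count, per query, the hits that fall entirely outside masked regions."""
    counts = Counter()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with sstart > send.
        lo, hi = min(sstart, send), max(sstart, send)
        if not overlaps_masked(index, sseqid, lo, hi):
            counts[qseqid] += 1
    return counts
```

The binary search only works because the intervals are sorted and non-overlapping; with overlapping intervals, merge them first.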

I want to know if there is a way to pull all known transcripts for the human genome, put a 50-100 bp buffer on the 5' and 3' ends (to avoid potential regulatory elements), and write those sequences to a file. I did not see anything on NCBI suggesting BLAST could do this task.

Does anyone have a suggestion on how to accomplish this task?

Thanks in advance.


You can download all cDNA sequences from Ensembl. I am not sure what you mean by the buffer sequences, though.

ftp://ftp.ensembl.org/pub/release-93/fasta/homo_sapiens/cdna/README

##################
Fasta cDNA dumps
#################

These files hold the cDNA sequences corresponding to Ensembl gene 
predictions. cDNA consists of transcript sequences for actual and possible
genes, including pseudogenes, NMD and the like. See the file names 
explanation below for different subsets of both known and predicted 
transcripts.
(comment written 2.0 years ago by Sej Modha)

Sej already pointed out that you can download cDNA sequences directly. Still, I do not see any biological basis for this "buffer". Do you mean untranslated regions, or gene promoters? Please leave a comment with some more details.

(comment written 2.0 years ago by ATpoint)

Yes, extending the sequence beyond the stated gene is desirable to subsume any regulatory elements in the UTR for my mask file. I am trying to create a local BLAST database that represents benign DNA, where any alterations would be presumed silent. My strategy for this would be to go through known genes and add additional bp to both ends to also block regulatory elements that may be nearby. Also, cDNA is undesirable since I would like to avoid all introns as well.
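
The strategy described above (extend each gene on both ends, then treat the result as one mask) can be sketched as follows. This is a minimal illustration, assuming gene coordinates are given as 0-based, half-open intervals (BED style) and that chromosome lengths are known; the function name and argument layout are hypothetical. Using the full gene span rather than cDNA means introns are covered by the mask as well.

```python
def pad_and_merge(genes, flank, chrom_sizes):
    """Extend each gene by `flank` bp on both sides and merge overlaps.

    genes: iterable of (chrom, start, end), 0-based half-open.
    chrom_sizes: dict of chrom -> length, used to clamp the 3' extension.
    Returns a sorted list of non-overlapping (chrom, start, end) tuples.
    """
    by_chrom = {}
    for chrom, start, end in genes:
        padded_start = max(0, start - flank)
        padded_end = min(chrom_sizes[chrom], end + flank)
        by_chrom.setdefault(chrom, []).append((padded_start, padded_end))

    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlapping or adjacent: grow the current interval.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged
```

For example, with a 100 bp flank, genes at chr1:50-300 and chr1:450-700 become 0-400 and 350-800, which merge into a single masked interval 0-800.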

(comment written 2.0 years ago by Proteus00)
genomax (United States) wrote, 2.0 years ago:

"I want to know if there is a way to pull all known transcripts for the human genome and put a 50-100bp buffer on the 5' and 3' ends (to avoid potential regulatory elements) and write those sequences to a file."

You can do that using BioMart (click on the BioMart link at the top of the page). A video tutorial is available here.

There are other ways of getting this information, including the UCSC Table Browser.
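
Once gene coordinates have been exported from one of these tools as a tab-delimited table, turning them into padded intervals might look like the sketch below. The column names (gene_id, chrom, start, end) are assumptions for illustration, so adjust them to whatever headers the export actually contains. The sketch also assumes the input uses 1-based inclusive coordinates (as Ensembl does) and writes 0-based half-open BED lines.

```python
import csv


def table_to_padded_bed(table, out, flank=100, chrom_col="chrom",
                        start_col="start", end_col="end",
                        name_col="gene_id"):
    """Read a tab-delimited gene table and write padded BED lines.

    table: open text file (or any iterable of lines) with a header row.
    out: writable text file.
    Returns the number of records written.
    Note: the downstream end is not clamped to the chromosome length here.
    """
    reader = csv.DictReader(table, delimiter="\t")
    written = 0
    for row in reader:
        start = int(row[start_col])  # 1-based inclusive
        end = int(row[end_col])      # 1-based inclusive
        bed_start = max(0, start - 1 - flank)  # convert to 0-based, then pad
        bed_end = end + flank                  # half-open end, padded
        out.write(f"{row[chrom_col]}\t{bed_start}\t{bed_end}\t{row[name_col]}\n")
        written += 1
    return written
```

The resulting BED file could then be passed to a sequence-extraction tool (bedtools getfasta is one option) to produce the FASTA file for masking.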
