Question: Extracting An Up Stream Or Downstream Sequence From Given Position
gravatar for Janake
8.6 years ago by
United States
Janake160 wrote:

I need to extract an upstream and a downstream sequences flanking putative small RNA regions. Does anyone know a python script to this? Thanks,

data sequence genomics small • 5.4k views
ADD COMMENTlink modified 8.6 years ago by brentp23k • written 8.6 years ago by Janake160

Unclear: what is a "small RNA region"? A segment of chromosome encoding putative small RNA transcript?

ADD REPLYlink written 8.6 years ago by Neilfws48k

I am trying to find conserved small RNA in using ESTs. I have blast known miRNAs to the EST collection. I have the locations where miRNA match to each EST. Now I need to get upstream and downstream regions from these ESTs and see if they fold like miRNA precursors. Hope this is clear.

ADD REPLYlink written 8.6 years ago by Janake160

I am working a similar project. Please message me if you still need help. I'm a student at CWRU - Biology Dept.

ADD REPLYlink written 8.5 years ago by Moss20
gravatar for Mary
8.6 years ago by
Boston MA area
Mary11k wrote:

Is it internal/unpublished sequence or from one of the existing genomes? If it is a sequenced genome, this is possible from either BioMart or UCSC Genome Browser style browser. If it is an internal project, the script jockeys can help you I'm sure.

BioMart (or the specific Mart for your species if it's out there):

  1. Go to
  2. Click Martview
  3. Select the database and data set.
  4. In Filters, set the region. Use this item to enter your list: Multiple Chromosomal Regions (Chr:Start:End:Strand)
  5. In Attributes, choose Sequences. You also need to pick a nucleotide box* here, but I'm not sure which for this application.
  6. Set and upstream and downstream flank.
  7. Click results.

*Although now that I look at Sequences I'm not sure which radio to choose for this purpose. I'm going to have to ask them that. I think I'd choose exon for now.


  1. Go to the Table Browser
  2. Set appropriate species/assembly stuff.
  3. On Region line, set items in Define Region button.
  4. Choose output format = sequence
  5. click Get Output.
  6. On next page say Genomic
  7. Set upstream/downstream length
  8. Get Sequence button

Or you can do all of either one from Galaxy (, and you can store that as a workflow you can always go back to later if you need it again. There is also a "Get Flanks" in the Galaxy menu "Operate on Genomic Intervals".

ADD COMMENTlink written 8.6 years ago by Mary11k

Thanks Mary. However, this is not a sequenced genome, but I will keep in mind your method for future use. Thanks a lot again.Kiriya

ADD REPLYlink written 8.6 years ago by Janake160
gravatar for brentp
8.6 years ago by
Salt Lake City, UT
brentp23k wrote:

here's a "script-jockey" python solution (untested, but should get you close). if, in your blast the est is the subject, then you can substitute 'query' with 'subject' and 'q(start|end)' with 's(start|end)'

import sys
from pyfasta import Fasta

est_fasta_file = sys.argv[1]

# i'll assume it's tab-delimited...
# and est is the query.
est_mirna_blast = sys.argv[2]

# take 100bp up/down-stream.
xstream = 100

est_fasta = Fasta(est_fasta_file)

for line in open(est_mirna_blast):
    # convert to int and 0-based coords.
    qstart, qstop, sstart, sstop = [int(x) - 1 for x in line.split("\t")[6:10]]
    query, subject = line.split("\t")[:2]
    up = min(0, qstart - xstream)
    down = qstop + xstream + 1

    est_upstream = est_fasta[query][up:qstart]
    est_dowstream = est_fasta[query][qstop:down]

    print ">%s_up" % query
    print est_upstream
    print ">%s_down" % query
    print est_dowstream
ADD COMMENTlink written 8.6 years ago by brentp23k

I'm running ubuntu and am having trouble running pyfasta... any suggestions? I am unfamiliar with python, so I am not sure how to "make" "install".

ADD REPLYlink written 8.5 years ago by Moss20

@Moss just run "sudo python install" or if you have setuptools: "sudo easy_install pyfasta"

if you have any troubles. send me an email bpederse[?]

ADD REPLYlink written 8.5 years ago by brentp23k

Hello brentp I would like to do same thing & I tried the above script you wrote. it gives me the following error message while I am running it

Traceback (most recent call last):
  File "", line 5, in <module>
    EST_030516_lnr_tab.fa = sys.argv[1]
IndexError: list index out of range

Could you please tell me what I did wrong?

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by tcf.hcdg60
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 795 users visited in the last hour