Question: Download Genomic Ranges From A List Of Coordinates Via Web
Anima Mundi (Italy) wrote 7.5 years ago:

Hello,

given a set of genomic positions (see the example below) in a file text.txt, how could I download all their FASTA sequences (i.e. 1 kb upstream and 1 kb downstream of the given position) via the web?

Example:

#Name    Chromosome    Position    Strand
Name_1    chr7    103482772    +
Name_2    chr7    103488456    +
modified 3.8 years ago by Biostar • written 7.5 years ago by Anima Mundi

To reiterate my point, context is still important. The solutions provided thus far are useful, but only for the UCSC genome browser. If you wanted data from, for instance, Carica papaya, then none of the useful answers thus far would address your question, since the UCSC browser does not host data for that organism. So defining the scope of your question would be immensely helpful to people that find this question in the future: do you mean the mouse genome, or the human genome, or common mammal model species, or all species found in the UCSC genome browser?

written 7.5 years ago by Daniel Standage

The process will be very different depending on the organism the data describe. I assume this is for the human genome. If you're looking for an answer that is specific to the human genome, making that explicit would be helpful.

written 7.5 years ago by Daniel Standage

This is for the mouse genome, but since I also work with other organisms I would prefer a more "general" solution. Thanks.

written 7.5 years ago by Anima Mundi

For instance, you have already gotten answers specific to human and mouse. While you may know what "hg19" means and how to change that to get the data you want, someone else looking for answers to this same question in the future may not.

written 7.5 years ago by Daniel Standage

Just to be clear, I don't think this is a bad question. In fact, I think it could be useful to many people in the future. My comments are intended to make this question a more useful resource to them when they are brought here by a Google search.

written 7.5 years ago by Daniel Standage

I understand your point, and I agree I should have been more specific. While the immediate problem I needed to solve was for the mouse, I wanted a "general" solution, but I agree that the concept of "general" is itself fuzzy in this case. So the question could be about solutions "as broad as possible". Thanks to all of you, you were a great help.

written 7.5 years ago by Anima Mundi
Sean Davis (National Institutes of Health, Bethesda, MD) wrote 7.5 years ago:
  1. Convert this to a BED file.
  2. Upload the resulting BED file to the UCSC Genome Browser as a custom track.
  3. Use the UCSC Table Browser to get the DNA sequence.

An alternative, after step 1, is to use Galaxy to do steps 2 and 3.
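
A minimal sketch of step 1, assuming the input has the four whitespace-separated columns shown in the question (Name, Chromosome, Position, Strand) and that Position is 1-based. The function name `to_bed` and the file names are hypothetical. BED starts are 0-based and BED ends are exclusive, so a window of 1000 bp on each side of position P becomes start = P - 1001, end = P + 1000.

```shell
#!/bin/sh
# Hypothetical helper: read "Name Chromosome Position Strand" rows on stdin
# and print 6-column BED (chrom, start, end, name, score, strand) with a
# flank on each side of the position. The flank defaults to 1000 bp.
to_bed() {
    flank="${1:-1000}"
    awk -v flank="$flank" '
        /^#/ || NF < 4 { next }        # skip the header and incomplete rows
        {
            start = $3 - flank - 1     # BED start is 0-based
            if (start < 0) start = 0   # do not run off the chromosome start
            end = $3 + flank           # BED end is exclusive
            printf "%s\t%d\t%d\t%s\t0\t%s\n", $2, start, end, $1, $4
        }'
}

# Usage (file names are assumptions):
#   to_bed 1000 < text.txt > regions.bed
```

The resulting file can then be uploaded as a custom track or to Galaxy; the assembly chosen there (e.g. mm9) has to match the one the coordinates refer to.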

written 7.5 years ago by Sean Davis

Unfortunately, I had a problem while trying to upload the BED file to UCSC. Galaxy worked instead, thanks.

written 7.5 years ago by Anima Mundi
Jeremy Leipzig (Philadelphia, PA) wrote 7.5 years ago:

This is not via a web service, but the R code is so portable that it shouldn't really matter.

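library(BSgenome.Mmusculus.UCSC.mm9)  # mouse genome, assembly mm9
library(ShortRead)                    # provides writeFasta()

# single-base range at the position of interest
gr <- GRanges("chr7", IRanges(103482772, 103482772), strand = "+")
# extend by 1000 bp on each side (position - 1000 to position + 1000);
# flank(gr, 1000, both = TRUE) would stop at position + 999
grwider <- gr + 1000
seqs <- getSeq(Mmusculus, grwider, as.character = FALSE)
names(seqs) <- "Name_1"
writeFasta(seqs, file = "myseqs.fa")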
modified 7.5 years ago • written 7.5 years ago by Jeremy Leipzig
Ian (University of Manchester, UK) wrote 7.5 years ago:

If you prefer a web-based solution try GALAXY. You can upload your coordinates as:

chr start end

and specify the correct genome version, e.g. mm9 (mouse).

You can then retrieve the sequence. It is also possible to add/subtract 1000 bp to a set of coordinates within GALAXY.

GALAXY is a great tool for "doing things" with genome coordinates.

written 7.5 years ago by Ian

This solution is exactly what I was searching for (although I also appreciated the non-web solutions proposed). I selected Sean's equivalent solution as the accepted answer because it came first. Thanks anyway.

written 7.5 years ago by Anima Mundi
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote 7.5 years ago:

using bash and the UCSC DAS server:

create the following XSLT stylesheet:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >

<xsl:output method="text" encoding="UTF-8"/>

<xsl:template match="/">
  <xsl:value-of select="DASDNA/SEQUENCE/DNA"/>
</xsl:template>

</xsl:stylesheet>

and run the following shell:

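# Skip the header line, then split each row into its four columns
# (read splits on tabs or spaces by default).
grep -v "^#" input.txt | while read -r NAME CHROM POS STRAND
do
    # FASTA header, e.g. >Name_1_chr7_103482772_+
    echo ">${NAME}_${CHROM}_${POS}_${STRAND}"
    # hg19 is the human assembly; change it (e.g. to mm9 for mouse) as needed.
    # Fetch POS +/- 1000 bp from the DAS server and keep only the DNA text.
    curl -s "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=${CHROM}:$((POS-1000)),$((POS+1000))" |\
    xsltproc --novalid stylesheet.xsl - | tr -d " \n" | fold -w 50
    echo
done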

result:

$ sh download.sh

>Name_1_chr7_103482772_+
atggatgttttaaaatcattgtcatacaaaccttgggtatctagacaaca
aatatcaaatatttgtcttctttgtcaaagttgtggttgaaggataggaa
tcaatagcagagtttcctttatccacattatacttcagcaagatttgact
cacaatgtctttaattagtttaaagttgcccccacactctcttttaagac
agtgacatacatttctgttccttccaggatgtcattttccttgacaagtc
ctcagttattttaagtttgtgactaaaccttgtgtgaaccccgttttccc
ccaggaatacttttctgtgcttttaatgtgcactctttgagtcttcaaaa
cggtaactagaagttctatgatcccccatctctacaagaaaatgtacata
tgttcataaaaatgtaggctactcgcttccaagaaacacaatgaatattt
tattaccaaaaataacccacctattgataacattacacattcatgttggg
tcaattctatatttcatagatgaaagatgtggattactcaaatctcttta
gtttataatttgccatggttagtgttaaagtgggttcaacaagccttggt
ctatttttcatgggtttcaagaactaagacatctgtggatagggtattac
caaacaaagggcaagtacattaaaattatgatttttttatgtgaaaaata
taaccccatatataaaaatgatacaattgtaaaagaaatattttattatt
tcaaacactttcacaaagcttggtacgatattttttcaggaagtttagca
aagttatcaccttatgtctacataagaagtgtaagctaagaatggcacaa
atatctaagatgatttccatttcctttccttttcatcatttgctctttct
ttaaaagggatatctaaaggcttccatcagttaataaaaaaaaaaaacac
agactgttctgaaaatgtagtttggaaagttagtgttatattgtaaatga
aaaagaaaaaatgaattatagaattcctttttatccctctttaatctgtt
aattcaaatatgaataactgtctacttacagaatttggcttgtctattat
tttctttctttcctaggtcaatgtactcagacatttcaacaaagccaaac
atgatttcatatgctgaaaagtaatcatagaatttcctaaaaacaccctt
attgcagcttatctgtgaagtaccagcctgatgaaaataggtatgaaaac
aatagctcttaagtagagtaatgctacaagatattgaatagtacgtgcac
acacactagcacatatacactgtgtatatttacttttcaaagcactgatt
tgattatttggtttcagatttaagtttaagaagccaaaaagcactaaaac
cttttaaaagtcattctggaattgtgtatctatggacttaagttagaaat
ggaagagaaactacctatttccacacctctagttagttctataaatagag
ccaattctaagtcaacttgattctttccttactcagtgcacttaaaagat
gagatgtcttgatgctgcctccccattcctctcccagaactaccatttac
tgaatgcctctctgtgccacgtttagagaggcaggagaggggaaaagttg
acagcatagaaaccctgtctgcctatgtttagaaccttgctcactgccaa
ggagttgtggaatcttgggcaggttactctatcattctattcctcagttt
cctttccaggaaaatgaggatgataataatagggtagctgtgaagagtaa
gtgagtgtacggcacacagtgttgtacatgttggctattattatcattcc
cattttaaagataaaggaaccaagactcaggaaattttttttttttagga
gacagggtcttgctctgtcacctaggctcaggtgcactggtatgatcaca
gctcactgcagcctcaactttccaggctcaagcaatcctcccacctcagc
c
>Name_2_chr7_103488456_+
ggcgactcctcaaggatctagaaacagaaataccatttgactcagcaatc
ccatcactgggtatatacccaaagggttataaatcattccactataaaga
cacatgcgcatgtatgtttattgcggcactgttcacaatagcaaagcctc
ggaaccaacccaaatgctcatcaatgatagactggataaagaaaatgtgg
cacatatacacctggaatactatgcagccataaaaggatgagttcatgtc
ctttgcagggacatggatgaagctggaaaccatcattctcagcaaactaa
cacaagaacaggaaaccaagcaccgcatgttctcactcataagtgggagt
tgaacaatgagaacacacggacacagggagggaaacatcacacaccaggg
cctgccagagggtggggggctagaggagggatagcactaggagaaatacc
taatgtagatgatgggttgatgggtgcagcaaaccaccaatgcatgtgta
cacctatgtaacaaacctgcatgtcctgcacatgtaccccataacttaaa
gtataataacaaaaaatgatataatatcatgtaactgacagataggtcaa
acttggcatttctttggcagagagaagagaggagaagagacaggcttgag
aatattggaggtagcttttaggacctggtgatagtttattttggttttgt
tttaaagaaagccttgttgaggctctttttactctcctacatggcttgta
tttatatctctaaagcctcctcttctttgagtttctgtggcctgcatatg
tgtaaaatctgactcctaaactggaagttgggttgtgaattaaacacaca
agagagcaggtcacaggagcaggaaccaggtgttaaggcagaatatagtt
ctggacaggtagccagcatgttgccgttagtttcatttagcaaagaaaag
aaaagacaaagaataaaatatttgataaaacatttctacctggagctgtt
tgtatatggcagcagtcacaggttatttagatataacattgctagggata
attcttttaggtcagccaatctggaggtacagttaaaataatcagatcct
gtttctacatacagaggctttctggagttagacaagggttaccacgcctc
ttgttcatcacttctactagggtttcatgctcagcgtgggtgatctccag
atcatttacctgtttaaatggaaattttgtttgagagcaggagaaacaca
gcactgagatatattctttaattctgcaaataaaatttcaacatttaatg
aattgaagccctgggtacagctattgacattttcagttggaaagcacaga
atataacctaattgaggggattttaagacatatagcttttctgggaggcc
tggagaagtaaacttgggtgttgcccagcaaagagtcatgtccaccccac
catgggacaggtccaacataaaaacaacagctacttccccccgaatcaac
agaactttcaccgcagttattcctccaaaataagaatcttcacttgatag
aatactgatctcgccatactgggtccccaggtcacattgcactcttcaat
atccagtctcatgaggctgagtctgataggtgtaaactaggtgacacgcc
ttcaccccagaaaggaaggaactatgggaattgtattttgggaagactat
actcacaatgtgggaaactatataaaaatgttgggcaatcatggatggta
catgcccattacagagggcaagttcaaaatctacaaaattttggacttct
tgatccaaaagtagctaaaataactgatagtttttaaaaattatggcctt
taggaccatttccaagaacctacactttgtgtacctcaccccttaccttg
aatgagcattctgcaatgggaagatattgtttatcacagtcaatctactt
gatgaacagcagcaaaagttccccataccctacctgggacagcctaacag
a
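
The script above hard-codes the hg19 (human) assembly in the DAS URL, while the question was about mouse. A small sketch of how the URL could be built with the assembly as a parameter, so that the same loop works for e.g. mm9. The function name `das_url` is hypothetical, and this assumes the assembly is one that the UCSC DAS server hosts:

```shell
#!/bin/sh
# Hypothetical helper: print the UCSC DAS "dna" URL for a window of
# FLANK bp on each side of a 1-based position.
# Arguments: assembly chromosome position [flank, default 1000]
das_url() {
    assembly="$1"
    chrom="$2"
    pos="$3"
    flank="${4:-1000}"
    start=$((pos - flank))
    if [ "$start" -lt 1 ]; then
        start=1                        # DAS coordinates start at 1
    fi
    end=$((pos + flank))
    printf 'http://genome.ucsc.edu/cgi-bin/das/%s/dna?segment=%s:%d,%d\n' \
        "$assembly" "$chrom" "$start" "$end"
}
```

Inside the loop, `curl -s "$(das_url mm9 "$CHROM" "$POS")"` would then replace the hard-coded URL.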
written 7.5 years ago by Pierre Lindenbaum