Is there a way to fetch genomic sequences at given coordinates without downloading fasta files?
5.2 years ago

So I have a list of start and stop positions along chromosomes in different species, and I'd like to get the corresponding DNA sequence for each set of coordinates. In the past, I've just downloaded the genome as a FASTA file and then used pyfaidx to extract the sequences at the given positions. But now that I'm working with several species at once, I was wondering if there's a tool in Python or R that can fetch the sequences of interest without downloading a bunch of large files. Thanks

Python R sequence genome DNA • 2.8k views
5.2 years ago
Satyajeet Khare ★ 1.6k

Here is one way to download the sequences using a DAS server. You can write a loop that fetches the sequence for each set of coordinates.
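As a rough sketch of what such a loop could call: the answer does not say which DAS server is meant, so the UCSC DAS endpoint and the `hg19` database name below are assumptions, and `das_url`, `parse_das_dna` and `fetch_das` are hypothetical helper names. The server replies with a small XML document whose `DNA` element holds the sequence broken over several lines, so the parser just strips the whitespace.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Assumption: the UCSC DAS endpoint; other DAS servers use a different base URL.
UCSC_DAS = "https://genome.ucsc.edu/cgi-bin/das"


def das_url(db, chrom, start, end, server=UCSC_DAS):
    """Build a DAS 'dna' request URL for one region (start and end as given)."""
    return f"{server}/{db}/dna?segment={chrom}:{start},{end}"


def parse_das_dna(xml_text):
    """Pull the sequence out of a DASDNA XML reply and drop line breaks."""
    root = ET.fromstring(xml_text)
    dna = root.find("./SEQUENCE/DNA")
    if dna is None or dna.text is None:
        raise ValueError("no DNA element found in DAS response")
    return "".join(dna.text.split())


def fetch_das(db, chrom, start, end, server=UCSC_DAS):
    """Download one region over HTTP and return it as a plain string."""
    with urlopen(das_url(db, chrom, start, end, server), timeout=30) as resp:
        return parse_das_dna(resp.read())
```

With a list of `(db, chrom, start, end)` tuples, something like `[fetch_das(*region) for region in regions]` would do the looping. It is worth checking a region you already know against the output before trusting the coordinate convention.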

5.2 years ago
Emily 23k

You can use the Ensembl REST API.
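For anyone wanting a starting point, here is a minimal sketch in Python against the `sequence/region` endpoint, using only the standard library. The helper names `region_url` and `fetch_sequence` are made up for illustration, and the species and coordinates in the usage note are just examples. Coordinates are passed through as `chrom:start..end:strand`.

```python
from urllib.parse import quote
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def region_url(species, chrom, start, end, strand=1, server=ENSEMBL_REST):
    """Build the URL for GET sequence/region/:species/:region as plain text."""
    if start > end:
        raise ValueError("start must not be greater than end")
    region = f"{chrom}:{start}..{end}:{strand}"
    return (
        f"{server}/sequence/region/{quote(species)}/{quote(region, safe=':')}"
        "?content-type=text/plain"
    )


def fetch_sequence(species, chrom, start, end, strand=1, server=ENSEMBL_REST):
    """Request one region and return the sequence as a string."""
    url = region_url(species, chrom, start, end, strand, server)
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode().strip()
```

Usage would look like `fetch_sequence("human", "X", 1000000, 1000100)`, called once per row of your coordinate list. For human GRCh37 you can pass `server="https://grch37.rest.ensembl.org"`. The public server is rate limited, so for long lists a short pause between requests is probably sensible.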


Does "GET sequence" on the REST API work for previous assemblies/releases of other species? I only know how to do it for GRCh37; is that the only one? Thanks :)


Unfortunately not. We've recently started providing REST access to archives, which can get you to previous assemblies, but only recent archives are available, and the only assembly that's changed in that time is pig. So the answer is yes for pig and for human GRCh37 (as you've discovered), and no for any other species.

5.2 years ago

There's a project called genomepy that handles the genome downloads and uses pyfaidx as the interface for queries. You could give that a shot, but you'd still be downloading files, just abstracting away the downloading process.
