Fetch genomic region(s) from RefSeq genomes
8 months ago
usr2 ▴ 10

I would like to fetch specified genomic regions from RefSeq genomes without having to download the full genome. The regions were previously identified with a HMMER search. As I understand it, Ensembl does not have all RefSeq genomes. Many thanks, D
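Because the regions come from a HMMER search, the hit table already holds the coordinates needed for any of the retrieval methods below. The sketch assumes nucleotide hits from `nhmmer --tblout`, where column 1 is the target sequence name, columns 7 and 8 are the alignment start and end, and column 12 is the strand. The function name `parse_nhmmer_tblout` is hypothetical. nhmmer reports minus-strand hits with the start coordinate larger than the end (check this against your own output), so each pair is reordered into a 1-based, inclusive interval.

```python
def parse_nhmmer_tblout(text):
    """Yield (target, start, end, strand) tuples from nhmmer --tblout text.

    Assumes the nhmmer tabular layout: field 1 = target name,
    fields 7-8 = ali from / ali to, field 12 = strand.
    """
    for line in text.splitlines():
        # Skip blank lines and the '#' header/footer comment lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target = fields[0]
        ali_from, ali_to = int(fields[6]), int(fields[7])
        strand = fields[11]
        # Minus-strand hits have ali_from > ali_to; normalize to start <= end
        start, end = min(ali_from, ali_to), max(ali_from, ali_to)
        yield target, start, end, strand
```

For example, `for target, start, end, strand in parse_nhmmer_tblout(open("hits.tbl").read()): print(f"{target}:{start}-{end}", strand)` prints one region per hit, where `hits.tbl` is a hypothetical file name.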

Please give us some examples of your input.

8 months ago
GenoMax 141k

Using Entrez Direct (EDirect):

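One thing to be aware of: if you start from a top-level RefSeq assembly accession for a bacterial genome, that assembly may contain several sequences (chromosome, plasmids, etc.), and the requested region will be retrieved from each of them. Linking from the assembly can also return the GenBank counterpart of a RefSeq record (CP062740.1 alongside NZ_CP062740.1 in the example below). You may need to eliminate some of those sequences.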

$ esearch -db assembly -query GCF_017165115 | elink -target nuccore | efetch -format fasta -seq_start 20000 -seq_stop 20050
>NZ_CP062740.1:20000-20050 Escherichia coli O157:H7 strain Z1723 plasmid pZ1723-1, complete sequence
CATCCGCTTGCAGCACACCGCTGAAGCAGGCAAGATGAGTCTGCGGATGGA
>NZ_CP062739.1:20000-20050 Escherichia coli O157:H7 strain Z1723 chromosome, complete genome
ACATACATTAAGCCTTAATTTTCCTCTGACAACGGTCAGTGCAGCAAACAA
>CP062740.1:20000-20050 Escherichia coli O157:H7 strain Z1723 plasmid pZ1723-1, complete sequence
CATCCGCTTGCAGCACACCGCTGAAGCAGGCAAGATGAGTCTGCGGATGGA
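One way to eliminate the unwanted records is to filter the FASTA output by its headers. This is a minimal sketch that assumes, as in the output above, that RefSeq records carry an `NZ_` or `NC_` prefix and that the chromosome is the record whose description contains the word "chromosome". Neither assumption holds for every submission, and other RefSeq prefixes (e.g. `NW_`) would need adding. `parse_fasta` and `keep_refseq_chromosomes` are hypothetical helper names.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def keep_refseq_chromosomes(records):
    """Keep RefSeq (NZ_/NC_) records whose description mentions 'chromosome'."""
    kept = []
    for header, seq in records:
        accession = header.split()[0]
        is_refseq = accession.startswith(("NZ_", "NC_"))
        if is_refseq and "chromosome" in header.lower():
            kept.append((header, seq))
    return kept
```

Applied to the three records above, only the `NZ_CP062739.1` chromosome record is kept. Each kept pair can be written back out as `f">{header}\n{seq}\n"`.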
8 months ago
barslmn ★ 2.1k

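samtools faidx can retrieve regions from a remote FASTA file over HTTP(S) without downloading the full genome, provided the FASTA index (.fai) is available alongside it.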

e.g.:

samtools faidx https://igv.genepattern.org/genomes/seq/hg19/hg19.fasta chr21:10,000,000-10,000,020
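For many hits, regions can be listed in a file, one per line, and passed with `samtools faidx -r` (`--region-file`, available in recent samtools releases; check `samtools faidx` help for your version). This is a minimal sketch that formats hypothetical `(name, start, end)` tuples as 1-based, inclusive `name:start-end` strings and reorders reversed coordinates such as minus-strand HMMER hits.

```python
def region_string(name, start, end):
    """Format a 1-based, inclusive region as samtools expects: name:start-end."""
    # Tolerate reversed (e.g. minus-strand) coordinates by reordering them
    lo, hi = min(start, end), max(start, end)
    if lo < 1:
        raise ValueError("samtools regions are 1-based; start must be >= 1")
    return f"{name}:{lo}-{hi}"


def region_lines(regions):
    """Return the text of a region file: one name:start-end per line."""
    return "".join(region_string(name, start, end) + "\n" for name, start, end in regions)
```

Write the returned text to a file (e.g. a hypothetical `regions.txt`) and run `samtools faidx -r regions.txt` followed by the FASTA path or URL.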