Importing big sequences into Python
1
0
Entering edit mode
8.2 years ago
auryndb ▴ 70

How does one goes importing big sequences into Python (for example a complete genome of one chromosome)? Is there any practical way to import smaller chunks of data instead of loading the whole data into Python? Does it matter? What function do you use? Do you simply use the read function?

sequencing genome • 2.1k views
ADD COMMENT
2
Entering edit mode
8.2 years ago

That depends what you want to do with the data -

If you want to cycle over all sequences in a fasta file, I'd use Biopython's SeqIO.parse("your_file.fasta","fasta") iterator. This won't load the entire file into memory.

from Bio import SeqIO
total = 0
for seq in SeqIO.parse("file","fasta"):
  total += len(seq)
print total

is a roundabout way to count the total length of all sequences in a fasta-file, for example.

If you need random access to entire sequences, I'd use SeqIO.index_db("index", "your_file.fasta", "fasta") - this will create an on-disk SQLite database with the file offsets (here "index"), so it's only slow the first time you run the script, after that it'll use the index instead, and the created index works just like a Python dictionary. Using an index means that you won't have to load the whole file into memory.

from Bio import SeqIO
index = SeqIO.index_db("index_file","file","fasta"):
my_seq = index["contig_1"]

If you need random access to regions of random sequences, I'd use bedtools getfasta.

If you just want to read a single sequence into memory use SeqIO.read(), which complains if you have more than one sequence in a fasta file, but that will load the whole thing into memory.

ADD COMMENT
1
Entering edit mode

I would use pyfaidx for all of these use cases, but I'm biased... It will re-use the samtools index, and provides random access to sequences and sequence substrings using a dictionary interface, so it's one less API to learn, and it's very efficient.

ADD REPLY
0
Entering edit mode

How did I miss this!?

ADD REPLY
0
Entering edit mode

I'll fire my marketing team.

ADD REPLY

Login before adding your answer.

Traffic: 2434 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6