Hadoop InputFormat for FASTA files?
9.4 years ago

I'm interested in analyzing large FASTA files (like the human genome and proteome) in parallel using Spark or Pydoop. Is there a library that implements FASTA parsing as a Hadoop InputFormat?

hadoop fasta • 2.6k views

Maybe the "Hadoop FASTA reader" gist at gist.github.com/jflatow/45551?

This looks like it works well for a FASTA file with many small records, since it seeks locally on each worker. However, for a FASTA file with large contigs (like the human genome), it wouldn't perform very well.
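To make the concern concrete, here is a minimal pure-Python sketch of the split convention that Hadoop-style record readers commonly follow: each worker gets a byte range and owns every record whose `>` header line begins inside it, reading past the end of the range to finish its last record. This is not the code from the gist; `read_fasta_split` and its parameters are hypothetical names made up for illustration, and the stream is assumed to be a seekable binary file.

```python
import io


def read_fasta_split(stream, start, end):
    """Yield (header, sequence) for records whose '>' line begins in [start, end).

    Hypothetical helper, not part of Hadoop, Spark, or Pydoop.
    """
    pos = start
    if start > 0:
        # Check the byte before the split: if it is not a newline we landed
        # mid-line, and that line belongs to whoever owns the previous bytes.
        stream.seek(start - 1)
        if stream.read(1) != b"\n":
            pos += len(stream.readline())  # discard the partial line
    else:
        stream.seek(0)

    header, chunks = None, []
    while True:
        line_start = pos
        line = stream.readline()
        pos += len(line)
        if not line or line.startswith(b">"):
            # End of file or a new header closes the record in progress.
            if header is not None:
                yield header, b"".join(chunks).decode("ascii")
            # Stop at EOF, or when the next header belongs to the next split.
            if not line or line_start >= end:
                return
            header, chunks = line[1:].strip().decode("ascii"), []
        elif header is not None:
            # Sequence lines seen before any owned header are skipped: they
            # belong to a record that started in an earlier split.
            chunks.append(line.strip())


def read_all_splits(data, split_size):
    """Simulate workers: read every split of `data` and return per-split results."""
    stream = io.BytesIO(data)
    return [
        list(read_fasta_split(stream, s, min(s + split_size, len(data))))
        for s in range(0, len(data), split_size)
    ]
```

With many small records, every split yields roughly the same number of records. With one large contig, the worker that owns the header reads the entire sequence (across all later splits) while the other workers find nothing to do, which is the imbalance described above.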
