12 days ago
Sowmya Pulapet
Could anybody suggest a tool to convert Nanopore fast5 files to fasta files? I have already tried Poretools and Nanopolish, without success. I understand that fast5 can be converted directly to fastq, but I need the reads in fasta format too. Would it work if I converted fast5 to fastq and then fastq to fasta?
I hope somebody can help me with this.
Hey! I have some experience working with Nanopore data.
You have .fast5 files: are these directly from the sequencer, or are they files that have been processed and re-compressed back into .fast5?
If the files are straight off the sequencer, they are not basecalled and the only thing in them is raw signal data. To get from your files to fastq files for each read, you will need to basecall them.
I recommend using guppy. Here is a good guide. You can download guppy here (requires an ONT account). It is written in CUDA, so it requires an Nvidia GPU; I would not recommend trying to basecall nanopore data on a CPU. If you do not have access to an Nvidia GPU, you can try basecalling with dorado (download dorado here). It is written with LibTorch and runs on Apple silicon, AMD, and Nvidia GPUs. I have only tested it on Apple silicon, where it had decent performance.
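A minimal sketch of a guppy run over a folder of raw fast5 files, wrapped in a hypothetical helper function `basecall_fast5`. It assumes `guppy_basecaller` is on your PATH and that the default config `dna_r9.4.1_450bps_hac.cfg` is only an example: pick the one that matches your flow cell and kit (`guppy_basecaller --print_workflows` lists them).

```shell
# Minimal sketch: basecall a folder of raw fast5 files with guppy on an Nvidia GPU.
# Assumptions (adjust to your setup): guppy_basecaller is on PATH, and the
# default config below is just an example; use the one for your flow cell/kit.
basecall_fast5() {
  local in_dir="$1" out_dir="$2" config="${3:-dna_r9.4.1_450bps_hac.cfg}"
  guppy_basecaller \
    --input_path "$in_dir" \
    --save_path "$out_dir" \
    --config "$config" \
    --recursive \
    --device cuda:0          # first Nvidia GPU
}
```

Usage: `basecall_fast5 ~/path/to/fast5/folder ~/basecalled`. The fastq files end up under the save path; recent guppy versions usually split them into `pass/` and `fail/` subfolders.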
The main issue with dorado is that it does not have demultiplexing built in (yet), but if your samples are not barcoded, that is not a problem. If they are barcoded and you cannot access an Nvidia GPU, you can probably use one of dozens of standalone tools to demultiplex your samples (I have not done this myself yet).
You can also try basecalling with bonito, which is written in PyTorch. Download bonito here.
If your data is already basecalled, here is a primer on working with fast5 files: epi2me-labs: fast5 tutorial. From what I can tell, though, you probably need to basecall your data first. Once your data is in fastq format, you can align it to a reference with minimap2.
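A minimal sketch of that alignment step, wrapped in a hypothetical helper `align_reads`. It uses minimap2's `map-ont` preset for nanopore reads and `-a` for SAM output; the reference and read file names are placeholders for your own files.

```shell
# Minimal sketch: align basecalled nanopore reads to a reference with minimap2.
# The file names passed in are placeholders for your own reference and reads.
align_reads() {
  local ref="$1" reads="$2" out_sam="$3"
  # -a: write SAM output; -x map-ont: preset tuned for Oxford Nanopore reads
  minimap2 -ax map-ont "$ref" "$reads" > "$out_sam"
}
```

Usage: `align_reads reference.fa reads.fastq aln.sam`.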
Let me know if that did not answer your question! I am more than happy to help!
Good luck with your analysis!
I believe I have raw signal data, not basecalled reads. I will try basecalling with Guppy to convert the fast5 files to fastq format.
I assume that afterwards I can convert the fastq to fasta format. Unfortunately, I need the files in fasta format as well.
Thank you for the detailed information.
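Going from fastq to fasta only means keeping each header (with `@` swapped for `>`) and the sequence, and dropping the `+` line and the quality line. A minimal sketch in awk, with a hypothetical function name, assuming every record is exactly four lines (no wrapped sequences), which is typical of basecaller output:

```shell
# Minimal sketch: convert FASTQ to FASTA.
# Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
fastq_to_fasta() {
  # FNR % 4 == 1: header line -> replace the leading '@' with '>'
  # FNR % 4 == 2: sequence line -> print unchanged
  # the '+' and quality lines are dropped
  # With no file arguments, awk reads from stdin.
  awk 'FNR % 4 == 1 { print ">" substr($0, 2) }
       FNR % 4 == 2 { print }' "$@"
}
```

Usage, assuming guppy wrote its passing reads to `~/basecalled/pass/`: `cat ~/basecalled/pass/*.fastq | fastq_to_fasta > reads.fasta`. The sketch counts lines instead of matching `@`, because a quality string can itself start with `@`. If seqtk is installed, `seqtk seq -a reads.fastq > reads.fasta` does the same conversion.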
It might help if you include some information on the error messages from Poretools and Nanopolish.
Nanopolish just displays the message:
The command used is:
nanopolish extract -o fast5.fasta ~/path/to/fast5/folder
Poretools didn't give any errors or other messages when I ran it. It simply stopped after creating an empty output file.
poretools fasta ~/path/to/fast5/folder > fast5.fa
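A likely explanation, though not confirmed in this thread: both `poretools fasta` and `nanopolish extract` read basecalls that are stored inside the fast5 files, so raw files straight off the sequencer give them nothing to extract, which would fit the empty output. One way to check is to list a file's HDF5 groups with `h5ls -r` (from the HDF5 command-line tools) and look for a basecall group. The sketch below assumes basecalled files contain a group named like `Analyses/Basecall_1D_000`, as guppy writes when asked for fast5 output; both function names are hypothetical.

```shell
# Sketch: decide whether fast5 files already contain basecalls.
# Assumption: basecalled files carry an Analyses/Basecall_1D_* group;
# raw files straight off the sequencer do not.
fast5_has_basecalls() {
  # Reads the recursive group listing produced by `h5ls -r` on stdin.
  grep -q 'Basecall_1D'
}

report_fast5_dir() {
  local f
  for f in "$1"/*.fast5; do
    [ -e "$f" ] || continue            # the glob matched nothing
    if h5ls -r "$f" | fast5_has_basecalls; then
      echo "$f: basecalled"
    else
      echo "$f: raw signal only"
    fi
  done
}
```

Usage: `report_fast5_dir ~/path/to/fast5/folder`.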
Do you need the fasta for each individual read? or are you trying to build a reference genome?
Yes, I need a fasta file for each individual read.
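If one fasta file per read is needed, a multi-read fasta can be split on its header lines. A sketch in awk with a hypothetical function name; it names each output file after the read ID (the first word of the header, without `>`) and assumes IDs are unique and safe to use as file names, which usually holds for nanopore read IDs since they are UUIDs.

```shell
# Sketch: write each record of a multi-read FASTA into its own file.
# Assumption: read IDs are unique and safe as file names; a repeated ID
# would overwrite the earlier file.
split_fasta_per_read() {
  local in_fa="$1" out_dir="$2"
  mkdir -p "$out_dir"
  awk -v dir="$out_dir" '
    /^>/ {
      if (out) close(out)              # free the previous file handle
      out = dir "/" substr($1, 2) ".fasta"
    }
    out != "" { print > out }          # skip any lines before the first header
  ' "$in_fa"
}
```

Usage: `split_fasta_per_read reads.fasta per_read/`. For a full run this creates one file per read, which can mean hundreds of thousands of files; closing each file after its record keeps awk from running out of open file handles.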