Hi,
I have received nanopore data with primary analysis from a sequencing center. It includes pod5 files, fastq files and bam files (with pass and fail folders for each). I checked the bam files and it looks like all reads are currently unmapped, even though the methylation information (MM/ML tags) is available. So it looks like basecalling was performed with a modified-base-aware model, but the alignment itself was skipped, or the wrong reference genome was used, or something like this. So I have 2 questions: 1) is it normal to receive bam files in this state, and 2) is there a way to do the alignment using the fastq files or sequences extracted from the bam files (for example with minimap2) and merge that with the methylation info available in the bam files? I am hoping not to have to go back to the pod5 files and redo both basecalling and alignment (for example with dorado), as the pod5 files are extremely large. Thanks in advance!
I think OP wants to align AND keep the methylation info, which is indeed the tricky part ...
Not entirely sure, but I think the dorado aligner subcommand takes bam as input (and I would thus assume/hope it carries over the extra bam info, such as the methylation info). Then use something like
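For the dorado route, here is a minimal sketch of how the call could be wrapped. It assumes dorado is on the PATH and that the usual form `dorado aligner <reference> <reads> > aligned.bam` applies to your version; the function names and file names are hypothetical placeholders, and whether the MM/ML tags survive should be checked on the output rather than taken for granted.

```python
import subprocess


def dorado_aligner_cmd(reference, reads_bam):
    """Build the argv for `dorado aligner <reference> <reads>` (hypothetical helper)."""
    return ["dorado", "aligner", reference, reads_bam]


def align_with_dorado(reference, reads_bam, out_bam):
    """Run dorado aligner and write its stdout (BAM) to out_bam."""
    with open(out_bam, "wb") as out:
        # check=True raises CalledProcessError if dorado exits non-zero
        subprocess.run(dorado_aligner_cmd(reference, reads_bam), stdout=out, check=True)


if __name__ == "__main__":
    # Placeholder paths; nothing is executed here, the command is only printed.
    print(" ".join(dorado_aligner_cmd("ref.fa", "reads.bam")), "> aligned.bam")
```

The output is not necessarily coordinate-sorted, so a samtools sort and samtools index step would probably follow.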
colindaven has an answer to start with fastq files (if they have calls) --> Nanopore long-read sequencing doubts and problems
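For the minimap2 route, one pattern that is often used is to carry the tags through the FASTQ header: samtools fastq -T MM,ML writes the tags into the read header comment, and minimap2 -y copies that comment back into the alignment output. This only works if the FASTQ headers actually contain the tags, so starting from the unmapped bam is the safer option. A minimal sketch, assuming samtools and minimap2 are on the PATH; the helper names, file names, thread count and the map-ont preset are assumptions to adapt.

```python
import shlex
import subprocess


def build_pipeline(unaligned_bam, reference, out_bam, threads=4):
    """Return the three pipeline stages as argv lists (hypothetical helper)."""
    return [
        # -T MM,ML: copy the modification tags into the FASTQ header comment
        ["samtools", "fastq", "-T", "MM,ML", unaligned_bam],
        # -y: copy the FASTQ comment (the tags) into the SAM output
        ["minimap2", "-ax", "map-ont", "-y", "-t", str(threads), reference, "-"],
        # sort by coordinate and write the final bam
        ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"],
    ]


def run_pipeline(commands):
    """Chain the commands with pipes, like `a | b | c` in a shell."""
    procs = []
    prev_stdout = None
    for i, cmd in enumerate(commands):
        last = i == len(commands) - 1
        proc = subprocess.Popen(
            cmd, stdin=prev_stdout, stdout=None if last else subprocess.PIPE
        )
        if prev_stdout is not None:
            prev_stdout.close()  # drop our copy so upstream sees a closed pipe
        prev_stdout = proc.stdout
        procs.append(proc)
    codes = [p.wait() for p in procs]
    if any(codes):
        raise RuntimeError(f"pipeline failed with exit codes {codes}")


if __name__ == "__main__":
    cmds = build_pipeline("reads.bam", "ref.fa", "aligned.sorted.bam")
    # Print the equivalent shell pipeline; call run_pipeline(cmds) with real paths.
    print(" | ".join(shlex.join(c) for c in cmds))
```

Afterwards it is worth confirming on a few aligned reads that MM and ML are still there before running any downstream methylation tool.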
Yes, keeping both the mapping info and the methylation info at the same time is the intention. Thanks so much for the response! I will try it.
When you have a bam, make sure you check that the methylation tags MM and ML are present. Something like
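One way to do that check is to look at the optional fields of a few records as printed by samtools view. The sketch below works on plain SAM text lines, so it needs nothing beyond the standard library; the function names and the sample record are made up for illustration.

```python
def mod_tags_present(sam_line):
    """Report whether a SAM record line carries MM and ML tags."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional tags start at column 12 and look like TAG:TYPE:VALUE
    tags = {f.split(":", 1)[0] for f in fields[11:]}
    return {"MM": "MM" in tags, "ML": "ML" in tags}


def summarize(lines):
    """Count records, and records that have both MM and ML, skipping headers."""
    total = 0
    with_both = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        total += 1
        present = mod_tags_present(line)
        if present["MM"] and present["ML"]:
            with_both += 1
    return total, with_both


if __name__ == "__main__":
    # Synthetic unmapped record (flag 4) with 5mC calls on both C bases.
    sample = "read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGTCG\tIIIIII\tMM:Z:C+m,0,0;\tML:B:C,200,15"
    print(mod_tags_present(sample))
    print(summarize(["@HD\tVN:1.6", sample]))
```

In practice the lines could come from something like `samtools view aligned.bam | head -n 1000` piped into a small script that calls summarize on sys.stdin.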