Hi,
I want to compare a Nanopore read in FASTQ format to a reference genome of an animal (an .fna file) in order to determine whether the read came from this animal's DNA.
What tools and methodology would you recommend for this?
I tried starting with these steps:
A. Filter by quality and trim the adapters with Trimmomatic. I am not sure this step is OK, since Trimmomatic seems to handle Illumina adapters and may not handle Nanopore adapters. The original FastQC report indicated that the encoding for this read is Sanger/Illumina, and I don't know what that means.
B. Use minimap2 to align
C. Use samtools to create a BAM file for the read
D. Index the BAM file with samtools
....
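For reference, steps B to D could be sketched roughly as below. This is only a minimal sketch, assuming minimap2 and samtools are installed and on the PATH; the file names and the helper names `build_commands` and `run_pipeline` are hypothetical and not from the original post. It uses minimap2's `map-ont` preset for Nanopore reads and sorts the alignments before indexing, since `samtools index` needs a coordinate-sorted BAM.

```python
import subprocess


def build_commands(reference, reads, prefix):
    """Return the argv lists for steps B-D (align, sort to BAM, index).

    All file names are placeholders chosen for illustration.
    """
    sam = f"{prefix}.sam"
    bam = f"{prefix}.sorted.bam"
    return [
        # B. Align with the Nanopore preset; -a requests SAM output, -o names the file.
        ["minimap2", "-a", "-x", "map-ont", "-o", sam, reference, reads],
        # C. Coordinate-sort the alignments and write them as BAM.
        ["samtools", "sort", "-o", bam, sam],
        # D. Index the sorted BAM (writes an index file next to it).
        ["samtools", "index", bam],
    ]


def run_pipeline(reference, reads, prefix):
    """Run each step in order and stop at the first failing command."""
    for argv in build_commands(reference, reads, prefix):
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("reference.fna", "read.fastq", "read_vs_ref")` would then leave `read_vs_ref.sorted.bam` and its index in the working directory, assuming those input files exist.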
What additional steps and tools should I use to make the actual comparison?
Call SNPs using the aligned data file. Clair3 (LINK) is a great option for that. This will show you positions that differ from the reference. This only makes sense if the alignments are good. You could check that with a simple samtools idxstats run.
Thanks, is there something else? There is this solving environment and frozen state issue (even with new environments). What should the samtools idxstats output look like if it's good?
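On the idxstats question: the output is tab-separated with four columns (reference sequence name, sequence length, number of mapped read segments, number of unmapped read segments), and the final line, named `*`, counts reads that did not map anywhere. For a single read, a good result would be a nonzero mapped count on one contig and 0 in the last column of the `*` line. A minimal sketch for summarizing that output, assuming it has been captured as text; the function name `summarize_idxstats` and the contig names in the example are made up for illustration:

```python
def summarize_idxstats(text):
    """Summarize the tab-separated output of `samtools idxstats`.

    Columns: name, length, mapped read segments, unmapped read segments.
    The line named "*" holds reads with no assigned reference.
    """
    mapped_by_contig = {}
    unmapped = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _length, mapped, unmapped_here = line.split("\t")
        unmapped += int(unmapped_here)
        if name != "*":
            mapped_by_contig[name] = int(mapped)
    total_mapped = sum(mapped_by_contig.values())
    total = total_mapped + unmapped
    return {
        "mapped": total_mapped,
        "unmapped": unmapped,
        "fraction_mapped": total_mapped / total if total else 0.0,
        # Contigs that received at least one alignment.
        "hit_contigs": [c for c, n in mapped_by_contig.items() if n > 0],
    }


# Hypothetical output for one read that aligned to a contig called "chr1".
example = "chr1\t1000\t1\t0\nchr2\t500\t0\t0\n*\t0\t0\t0\n"
summary = summarize_idxstats(example)
```

If the read did not align at all, every contig would show 0 mapped and the `*` line would carry the read in its last column. Note that the mapped column counts alignment records, so a long read split into several alignments may be counted more than once.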
There are hundreds of really nice tutorials and explanations for all this basic stuff here:
https://training.galaxyproject.org/training-material/topics/variant-analysis/tutorials/non-dip/tutorial.html