Hello everyone. First, I am sorry for this long and rather amateur question. I am trying to build a SNP-calling pipeline for Oxford Nanopore MinION long reads. I need to test the pipeline, but there seems to be very little public test data available. The only dataset I have is NA12878, from this address: https://github.com/nanopore-wgs-consortium/NA12878/blob/master/Genome.md
I downloaded the FAST5 data labelled "FAB43577" (listed as having 427,215 reads and 2,776,702,333 bases) and basecalled it with Guppy v5.0.1 using the command:
guppy_basecaller -i /home/huk/Desktop/nanopore_data/na12878_fast5/data2/UCSC/FAB43577-3574887596_Multi -s /home/huk/Desktop/nanopore_data/na12878_fast5/data2/guppy_out -c dna_r9.4.1_450bps_fast.cfg --trim_barcodes --trim_strategy dna --num_callers 1 --cpu_threads_per_caller 12
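As a rough sanity check on the numbers listed for this run, the expected mean depth can be estimated as sequenced bases divided by genome size. The sketch below assumes a genome size of about 3.1 Gb for hg38 (my assumption, not a figure from the dataset page) and ignores reads lost to the pass filter or left unmapped; `estimate_depth` is a hypothetical helper name.

```shell
# Estimate mean depth = sequenced bases / genome size.
# The default genome size (3.1e9) is an assumed approximate value for hg38.
estimate_depth() {
    bases=$1
    genome_size=${2:-3100000000}
    LC_ALL=C awk -v b="$bases" -v g="$genome_size" \
        'BEGIN { printf "%.2f\n", b / g }'
}

# Bases listed for FAB43577 on the dataset page.
estimate_depth 2776702333   # prints 0.90
```

A value below 1x would mean most positions are covered by at most one read, which may matter for callers that apply a minimum-coverage threshold.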
Then I merged all the FASTQ files in the "pass" folder of the Guppy output with the "cat" command to get a single FASTQ file, and aligned it to hg38 with minimap2:
minimap2 -ax map-ont -t 12 /home/huk/Desktop/references/hg38/hg38.mmi /home/huk/Desktop/nanopore_data/na12878_fast5/data2/guppy_out/pass/all_data.fastq --MD > /home/huk/Desktop/nanopore_data/na12878_fast5/data2/minimap_output/mapped_12878_2.md.sam
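To confirm that the merged FASTQ roughly matches the listed totals (427,215 reads and 2,776,702,333 bases, minus whatever Guppy put in the "fail" folder), the reads and bases can be counted directly. This is a minimal sketch that assumes four lines per record with unwrapped sequences; `fastq_stats` is a hypothetical helper name.

```shell
# Count reads and total bases in an uncompressed FASTQ file.
# Assumes exactly four lines per record (sequence not wrapped).
fastq_stats() {
    awk 'NR % 4 == 2 { reads++; bases += length($0) }
         END { printf "reads=%.0f bases=%.0f\n", reads + 0, bases + 0 }' "$1"
}

# Example call (path taken from the minimap2 command above):
# fastq_stats /home/huk/Desktop/nanopore_data/na12878_fast5/data2/guppy_out/pass/all_data.fastq
```

If the counts are far below the listed totals, the problem would be upstream of alignment and variant calling.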
I converted the SAM file to BAM with samtools, then sorted and indexed it. Then I used Longshot to call variants on chr20 only, with the command:
longshot --bam /home/huk/Desktop/nanopore_data/na12878_fast5/data2/minimap_output/mapped_12878_2_sorted_md.bam --ref /home/huk/Desktop/references/hg38/hg38.fa -F -r chr20 --out /home/huk/Desktop/nanopore_data/na12878_fast5/data2/vcf_output/longshot_result.vcf
My final VCF has 827 variants (without filtering). I downloaded the high-confidence VCF file for NA12878 from this link: https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv3.3.2/GRCh38/supplementaryFiles/
In this VCF, chr20 has about 67,957 SNPs. I compared the variants, and only 8 of them are common to both VCFs.
I also tried nanopolish index and nanopolish variants for variant calling, but the final VCF is completely empty (only the standard VCF header and comment lines).
I am not sure why I get such a low number of variants. If anyone can give me a hint or tell me what I am doing wrong, I would be really grateful. I am completely stuck here. If you know of any other test data for variant calling with Oxford Nanopore MinION reads, that would be awesome too.
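For reference, a comparison like this can be sketched as an exact match on chromosome, position, REF, and ALT. The sketch below is only an illustration, not necessarily how the 8 shared variants above were counted: it takes uncompressed VCFs, strips a leading "chr" so that "chr20" and "20" compare equal, and ignores genotypes and differences in how multi-allelic sites or indels are represented. `vcf_keys` and `shared_variants` are hypothetical helper names.

```shell
# Print sorted, unique CHROM:POS:REF:ALT keys for one chromosome of a VCF.
# A leading "chr" is stripped from both the query and the records.
vcf_keys() {
    awk -F '\t' -v c="$2" '
        BEGIN { sub(/^chr/, "", c) }
        /^#/  { next }                       # skip header lines
        {
            name = $1
            sub(/^chr/, "", name)
            if ((name "") == (c ""))         # force string comparison
                print name ":" $2 ":" $4 ":" $5
        }' "$1" | LC_ALL=C sort -u
}

# Count keys present in both VCFs on the given chromosome.
shared_variants() {
    a=$(mktemp)
    b=$(mktemp)
    vcf_keys "$1" "$3" > "$a"
    vcf_keys "$2" "$3" > "$b"
    LC_ALL=C comm -12 "$a" "$b" | wc -l | tr -d ' '
    rm -f "$a" "$b"
}

# Example call (the truth file name is a placeholder; decompress .vcf.gz first):
# shared_variants longshot_result.vcf truth.vcf chr20
```

An exact-match count is a lower bound on agreement; a dedicated tool such as bcftools isec may give a more careful comparison.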
Thank you in advance.