I have a small question that I hope someone can clarify for me. I have 12 ONT samples from my species, and I'm going to use them to improve the annotation of the genome.
I concatenated all the reads into one big file (all_reads_nano.fastq) and I'm currently aligning them with minimap2. My code is as follows:
minimap2 -k 14 -I 1000G -d cro_v2_asm.mmi cro_v2_asm.fasta
minimap2 -t 8 -ax splice cro_v2_asm.mmi all_reads_nano.fastq > all_reads.sam
I have used minimap2 before, but only to align single files or to compare transcript databases. My output looks like this:
[WARNING] Indexing parameters (-k, -w or -H) overridden by parameters used in the prebuilt index.
[M::main::0.990*1.00] loaded/built the index for 2090 target sequence(s)
[M::mm_mapopt_update::1.323*1.00] mid_occ = 477
[M::mm_idx_stat] kmer size: 14; skip: 10; is_hpc: 0; #seq: 2090
[M::mm_idx_stat::1.517*1.00] distinct minimizers: 21032401 (46.44% are singletons); average occurrences: 4.535; average spacing: 5.673
[M::worker_pipeline::667.809*7.91] mapped 926625 sequences
[M::worker_pipeline::1358.631*7.93] mapped 1054148 sequences
[M::worker_pipeline::1946.186*7.93] mapped 979868 sequences
[M::worker_pipeline::2521.346*7.94] mapped 990987 sequences
[M::worker_pipeline::3107.039*7.94] mapped 953722 sequences
[M::worker_pipeline::3740.257*7.94] mapped 976724 sequences
[M::worker_pipeline::4417.527*7.94] mapped 1133642 sequences
[M::worker_pipeline::5062.460*7.94] mapped 1034305 sequences
[M::worker_pipeline::5811.450*7.94] mapped 1164408 sequences
[M::worker_pipeline::6558.900*7.94] mapped 1139990 sequences
[M::worker_pipeline::6861.026*7.94] mapped 477750 sequences
[M::main] Version: 2.15-r905
[M::main] CMD: minimap2 -t 8 -ax splice genome_illumina_annot/cro_v2_asm.mmi 01_filtering/after_trim/all_reads_nano.fastq
[M::main] Real time: 6861.119 sec; CPU: 54470.448 sec; Peak RSS: 8.451 GB
As you can see, minimap2 seems to be mapping the input file in batches rather than all at once, probably because of memory.
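One sanity check I could run is to add up the per-batch counts from the log and compare the total with the number of reads in the FASTQ. This is only a rough sketch: it assumes the minimap2 stderr output was saved to a file (minimap2.log is a hypothetical name, as is the helper function sum_mapped), and that each "mapped N sequences" line reports the reads processed in one batch.

```shell
# Sum the per-batch read counts reported in a saved minimap2 log.
# Usage: sum_mapped minimap2.log   (minimap2.log is a hypothetical file name)
sum_mapped() {
    # Keep only the "mapped N sequences" fragments, then add up the N values.
    grep -o 'mapped [0-9]* sequences' "$1" |
        awk '{ total += $2 } END { print total + 0 }'
}
```

For the log above, the eleven batches add up to 10832169 sequences. If the FASTQ uses plain four-line records, that figure can be compared with the line count of all_reads_nano.fastq divided by four.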
I was wondering whether this batching introduces any nuances or changes in the output file, or whether I can freely process it with samtools and assemble the transcriptome with stringtie2.