Question regarding minimap2 presets

4.8 years ago

Rogerio Ribeiro ▴ 110

Hi Biostars,

I have a question regarding minimap2. For my biological problem, I am doing differential expression and differential transcript usage analysis on six samples (3 vs 3), sequenced with a cDNA protocol on a MinION. To start, I built a genome-guided transcriptome using minimap2 + StringTie. For the alignment step, I used these commands:

minimap2 -t 8 -ax splice cro_v2_asm.mmi 01_filtering/after_trim/all_reads_nano.fastq > 02_annotation/all_reads.sam
samtools view -q 40 -b -F 2304 02_annotation/all_reads.sam | samtools sort -@ 8 -o all_reads_filtered.bam
samtools index all_reads_filtered.bam
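For anyone reading the filter above: -F 2304 drops records whose FLAG has bit 256 (secondary) or bit 2048 (supplementary) set, and -q 40 drops alignments with MAPQ below 40. A minimal bash sketch that decodes such a mask, assuming the FLAG bit names samtools uses; decode_flag is a hypothetical helper, not a samtools command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the SAM FLAG bits set in an integer mask.
# Bit order follows the SAM spec (bit 0 = 1 = PAIRED ... bit 11 = 2048 = SUPPLEMENTARY).
decode_flag() {
  local flag=$1
  local -a names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                  READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
  local -a out=()
  local i
  for i in "${!names[@]}"; do
    # Test whether bit i is set in the mask.
    if (( flag & (1 << i) )); then
      out+=("${names[i]}")
    fi
  done
  local IFS=,   # join the names with commas
  echo "${out[*]}"
}

decode_flag 2304   # prints SECONDARY,SUPPLEMENTARY
```

Since minimap2 gives unmapped records MAPQ 0, -q 40 removes those as well, so what survives the filter is at most one primary alignment per mapped read.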

Recently, I saw a post suggesting that Nanopore reads should be aligned with the -ax map-ont option. However, from reading the manual, I had assumed this option was reserved for genomic reads, i.e. reads obtained from genome (DNA) sequencing.

From the minimap2 manual:

./minimap2 -ax map-ont ref.fa ont.fq.gz > aln.sam         # Oxford Nanopore genomic reads
./minimap2 -ax splice ref.fa rna-reads.fa > aln.sam       # spliced long reads (strand unknown)

The question arose while I was experimenting with GitHub pipelines for differential expression (DEG) analysis: one of them used the minimap2 map-ont preset to map long reads to a transcriptome, hence my confusion given what the manual says.
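The distinction in the manual boils down to a small decision: reads from spliced RNA need a splice-aware preset when aligned to a genome, whereas transcript sequences are already spliced, so an unspliced preset such as map-ont can be used against a transcriptome. A sketch of that rule of thumb in bash, assuming ONT reads; choose_preset is a hypothetical helper, and the script only prints a command rather than running minimap2 (reads.fastq and aln.sam are placeholder names):

```shell
#!/usr/bin/env bash
# Hypothetical helper encoding a rule of thumb (assumption, ONT reads only):
#   RNA/cDNA reads vs a genome      -> spliced alignment ("splice")
#   DNA reads vs a genome           -> unspliced alignment ("map-ont")
#   any reads vs transcript sequences -> unspliced ("map-ont"), transcripts carry no introns
choose_preset() {
  local reference=$1 reads=$2
  case "$reference:$reads" in
    genome:rna)      echo "splice" ;;
    genome:dna)      echo "map-ont" ;;
    transcriptome:*) echo "map-ont" ;;
    *) echo "unknown reference/read type: $reference/$reads" >&2; return 1 ;;
  esac
}

# Print (not run) the command for cDNA reads against the genome index used in this post.
echo "minimap2 -t 8 -ax $(choose_preset genome rna) cro_v2_asm.mmi reads.fastq > aln.sam"
```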

What is usually the best preset when mapping ONT RNA-seq reads to a genome?

Can you run both alignments with the options you mention above and compare? The result may differ in a data-dependent manner: with long-read data, reads may be a few kb long or as long as 50 kb.

4.8 years ago by GenoMax 152k

Yes, my reads range from 50 bp to 30 kb, with most of them roughly between 350 and 900 bp. Based on read length compared with the average predicted gene size in my species, I think most of my reads are not full-length transcripts. Furthermore, the RIN values of my samples were mostly below 7 (one sample was so degraded that the instrument could not even compute a RIN value).
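A length distribution like the one described above can be checked directly from the FASTQ. A minimal awk-based sketch, assuming standard four-line FASTQ records; the bin edges (350 and 900 bp) are taken from this post, and toy_read is a hypothetical generator of test records only:

```shell
#!/usr/bin/env bash
# Bin read lengths from FASTQ (file arguments or stdin); assumes 4-line records.
length_bins() {
  awk 'FNR % 4 == 2 {                       # 2nd line of each record = sequence
         len = length($0); n++
         if (len < 350) short++
         else if (len <= 900) mid++
         else long++
       }
       END { printf "total=%d lt350=%d 350-900=%d gt900=%d\n", n, short, mid, long }' "$@"
}

# Hypothetical toy record of length n (all A bases, all I qualities).
toy_read() {
  local n=$1
  printf '@read_%s\n%s\n+\n%s\n' "$n" \
    "$(printf 'A%.0s' $(seq "$n"))" "$(printf 'I%.0s' $(seq "$n"))"
}

{ toy_read 100; toy_read 500; toy_read 1200; } | length_bins
# prints: total=3 lt350=1 350-900=1 gt900=1
```

On the real data this would be run as length_bins 01_filtering/after_trim/all_reads_nano.fastq.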

I ran the alignment with both presets, and these are my results:

minimap2 -t 8 -ax splice cro_v2_asm.mmi 01_filtering/after_trim/all_reads_nano.fastq > 02_annotation/all_reads.sam
samtools view -q 40 -b -F 2304 02_annotation/all_reads.sam | samtools sort -@ 8 -o all_reads_filtered.bam

5847950 reads aligned

minimap2 -t 8 -ax map-ont genome_illumina_annot/cro_v2_asm.mmi 01_filtering/after_trim/all_reads_nano.fastq > all_reads.sam
samtools view -q 40 -b -F 2304 all_reads.sam > all_reads_filtered_test.bam

5699078 reads aligned
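For scale, the gap between the two counts is small. A trivial sketch of the arithmetic, using only the numbers reported above:

```shell
#!/usr/bin/env bash
# Read counts reported above for the two presets.
splice_reads=5847950
mapont_reads=5699078

diff=$(( splice_reads - mapont_reads ))
# Relative gain of splice over map-ont, in percent (awk for floating point).
pct=$(awk -v a="$splice_reads" -v b="$mapont_reads" 'BEGIN { printf "%.2f", 100 * (a - b) / b }')

echo "splice aligned $diff more reads than map-ont (+${pct}%)"
# prints: splice aligned 148872 more reads than map-ont (+2.61%)
```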

I counted the number of aligned reads (distinct read names), not the number of alignments, with:

samtools view all_reads.bam | cut -f 1 | sort -u | wc -l
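The difference between counting alignment records and counting distinct reads can be shown on a toy example. A minimal sketch, assuming SAM text on stdin (from a BAM it would come from samtools view file.bam); the records in toy_sam are hypothetical:

```shell
#!/usr/bin/env bash
# Count alignment records vs distinct read names (QNAME, column 1); header lines (@...) skipped.
count_records() { awk '!/^@/ { n++ } END { print n + 0 }'; }
count_reads()   { awk -F '\t' '!/^@/ && !seen[$1]++ { n++ } END { print n + 0 }'; }

# Hypothetical toy SAM: read1 has a primary (FLAG 0) and a supplementary (FLAG 2048) record.
toy_sam() {
  printf '@HD\tVN:1.6\n'
  printf 'read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\t*\n'
  printf 'read1\t2048\tchr2\t500\t60\t4M\t*\t0\t0\tACGT\t*\n'
  printf 'read2\t16\tchr1\t900\t60\t4M\t*\t0\t0\tACGT\t*\n'
}

toy_sam | count_records   # prints 3
toy_sam | count_reads     # prints 2
```

For the filtered BAMs in this thread, -F 2304 has already removed secondary and supplementary records, so the two counts should agree there.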

It seems -ax splice worked better in this case. But my original question was whether map-ont should be used to align RNA-seq reads to a genome.

4.8 years ago by Rogerio Ribeiro ▴ 110

It sounds like in your case it is not going to make a big difference. I think splice is the appropriate preset for RNA-seq data that is expected to be spliced, especially when reads are longer than 1-2 kb.

4.8 years ago by GenoMax 152k