Quantification using salmon in alignment-based mode after minimap2 run
1 day ago
Assa Yeroslaviz ★ 1.9k

I'm trying to quantify my ONT samples after using minimap2 to align them to the genome.

The samples were prepared using a direct-RNA protocol and were therefore mapped using the FASTA file from the Ensembl repository, which lists the chromosomes, not the mouse transcriptome.

REFERENCE='Mmu.GrCm39.fa' # chromosomes 
minimap2 -ax splice -k 14 -uf \
     --secondary=no -G 25000 -t 24 ${REFERENCE} file.fastq > file.sam

Now, the bam file lists the chromosomes in the header.

@HD     VN:1.6  SO:coordinate
@SQ     SN:1    LN:195154279
@SQ     SN:10   LN:130530862
@SQ     SN:11   LN:121973369
@SQ     SN:12   LN:120092757
@SQ     SN:13   LN:120883175
@SQ     SN:14   LN:125139656
@SQ     SN:15   LN:104073951
@SQ     SN:16   LN:98008968
@SQ     SN:17   LN:95294699
...
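
A quick way to confirm what a SAM file was aligned against is to list the reference names in its header. This is a minimal sketch, assuming a plain-text SAM file such as the file.sam written above (for a BAM file, the header would first have to be extracted with samtools). The function name sam_ref_names is made up for illustration.

```shell
# Hypothetical helper: print the reference names (SN: fields of @SQ lines)
# found in the header of a plain-text SAM file.
# Chromosome-style names (1, 10, X, ...) point to a genome alignment;
# transcript IDs (e.g. ENSMUST...) point to a transcriptome alignment.
sam_ref_names() {
    awk -F'\t' '
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) print substr($i, 4)
        }
        !/^@/ { exit }   # stop reading at the first alignment record
    ' "$1"
}

# Usage (not executed here): show the first few reference names
# sam_ref_names file.sam | head -n 5
```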

If I understand it correctly, to run salmon quant I must have a transcriptome as reference.

Does this mean I have to re-run minimap2 against the mouse transcriptome, Mus_musculus.GRCm39.cdna.all.fa, instead?

Are there other quantification workflows that do not require re-running the mapping against the transcriptome?

Thanks

Assa

nanopore ont salmon • 257 views

In case you were not aware ONT also provides a transcriptome workflow for long reads: https://github.com/epi2me-labs/wf-transcriptomes


Thanks for the comment. I am aware of the workflow, and I saw that it uses StringTie for quantification. I've had good experience with pseudo-alignment tools such as kallisto and salmon in past bulk RNA-seq projects, and thought I could apply them here as well.

1 day ago
Rob 7.1k

As ATpoint suggests, you must align against the transcriptome. The other relevant flags should also be set appropriately (i.e. --secondary=no disables reporting of secondary alignments, but you want to retain multimapping for the purposes of quantification). Regardless, if you are looking to quantify long-read RNA-seq data (ONT or PacBio reads), then I'd recommend you consider using oarfish (paper here) rather than salmon with the --ont flag. This is because we've designed oarfish from the ground up for long-read quantification. Also, oarfish supports directly aligning your raw reads against a target transcriptome (it uses minimap2-rs, a Rust wrapper around minimap2, internally, so you don't have to worry about getting all of the flags / settings right yourself).
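
For anyone who still wants to stay with salmon, the re-alignment described above could look roughly like the sketch below. It is an illustration under assumptions, not a recommended recipe: the file names, thread counts, the -N 100 limit on secondary alignments, and the -l A library type are placeholders to adjust, and quantify_with_salmon is a made-up helper name.

```shell
# Assumed transcriptome FASTA (the Ensembl cDNA file named in the question)
TRANSCRIPTS='Mus_musculus.GRCm39.cdna.all.fa'

# Hypothetical helper: align reads to the transcriptome, then quantify
# with salmon in alignment-based mode.
#   $1 = FASTQ file with the ONT reads
#   $2 = output prefix
quantify_with_salmon() {
    reads=$1
    out=$2
    # Align against transcripts, not chromosomes. Secondary alignments are
    # kept (up to -N per read) so multimapping reads remain available.
    minimap2 -t 24 -ax map-ont -N 100 "$TRANSCRIPTS" "$reads" \
        | samtools view -@ 4 -b -o "${out}.bam" -
    # Alignment-based salmon run with the long-read (--ont) error model;
    # the BAM file must refer to the same transcript FASTA passed via -t.
    salmon quant --ont -p 24 -t "$TRANSCRIPTS" -l A \
        -a "${out}.bam" -o "${out}_salmon"
}

# Usage (not executed here):
# quantify_with_salmon file.fastq sample1
```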


Great response, thanks for mentioning this one. I'll definitely try oarfish. Would you recommend running it on the raw sequencing files, or is it better to run it on the aligned BAM files?

alignment-based:

$ REFERENCE='Mmu.GrCm39.cdna.fa'
$ minimap2 -t 30 --eqx -N 100 -ax map-ont ${REFERENCE} file.fastq | samtools view -@24 -b -o alignments.bam -
$ oarfish -j 30 -a alignments.bam -o sample1 --filter-group no-filters --model-coverage

or reads-based:

$ oarfish -j 30 --reads file.fastq --annotated ${REFERENCE} --seq-tech ont-drna -o sample1 \
                --filter-group no-filters --model-coverage

Hi Assa,

Unless you have an additional use for the intermediate BAM file, I would recommend the read-based quantification mode. It is simpler, avoids the intermediate disk space (and I/O) of the BAM file, and, for that reason, is also a bit faster.

Best, Rob
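
The read-based command quoted earlier in this thread extends naturally to several samples. A minimal sketch, assuming one FASTQ file per sample and that the output prefix can be taken from the file name. quantify_all is a made-up helper, and the oarfish flags are the ones from the command above.

```shell
# Transcriptome FASTA, as in the commands above
REFERENCE='Mmu.GrCm39.cdna.fa'

# Hypothetical helper: run oarfish in read-based mode once per FASTQ file.
# The output prefix is the file name without directory and .fastq suffix.
quantify_all() {
    for fq in "$@"; do
        sample=$(basename "$fq" .fastq)
        oarfish -j 30 --reads "$fq" --annotated "$REFERENCE" \
            --seq-tech ont-drna -o "$sample" \
            --filter-group no-filters --model-coverage
    done
}

# Usage (not executed here):
# quantify_all sample1.fastq sample2.fastq
```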

1 day ago
ATpoint 89k

Yes, the reference must strictly be a transcriptome. Reference from the developer: long read + salmon? (transcript abundance)
