Question: Salmon and read order in FASTQ files in quasi-mapping-based mode
ZheFrench (France) wrote, 14 months ago:

This is probably a naïve question, but it has been on my mind for a while.

From the Salmon documentation: "If your reads or alignments do not appear in a random order with respect to the target transcripts, please randomize / shuffle them before performing quantification with Salmon."

I don't understand whether they mean BAM files, FASTQ files, or both. I understand that a BAM file can be sorted in different ways, but a FASTQ file...?

I'm using Salmon in quasi-mapping-based mode directly on the FASTQ files. Could those files be ordered in a way that requires shuffling before running Salmon?

For example, when you download paired-end FASTQ files with the SRA --split-3 option, you get several R1 and R2 files in some order (how are they ordered, by the way?). The same question applies when you receive data directly from your sequencing platform: you pool your different R1 and R2 files separately to use with Salmon.

Do you need to shuffle the reads in the FASTQ files before running Salmon?


Unless you converted a coordinate-sorted BAM file back to FASTQ (or used a program such as Clumpify from BBMap, which reorders raw reads when it deduplicates based on sequence alone), your reads should not be in any particular order.
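If you did convert a coordinate-sorted BAM back to FASTQ, the reads need to be shuffled before quantification, and for paired-end data R1 and R2 must be shuffled with the same permutation so mates stay in sync. Below is a minimal Python sketch, not part of Salmon or BBMap; the function names are hypothetical, it assumes uncompressed four-line FASTQ records, and it loads both files into memory, so it only suits modestly sized files.

```python
import random
from typing import Iterable, List, Optional, Tuple

# One FASTQ record: header, sequence, '+' separator, quality string.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> List[Record]:
    """Group FASTQ lines into 4-line records (assumes no wrapped sequences)."""
    records: List[Record] = []
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            if not buf[0].startswith("@") or not buf[2].startswith("+"):
                raise ValueError(f"malformed FASTQ record starting {buf[0]!r}")
            records.append((buf[0], buf[1], buf[2], buf[3]))
            buf = []
    if buf:
        raise ValueError("truncated FASTQ record at end of input")
    return records


def shuffle_pairs(r1: List[Record], r2: List[Record],
                  seed: Optional[int] = None) -> Tuple[List[Record], List[Record]]:
    """Apply one random permutation to both mates so pairs stay aligned."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 contain different numbers of reads")
    order = list(range(len(r1)))
    random.Random(seed).shuffle(order)
    return [r1[i] for i in order], [r2[i] for i in order]


def shuffle_fastq_files(in1: str, in2: str, out1: str, out2: str,
                        seed: Optional[int] = None) -> None:
    """Hypothetical helper: shuffle a pair of plain-text FASTQ files in memory."""
    with open(in1) as f1, open(in2) as f2:
        r1, r2 = read_fastq(f1), read_fastq(f2)
    s1, s2 = shuffle_pairs(r1, r2, seed)
    for path, records in ((out1, s1), (out2, s2)):
        with open(path, "w") as out:
            for rec in records:
                out.write("\n".join(rec) + "\n")
```

Usage (file names are placeholders): shuffle_fastq_files("R1.fastq", "R2.fastq", "R1.shuf.fastq", "R2.shuf.fastq", seed=42). Drawing a single permutation of indices and applying it to both lists is what keeps read i of R1 paired with read i of R2; shuffling each file independently would break the pairing. The seed only makes the shuffle reproducible.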

(Reply by genomax, 14 months ago)
ATpoint (Germany) wrote, 14 months ago:

If the FASTQ files come straight from the sequencer, you are fine. The issue is that you can convert a BAM back to FASTQ for realignment or requantification, and since BAMs are often coordinate-sorted, the resulting FASTQ would not be randomly ordered. That is why the recommendation is to shuffle FASTQ files before quantification.

By the way, the same holds for any aligner. For example, BWA-MEM expects the FASTQ to be randomly ordered because, in paired-end mode, it estimates the true insert-size distribution from the batch of reads it is currently processing. With a coordinate-sorted FASTQ (from a BAM), you would get batches drawn from, e.g., repetitive or low-complexity regions, which would skew the insert-size estimate for that region and lead to false mappings. A random FASTQ order avoids this, because the probability of a batch consisting of reads from the same genomic region is very low.
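The batch effect described above can be illustrated with a toy simulation. This is not BWA-MEM's actual estimator (BWA uses a more robust procedure than a plain mean); the sketch only assumes, hypothetically, that each genomic region has its own local insert-size bias, groups reads by region to mimic coordinate order, and compares how much per-batch estimates scatter with and without shuffling. All function names and parameters are illustrative.

```python
import random
import statistics
from typing import List, Tuple


def batch_insert_estimates(inserts: List[float], batch_size: int) -> List[float]:
    """Mean insert size of each consecutive batch (a simplified per-chunk estimate)."""
    return [statistics.mean(inserts[i:i + batch_size])
            for i in range(0, len(inserts), batch_size)]


def simulate(seed: int = 0, n_regions: int = 50, reads_per_region: int = 200,
             batch_size: int = 200) -> Tuple[float, float]:
    """Return the spread of per-batch estimates for region-grouped vs shuffled input."""
    rng = random.Random(seed)
    grouped: List[float] = []
    for _ in range(n_regions):
        # Hypothetical assumption: each region has its own local insert-size bias.
        local_mean = rng.gauss(300, 60)
        for _ in range(reads_per_region):
            grouped.append(max(50.0, rng.gauss(local_mean, 30)))
    # 'grouped' mimics coordinate order: all reads of one region are adjacent.
    shuffled = grouped[:]
    rng.shuffle(shuffled)
    spread_grouped = statistics.pstdev(batch_insert_estimates(grouped, batch_size))
    spread_shuffled = statistics.pstdev(batch_insert_estimates(shuffled, batch_size))
    return spread_grouped, spread_shuffled


if __name__ == "__main__":
    grouped_sd, shuffled_sd = simulate()
    print(f"per-batch SD, region-grouped: {grouped_sd:.1f}")
    print(f"per-batch SD, shuffled:       {shuffled_sd:.1f}")
```

In this toy model the region-grouped spread is expected to be much larger: each grouped batch reflects a single region's bias, whereas a shuffled batch averages over many regions, so its estimate stays close to the global value.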


This makes sense, but what about feeding BAM files directly into Salmon?

I've taken FASTQ files from the sequencer, aligned them with STAR, sorted the output with samtools, and then passed the sorted BAMs to Salmon. The text from the Salmon documentation suggests this could be a problem, but I'm not sure whether it only applies to processing FASTQ files directly...

(Reply by MaxF, 5 days ago)

"The text from the Salmon doc indicates that this could be a problem"

Which text? Did you align against the transcriptome rather than the genome? See

(Reply by ATpoint, 5 days ago)

I used STAR in transcriptome mode (--quantMode TranscriptomeSAM), but I aligned against the hg38 genome.

The text I was referring to is what ATpoint quoted: "If your reads or alignments do not appear in a random order with respect to the target transcripts, please randomize / shuffle them before performing quantification with Salmon."

(Reply by MaxF, 5 days ago)

After further testing, it turned out that sorting the file with samtools was the problem.
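This fits the warning in the documentation: samtools sort reorders alignments by reference position and records this as SO:coordinate in the @HD header line. One quick way to see whether a BAM you are about to pass to Salmon has been coordinate-sorted is to inspect that tag, e.g. in the output of samtools view -H. The Python sketch below (function and script names are hypothetical) only reads the declared sort order; a file can lack the tag or carry a stale one, so treat the result as a hint rather than proof.

```python
import sys


def sort_order(header_text: str) -> str:
    """Return the SO value from the @HD line of a SAM header, or 'unknown'."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
            return "unknown"  # @HD present but no SO tag: the SAM spec default
    return "unknown"  # no @HD line at all


def is_coordinate_sorted(header_text: str) -> bool:
    """True if the header declares coordinate sorting (what samtools sort writes)."""
    return sort_order(header_text) == "coordinate"


if __name__ == "__main__":
    # Hypothetical usage: samtools view -H aln.bam | python check_sort.py
    header = sys.stdin.read()
    order = sort_order(header)
    print(f"declared sort order: {order}")
    if order == "coordinate":
        print("warning: coordinate-sorted; consider the unsorted aligner output instead")
```

In MaxF's case, that would mean giving Salmon the transcriptome BAM as STAR wrote it rather than the samtools-sorted copy.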

(Reply by MaxF, 3 days ago)