Question: Salmon warning detected suspicious pair
bharata1803 wrote 5.2 years ago:

I got this warning while trying to run Salmon on an aligned BAM file. The bam file is an output of tophat from paired end fastq file.

WARNING: Detected suspicious pair --- 
    The names are different:
    read1 : SRR2103637.12484815
    read2 : SRR2103637.12492049
    The proper-pair statuses are inconsistent:
read1 [SRR2103637.12484815] : proper-pair; mapped; matemapped

read2 : [SRR2103637.12484815] : no proper-pair; mapped; matemapped


My salmon command is:

salmon quant -t ../idx/Ch38.cdna.all.fa -l IU -a sorted.bam -g ../gene_map.txt -o salmon_out

I tried both the unsorted and the sorted BAM, and both resulted in the same warning.

My questions are:

1. What can I do to fix this, or can I just ignore it?

2. For Salmon's libtype parameter, how do I choose the correct value? How can I check whether the library is inward, outward, or matching, and stranded or unstranded? Can I determine this from the FASTQ files, or does it depend on the experiment itself? I downloaded the data from NCBI GEO, which says it was sequenced on an Illumina HiSeq 2500 in 100 bp paired-end mode using the TruSeq Rapid PE Cluster kit and TruSeq Rapid SBS kit (Illumina).


Thank you for your answer and suggestion.


Tags: rna-seq, salmon

Hi bharata,

One thing that seems strange to me in your description is this:

The bam file is an output of tophat from paired end fastq file

The fact that the alignments are the output of TopHat suggests that the alignment was against the genome. However, Salmon requires alignments against the transcriptome (where the aligner might be e.g. Bowtie2). It is, of course, possible that you aligned against the genome with TopHat and then converted the alignments into transcriptomic coordinates, but since this is uncommon and you didn't mention it, I assume this isn't the case. Can you clarify whether these alignments are to the genome or to the transcriptome (as Salmon expects)?

Reply written 5.2 years ago by Rob (modified 14 months ago by _r_am)
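A minimal sketch of one way to answer that question, assuming you have the BAM header as text (for example from `samtools view -H`) and the transcript FASTA at hand. It lists reference names in the header's `@SQ` lines that have no record in the FASTA. The function names are made up for illustration, and the sketch assumes the aligner used the first whitespace-delimited token of each FASTA header as the reference name, which is common but worth verifying for your aligner.

```python
def header_reference_names(header_lines):
    """Return the SN: names found in the @SQ lines of a SAM header."""
    names = []
    for line in header_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def fasta_names(fasta_lines):
    """Return the first whitespace-delimited token of each FASTA header line."""
    names = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.append(tokens[0])
    return names


def missing_from_fasta(header_lines, fasta_lines):
    """List header references that have no matching record in the FASTA."""
    known = set(fasta_names(fasta_lines))
    return [n for n in header_reference_names(header_lines) if n not in known]
```

If this returns names such as `1`, `2` or `chrX`, the alignments are most likely in genomic coordinates. An empty list means every reference in the header is also a transcript in the FASTA.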

I used TopHat to align the FASTQ files to a transcriptome reference from Ensembl (the human cDNA reference), so I think it is okay to run Salmon directly on the result.

Reply written 5.2 years ago by bharata1803

That's interesting, as TopHat is a split-read aligner that is generally used to map RNA-seq reads to the genome. When aligning directly to the transcriptome, one wants to avoid split-read mappings. It might be worth seeing how things differ on one of the samples if you map with Bowtie2 (or something comparable) instead of TopHat.

Reply written 5.2 years ago by Rob (modified 13 months ago by _r_am)

I am having the same problem. Did you manage to solve it? I have tried:

  1. all combinations of library type

  2. using fasta file of transcripts (.fa) as reference

  3. indexed transcripts as reference

The major problem I am facing is that I only have BAM files aligned with HISAT. I do not have the source FASTQ files.

Reply written 4.5 years ago by EagleEye (modified 4.5 years ago)

Hi EagleEye,

If you are using pre-aligned reads with Salmon, the "reference" sequences that you pass to Salmon should be the FASTA file containing your reference transcripts. However, if your reads have been aligned using HISAT, the bigger concern is that the alignment was likely done with respect to the genome rather than the transcriptome. A tool like sam-xlate (mentioned here) should be able to convert from genomic to transcriptomic alignments. It is also important to note that Salmon (like RSEM) requires that (1) the alignment records for a given read, in case of multimapping, be consecutive in the input SAM/BAM file and (2) the records for mates (i.e. left and right reads) be consecutive. The most common alignment tools (e.g. Bowtie2, BWA, STAR) do this, and I believe HISAT does as well (since it uses much of Bowtie2's infrastructure for file parsing / writing), but that is worth verifying. Finally, in a worst-case scenario, you could always consider trying to recover a FASTQ file from the BAM (using e.g. bam2fastq) and then performing your quantification with that.

Reply written 4.5 years ago by Rob
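The "mates must be consecutive" requirement can be checked before running Salmon. The sketch below is an illustration only, not Salmon's actual parser: it reads SAM text lines (e.g. from `samtools view -h`), takes paired records two at a time, and reports pairs whose names differ or whose proper-pair flags disagree, which are the two conditions named in the warning above. The function name and the reason strings are made up for this example. It assumes the file contains only paired records that come in twos, so a single orphan record will shift every pair after it.

```python
PAIRED = 0x1       # SAM flag bit: read is part of a pair
PROPER_PAIR = 0x2  # SAM flag bit: aligner considers the pair properly aligned


def suspicious_pairs(sam_lines):
    """Return (name1, name2, reason) for adjacent records that do not look like mates."""
    problems = []
    previous = None  # first record of a pair, waiting for its partner
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:  # unpaired record: ignore and reset the pairing
            previous = None
            continue
        if previous is None:
            previous = (name, flag)
            continue
        prev_name, prev_flag = previous
        previous = None
        if name != prev_name:
            problems.append((prev_name, name, "names differ"))
        elif bool(flag & PROPER_PAIR) != bool(prev_flag & PROPER_PAIR):
            problems.append((prev_name, name, "proper-pair flags differ"))
    return problems
```

A coordinate-sorted BAM will usually fail this check, because sorting by position separates mates. Name-grouped output straight from the aligner is what this check expects.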

Thanks a lot for your suggestions, I will get back once I try these solutions.

Reply written 4.5 years ago by EagleEye

andrew.j.skelton73 wrote 5.2 years ago:

Shouldn't your Salmon index be a directory, not a fasta file?


I am trying to use the alignment-based mode, and this is the command given in the documentation:

> ./bin/salmon quant -t transcripts.fa -l <LIBTYPE> -a aln.bam -o salmon_quant

According to this documentation, I think passing the FASTA file is correct.

Reply written 5.2 years ago by bharata1803

What version of Salmon are you using? Rob Patro might see this and comment. As a quick check to make sure it's not Salmon that's the issue, you can convert the BAM into FASTQ and try it with reads instead of alignments.

Reply written 5.2 years ago by andrew.j.skelton73 (modified 14 months ago by _r_am)
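For readers who want to see what the BAM-to-FASTQ conversion involves, here is a small sketch that turns one SAM text record into a FASTQ entry. In practice a dedicated tool such as `samtools fastq` or bam2fastq is the better choice; this only illustrates the logic. The function name is hypothetical. The key detail is that a read aligned to the reverse strand is stored reverse-complemented in SAM, so the original sequence and quality string have to be restored. Secondary and supplementary records are skipped so each read is emitted once.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_record_to_fastq(line):
    """Return (mate_number, fastq_text) for a primary SAM record, or None to skip it."""
    fields = line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    seq, qual = fields[9], fields[10]
    if flag & (0x100 | 0x800):  # secondary or supplementary alignment
        return None
    if flag & 0x10:  # aligned to the reverse strand: undo the reverse complement
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    # 0x40 marks the first mate, 0x80 the second; 0 means unpaired
    mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
    return mate, f"@{name}\n{seq}\n+\n{qual}\n"
```

Records with mate number 1 and 2 would then be written to two separate FASTQ files in the same read order, which is what paired-end quantification from reads expects.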

I just updated to the latest version, 0.5.0 I think. I have used Salmon before with single-end FASTQ files, so this is the first time I have tried it with paired-end data. I have the original FASTQ files, so I will try the read-based method.

Reply written 5.2 years ago by bharata1803