Question: Salmon warning detected suspicious pair
bharata1803 (Japan) wrote, 3.5 years ago:

I got this warning while running salmon on an aligned bam file. The bam file is an output of tophat from paired end fastq file.

WARNING: Detected suspicious pair --- 
    The names are different:
    read1 : SRR2103637.12484815
    read2 : SRR2103637.12492049
    The proper-pair statuses are inconsistent:
    read1 [SRR2103637.12484815] : proper-pair; mapped; matemapped
    read2 : [SRR2103637.12484815] : no proper-pair; mapped; matemapped

 

My salmon command is:

salmon quant -t ../idx/Ch38.cdna.all.fa -l IU -a sorted.bam -g ../gene_map.txt -o salmon_out

I tried both the unsorted and the sorted bam, and both gave the same warning.

My questions are:

1. What can I do to fix this, or can I just ignore it?

2. For the libtype parameter in Salmon, how do I choose the correct value? How can I check whether the library is inward, outward, or matching, and stranded or unstranded? Can I determine this from the fastq files, or does it depend on the experiment itself? I downloaded the data from NCBI GEO, which says it was sequenced on an Illumina HiSeq2500 in 100bp paired-end mode with the TruSeq Rapid PE Cluster kit and TruSeq Rapid SBS kit (Illumina).

 

Thank you for your answer and suggestion.

 

rna-seq salmon • 1.7k views
modified 3.5 years ago by andrew.j.skelton73 • written 3.5 years ago by bharata1803

Hi bharata,

One thing that seems strange to me in your description is this:

The bam file is an output of tophat from paired end fastq file

The fact that the alignments are the output of TopHat suggests that the alignment was against the genome. However, Salmon requires alignments against the transcriptome (where the aligner might be e.g. Bowtie2). It is, of course, possible that you aligned against the genome with TopHat and then converted the alignments into transcriptomic coordinates, but since this is uncommon and you didn't mention it, I assume this isn't the case. Can you clarify whether these alignments are to the genome or to the transcriptome (as Salmon expects)?

written 3.5 years ago by Rob

I used tophat to align the fastq files to the transcriptome reference from Ensembl (human cDNA reference), so I think it is okay to process the result with Salmon directly.

written 3.4 years ago by bharata1803

That's interesting, as TopHat is generally a split-read aligner that is used to map RNA-seq reads to the genome. When aligning directly to the transcriptome, one wants to avoid split-read mappings.  I might see how things differ on one of the samples if you map with Bowtie2 (or something comparable) instead of TopHat.
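
A quick way to see what is actually in the file is to look at the reference names in the header and count spliced alignments. The following is only a minimal sketch, not part of Salmon: it assumes you have dumped the BAM to SAM text first (e.g. with `samtools view -h`), and the function name `summarize_sam` is made up for illustration. Reference names that look like chromosomes, or many CIGAR strings containing `N`, would suggest genome-style or split-read alignments.

```python
def summarize_sam(sam_lines):
    """Return (reference_names, n_mapped_records, n_spliced_records).

    sam_lines is any iterable of SAM text lines (header included).
    A record counts as spliced when its CIGAR contains an 'N' operation.
    """
    ref_names = []
    n_mapped = 0
    n_spliced = 0
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("@"):
            # Header line: collect sequence names from @SQ records.
            if line.startswith("@SQ"):
                for field in line.split("\t")[1:]:
                    if field.startswith("SN:"):
                        ref_names.append(field[3:])
            continue
        fields = line.split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        n_mapped += 1
        if "N" in fields[5]:
            n_spliced += 1
    return ref_names, n_mapped, n_spliced
```

If the reference names do not match the transcript names in the FASTA passed with `-t`, or the spliced count is well above zero, that would be worth sorting out before looking at the warning itself.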

written 3.4 years ago by Rob

I am having the same problem. Did you manage to solve it? I have tried:

  1. all combinations of library type

  2. using fasta file of transcripts (.fa) as reference

  3. indexed transcripts as reference

The major problem I am facing is that I only have BAM files aligned using HISAT. I do not have any source FASTQ files.

modified 2.8 years ago • written 2.8 years ago by EagleEye

Hi EagleEye,

If you are using pre-aligned reads with salmon, the "reference" sequences that you pass to salmon should be the FASTA file containing your reference transcripts. However, if your reads have been aligned using HISAT, the bigger concern is that the alignment was likely done with respect to the genome rather than the transcriptome. A tool like sam-xlate (mentioned here) should be able to convert from genomic to transcriptomic alignments. It is also important to note that Salmon (like RSEM) requires that (1) the alignment records for a given read, in case of multimapping, be consecutive in the input SAM/BAM file and (2) the records for mates (i.e. left and right reads) be consecutive. The most common alignment tools (e.g. Bowtie2, BWA, STAR) do this, and I believe HISAT does as well (since it uses much of Bowtie2's infrastructure for file parsing / writing), but that is worth verifying. Finally, in a worst-case scenario, you could always consider trying to recover FASTQ files from the BAM (using e.g. bam2fastq) and then performing your quantification with those.
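
To verify the "consecutive records" requirement, something like the following could be run over the SAM text of the file (e.g. from `samtools view`). This is a rough sketch under a few assumptions: both mates carry exactly the same read name (as the SAM format expects), the function name `find_non_consecutive_reads` is hypothetical, and all read names fit in memory. It is not how Salmon itself performs the check.

```python
def find_non_consecutive_reads(sam_lines):
    """Return read names whose records are split across the file.

    A name is reported when it shows up again after records with a
    different name have been seen in between. An empty result means
    all records of each read (mates and multimappings) are adjacent.
    """
    seen = set()        # every read name encountered so far
    reported = set()    # names already added to the result
    offenders = []      # result, in order of first detection
    previous = None
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        name = line.split("\t", 1)[0]
        if name == previous:
            continue  # still inside the same block of records
        if name in seen and name not in reported:
            offenders.append(name)
            reported.add(name)
        seen.add(name)
        previous = name
    return offenders
```

A coordinate-sorted BAM will typically fail this check, because the two mates of a pair usually end up far apart; the aligner's original output order, or a name-sorted file, should pass.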

written 2.8 years ago by Rob

Thanks a lot for your suggestions, I will get back once I try these solutions.

written 2.8 years ago by EagleEye

andrew.j.skelton73 (London) wrote, 3.5 years ago:

Shouldn't your Salmon index be a directory, not a fasta file?

written 3.5 years ago by andrew.j.skelton73

Following the documentation, I am trying to use the alignment-based mode, and this is the command it gives:

> ./bin/salmon quant -t transcripts.fa -l <LIBTYPE> -a aln.bam -o salmon_quant

So, according to this documentation, I think using the fasta file is correct.

written 3.5 years ago by bharata1803

What version of Salmon? Rob Patro might see this and comment. As a quick check to make sure it's not Salmon that's the issue, you can convert the bam into fastq and try it with reads instead of alignments.
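
For reference, here is roughly what the bam-to-fastq conversion has to do. It is only an illustrative sketch working on SAM text (e.g. from `samtools view`); the names `sam_to_fastq_records` and `format_fastq` are made up, and in practice an existing converter is the safer choice. The key points are that secondary and supplementary records are skipped, and that reads stored on the reverse strand are reverse-complemented back to their original orientation.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq_records(sam_lines):
    """Yield (mate, name, seq, qual) once per read end.

    mate is 1 or 2 for paired reads and 0 otherwise. Records without a
    stored sequence or quality string ('*') are skipped.
    """
    emitted = set()  # (name, mate) pairs already yielded
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800)
        if seq == "*" or qual == "*":
            continue
        if flag & 0x10:
            # Stored reverse-complemented: restore the original read.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        if (name, mate) in emitted:
            continue  # guard against aligners that repeat primary records
        emitted.add((name, mate))
        yield mate, name, seq, qual


def format_fastq(name, seq, qual):
    """Format one read as a four-line FASTQ entry."""
    return f"@{name}\n{seq}\n+\n{qual}\n"
```

Mate 1 and mate 2 records would then be written to two separate files, in the same read order, and passed to salmon as reads.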

written 3.5 years ago by andrew.j.skelton73

I just updated to the latest version, 0.5.0 I think. I have used salmon before with single-end fastq files, so this is the first time I have tried it with paired-end data. I have the original fastq files, so I will try the read-based method.

written 3.5 years ago by bharata1803