Question

Salmon WARNING: Detected suspicious pair

0

3.1 years ago

lluc.cabus ▴ 30

Hello,

I have a problem running salmon on STAR output aligned to the transcriptome and sorted by name rather than by coordinate: the number of counts salmon reports is very low.

I looked at the salmon logs and saw this warning 196026 times in total:

WARNING: Detected suspicious pair ---
    The names are different:

After inspecting the BAM files, I saw that the first problematic read is one that appears 5 times in the BAM file. Salmon takes the 5th record and pairs it with the next one, which is not its mate, and that triggers the warning.

To illustrate the problem, the bam file looks like this:

A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      339     ENST00000593393.1       2170    1       34M     =       1945    -259    GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGA      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:3  HI:i:2  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      339     ENST00000593393.1       2346    1       34M     =       1945    -435    GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGA      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:3  HI:i:3  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      83      ENST00000444227.2       236     1       34M     =       11      -259    GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGA      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:3  HI:i:1  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      419     ENST00000593393.1       1945    1       46M     =       2170    259     GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAGCCCTTCCTGAA  FFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:2  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      163     ENST00000444227.2       11      1       46M     =       236     259     GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAGCCCTTCCTGAA  FFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:1  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5385:34053_AGATATAAAGTT     99      ENST00000624866.1       111     255     99M     =       99      99      TTAAAAAGGTGCCATTCCAGCCCTTTCCAGCTCTCACCTCCCCACTCCCTTATAAGTGACACCGCCTTTCCCCACCAGGCCCTGACTCAGGCCCAGAGA     FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF     NH:i:1  HI:i:1  RG:Z:CRC01_001

and the first warning of salmon is:

WARNING: Detected suspicious pair ---
    The names are different:
    read1 : A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC
    read2 : A00125:488:H2YHYDSX2:2:1120:5385:34053_AGATATAAAGTT
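The flags in the excerpt above show three first-in-pair records (339, 339, 83) and two second-in-pair records (419, 163) for the first read name, so one record has no mate next to it. A minimal sketch for finding such read names, assuming SAM text lines as input (for example from `samtools view`); the function names `mate_counts` and `unbalanced` are made up for illustration, and the sketch only reports the imbalance, not its cause:

```python
from collections import defaultdict

# Flag bits as defined in the SAM specification
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def mate_counts(sam_lines):
    """Count first-in-pair and second-in-pair records for each read name."""
    counts = defaultdict(lambda: [0, 0])
    for line in sam_lines:
        # Skip blank lines and SAM header lines
        if not line.strip() or line.startswith("@"):
            continue
        # Only the first two columns (QNAME, FLAG) are needed
        fields = line.split(None, 2)
        name, flag = fields[0], int(fields[1])
        if flag & FIRST_IN_PAIR:
            counts[name][0] += 1
        elif flag & SECOND_IN_PAIR:
            counts[name][1] += 1
    return dict(counts)


def unbalanced(counts):
    """Keep only read names whose mate counts differ."""
    return {name: tuple(c) for name, c in counts.items() if c[0] != c[1]}


# QNAME and FLAG columns of the five records shown above (other columns omitted)
name = "A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC"
records = [f"{name}\t{flag}" for flag in (339, 339, 83, 419, 163)]

print(unbalanced(mate_counts(records)))
# → {'A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC': (3, 2)}
```

Any read name reported here has at least one alignment record without its mate, which is the situation in which a reader that consumes records two at a time falls out of step.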

Do you know how could I solve this?

Thanks! Lluc

Salmon STAR • 1.4k views

ADD COMMENT • link updated 3.1 years ago by GenoMax 154k • written 3.1 years ago by lluc.cabus ▴ 30

2

It is also possible that your input fastq files were out of sync (perhaps because they were trimmed independently). You should use repair.sh from the BBMap suite to re-sync them and remove singletons. Then realign the fixed files.
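Before re-syncing, it may help to confirm whether the two fastq files really are out of step. The sketch below is not how repair.sh works; it is a simple check that assumes 4-line fastq records and that mates share the same name up to an optional trailing /1 or /2. The function names and the file names in the usage comment are placeholders:

```python
import gzip
from itertools import zip_longest


def read_names(path):
    """Yield the read name of each 4-line FASTQ record (plain or gzipped)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 != 0:
                continue
            # Drop the leading '@' and any comment after whitespace
            parts = line[1:].split()
            name = parts[0] if parts else ""
            # Drop an old-style /1 or /2 mate suffix
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_path, r2_path):
    """Return (record_index, name1, name2) at the first disagreement, else None."""
    pairs = zip_longest(read_names(r1_path), read_names(r2_path))
    for index, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return index, n1, n2
    return None


# Usage with placeholder file names:
# print(first_mismatch("sample_R1.fastq.gz", "sample_R2.fastq.gz"))
```

A result of None means the names agree record by record; a tuple gives the zero-based index of the first record where the files diverge, with None in place of a name if one file ends early.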

ADD REPLY • link 3.1 years ago by GenoMax 154k

0

Hi! You are using the output from STAR to quantify with salmon? What are you trying to do? Why don't you give the fastq files directly to salmon?

ADD REPLY • link 3.1 years ago by iraun 6.2k

0

I need to do UMI deduplication, so I need to use an aligner.

ADD REPLY • link 3.1 years ago by lluc.cabus ▴ 30

0

I would double-check that the fastq files used in the STAR alignment are properly formatted and paired. Try running them through seqkit sana to remove or rescue malformed reads, and then seqkit pair to make sure that the R1 and R2 reads are properly paired.

You should also include all of the code you ran so we can check whether there were any errors.
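To make "properly formatted" concrete, here is a rough sketch of the basic checks a 4-line fastq record should pass. It is not seqkit's implementation, only an illustration of the kind of problem to look for, and the function name `malformed_records` is hypothetical:

```python
def malformed_records(lines):
    """Return zero-based indices of 4-line FASTQ records failing basic checks."""
    lines = [line.rstrip("\n") for line in lines]
    bad = []
    for start in range(0, len(lines), 4):
        record = lines[start:start + 4]
        ok = (
            len(record) == 4                       # record is complete
            and record[0].startswith("@")          # header line
            and len(record[1]) > 0                 # non-empty sequence
            and record[2].startswith("+")          # separator line
            and len(record[1]) == len(record[3])   # one quality value per base
        )
        if not ok:
            bad.append(start // 4)
    return bad


example = [
    "@read1", "ACGT", "+", "FFFF",
    "@read2", "ACGTA", "+", "FFFF",  # quality string shorter than the sequence
    "@read3", "ACG", "+", "FFF",
]
print(malformed_records(example))  # → [1]
```

The check assumes strict 4-line records, so a file with wrapped sequence lines would be reported as malformed throughout.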

ADD REPLY • link 3.1 years ago by rpolicastro 13k