Hello,
I have a problem running salmon on STAR output aligned to the transcriptome and ordered by read name rather than by coordinate: the number of counts salmon reports is very low.
Looking at the salmon logs, I saw this warning a total of 196026 times:
WARNING: Detected suspicious pair ---
    The names are different:
Looking into the BAM file, I saw that the first problematic read is one that appears 5 times. Salmon takes the 5th record and pairs it with the next record, which belongs to a different read and is not its mate, hence the warning.
To illustrate the problem, the bam file looks like this:
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      339     ENST00000593393.1       2170    1       34M     =       1945    -259    GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGA      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:3  HI:i:2  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      339     ENST00000593393.1       2346    1       34M     =       1945    -435    GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGA      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:3  HI:i:3  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      83      ENST00000444227.2       236     1       34M     =       11      -259    GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGA      FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF      NH:i:3  HI:i:1  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      419     ENST00000593393.1       1945    1       46M     =       2170    259     GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAGCCCTTCCTGAA  FFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:2  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC      163     ENST00000444227.2       11      1       46M     =       236     259     GCCCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAGCCCTTCCTGAA  FFFFFFFFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFF  NH:i:3  HI:i:1  RG:Z:CRC01_001
A00125:488:H2YHYDSX2:2:1120:5385:34053_AGATATAAAGTT     99      ENST00000624866.1       111     255     99M     =       99      99      TTAAAAAGGTGCCATTCCAGCCCTTTCCAGCTCTCACCTCCCCACTCCCTTATAAGTGACACCGCCTTTCCCCACCAGGCCCTGACTCAGGCCCAGAGA     FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF     NH:i:1  HI:i:1  RG:Z:CRC01_001
and the first warning from salmon is:
WARNING: Detected suspicious pair ---
    The names are different:
    read1 : A00125:488:H2YHYDSX2:2:1120:5376:2440_TTCAGGAAGGGC
    read2 : A00125:488:H2YHYDSX2:2:1120:5385:34053_AGATATAAAGTT
Do you know how I could solve this?
Thanks! Lluc
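For what it's worth, decoding the FLAG and HI columns of the five records shown above makes the imbalance easy to see: three of them are first mates (R1) and only two are second mates (R2), so one alignment has no partner in the excerpt. If salmon consumes records two at a time, as the warning suggests, that leftover record ends up paired with the next read. Below is a minimal Python sketch of the check. The helper names `mate_of` and `orphan_hits` are made up for illustration, and the (FLAG, HI) values are copied by hand from the excerpt.

```python
from collections import defaultdict

# (FLAG, HI tag) for the five records of read ...:5376:2440_TTCAGGAAGGGC,
# copied from the BAM excerpt in the question, in file order.
records = [(339, 2), (339, 3), (83, 1), (419, 2), (163, 1)]


def mate_of(flag):
    """Return 'R1' or 'R2' using SAM FLAG bits 0x40 (first) and 0x80 (last)."""
    if flag & 0x40:
        return "R1"
    if flag & 0x80:
        return "R2"
    return "unpaired"


def orphan_hits(recs):
    """Group records by HI (hit index) and return hits lacking R1 or R2."""
    mates = defaultdict(set)
    for flag, hi in recs:
        mates[hi].add(mate_of(flag))
    return sorted(hi for hi, seen in mates.items() if seen != {"R1", "R2"})


print([mate_of(flag) for flag, _ in records])  # ['R1', 'R1', 'R1', 'R2', 'R2']
print(orphan_hits(records))                    # [3]
```

Hit 3 (the R1 alignment at position 2346) has no R2 record, which is consistent with the odd record count described in the question. This only inspects the pasted lines; it says nothing about where the missing mate was lost.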
It is also possible that your input fastq files were out of sync (perhaps trimmed independently). You should use repair.sh from the BBMap suite to re-sync them and remove singletons. Then realign the fixed files.
Hi! The output from STAR to quantify using salmon? What are you trying to do? Why don't you give the fastq files directly to salmon?
I need to use umi deduplication, so I need to use an aligner.
I would double check that your fastq files used in STAR alignment are properly formatted and paired. Try running them through seqkit sana to remove or rescue malformed reads, and then seqkit pair to make sure that the R1 and R2 reads are properly paired.
You should also include all of the code you ran so we can check whether there were any errors.
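Before re-running anything, a quick way to see whether the R1 and R2 fastq files are actually out of sync is to compare read names record by record. The following is only a rough Python sketch of that idea, not a replacement for repair.sh or seqkit pair. It assumes strict 4-line FASTQ records, and the function names (`read_names`, `first_mismatch`, `open_fastq`) are hypothetical helpers written for this example.

```python
import gzip
from itertools import zip_longest


def read_names(lines):
    """Yield the read name from each 4-line FASTQ record (assumed layout)."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            name = line.strip().split()[0].lstrip("@")
            # Drop an old-style /1 or /2 mate suffix, if present.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(r1_lines, r2_lines):
    """Return (record_index, r1_name, r2_name) for the first out-of-sync
    pair, or None if every record lines up."""
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for idx, (n1, n2) in enumerate(pairs):
        if n1 != n2:
            return idx, n1, n2
    return None


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


# Tiny in-memory example: R2 is missing read "b", as could happen
# after trimming the two files independently.
r1 = ["@a 1:N:0", "ACGT", "+", "FFFF", "@b 1:N:0", "ACGT", "+", "FFFF"]
r2 = ["@a 2:N:0", "ACGT", "+", "FFFF"]
print(first_mismatch(r1, r2))  # (1, 'b', None)
```

On real data you would pass `open_fastq(...)` handles for your own R1 and R2 files instead of the in-memory lists. A result of `None` means the names line up, in which case the mismatch more likely arises after alignment.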