Problems with fragment length distribution output with Salmon
15 months ago
robeaumont • 0

Hi all,

I'm new to RNA-Seq and I'm struggling with my Salmon alignment output. I searched older posts for an answer but couldn't find any discussion of this, so apologies in advance if it has been covered before.

My data is from Illumina 150 bp paired-end reads (5 control samples, 5 drug-treated). Using Salmon for alignment, I first built an index from the NCBI reference transcriptome (FASTA format) and then ran each pair of FASTQ files against that index. An example of the command is shown below.

salmon quant -i reference_transcriptome -l A -1 fastq_1 -2 fastq_2 -o output_file_name --gcBias

When checking my output in MultiQC, my 'Fragment Length Distribution' is centred around 300 bp, when I've been told it should be around 150 bp. Previous data from the lab, run through Salmon with 100 bp and 150 bp paired-end reads, showed distributions around the 100 bp and 150 bp marks respectively, so I'm confused as to why mine sits predominantly around 300 bp.

Any suggestions as to why this is? Is there something wrong with, or missing from, my command? The plot from my MultiQC report is shown below.

Thanks in advance!

[Image: MultiQC 'Fragment Length Distribution' plot]

distribution Salmon RNA-Seq fragment length multiqc • 648 views
15 months ago

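There is nothing wrong with this. The fragment length Salmon reports is the length of the sequenced cDNA fragment (the insert), not the read length, so it does not have to match your 150 bp reads. Whoever made the library probably size-selected for fragments of around this size. This is actually to your benefit: with 2 x 150 bp reads, fragments of only 150 bp would give paired-end reads that overlap completely, so the second read adds little new transcript information (although the overlap can be useful if you are calling variants or something similar).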
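
If you want to check the numbers yourself rather than reading them off the MultiQC plot, here is a rough Python sketch. It assumes Salmon wrote the learned distribution to `libParams/flenDist.txt` inside each output directory, as whitespace-separated weights where position i is the weight for fragment length i. That path and layout are my assumption based on what MultiQC appears to read, so check them against your own output. The function names are just made up for illustration.

```python
from pathlib import Path


def read_flen_dist(quant_dir):
    """Load the fragment length weights from a Salmon output directory.

    Assumption: the file lives at <quant_dir>/libParams/flenDist.txt and
    holds whitespace-separated numbers, one per fragment length (0, 1, 2, ...).
    """
    text = (Path(quant_dir) / "libParams" / "flenDist.txt").read_text()
    return [float(value) for value in text.split()]


def summarize_flen_dist(weights):
    """Return (mean, mode) fragment length for a list of weights.

    weights[i] is treated as the weight of fragment length i.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("fragment length distribution has no mass")
    mean = sum(length * w for length, w in enumerate(weights)) / total
    mode = max(range(len(weights)), key=lambda length: weights[length])
    return mean, mode


def mates_overlap(fragment_len, read_len):
    """True if two reads of read_len from opposite ends of the fragment overlap."""
    return fragment_len < 2 * read_len
```

Usage would be something like `summarize_flen_dist(read_flen_dist("output_file_name"))`. With 150 bp reads, `mates_overlap(150, 150)` is true while `mates_overlap(300, 150)` is false, which is the point above: at around 300 bp the two mates cover different parts of the fragment instead of re-reading the same bases.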

