Question

kallisto -s 0 error

0

13 months ago

1_GOld • 0

Hello, I am a student trying to do differential expression (DEG) analysis with kallisto. The fastq data I use is single-end and all reads are 100bp, so I set the mean to 100 and the sd to 0. I ran kallisto quant -i index -o output --single -l 100 -s 0 fastq and an error occurs. Is there a way to solve this problem? Thank you.

kallisto RNA-seq • 949 views

ADD COMMENT • link 13 months ago by 1_GOld • 0

0

Welcome to Biostars, 1_GOld! Please consider adding more information about the error in order to get a more useful answer. Copy-pasting the error you are getting in the terminal is usually a good idea :).

ADD REPLY • link 13 months ago by iraun 6.2k

1

13 months ago

dsull ★ 5.8k

That's not what the -l and -s options are used for. Those options have nothing to do with your sequences being 100 bp.

Those options are for approximating the fragment length distribution which is needed to properly normalize for transcript length (remember: you're sequencing fragments of transcripts, not full transcripts).

If you don't know what to set for those, just go with something reasonable like -l 200 -s 20.
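As a rough sketch, here is the original command with those suggested values swapped in, built from Python so the parameters can be checked first. The function name kallisto_single_end_cmd and the paths "index", "output" and "fastq" are placeholders taken from the question, not real files. The check that rejects a non-positive sd is my own assumption, based only on the question reporting that -s 0 fails.

```python
import shlex


def kallisto_single_end_cmd(index, out_dir, fastq, frag_mean=200, frag_sd=20):
    """Build the argument list for a single-end `kallisto quant` run.

    frag_mean / frag_sd describe the FRAGMENT length distribution
    (the -l and -s options), not the read length.
    """
    # Assumption: treat a non-positive mean or sd as a user error,
    # since the question reports that -s 0 fails.
    if frag_mean <= 0 or frag_sd <= 0:
        raise ValueError("fragment length mean and sd must both be > 0")
    return [
        "kallisto", "quant",
        "-i", index,
        "-o", out_dir,
        "--single",
        "-l", str(frag_mean),
        "-s", str(frag_sd),
        fastq,
    ]


cmd = kallisto_single_end_cmd("index", "output", "fastq")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run(cmd, check=True) if kallisto is on your PATH.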

ADD COMMENT • link 13 months ago by dsull ★ 5.8k

0

thanks for your comment!

ADD REPLY • link 13 months ago by 1_GOld • 0

score 6 · Accepted Answer · 2023-03-09

Hi 1_GOld,

This can be very confusing when you first start, but the fragment length is not the same as the read length.

When sequencing is conducted, the nucleic acids are fragmented. This can be done physically, using something like sonication; enzymatically, using a nuclease of some sort (e.g. RNase I in Ribo-seq); or the library can be generated using a randomly inserting transposase, as in tagmentation. Either way, you are left with a collection of randomly generated "fragments" of the original molecule. Sequencing adaptors are then added to the ends of these fragments, and a fixed number of nucleotides is sequenced from one (single-end) or both (paired-end) ends of the fragment. Usually only a portion of the fragment is sequenced, and we rarely have the full sequence of a fragment.

So in RNA-seq a full-length cDNA might be broken up like so:

|>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>|
                                 | fragmentation
                                 V
|>>>>>>>|  |>>>>>>>>>>>>>>>>>>>>|  |>>>>>>>>>>>>>>>>>>>>>>>>| |>>>>>|
                                 | adaptor ligation
                                 V
--|>>>>>>>|--    --|>>>>>>>>>>>>>>>>>>>>|--    --|>>>>>>>>>>>>>>>>>>>>>>>>|--   --|>>>>>|--
                                 |   Single end unstranded sequencing
                                 V
|->>>>>| 100nt               100nt |<<<<<-|    |->>>>>| 100nt                100nt |<<<<<-|
--|>150bp>|--    --|>>>>>>>300bp>>>>>>>>|--    --|>>>>>>>>>350bp>>>>>>>>>>|--   --|120bp|--

In the above, the fragments are 150, 300, 350 and 120bp in length (mean: 230, sd: 112), but the reads are all 100nt long.
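To make the arithmetic concrete, here is a small sketch that recomputes those two numbers from the four fragment lengths in the diagram. Note that the sd of 112 is the sample standard deviation (dividing by n - 1); the variable names are just for illustration.

```python
from statistics import mean, stdev

# Fragment lengths (bp) from the diagram above; the reads are 100 nt regardless.
fragment_lengths = [150, 300, 350, 120]

fragment_mean = mean(fragment_lengths)   # what you would pass to -l
fragment_sd = stdev(fragment_lengths)    # sample sd, what you would pass to -s

print(f"mean: {fragment_mean:.0f}, sd: {fragment_sd:.0f}")
```

With real data you would have far more than four fragments, but the idea is the same: -l and -s summarise the fragment sizes, and the read length does not enter into it.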

You can estimate the fragment size distribution from the library prep QC (e.g. a TapeStation or Bioanalyzer run) if you have it. If you had your library sequenced commercially, they might have provided this. In the days of full alignment-based workflows, there were ways of estimating this from the alignments, but for alignment-free or pseudo-alignment methods with single-end reads, you will need to guess if you don't have the QC.