sjdbOverhang option in STAR
10.2 years ago
Martombo ★ 3.1k

I have some difficulties in understanding the option sjdbOverhang in STAR. This option is set when making use of a splice junctions database. The manual defines it to be: "the length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)". It seems to be a very important option, because if it is set to 0 (default), the splice junctions database is not used.

I don't think it's the minimal alignment length for a read spanning the junction, because there's already the option alignSJDBoverhangMin for that, which is defined as "the minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments".

Is it then the expected length, maybe?

10.2 years ago
Martombo ★ 3.1k

Here's the answer from Alexander Dobin, the developer of the algorithm (to whom I'm very grateful):

The "Overhang" in these parameters has different meanings, which was a bad naming choice, unfortunately.

The --sjdbOverhang is used only at the genome generation step, and tells STAR how many bases to concatenate from donor and acceptor sides of the junctions. If you have 100b reads, the ideal value of --sjdbOverhang is 99, which allows the 100b read to map 99b on one side, 1b on the other side. One can think of --sjdbOverhang as the maximum possible overhang for your reads.

On the other hand, --alignSJDBoverhangMin is used at the mapping step to define the minimum allowed overhang over splice junctions. For example, the default value of 3 would prohibit overhangs of 1b or 2b.
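To make the difference between the two parameters concrete, here is a toy Python model of the description above. It is only a sketch of the idea, not how STAR is implemented: the function name `junction_splits` is hypothetical, and the model ignores everything else STAR does (for example, it treats --sjdbOverhang as a hard cap on each side, following the "maximum possible overhang" reading).

```python
def junction_splits(read_length, sjdb_overhang, align_sjdb_overhang_min=3):
    """List (left, right) base counts for a read spanning one annotated junction.

    Toy model: a split is kept only if neither side is longer than the
    sequence inserted at genome generation (sjdb_overhang) and neither side
    is shorter than the mapping-time minimum (align_sjdb_overhang_min).
    """
    splits = []
    for left in range(1, read_length):
        right = read_length - left
        fits_inserted_sequence = max(left, right) <= sjdb_overhang
        long_enough = min(left, right) >= align_sjdb_overhang_min
        if fits_inserted_sequence and long_enough:
            splits.append((left, right))
    return splits


# 100b reads, --sjdbOverhang 99, default --alignSJDBoverhangMin 3
splits = junction_splits(100, 99)
print(len(splits), splits[0], splits[-1])
```

With a minimum of 1 instead of 3, the 1b/99b split from the example above becomes allowed in this model as well.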


This also means that a new genome index (suffix array, SA) needs to be generated for every different read length to be aligned. Otherwise, the number of aligned reads can drop.


If we have data from 2 batches with different read lengths, does that mean we are supposed to map the data separately with different indexes?


In my understanding, that would be optimal, but since v2.4 STAR has an option to set --sjdbOverhang and other sjdb options on the fly during alignment: https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
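As a rough sketch of both routes (one index per read length, or sjdb options passed at the mapping step), the Python snippet below only builds the STAR argument lists and prints them. The helper names, file names, and directory names are hypothetical; the flags are the ones I know from the STAR manual, so please check them against the manual for your STAR version before running anything.

```python
def genome_generate_cmd(genome_dir, fasta, gtf, read_length, threads=4):
    """Return a STAR genome-generation command as an argument list."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", str(read_length - 1),  # read length - 1
    ]


def align_cmd(genome_dir, fastq, prefix, gtf=None, read_length=None):
    """Return a STAR mapping command; optionally add sjdb options on the fly."""
    cmd = [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", fastq,
        "--outFileNamePrefix", prefix,
    ]
    if gtf is not None and read_length is not None:
        cmd += ["--sjdbGTFfile", gtf, "--sjdbOverhang", str(read_length - 1)]
    return cmd


# Hypothetical batches with 58b and 75b reads: one index per read length
for read_length in (58, 75):
    cmd = genome_generate_cmd(
        f"index_{read_length}bp", "genome.fa", "annotation.gtf", read_length
    )
    print(" ".join(cmd))
```

The lists can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.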


This is what I found (https://biocorecrg.github.io/RNAseq_course_2019/alnpractical.html): the value usually equals the minimum read size minus 1. This also means that a new STAR index needs to be generated for every different read length to be aligned. Otherwise, the number of aligned reads can drop.


Thanks for the follow up.


Here's how to find the read length: "How do I find out the read length of a fastq file?"
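If you just want a quick check without extra tools, something like the following Python sketch should do. It assumes plain 4-line FASTQ records (no wrapped sequence lines), and the function names are made up for this example. For reads of varying length, the STAR manual describes max(ReadLength) - 1 as the ideal --sjdbOverhang, which is what the second helper returns.

```python
from collections import Counter
from itertools import islice


def read_length_counts(lines, max_reads=100_000):
    """Count read lengths in the first max_reads records of 4-line FASTQ."""
    counts = Counter()
    # The sequence is the 2nd line of each 4-line record: indices 1, 5, 9, ...
    for seq in islice(lines, 1, 4 * max_reads, 4):
        counts[len(seq.strip())] += 1
    return counts


def suggested_sjdb_overhang(counts):
    """Longest observed read length minus 1."""
    return max(counts) - 1


# Tiny in-memory example; for a real file, pass an open text handle instead,
# e.g. gzip.open("reads.fastq.gz", "rt") (hypothetical file name)
demo = ["@r1", "ACGTACGT", "+", "IIIIIIII", "@r2", "ACGTA", "+", "IIIII"]
counts = read_length_counts(demo)
print(dict(counts), suggested_sjdb_overhang(counts))
```

Only the first `max_reads` records are sampled, so trimmed data with a few unusually long reads further down the file could be missed.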

10.2 years ago

This is quite close to what you have asked:

https://groups.google.com/forum/#!topic/rna-star/J6qH9JCysZw

Basically, sjdbOverhang should be set to read length - 1. So if you have 75 bp reads, it should be set to 74. The alignSJDBoverhangMin option, by contrast, discards alignments with small splice overhangs. I use the default setting for this parameter.


Thank you for the reply. I read that this option should be set to mate_length - 1, which should be equivalent to read_length - 1, as you wrote. But do you know what it is used for, anyway?


We have two datasets. One was generated in our lab and has a read length of 58 bp; the other was obtained from a paper and has a read length of 75 bp. In that case, what should the value of sjdbOverhang be?


This is what I found (https://biocorecrg.github.io/RNAseq_course_2019/alnpractical.html): the value usually equals the minimum read size minus 1. This also means that a new STAR index needs to be generated for every different read length to be aligned. Otherwise, the number of aligned reads can drop.
