Indexing human chromosome assembly of GRCh38.p14 using STAR
6 months ago
mthm ▴ 50

I want to index the genome assembly "GRCh38.p14" before aligning my reads to it. However, one parameter that STAR needs is the overhang length, --sjdbOverhang ReadLength-1. I only have the chromosome assembly and the GTF file, so how do I find out the read length for this assembly?

star • 820 views
6 months ago
GenoMax 141k

That parameter comes from your own data, not from the assembly. If you have 100 bp reads, then use that number for ReadLength.

From STAR manual:

specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junctions database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
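
If you are unsure of the read length in your own data, a minimal sketch like the one below can estimate it. It assumes standard 4-line FASTQ records (plain or gzipped) and only samples the first records; the function names `max_read_length` and `sjdb_overhang` are made up for illustration and are not part of STAR.

```python
import gzip
from pathlib import Path


def max_read_length(fastq_path, max_reads=100_000):
    """Return the longest read among the first `max_reads` FASTQ records.

    Assumes 4-line FASTQ records (header, sequence, '+', qualities).
    """
    path = Path(fastq_path)
    opener = gzip.open if path.suffix == ".gz" else open
    longest = 0
    with opener(path, "rt") as handle:
        for line_no, line in enumerate(handle):
            if line_no // 4 >= max_reads:
                break
            if line_no % 4 == 1:  # second line of each record is the sequence
                longest = max(longest, len(line.strip()))
    return longest


def sjdb_overhang(read_length):
    """Ideal --sjdbOverhang per the manual text quoted above: max(ReadLength) - 1."""
    if read_length < 2:
        raise ValueError("read length must be at least 2")
    return read_length - 1
```

For 2x100 paired-end reads, `sjdb_overhang(100)` gives 99, which matches the example in the manual. Sampling only the first records is a shortcut; with trimmed or mixed-length reads you may want to scan the whole file.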

Thank you, I have already read the manual section about the length. It is not my data; I am using the published human genome, release 44. From what I understood of the paper, they used a combination of sequencing technologies (short read and long read) to generate the chromosome assembly. I would appreciate it if anyone with knowledge of this could help.

I am not sure, then, what your exact question is. STAR is an aligner and can't do any assemblies. The parameter you are asking about in the original post is only relevant for creating an index from an existing reference genome file. Whether short-read or long-read data was used to generate that assembly does not matter for creating the aligner index.
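
To make the index step concrete, here is a rough sketch that only assembles a STAR `genomeGenerate` command line. The file and directory names are placeholders, and the helper `star_index_command` is hypothetical; the read length passed in should be that of the reads you plan to align, not anything about the assembly.

```python
import shlex


def star_index_command(genome_dir, fasta, gtf, read_length=101, threads=4):
    """Build (but do not run) a STAR genome indexing command.

    read_length is the length of the reads to be aligned later;
    the default of 101 yields --sjdbOverhang 100, STAR's default value.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        "--sjdbOverhang", str(read_length - 1),
        "--runThreadN", str(threads),
    ]


# Placeholder paths; substitute your own FASTA and GTF files.
cmd = star_index_command("star_index", "genome.fa", "annotation.gtf", read_length=100)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list keeps the arguments separate, so paths with spaces do not need extra quoting if it is later passed to `subprocess.run`.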

Maybe I didn't really understand what this "overhang length for constructing the splice junctions" parameter refers to; I thought it meant the length of the raw reads. Anyway, I used 100 and it apparently worked!
