STAR index generation for bacterial genome
6 weeks ago

Hi,

I'm trying to analyze RNA-Seq data for a bacterium, Mycobacterium tuberculosis. I used the FASTA and GTF files from NCBI to create the index, and set --genomeSAindexNbases to 8 based on this previous post. The bash script I used is:

# load modules

# launch star
STAR \
--runMode genomeGenerate \
--genomeDir /home/xyz/scratch/sanraffaele/indices/star/ \
--genomeFastaFiles ~/reference_data/NC000962_3.fasta \
--sjdbGTFfile ~/reference_data/NC000962_3.gtf \
--genomeSAindexNbases 8


The index generation takes ~15 seconds, and on reviewing the files in the output folder it appears that the index has only 70 or so transcripts. Given the short time to generate the index (the genome is about 4 Mbp long) and the presence of so few transcripts, I suspect that something is wrong. Any suggestions about what I should do differently?

Since you don't need to worry about splicing, there is no specific advantage to using STAR. You could use any aligner.

"it appears that the index has only 70 or so transcripts"

Not sure what you mean by that. It is not unusual to have the index finish quickly. You have a small genome. You can try doing an alignment and see what you get.
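On the --genomeSAindexNbases setting from the question: the STAR manual suggests scaling it down for small genomes to min(14, log2(GenomeLength)/2 - 1). A minimal sketch of that arithmetic follows; the function name suggest_sa_index_nbases is hypothetical, and rounding down to an integer is an assumption, not something STAR does for you.

```shell
# Hypothetical helper (not part of STAR): suggest a --genomeSAindexNbases value
# from a FASTA file using the rule min(14, log2(GenomeLength)/2 - 1).
suggest_sa_index_nbases() {
    awk '
        # Skip header lines; count only sequence characters.
        !/^>/ { gsub(/[ \t\r]/, ""); len += length($0) }
        END {
            if (len == 0) { print "no sequence found" > "/dev/stderr"; exit 1 }
            n = int(log(len) / log(2) / 2 - 1)   # assumption: round down
            if (n > 14) n = 14                   # cap at the STAR default
            print n
        }
    ' "$1"
}
```

For a genome of roughly 4.4 Mbp this works out to 10, so a value of 8 sits below that cap. The result could be passed to the index command as --genomeSAindexNbases "$(suggest_sa_index_nbases ~/reference_data/NC000962_3.fasta)".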

Thank you - I will try that.

Update: I realized that generating the index requires only the FASTA file. The GTF file is needed only if one wants to generate a read count matrix. For bacterial GTF files, Alex Dobin recommends changing column 3 to "exon" for all entries, as discussed in this post.
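The column 3 change can be done with awk. This is only a sketch of one way to do it: the function name gtf_features_to_exon and the output file name are hypothetical, and it assumes a standard tab-separated, 9-column GTF.

```shell
# Hypothetical helper: rewrite the feature column (column 3) of a GTF to "exon".
gtf_features_to_exon() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/    { print; next }     # keep header/comment lines unchanged
         NF >= 9 { $3 = "exon" }     # only touch well-formed 9-column records
                 { print }' "$1"
}
```

Usage might look like gtf_features_to_exon ~/reference_data/NC000962_3.gtf > NC000962_3.exon.gtf. If the GTF lists both a gene row and a CDS row for each locus, relabeling every row could count the same locus twice, so it may be worth checking the feature types in column 3 first.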