I'm trying to analyze RNA-Seq data for a bacterium, Mycobacterium tuberculosis. I used the FASTA and GTF files from NCBI to create the STAR index, and set --genomeSAindexNbases to 8 based on this previous post. The bash script I used is:

    # load modules
    module load gcc/6.2.0 star/2.7.0a

    # launch star
    STAR --runThreadN 8 \
        --runMode genomeGenerate \
        --genomeDir /home/xyz/scratch/sanraffaele/indices/star/ \
        --genomeFastaFiles ~/reference_data/NC000962_3.fasta \
        --sjdbGTFfile ~/reference_data/NC000962_3.gtf \
        --genomeSAindexNbases 8
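
For context on the --genomeSAindexNbases choice: the STAR manual suggests scaling this parameter down for small genomes to min(14, log2(GenomeLength)/2 - 1). Below is a minimal sketch of that calculation. The helper names `genome_length` and `sa_index_nbases` are hypothetical (not part of STAR), and the example length of 4,411,532 bp assumes the H37Rv reference NC_000962.3.

```shell
# Sum the sequence lengths in a FASTA file (header lines starting with ">" are skipped).
# Assumes Unix line endings; a trailing \r would be counted as a base.
genome_length() {
    awk '!/^>/ { n += length($0) } END { print n + 0 }' "$1"
}

# Suggested --genomeSAindexNbases for a genome length in bp,
# following the STAR manual's rule min(14, log2(GenomeLength)/2 - 1).
sa_index_nbases() {
    awk -v len="$1" 'BEGIN {
        v = int(log(len) / log(2) / 2 - 1)   # int() truncates, which floors positive values
        if (v > 14) v = 14
        print v
    }'
}

# Example with an assumed genome length of 4,411,532 bp
sa_index_nbases 4411532   # prints 10
```

With the real reference file, this could be run as `sa_index_nbases "$(genome_length ~/reference_data/NC000962_3.fasta)"`. By this rule a value around 10 would be expected for a genome of this size, so 8 is lower than the suggested value, though as far as I understand a smaller value should still produce a working index.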
Index generation takes only ~15 seconds, and on reviewing the files in the index folder it appears that the index contains only 70 or so transcripts. Between the short time to generate the index (the genome is about 4 Mbp) and the presence of so few transcripts, I know that something is wrong. Any suggestions about what I should do differently?