How to know whether the genome indices from STAR are correct
7 weeks ago
Chris ▴ 30

Hi Bioinformaticians,

I ran STAR with this command:

STAR --runThreadN ${NSLOTS} \
    --runMode genomeGenerate \
    --genomeDir /home/doan/hg38/hg38_index_new \
    --genomeFastaFiles /home/doan/hg38/Homo_sapiens.GRCh38.dna_sm.prima$ \
    --sjdbGTFfile /home/doan/hg38/Homo_sapiens.GRCh38.107.gtf \
    --sjdbOverhang 99

I got this output:

chrLength.txt geneInfo.tab sjdbInfo.txt
chrNameLength.txt Genome sjdbList.fromGTF.out.tab
chrName.txt genomeParameters.txt sjdbList.out.tab
chrStart.txt Log.out transcriptInfo.tab
exonGeTrInfo.tab SA
exonInfo.tab SAindex

The total size of these genome indices is 28 GB. I then ran the alignment, but all of the output files have a size of 0. Would anyone please tell me what is wrong?

1
7 weeks ago
GenoMax 120k

Then I run alignment but the size of all output files is 0.

You are not running an alignment just yet. A STAR genomeGenerate job can take an hour or two for the human genome. If you are simply looking at the files right after starting the job, they will be zero bytes. If the job completed and you still have some files that are zero bytes, you should look at the log file. If you did not capture the log during the run, you may need to run the job again and capture the standard output/error streams to files.
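One way to capture both streams is to redirect them to files when the command is launched. This is only a minimal sketch: the helper name `run_logged` and the log file prefix are made up for illustration, and the STAR call in the final comment stands in for your own full command.

```shell
#!/usr/bin/env bash
# run_logged PREFIX CMD [ARGS...]
# Runs CMD, saving stdout to PREFIX.out and stderr to PREFIX.err,
# appends the exit status to PREFIX.err, and returns that status.
run_logged() {
    local prefix=$1
    shift
    "$@" > "${prefix}.out" 2> "${prefix}.err"
    local status=$?
    echo "exit status: ${status}" >> "${prefix}.err"
    return "${status}"
}

# Hypothetical usage (substitute your own complete command):
# run_logged genome_generate STAR --runMode genomeGenerate ...
```

Recording the exit status next to the error stream makes it easier to tell afterwards whether a scheduler job really finished cleanly.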

If you had not generated the genome index then what were you trying to align against in your last thread?

0

Creating the genome indices took less than 1 hour and produced the output I listed above, but the alignment finished in less than 1 minute, so, as you said, something is wrong here. On the server, a submitted job disappears from the status listing when it finishes.

1

Are you sure the genome generate job completed successfully i.e. there were no errors? Can you show a listing of the files above so we can see their sizes? e.g. ls -lh *?
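Besides eyeballing the `ls -lh` listing, zero-byte files can be picked out directly. The sketch below uses a hypothetical helper name, `list_empty_files`, and only looks at regular files directly inside the given directory.

```shell
# list_empty_files DIR
# Prints every regular file directly inside DIR whose size is 0 bytes.
# Prints nothing if no file is empty.
list_empty_files() {
    find "$1" -maxdepth 1 -type f -size 0
}

# Hypothetical usage, run from the index directory:
# list_empty_files .
```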

0

I ran STAR by submitting a script, so an error, if there was one, would not be shown the way it would be when running directly from the shell.

-rw-r-----. 1 doanc2 doanc2 1.2K Aug 4 14:54 chrLength.txt
-rw-r-----. 1 doanc2 doanc2 3.1K Aug 4 14:54 chrNameLength.txt
-rw-r-----. 1 doanc2 doanc2 1.9K Aug 4 14:54 chrName.txt
-rw-r-----. 1 doanc2 doanc2 2.1K Aug 4 14:54 chrStart.txt
-rw-r-----. 1 doanc2 doanc2 56M Aug 4 14:53 exonGeTrInfo.tab
-rw-r-----. 1 doanc2 doanc2 23M Aug 4 14:54 exonInfo.tab
-rw-r-----. 1 doanc2 doanc2 2.4M Aug 4 14:53 geneInfo.tab
-rw-r-----. 1 doanc2 doanc2 3.0G Aug 4 15:38 Genome
-rw-r-----. 1 doanc2 doanc2 844 Aug 4 15:38 genomeParameters.txt
-rw-r-----. 1 doanc2 doanc2 34K Aug 4 15:38 Log.out
-rw-r-----. 1 doanc2 doanc2 24G Aug 4 15:38 SA
-rw-r-----. 1 doanc2 doanc2 1.5G Aug 4 15:38 SAindex
-rw-r-----. 1 doanc2 doanc2 12M Aug 4 15:34 sjdbInfo.txt
-rw-r-----. 1 doanc2 doanc2 12M Aug 4 14:54 sjdbList.fromGTF.out.tab
-rw-r-----. 1 doanc2 doanc2 8.8M Aug 4 15:34 sjdbList.out.tab
-rw-r-----. 1 doanc2 doanc2 16M Aug 4 14:54 transcriptInfo.tab

I am not sure whether this error is related, but this is the content of a file named star.e101042.

/usr/global/sge/default/spool/fenn03/job_scripts/101042: line 62: let: TOTAL=1659388592 - : syntax error: operand expected (error token is "- ")

(standard_in) 2: syntax error
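The first message has the form bash prints when an arithmetic expression is missing an operand: here the value after the minus sign is empty. The sketch below assumes, without having seen the job script, that the failing line subtracts a start time from an end time. The names `elapsed_seconds`, `start`, and `end` are hypothetical. It shows one way to guard such a calculation so that an unset value produces a readable message instead of a syntax error.

```shell
# elapsed_seconds START END
# Prints END - START in seconds. Fails with a message on stderr if either
# argument is empty or is not a whole number.
elapsed_seconds() {
    local start=$1 end=$2
    case "${start}" in
        ''|*[!0-9]*) echo "start time is missing or not a number" >&2; return 1 ;;
    esac
    case "${end}" in
        ''|*[!0-9]*) echo "end time is missing or not a number" >&2; return 1 ;;
    esac
    echo $((end - start))
}

# Hypothetical usage inside a job script:
# begin=$(date +%s); ...; elapsed_seconds "${begin}" "$(date +%s)"
```

If the assumption holds, the message comes from the job wrapper's timing code and not from STAR itself.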

1

These files look to be about the right size when I compare them (qualitatively), so I am going to hazard a guess that the index is good.

What do you see when you do tail -n 6 Log.out in the directory above? It should show something like the following if the index is complete.

Jan 29 14:41:26 ... writing SAindex to disk
Writing 8 bytes into .//SAindex ; empty space on disk = 1209157942247424 bytes ... done
Writing 120 bytes into .//SAindex ; empty space on disk = 1209157942247424 bytes ... done
Writing 1565873491 bytes into .//SAindex ; empty space on disk = 1209157942247424 bytes ... done
Jan 29 14:41:33 ..... finished successfully
DONE: Genome generation, EXITING
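The same check can be scripted, for example at the start of an alignment job. This is a sketch only: `index_log_ok` is a hypothetical helper name, and it relies on the completion line appearing exactly as in the excerpt above.

```shell
# index_log_ok LOGFILE
# Succeeds if the last 6 lines of LOGFILE contain the completion message
# shown in the excerpt above; fails otherwise.
index_log_ok() {
    tail -n 6 "$1" | grep -q "DONE: Genome generation, EXITING"
}

# Hypothetical usage:
# index_log_ok Log.out && echo "index looks complete"
```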


If you copy/pasted the genome generate command from something like a PDF, it is possible that the hyphens were converted to "smart hyphens" (are you on macOS by chance?).
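A quick way to look for converted characters is to search the job script for anything outside plain ASCII. The helper name `find_non_ascii` below is made up for illustration. Running grep in the C locale means each byte is judged on its own, so the bytes of a typographic dash are reported.

```shell
# find_non_ascii FILE
# Prints, with line numbers, every line of FILE containing a byte that is
# neither printable ASCII nor ordinary whitespace.
# Returns non-zero if no such line is found.
find_non_ascii() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1"
}

# Hypothetical usage on a job script:
# find_non_ascii star_job.sh
```

Any line it reports is worth retyping by hand in a plain-text editor.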

0

Yes, I am on macOS, and I was surprised that when I typed a period here, it was converted to a question mark.

tail -n 6 Log.out

Number of fastq files for each mate = 1

1

I was asking you for the tail output of the Log.out file from index creation. It looks like you ran the alignment in the same directory, so that file must have been overwritten by the one from the alignment.

Looks like your input file is not in the same directory, or does not have the proper path, or does not have the correct name. Which of the three is the issue?
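These possibilities can be checked before submitting the job. A minimal sketch, with a hypothetical helper name `check_input`, that reports whether a given path exists, is readable, and is non-empty:

```shell
# check_input FILE
# Prints one line describing FILE: missing, not readable, empty, or ok.
# Returns non-zero unless the file is present, readable, and non-empty.
check_input() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"; return 1
    elif [ ! -r "$1" ]; then
        echo "not readable: $1"; return 1
    elif [ ! -s "$1" ]; then
        echo "empty: $1"; return 1
    fi
    echo "ok: $1"
}

# Hypothetical usage with the exact path used in the job script:
# check_input /path/to/reads.fastq
```

Passing the path exactly as it is written in the submission script helps catch relative paths that only resolve from a different working directory.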

0

Sorry for misunderstanding your request. Here is the output of the Log.out file for index creation.

    tail -n 6 Log.out

Aug 04 15:38:43 ... writing SAindex to disk
Writing 8 bytes into /home/doanc2/hg38/hg38_index//SAindex ; empty space on disk = 378587739848704 bytes ... done
Writing 120 bytes into /home/doanc2/hg38/hg38_index//SAindex ; empty space on disk = 378587739848704 bytes ... done
Writing 1565873491 bytes into /home/doanc2/hg38/hg38_index//SAindex ; empty space on disk = 378587739848704 bytes ... done
Aug 04 15:38:46 ..... finished successfully
DONE: Genome generation, EXITING

1

Looks like your index is OK. So the issue you are having with alignments should not be related to the index.

0

Thank you so much for your help! Would you have any recommendations for me to fix the alignment issue?

1

While it is not recommended, you could simply type the STAR command out at the login/head node prompt and see if the job starts (i.e. run it interactively). Be ready to kill the job (Ctrl + C) so it does not actually continue running. Once you know the command works (i.e. it does not generate any errors), you can simply copy/paste it into your job submission script. This will help you debug issues with file paths etc.

0

You answered the question in the title: yes, the index is correct. But because the output files after alignment have a size of 0, converting a faulty SAM file to a BAM file doesn't make any sense. Do I need to create a new thread?

1

Did you try running the STAR command directly on the terminal prompt as I suggested above?

0

Aug 09 12:43:41 ..... started STAR run
Aug 09 12:45:05 ..... started mapping

1

OK, whatever this command line is, it seems to be working. Kill this job and then copy this command into your job submission script.

0

GenoMax, I opened the file created by the submitted job and got this:

cat test.e101168


cat: /tmp/101168.1.all.q/machines: No such file or directory

The error when running STAR:

cat star.e101169


line 69: let: TOTAL=1660095741 - : syntax error: operand expected (error token is "- ")

(standard_in) 2: syntax error