Dear BioStar,
I am trying to run htseq-count (HTSeq 0.11.0) on BAM files produced by STAR (2.6.0b). I have tried BAM files sorted both by name and by coordinate, but in each case the program processes about 7 million alignment record pairs and then exits with exit code 120. The output directory contains an empty count file and a 4 GB file called "name".
There are no other errors. I am not sure whether the warning shown below indicates a real problem. I saw some posts suggesting running the paired and unpaired reads separately, but that also failed.
I requested 24 processors, 120 GB of memory and 6 hours of wall time on a Seadragon HPC, so I am fairly sure computing resources are not the limiting factor. The reference genome and GTF file are both from GENCODE.
My questions are:
- What does exit code 120 mean?
- Any advice on how to make htseq-count work?
- Does STAR's --quantMode GeneCounts give results similar to htseq-count? (From my reading of the STAR manual, that should be the case.)
Any hints will be highly appreciated!
----------The following was from my script:---------------
module load samtools
module load htseq
mybam=/mypath/Aligned.sortedByCoord.out.bam
mygff=/mypath/gencode.vM19.annotation.gtf
htseq-count -f bam -r name -s reverse -t gene $mybam $mygff > Clone1_1_S7.count
------output--------------
Exited with exit code 120.
Resource usage summary:
CPU time : 1398.99 sec.
Max Memory : 1 GB
Average Memory : 0.68 GB
Total Requested Memory : 128.00 GB
Delta Memory : 127.00 GB
Max Swap : -
Max Processes : 4
Max Threads : 32
Run time : 1305 sec.
Turnaround time : 1306 sec.
The output (if any) follows:
100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
(I truncated it here)
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1809862 GFF lines processed.
Warning: Read NB500999:188:HC7G5BGX9:3:11512:26280:4597 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
(I truncated this part)
6900000 SAM alignment record pairs processed.
7000000 SAM alignment record pairs processed.
You could do that. But if you already have the aligned files, you can try featureCounts as an alternative to htseq-count. You need to use sorted files for counting; featureCounts can sort the files for you automatically. You can also feed it all of your sample files to get a matrix of counts for the entire dataset, with genes as rows and one column per sample.
Thanks genomax. I will definitely try featureCounts.
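For anyone finding this thread later, the featureCounts suggestion above might look roughly like the sketch below. It reuses the placeholder paths from the question; the output file name and the thread count are assumptions for illustration only. The flags are chosen to mirror the original htseq-count call (`-s reverse` becomes `-s 2`, `-t gene` is kept) and should be checked against your own library type and Subread version.

```shell
# Paths copied from the question; the output name is a hypothetical choice.
mybam=/mypath/Aligned.sortedByCoord.out.bam
mygff=/mypath/gencode.vM19.annotation.gtf
out=Clone1_1_S7.featureCounts.txt

# -p: paired-end input; -s 2: reversely stranded (like htseq-count -s reverse);
# -t gene and -g gene_id: count over gene features, grouped by gene_id;
# -T 24: number of threads (assumed to match the 24 requested processors).
# Note: newer Subread releases (2.0.2 and later, as far as I know) also need
# --countReadPairs to count fragments rather than individual reads.
fc_args="-p -s 2 -t gene -g gene_id -T 24 -a $mygff -o $out $mybam"

# Run only when the tool and the input are present; otherwise print the command.
if command -v featureCounts >/dev/null 2>&1 && [ -f "$mybam" ]; then
    featureCounts $fc_args
else
    echo "featureCounts or input BAM not found; would run: featureCounts $fc_args"
fi
```

To get the gene-by-sample matrix mentioned above, additional BAM files can be listed after `$mybam` in the same call; each one becomes a column in the output table.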