htseq-count stopped with 0 bytes in count file
5.3 years ago
wbliu ▴ 20

Dear BioStar,

I was trying to run htseq-count (HTSeq 0.11.0) on the BAM files produced by STAR (2.6.0b). I have tried BAM files sorted both by name and by coordinate, but the program processes about 7 million alignment record pairs and then exits with exit code 120. The output directory contains an empty count file and a 4 GB file called "name".

There are no other errors. I am not sure whether the warning indicates a real problem. I saw some posts suggesting running the paired and unpaired reads separately; I tried that, but it still failed.

I requested 24 processors, 120 GB of memory and 6 hours of wall time on a Seadragon HPC. I'm fairly sure computing resources are not the limiting factor. The reference genome and GTF file are both from GENCODE.

My questions are:

  1. What does exit code 120 mean?
  2. Any advice on how to make htseq-count work?
  3. Does STAR's --quantMode GeneCounts give results similar to htseq-count? (From my reading of the STAR manual, it should be the case.)

Any hints will be highly appreciated!

----------The following was from my script:---------------

module load samtools
module load htseq

mybam=/mypath/Aligned.sortedByCoord.out.bam
mygff=/mypath/gencode.vM19.annotation.gtf

htseq-count -f bam -r name -s reverse -t gene $mybam $mygff >  Clone1_1_S7.count

------output--------------

Exited with exit code 120.

Resource usage summary:

    CPU time :                                   1398.99 sec.
    Max Memory :                                 1 GB
    Average Memory :                             0.68 GB
    Total Requested Memory :                     128.00 GB
    Delta Memory :                               127.00 GB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                32
    Run time :                                   1305 sec.
    Turnaround time :                            1306 sec.

The output (if any) follows:

100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
(I truncated it here)
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1809862 GFF lines processed.
Warning: Read NB500999:188:HC7G5BGX9:3:11512:26280:4597 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
(I truncated this part)
6900000 SAM alignment record pairs processed.
7000000 SAM alignment record pairs processed.
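
A note on the mate warning in the log: htseq-count prints it when the two mates of a pair are not on adjacent lines, which is consistent with the script as posted reading a coordinate-sorted BAM (Aligned.sortedByCoord.out.bam) with `-r name`. The sketch below shows one way to keep the `-r` value in step with the BAM header. The helper name `order_flag` is hypothetical, and it assumes the header carries an `@HD ... SO:` tag, which not every BAM does.

```shell
# Map the sort-order tag of a SAM/BAM @HD header line to the value
# htseq-count expects for -r (name or pos).
# order_flag is a hypothetical helper, not part of HTSeq or samtools.
order_flag() {
  case "$1" in
    *SO:coordinate*) echo pos ;;      # sorted by position
    *SO:queryname*)  echo name ;;     # sorted by read name
    *)               echo unknown ;;  # unsorted or no SO: tag
  esac
}

# Usage sketch (assumes samtools and htseq-count are on PATH):
#   header=$(samtools view -H "$mybam" | head -n 1)
#   htseq-count -f bam -r "$(order_flag "$header")" -s reverse -t gene "$mybam" "$mygff" > Clone1_1_S7.count
#
# Alternatively, name-sort first and keep -r name:
#   samtools sort -n -o Aligned.sortedByName.bam "$mybam"
```

If the helper prints `unknown`, name-sorting with `samtools sort -n` and using `-r name` is the safer route.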
RNA-Seq software error • 2.7k views

Does STAR's --quantMode GeneCounts give results similar to htseq-count? (From my reading of the STAR manual, it should be the case.)

You could do that. But if you already have the aligned files, you can try featureCounts as an alternative to htseq-count. Counting needs sorted files, and featureCounts can sort them for you automatically. You can also give it all of your sample files at once to get a count matrix for the entire dataset, with genes as rows and samples as columns.
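
For the STAR route, a minimal sketch of pulling htseq-count-style counts out of the ReadsPerGene.out.tab file that `--quantMode GeneCounts` writes. Per the STAR manual, column 2 holds unstranded counts, column 3 corresponds to `htseq-count -s yes` and column 4 to `-s reverse`, and the leading `N_*` rows are summary lines. The function name `star_gene_counts` and the file paths are placeholders, not anything from STAR itself.

```shell
# Print "gene_id<TAB>count" from a STAR ReadsPerGene.out.tab file.
# $1: path to ReadsPerGene.out.tab
# $2: count column (2 = unstranded, 3 = like -s yes, 4 = like -s reverse);
#     defaults to 4 to match the -s reverse setting used in the question.
# star_gene_counts is a hypothetical helper name.
star_gene_counts() {
  tab=$1
  col=${2:-4}
  # Skip the N_unmapped / N_multimapping / N_noFeature / N_ambiguous rows.
  awk -v c="$col" 'BEGIN { FS = OFS = "\t" } $1 !~ /^N_/ { print $1, $c }' "$tab"
}

# Usage sketch (placeholder paths):
#   star_gene_counts /mypath/ReadsPerGene.out.tab 4 > Clone1_1_S7.star.count
```

Comparing this column against the htseq-count output for one sample is a quick way to confirm that the strandedness setting is the right one.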


Thanks genomax. I will definitely try featureCounts.

4.0 years ago
xrao ▴ 30

It could be that your home directory ran out of space, so some log or output files could not be written there. I had a similar problem, which was solved by freeing up enough space in my home directory. Hope that helps!
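
A quick way to check this, sketched under a few assumptions: the count file is written to the current directory, `python3` is on PATH, and the machine is Linux (for `/dev/full`). The second half reproduces a failed write on stdout. In Python 3.6 and later, an error while flushing the standard streams at interpreter shutdown turns the exit status into 120, which would match the exit code in the question if htseq-count was running under Python 3 and could not finish writing the redirected count file.

```shell
# Free space in the directory receiving the count file and in $HOME.
df -h . "$HOME"

# Reproduce a failed flush of stdout. /dev/full (Linux only) rejects every
# write with "No space left on device", much like a full or over-quota disk.
status=0
python3 -c 'print("gene_id\t0")' > /dev/full || status=$?
echo "python3 exit code: $status"
```

If `df` shows the target filesystem at or near 100%, or a quota has been reached, writing the count file somewhere with free space is worth trying before changing anything else.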
