Question: htseq-count stopped with 0 bytes in count file
wbliu wrote, 10 months ago:

Dear BioStar,

I was trying to run htseq-count (HTSeq 0.11.0) on the BAM files I got from STAR (2.6.0b). I have tried BAM files sorted both by name and by coordinate, but the program processes about 7 million alignment record pairs and then exits with exit code 120. The output directory contains an empty count file and a 4 GB file called "name".

There are no other errors. I'm not sure whether the warning indicates a real problem. I saw some posts suggesting running the paired and unpaired reads separately, but that failed as well.

I requested 24 processors, 120 GB of memory and 6 hours of wall time on a Seadragon HPC, so I'm fairly sure computing resources were not the limiting factor. The reference genome and the GTF file are both from Gencode.

My questions are:

  1. What does exit code 120 mean?
  2. Any advice on how to make htseq-count work?
  3. Does STAR's --quantMode GeneCounts give results similar to htseq-count? (From my reading of the STAR manual, it should be the case.)

Any hints will be highly appreciated!

----------The following was from my script:---------------

module load samtools
module load htseq

mybam=/mypath/Aligned.sortedByCoord.out.bam
mygff=/mypath/gencode.vM19.annotation.gtf

htseq-count -f bam -r name -s reverse -t gene $mybam $mygff >  Clone1_1_S7.count
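
One thing visible in the command above: it passes a file named Aligned.sortedByCoord.out.bam together with -r name, while htseq-count expects -r (name or pos) to match how the file is actually sorted. A minimal sketch for checking the sort order declared in the BAM header is below. The helper name order_flag is made up for illustration, and the sketch assumes the @HD header line carries an SO: tag, which not every tool writes.

```shell
# Hypothetical helper: read SAM header lines on stdin and print the
# htseq-count -r value matching the declared sort order
# ("pos", "name", or "unknown" if no usable SO: tag is present).
order_flag() {
    awk -F'\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "SO:coordinate") { print "pos";  found = 1 }
                if ($i == "SO:queryname")  { print "name"; found = 1 }
            }
        }
        END { if (!found) print "unknown" }
    '
}

# Usage (assumes samtools is on PATH and $mybam is set as in the script):
#   samtools view -H "$mybam" | order_flag
```

If this prints pos for the file being counted, -r pos would be the matching option. Whether the mismatch explains exit code 120 is not established in this thread.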

------output--------------

Exited with exit code 120.

Resource usage summary:

    CPU time :                                   1398.99 sec.
    Max Memory :                                 1 GB
    Average Memory :                             0.68 GB
    Total Requested Memory :                     128.00 GB
    Delta Memory :                               127.00 GB
    Max Swap :                                   -
    Max Processes :                              4
    Max Threads :                                32
    Run time :                                   1305 sec.
    Turnaround time :                            1306 sec.

The output (if any) follows:

100000 GFF lines processed.
200000 GFF lines processed.
300000 GFF lines processed.
400000 GFF lines processed.
(I truncated it here)
1600000 GFF lines processed.
1700000 GFF lines processed.
1800000 GFF lines processed.
1809862 GFF lines processed.
Warning: Read NB500999:188:HC7G5BGX9:3:11512:26280:4597 claims to have an aligned mate which could not be found in an adjacent line.
100000 SAM alignment record pairs processed.
200000 SAM alignment record pairs processed.
(I truncated this part)
6900000 SAM alignment record pairs processed.
7000000 SAM alignment record pairs processed.

> Does STAR's --quantMode GeneCounts give results similar to htseq-count? (From my reading of the STAR manual, it should be the case.)

You could do that. But since you already have the aligned files, you can try featureCounts as an alternative to htseq-count. You need to use coordinate-sorted files for counting. featureCounts can sort the files for you automatically. You can also give it all of your sample files at once to get a count matrix for the entire dataset, with genes as rows and samples as columns.
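
For reference, a featureCounts call roughly equivalent to the htseq-count command in the question might look like the sketch below. The option mapping is an assumption rather than something stated in this thread: -s 2 is featureCounts' reversely stranded mode (the counterpart of -s reverse), -p marks the data as paired-end, and -t gene -g gene_id mirrors -t gene with htseq-count's default ID attribute. The wrapper name fc_cmd, the thread count and the output name counts.txt are made up, and the function only prints the command so it can be inspected before running.

```shell
# Hypothetical wrapper: print a featureCounts command line.
# Arguments: GTF path, output path, then one or more BAM files.
fc_cmd() {
    local gtf=$1 out=$2
    shift 2
    # -T threads, -p paired-end input, -s 2 reversely stranded,
    # -t feature type to count, -g attribute used as the gene ID
    printf 'featureCounts -T 4 -p -s 2 -t gene -g gene_id -a %s -o %s' "$gtf" "$out"
    printf ' %s' "$@"
    printf '\n'
}

# Paths reuse the placeholders from the question.
fc_cmd /mypath/gencode.vM19.annotation.gtf counts.txt /mypath/Aligned.sortedByCoord.out.bam
```

Passing several BAM files after the output path gives one count column per file. In recent Subread releases, -p alone only declares the library as paired-end and --countReadPairs is needed to count fragments instead of reads, so check the help text of the installed version. On the STAR side of the question, the STAR manual describes the fourth column of ReadsPerGene.out.tab as the counterpart of htseq-count -s reverse.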

written 10 months ago by genomax

Thanks genomax. I will definitely try featureCounts.

written 10 months ago by wbliu