Question: htseq-count cannot locate gff
0
gravatar for Melonhead
11 weeks ago by
Melonhead10
Melonhead10 wrote:

Hi there

I am struggling with the trivial task of running htseq-count. My command is as follows:

htseq-count -r pos \
-s no \
-t exonic_part \
-i transcripts \
--additional-attr=gene_id \
-m union \
--nonunique=all \
-f sam \
Analysis/Samples/sam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.sam" \
Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff >  \
Analysis/Samples/htseq_output/"${prefix}_counts_dexseq_int.counts"

Every time it exits with an error message:

    usage: .htseq-count-real [options] alignment_file gff_file.htseq-count-real: error: the following arguments are required: featuresfilename

This is not the standard .gff gile. Yet, even if I use the standard Ensembl .gff and correct arguments for -i, -t options I am still receiving the same error.

What could be messed up in my command?

Will be grateful for any advice

rna-seq alignment • 162 views
ADD COMMENTlink modified 11 weeks ago • written 11 weeks ago by Melonhead10
1

Can you show us what you get if you do cat -A HTseq_count_int.sh? Or, try not to loop over ${prefix} but run the command in shell directly using one mapping sam with Tab completion for the file path, and see what you get.

If in the script it can detect the text Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff but the path is wrong, it should show something like:

Error occured when processing GFF file (line 1 of file Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff):
  [Errno 2] No such file or directory: 'Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff'
  [Exception type: FileNotFoundError, raised in __init__.py:47]
ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by SMK1.8k

The output of cat -A HTseq_count_int.sh :

#!/usr/bin/bash$
$
#Script to convert .BAM to .SAM and quantify gene expression with HTseq$
$
#Get WD$
currentdir=$(cd -P -- "$(dirname -- "$0")" && pwd -P)$
wd=$(dirname $currentdir)$
wdsamp="Analysis/Samples"$
$
#Declare an array of IDs$
declare -a arr=("SRR8919665" $
                "SRR8919666"$
                "SRR8919667"$
                "SRR8919668"$
                "SRR8919669"$
                "SRR8919670"$
                "SRR8919671"$
                "SRR8919672"$
                "SRR8919673"$
                "SRR8919674")$
 $
#Function to convert bams to sams $
convert_to_sam () {$
samtools sort -o "$wd"/"$wdsamp"/sam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.sam" -O sam -T "${prefix}" -@ 2 "$wd"/"$wdsamp"/bam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.bam"$
}$
$
#Function to compute counts with HTseq                               $
quantify_exp () {$
htseq-count \$
    -r pos \$
    -s no \$
    -t exonic_part \$
    -i transcripts \$
    --additional-attr=gene_id \$
    -m union --nonunique=all \$
    -f sam "$wd"/"$wdsamp"/sam/"${prefix}_Mapped.sortedByCoord.out.q10.R1.sam" \ $
    Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff > \$
    "$wd"/"$wdsamp"/htseq_output/"${prefix}_counts_dexseq_int.counts"$
}$
$
#Loop over$
for prefix in "${arr[@]}"; do convert_to_sam "$prefix" & done; wait && \$
for prefix in "${arr[@]}"; do htseq-count "$prefix" & done; wait$
$
ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by Melonhead10

Oh yes, now I see a mistake. I mislabeled the function in for loop

Thank you!

ADD REPLYlink written 11 weeks ago by Melonhead10

You're welcome, Melonhead. Glad that the issue is solved.

ADD REPLYlink written 11 weeks ago by SMK1.8k

did you check whether the file (gff) is accessible from the location where you run the script? eg. can you do ls Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff ?

ADD REPLYlink written 11 weeks ago by lieven.sterck5.5k

It should be:

-rw-r--r-- 1  1,4K May 29 14:49 HTseq_count_int.log
-rw-r--r-- 1  1,6K May 29 14:49 HTseq_count_int.sh
-rw-r--r-- 1   70M May 29 14:49 Rattus_norvegicus.Rnor_6.0.96.dexseq.int.gff

I also tried providing the absolute path to the .gff. Still of no avail.

ADD REPLYlink modified 11 weeks ago • written 11 weeks ago by Melonhead10

Not to go too off-topic, but is there still any reason to use htseq-count over featureCounts? The last I've tried the former it was orders of magnitude slower than featureCounts.

ADD REPLYlink written 11 weeks ago by predeus1.2k
1

To be honest, I've never used any of these before :^)

I will gladly try different approach. Thank you for the suggestion!

ADD REPLYlink written 11 weeks ago by Melonhead10
1

AFAIK they basically do the same thing (unlike the fancy EM-based methods, that also attempt to rescue multi-mappers). But featureCounts is a lot faster, like I said.

ADD REPLYlink written 11 weeks ago by predeus1.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1131 users visited in the last hour