Question: Interpreting HTSeq output file
makwana.kd wrote (5 weeks ago):

I have an output file (plain text) that I imported into an Excel spreadsheet. I see three columns, but no numeric count values. Is this normal?

Column1   Column2   Column3
"   XF"   Z         __ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
"   XF"   Z         __ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
"   XF"   Z         __alignment_not_unique
"   XF"   Z         __alignment_not_unique
"   XF"   Z         __alignment_not_unique
"   XF"   Z         __alignment_not_unique
"   XF"   Z         __no_feature
"   XF"   Z         __no_feature
"   XF"   Z         __alignment_not_unique
Tag: rna-seq
Written 5 weeks ago by makwana.kd (modified 4 weeks ago by brianj.park)

Three (well, five) words: "never use Excel" (for this).

Importing this kind of data file into Excel often causes unexpected behaviour.

You're better off processing this file on the command line in your Linux environment. I assume you ran the previous steps on the command line as well, no?

Now, for your specific issue: can you post the output of head <your htseq output file>?
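The three columns in the question ("   XF", Z, and a value such as __no_feature) look like a single text field of the form XF:Z:__no_feature that the spreadsheet import split on its colons, which is exactly the kind of surprise a spreadsheet can introduce. A minimal Python sketch for looking at the raw file instead, assuming only that it is a tab-delimited text file; the helper names split_first and peek are hypothetical, not part of HTSeq:

```python
from itertools import islice


def split_first(lines, n=10):
    """Split the first n lines on tabs, like a `head` that shows the real columns."""
    return [line.rstrip("\n").split("\t") for line in islice(lines, n)]


def peek(path, n=10):
    """Return the first n lines of the file at `path`, each split on tabs."""
    with open(path, encoding="utf-8") as fh:
        return split_first(fh, n)
```

For example, `for fields in peek("countread.text"): print(len(fields), fields)`. A normal htseq-count table for one input file should show two fields per line (feature ID and count); a line such as XF:Z:__no_feature comes back as a single field.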

Written 5 weeks ago by lieven.sterck (modified 5 weeks ago)

Hi Lieven, following is the command I used:

htseq-count -m union -f bam -s no -r name ALZT22-2Cunsorted.bam geneassembly.gff3 -o counread.text

The BAM file is name-sorted.

The head command gives me the following output:

XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__no_feature
XF:Z:__no_feature
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique

Written 5 weeks ago by makwana.kd (modified 5 weeks ago)

Which file is this the head of?

It does not look like it comes from counread.text, does it? If it does, then the output of your htseq-count command is not correct.

Written 5 weeks ago by lieven.sterck

Sorry, there was a typo in the command above. This is the corrected one:

htseq-count -m union -f bam -s no -r name ALZT22-2Cunsorted.bam geneassembly.gff3 -o countread.text

Yes, the head output above was for countread.text:

krishna@dntdaretouchit:/mnt/e/cannon$ head countread.text
XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__ambiguous[ENSMUSG00000098178.1+ENSMUSG00000106106.2]
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
XF:Z:__no_feature
XF:Z:__no_feature
XF:Z:__alignment_not_unique
XF:Z:__alignment_not_unique
krishna@dntdaretouchit:/mnt/e/cannon$

Written 5 weeks ago by makwana.kd

Didn't you mention in a previous post that you converted the BAM file to SAM format? If so, you need to change your htseq-count command accordingly (-f sam instead of -f bam).

In any case, the content of your countread.text file is not correct (it looks like some kind of SAM format?).
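Telling a SAM file from a count table can be done by eye, but a rough heuristic makes the difference explicit: a SAM alignment line has at least 11 tab-separated mandatory fields (with integer FLAG and POS in the 2nd and 4th columns), while an htseq-count table row has two (feature ID and count). The sketch below is hypothetical helper code, not part of HTSeq and not a full format validator; the "xf_tag" case covers the bare XF:Z:... lines from the head output earlier in this thread.

```python
from collections import Counter
from itertools import islice


def classify_line(line):
    """Heuristically label one line of text (not a full format validator)."""
    line = line.rstrip("\n")
    if line.startswith("@"):
        return "sam_header"  # SAM header lines (@HD, @SQ, @PG, ...)
    if line.startswith("XF:Z:"):
        return "xf_tag"  # a bare htseq-count XF tag on its own line
    fields = line.split("\t")
    if len(fields) == 2 and fields[1].isdigit():
        return "count"  # feature ID <tab> integer count
    if len(fields) >= 11 and fields[1].isdigit() and fields[3].isdigit():
        return "sam"  # 11 mandatory SAM fields; FLAG and POS are integers
    return "other"


def summarize(path, n=1000):
    """Count the line types among the first n lines of a file."""
    with open(path, encoding="utf-8") as fh:
        return Counter(classify_line(line) for line in islice(fh, n))
```

A file that summarizes to mostly "count" is a read count table; mostly "sam" or "xf_tag" means it is alignment output rather than counts.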

Written 5 weeks ago by lieven.sterck

That was a different BAM file, which was giving me an error, so I converted it to a SAM file and ran it through HTSeq. That file gave me the following output:

chr1 3206084 255 1S139M = 3206084 -139 NTACAGTTAACCAACTTATACAGTTAACCAACTCCTACACTAGGTTCCTGAGCATTTCCTTAAACTTGCTAGTTCTGGTTTCCTGGCATGTGAGAGTAAGTCACATGGTAGGAGGCTGCCTTTCTATC JJJFJJFJJJAJFJJJFJFFJFJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJFJJJJJA<<JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJFFJJJJJFJFJJJJJJJJFJJJJJJJJJJJJJJJAFAAA NH:i:1 HI:i:1 AS:i:276 nM:i:0 XF:Z:ENSMUSG00000051951.5
GWNJ-0965:181:GW180227920:7:2124:9881:65265 163 chr1 3206084 255 139M1S = 3206084 139 TACAGTTAACCAACTTATACAGTTAACCAACTCCTACACTAGGTTCCTGAGCATTTCCTTAAACTTGCTAGTTCTGGTTTCCTGGCATGTGAGAGTAAGTCACATGGTAGGAGGCTGCCTTTCTATCATTCAATTTTAGN

Because I wanted to skip the BAM-to-SAM conversion step (I have 36 files, and each SAM file would be around 40,000,000 KB, i.e. roughly 40 GB), I generated a new, name-sorted BAM file with the STAR aligner and ran it through HTSeq. This is the file that gives me the output shown at the top of this post.

Written 4 weeks ago by makwana.kd (modified 4 weeks ago)
lieven.sterck (VIB, Ghent, Belgium) wrote (4 weeks ago):

OK, clearly I was not paying attention :/

You are looking at the wrong file: countread.text, despite its name, contains the SAM alignments written by the -o option, not the read counts. The read count table is written to STDOUT (your screen, in this case); you have to capture that output in a file to get the read count table.
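Capturing STDOUT in a file can also be scripted, which may help with the 36 BAM files mentioned above. A minimal Python sketch, assuming htseq-count is installed and on the PATH and that every sample uses the options from this thread; the helper names and the .counts.txt naming scheme are hypothetical. The -o option is left optional here because it only writes the SAM alignments (with their XF:Z tags), which can take a lot of disk space.

```python
import subprocess
from pathlib import Path


def htseq_count_command(bam, gff, sam_out=None):
    """Build an htseq-count argument list using the options from this thread."""
    cmd = ["htseq-count", "-m", "union", "-f", "bam", "-s", "no", "-r", "name"]
    if sam_out is not None:
        cmd += ["-o", sam_out]  # optional: annotated SAM alignments, NOT counts
    return cmd + [bam, gff]


def run_htseq_count(bam, gff, counts_path, sam_out=None):
    """Run htseq-count and redirect its STDOUT (the count table) to counts_path."""
    with open(counts_path, "w", encoding="utf-8") as out:
        subprocess.run(htseq_count_command(bam, gff, sam_out), stdout=out, check=True)


def run_all(bam_dir, gff):
    """Hypothetical batch loop: sample.bam -> sample.counts.txt for each BAM file."""
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        run_htseq_count(str(bam), gff, str(bam.with_suffix(".counts.txt")))
```

The stdout=out argument plays the same role as the shell's > redirection; htseq-count's progress messages go to STDERR and are not written to the count file.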

Try a command line like the following:

htseq-count -m union -f bam -s no -r name ALZT22-2Cunsorted.bam geneassembly.gff3 -o alns.sam > countread.text

Now your read count table will be in the file countread.text (and the SAM alignments in alns.sam).
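Once the table is in countread.text, it can be read without a spreadsheet. A minimal Python sketch, assuming the usual two-column, tab-separated htseq-count output; the helper names are hypothetical. Rows whose ID starts with "__" (__no_feature, __ambiguous, __too_low_aQual, __not_aligned, __alignment_not_unique) are htseq-count's summary counters, the same categories that showed up in the XF:Z tags above, so they are kept apart from the gene counts.

```python
import csv


def parse_counts(lines):
    """Split htseq-count rows into gene counts and "__" summary counters."""
    genes, special = {}, {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # tolerate blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got {row!r}")
        feature, count = row[0], int(row[1])
        target = special if feature.startswith("__") else genes
        target[feature] = count
    return genes, special


def read_counts(path):
    """Read an htseq-count table from a file."""
    with open(path, newline="", encoding="utf-8") as fh:
        return parse_counts(fh)


def assigned_fraction(genes, special):
    """Fraction of the table's total count that falls on genes, not "__" rows."""
    assigned = sum(genes.values())
    total = assigned + sum(special.values())
    return assigned / total if total else 0.0
```

Running read_counts on the SAM-like file from earlier in the thread would raise a ValueError, since its lines do not have exactly two columns. The assigned fraction is only a rough indicator, because the summary counters need not count reads exactly the way gene rows do.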

Written 4 weeks ago by lieven.sterck