Question: Htseq-count result assistance
0
gravatar for Liftedkris
5 weeks ago by
Liftedkris0
Germany
Liftedkris0 wrote:

i want to use Htseq-count to be used to count the number of reads for the features i have. i have run the program and got the following as part of my result.

ENSG00000202526 0
ENSG00000206882 0
ENSG00000207129 0
ENSG00000207186 0
ENSG00000207277 0
ENSG00000207341 0
ENSG00000210082 5001
ENSG00000211459 1006
ENSG00000212138 0
ENSG00000212154 0
ENSG00000212171 0
ENSG00000212176 0
ENSG00000212204 0
ENSG00000212237 0

i got up to 500 genes but only about 4 or 5 has counts greater than one. is this correct or have i done something wrongly. thanks so much for your assistance

rna-seq • 170 views
ADD COMMENTlink modified 5 weeks ago by michael.ante3.2k • written 5 weeks ago by Liftedkris0
1

i got up to 500 genes

You get exactly the number of genes which were present in your gtf annotation file.

ADD REPLYlink written 5 weeks ago by WouterDeCoster38k
1

It is not uncommon that many genes are lowly or not expressed in RNA-seq. If you want beyond-eyeballing idea, load the counts into R (or similar) and create a histogram of the counts. It should approximately follow a negative binomial distribution.

ADD REPLYlink written 5 weeks ago by ATpoint15k

Are you suggesting that nothing is wrong with my analysis? I re-downloaded a new gtf file and got more counts between 1 and 3 but most genes still returned a count of zero

ADD REPLYlink written 5 weeks ago by Liftedkris0
1

Its possible nothing is wrong. You really need the full numbers. Depending on your GTF, most genes will have a count of zero, even in a good sample.... but how many can still be a mesaure of how good the sample is. If you are using the full ensembl GTF, which, I believe has about 50,000 genes in it, I'd probably expect that around 20k or so of them to be non-zero.

If you don't want to look at the full distribution in something like R, try the following:

awk '$2 > 0' counts.tsv | wc -l
ADD REPLYlink written 5 weeks ago by i.sudbery4.3k

Thanks very much sir! I have a total of 6601 genes. I am thinking this number represents the total genes present in my gtf file... the ones with values greater than zero should be the genes present in my sample. Is this correct?

ADD REPLYlink written 5 weeks ago by Liftedkris0

Thanks very much sir! I have a total of 6601 genes.

You would expect more genes in a human genome... how did you obtain this gtf?

ADD REPLYlink written 5 weeks ago by WouterDeCoster38k

I obtained it from ftp://ftp.ensembl.org/pub/release-95/gtf/homo_sapiens/

I downloaded the 3rd one

ADD REPLYlink written 5 weeks ago by Liftedkris0

And where did you get the genome reference (fasta) from. The one you used to align the reads?

ADD REPLYlink written 5 weeks ago by i.sudbery4.3k

Check your alignment stats. While aligning RNA-SEQ data to all patches, haplotype annotations you might get a lot of multi-mapping reads which will not be counted with default Htseq-count settings.

ADD REPLYlink written 5 weeks ago by michael.ante3.2k

Getting zero for a lot of genes is expected as not all genes are expressed in all tissues.

ADD REPLYlink written 5 weeks ago by WouterDeCoster38k

Hi @citynsukka I edit your post to make the list of counts format correctly. You can do this by highlighting the part of the post to be formatted as-is and pressing the code button (labelled 101010). I've also changed the category - the tool category is used to announce/promote new tools.

ADD REPLYlink written 5 weeks ago by i.sudbery4.3k
2
gravatar for michael.ante
5 weeks ago by
michael.ante3.2k
Austria/Vienna
michael.ante3.2k wrote:

Hi citynsukka,

The genes with a read count greater than 0 are MT-genes. Please check if your GTF is consistent with your reference used. Use head to get the chromosome names out of yor GTF file and samtools view -H to get the names out of your bam file.

Cheers,

Michael

ADD COMMENTlink written 5 weeks ago by michael.ante3.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 791 users visited in the last hour