Question: HTSeq-count ambiguous results - all transcripts vs. single transcript
0
gravatar for oriolebaltimore
2.2 years ago by
United States
oriolebaltimore130 wrote:

My aim is to count reads aligned to introns of human BRCA1. I am using HTSeq-count. I made a GTF file by obtaining bed file of intronic regions (Hg38) from UCSC table browser. I converted BED file to GTF based on Ensembl data format.

Bed file:

17  43045802    43047642    NM_007294.3_intron_0_0_chr17_43045803_r 0   -
17  43047703    43049120    NM_007294.3_intron_1_0_chr17_43047704_r 0   -
17  43049194    43051062    NM_007294.3_intron_2_0_chr17_43049195_r 0   -
17  43051117    43057051    NM_007294.3_intron_3_0_chr17_43051118_r 0   -
17  43057135    43063332    NM_007294.3_intron_4_0_chr17_43057136_r 0   -
17  43063373    43063873    NM_007294.3_intron_5_0_chr17_43063374_r 0   -
17  43063951    43067607    NM_007294.3_intron_6_0_chr17_43063952_r 0   -
17  43067695    43070927    NM_007294.3_intron_7_0_chr17_43067696_r 0   -
17  43071238    43074330    NM_007294.3_intron_8_0_chr17_43071239_r 0   -

GTF file:

17  Refseq  intron  43045802    43047642    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_0";
17  Refseq  intron  43047703    43049120    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_1";
17  Refseq  intron  43049194    43051062    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_2";
17  Refseq  intron  43051117    43057051    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_3";
17  Refseq  intron  43057135    43063332    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_4";
17  Refseq  intron  43063373    43063873    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_5";
17  Refseq  intron  43063951    43067607    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_6";
17  Refseq  intron  43067695    43070927    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_7";
17  Refseq  intron  43071238    43074330    .   -   .   transcript_id "NM_007294"; intron_number "NM_007294_intron_8";

There are multiple transcripts for this gene. If I count reads aligned to introns of NM_007294, I get the result.

NM_007294_intron_0  28
NM_007294_intron_1  21
NM_007294_intron_2  31
NM_007294_intron_3  22
NM_007294_intron_4  30
NM_007294_intron_5  10
NM_007294_intron_6  96
NM_007294_intron_7  73
NM_007294_intron_8  42

Now when I run HTSeq on same bam file giving GTF file for all BRCA transcript, I cannot get the same result:

NM_007294_intron_0  0
NM_007294_intron_1  0
NM_007294_intron_10 0
NM_007294_intron_11 0
NM_007294_intron_12 0
NM_007294_intron_13 0
NM_007294_intron_14 0
NM_007294_intron_15 0
NM_007294_intron_16 0
NM_007294_intron_17 0
NM_007294_intron_18 0
NM_007294_intron_19 0
NM_007294_intron_2  0
NM_007294_intron_20 0
NM_007294_intron_21 0
NM_007294_intron_3  0
NM_007294_intron_4  0
NM_007294_intron_5  0
NM_007294_intron_6  0
NM_007294_intron_7  0
NM_007294_intron_8  0
NM_007294_intron_9  0
NM_007297_intron_0  0
NM_007297_intron_1  0
NM_007297_intron_10 0
NM_007297_intron_11 0
NM_007297_intron_12 0
NM_007297_intron_13 0
NM_007297_intron_14 0
NM_007297_intron_15 0
NM_007297_intron_16 0
NM_007297_intron_17 0
NM_007297_intron_18 0
NM_007297_intron_19 0
NM_007297_intron_2  0
NM_007297_intron_20 0
NM_007297_intron_3  0
NM_007297_intron_4  0
NM_007297_intron_5  0
NM_007297_intron_6  0
NM_007297_intron_7  0
NM_007297_intron_8  0
NM_007297_intron_9  0
NM_007298_intron_0  0
NM_007298_intron_1  0
NM_007298_intron_10 0
NM_007298_intron_11 0
NM_007298_intron_12 0
NM_007298_intron_13 0
NM_007298_intron_14 0
NM_007298_intron_15 0
NM_007298_intron_16 0
NM_007298_intron_17 0
NM_007298_intron_18 0
NM_007298_intron_19 0
NM_007298_intron_2  0
NM_007298_intron_20 0
NM_007298_intron_3  0
NM_007298_intron_4  0
NM_007298_intron_5  0
NM_007298_intron_6  0
NM_007298_intron_7  0
NM_007298_intron_8  0
NM_007298_intron_9  0
NM_007299_intron_0  0
NM_007299_intron_1  0
NM_007299_intron_10 0
NM_007299_intron_11 0
NM_007299_intron_12 0
NM_007299_intron_13 0
NM_007299_intron_14 0
NM_007299_intron_15 0
NM_007299_intron_16 0
NM_007299_intron_17 0
NM_007299_intron_18 0
NM_007299_intron_19 0
NM_007299_intron_2  0
NM_007299_intron_20 0
NM_007299_intron_3  0
NM_007299_intron_4  0
NM_007299_intron_5  0
NM_007299_intron_6  0
NM_007299_intron_7  0
NM_007299_intron_8  0
NM_007299_intron_9  0
NM_007300_intron_0  0
NM_007300_intron_1  0
NM_007300_intron_10 0
NM_007300_intron_11 0
NM_007300_intron_12 0
NM_007300_intron_13 0
NM_007300_intron_14 0
NM_007300_intron_15 0
NM_007300_intron_16 0
NM_007300_intron_17 0
NM_007300_intron_18 0
NM_007300_intron_19 0
NM_007300_intron_2  0
NM_007300_intron_20 0
NM_007300_intron_21 0
NM_007300_intron_22 0
NM_007300_intron_3  0
NM_007300_intron_4  0
NM_007300_intron_5  0
NM_007300_intron_6  0
NM_007300_intron_7  0
NM_007300_intron_8  0

NM_007300_intron_9  0

How is this possible. whe I give a gtf file with one transcript, I get counts, however when GTF file with all transcripts of BRCA1 the resulting counts are 0.

It cannot be chromosome mapping issue (chr17 vs. 17) or any other issue that I can think of.

Can anyone help understand this ambiguous behavior of HTSeq.

htseq-count -f bam -r name -s reverse -t intron -i intron_number myBAM.bam introns_NM_007294.gtf ---- WORKS.

htseq-count -f bam -r name -s reverse -t intron -i intron_number myBAM.bam introns_all_transcripts.gtf --- Doesn't WORK.

Thanks Adrian.

GTF file:

17      Refseq  intron  43045802        43047642        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_0";
17      Refseq  intron  43047703        43049120        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_1";
17      Refseq  intron  43049194        43051062        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_2";
17      Refseq  intron  43051117        43057051        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_3";
17      Refseq  intron  43057135        43063332        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_4";
17      Refseq  intron  43063373        43063873        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_5";
17      Refseq  intron  43063951        43067607        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_6";
17      Refseq  intron  43067695        43070927        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_7";
17      Refseq  intron  43071238        43074330        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_8";
17      Refseq  intron  43074521        43076487        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_9";
17      Refseq  intron  43076614        43082403        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_10";
17      Refseq  intron  43082575        43090943        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_11";
17      Refseq  intron  43091032        43091434        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_12";
17      Refseq  intron  43094860        43095845        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_13";
17      Refseq  intron  43095922        43097243        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_14";
17      Refseq  intron  43097289        43099774        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_15";
17      Refseq  intron  43099880        43104121        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_16";
17      Refseq  intron  43104261        43104867        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_17";
17      Refseq  intron  43104956        43106455        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_18";
17      Refseq  intron  43106533        43115725        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_19";
17      Refseq  intron  43115779        43124016        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_20";
17      Refseq  intron  43124115        43125270        .       -       .       transcript_id "NM_007294"; intron_number "NM_007294_intron_21";
17      Refseq  intron  43045802        43047642        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_0";
17      Refseq  intron  43047703        43049120        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_1";
17      Refseq  intron  43049194        43051062        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_2";
17      Refseq  intron  43051117        43057051        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_3";
17      Refseq  intron  43057135        43063332        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_4";
17      Refseq  intron  43063373        43063873        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_5";
17      Refseq  intron  43063951        43067607        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_6";
17      Refseq  intron  43067695        43070927        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_7";
17      Refseq  intron  43071238        43074330        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_8";
17      Refseq  intron  43074521        43076487        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_9";
17      Refseq  intron  43076614        43082403        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_10";
17      Refseq  intron  43082575        43090943        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_11";
17      Refseq  intron  43091032        43091434        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_12";
17      Refseq  intron  43094860        43095845        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_13";
17      Refseq  intron  43095922        43097243        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_14";
17      Refseq  intron  43097289        43099774        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_15";
17      Refseq  intron  43099880        43104121        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_16";
17      Refseq  intron  43104261        43104867        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_17";
17      Refseq  intron  43104956        43106455        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_18";
17      Refseq  intron  43106533        43124016        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_19";
17      Refseq  intron  43124115        43125276        .       -       .       transcript_id "NM_007297"; intron_number "NM_007297_intron_20";
17      Refseq  intron  43045802        43047642        .       -       .       transcript_id "NM_007298"; intron_number "NM_007298_intron_0";
17      Refseq  intron  43047703        43049120        .       -       .       transcript_id "NM_007298"; intron_number "NM_007298_intron_1";
17      Refseq  intron  43049194        43051062        .       -       .       transcript_id "NM_007298"; intron_number "NM_007298_intron_2";
17      Refseq  intron  43051117        43057051        .       -       .       transcript_id "NM_007298"; intron_number "NM_007298_intron_3";
17      Refseq  intron  43057135        43063332        .       -       .       transcript_id "NM_007298"; intron_number "NM_007298_intron_4";
17      Refseq  intron  43063373        43063873        .       -       .       transcript_id "NM_007298"; intron_number "NM_007298_intron_5";
17      Refseq  intron  43063951        43067607        .       -       .       transcript_id "NM_007298"; intron_number "NM_007298_intron_6";
rna-seq gene htseq • 710 views
ADD COMMENTlink modified 2.2 years ago • written 2.2 years ago by oriolebaltimore130

Thanks genomax for very clean formatting my question. I could not figure out how to do it.Thanks Adrian

ADD REPLYlink written 2.2 years ago by oriolebaltimore130

Counting programs do not count multimapping reads by default. Have you looked into that as a reason for getting 0 counts?

For future reference: You can use the formatting bar (especially the code option) by highlighting text you want to format and then clicking the button in red-box below. Formatting bar

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by genomax84k

i'm also going for the "multimapping answer" of genomax

ADD REPLYlink modified 2.2 years ago • written 2.2 years ago by lieven.sterck7.9k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 864 users visited in the last hour