Question: Read Counts From SAM File Mapped To De Novo Assembled Transcripts Using HTSeq-count
alan.sm310 wrote (7.1 years ago):


I tried using HTSeq-count to extract per-transcript read counts (for DE analysis) from a SAM file of reads mapped to de novo assembled transcripts. The SAM file was generated with Bowtie2, and only uniquely aligned reads were kept. I made a GTF file for the assembled transcripts' FASTA file with a Perl script. Here are a few lines of my GTF file:

Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 AssembledTranscriptome exon 1 1485 . + . gene_id "AssemTrans1"; transcript_id "Locus_47_Transcript_16/31_Confidence_0.158_Length_1485";

Locus_58_Transcript_85/85_Confidence_0.017_Length_650 AssembledTranscriptome exon 1 650 . + . gene_id "AssemTrans1"; transcript_id "Locus_58_Transcript_85/85_Confidence_0.017_Length_650";

The transcript start is always 1, the end is the transcript length, and the strand is + for all entries.
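A minimal Python sketch of the GTF layout described above (the original Perl script is not shown, so this is not it): it reads the text of a FASTA file and produces one tab-separated exon line per transcript, spanning position 1 to the sequence length on the + strand. The function name `fasta_to_gtf` and the optional `gene_of` mapping (transcript ID to gene ID) are hypothetical; without a mapping, each transcript is treated as its own gene.

```python
def fasta_to_gtf(fasta_text, source="AssembledTranscriptome", gene_of=None):
    """Return one full-length 'exon' GTF line per FASTA record.

    gene_of: optional (hypothetical) dict transcript_id -> gene_id;
    transcripts missing from it are used as their own gene.
    """
    gene_of = gene_of or {}
    records = []  # (transcript_id, length) in file order
    name, length = None, 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            # Assumes a non-empty header; the ID is the first word after '>'.
            name, length = line[1:].split()[0], 0
        else:
            length += len(line)  # sequence may be wrapped over several lines
    if name is not None:
        records.append((name, length))

    gtf_lines = []
    for tid, tlen in records:
        gid = gene_of.get(tid, tid)
        attrs = f'gene_id "{gid}"; transcript_id "{tid}";'
        # seqname, source, feature, start, end, score, strand, frame, attributes
        fields = [tid, source, "exon", "1", str(tlen), ".", "+", ".", attrs]
        gtf_lines.append("\t".join(fields))
    return gtf_lines
```

By default htseq-count counts `exon` features grouped by the `gene_id` attribute, so the `gene_id` values decide what gets summed. It also defaults to `--stranded=yes`; for an unstranded library, reads aligned to the reverse strand of these + strand transcripts are likely to be missed unless `-s no` is given.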

It seems to work well, but I'm not sure this is the right way to do it. Do I have to worry about what Simon Anders mentioned? He wrote: "If you must align against the transcriptome, make sure that you count for genes, not transcripts, and remove reads mapping to transcripts from more than one gene."
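The quoted rule can be sketched in Python as follows, assuming the alignments have already been reduced to (read name, transcript ID) pairs with all alignments per read retained (for example from a Bowtie2 run with `-k`, rather than unique alignments only) and that a transcript-to-gene mapping exists. `gene_counts` and its inputs are hypothetical; a read is counted once if all its alignments fall in transcripts of a single gene, and set aside as ambiguous otherwise.

```python
from collections import Counter, defaultdict

def gene_counts(alignments, gene_of):
    """Count reads per gene, discarding reads that hit more than one gene.

    alignments: iterable of (read_name, transcript_id) pairs, one per
                alignment, so a multi-mapping read appears several times.
    gene_of:    dict transcript_id -> gene_id (assumed to exist).
    Returns (Counter gene_id -> read count, number of ambiguous reads).
    """
    genes_per_read = defaultdict(set)
    for read, tid in alignments:
        genes_per_read[read].add(gene_of[tid])

    counts = Counter()
    ambiguous = 0
    for genes in genes_per_read.values():
        if len(genes) == 1:
            # All alignments are to isoforms of one gene: count it once.
            counts[next(iter(genes))] += 1
        else:
            ambiguous += 1
    return counts, ambiguous
```

With this rule, a read aligning to two isoforms of the same gene still counts for that gene, which is exactly the kind of read that unique-only filtering throws away.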

Any thoughts/comments/suggestions are much appreciated.

Thanks, Alan


Having not done similar studies, I will only comment on what I think I understood above. If you proceed the way you describe, the read count for a transcript will depend both on its actual expression level and on how unique its various regions are, because reads from regions shared with other transcripts do not align uniquely and are discarded. In that case the coverage would not correspond to the actual differential expression between transcripts.

(Comment by Istvan Albert, 7.1 years ago)
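A small worked example of this effect, under simplifying assumptions: uniform coverage, reads never spanning segment boundaries, and two hypothetical isoforms A and B that share an 800 bp segment. The function `retained_counts` is illustrative only; it returns the expected total and unique-only read counts per transcript.

```python
from collections import Counter

def retained_counts(depth, segments, seg_len):
    """Expected (total, retained) read counts per transcript.

    depth:    transcript -> reads per base (uniform coverage assumed)
    segments: transcript -> list of segment names it contains
    seg_len:  segment name -> length in bases
    Only reads from segments owned by exactly one transcript are retained,
    mimicking a 'uniquely aligned reads only' filter.
    """
    owners = Counter(s for segs in segments.values() for s in segs)
    result = {}
    for t, segs in segments.items():
        total = depth[t] * sum(seg_len[s] for s in segs)
        kept = depth[t] * sum(seg_len[s] for s in segs if owners[s] == 1)
        result[t] = (total, kept)
    return result

example = retained_counts(
    {"A": 1.0, "B": 1.0},
    {"A": ["shared", "a_only"], "B": ["shared", "b_only"]},
    {"shared": 800, "a_only": 200, "b_only": 600},
)
# example == {"A": (1000.0, 200.0), "B": (1400.0, 600.0)}
```

At equal depth, B keeps about 43% of its reads and A only 20%, so the retained counts reflect how much unique sequence each isoform has as well as how strongly it is expressed.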
Vitis (New York) wrote (7.1 years ago):

I think some de novo assemblers (such as Trinity) try to reconstruct transcript isoforms from different splice variants. So before mapping and counting, you may need to collapse these isoforms, in line with Simon's comment about counting for genes instead of transcripts. For that purpose I have used Vmatch before, but I'm sure other tools can do the same thing.
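One crude way to collapse isoforms, sketched in Python: map every transcript whose sequence occurs exactly inside a longer (or identical) transcript to that transcript. This is an assumption-laden stand-in, not what Vmatch does; it checks only the forward strand and will not merge splice variants that differ by an internal exon, so a proper similarity-based clustering tool remains the better choice. `collapse_contained` is a hypothetical name.

```python
def collapse_contained(seqs):
    """Map each transcript to a representative transcript.

    seqs: dict transcript_id -> sequence (forward strand only).
    A transcript whose sequence is an exact substring of an already kept,
    at least as long transcript is mapped to it; others represent themselves.
    """
    # Longest first, so containers are kept before the sequences they contain.
    order = sorted(seqs, key=lambda t: len(seqs[t]), reverse=True)
    rep = {}
    kept = []
    for tid in order:
        s = seqs[tid]
        container = next((k for k in kept if s in seqs[k]), None)
        if container is None:
            kept.append(tid)
            rep[tid] = tid
        else:
            rep[tid] = container
    return rep
```

The resulting map could then be used as the transcript-to-gene grouping when writing `gene_id` values into the GTF, so that HTSeq-count sums collapsed isoforms together.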
