Question

Read Counts From Sam File Mapped To De Novo Assembled Transcripts Using Htseq-Count


10.7 years ago

alan.sm310 • 0

Hello,

I tried using htseq-count to extract read counts per transcript from a SAM file of reads mapped to de novo assembled transcripts (for DE analysis). The SAM file was generated with Bowtie2, and only uniquely aligned reads were considered. I made a GTF file for the assembled transcripts FASTA file with a Perl script. Here are a few lines of my GTF file.

Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 AssembledTranscriptome exon 1 1485 . + . gene_id "AssemTrans1"; transcript_id "Locus_47_Transcript_16/31_Confidence_0.158_Length_1485";

Locus_58_Transcript_85/85_Confidence_0.017_Length_650 AssembledTranscriptome exon 1 650 . + . gene_id "AssemTrans1"; transcript_id "Locus_58_Transcript_85/85_Confidence_0.017_Length_650";

The transcript start is 1 by default, the end is the length of the transcript, and the strand is + for all.
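A minimal sketch of this kind of FASTA-to-GTF conversion, written in Python rather than the Perl script mentioned above (which is not shown). It emits one tab-separated exon record per transcript with start 1, end equal to the sequence length, and strand +. The function names and the `gene_id` rule are assumptions for illustration: here `gene_id` is taken from the `Locus_N` prefix of the transcript name, on the assumption that transcripts sharing a locus belong to the same gene. That differs from the sample lines above, where both records carry `gene_id "AssemTrans1"`.

```python
import re

# Assumption: transcript names look like "Locus_47_Transcript_16/31_...",
# and the "Locus_N" prefix can stand in for a gene identifier.
LOCUS_RE = re.compile(r"^(Locus_\d+)_Transcript_")


def fasta_lengths(fasta_text):
    """Yield (name, length) for each record in FASTA-formatted text."""
    name, length = None, 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Use the first whitespace-delimited token of the header as the name.
            name, length = line[1:].split()[0], 0
        elif name is not None:
            length += len(line)
    if name is not None:
        yield name, length


def gene_id_for(transcript_name):
    """Return the Locus_N prefix, or the full name if the pattern is absent."""
    match = LOCUS_RE.match(transcript_name)
    return match.group(1) if match else transcript_name


def fasta_to_gtf(fasta_text, source="AssembledTranscriptome"):
    """Return one GTF exon line per transcript, spanning its full length."""
    lines = []
    for name, length in fasta_lengths(fasta_text):
        attrs = 'gene_id "%s"; transcript_id "%s";' % (gene_id_for(name), name)
        fields = [name, source, "exon", "1", str(length), ".", "+", ".", attrs]
        lines.append("\t".join(fields))
    return lines
```

Calling `fasta_to_gtf(open("transcripts.fa").read())` (the file name is hypothetical) and writing the returned lines to a file would give a GTF in which htseq-count can group features by the `gene_id` attribute.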

It looks like it works great, but I'm not sure if this is the right way to do it. Do I have to worry about what Simon Anders has mentioned: "If you must align against the transcriptome, make sure that you count for genes, not transcripts, and remove reads mapping to transcripts from more than one gene."?

Any thoughts/comments/suggestions are much appreciated.

Thanks, Alan

sam • 4.3k views

updated 10.7 years ago by Vitis ★ 2.5k • written 10.7 years ago by alan.sm310 • 0


I have not done similar studies, so I will only comment on what I think I understood above. If you proceed the way you describe, the read count for a transcript will be affected both by its actual expression level and by how unique its various regions are: since only uniquely aligned reads are kept, regions shared with other transcripts contribute no reads. In that case the coverage would not correspond to the actual differential expression between transcripts.

10.7 years ago by Istvan Albert 100k

score 0 · Answer 1 · 2013-08-02

I think some de novo assemblers (like Trinity) try to reconstruct transcript isoforms arising from different splice variants. So before mapping and counting, you may need to collapse these transcript forms to conform with Simon's comment about counting for genes instead of transcripts. For that purpose I have used Vmatch (http://www.vmatch.de) before, but I'm sure there are other tools that could do the same thing.
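As a rough illustration of the counting rule in Simon's comment (count for genes, and drop reads that map to transcripts from more than one gene), here is a minimal Python sketch. It is not Vmatch-based collapsing; it simply assumes, hypothetically, that the `Locus_N` prefix of each transcript name identifies the gene, and that the alignments are available as (read name, transcript name) pairs, for example taken from the first and third columns of the SAM file. The function names are made up for this example.

```python
import re
from collections import Counter, defaultdict

# Assumption: "Locus_N" at the start of a transcript name identifies its gene.
LOCUS_RE = re.compile(r"^(Locus_\d+)_Transcript_")


def locus_of(transcript_name):
    """Return the Locus_N prefix, or the full name if the pattern is absent."""
    match = LOCUS_RE.match(transcript_name)
    return match.group(1) if match else transcript_name


def count_per_gene(alignments):
    """Count reads per gene from (read_name, transcript_name) pairs.

    A read is counted once if all of its alignments fall in a single gene,
    and is set aside as ambiguous if they span more than one gene.
    """
    genes_by_read = defaultdict(set)
    for read_name, transcript_name in alignments:
        genes_by_read[read_name].add(locus_of(transcript_name))

    counts = Counter()
    ambiguous = 0
    for genes in genes_by_read.values():
        if len(genes) == 1:
            counts[next(iter(genes))] += 1
        else:
            ambiguous += 1
    return counts, ambiguous
```

With this rule, a read that hits several isoforms of the same locus still counts once for that locus, which is the point of keeping multi-mapped reads in the SAM file instead of filtering to unique alignments before counting.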