Read Counts From SAM File Mapped To De Novo Assembled Transcripts Using HTSeq-count
10.7 years ago
alan.sm310 • 0

Hello,

I tried using HTSeq-count to extract read counts per transcript from a SAM file of reads mapped to de novo assembled transcripts (for DE analysis). The SAM file was generated with Bowtie2, and only uniquely aligned reads were kept. I made a GTF file for the assembled transcripts' FASTA file with a Perl script. Here are a few lines of my GTF file.

Locus_47_Transcript_16/31_Confidence_0.158_Length_1485 AssembledTranscriptome exon 1 1485 . + . gene_id "AssemTrans1"; transcript_id "Locus_47_Transcript_16/31_Confidence_0.158_Length_1485";

Locus_58_Transcript_85/85_Confidence_0.017_Length_650 AssembledTranscriptome exon 1 650 . + . gene_id "AssemTrans1"; transcript_id "Locus_58_Transcript_85/85_Confidence_0.017_Length_650";

The transcript start is 1 by default, the end is the length of the transcript, and the strand is + for all.
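For readers who want to reproduce this step without Perl, here is a minimal Python sketch of the same idea: one exon per transcript, running from 1 to the transcript length on the + strand. The function names `fasta_lengths` and `gtf_line` are made up for illustration, and the choice of `gene_id` is left to the caller because it depends on how the assembly groups isoforms.

```python
import io

def fasta_lengths(handle):
    """Yield (name, length) for each record in an open FASTA file."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, length
            # Keep only the first whitespace-delimited token of the header,
            # which is how aligners such as Bowtie2 usually name references.
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        yield name, length

def gtf_line(name, length, gene_id, source="AssembledTranscriptome"):
    """Build one tab-separated GTF exon line covering the whole transcript."""
    attrs = f'gene_id "{gene_id}"; transcript_id "{name}";'
    return "\t".join(
        [name, source, "exon", "1", str(length), ".", "+", ".", attrs]
    )

# Tiny in-memory FASTA standing in for the assembled transcripts file.
demo = io.StringIO(">tx1 some description\nACGT\nAC\n>tx2\nGGG\n")
# Here each transcript is its own gene; swap in a real gene assignment if known.
lines = [gtf_line(n, l, gene_id=n) for n, l in fasta_lengths(demo)]
print("\n".join(lines))
```

Note that GTF fields must be separated by tabs, which is why the line is built with `"\t".join`.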

It looks like it works well, but I'm not sure this is the right way to do it. Do I have to worry about what Simon Anders has mentioned: "If you must align against the transcriptome, make sure that you count for genes, not transcripts, and remove reads mapping to transcripts from more than one gene."?
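To make the quoted rule concrete, here is a small Python sketch of what "count for genes and remove reads mapping to transcripts from more than one gene" could look like. It is not how HTSeq-count works internally; the input format (pairs of read ID and transcript ID) and the `transcript_to_gene` mapping are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def count_reads_per_gene(alignments, transcript_to_gene):
    """Count reads per gene, dropping reads that hit more than one gene.

    alignments: iterable of (read_id, transcript_id) pairs.
    transcript_to_gene: dict mapping transcript_id to gene_id.
    Returns (Counter of gene_id -> read count, number of dropped reads).
    """
    genes_hit = defaultdict(set)
    for read_id, transcript_id in alignments:
        genes_hit[read_id].add(transcript_to_gene[transcript_id])

    counts = Counter()
    ambiguous = 0
    for genes in genes_hit.values():
        if len(genes) == 1:
            # All alignments of this read fall in one gene: count it once.
            counts[next(iter(genes))] += 1
        else:
            # Read aligns to transcripts of different genes: discard it.
            ambiguous += 1
    return counts, ambiguous

# Toy data: t1 and t2 are isoforms of gene gA, t3 belongs to gene gB.
tx2gene = {"t1": "gA", "t2": "gA", "t3": "gB"}
alns = [("r1", "t1"), ("r1", "t2"), ("r2", "t1"), ("r2", "t3"), ("r3", "t3")]
counts, ambiguous = count_reads_per_gene(alns, tx2gene)
```

In the toy data, read r1 hits two isoforms of the same gene and is still counted, while r2 hits two different genes and is dropped.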

Any thoughts/comments/suggestions are much appreciated.

Thanks, Alan


Not having done similar studies, I will only comment on what I think I understood above. If you proceed the way you describe, the read count for a transcript will be affected both by its actual expression level and by how unique its various regions are, since only uniquely aligned reads are kept. In that case the coverage would not correspond to the actual differential expression between transcripts.

10.7 years ago
Vitis ★ 2.5k

I think some de novo assemblers (like Trinity) try to reconstruct transcript forms from different splice variants. So before mapping and counting, you may need to collapse the transcript forms to conform with Simon's comment about counting for genes instead of transcripts. For that purpose I have used Vmatch (http://www.vmatch.de) before, but I'm sure there are other tools that could do the same thing.
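As a much cruder alternative to sequence-based clustering, transcript forms can sometimes be grouped by name alone. The Python sketch below assumes that the `Locus_N` prefix in names like those in the question marks isoforms of one gene; that is an assumption about the assembler's naming scheme, not something confirmed in this thread, and it does not replace a tool such as Vmatch. The helper names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming scheme: Locus_<n>_Transcript_<i>/<total>_Confidence_..._Length_...
LOCUS_RE = re.compile(r"^(Locus_\d+)_Transcript_")

def gene_id_from_name(transcript_name):
    """Return the locus prefix as gene ID, or the full name if it does not match."""
    match = LOCUS_RE.match(transcript_name)
    return match.group(1) if match else transcript_name

def group_by_gene(names):
    """Map each derived gene ID to the list of transcript names it contains."""
    groups = defaultdict(list)
    for name in names:
        groups[gene_id_from_name(name)].append(name)
    return dict(groups)

names = [
    "Locus_47_Transcript_16/31_Confidence_0.158_Length_1485",
    "Locus_47_Transcript_2/31_Confidence_0.500_Length_900",  # made-up example
    "Locus_58_Transcript_85/85_Confidence_0.017_Length_650",
]
groups = group_by_gene(names)
```

The derived gene ID could then be written into the `gene_id` attribute of the GTF so that counting happens per locus instead of per transcript.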

