Question: How to get the count of reads mapping to each transcript in a FASTA file?
26 days ago, agtbeeman wrote:

Hi everybody,

I am new to bioinformatics and I am following a de novo transcriptome assembly workflow. I have about 20 paired-end FASTQ files (10 × 2 files). The assembly has finished and produced Trinity.fasta as output. I am now working on a subset of transcripts of interest, stored in a FASTA file: subset.fasta.

I ran bowtie2 (using my reads and subset.fasta) and now have the 10 corresponding .bam files. I have looked at some tools like HTSeq, but I am honestly lost in the documentation.

Does anyone know an easy way to reach my goal from there (the number of reads mapping to each transcript of subset.fasta, for each of my 10 samples)?

Thank you for your support!

Tags: rna-seq, alignment

It's still not clear to me what you are doing. Is this DNA-seq or RNA-seq? Do you have assemblies or just BAMs?

(reply by swbarnes2, 26 days ago)

It is RNA-seq. From my read files I built an assembly, Trinity.fasta. I am now working on a subset of Trinity.fasta: subset.fasta. I have 10 .bam files, which I obtained with bowtie2 (using all the reads and subset.fasta as inputs).

(reply by agtbeeman, 26 days ago)

When you align to a subset of what you know is there, it can force reads to align to places they really don't belong. Better to align to the whole reference, then filter out what you don't care about after.

Samtools idxstats will quickly give you a count of how many reads aligned to each sequence in your reference, though you'll need to think about what you want to do with reads that align to multiple places, versus what Bowtie actually does to such reads.
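As a rough sketch of the idxstats route: `samtools idxstats` prints one tab-separated line per reference sequence (name, length, mapped read-segments, unmapped read-segments), plus a final `*` line for reads with no reference. The Python below parses that output and merges several samples into one table. The function names are made up for illustration, and it assumes samtools is on your PATH and that each BAM is coordinate-sorted and indexed. Note that the counts are read-segments, so with paired-end data each mate is counted separately.

```python
import subprocess


def parse_idxstats(text):
    """Parse `samtools idxstats` output into {reference_name: mapped_count}."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # final line: reads not assigned to any reference
            continue
        counts[name] = int(mapped)
    return counts


def idxstats_for_bam(bam_path):
    """Run `samtools idxstats` on one BAM (assumed sorted and indexed)."""
    result = subprocess.run(
        ["samtools", "idxstats", str(bam_path)],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_idxstats(result.stdout)


def count_table(per_sample):
    """Turn {sample: {transcript: count}} into tab-separated table lines."""
    samples = sorted(per_sample)
    transcripts = sorted({t for c in per_sample.values() for t in c})
    lines = ["\t".join(["transcript"] + samples)]
    for t in transcripts:
        # A transcript absent from a sample's output is reported as 0
        row = [str(per_sample[s].get(t, 0)) for s in samples]
        lines.append("\t".join([t] + row))
    return lines
```

With, say, hypothetical files sample01.bam to sample10.bam, you would call `idxstats_for_bam` once per file, collect the results in a dict keyed by sample name, and print the lines returned by `count_table`.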

(reply by swbarnes2, 26 days ago)
26 days ago, h.mon wrote:

I would advise you to perform the downstream analyses on the whole "Trinity.fasta" assembly, and not on a subset, as you may introduce unforeseen biases. For example, mapping on a subset may produce spurious mappings, as many reads may map to a wrong transcript if the correct transcript is missing from the assembly.
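If only the transcripts in subset.fasta are of interest, one way to follow the "map to everything, filter afterwards" advice is to count against the whole assembly and then keep only the rows whose IDs appear in subset.fasta. Here is a minimal Python sketch, assuming you already have per-transcript counts in a dict and that the reference names in your BAM files are the FASTA header text up to the first whitespace (worth checking against your own files). The function names are illustrative only.

```python
def fasta_ids(lines):
    """Collect sequence IDs (header text up to the first whitespace)."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip a bare ">" with no ID
                ids.add(fields[0])
    return ids


def filter_counts(counts, keep_ids):
    """Keep only the transcripts whose ID is in keep_ids."""
    return {name: n for name, n in counts.items() if name in keep_ids}
```

For example, `with open("subset.fasta") as fh: keep = fasta_ids(fh)` followed by `filter_counts(counts, keep)` gives the counts for the subset while the alignment itself was still done against the full Trinity.fasta.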

The Trinity wiki is a very good source of documentation, and Trinity ships with many scripts for downstream analyses. For example, you may read the "Trinity Transcript Quantification" page, which answers your question. Then you could follow with "QC Samples and Biological Replicates" or "Trinity Differential Expression", among other suggestions from the Trinity docs.
