Question

How can I get the number of reads mapping to each transcript in a FASTA file?

3.5 years ago

agtbeeman • 0

Hi everybody,

I am new to bioinformatics and I am following a de novo transcriptome assembly workflow. I have 20 paired-end FASTQ files (10 × 2 files). The assembly has finished and I got Trinity.fasta as output. I am now working on a subset of transcripts of interest, stored in a FASTA file: subset.fasta.

I ran bowtie2 (using my reads and subset.fasta), and now I have the 10 corresponding .bam files. I have looked at some tools such as HTSeq, but I am honestly lost in the documentation.

Does anyone know an easy way to reach my goal from there (the number of reads mapping to each transcript of subset.fasta for each of my 10 samples)?

Thank you for your support!

RNA-Seq alignment • 1.0k views

ADD COMMENT • link updated 3.5 years ago by h.mon 35k • written 3.5 years ago by agtbeeman • 0

It's still not clear to me what you are doing. This is DNASeq? RNASeq? Do you have assemblies or just bams?

ADD REPLY • link 3.5 years ago by swbarnes2 14k

It is RNA-seq. From my read files I got an assembly, Trinity.fasta. I am now working on a subset of Trinity.fasta: subset.fasta. I now have 10 .bam files that I got with bowtie2 (using all the reads and subset.fasta as inputs).

ADD REPLY • link 3.5 years ago by agtbeeman • 0

When you align to a subset of what you know is there, it can force reads to align to places they really don't belong. Better to align to the whole reference, then filter out what you don't care about after.

Samtools idxstats will quickly give you a count of how many reads aligned to each sequence in your reference, though you'll need to think about what you want to do with reads that align to multiple places, versus what Bowtie actually does to such reads.
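As a rough sketch of the idxstats route: assuming each BAM is coordinate-sorted and indexed (`samtools index sample.bam`), `samtools idxstats sample.bam` prints one tab-separated line per reference sequence (name, length, mapped read-segments, unmapped read-segments), followed by a `*` line for reads with no reference. The Python below parses that text and merges several samples into one table. The function names and the sample labels are hypothetical, and for paired-end data the counts are per read segment, so each mate is counted separately.

```python
from typing import Dict, List


def parse_idxstats(text: str) -> Dict[str, int]:
    """Return {reference_name: mapped_count} from `samtools idxstats` output."""
    counts: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":  # trailing row: reads not assigned to any reference
            continue
        counts[name] = int(mapped)
    return counts


def build_count_table(per_sample: Dict[str, Dict[str, int]]) -> List[List[str]]:
    """Merge per-sample counts into rows: header first, then one row per transcript."""
    samples = sorted(per_sample)
    transcripts = sorted({t for counts in per_sample.values() for t in counts})
    rows = [["transcript"] + samples]
    for t in transcripts:
        # A transcript absent from one sample's output is reported as 0.
        rows.append([t] + [str(per_sample[s].get(t, 0)) for s in samples])
    return rows


if __name__ == "__main__":
    # Made-up idxstats text for two hypothetical samples.
    sample_a = "tx1\t1500\t42\t0\ntx2\t900\t7\t1\n*\t0\t0\t120\n"
    sample_b = "tx1\t1500\t35\t2\ntx2\t900\t0\t0\n*\t0\t0\t98\n"
    table = build_count_table(
        {"sampleA": parse_idxstats(sample_a), "sampleB": parse_idxstats(sample_b)}
    )
    for row in table:
        print("\t".join(row))
```

In practice the text for each sample would come from a file written by `samtools idxstats sample.bam > sample.idxstats.txt`. How multi-mapping reads were handled is decided by the aligner settings, not by this script.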

ADD REPLY • link 3.5 years ago by swbarnes2 14k

score 1 · Answer 1 · 2020-10-28

I would advise you to perform the downstream analyses on the whole "Trinity.fasta" assembly, and not on a subset, as you may introduce unforeseen biases. For example, mapping to a subset may produce spurious mappings, as many reads may map to a wrong transcript if the correct transcript is missing from the reference you align against.
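One way to follow this advice is to count against the whole assembly and only afterwards keep the transcripts of interest. A minimal sketch, assuming the counts are already in a dictionary keyed by transcript ID and that the IDs in subset.fasta are the first whitespace-delimited token of each header line (the helper names and the example IDs are made up for illustration):

```python
from typing import Dict, Iterable, Set


def fasta_ids(lines: Iterable[str]) -> Set[str]:
    """Collect sequence IDs (first token after '>') from FASTA lines."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith(">") and line[1:].strip():
            ids.add(line[1:].split()[0])
    return ids


def keep_subset(counts: Dict[str, int], wanted: Set[str]) -> Dict[str, int]:
    """Keep only the counts whose transcript ID is in the wanted set."""
    return {name: n for name, n in counts.items() if name in wanted}


if __name__ == "__main__":
    # Hypothetical subset.fasta content and whole-assembly counts.
    subset_lines = [
        ">TRINITY_DN10_c0_g1_i1 len=812",
        "ACGTACGTACGT",
        ">TRINITY_DN25_c1_g2_i3 len=1540",
        "TTGACCAGTTGA",
    ]
    all_counts = {
        "TRINITY_DN10_c0_g1_i1": 120,
        "TRINITY_DN11_c0_g1_i1": 64,
        "TRINITY_DN25_c1_g2_i3": 9,
    }
    print(keep_subset(all_counts, fasta_ids(subset_lines)))
```

With a real file, `fasta_ids(open("subset.fasta"))` would supply the IDs. The filtering happens after alignment, so reads have already had the chance to map to their best-matching transcript in the full assembly.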

The Trinity wiki is a very rich source of documentation, and Trinity provides many scripts for downstream analyses. For example, the "Trinity Transcript Quantification" page answers your question. You could then follow with "QC Samples and Biological Replicates" or "Trinity Differential Expression", among other suggestions from the Trinity docs.
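For orientation, the quantification step described in the Trinity docs is driven by the `align_and_estimate_abundance.pl` script shipped with Trinity. The sketch below only assembles one command line per sample and prints it; nothing is executed. The flags are the ones I recall from the Trinity wiki and should be checked against the usage message of your installed version, and the script path, read file names, and output directories are placeholders.

```python
import shlex
from typing import List


def quant_command(
    script: str,
    transcripts: str,
    left: str,
    right: str,
    out_dir: str,
    est_method: str = "RSEM",
    aln_method: str = "bowtie2",
) -> List[str]:
    """Build the argument list for one paired-end sample (not executed here)."""
    return [
        script,
        "--transcripts", transcripts,  # the whole assembly, not a subset
        "--seqType", "fq",
        "--left", left,
        "--right", right,
        "--est_method", est_method,
        "--aln_method", aln_method,
        "--trinity_mode",
        "--prep_reference",
        "--output_dir", out_dir,
    ]


if __name__ == "__main__":
    # Placeholder sample names; replace with your 10 samples.
    for sample in ["sample01", "sample02"]:
        cmd = quant_command(
            script="align_and_estimate_abundance.pl",  # assumed to be on PATH
            transcripts="Trinity.fasta",
            left=f"{sample}_1.fastq",
            right=f"{sample}_2.fastq",
            out_dir=f"{sample}_quant",
        )
        print(shlex.join(cmd))
```

Building the command as a list keeps file names with spaces safe if it is later passed to `subprocess.run(cmd, check=True)`.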