How to search for gene isoforms in a FASTQ file
20 months ago
nkabo

Dear all,

I am a complete beginner in RNA-seq analysis. I completed the RNA-seq workflow described in the link below, using Salmon and tximport:

https://bioconductor.org/packages/release/bioc/vignettes/tximport/inst/doc/tximport.html

After that, I used Ensembl for transcript annotation, but I only get gene IDs (not transcript IDs), as shown below:

1. ENSG00000000003
2. ENSG00000000005
3. ENSG00000000419
4. ENSG00000000457 etc.

However, I was asked to determine whether a specific isoform of a given gene is present in the samples. The gene is present in the final result table (as an "average" of the isoforms present, I assume). However, the result table contains only a list of genes, not transcripts; the isoforms are not annotated.

It might be a silly question, but is it possible to obtain the FASTA sequence of this specific isoform and search for it in the FASTQ files of the samples? Does that make sense, or could you suggest a better way? I have the quantification and abundances; maybe I should use another annotation method to obtain transcript IDs, but I am not sure which part I should change. Thanks in advance.

20 months ago
ATpoint

tximport aggregates, or "summarizes", the isoforms to the gene level, so the gene value is the sum of its isoforms rather than the average. That is the whole point of tximport: to take the transcript-level counts and make them ready for gene-level analysis.
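To make the "sum, not average" point concrete, here is a minimal Python sketch of gene-level summarization. It is not tximport's actual implementation, only an illustration of the idea; the function name `summarize_to_gene`, the transcript and gene names, and the TPM values are all made up for the example.

```python
from collections import defaultdict


def summarize_to_gene(tx_tpm, tx2gene):
    """Sum transcript-level abundances into gene-level abundances.

    tx_tpm:  dict mapping transcript ID -> TPM
    tx2gene: dict mapping transcript ID -> gene ID
    (Hypothetical helper for illustration; not part of tximport.)
    """
    gene_tpm = defaultdict(float)
    for tx, tpm in tx_tpm.items():
        # Each isoform adds its abundance to the gene it belongs to
        gene_tpm[tx2gene[tx]] += tpm
    return dict(gene_tpm)


# Made-up values: geneA has two isoforms, geneB has one
tx_tpm = {"txA1": 10.0, "txA2": 5.0, "txB1": 2.5}
tx2gene = {"txA1": "geneA", "txA2": "geneA", "txB1": "geneB"}

gene_tpm = summarize_to_gene(tx_tpm, tx2gene)
print(gene_tpm)
```

Once the isoforms are summed like this, the gene-level table alone can no longer tell you which isoform contributed what.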

If you want to check the expression level of the isoforms, you can simply take the per-transcript TPMs that Salmon writes to the quant.sf file in each sample's quantification directory.
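As a rough sketch of that lookup in Python: quant.sf is a tab-separated file with the columns Name, Length, EffectiveLength, TPM and NumReads. The helper name `read_transcript_tpm` and the transcript IDs and values below are made up for illustration. The ID matching (dropping a version suffix such as ".9" and anything after a "|") is an assumption about how your transcriptome FASTA headers look, so adjust it to your own files.

```python
import csv
import io


def read_transcript_tpm(handle, transcript_id):
    """Return the TPM of transcript_id from an open quant.sf handle, or None if absent.

    Hypothetical helper for illustration. IDs are compared without the
    version suffix and without any "|"-separated extra header fields.
    """
    wanted = transcript_id.split("|")[0].split(".")[0]
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        name = row["Name"].split("|")[0].split(".")[0]
        if name == wanted:
            return float(row["TPM"])
    return None


# Made-up quant.sf content, standing in for a real file
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "ENST00000000001.1\t1500\t1320.5\t12.5\t240.0\n"
    "ENST00000000002.3\t900\t720.5\t0.0\t0.0\n"
)

tpm = read_transcript_tpm(io.StringIO(example), "ENST00000000001")
print(tpm)
```

For real data you would open each sample's quant.sf with `open(...)` instead of `io.StringIO` and compare the isoform's TPM across samples. A TPM of 0 (or close to it) suggests the isoform is not detectably expressed in that sample.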

Thanks a lot, it worked.

