Question

Best Transcriptome file for Salmon after STAR alignment

11 months ago

rayanelkholdi • 0

Hi everyone!

I'm analyzing paired-end bulk RNA-seq data. This is my workflow so far:

fastp for QC and trimming
STAR for alignment to the genome (with --quantMode TranscriptomeSAM)
samtools to sort by coordinate and index the transcriptome BAM file generated by STAR
UMI-tools to deduplicate reads by UMI
samtools collate to group mates together and undo the coordinate sort for Salmon
Salmon to quantify
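
To make the steps above concrete, here is a rough sketch that only builds the command lines (it does not run anything). The file names, the `sample` prefix, the thread count, and the library type `A` are placeholders I made up for illustration, and it assumes the UMIs are already in the read names, which is what `umi_tools dedup` expects. Check the flags against the versions of the tools you have installed.

```python
import shlex


def build_pipeline(sample, r1, r2, star_index, transcripts_fa, threads=8, libtype="A"):
    """Return the workflow as a list of argument lists. Nothing is executed."""
    # All output names below are hypothetical placeholders.
    trimmed_r1 = f"{sample}_R1.trimmed.fastq.gz"
    trimmed_r2 = f"{sample}_R2.trimmed.fastq.gz"
    star_prefix = f"{sample}."
    # STAR writes the transcript-coordinate alignments under this fixed suffix.
    tx_bam = f"{star_prefix}Aligned.toTranscriptome.out.bam"
    sorted_bam = f"{sample}.tx.sorted.bam"
    dedup_bam = f"{sample}.tx.dedup.bam"
    collated_bam = f"{sample}.tx.dedup.collated.bam"

    return [
        # 1. QC and trimming of the paired-end reads
        ["fastp", "-i", r1, "-I", r2, "-o", trimmed_r1, "-O", trimmed_r2],
        # 2. Genome alignment, also emitting transcriptome-coordinate alignments
        ["STAR", "--runThreadN", str(threads), "--genomeDir", star_index,
         "--readFilesIn", trimmed_r1, trimmed_r2, "--readFilesCommand", "zcat",
         "--quantMode", "TranscriptomeSAM", "--outSAMtype", "BAM", "Unsorted",
         "--outFileNamePrefix", star_prefix],
        # 3. Coordinate sort and index (umi_tools needs a sorted, indexed BAM)
        ["samtools", "sort", "-@", str(threads), "-o", sorted_bam, tx_bam],
        ["samtools", "index", sorted_bam],
        # 4. UMI deduplication
        ["umi_tools", "dedup", "-I", sorted_bam, "-S", dedup_bam, "--paired"],
        # 5. Put mates next to each other again for Salmon
        ["samtools", "collate", "-o", collated_bam, dedup_bam],
        # 6. Alignment-based quantification against the transcript FASTA
        ["salmon", "quant", "-t", transcripts_fa, "-l", libtype,
         "-a", collated_bam, "-o", f"{sample}_salmon", "-p", str(threads)],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("s1", "s1_R1.fastq.gz", "s1_R2.fastq.gz",
                              "star_index", "transcripts.fa"):
        print(shlex.join(cmd))
```

The `transcripts.fa` passed to the last step is the file my question is about.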

My question is about the transcriptome FASTA file that I should give to Salmon, given that I mapped to the genome with STAR. Should I use the cDNA FASTA from Ensembl? Or should I run gffread on the same genome FASTA I used for my STAR alignment and give the resulting transcriptome FASTA to Salmon?

Thanks in advance!
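
For the gffread route, I mean something along these lines. This is only a sketch of how the command would be assembled; the file names are placeholders, and it assumes the GTF is the same annotation that was given to STAR.

```python
import shlex


def gffread_transcriptome_cmd(genome_fa, annotation_gtf, out_fa):
    """Build a gffread command that extracts spliced transcript sequences."""
    # -w: write a FASTA with the spliced exons of each transcript
    # -g: genome FASTA the annotation coordinates refer to
    return ["gffread", "-w", out_fa, "-g", genome_fa, annotation_gtf]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(shlex.join(gffread_transcriptome_cmd("genome.fa", "annotation.gtf", "transcripts.fa")))
```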

fasta salmon star rna-seq alignment • 788 views

updated 11 months ago by biofalconch ★ 1.3k • written 11 months ago by rayanelkholdi • 0

The route you want to take might be a bit troublesome. From the Salmon documentation:

Genomic vs. Transcriptomic alignments

Salmon expects that the alignment files provided are with respect to the transcripts given in the corresponding FASTA file. That is, Salmon expects that the reads have been aligned directly to the transcriptome (like RSEM, eXpress, etc.) rather than to the genome (as does, e.g. Cufflinks). If you have reads that have already been aligned to the genome, there are currently 3 options for converting them for use with Salmon. First, you could convert the SAM/BAM file to a FAST{A/Q} file and then use the lightweight-alignment-based mode of Salmon described below. Second, given the converted FASTA{A/Q} file, you could re-align these converted reads directly to the transcripts with your favorite aligner and run Salmon in alignment-based mode as described above. Third, you could use a tool like sam-xlate to try and convert the genome-coordinate BAM files directly into transcript coordinates. This avoids the necessity of having to re-map the reads. However, we have very limited experience with this tool so far.

You mapped against the genome, so you can either pick one of these three options or remap everything directly to a transcriptome FASTA.
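
Whichever FASTA you end up with, the requirement quoted above (alignments must be with respect to the transcripts in the FASTA) is something you can sanity-check before running Salmon. Below is a minimal sketch, not an official Salmon utility: it compares the `@SQ` lines of a SAM/BAM header (for example the text printed by `samtools view -H`) with the records of a FASTA. The function names are my own, and it assumes a FASTA record's name is the header up to the first whitespace.

```python
def sam_header_refs(header_text):
    """Map reference name -> length, taken from the @SQ lines of a SAM header."""
    refs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        refs[fields["SN"]] = int(fields["LN"])
    return refs


def fasta_lengths(fasta_text):
    """Map record name (header up to first whitespace) -> sequence length."""
    lengths = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def compare_refs(refs, lengths):
    """Return (names in the BAM header but not in the FASTA, names whose lengths differ)."""
    missing = sorted(set(refs) - set(lengths))
    mismatched = sorted(n for n in refs if n in lengths and refs[n] != lengths[n])
    return missing, mismatched


if __name__ == "__main__":
    # Toy data, for illustration only.
    header = "@HD\tVN:1.6\n@SQ\tSN:tx1\tLN:8\n@SQ\tSN:tx2\tLN:4\n"
    fasta = ">tx1 gene=A\nACGT\nACGT\n>tx3\nAAAA\n"
    print(compare_refs(sam_header_refs(header), fasta_lengths(fasta)))
```

If either list comes back non-empty, the BAM and the FASTA do not describe the same set of transcripts, and that FASTA is probably not the right one to pair with that BAM.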

11 months ago by biofalconch ★ 1.3k