Downstream analysis from STAR Alignment at transcript level
21 months ago
bassanio ▴ 70

Dear Team,

I have two groups of samples that have gone through the following pipeline:

Raw FASTQ > Trimmed FASTQ > STAR (genome) > HTSeq-count & Cufflinks

I was going through the vignettes from IsoformSwitchAnalyzeR and I am unable to follow how to move forward.

Under Isoform/transcript quantification Option A:

It says to use quantification from Salmon/Kallisto. With the STAR aligner I align against the genome sequence, but Salmon and Kallisto require a transcriptome FASTA (cDNA) as the reference. So how do I move forward? Is it, as discussed in the Salmon documentation, that we redo the alignment against the cDNA?

Under Isoform/transcript quantification Option B:

I felt this is the most suitable method, since it is similar to the pipeline I already used. My confusion is about step 4. After running Cuffmerge to create the merged GTF that includes novel transcripts, do I need to run Cuffdiff? Or do we take the GTF file from Cuffmerge and use Salmon downstream? If so, the same issue of different references (genome FASTA vs. cDNA FASTA) exists here too. Secondly, which FASTA do I need to use when running Salmon with the new GTF file?

star IsoformSwitchAnalyzeR Transcript-level deseq2 salmon • 3.1k views
21 months ago
Rob 7.2k

For option A, you also need to instruct STAR to produce a transcriptome-coordinate BAM file. This is easy to do: when you run STAR, pass the additional options --quantMode TranscriptomeSAM and --quantTranscriptomeBan Singleend. See page 18 of the STAR manual for a description of these features. With the transcriptome BAM file, you can then quantify abundance at the isoform level using salmon.
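As a rough sketch of what that STAR call could look like (not taken from the original answer): the index directory, output prefix, FASTQ names, thread count and paired-end layout below are all placeholder assumptions, and the RUN variable is just a hypothetical dry-run switch so the command is printed unless you explicitly clear it. Newer STAR releases may spell the "ban" option differently, so check the manual for your version.

```shell
# Dry-run switch (hypothetical helper): by default the command is only printed.
# Set RUN to an empty string (RUN= sh script.sh) to actually execute STAR.
RUN="${RUN-echo}"

# star_align INDEX_DIR OUT_PREFIX READS_R1 READS_R2
# Aligns paired-end reads to the genome and additionally writes
# <OUT_PREFIX>Aligned.toTranscriptome.out.bam for use with salmon.
star_align() {
  $RUN STAR \
    --runThreadN 8 \
    --genomeDir "$1" \
    --readFilesIn "$3" "$4" \
    --readFilesCommand zcat \
    --outSAMtype BAM Unsorted \
    --quantMode TranscriptomeSAM \
    --quantTranscriptomeBan Singleend \
    --outFileNamePrefix "$2"
}

# Placeholder paths; replace with your own index and trimmed FASTQ files.
star_align star_index sample_ sample_R1.trimmed.fastq.gz sample_R2.trimmed.fastq.gz
```

With the placeholder prefix above, the transcriptome-coordinate alignments would end up in sample_Aligned.toTranscriptome.out.bam.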


@Rob Thank you for the reply. I have both the transcriptome BAM and the genome BAM. But according to the salmon documentation, salmon needs a BAM/SAM aligned to the cDNA reference, not to the chromosome (DNA) reference. That is my concern.


The transcriptome BAM file contains the genomic alignments projected onto the transcriptome annotation. You should be able to see this if you look at the header of the Aligned.toTranscriptome.out.bam file. So long as those records (sequence names and lengths) match the transcriptome file you pass to salmon, everything should be in working order. In fact, this very pipeline, STAR -- projected to transcriptome --> salmon, is the default quantification pipeline in nf-core/rnaseq.
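One way to check that match yourself, sketched here with hypothetical helper names (header_names, fasta_names, names_mismatch): pull the @SQ names out of the BAM header with samtools and compare them with the FASTA record names. This sketch only compares names, not lengths, and assumes the FASTA name is everything after ">" up to the first whitespace.

```shell
# Print reference names (the SN: field) from @SQ header lines read on stdin.
header_names() {
  awk -F '\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' | sort -u
}

# Print FASTA record names: text after ">" up to the first whitespace.
fasta_names() {
  awk '/^>/ { print substr($1, 2) }' "$1" | sort -u
}

# names_mismatch TRANSCRIPTS_FASTA   (SAM/BAM header text on stdin)
# Prints names found in only one of the two inputs; prints nothing if they agree.
names_mismatch() {
  hdr=$(mktemp) && fa=$(mktemp) || return 1
  header_names > "$hdr"
  fasta_names "$1" > "$fa"
  comm -3 "$hdr" "$fa"
  rm -f "$hdr" "$fa"
}
```

Usage would be something like `samtools view -H Aligned.toTranscriptome.out.bam | names_mismatch transcripts.fa`, where transcripts.fa is a placeholder for your own transcript FASTA. Empty output means the names agree.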


In nf-core, salmon builds its index from both the genome and the transcriptome FASTA before running salmon quant, whereas in the same pipeline STAR only uses genome.fasta. My issue is that I don't have the transcript_fasta; I have only used the genome FASTA.

cat $transcript_fasta $genome_fasta > $gentrome

salmon \\
    index \\
    --threads $task.cpus \\
    -t $gentrome \\
    -d decoys.txt \\
    $args \\
    -i salmon

But thank you, I will follow what you have suggested from nf-core. Still, I am concerned [:-)]


That is the indexing rule, but if you look at the quantification rule, you'll see that salmon is invoked differently depending on whether the user is using "alignment mode" or providing the raw FASTQ files directly to salmon for its built-in selective alignment. For example, see here. Not to attempt to make an argument from authority here ;P, but I am an author and maintainer of salmon, and we use it ourselves frequently with the Aligned.toTranscriptome.out.bam from STAR. It works well (you can read more in this paper we published a few years ago, "Alignment and mapping methodology influence transcript abundance estimation").
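For reference, an alignment-mode call could look roughly like the sketch below. No salmon index is involved here; salmon only needs the transcript FASTA and the BAM. The file names, thread count and output directory are placeholder assumptions, `--libType A` asks salmon to infer the library type (pass an explicit type if you know it), and RUN is a hypothetical dry-run switch that prints the command instead of running it.

```shell
# Dry-run switch (hypothetical helper): by default the command is only printed.
# Set RUN to an empty string (RUN= sh script.sh) to actually execute salmon.
RUN="${RUN-echo}"

# salmon_quant_bam TRANSCRIPTS_FASTA TRANSCRIPTOME_BAM OUT_DIR
# Quantifies transcripts from existing transcriptome-coordinate alignments.
salmon_quant_bam() {
  $RUN salmon quant \
    --targets "$1" \
    --libType A \
    --alignments "$2" \
    --threads 8 \
    --output "$3"
}

# Placeholder paths; the BAM is the transcriptome-coordinate output from STAR.
salmon_quant_bam transcripts.fa sample_Aligned.toTranscriptome.out.bam salmon_quant
```

The per-transcript estimates would then be in quant.sf inside the output directory.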

Perhaps your concern is that you don't have the transcriptome FASTA file itself to pass to salmon? That should be fairly easy to obtain from the genome fasta and the GTF file you provided to STAR, and can be done using a tool like gffread or the rsem-prepare-reference tool that you can see used in nfcore here.
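A minimal gffread sketch for that step, assuming the same genome FASTA and GTF that were given to STAR (all file names here are placeholders, and RUN is a hypothetical dry-run switch that prints the command by default):

```shell
# Dry-run switch (hypothetical helper): by default the command is only printed.
# Set RUN to an empty string (RUN= sh script.sh) to actually execute gffread.
RUN="${RUN-echo}"

# make_transcript_fasta GENOME_FASTA ANNOTATION_GTF OUT_FASTA
# Writes the spliced exon sequence of every transcript in the GTF to OUT_FASTA.
make_transcript_fasta() {
  $RUN gffread -w "$3" -g "$1" "$2"
}

# Placeholder paths; use the exact genome FASTA and GTF used for the STAR index.
make_transcript_fasta genome.fa annotation.gtf transcripts.fa
```

Because the FASTA is derived from the same GTF that STAR used, the transcript names should line up with the @SQ records in the transcriptome BAM.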


I think I get bassanio's point, since I have a similar confusion:

STAR uses the genome FASTA and genome GTF files for indexing and alignment, and produces a transcriptome-coordinate BAM file (Aligned.toTranscriptome.out.bam) when given the additional option --quantMode TranscriptomeSAM. Then, when using salmon for quantification, a transcript FASTA file is required. How about using a transcript FASTA file downloaded directly from GENCODE/UCSC/...? Will the downloaded transcript FASTA work the same as a tool-generated one (as you mentioned, tools like gffread or rsem-prepare-reference can be used to generate the transcript FASTA)?


Why the double back-slashes?
