Question

Running salmon in alignment mode?

0

22 months ago

pearl2070 ▴ 10

This is my first time working with RNA data. So far, I've run the dataset through Trimmomatic, SortMeRNA, MEGAHIT and BWA. I'm trying to run Salmon using the SAM file output from BWA and the .fa file produced by MEGAHIT as the transcriptome.

I run this line:

./salmon-1.8.0_linux_x86_64/bin/salmon quant -p 12 -t Sample1_megahit.contigs.fa -l A -a Sample1_megahit.annotation_bwa.sam -o Sample1_salmon

I get the error below at the end of a stream of lines that all say some variation of "this transcript not found in reference", and I'm not sure which reference it means: "Please provide a reference FASTA file that includes all targets present in the BAM header."

Should I be passing the unassembled transcriptome from before MEGAHIT? Or the MEGAHIT file filtered to contain only transcripts that had successful BWA alignments? I'm not sure how to do that. The data was originally paired-end, if that is relevant.

Thanks in advance!

transcriptomics salmon metatranscriptomics rna rna-seq • 1.5k views

ADD COMMENT • link 22 months ago by pearl2070 ▴ 10

1

Which fasta did you use for BWA alignment?

ADD REPLY • link 22 months ago by Shred ★ 1.4k

0

For BWA, I ran:
bwa index -a bwtsw microbial_all_cds.fasta

Followed by:
bwa mem -t 32 microbial_all_cds.fasta Sample1_megahit.contigs.fa > Sample1_megahit.annotation_bwa.sam

ADD REPLY • link 22 months ago by pearl2070 ▴ 10

0

Salmon is telling you that the names of the contigs in Sample1_megahit.contigs.fa are not the same as the names of the contigs in Sample1_megahit.annotation_bwa.sam.
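
One way to confirm such a mismatch is to compare the reference names in the SAM header with the record names in the FASTA file. The sketch below is only an illustration: the function name `sam_names_missing_from_fasta` is made up for this example, and it assumes a plain-text SAM file with `@SQ` header lines and that a FASTA record name is the header text up to the first whitespace.

```shell
# Print every reference name listed in the SAM header (@SQ SN: fields)
# that has no matching record name in the FASTA file.
# Hypothetical helper; usage: sam_names_missing_from_fasta reference.fa alignments.sam
sam_names_missing_from_fasta() {
    awk '
        # First file (FASTA): remember each record name, i.e. the header
        # text up to the first whitespace.
        NR == FNR {
            if (/^>/) { sub(/^>/, ""); seen[$1] = 1 }
            next
        }
        # Second file (SAM): check the SN: field of every @SQ header line.
        /^@SQ/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    name = substr($i, 4)
                    if (!(name in seen)) print name
                }
            next
        }
        # Stop at the first alignment record; only the header matters here.
        !/^@/ { exit }
    ' "$1" "$2"
}

# Example call with the file names from this thread:
# sam_names_missing_from_fasta Sample1_megahit.contigs.fa Sample1_megahit.annotation_bwa.sam
```

Any name printed is a target in the SAM header that the FASTA passed with -t does not contain. An empty result would suggest the names agree.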

ADD REPLY • link 22 months ago by i.sudbery 19k

0

What could be causing this? Could it be because there are many contigs in Sample1_megahit.contigs.fa that didn't get annotated, and therefore aren't present in Sample1_megahit.annotation_bwa.sam?

ADD REPLY • link 22 months ago by pearl2070 ▴ 10

0

In your comment to @Shred you say that you aligned to microbial_all_cds.fasta. If that's the case, then you must pass microbial_all_cds.fasta to salmon.

ADD REPLY • link 22 months ago by i.sudbery 19k

0

Where? If I run

./salmon-1.8.0_linux_x86_64/bin/salmon quant -p 12 -t microbial_all_cds.fasta -l A -a Sample1_megahit.annotation_bwa.sam -o Sample1_salmon

Then I encounter errors that read "Transcript appears twice in the transcript FASTA file" and "Transcript appears in the reference but did not appear in the BAM."

ADD REPLY • link updated 22 months ago by GenoMax 141k • written 22 months ago by pearl2070 ▴ 10

1

This means that you have multiple entries in your FASTA file that have the same name, which isn't allowed.
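
To see which names are repeated, something like the following sketch could be used. The function name `duplicate_fasta_names` is hypothetical, and the sketch assumes that a record's name is the header text up to the first whitespace, so two headers with the same first word but different descriptions count as duplicates.

```shell
# Print each FASTA record name that occurs more than once.
# Hypothetical helper; usage: duplicate_fasta_names reference.fa
duplicate_fasta_names() {
    grep '^>' "$1" |        # keep header lines only
        sed 's/^>//' |      # drop the leading ">"
        awk '{ print $1 }' | # keep the name (first whitespace-delimited word)
        sort |
        uniq -d             # print one copy of every repeated name
}

# Example call with the file name from this thread:
# duplicate_fasta_names microbial_all_cds.fasta
```

If this prints nothing, no two records share a name under that assumption.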

ADD REPLY • link 22 months ago by i.sudbery 19k

0

How do I resolve this? Is there some way to remove duplicates? I don't think I had a specific step to dereplicate sequences in my pipeline, actually. Could that have caused this problem?
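
One possible approach, sketched here under stated assumptions rather than as a recommended pipeline step, is to keep only the first record for each name. The function name `dedup_fasta_by_name` is invented for this example. Keeping the first copy is an arbitrary choice: if records sharing a name carry different sequences, the later ones are silently dropped. The alignment would presumably also need to be redone against the deduplicated file so that the SAM header and the FASTA agree.

```shell
# Write a FASTA to stdout that keeps only the first record for each name,
# where the name is the header text up to the first whitespace.
# Hypothetical helper; usage: dedup_fasta_by_name in.fa > out.fa
dedup_fasta_by_name() {
    awk '
        # On a header line, decide whether this record is the first with its name.
        /^>/ { name = substr($1, 2); keep = !(name in seen); seen[name] = 1 }
        # Print the header and sequence lines of kept records only.
        keep { print }
    ' "$1"
}

# Example call with the file name from this thread (output name is made up):
# dedup_fasta_by_name microbial_all_cds.fasta > microbial_all_cds.dedup.fasta
```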

ADD REPLY • link 22 months ago by pearl2070 ▴ 10

0

Are you sure megahit is appropriate to use with RNAseq data? It appears to be a genome assembler.

ADD REPLY • link 22 months ago by GenoMax 141k

0

I have seen it used in a few metatranscriptomics studies, but if it's likely to be the cause of this issue, I can try a different assembler.

ADD REPLY • link 22 months ago by pearl2070 ▴ 10