I used Salmon to quantify reads for a reference-based RNA-seq analysis and was doing the downstream analysis in R. While using the
txi.salmon function I came across this error:
reading in files with read_tsv
1 2 3 4
Error in .local(object, ...) : None of the transcripts in the quantification files are present in the first column of tx2gene. Check to see that you are using the same annotation for both.
Example IDs (file): [NMUH01000001.1, NMUH01000002.1, NMUH01000003.1, ...]
Example IDs (tx2gene): [rna-gnl|WGS:NMUH|Taro_000008-RA_mrna, rna-gnl|WGS:NMUH|Taro_000009-RA_mrna, rna-gnl|WGS:NMUH|Taro_000015-RA_mrna, ...]
I understand that the IDs in the TxDb object and in the quant.sf files are different. This is the GFF file I used:
My quant.sf file:
The IDs in quant.sf are sequence names, not the corresponding transcript IDs, which is really confusing me. If this has something to do with the quantification step, what should I change? I hope someone can help me with this.
Thanks in advance.
When you were generating the Salmon index, where did you get the transcriptome from, or how did you generate it? Most sources provide some way to map transcripts to genes, such as a GTF file or a database like BioMart.
The transcriptome and the GFF were downloaded from NCBI. I downloaded ASM1336462v1 from this link.
The gff file has the gene to transcript mapping, so you can extract and use the information from it.
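As a rough illustration of what extracting that mapping could look like, here is a minimal Python sketch. It assumes a GFF3 file in which transcript features (type mRNA or transcript) carry ID and Parent attributes. The GFF excerpt, the gene ID gene-Taro_000008, and the helper names are made up for the example, and the TXNAME/GENEID column names are only a convention. The real file may need extra handling.

```python
import csv
import io

def parse_attributes(field):
    """Turn a GFF3 attribute string (key=value;key=value) into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs

def tx2gene_from_gff(lines, transcript_types=("mRNA", "transcript")):
    """Return (transcript_id, gene_id) pairs taken from transcript features."""
    pairs = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in transcript_types:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs and "Parent" in attrs:
            pairs.append((attrs["ID"], attrs["Parent"]))
    return pairs

# Hypothetical excerpt; the gene ID and coordinates are invented for illustration.
example_gff = """##gff-version 3
NMUH01000001.1\tGenbank\tgene\t100\t900\t.\t+\t.\tID=gene-Taro_000008
NMUH01000001.1\tGenbank\tmRNA\t100\t900\t.\t+\t.\tID=rna-gnl|WGS:NMUH|Taro_000008-RA_mrna;Parent=gene-Taro_000008
NMUH01000001.1\tGenbank\texon\t100\t900\t.\t+\t.\tID=exon-1;Parent=rna-gnl|WGS:NMUH|Taro_000008-RA_mrna
"""

tx2gene = tx2gene_from_gff(io.StringIO(example_gff))

# Write a two-column, tab-separated table (transcript ID first, gene ID second).
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(["TXNAME", "GENEID"])
writer.writerows(tx2gene)
tx2gene_tsv = out.getvalue()
print(tx2gene_tsv)
```

The first column of the table has to contain the same IDs that appear in the Name column of quant.sf, otherwise the import fails with the error shown above.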
Yes, I understand that. Usually Salmon reports transcript IDs that are also present in the GFF file. But here, as you can see, the transcript IDs are different, so I don't understand how to match them.
Oh sorry, I didn't even notice that it quantified at the chromosome level. What code did you use to generate the transcriptome?
I used the genome fasta file downloaded from the link I attached previously.
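You have to use a transcriptome, not a genome. Salmon quantifies against whatever sequences are in its index, so indexing the genome FASTA gives you one row per contig (NMUH01000001.1, ...) instead of one row per transcript. Rebuild the index from a transcript FASTA whose IDs match the GFF and rerun the quantification.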
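To make the difference concrete, here is a small Python sketch of how transcript sequences can be derived from a genome FASTA plus the exon records in a GFF3 file. It is only an illustration under simplifying assumptions: every exon has a single Parent, and each transcript sits on one contig and one strand. The sequences, coordinates, the second transcript ID, and the function names are all invented. In practice a dedicated tool such as gffread, or a transcript FASTA distributed with the assembly, is the more usual route.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def read_fasta(lines):
    """Parse FASTA lines into {name: sequence}; name is the header up to the first space."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs

def transcripts_from_exons(genome, gff_lines):
    """Concatenate exon sequences per parent transcript, honouring the strand."""
    exons = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        parent = attrs.get("Parent")
        if parent:
            # GFF3 coordinates are 1-based and inclusive.
            exons[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    transcripts = {}
    for tx, parts in exons.items():
        parts.sort(key=lambda p: p[1])  # order exons by genomic start
        seq = "".join(genome[contig][start - 1:end] for contig, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        transcripts[tx] = seq
    return transcripts

# Invented toy data: one 16 bp contig and two hypothetical transcripts.
genome = read_fasta([">NMUH01000001.1 toy contig", "AAACCCGG", "GTTTACGT"])
gff = [
    "NMUH01000001.1\tGenbank\texon\t1\t3\t.\t+\t.\tID=e1;Parent=rna-gnl|WGS:NMUH|Taro_000008-RA_mrna",
    "NMUH01000001.1\tGenbank\texon\t7\t9\t.\t+\t.\tID=e2;Parent=rna-gnl|WGS:NMUH|Taro_000008-RA_mrna",
    "NMUH01000001.1\tGenbank\texon\t13\t16\t.\t-\t.\tID=e3;Parent=rna-example-minus",
    "NMUH01000001.1\tGenbank\texon\t4\t6\t.\t-\t.\tID=e4;Parent=rna-example-minus",
]
transcripts = transcripts_from_exons(genome, gff)

# FASTA headers are now transcript IDs, which is what tx2gene expects to see.
fasta_text = "".join(f">{tx}\n{seq}\n" for tx, seq in transcripts.items())
print(fasta_text)
```

A Salmon index built from a FASTA like this reports transcript IDs in quant.sf, so they can be matched against the first column of tx2gene.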