How can I produce gene level quantification using Salmon pseudo-aligner?
5.8 years ago
Angelique ▴ 10

Hi !

I am using Salmon to perform pseudo-alignment on paired-end RNA-seq data. I want gene-level quantification, but I obtain files with transcript-level quantification. Command line used:

salmon quant -i Transcriptome_GH38_release_92/Homo_sapiens.GRCh38.92.cdna.ncrna.fa_quasi_index/ -l A -1 Test/SRX2264036_1.fastq.gz -2 Test/SRX2264036_2.fastq.gz -o test_quanti_36 -p 8

Extract of the resulting quantification file:

Name    Length  EffectiveLength TPM NumReads
ENST00000434970.2   9   5.093   0.000000    0.000000
ENST00000448914.1   13  6.885   0.000000    0.000000
ENST00000415118.1   8   4.489   0.000000    0.000000
ENST00000632684.1   12  6.443   0.000000    0.000000
ENST00000430425.1   17  9.050   0.000000    0.000000
ENST00000390578.1   31  15.313  0.000000    0.000000
ENST00000450276.1   17  9.050   0.000000    0.000000
ENST00000431870.1   16  8.504   0.000000    0.000000
ENST00000390567.1   20  10.664  0.000000    0.000000
ENST00000390590.1   31  15.313  0.000000    0.000000

I tried to use the -g option to provide a GTF annotation file, but the resulting file is still at the transcript level.

How can I produce gene level quantification using Salmon ?

Thank you & Have a good day

Salmon quantification Gene RNA-Seq • 5.0k views
5.8 years ago
ATpoint 81k

I have never used Salmon with -g, but the tximport package aggregates transcript quantifications to the gene level. It was developed for exactly this purpose.

5.8 years ago

It is also possible to get counts aggregated at the gene level with Salmon directly. I do not know the exact command-line invocation, since I run Salmon on the Galaxy platform, but you can provide a table matching each transcript to a gene and get separate outputs for counts at the transcript level and at the gene level.

Note that depending on which tool you will use for your downstream analysis you will need either TPM or raw counts.
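
To make the idea of aggregating with a transcript-to-gene table concrete, here is a minimal Python sketch that sums the TPM and NumReads columns of a quant.sf file per gene. It is only an illustration of the principle, not what Salmon or tximport do internally (tximport, for example, offers further options for how counts are derived). The function name `aggregate_quant` and the `tx2gene` dictionary are hypothetical names chosen for this example.

```python
import csv
from collections import defaultdict


def aggregate_quant(quant_lines, tx2gene):
    """Sum TPM and NumReads per gene.

    quant_lines: iterable of tab-separated lines in quant.sf layout
                 (an open file handle works), header line included.
    tx2gene:     dict mapping transcript name -> gene name (hypothetical helper input).
    """
    totals = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    for row in csv.DictReader(quant_lines, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript not present in the table: skipped here
        totals[gene]["TPM"] += float(row["TPM"])
        totals[gene]["NumReads"] += float(row["NumReads"])
    return dict(totals)
```

Assuming the output directory from the question, it could be called as `aggregate_quant(open('test_quanti_36/quant.sf'), tx2gene)`. Both summed columns are kept so that either TPM or counts can be passed on, depending on the downstream tool.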

2.4 years ago

Generate a simple tab-delimited text file in the following format:

transcript_id   gene_id
ENST00000456328.2   ENSG00000223972.5
ENST00000461467.1   ENSG00000237613.2

Then pass it to the -g option instead of your annotation file. You can use the following Python code to convert a GTF file (downloaded from GENCODE) to this format:

# Convert a GENCODE GTF into a two-column transcript-to-gene table for salmon -g.
import re

gtf_path = 'input file downloaded from GENCODE.gtf'
out_path = 'output file for salmon.txt'

with open(gtf_path) as f_in, open(out_path, 'w') as f_out:
    f_out.write('transcript_id\tgene_id\n')
    for line in f_in:
        if line.startswith('#'):
            continue  # skip the header comment lines
        fields = line.rstrip('\n').split('\t')
        # Keep only 'transcript' features so that each pair is written once
        # (exon, CDS, etc. lines repeat the same transcript_id/gene_id).
        if len(fields) < 9 or fields[2] != 'transcript':
            continue
        # Look the attributes up by name instead of relying on their position.
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        tr = re.search(r'transcript_id "([^"]+)"', fields[8])
        if gene and tr:
            f_out.write(tr.group(1) + '\t' + gene.group(1) + '\n')
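
Before rerunning Salmon, it may be worth checking that the transcript names in the table match those in quant.sf exactly. A mismatch such as a missing version suffix (ENST00000456328 vs ENST00000456328.2) is one possible reason for transcripts not being assigned to genes; this is an assumption on my part, not something confirmed in this thread. The sketch below uses a hypothetical helper, `mapping_coverage`, that counts how many quant.sf names appear in the first column of the table.

```python
def mapping_coverage(quant_lines, tx2gene_lines):
    """Return (n_mapped, n_total, unmapped_names) for the names in quant.sf.

    Both arguments are iterables of tab-separated lines (open file handles work).
    """
    # Collect every transcript name listed in the first column of the table.
    mapped = set()
    for line in tx2gene_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            mapped.add(fields[0])

    rows = iter(quant_lines)
    next(rows, None)  # skip the quant.sf header line
    names = [line.split("\t")[0] for line in rows if line.strip()]

    unmapped = [name for name in names if name not in mapped]
    return len(names) - len(unmapped), len(names), unmapped
```

For example, `mapping_coverage(open('test_quanti_36/quant.sf'), open('output file for salmon.txt'))` would report how many of the quantified transcripts have an entry in the table, and list the ones that do not.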