Question: Kallisto in snakePipes instead of salmon
8 months ago, arunprasanna83 wrote:

Hi,

The Salmon version installed inside the snakePipes env seems to be 0.7.x, whereas the latest version is 1.1.0. The following errors were encountered:

After it starts computing gene-level abundance, the log contains many lines like these:

[jointLog] [warning] Feature has no GFF ID
[jointLog] [info] There were 0 transcripts mapping to 0 genes
[jointLog] [warning] couldn't find transcritpt named [xxxx] in transcript <-> gene map; returning transcript as it's own gene
[jointLog] [warning] NOTE: We recommend using tximport for aggregating transcript-level salmon abundance...

The warning in the last line does not seem straightforward. It appears to be an issue discussed in https://github.com/COMBINE-lab/salmon/issues/198 and https://github.com/COMBINE-lab/salmon/issues/98

I'm not sure what the solution is.

@ATpoint Please find the snippets below with 1 example:

My original gtf file:

CC7scaffold1    AUGUSTUS    exon    23573   23678   .   -   .   transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; gene_name "AIPCC7_15333.t1";
CC7scaffold1    AUGUSTUS    exon    24472   24635   .   -   .   transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; gene_name "AIPCC7_15333.t1";

genes.filtered.gtf

CC7scaffold1    stdin   exon    23573   23678   .   -   .   gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "1"; exon_id "AIPCC7_15333.t1.1"; gene_name "AIPCC7_15333.t1";
CC7scaffold1    stdin   CDS     23576   23678   .   -   1   gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "1"; exon_id "AIPCC7_15333.t1.1"; gene_name "AIPCC7_15333.t1";
CC7scaffold1    stdin   exon    24472   24635   .   -   .   gene_id "AIPCC7_15333"; transcript_id "AIPCC7_15333.t1"; exon_number "2"; exon_id "AIPCC7_15333.t1.2"; gene_name "AIPCC7_15333.t1";

genes.filtered.fa

AIPCC7_15333.t1 ATGGGAAACGGTTCGTGGATCGACCAATGCACCAGTCTTGGATCTAAAGGCTCGAACTTGCTTCTGATGGCAA.............................................................................................................................................................................................................

genes.filtered.t2g

AIPCC7_15333.t1 AIPCC7_15333 AIPCC7_15333.t1

Thanks in advance.

8 months ago, Devon Ryan (Freiburg, Germany) wrote:

I'm not going to put kallisto in snakePipes because I don't want to deal with Lior (the other authors of the tool seem fine). The Salmon authors have always been amazingly responsive to questions and bug reports, which is among the reasons we use it.

The most recent snakePipes release uses salmon 0.13.1. We'll update that for the next release, since the newer versions have some very nice new features. In general, pipelines should be rather slow to update software versions.

The warning in your message suggests that the input GTF file had no gene ID for a transcript. That happens; some GTF files are malformed.
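One way to test that hypothesis is to scan the GTF for features that lack a `gene_id` or `transcript_id` attribute. The following is a minimal Python sketch, not part of snakePipes or salmon; the function names are made up for illustration, and the second example line is a deliberately malformed record invented to show what a hit looks like. It assumes attributes are written as `key "value";`, as in the snippets in the question.

```python
import re

# Matches GTF attributes written as: key "value";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(attr_field):
    """Return the attribute column (9th GTF column) as a dict."""
    return dict(ATTR_RE.findall(attr_field))


def find_incomplete_features(lines):
    """Return (line_number, missing_keys) for features lacking required IDs.

    Hypothetical helper: gene_id is expected on every feature, and
    transcript_id on every feature except 'gene' records.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        # GTF is tab-separated; fall back to whitespace for pasted snippets.
        fields = line.split("\t", 8) if "\t" in line else line.split(None, 8)
        if len(fields) < 9:
            problems.append((number, ["gene_id", "transcript_id"]))
            continue
        required = ["gene_id"] if fields[2] == "gene" else ["gene_id", "transcript_id"]
        attrs = parse_attributes(fields[8])
        missing = [key for key in required if not attrs.get(key)]
        if missing:
            problems.append((number, missing))
    return problems


example = [
    # First line: taken from the GTF snippet in the question.
    'CC7scaffold1\tAUGUSTUS\texon\t23573\t23678\t.\t-\t.\t'
    'transcript_id "AIPCC7_15333.t1"; gene_id "AIPCC7_15333"; gene_name "AIPCC7_15333.t1";',
    # Second line: invented malformed record with no gene_id.
    'CC7scaffold1\tAUGUSTUS\texon\t24472\t24635\t.\t-\t.\t'
    'transcript_id "AIPCC7_15333.t1";',
]
print(find_incomplete_features(example))  # [(2, ['gene_id'])]
```

To check a real file, pass an open file handle instead of the list, for example `find_incomplete_features(open("genes.filtered.gtf"))`. An empty result would suggest the warning has a different cause than missing IDs.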


Wow. Your paper on snakePipes states "snakePipes provides a set of best-practices workflows". Yet it seems you're not interested at all in providing users best-practice workflows, rather you're using your tool to exercise a personal vendetta against me. Your excuse rings hollow because for starters, incorporating kallisto in snakePipes wouldn't require any interaction with me at all. Too bad for your users.

Reply written 8 months ago by Lior Pachter (modified 8 months ago by RamRS)

Hi @Devon Ryan, please take a look at my edit. I have added snippets of the original GTF as well as the annotation folder contents for your reference. Another observation is that DE analysis fails not only if sample names contain numbers but also if they contain special characters. For instance, one of my samples was named "C23-32L_R1.fastq.gz". Sleuth_salmon failed with the log message "unable to find file or directory: /path/to/salmon/C23.32L.quant.sf". I then replaced the '-' with 't', after which it worked. Nevertheless, the GTF error remains the same. Kindly let me know the preferred GTF format.

Reply written 8 months ago by arunprasanna83

Ah, yeah, R really doesn't like some characters in column names, so '-' ends up getting converted to '.'. I thought we had a warning about that printed to the screen, but I should double-check and probably make it an error, since there's so much sent to the screen that it'd be hard to notice.

I've reworked the GTF handling in the next release and will double check how this particular step is working to ensure this issue goes away.

Reply written 8 months ago by Devon Ryan
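The renaming discussed in this exchange can be caught before a run by checking sample names against the rules R applies to column names. The sketch below is a rough Python approximation of base R's `make.names()` for ASCII input, not snakePipes code; both function names are hypothetical, and R reserved words are not handled. "C23-32L" and "C23t32L" come from the thread, while "23L" is an invented name that starts with a digit.

```python
import re


def approx_make_names(name):
    """Approximate R's make.names() for one ASCII string.

    Assumption: reserved words and non-ASCII letters are ignored here.
    """
    # A syntactic R name starts with a letter, or with '.' not followed
    # by a digit; otherwise 'X' is prepended.
    if not re.match(r"([A-Za-z]|\.(?![0-9]))", name):
        name = "X" + name
    # Anything other than letters, digits, '.' and '_' becomes '.'.
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


def names_r_would_change(sample_names):
    """Map each sample name that R would alter to its altered form."""
    changed = {}
    for name in sample_names:
        converted = approx_make_names(name)
        if converted != name:
            changed[name] = converted
    return changed


print(names_r_would_change(["C23-32L", "C23t32L", "23L"]))
# {'C23-32L': 'C23.32L', '23L': 'X23L'}
```

Any name that appears in the result would no longer match its quantification file path after R renames the column, which is consistent with the "C23.32L.quant.sf" lookup failure reported above.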