Question: Viral RNA-Seq Cufflinks not producing output
1
gravatar for Dan Gaston
2.6 years ago by
Dan Gaston7.1k
Canada
Dan Gaston7.1k wrote:

I'm running cufflinks on some data produced from a viral system using Ion Proton sequencing. I've been running several different RNA-Seq analysis workflows on this data. In particular this set is aligned with STAR + Bowtie2. I've also done HiSat2 + StringTie + Ballgown analysis on this data since that is sort of the replacement workflow for the traditional Tuxedo suite.

However I am running into an issue with the Cufflinks data. When running using:

cufflinks -g reference.gff -b reference.fa -u -p 24 --library-type fr-secondstrand sample.merged.sorted.bam

I end up not getting any results. Both skipped.gtf and transcript.gtf are empty, the tracking files just have the header.

Output to the screen:

You are using Cufflinks v2.2.1, which is the most recent release.
[04:39:37] Loading reference annotation. [04:39:37] Inspecting reads and determining fragment length distribution. Processed 1 loci.       [*************************] 100% 
Map Properties:     
Normalized Map Mass: 16399479.00    
Raw Map Mass: 16399479.00   
Fragment Length Distribution: Truncated Gaussian (default)
      Default Mean: 200
     Default Std Dev: 80 [04:41:49] 
Assembling transcripts and estimating abundances. 
Processed 1 loci.                           [*************************] 100% 
[04:44:31] Loading reference annotation and sequence. 
[04:44:31] Learning bias parameters.
Processed 0 loci.                            [*************************] 100% 
[04:45:50] Re-estimating abundances with bias correction. Processed 0 loci.

I tried setting -max-bundle-frags to a higher number; however then cufflinks gets stalled at 99% complete, waiting for 1 thread to finish for about a day or so, which normally these runs finish in a few minutes. I'm using a custom GFF file but I've also tried using another one that exists for this reference sequence, tried both GFF and GTF files. The gffread utility doesn't show any problems with these files. I've tried not using the -u flag.

Any suggestions what might be going on? The same dataset of Ion Proton reads has also been aligned against the human genome (since the viral system is infecting human cell culture) and that seems to work fine.

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by Dan Gaston7.1k

Sorry did not read the full post. Please ignore this. Cufflinks takes a gtf. You seem to have given a gff. Although don't know for sure if that is causing it.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by microfuge1.0k

Cufflinks will take both a GTF or a GFF. They mix and match which they specify in various in instructions but also, specifically for -g specify it as gff/gtf. Also, as described I tried both the GFF and GTF versions of my annotation files with the same results.

ADD REPLYlink written 2.6 years ago by Dan Gaston7.1k
1

My apologies. I replied too fast without reading the entire post. I could not find any option to delete the post soon after posting. On another note, cufflinks has a --verbose option, just in case it hints to what is going on.

ADD REPLYlink written 2.6 years ago by microfuge1.0k

Trying that now. Not sure why I didn't even thing of that for more debugging info

ADD REPLYlink written 2.6 years ago by Dan Gaston7.1k

I get some GFF Warnings (which oddly I don't get when I use gffread on the file). It seems to be something that happens around the second time it loads the reference annotation:

1: GQ994935.1
[03:49:21] Loading reference annotation.
GFF Warning: merging overlapping/adjacent feature segment CDS (1127-2779) with exon (1097-2000) for GFF ID transORF4a on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment CDS (1127-2779) with exon (1097-2234) for GFF ID transORF4b on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment exon (2609-2956) with CDS (1097-2779) for GFF ID transORF4a on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment exon (2609-2956) with CDS (1097-2779) for GFF ID transORF4b on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment CDS (20038-21051) with exon (18575-20248) for GFF ID transORFK3_70C on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment exon (20788-21099) with CDS (18575-21051) for GFF ID transORFK3_70C on GQ994935.1
GFF Warning: invalid coordinates for mRNA parent feature (ID=transORF42)
GFF Warning: merging overlapping/adjacent feature segment exon (69389-69782) with CDS (69228-69731) for GFF ID transORF47_46_45b on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment CDS (82043-83361) with exon (82042-82791) for GFF ID transORF57A on GQ994935.1
GFF Warning: invalid coordinates for mRNA parent feature (ID=transORFK12_Kaposinb)
Warning: gene geneORF8 found without exon segments (adding default exon).
Warning: gene geneORF11 found without exon segments (adding default exon).
Warning: gene geneK3 found without exon segments (adding default exon).
Warning: gene geneORF70 found without exon segments (adding default exon).
Warning: gene geneK5_6-AS found without exon segments (adding default exon).
Warning: gene geneORF21 found without exon segments (adding default exon).
Warning: gene geneORF24 found without exon segments (adding default exon).
Warning: gene geneORF28 found without exon segments (adding default exon).
Warning: gene geneORF36 found without exon segments (adding default exon).
Warning: gene geneORF43 found without exon segments (adding default exon).
Warning: gene geneORF46 found without exon segments (adding default exon).
Warning: gene geneORF47 found without exon segments (adding default exon).
Warning: gene geneORF48 found without exon segments (adding default exon).
Warning: gene geneORF50 found without exon segments (adding default exon).
Warning: gene geneORF49 found without exon segments (adding default exon).
Warning: gene geneORF53 found without exon segments (adding default exon).
Warning: gene geneORF57 found without exon segments (adding default exon).
Warning: gene geneORF59 found without exon segments (adding default exon).
Warning: gene geneORF60 found without exon segments (adding default exon).
Warning: gene geneORF61 found without exon segments (adding default exon).
Warning: gene geneORF62 found without exon segments (adding default exon).
Warning: gene geneORF66 found without exon segments (adding default exon).
Warning: gene geneORF67 found without exon segments (adding default exon).
Warning: gene geneORF67A found without exon segments (adding default exon).
Warning: gene geneORF71 found without exon segments (adding default exon).
Warning: gene geneORF72 found without exon segments (adding default exon).
Warning: gene geneORF73 found without exon segments (adding default exon).
Warning: gene geneK14 found without exon segments (adding default exon).
 Kept 128 transfrags out of 128
1: GQ994935.1
[03:49:21] Inspecting reads and determining fragment length distribution.
Inspecting bundle GQ994935.1:0-138146 with 18687057 reads
Processed 1 loci.                           
> Map Properties:
>   Normalized Map Mass: 16399479.00
>   Raw Map Mass: 16399479.00
>   Number of Multi-Reads: 951317 (with 3215392 total hits)
>   Fragment Length Distribution: Truncated Gaussian (default)
>                 Default Mean: 200
>              Default Std Dev: 80
**0 ReadHits still live**
Found 2 reference contigs
    Total map density: 16399479.000124
[03:51:33] Assembling transcripts and initializing abundances for multi-read correction.
GQ994935.1:0-138146 Processing new bundle with 18687057 alignments
Processed 1 loci.                           
[03:54:16] Loading reference annotation and sequence.
 **Kept 0 transfrags out of 0**
[03:54:16] Learning bias parameters.
Processed 0 loci.                           
[03:55:35] Re-estimating abundances with bias and multi-read correction.
Processed 0 loci.
ADD REPLYlink written 2.6 years ago by Dan Gaston7.1k

Can you please also post a few lines of the vcf. Just in case is the strand in gtf a '+' and '-' and not set to a '.'. ?

ADD REPLYlink written 2.6 years ago by microfuge1.0k

Do you mean the GTF (there is no VCF). There is strand information in the GTF files (+ and -) for all of the features.

ADD REPLYlink written 2.6 years ago by Dan Gaston7.1k

Ah my bad. Turns out it was a wrong lead. I was trying to grep the error message 'Kept \d+ transfrags out of 0' in the cufflinks source. Apparently the fault lies elsewhere.

ADD REPLYlink written 2.6 years ago by microfuge1.0k

I've not had any luck trying to calculate DE on viral expression with cufflinks. I've tried everything from the NCBI provided GFF/GTFs to making my own from scratch (not hard for ssRNA viruses) without success. I think that the way viruses are represented in GFF/GTFs is somehow valid to gffread but causes problems to parts of cuff* tool set. Then again I've also had instances with mammalian GTF/GFFs going through gffread without issue but parts of cuff* throwing a fit, so I don't think gffread is a completely reliable means for checking.

PS: What is a "viral system"?

ADD REPLYlink written 2.6 years ago by pld4.8k

I was just being non-specific about the virus, etc. But it is human cells infected with virus. So the mRNAs captured are a mix of virus transcripts and human transcripts. Viral system seemed an apt description for the experiment and set up.

ADD REPLYlink written 2.6 years ago by Dan Gaston7.1k

And thanks for the input. It definitely seems like it is some issue with the annotations and how they are interpreted.

ADD REPLYlink written 2.6 years ago by Dan Gaston7.1k

Ah, I thought you had something funky going on, like a system of multiple viral species. The cufflinks verbose output gave away the virus ;).

Have you tried deleting or manually fudging the regions cufflinks is complaining about? I've tried that without success, maybe it will work for you.

Here's a paper I had looked at previously, but haven't had a chance to try: http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1003896#s4

ADD REPLYlink written 2.6 years ago by pld4.8k

Yes, I'm now looking at featureCounts + DESeq2, so similar to HTSeq + edgeR

ADD REPLYlink written 2.6 years ago by Dan Gaston7.1k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1631 users visited in the last hour