3.3 years ago
erbear • 0

I am attempting RNA-seq for the first time and having some difficulty. I am from a biochemistry background, and my coding is weak to nonexistent. I am supposed to be following the Tuxedo pipeline for consistency with previously analyzed samples, but at this point I'm willing to try whatever works to get DEGs.

The data: I have 12 sets of RNA seq data (fastq). They are from 2 biological replicates, half are control, half are infected. They are taken 4, 8, and 16 hpi. They are single-end reads.

What I've done: I have mapped them to a reference genome successfully using tophat2.

Now: cuffdiff to get the DEGs??

I think I should be running a time function, and put them in their duplicates (so compare samples 1&4 vs 2&5 vs 3&6 to get change over time). I also need to compare infected vs uninfected, so 1&4 vs 7&10 etc. Or should I just compare them all and then sort it out manually? I am having a lot of trouble with the scripts. What do I run??
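One way to pin down the comparisons before running anything is to write the design out explicitly. The sketch below follows the replicate pairing described above (1&4, 2&5, 3&6 and 7&10, 8&11, 9&12 at 4, 8 and 16 hpi). The labels `condA` and `condB` are placeholders: the post does not say which half is control and which is infected, so swap in the real names.

```python
from itertools import combinations

TIMEPOINTS = [4, 8, 16]  # hours post infection, as stated in the post


def build_groups():
    """Map (condition, hpi) -> replicate sample numbers, using the pairing from the post."""
    groups = {}
    for i, hpi in enumerate(TIMEPOINTS):
        # ASSUMPTION: samples 1-6 belong to one condition (placeholder name "condA")
        groups[("condA", hpi)] = [1 + i, 4 + i]
        # ASSUMPTION: samples 7-12 belong to the other condition (placeholder name "condB")
        groups[("condB", hpi)] = [7 + i, 10 + i]
    return groups


def list_comparisons():
    """Return every pairwise comparison implied by the two questions in the post."""
    comparisons = []
    # Change over time within each condition
    for cond in ("condA", "condB"):
        for t1, t2 in combinations(TIMEPOINTS, 2):
            comparisons.append(((cond, t1), (cond, t2)))
    # One condition against the other at each time point
    for hpi in TIMEPOINTS:
        comparisons.append((("condA", hpi), ("condB", hpi)))
    return comparisons


if __name__ == "__main__":
    groups = build_groups()
    for left, right in list_comparisons():
        print(left, groups[left], "vs", right, groups[right])
```

Listing the comparisons this way makes it easier to decide whether to run one job per comparison or a single job over all six groups and filter the output afterwards.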

The program wants a GTF file and I only have the reference in GenBank (gb) or FASTA format. I followed a script to convert it to GTF, but I don't know what I'm doing and it may not have worked properly. It says "Error parsing value of GFF attribute "transcript_id", line:..." and I don't know how to fix it. How do I get a usable GTF reference file?
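An error about the `transcript_id` attribute usually points at the ninth column of the annotation file. In GTF, that column holds `key "value";` pairs, and both `gene_id` and `transcript_id` are expected on each feature line. The following is a minimal checker, not a converter; the helper names are made up for illustration. It reports lines whose attributes are missing, empty, or not quoted in the GTF style (for example GFF3-style `ID=...` attributes).

```python
import re

# Matches GTF-style attributes of the form: key "value"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_attributes(field):
    """Return a dict of the quoted key/value pairs found in GTF column 9."""
    return dict(ATTR_RE.findall(field))


def check_gtf_line(line):
    """Return a list of problems for one line; an empty list means it looks fine."""
    if line.startswith("#") or not line.strip():
        return []  # comments and blank lines are ignored
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return [f"expected 9 tab-separated columns, found {len(cols)}"]
    attrs = parse_attributes(cols[8])
    problems = []
    for key in ("gene_id", "transcript_id"):
        if not attrs.get(key):
            problems.append(f"missing, empty, or unquoted {key}")
    return problems


def check_gtf_file(path):
    """Print the line number and problems for every suspicious line in a file."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            for problem in check_gtf_line(line):
                print(f"line {number}: {problem}")
```

Calling `check_gtf_file("my_annotation.gtf")` (a hypothetical file name) on the converted file should show whether the converter wrote the attributes in the expected form.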

Then what do I do with it/how do I view results?

I've been trying for weeks and reading through online help and tutorials but am just not getting through. Thanks!

rna-seq • 1.4k views

You should know that the old 'Tuxedo' pipeline of TopHat(2) and Cufflinks is no longer the recommended approach for RNA-seq analysis. The software is deprecated or in low maintenance and should be replaced by HISAT2, StringTie and Ballgown. See this paper: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. There are also other alternatives, including alignment with STAR or BBMap, or pseudo-alignment using Salmon.
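As a rough sketch of what the newer workflow looks like for single-end reads, the snippet below only builds and prints the commands for one sample; it does not run them. The index name, annotation file, FASTQ naming and output layout are all assumptions to adapt to your own files.

```python
import shlex


def hisat2_stringtie_commands(sample, index="genome_index", gtf="annotation.gtf", threads=4):
    """Build (but do not run) alignment, sorting and quantification commands for one sample.

    ASSUMPTIONS: reads are in "<sample>.fastq", a HISAT2 index named `index`
    already exists, and `gtf` is a valid annotation for the same genome.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # Align single-end reads (-U); --dta tailors alignments for transcript assemblers
        ["hisat2", "-p", str(threads), "--dta", "-x", index, "-U", f"{sample}.fastq", "-S", sam],
        # Sort the alignments by coordinate into a BAM file
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # Quantify against the reference annotation only (-e) and write Ballgown tables (-B)
        ["stringtie", bam, "-e", "-B", "-p", str(threads), "-G", gtf,
         "-o", f"ballgown/{sample}/{sample}.gtf"],
    ]


if __name__ == "__main__":
    for cmd in hisat2_stringtie_commands("vero7"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check file names before committing to a long run.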

The program wants a gtf file and I only have the reference in gb or fasta. I followed a script to get it to gtf but I don't know what I'm doing and it may not have worked properly.

Tell us which script you are using, where you got it, and how you are running it. Also tell us where you got the FASTA and GenBank files. Then we can help; it is impossible to know what went wrong if we don't know what you tried.

One more tutorial for you to read: How To Ask Good Questions On Technical And Scientific Forums


Thanks! I have found out that the GenBank file I used to make the GTF is missing the locus tags, and the conversion needs a locus tag to work, so I will try to add the missing locus tags.


I found a program that made a GFF file that seems to work and ran:

chaos:Vero Cell BAM files erika$ cuffdiff -v BA71Vvocs.gff vero7_BA71V.bam vero8_BA71V.bam vero9_BA71V.bam vero10_BA71V.bam vero11_BA71V.bam vero12_BA71V.bam
Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
[12:30:10] Loading reference annotation.
Segmentation fault: 11
chaos:Vero Cell BAM files erika$
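One thing worth noting about the command above: cuffdiff treats each space-separated BAM argument as a separate condition, and replicates of the same condition are joined with commas. The sketch below builds such a command for the six files listed, assuming the replicate pairing from the original question (7&10, 8&11, 9&12) and using made-up labels `hpi4`, `hpi8`, `hpi16`. It does not by itself explain the segmentation fault, which occurs while the annotation is being loaded.

```python
import shlex


def cuffdiff_command(annotation, groups, out_dir="cuffdiff_out", time_series=False):
    """Build a cuffdiff argument list.

    groups: dict mapping a condition label to its list of replicate BAM files,
    in the order the conditions should be compared.
    """
    cmd = ["cuffdiff", "-o", out_dir, "-L", ",".join(groups)]
    if time_series:
        cmd.append("-T")  # treat the conditions as consecutive time points
    cmd.append(annotation)
    for bams in groups.values():
        # Replicates of one condition are comma-separated within a single argument
        cmd.append(",".join(bams))
    return cmd


if __name__ == "__main__":
    # ASSUMPTION: 7&10, 8&11 and 9&12 are replicate pairs at 4, 8 and 16 hpi
    groups = {
        "hpi4": ["vero7_BA71V.bam", "vero10_BA71V.bam"],
        "hpi8": ["vero8_BA71V.bam", "vero11_BA71V.bam"],
        "hpi16": ["vero9_BA71V.bam", "vero12_BA71V.bam"],
    }
    print(shlex.join(cuffdiff_command("BA71Vvocs.gff", groups, time_series=True)))
```

The printed line can be pasted into the terminal once the annotation file loads without errors.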


First, work through a good hands-on tutorial that provides its own datasets and references, such as https://github.com/griffithlab/rnaseq_tutorial/wiki. That will give you an idea of how the data, references and tools fit together. Then try to reproduce the same workflow on one or two of your own data sets. You can also have a look at blogs like https://digibio.blogspot.com/2017/10/rnaseq-data-analysis-tuxedo-new-protocol.html and https://digibio.blogspot.com/2015/09/rnaseq-data-analysis-and-tuxedo-workflow.html for the new and old Tuxedo protocols.


As a complement to the previous answers, DEWE (http://sing-group.org/dewe/) offers a user-friendly GUI to run the HISAT2, StringTie and Ballgown workflow, with the addition of edgeR at the DE analysis stage. It also produces a rich variety of outputs related to the DE results. Regards.