I am attempting RNA-Seq analysis for the first time and having some difficulty. I come from a biochemistry background, and my coding skills are weak to nonexistent. I am supposed to be following the Tuxedo pipeline for consistency with previously analyzed samples, but at this point I'm willing to try whatever works to get differentially expressed genes (DEGs).
The data: I have 12 sets of single-end RNA-Seq reads (FASTQ). They come from 2 biological replicates; half are control and half are infected, and they were taken at 4, 8, and 16 hpi.
What I've done: I have successfully mapped them to a reference genome using TopHat2.
Next: Cuffdiff to get the DEGs?
I think I should be running a time-series comparison, with the samples grouped by replicate (so compare samples 1&4 vs 2&5 vs 3&6 to get the change over time). I also need to compare infected vs uninfected at each timepoint, so 1&4 vs 7&10, etc. Or should I just compare them all and then sort it out manually? I am having a lot of trouble with the scripts. What do I actually run?
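A minimal Python sketch of how the twelve BAM files might be grouped for Cuffdiff, assuming (from the sample numbers above, not confirmed) that samples 1 to 6 are control and 7 to 12 infected, that within each half the first three are replicate 1 and the last three replicate 2, and that the timepoints cycle 4, 8, 16 hpi. Cuffdiff takes the GTF followed by one argument per condition, with the replicates of that condition joined by commas, and `-L` supplies the condition labels. Without `-T/--time-series` it tests every pair of conditions, so the over-time comparisons (e.g. `control_4h` vs `control_8h`) and the infected-vs-control comparisons (e.g. `control_4h` vs `infected_4h`) should both come out of a single run and can be filtered afterwards. The output paths `tophat_out/sampleN/accepted_hits.bam`, the label names, and the helper functions are hypothetical; the script only prints the command rather than running it.

```python
import shlex

# Assumed layout (hypothetical, inferred from the post): samples 1-6 are
# control and 7-12 infected; within each half, the first three are
# replicate 1 and the last three replicate 2, each at 4, 8, 16 hpi.
TIMEPOINTS = (4, 8, 16)


def sample_info(n):
    """Return (condition, replicate, hpi) for sample number n (1-12)."""
    condition = "control" if n <= 6 else "infected"
    within = (n - 1) % 6  # position 0-5 inside the control or infected half
    replicate = 1 if within < 3 else 2
    hpi = TIMEPOINTS[within % 3]
    return condition, replicate, hpi


def bam_path(n):
    """Hypothetical TopHat2 layout: one output directory per sample."""
    return f"tophat_out/sample{n}/accepted_hits.bam"


def cuffdiff_command(gtf, out_dir="cuffdiff_out", threads=4):
    """Build a cuffdiff argument list: one group per condition and timepoint,
    with the two replicates of each group joined by a comma."""
    groups = {}
    for n in range(1, 13):
        condition, _replicate, hpi = sample_info(n)
        groups.setdefault((condition, hpi), []).append(bam_path(n))
    keys = sorted(groups)  # control before infected, then 4, 8, 16 hpi
    labels = ",".join(f"{condition}_{hpi}h" for condition, hpi in keys)
    return [
        "cuffdiff",
        "-o", out_dir,
        "-p", str(threads),
        "-L", labels,
        gtf,
        *(",".join(groups[key]) for key in keys),
    ]


if __name__ == "__main__":
    # "genes.gtf" is a placeholder for your annotation file.
    print(shlex.join(cuffdiff_command("genes.gtf")))
```

If the real sample numbering differs, only `sample_info` needs to change; the grouping and labels follow from it.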
The program wants a GTF annotation file, but I only have the reference in GenBank (.gb) or FASTA format. I followed a script to convert it to GTF, but I don't know what I'm doing and it may not have worked properly. It says "Error parsing value of GFF attribute "transcript_id", line:..." and I don't know how to fix it. How do I get a usable GTF reference file?
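An error about parsing the value of `transcript_id` plausibly points to a malformed attribute column, though the exact cause in this file is not known. In GTF each record has 9 tab-separated columns, and the last one holds attributes written as `key "value";`, with Cufflinks expecting both `gene_id` and `transcript_id`. A minimal sketch, assuming those GTF2.2 conventions, that reports which lines of a converted file break them; the function names are hypothetical.

```python
import re

# One GTF attribute: a key, whitespace, a double-quoted value, then ';'.
ATTR_RE = re.compile(r'\s*(\S+)\s+"([^"]*)"\s*;')


def parse_attributes(field):
    """Parse a GTF attribute column into a dict; return None if malformed."""
    attrs = {}
    field = field.strip()
    pos = 0
    while pos < len(field):
        m = ATTR_RE.match(field, pos)
        if m is None:
            return None  # e.g. an unquoted value or a missing ';'
        attrs[m.group(1)] = m.group(2)
        pos = m.end()
    return attrs


def check_gtf_lines(lines):
    """Yield (line_number, problem) for lines that do not look like valid GTF."""
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 9:
            yield lineno, f"expected 9 tab-separated columns, found {len(cols)}"
            continue
        start, end = cols[3], cols[4]
        if not (start.isdigit() and end.isdigit()) or int(start) > int(end):
            yield lineno, "start/end must be integers with start <= end"
            continue
        if cols[6] not in {"+", "-", "."}:
            yield lineno, f"invalid strand {cols[6]!r}"
        attrs = parse_attributes(cols[8])
        if attrs is None:
            yield lineno, "malformed attribute column (values must be double-quoted and end with ';')"
            continue
        for key in ("gene_id", "transcript_id"):
            if not attrs.get(key):
                yield lineno, f"missing or empty {key}"


def report(path):
    """Print every problem found in the GTF file at `path`."""
    with open(path) as fh:
        for lineno, problem in check_gtf_lines(fh):
            print(f"line {lineno}: {problem}")
```

Calling `report("genes.gtf")` (placeholder file name) lists the offending lines, which can be compared against the line number in the Cuffdiff error to see what the conversion script got wrong.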
Then what do I do with the output, and how do I view the results?
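Cuffdiff writes tab-separated tables into its output directory; `gene_exp.diff` has one row per gene per pair of conditions, with columns including `gene`, `sample_1`, `sample_2`, `log2(fold_change)`, `q_value` and `significant`. A minimal sketch that pulls out the genes flagged `significant == "yes"`, optionally for a single comparison. The labels such as `control_4h` assume the conditions were named that way with `-L`, and the function names are hypothetical.

```python
import csv


def significant_genes(handle, pair=None):
    """Return rows of a cuffdiff gene_exp.diff table marked significant,
    optionally restricted to one pair of condition labels (either order)."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["significant"] != "yes":
            continue
        if pair is not None and {row["sample_1"], row["sample_2"]} != set(pair):
            continue
        hits.append(row)
    hits.sort(key=lambda r: float(r["q_value"]))  # strongest calls first
    return hits


def print_hits(rows):
    """Print gene name, compared conditions, log2 fold change and q-value."""
    for r in rows:
        print(r["gene"], r["sample_1"], r["sample_2"],
              r["log2(fold_change)"], r["q_value"], sep="\t")
```

Usage, with placeholder paths and labels: `with open("cuffdiff_out/gene_exp.diff") as fh: print_hits(significant_genes(fh, pair=("control_4h", "infected_4h")))`. For plots, the Tuxedo suite's companion R/Bioconductor package CummeRbund reads the same Cuffdiff output directory.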
I've been trying for weeks and reading through online help and tutorials, but I'm just not getting anywhere. Thanks!