RNA seq pipeline
5.3 years ago
dimitrischat ▴ 180

Hello again. I am new to bioinformatics and I need some guidance. I am trying to do an RNA-seq analysis, but the output seems a bit wrong. I have one cell type (call it h) sampled at 0, 3 and 6 hours, with 2 replicates each, and I want to compare 0h to 3h and 0h to 6h. So my samples are: h0b, h0c, h3b, h3c, h6b, h6c (each a .fastq; for example, h0b.fastq is about 5.8 GB). I got the GTF for hg19 from the UCSC Table Browser (RefSeq genes). My pipeline is:

• tophat2 --library-type fr-unstranded -g 1 -G ...hg19.gtf ....bowtie2index/hg19genome h0b.fastq
• cuffmerge -g ...hg19.gtf h0-3.txt
• cuffdiff -L h0,h3 ...h0-3merged.gtf h0bacceptedhits.bam,h0cacceptedhits.bam h3bacceptedhits.bam,h3cacceptedhits.bam

I don't specify -p because I have seen many bugs when I used it, such as outputs that were small in size. I say the output is a bit wrong because my results differ from those of a colleague who ran this process on usegalaxy (but I am trying to learn how to use the terminal), and the gene_exp.diff (opened in Excel) seems a bit off.

Do you see any mistakes I made? Could I clarify something for you? I am pretty sure I haven't explained everything as well as I could.

I uploaded an output so you can see my problem: https://files.fm/u/vss3dd6v and if you select only the significant ones, there are very few.

RNA-Seq • 2.6k views

I strongly encourage you not to use tophat2 or any of the cuff* tools. They are outdated and no longer best practice. You are encouraged to use more modern aligners, such as STAR or hisat2. You'd also be better served by featureCounts and DESeq2/edgeR/limma rather than cufflinks. These will also prove to be much faster. Finally, for new projects, you'd be best served by using hg38 rather than hg19.


Thank you for your reply. So what pipeline should I use? I have no experience with these programs. Also, is hg38 complete? Someone told me that it isn't finished yet.


I have seen many posts suggesting the use of HISAT over TopHat. Still, I have observed some odd behaviour: for one human RNA-seq dataset, HISAT gave me an alignment rate of 59%, but TopHat gave an overall rate of 91% for the same data against GRCh38.p10.


Hopefully you used hisat2 rather than hisat.


Yes, hisat2 was used.


Please use a current program for RNAseq analysis. There are plenty out there (STAR, HISAT2, BBMap and more) and you would have an overall better experience with any one of them.


You beat me by a minute!


We need to have a "sticky" text for TopHat questions that can be reused. TopHat still seems attractive to new users since it is a "complete" pipeline.


Perhaps we should have a bot which posts default reactions to questions mentioning 'Tophat'.


What about a 'latest technology' section for programs? As an example, even I, before I joined, was not aware that HISAT2 had replaced TopHat2. I gave up TopHat2 for Kallisto long ago though.


Thank you very much. What is the pipeline with those? I could really use your help.


SALMON - Getting started
Kallisto - Getting started
STAR - manual
HISAT/StringTie/Ballgown - protocol link for reference. Use HISAT2, the latest version, if you do use this option.
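
To give a rough idea of where the Salmon guide ends up, here is a minimal sketch in Python that only builds and prints the commands (a dry run, nothing is executed). The sample names come from the question; `transcripts.fa`, `salmon_index`, the `quant/` output folder and the thread count are hypothetical placeholders, and I am assuming single-end reads because the original tophat2 call takes one FASTQ per sample. Check the flags against the Salmon documentation for your installed version.

```python
import shlex

# Sample names from the question; all paths below are hypothetical placeholders.
SAMPLES = ["h0b", "h0c", "h3b", "h3c", "h6b", "h6c"]
TRANSCRIPTS = "transcripts.fa"   # assumed transcriptome FASTA
INDEX = "salmon_index"           # assumed index directory
THREADS = 4                      # assumed thread count


def index_cmd():
    """Build the one-off command that indexes the transcriptome."""
    return ["salmon", "index", "-t", TRANSCRIPTS, "-i", INDEX]


def quant_cmd(sample):
    """Build the quantification command for one single-end sample."""
    # -l A lets Salmon infer the library type; -r is for unpaired reads.
    return ["salmon", "quant", "-i", INDEX, "-l", "A",
            "-r", f"{sample}.fastq", "-p", str(THREADS),
            "-o", f"quant/{sample}"]


def build_commands(samples):
    """Index once, then quantify every sample."""
    return [index_cmd()] + [quant_cmd(s) for s in samples]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed first.
    for cmd in build_commands(SAMPLES):
        print(shlex.join(cmd))
```

Each `quant/<sample>` folder would then hold the per-sample estimates that the downstream R packages read.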

fastq file --> {STAR/BBMap/HISAT2/any splice-aware aligner} --> bam file --> featureCounts --> DESeq2/edgeR
fastq file --> Kallisto/Salmon --> sleuth/DESeq2
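
The first route can be sketched for the six samples in the question like this. It is a dry run in Python that only prints the commands, so nothing is executed. The index prefix `hg38_index`, the annotation `genes.gtf`, the output names and the thread count are placeholders I made up, and single-end reads are assumed because the original tophat2 call takes one FASTQ per sample. Treat it as a starting point and check every flag against the HISAT2, samtools and featureCounts versions you have installed.

```python
import shlex

# Sample names from the question; every path below is a hypothetical placeholder.
SAMPLES = ["h0b", "h0c", "h3b", "h3c", "h6b", "h6c"]
INDEX = "hg38_index"   # assumed HISAT2 index prefix
GTF = "genes.gtf"      # assumed annotation matching the index
THREADS = 4            # assumed thread count


def align_cmds(sample):
    """Commands to align one single-end FASTQ and coordinate-sort the result."""
    return [
        ["hisat2", "-p", str(THREADS), "-x", INDEX,
         "-U", f"{sample}.fastq", "-S", f"{sample}.sam"],
        ["samtools", "sort", "-@", str(THREADS),
         "-o", f"{sample}.bam", f"{sample}.sam"],
    ]


def count_cmd(samples):
    """One featureCounts call over all BAM files gives a single count table."""
    bams = [f"{s}.bam" for s in samples]
    return ["featureCounts", "-T", str(THREADS), "-a", GTF,
            "-o", "counts.txt", *bams]


def build_pipeline(samples):
    """Align and sort each sample, then count all of them together."""
    cmds = []
    for s in samples:
        cmds.extend(align_cmds(s))
    cmds.append(count_cmd(samples))
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before running them.
    for cmd in build_pipeline(SAMPLES):
        print(shlex.join(cmd))
```

Counting all BAM files in one featureCounts call gives a single table with one column per sample, which is the shape DESeq2 and edgeR expect.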


hg38 was released in December 2013. It will never be finished, since the human genome itself is not likely to be finished for a while. At some point it will be superseded by a new major genome release.


DESeq2 and sleuth are both in R, right? Is there any other program out there that doesn't need R?


That will pass through peer review? Probably not. R is the standard environment for statistical analysis. That's also used in the standard Galaxy workflows (STAR -> featureCounts -> DESeq2 or hisat2 -> featureCounts -> DESeq2).
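
For the featureCounts -> DESeq2 step in those workflows, most of the preparation is getting the count table and a sample sheet into shape. Here is a minimal sketch in Python, assuming the usual featureCounts layout (a `#` comment line, six annotation columns, then one column per BAM file). The gene names and numbers are made up, and deriving the condition by dropping the replicate letter (h0b -> h0) is my assumption based on the sample names in the question.

```python
import csv
import io

# Tiny made-up example in the featureCounts layout: a '#' comment line,
# a header with six annotation columns, then one count column per BAM file.
EXAMPLE = (
    "# Program:featureCounts; Command:(omitted)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\th0b.bam\th0c.bam\th3b.bam\th3c.bam\n"
    "GENE_A\tchr1\t100\t900\t+\t801\t10\t12\t40\t44\n"
    "GENE_B\tchr2\t500\t1500\t-\t1001\t0\t1\t0\t2\n"
)


def read_counts(handle):
    """Return (sample_names, {gene_id: [counts]}) from featureCounts output."""
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Strip the ".bam" suffix so column names match the sample names.
    samples = [n[:-4] if n.endswith(".bam") else n for n in header[6:]]
    counts = {row[0]: [int(x) for x in row[6:]] for row in reader}
    return samples, counts


def sample_table(samples):
    """Assumed naming: condition = sample name minus the replicate letter."""
    return [(s, s[:-1]) for s in samples]


if __name__ == "__main__":
    samples, counts = read_counts(io.StringIO(EXAMPLE))
    print(sample_table(samples))
    print(counts)
```

The count matrix and the sample sheet can then be written out as tab-separated files and read into R, with h0 as the reference condition for the 0h vs 3h and 0h vs 6h comparisons.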


dimitrischat : If you have access to commercial software packages (e.g. CLC Genomics Workbench, Partek Genomics Suite, GeneSpring etc) then yes.


...or do all of the calculations manually?