Question: RNA seq pipeline
2.3 years ago, dimitrischat wrote:

Hello again. I am new to bioinformatics and I need some guidance. I am trying to do an RNA-seq analysis, but the output seems a bit wrong. I have one cell type (call it h) sampled at 0, 3 and 6 hours, with 2 replicates per time point, and I want to compare 0h to 3h and 0h to 6h. So my samples are: h0b, h0c, h3b, h3c, h6b, h6c (each a .fastq file; for example, h0b.fastq is about 5.8 GB). I got the GTF for hg19 from the UCSC Table Browser (RefSeq genes). My pipeline is:

  • tophat2 --library-type fr-unstranded -g 1 -G ...hg19.gtf ....bowtie2index/hg19genome h0b.fastq
  • cufflinks -g ...hg19.gtf -o ./cufflinks ...acceptedhits.bam
  • cuffmerge -g ...hg19.gtf h0-3.txt
  • cuffdiff -L h0,h3 ...h0-3merged.gtf h0bacceptedhits.bam,h0cacceptedhits.bam h3bacceptedhits.bam,h3cacceptedhits.bam

I don't specify -p because I have seen many bugs when I used it (outputs that were small in size). I say the output is a bit wrong because my results differ from those of a colleague who ran this process on usegalaxy (I am trying to learn how to use the terminal instead), and the gene_exp.diff file (opened in Excel) seems a bit off.

Do you see any mistakes I made? Is there anything I could clarify for you? I am pretty sure I am not explaining everything as well as I could.

I uploaded an output so you can see my problem: https://files.fm/u/vss3dd6v (if you select only the significant genes, there are very few).

Thanks in advance.
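For anyone who wants to check the "very few significant" observation without Excel, here is a minimal sketch that counts the rows cuffdiff flagged as significant. It assumes the usual tab-separated gene_exp.diff layout with a header column named `significant` holding `yes`/`no`; the function name `count_significant` and the example path are made up for illustration.

```shell
#!/usr/bin/env bash
# Count rows flagged "yes" in the "significant" column of a cuffdiff
# gene_exp.diff file. The column is located by its header name, so no
# column index is hard-coded.
count_significant() {
    awk -F'\t' '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "significant") col = i; next }
        col && $col == "yes" { n++ }
        END { print n + 0 }
    ' "$1"
}

# Example call (placeholder path, adjust to your own cuffdiff output directory):
# count_significant cuffdiff_out/gene_exp.diff
```

With only two replicates per time point, a small count here is not necessarily a bug, so it is worth comparing this number against the colleague's Galaxy run before assuming the command line is at fault.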

rna-seq • 1.6k views
ADD COMMENTlink modified 2.3 years ago • written 2.3 years ago by dimitrischat100
3

I strongly encourage you not to use tophat2 or any of the cuff* tools. These are outdated and no longer best practice. You are encouraged to use more modern aligners, such as STAR or hisat2. You'd also be better served with featureCounts and DESeq2/edgeR/limma rather than cufflinks. These will also prove to be much faster. Finally, for new projects, you'd be best served by using hg38 rather than hg19.

ADD REPLYlink written 2.3 years ago by Devon Ryan94k

Thank you for your reply. So what pipeline should I use? I have no experience with these programs. Also, is hg38 complete? Someone told me that it isn't finished yet.

ADD REPLYlink written 2.3 years ago by dimitrischat100

I have seen many posts suggesting the use of HISAT over TopHat, but I have observed some odd behaviour. For one human RNA-seq dataset, hisat gave me an alignment rate of 59%, while tophat gave an overall rate of 91% for the same data against GRCh38.p10.

ADD REPLYlink written 2.3 years ago by popayekid5570

Hopefully you used hisat2 rather than hisat.

ADD REPLYlink written 2.3 years ago by Devon Ryan94k

yes, hisat2 was used.

ADD REPLYlink written 2.3 years ago by popayekid5570
1

Please use a current program for RNAseq analysis. There are plenty out there (STAR, HISAT2, BBMap and more) and you would have an overall better experience with any one of them.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax78k
2

You beat me by a minute!

ADD REPLYlink written 2.3 years ago by Devon Ryan94k
1

We need to have a "sticky" text for TopHat questions that can be reused. TopHat still seems attractive to new users since it is a "complete" pipeline.

ADD REPLYlink written 2.3 years ago by genomax78k
2

Perhaps we should have a bot which posts default reactions to questions mentioning 'Tophat'.

ADD REPLYlink written 2.3 years ago by WouterDeCoster43k
1

What about a 'latest technology' section for programs? As an example, even I, before I joined, was not aware that HISAT2 had replaced TopHat2. I gave up TopHat2 for Kallisto long ago though.

ADD REPLYlink written 2.3 years ago by Kevin Blighe54k

Thank you very much. What is the pipeline with those? I could really use your help.

ADD REPLYlink written 2.3 years ago by dimitrischat100
2

SALMON - Getting started
Kallisto - Getting started
STAR - manual
HISAT/StringTie/Ballgown - protocol link for reference. Use HISAT2, the latest version, if you do use this option.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax78k
1
fastq file --> {STAR/BBMap/HISAT2/any splice-aware aligner} --> bam file --> featureCounts --> DESeq2/edgeR
fastq file --> Kallisto/Salmon --> Sleuth/DESeq2
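As a rough illustration of the first route, here is a minimal sketch for the six single-end samples named in the question, using HISAT2, samtools and featureCounts. The index prefix, the GTF path, the thread count and the `run`/`pipeline` helper names are all placeholders, not values from this thread. It only prints the commands unless `DRY_RUN=0` is set, so you can inspect them before running anything.

```shell
#!/usr/bin/env bash
# Sketch: HISAT2 -> samtools sort -> featureCounts for single-end reads.
# All paths below are placeholders (assumptions), not values from the thread.
set -euo pipefail

INDEX="${INDEX:-index/genome}"        # hypothetical HISAT2 index prefix
GTF="${GTF:-annotation/genes.gtf}"    # hypothetical annotation file
THREADS="${THREADS:-4}"               # arbitrary default
SAMPLES=(h0b h0c h3b h3c h6b h6c)     # sample names from the question

# Print each command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

pipeline() {
    local s
    local bams=()
    for s in "${SAMPLES[@]}"; do
        # Align the reads of one sample, then sort the alignments into a BAM file.
        run hisat2 -p "$THREADS" -x "$INDEX" -U "$s.fastq" -S "$s.sam"
        run samtools sort -@ "$THREADS" -o "$s.bam" "$s.sam"
        bams+=("$s.bam")
    done
    # One count table covering all samples, to be loaded into DESeq2/edgeR in R.
    run featureCounts -T "$THREADS" -a "$GTF" -o counts.txt "${bams[@]}"
}

pipeline
```

The differential expression step itself (DESeq2 or edgeR) is done in R on `counts.txt` and is not covered by this sketch.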

hg38 was released in December 2013. It will never be "finished", since the human genome itself is not likely to be for a while. At some point it will be superseded by a new major genome release.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax78k

DESeq2 and sleuth are both in R, right? Is there any other program out there that doesn't need R?

ADD REPLYlink written 2.3 years ago by dimitrischat100
1

One that will pass peer review? Probably not. R is the standard environment for statistical analysis. It is also what the standard Galaxy workflows use (STAR -> featureCounts -> DESeq2 or hisat2 -> featureCounts -> DESeq2).

ADD REPLYlink written 2.3 years ago by Devon Ryan94k
1

dimitrischat : If you have access to commercial software packages (e.g. CLC Genomics Workbench, Partek Genomics Suite, GeneSpring etc) then yes.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax78k

...or do all of the calculations manually?

ADD REPLYlink written 2.3 years ago by Kevin Blighe54k