Question: RNA seq pipeline
Asked 17 months ago by dimitrischat:

Hello again. I am new to bioinformatics and I need some guidance. I am trying to do an RNA-seq analysis, but the output seems a bit wrong. I have one cell type (call it h) sampled at 0, 3 and 6 hours, with 2 replicates each, and I want to compare 0h to 3h and 0h to 6h. So my samples are h0b, h0c, h3b, h3c, h6b and h6c (one .fastq each; for example, h0b.fastq is about 5.8 GB). I got the hg19 GTF from the UCSC Table Browser (RefSeq genes). My pipeline is:

  • tophat2 --library-type fr-unstranded -g 1 -G ...hg19.gtf ....bowtie2index/hg19genome h0b.fastq
  • cufflinks -g ...hg19.gtf -o ./cufflinks ...acceptedhits.bam
  • cuffmerge -g ...hg19.gtf h0-3.txt
  • cuffdiff -L h0,h3 ...h0-3merged.gtf h0bacceptedhits.bam,h0cacceptedhits.bam h3bacceptedhits.bam,h3cacceptedhits.bam

I don't specify -p because I have seen many bugs when it is used. The outputs are small in size. I say the output seems a bit wrong because my results differ from those of a colleague who ran the same process on usegalaxy (I am trying to learn to use the terminal instead), and the gene_exp.diff file (opened in Excel) looks a bit off.

Do you see any mistakes I made? Is there anything I could clarify for you? I'm pretty sure I haven't explained everything as well as I could.

I uploaded an output file so you can see my problem: https://files.fm/u/vss3dd6v. If you select only the significant genes, there are very few.

Thanks in advance.
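If you do want to rerun the original TopHat2/Cufflinks commands (for example, to compare against your colleague's Galaxy run), one thing worth checking is where each run writes its output: by default tophat2 writes to ./tophat_out, and the cufflinks command above uses the same -o ./cufflinks for every sample, so each run can overwrite the previous sample unless files are moved or renamed in between. Below is a minimal sketch, assuming single-end reads named h0b.fastq ... h6c.fastq in the working directory. The index, GTF and FASTA paths are hypothetical placeholders, the -s/-b genome FASTA options for cuffmerge/cuffdiff are additions not present in the original commands, and with the default DRY_RUN=1 the script only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch only: the original TopHat2/Cufflinks steps with one output
# directory per sample. All paths below are hypothetical placeholders.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"          # 1 = print the commands, 0 = actually run them
THREADS="${THREADS:-1}"          # 1 thread, the same as leaving out -p
INDEX="bowtie2index/hg19genome"  # hypothetical Bowtie2 index prefix
GTF="annotation/hg19.gtf"        # hypothetical RefSeq GTF from the Table Browser
GENOME_FA="genome/hg19.fa"       # hypothetical genome FASTA (assumption, not in the original)

# Print or execute a command depending on DRY_RUN.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Align one sample and assemble its transcripts into sample-specific folders.
align_and_assemble() {
  local sample="$1"
  run tophat2 -p "$THREADS" -g 1 -o "tophat_${sample}" \
    --library-type fr-unstranded -G "$GTF" "$INDEX" "${sample}.fastq"
  run cufflinks -p "$THREADS" -g "$GTF" -o "cufflinks_${sample}" \
    "tophat_${sample}/accepted_hits.bam"
}

# Merge the h0 and later-time-point assemblies, then test h0 vs that time point.
merge_and_diff() {
  local later="$1"                        # e.g. h3 or h6
  local list="assemblies_h0_${later}.txt"
  local gtfs=()
  local s
  for s in h0b h0c "${later}b" "${later}c"; do
    gtfs+=("cufflinks_${s}/transcripts.gtf")
  done
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '# %s would list: %s\n' "$list" "${gtfs[*]}"
  else
    printf '%s\n' "${gtfs[@]}" > "$list"
  fi
  run cuffmerge -p "$THREADS" -g "$GTF" -s "$GENOME_FA" -o "merged_h0_${later}" "$list"
  run cuffdiff -p "$THREADS" -b "$GENOME_FA" -L "h0,${later}" -o "cuffdiff_h0_${later}" \
    "merged_h0_${later}/merged.gtf" \
    "tophat_h0b/accepted_hits.bam,tophat_h0c/accepted_hits.bam" \
    "tophat_${later}b/accepted_hits.bam,tophat_${later}c/accepted_hits.bam"
}

for s in h0b h0c h3b h3c h6b h6c; do
  align_and_assemble "$s"
done
merge_and_diff h3
merge_and_diff h6
```

Keeping every sample in its own folder also makes it easy to check that the right accepted_hits.bam files end up in each cuffdiff comparison.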


I strongly encourage you not to use tophat2 or any of the cuff* tools. These are outdated and no longer best practice. You are encouraged to use more modern aligners, such as STAR or hisat2. You'd also be better served by featureCounts and DESeq2/edgeR/limma rather than cufflinks. These will also prove to be much faster. Finally, for new projects, you'd be best served by using hg38 rather than hg19.

Reply by Devon Ryan, 17 months ago

Thank you for your reply. So what pipeline should I use? I have no experience with these programs. Also, is hg38 complete? Someone told me that it isn't finished yet.

Reply by dimitrischat, 17 months ago

I have seen many posts suggesting HISAT over TopHat, but I have observed odd behaviour. On one human RNA-seq dataset aligned against GRCh38.p10, HISAT gave me an overall alignment rate of 59%, while TopHat gave 91% for the same data.

Reply by popayekid5550, 17 months ago

Hopefully you used hisat2 rather than hisat.

Reply by Devon Ryan, 17 months ago

Yes, HISAT2 was used.

Reply by popayekid5550, 17 months ago
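On the 59% vs 91% comparison: before drawing conclusions it may help to make sure both numbers come from each tool's own summary for the same reads and the same reference, since TopHat2's "overall read mapping rate" and HISAT2's "overall alignment rate" are not necessarily computed the same way (check both manuals). A minimal sketch, assuming single-end reads and a hypothetical HISAT2 index prefix; overall_rate is a hypothetical helper that assumes HISAT2's default summary format (not --new-summary) and the last line of a TopHat2 align_summary.txt. With the default DRY_RUN=1 the alignment command is only printed.

```bash
#!/usr/bin/env bash
# Sketch: align with HISAT2, keep its summary, and read the overall rate back
# from either a HISAT2 summary or a TopHat2 align_summary.txt.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"          # 1 = print the command, 0 = run it
THREADS="${THREADS:-4}"
HISAT2_INDEX="index/grch38_p10"  # hypothetical hisat2-build index prefix

# Print or execute a command depending on DRY_RUN.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Single-end alignment; the summary goes to its own file next to the SAM.
align_hisat2() {
  local sample="$1"
  run hisat2 -p "$THREADS" -x "$HISAT2_INDEX" -U "${sample}.fastq" \
    --summary-file "${sample}.hisat2_summary.txt" -S "${sample}.sam"
}

# Print the percentage from the last line containing "overall", e.g.
# "59.00% overall alignment rate"     (HISAT2 default summary) or
# "91.0% overall read mapping rate."  (TopHat2 align_summary.txt).
overall_rate() {
  grep 'overall' "$1" | tail -n 1 | awk '{ sub(/%$/, "", $1); print $1 }'
}

# Example with a hypothetical sample name (only printed while DRY_RUN=1).
align_hisat2 sample1
```

Running overall_rate on sample1.hisat2_summary.txt and on TopHat2's align_summary.txt for the same FASTQ then gives two numbers that can at least be put side by side.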

Please use a current program for RNAseq analysis. There are plenty out there (STAR, HISAT2, BBMap and more) and you would have an overall better experience with any one of them.

Reply by genomax, 17 months ago

You beat me by a minute!

Reply by Devon Ryan, 17 months ago

We need to have a "sticky" text for TopHat questions that can be reused. TopHat still seems attractive to new users since it is a "complete" pipeline.

Reply by genomax, 17 months ago

Perhaps we should have a bot which posts default reactions to questions mentioning 'Tophat'.

Reply by WouterDeCoster, 17 months ago

What about a 'latest technology' section for programs? As an example, even I, before I joined, was not aware that HISAT2 had replaced TopHat2. I gave up TopHat2 for Kallisto long ago though.

Reply by Kevin Blighe, 17 months ago

Thank you very much. What would the pipeline look like with those tools? I could really use your help.

Reply by dimitrischat, 17 months ago

SALMON - Getting started
Kallisto - Getting started
STAR - manual
HISAT/StringTie/Ballgown - protocol (link for reference). Use the latest version of HISAT2 if you go with this option.

Reply by genomax, 17 months ago
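For the Salmon/Kallisto option, a minimal sketch of the quantification step, assuming single-end reads named like the samples in the question and a hypothetical transcriptome FASTA. You only need one of the two tools; both are shown side by side. Kallisto needs an estimated fragment length mean and standard deviation for single-end data; the values 200/20 below are placeholders, not recommendations. With the default DRY_RUN=1 the commands are only printed.

```bash
#!/usr/bin/env bash
# Sketch: transcript-level quantification with Salmon or Kallisto
# (single-end reads). Paths and fragment-length values are hypothetical.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"            # 1 = print the commands, 0 = run them
THREADS="${THREADS:-4}"
TRANSCRIPTS="ref/transcripts.fa"   # hypothetical transcriptome (cDNA) FASTA
FRAG_MEAN=200                      # hypothetical; kallisto --single needs -l
FRAG_SD=20                         # hypothetical; kallisto --single needs -s

# Print or execute a command depending on DRY_RUN.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build each tool's index once.
build_indices() {
  run salmon index -t "$TRANSCRIPTS" -i salmon_index
  run kallisto index -i kallisto.idx "$TRANSCRIPTS"
}

# Quantify one sample; -l A lets Salmon infer the library type.
quantify() {
  local sample="$1"
  run salmon quant -i salmon_index -l A -r "${sample}.fastq" \
    -p "$THREADS" -o "salmon_${sample}"
  run kallisto quant -i kallisto.idx -o "kallisto_${sample}" --single \
    -l "$FRAG_MEAN" -s "$FRAG_SD" -t "$THREADS" "${sample}.fastq"
}

build_indices
for s in h0b h0c h3b h3c h6b h6c; do
  quantify "$s"
done
```

The per-sample output folders are what you would then load in R, for example into sleuth (Kallisto) or via tximport into DESeq2.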
FASTQ file --> {STAR/BBMap/HISAT2/any splice-aware aligner} --> BAM file --> featureCounts --> DESeq2/edgeR
FASTQ file --> Kallisto/Salmon --> sleuth/DESeq2

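hg38 was released in December 2013. It will not be "finished" in that sense any time soon, because the human genome itself is unlikely to be completely finished for a while; instead it receives patch releases (such as GRCh38.p10), and at some point it will be superseded by a new major genome release.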

Reply by genomax, 17 months ago
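The alignment-based route (STAR --> featureCounts --> DESeq2/edgeR) could look roughly like the sketch below, assuming single-end, unstranded reads (as in the tophat2 --library-type fr-unstranded setting from the question), a hypothetical genome FASTA and matching GTF, and a hypothetical read length of 101 bp (STAR's --sjdbOverhang is commonly set to read length minus 1). With the default DRY_RUN=1 the commands are only printed. The resulting gene_counts.txt is the count table you would then load into DESeq2 or edgeR in R.

```bash
#!/usr/bin/env bash
# Sketch: STAR alignment followed by gene-level counting with featureCounts.
# Single-end, unstranded reads assumed; paths and read length are hypothetical.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"          # 1 = print the commands, 0 = run them
THREADS="${THREADS:-4}"
GENOME_FA="ref/genome.fa"        # hypothetical genome FASTA (e.g. hg38)
GTF="ref/genes.gtf"              # hypothetical annotation matching GENOME_FA
READ_LEN=101                     # hypothetical read length
SAMPLES=(h0b h0c h3b h3c h6b h6c)

# Print or execute a command depending on DRY_RUN.
run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Build the STAR index once, with splice junctions taken from the GTF.
build_star_index() {
  run mkdir -p star_index
  run STAR --runMode genomeGenerate --runThreadN "$THREADS" \
    --genomeDir star_index --genomeFastaFiles "$GENOME_FA" \
    --sjdbGTFfile "$GTF" --sjdbOverhang $((READ_LEN - 1))
}

# Align one sample to a coordinate-sorted BAM named star_<sample>_...
align_star() {
  local sample="$1"
  run STAR --runThreadN "$THREADS" --genomeDir star_index \
    --readFilesIn "${sample}.fastq" --outSAMtype BAM SortedByCoordinate \
    --outFileNamePrefix "star_${sample}_"
}

# Count reads per gene for all samples in one table.
count_genes() {
  local bams=()
  local s
  for s in "${SAMPLES[@]}"; do
    bams+=("star_${s}_Aligned.sortedByCoord.out.bam")
  done
  # -s 0: unstranded library
  run featureCounts -T "$THREADS" -s 0 -a "$GTF" -o gene_counts.txt "${bams[@]}"
}

build_star_index
for s in "${SAMPLES[@]}"; do
  align_star "$s"
done
count_genes
```

Counting all BAMs in a single featureCounts call gives one matrix with a column per sample, which is the shape DESeq2 and edgeR expect.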

DESeq2 and sleuth are both in R, right? Is there any other program out there that doesn't need R?

Reply by dimitrischat, 17 months ago

One that will pass peer review? Probably not. R is the standard environment for statistical analysis, and it is also what the standard Galaxy workflows use (STAR -> featureCounts -> DESeq2, or hisat2 -> featureCounts -> DESeq2).

Reply by Devon Ryan, 17 months ago

dimitrischat: If you have access to commercial software packages (e.g. CLC Genomics Workbench, Partek Genomics Suite, GeneSpring), then yes.

Reply by genomax, 17 months ago

...or do all of the calculations manually?

Reply by Kevin Blighe, 17 months ago