Question

Alignment of seq reads to a genome, process after STAR?

0

7.7 years ago

Biogeek ▴ 470

Hey guys,

Just a quick question and a request for advice. I've indexed my target organism's genome and am now aligning my cleaned reads back to it with STAR. My reads were cleaned with Trimmomatic.

I've read that people regularly use the Cufflinks package for all-in-one analysis; however, I would be keen on using edgeR. Once my reads have been aligned, is there any way I can use the SAM/converted BAM files to calculate counts and then feed them into R and edgeR? Most of my experience so far has been in de novo assembly.

Are there any good tutorials I can visit online?

Thanks.

genome rnaseq reference genome • 3.6k views

updated 7.7 years ago by Devon Ryan 104k • written 7.7 years ago by Biogeek ▴ 470

0

Yes, you can. STAR can now generate counts during alignment, or you could run featureCounts on the aligned files to generate the count matrix.

7.7 years ago by GenoMax 141k
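As a rough illustration of the first option: when STAR is run with `--quantMode GeneCounts`, it writes a `ReadsPerGene.out.tab` file per sample, with summary rows (`N_unmapped`, `N_multimapping`, `N_noFeature`, `N_ambiguous`) followed by one row per gene and three count columns (unstranded, forward-stranded, reverse-stranded). The Python sketch below merges such files into a gene-by-sample table that could then be read into R for edgeR. The function names and sample labels are hypothetical, and it assumes an unstranded library (column 1) unless told otherwise.

```python
import csv


def parse_star_gene_counts(lines, column=1):
    """Parse the lines of one STAR ReadsPerGene.out.tab file.

    column: 1 = unstranded, 2 = forward-stranded, 3 = reverse-stranded.
    Returns a dict mapping gene ID to integer count.
    """
    counts = {}
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and the N_* summary rows at the top of the file.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


def build_count_matrix(per_sample_counts):
    """Combine {sample: {gene: count}} into (header, rows) for a TSV.

    Genes missing from a sample are filled with 0.
    """
    samples = list(per_sample_counts)
    genes = sorted(set().union(*per_sample_counts.values()))
    header = ["gene_id"] + samples
    rows = [
        [gene] + [per_sample_counts[s].get(gene, 0) for s in samples]
        for gene in genes
    ]
    return header, rows


def write_matrix(path, header, rows):
    """Write the matrix as a tab-separated file (hypothetical output path)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

In use, you would open each sample's `ReadsPerGene.out.tab`, pass the file handle to `parse_star_gene_counts`, and write the combined table with `write_matrix`. Which count column is correct depends on the library preparation, so it is worth checking the strandedness before picking one.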

0

Hi Genomax2,

This presumably does away with the need for Cufflinks? I have multiple replicates per treatment, and I read that cuffmerge is good for this. Are there any obvious advantages to using Cufflinks over just using STAR?

Thanks

7.7 years ago by Biogeek ▴ 470

0

Yes. You would want to use DESeq2 or edgeR anyway. It sounds like you are all set with replicates etc. See the paper Devon linked below. The DESeq2 vignette would be similarly useful.

7.7 years ago by GenoMax 141k

score 2 · Answer 1 · 2016-08-01

2

7.7 years ago

Devon Ryan 104k

This F1000 article has commands for generating counts (near the end, note that they use featureCounts from within R, though you can use it at the command line too) and using edgeR. That'll be a good tutorial to base your analysis on.

0

Thanks for the article, Devon, much appreciated. I've had a read and, whilst it is appealing, I am going to try STAR first, using the new transcript counts feature in version 2.4.2a. I'll then feed the BAM into RSEM and follow my usual pipeline from there. I am trying to determine whether the de novo assembly is better than the organism's draft genome in terms of coverage. I may venture into using Rsubread down the line. Thanks.

7.7 years ago by Biogeek ▴ 470
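One simple way to put a number on the genome-versus-de-novo comparison is the mapping rate STAR reports in its `Log.final.out` file, where each statistic is written as `name |<tab>value`. The sketch below, in Python, reads those lines into a dictionary. Treating "coverage" as mapping rate is an assumption made here for illustration, not something stated in the thread, and the function names are hypothetical.

```python
def parse_star_final_log(lines):
    """Return {field: value} from the lines of a STAR Log.final.out file.

    Only lines containing '|' carry a statistic; section headings and
    blank lines are skipped.
    """
    stats = {}
    for line in lines:
        if "|" not in line:
            continue
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, field):
    """Convert a percentage field such as '85.20%' to a float."""
    return float(stats[field].rstrip("%"))
```

With a log opened as a file handle, `percent(parse_star_final_log(handle), "Uniquely mapped reads %")` would give the uniquely mapped fraction for that sample, which can be set beside the equivalent alignment rate from the de novo pipeline.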

1

If you want to go that route, you might appreciate that Salmon or Kallisto will get you similar results in a fraction of the time.

7.7 years ago by Devon Ryan 104k

0

I would second Salmon or Kallisto in that case, since both run faster and generate counts and TPM for each replicate, which you can then aggregate into a matrix. If I am not wrong, the latest version of Salmon already has transcript-to-gene summarisation if you want a gene count matrix; otherwise you will have transcript counts. Good luck!

7.7 years ago by ivivek_ngs ★ 5.2k
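To make the transcript-to-gene step concrete, here is a minimal Python sketch that sums the `NumReads` column of a Salmon `quant.sf` file per gene, given a transcript-to-gene lookup. It illustrates the aggregation idea only and is not Salmon's own implementation; the function names and the `tx2gene` mapping are hypothetical. In R, the tximport package is the usual route for importing such output for edgeR or DESeq2.

```python
import csv
from collections import defaultdict


def parse_quant_sf(lines):
    """Return {transcript_id: NumReads} from the lines of a Salmon quant.sf file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def summarise_to_genes(transcript_counts, tx2gene):
    """Sum estimated transcript counts per gene.

    tx2gene is a {transcript_id: gene_id} dict (hypothetical; it would
    normally be built from the annotation). Transcripts without a gene
    assignment are not counted and are returned separately.
    """
    gene_counts = defaultdict(float)
    unassigned = []
    for transcript, reads in transcript_counts.items():
        gene = tx2gene.get(transcript)
        if gene is None:
            unassigned.append(transcript)
            continue
        gene_counts[gene] += reads
    return dict(gene_counts), unassigned
```

Estimated counts from Salmon are generally not integers, so they are kept as floats here; how to handle them downstream depends on the differential expression tool.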

0

Thanks guys. I've already completed the de novo analysis using RSEM and edgeR, so I guess it would be most appropriate to stick with RSEM and edgeR again, so as not to stray from the beaten track. The reason I'm doing this analysis in addition to the de novo work is so that I can compare coverage of the genome, in case I'm asked when defending my thesis why I didn't use the reference.

A bit of a general question: have any of you attempted a hybrid assembly, or is that highly time-consuming and does it require a lot of knowledge?

7.7 years ago by Biogeek ▴ 470