Merging all left / all right reads before STAR alignment?

7.7 years ago

Biogeek ▴ 470

Hey guys,

Simple and maybe a rather stupid question. I have several sets of paired-end reads and I want to align them back to the reference genome. For paired-end data, does it make sense to cat all my forward reads together and all my reverse reads together, so that I am giving the STAR aligner two files as input?

Or should I be aligning them separately and then merging the output? Thanks.

RNA-Seq • genome

score 3 · Answer 1 · 2016-08-11

7.7 years ago

Devon Ryan 104k

Give STAR both files for each sample at once. This holds for all aligners.
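A minimal Python sketch of the per-sample approach, assuming gzipped FASTQ files named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` and a prebuilt index in `star_index/` (the file names, index directory, sample names and thread count are all hypothetical). It only builds one STAR command per sample, with that sample's R1 and R2 passed together to `--readFilesIn`; each command could then be run with `subprocess.run(cmd, check=True)`.

```python
import shlex


def star_command(sample, genome_dir="star_index", threads=8):
    """Build a STAR command that aligns one paired-end sample.

    The <sample>_R1.fastq.gz / <sample>_R2.fastq.gz naming is a hypothetical convention.
    """
    r1 = f"{sample}_R1.fastq.gz"
    r2 = f"{sample}_R2.fastq.gz"
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        # Both mates of the same sample go into one run: R1 first, then R2.
        "--readFilesIn", r1, r2,
        # The inputs are gzipped, so STAR decompresses them on the fly.
        "--readFilesCommand", "zcat",
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # One prefix per sample, giving e.g. sampleA_Aligned.sortedByCoord.out.bam
        "--outFileNamePrefix", f"{sample}_",
    ]


samples = ["sampleA", "sampleB"]  # hypothetical sample names
commands = [star_command(s) for s in samples]
for cmd in commands:
    print(shlex.join(cmd))
```

STAR's `--readFilesIn` also accepts comma-separated lists (all R1 files, a space, then all R2 files in the same order), which is another way to align several runs in one job without concatenating them on disk.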

Hi Devon, so if I understand you correctly, I shouldn't concatenate all my forward and reverse reads into one forward and one reverse file before running STAR?

What about the output? I need one coordinate-sorted BAM file. Is there a way to combine all the BAM files? Thanks

Reply by Biogeek ▴ 470 · 7.7 years ago
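One way to get a single sorted BAM from per-sample alignments, sketched in Python under the assumption that each sample was aligned with `--outSAMtype BAM SortedByCoordinate` (so STAR wrote `<prefix>Aligned.sortedByCoord.out.bam`) and that samtools is installed. The sample and output file names are hypothetical. `samtools merge` expects coordinate-sorted inputs and keeps that order, so the merged file should not need a second sort.

```python
import shlex


def merge_command(bams, out_bam="all_samples.sorted.bam", threads=4):
    """Build a samtools merge command for per-sample, coordinate-sorted BAMs."""
    if not bams:
        raise ValueError("need at least one input BAM")
    # Output file comes first, then all inputs; -@ adds compression threads.
    return ["samtools", "merge", "-@", str(threads), out_bam, *bams]


def index_command(bam):
    """Build the matching samtools index command for the merged BAM."""
    return ["samtools", "index", bam]


# Hypothetical STAR outputs for 12 samples
bams = [f"sample{i:02d}_Aligned.sortedByCoord.out.bam" for i in range(1, 13)]
merge = merge_command(bams)
index = index_command("all_samples.sorted.bam")
print(shlex.join(merge))
print(shlex.join(index))
```

If per-sample identity matters later, `samtools merge -r` can attach an RG tag to each alignment, with the value inferred from the input file names.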

Yes, you can concatenate all the files to end up with a single R1 file and a single R2 file first.

Reply by Brian Bushnell 20k · 7.7 years ago
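The one thing to get right when concatenating is order: the R1 files and the R2 files must be joined in exactly the same sample order, or mates stop lining up. A minimal Python sketch, assuming a hypothetical `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming scheme, drives both outputs from a single list of pairs. It copies bytes, which works for gzipped files because a series of gzip members is itself a valid gzip file; all inputs should share the same compression, and uncompressed FASTQ files should each end with a newline.

```python
import shutil
from pathlib import Path


def concatenate_pairs(pairs, out_r1, out_r2):
    """Concatenate per-sample R1/R2 files into one R1 and one R2 file.

    `pairs` is a list of (r1_path, r2_path) tuples. The same list drives both
    outputs, so mate n in the R1 output still pairs with mate n in the R2 output.
    """
    with open(out_r1, "wb") as f1, open(out_r2, "wb") as f2:
        for r1, r2 in pairs:
            with open(r1, "rb") as src:
                shutil.copyfileobj(src, f1)
            with open(r2, "rb") as src:
                shutil.copyfileobj(src, f2)


def pairs_from_names(samples, directory="."):
    """Build (R1, R2) paths from a hypothetical <sample>_R1/_R2.fastq.gz naming scheme."""
    d = Path(directory)
    return [(d / f"{s}_R1.fastq.gz", d / f"{s}_R2.fastq.gz") for s in samples]
```

For example, `concatenate_pairs(pairs_from_names(["s1", "s2"]), "all_R1.fastq.gz", "all_R2.fastq.gz")` writes both combined files in one pass.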

If I align each sample individually, I can split the work and run two jobs in parallel to make it very fast, but then I'm left with 12 BAM files (12 samples). If I took this approach, is it appropriate to merge these together and sort them, and if so, how?

If I concatenate all my L and R reads into two files and run the same command with a 256 GB RAM allocation and 60 threads, I get a FATAL error during BAM sorting. I've tried limiting --limitBAMsortRAM to 30 GB and it still crashes. Any idea?
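The post doesn't show the exact error, so the cause here is unknown. Two things worth checking, sketched in Python: STAR's `--limitBAMsortRAM` takes a plain number of bytes (30 GB is `30000000000`), and a common workaround is to have STAR write an unsorted BAM (`<prefix>Aligned.out.bam`) and sort it separately with `samtools sort`, whose `-m` limit applies per thread. File names, the index directory, thread counts and memory values are hypothetical.

```python
import shlex


def gb_to_bytes(gb):
    """Convert decimal gigabytes to the plain byte count --limitBAMsortRAM expects."""
    return int(gb * 10**9)


def star_command(r1, r2, genome_dir="star_index", threads=16, prefix="all_", sort_ram_gb=None):
    cmd = ["STAR", "--runThreadN", str(threads), "--genomeDir", genome_dir,
           "--readFilesIn", r1, r2, "--readFilesCommand", "zcat",
           "--outFileNamePrefix", prefix]
    if sort_ram_gb is None:
        # Unsorted BAM (<prefix>Aligned.out.bam); sorting is left to samtools.
        cmd += ["--outSAMtype", "BAM", "Unsorted"]
    else:
        # Keep STAR's own sorting, with the RAM cap given in bytes.
        cmd += ["--outSAMtype", "BAM", "SortedByCoordinate",
                "--limitBAMsortRAM", str(gb_to_bytes(sort_ram_gb))]
    return cmd


def samtools_sort_command(in_bam, out_bam, threads=16, mem_per_thread="2G"):
    """-m is per thread, so total sort memory is roughly threads * mem_per_thread."""
    return ["samtools", "sort", "-@", str(threads), "-m", mem_per_thread,
            "-o", out_bam, in_bam]


align = star_command("all_R1.fastq.gz", "all_R2.fastq.gz")
sort = samtools_sort_command("all_Aligned.out.bam", "all.sorted.bam")
print(shlex.join(align))
print(shlex.join(sort))
```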

Does joining the reads before alignment versus merging the BAMs afterwards make any difference to the final sorted BAM file for downstream analysis?

The reason I want to do this is so that I have one coordinate-sorted BAM file to feed into Trinity for reference-guided assembly. Can I also ask what the advantages are of doing the Trinity assembly over simply aligning to the genome with STAR and getting counts with RSEM downstream? THANKS!
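For reference, Trinity's genome-guided mode takes the single coordinate-sorted BAM via `--genome_guided_bam` plus a maximum intron length. A short Python sketch that only builds the command; the BAM name, intron limit, memory and CPU values are hypothetical and should be set for the organism and machine (Trinity also requires the output directory name to contain "trinity").

```python
import shlex


def trinity_genome_guided_command(sorted_bam, max_intron=10000, max_memory="50G",
                                  cpu=16, output_dir="trinity_gg"):
    """Build a Trinity genome-guided assembly command from one sorted BAM."""
    return [
        "Trinity",
        "--genome_guided_bam", sorted_bam,
        # Longest intron Trinity will consider when grouping reads by locus.
        "--genome_guided_max_intron", str(max_intron),
        "--max_memory", max_memory,
        "--CPU", str(cpu),
        "--output", output_dir,
    ]


cmd = trinity_genome_guided_command("all_samples.sorted.bam")
print(shlex.join(cmd))
```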

Reply by Biogeek ▴ 470 · 7.7 years ago

I think we started in the wrong place with this question. What sort of experiment are you doing, and what sort of data do you have? If your genome/transcriptome is of sufficient quality to use RSEM (save a few hours and just use Salmon or Kallisto), then you wouldn't need to bother with assembly at all.

Reply by Devon Ryan 104k · 7.7 years ago
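As an illustration of the Salmon/Kallisto route, a Python sketch that builds one quantification command per sample, assuming transcriptome indexes already exist at the hypothetical paths `salmon_index` and `kallisto.idx` and the same hypothetical `<sample>_R1/_R2.fastq.gz` naming. Both tools quantify each sample's read pairs directly against transcripts, so no per-sample or merged BAM is produced.

```python
import shlex


def salmon_command(sample, index="salmon_index", threads=8):
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    # -l A asks Salmon to infer the library type (strandedness) from the reads.
    return ["salmon", "quant", "-i", index, "-l", "A", "-1", r1, "-2", r2,
            "-p", str(threads), "-o", f"{sample}_salmon"]


def kallisto_command(sample, index="kallisto.idx", threads=8):
    r1, r2 = f"{sample}_R1.fastq.gz", f"{sample}_R2.fastq.gz"
    # kallisto takes the paired files as positional arguments, R1 then R2.
    return ["kallisto", "quant", "-i", index, "-o", f"{sample}_kallisto",
            "-t", str(threads), r1, r2]


for s in ["sampleA", "sampleB"]:  # hypothetical sample names
    print(shlex.join(salmon_command(s)))
    print(shlex.join(kallisto_command(s)))
```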

Hi Devon, it's an RNA-seq experiment.

I'm aligning to a draft genome as the reference (approx. 70% complete, with fragmented gaps), hence my approach. I'm combining a reference-guided assembly with a de novo assembly, then removing redundancies and unlikely coding sequences with a downstream approach I found in papers. Any help greatly appreciated.

Reply by Biogeek ▴ 470 · 7.7 years ago