Question

Running STAR aligner with paired-end and single-end reads simultaneously

0

5.1 years ago

nanoide ▴ 120

Hi all

I recently received some RNA-seq raw reads, both paired-end (2 x 150 bp) and single-end (1 x 75 bp), and I want to map them with the STAR aligner. My main question is how you would deal with these. Can STAR take both paired-end and single-end .fq files simultaneously? Or is it also possible to map them separately and then merge the BAM files?

Any ideas?

Thank you for your advice

STAR RNA-Seq paired-end single-end • 5.9k views

ADD COMMENT • link 5.0 years ago by nanoide ▴ 120

1

Thank you all for your thoughts and useful comments!

ADD REPLY • link 5.1 years ago by nanoide ▴ 120

1

You have not said whether this is the same library sequenced two different ways. That will also have implications for how you do the data analysis.

ADD REPLY • link 5.0 years ago by GenoMax 141k

0

Hi, thanks for your responses. This is indeed the same library sequenced two different ways. I'll check out the BBMap suite, thanks.

ADD REPLY • link 5.0 years ago by nanoide ▴ 120

0

Thank you all for your responses.

Please allow me to ask for a couple of clarifications:

* I gather that STAR cannot deal with paired-end and single-end reads of different lengths at the same time. Do you know of any aligner that can?
* Do you know of any published papers that have done something similar to what you suggested?

Thank you very much

ADD REPLY • link 5.0 years ago by nanoide ▴ 120

1

The BBMap suite has a wrapper, bbwrap.sh, that allows bbmap.sh to be used for single-end and paired-end reads in the same alignment job.

$ bbwrap.sh in1=read1.fq,singletons.fq in2=read2.fq,null out=mapped.sam append

ADD REPLY • link 5.0 years ago by GenoMax 141k

1

5.1 years ago

Charles Warden 8.2k

I typically use single-end 50 bp reads for gene expression analysis.

If you are just interested in getting counts for differential expression (and FPKM/CPM for visualization), perhaps trim the longer R1 from the PE experiment to 75 bp?

To be safe, I probably would start by processing them separately and seeing how well the replicates cluster. If they really look like technical replicates, I think you could justify combined analysis with the trimmed reads in your Supplemental Materials.
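The trimming suggested above (cutting the longer R1 reads down to 75 bp) can be sketched in a few lines of Python. This is only an illustration, assuming plain 4-line FASTQ records; the function name `trim_fastq_records` and the example read are made up for this sketch, and a dedicated read trimmer would normally be the better choice for real data.

```python
from typing import Iterable, Iterator, List


def trim_fastq_records(lines: Iterable[str], max_len: int = 75) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to max_len bases.

    Assumes plain 4-line records (header, sequence, '+', quality).
    """
    record: List[str] = []
    for line in lines:
        # Ignore blank lines between records (e.g. a trailing newline).
        if not record and not line.strip():
            continue
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"Malformed FASTQ record: {header!r}")
            yield header
            yield seq[:max_len]   # keep only the first max_len bases
            yield plus
            yield qual[:max_len]  # qualities must stay the same length
            record = []
    if record:
        raise ValueError("Truncated FASTQ: incomplete final record")


# In-memory demonstration with a made-up 100 bp read.
example = ["@read1", "A" * 100, "+", "I" * 100]
trimmed = list(trim_fastq_records(example, max_len=75))
print(len(trimmed[1]), len(trimmed[3]))
```

For a real file, the same generator can be fed an open file handle and its output written line by line to a new FASTQ file.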

ADD COMMENT • link 5.1 years ago by Charles Warden 8.2k

1

5.1 years ago

Ashastry ▴ 60

Hello,

I would also recommend producing a correlation-based (Spearman or Pearson) distance matrix to see how well the samples correlate within their group. DESeq2 also has an option to produce heatmaps of the distance matrix.
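As a rough illustration of the correlation check, here is a standard-library Python sketch that computes Spearman correlation between per-gene count vectors and turns it into a distance (1 - correlation). The sample names and counts are invented for the example, and the helper names are not from any package.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_distances(
    samples: Dict[str, Sequence[float]]
) -> Dict[Tuple[str, str], float]:
    """Pairwise distance 1 - Spearman rho for every pair of samples."""
    names = sorted(samples)
    return {
        (a, b): 1 - spearman(samples[a], samples[b])
        for a in names
        for b in names
    }


# Made-up per-gene counts for one library sequenced two ways, plus another sample.
counts = {
    "sampleA_SE": [10, 200, 35, 0, 78],
    "sampleA_PE": [25, 410, 80, 1, 150],
    "sampleB_SE": [300, 5, 40, 90, 2],
}
distances = correlation_distances(counts)
print(distances[("sampleA_PE", "sampleA_SE")])
```

If the single-end and paired-end runs of the same library behave like technical replicates, their distance should be small compared with the distance to other samples.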

ADD COMMENT • link 5.1 years ago by Ashastry ▴ 60

score 3 · Accepted Answer · 2019-03-13

3

5.1 years ago

swbarnes2 14k

I don't think STAR can take them both together. I'd process the two separately, and merge results at the end if it looks like the two experiments are telling you the same thing.
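A minimal sketch of what processing the two data sets separately could look like, written as a Python helper that only builds the STAR command lines (it does not run them). The index directory, file names, prefixes, and thread count are placeholders; the options shown (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outSAMtype`, `--outFileNamePrefix`) are standard STAR parameters, but everything else would need adjusting to your setup.

```python
import shlex
from typing import List, Optional


def star_command(
    genome_dir: str,
    read1: str,
    read2: Optional[str],
    prefix: str,
    threads: int = 8,
) -> List[str]:
    """Build a STAR argument list for a single-end or paired-end run.

    Pass read2=None for single-end data; STAR takes one file after
    --readFilesIn for single-end reads and two for paired-end reads.
    """
    reads = [read1] if read2 is None else [read1, read2]
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


# Hypothetical file names: one paired-end run and one single-end run.
pe_cmd = star_command("star_index", "pe_R1.fq", "pe_R2.fq", "pe_")
se_cmd = star_command("star_index", "se.fq", None, "se_")

print(shlex.join(pe_cmd))
print(shlex.join(se_cmd))
# To actually run them (requires STAR on the PATH):
#   import subprocess; subprocess.run(pe_cmd, check=True)
```

Keeping the two runs under different output prefixes makes it easy to compare them before deciding whether to combine anything.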

ADD COMMENT • link 5.1 years ago by swbarnes2 14k

3

I would also suggest holding off on merging the results until you have looked at a PCA of the data, to make sure there is no batch effect from the sequencing type, etc.

ADD REPLY • link 5.1 years ago by lshepard ▴ 470

0

Thank you both for your answers. So I guess I will map separately. Then, with the BAM files, I will use deepTools plotPCA (maybe plotCorrelation too?) to check, and then samtools merge. Does that sound good? Any thoughts?

Thanks!

ADD REPLY • link 5.1 years ago by nanoide ▴ 120

0

Hi, please let me ask for a clarification. When you said 'merge the results at the end if it looks like the two experiments are telling you the same thing', did you mean merging the output from STAR (i.e. the BAM files), or counting independently as well and then summing the counts if the samples are correlated, cluster together, etc.? Thank you!

ADD REPLY • link 5.1 years ago by nanoide ▴ 120

1

It depends on how you are normalizing reads. If you are just generating raw counts, you could simply combine the counts. Otherwise, you should probably merge the BAMs, recalculate the counts, and renormalize.
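For the raw-count case, combining can be as simple as adding the per-gene counts from the two runs of the same library. The sketch below assumes each run has already been reduced to a gene-to-count mapping; the gene names and numbers are invented, and a gene missing from one table is treated as zero there. It is not appropriate for normalized values such as FPKM or CPM, which would need to be recomputed as described above.

```python
from typing import Dict


def sum_raw_counts(*tables: Dict[str, int]) -> Dict[str, int]:
    """Add raw per-gene counts across runs of the same library.

    Genes absent from a table contribute zero for that table.
    """
    total: Dict[str, int] = {}
    for table in tables:
        for gene, count in table.items():
            total[gene] = total.get(gene, 0) + count
    return total


# Made-up raw counts from the paired-end and single-end runs of one library.
pe_counts = {"geneA": 120, "geneB": 0, "geneC": 33}
se_counts = {"geneA": 60, "geneB": 2, "geneD": 5}

combined = sum_raw_counts(pe_counts, se_counts)
print(combined)
```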