Question: Running STAR aligner with paired-end and single-end reads simultaneously
0
4 months ago by
nanoide30
nanoide30 wrote:

Hi all

So, I recently got some RNA-seq raw reads, both paired-end (2 x 150 bp) and single-end (1 x 75 bp), and I want to map them using the STAR aligner. My main questions are: how would you deal with these? Can STAR take both paired-end and single-end .fq files simultaneously? Or is mapping them separately and then merging the BAM files also an option?

Any ideas?

Thank you for your advice

modified 3 months ago • written 4 months ago by nanoide30
1

Thank you all for your thoughts and useful comments!

written 4 months ago by nanoide30
1

You have not said if this is the same library sequenced two different ways. That will also have implications on how you do the data analysis.

written 3 months ago by genomax69k

Hi, thanks for your responses. This is indeed the same library sequenced two different ways. I'll check out the BBMap suite, thanks.

written 3 months ago by nanoide30

Thank you all for your responses.

Please allow me to ask for a couple of clarifications:

* I gather STAR cannot deal with paired-end and single-end reads of different lengths at the same time. Do you know of any aligner that can?
* Do you know of any published papers that have done something similar to what you suggested?

Thank you very much

written 3 months ago by nanoide30
1

The BBMap suite has a wrapper, bbwrap.sh, that allows bbmap.sh to be used for single-end and paired-end reads in the same alignment job.

$ bbwrap.sh in1=read1.fq,singletons.fq in2=read2.fq,null out=mapped.sam append
written 3 months ago by genomax69k
3
4 months ago by
swbarnes26.0k
United States
swbarnes26.0k wrote:

I don't think STAR can take them both together. I'd process the two separately, and merge results at the end if it looks like the two experiments are telling you the same thing.

written 4 months ago by swbarnes26.0k
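
For readers who want to see the "map separately, then merge" route spelled out, here is a minimal sketch rather than a definitive pipeline. It only builds and prints the command lines (a dry run), so nothing is executed. The index directory `star_index`, the FASTQ names, the output prefixes and the thread count are all hypothetical placeholders; the STAR options shown (`--runThreadN`, `--genomeDir`, `--readFilesIn`, `--outSAMtype`, `--outFileNamePrefix`) and `samtools merge` are standard, but check them against the versions you have installed.

```python
from typing import List, Optional


def star_command(genome_dir: str, read1: str, read2: Optional[str],
                 prefix: str, threads: int = 8) -> List[str]:
    """Build a STAR command for one run (paired-end if read2 is given)."""
    reads = [read1] if read2 is None else [read1, read2]
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", prefix,
    ]


def star_bam(prefix: str) -> str:
    """Name STAR gives the coordinate-sorted BAM for a given output prefix."""
    return prefix + "Aligned.sortedByCoord.out.bam"


def merge_command(out_bam: str, bams: List[str]) -> List[str]:
    """Build a samtools merge command: output file first, then the inputs."""
    return ["samtools", "merge", out_bam, *bams]


# Hypothetical file names, for illustration only.
pe_cmd = star_command("star_index", "pe_R1.fq", "pe_R2.fq", "pe_")
se_cmd = star_command("star_index", "se.fq", None, "se_")
merge_cmd = merge_command("merged.bam", [star_bam("pe_"), star_bam("se_")])

# Dry run: print the commands instead of executing them.
for cmd in (pe_cmd, se_cmd, merge_cmd):
    print(" ".join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)`. Whether a BAM that mixes paired-end and single-end alignments suits your counting tool is a separate question worth checking before merging.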
3

And I would suggest holding off on merging the results until you have looked at a PCA of the data, to make sure there is no batch effect from the sequencing type, etc.

modified 4 months ago • written 4 months ago by lshepard340

Thank you both for your answers. So I guess I will map separately. Then, with the BAM files, I will use deepTools plotPCA (maybe plotCorrelation too?) to check, and then samtools merge. Does that sound good? Any thoughts?

Thanks!

written 4 months ago by nanoide30

Hi, please let me ask you for a clarification. When you stated 'merge the results at the end if it looks like the two experiments are telling you the same thing', did you mean merging the output from STAR (i.e. the BAM files), or counting independently as well and then summing the counts if they are correlated, cluster together, etc.? Thank you!

written 4 months ago by nanoide30
1

Depends on how you are normalizing reads. If you are just generating raw counts, you could just combine the counts. Otherwise, you should probably merge the BAM files, recalculate the counts, and renormalize.

written 4 months ago by swbarnes26.0k
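
The "just combine the counts" option can be sketched as follows. This is only an illustration under stated assumptions: each run's raw counts are held as a gene-to-count mapping, the gene names and numbers are made up, and `combine_counts` is a hypothetical helper, not part of any counting tool. It applies to raw (unnormalized) counts only, as noted above.

```python
from typing import Dict


def combine_counts(*tables: Dict[str, int]) -> Dict[str, int]:
    """Sum raw per-gene counts across runs of the same library.

    Genes missing from one table are treated as having zero counts there.
    """
    combined: Dict[str, int] = {}
    for table in tables:
        for gene, count in table.items():
            combined[gene] = combined.get(gene, 0) + count
    return combined


# Made-up raw counts for the paired-end and single-end runs of one sample.
pe_counts = {"geneA": 10, "geneB": 0}
se_counts = {"geneA": 5, "geneC": 3}

total_counts = combine_counts(pe_counts, se_counts)
print(total_counts)
```

Normalized values (FPKM, CPM and the like) should not be summed this way; they would need to be recomputed from the combined raw counts or from merged BAM files.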

Ok, thank you very much

written 4 months ago by nanoide30
1
4 months ago by
Charles Warden7.0k
Duarte, CA
Charles Warden7.0k wrote:

I typically use single-end 50 bp reads for gene expression analysis.

If you are just interested in getting counts for differential expression (and FPKM/CPM for visualization), perhaps trim the longer R1 from the PE experiment to 75 bp?

To be safe, I probably would start by processing them separately and seeing how well the replicates cluster. If they really look like technical replicates, I think you could justify combined analysis with the trimmed reads in your Supplemental Materials.

written 4 months ago by Charles Warden7.0k
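
The trimming idea above (cutting the 150 bp R1 reads down to 75 bp so they match the single-end run) can be illustrated with a small sketch. It assumes plain four-line FASTQ records with no wrapped sequence lines, and `trim_fastq` is a hypothetical helper written for this example. In practice a dedicated trimming tool would be the more usual choice.

```python
from typing import Iterable, Iterator


def trim_fastq(lines: Iterable[str], length: int = 75) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality strings cut to `length`.

    Assumes four lines per record: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Within each record, line 2 is the sequence and line 4 the quality.
        if i % 4 in (1, 3):
            line = line[:length]
        yield line


# One made-up 150 bp record and one read already shorter than 75 bp.
records = [
    "@read1", "A" * 150, "+", "I" * 150,
    "@read2", "ACGT" * 10, "+", "I" * 40,
]
trimmed = list(trim_fastq(records, length=75))
print(len(trimmed[1]), len(trimmed[5]))
```

Reads already shorter than the target length pass through unchanged, and the sequence and quality strings are always cut to the same length so each record stays valid.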
1
4 months ago by
Ashastry60
Ashastry60 wrote:

Hello,

I would also recommend producing a correlation-based (Spearman or Pearson) distance matrix to see how well the samples correlate within their group. DESeq2's documentation also covers producing heatmaps of the sample distance matrix.

written 4 months ago by Ashastry60
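
As a rough illustration of the correlation distance matrix suggested above, the sketch below computes Pearson and Spearman correlations between samples and converts them to distances as 1 - r. It uses only the Python standard library. The sample names and count vectors are made up, and the helper functions are hypothetical; this is not how DESeq2 computes its sample distances.

```python
import math
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (undefined if either vector has zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def distance_matrix(samples: Dict[str, Sequence[float]],
                    corr: Callable = spearman) -> Dict[str, Dict[str, float]]:
    """Pairwise distances between samples, defined as 1 - correlation."""
    names = list(samples)
    return {a: {b: 1 - corr(samples[a], samples[b]) for b in names}
            for a in names}


# Made-up per-gene counts for three samples.
samples = {
    "PE_rep1": [10, 200, 35, 0, 80],
    "SE_rep1": [12, 180, 40, 1, 75],
    "other": [90, 5, 60, 150, 3],
}
dist = distance_matrix(samples)
for name, row in dist.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

Samples that behave like technical replicates should end up with distances close to 0 to each other. Passing `corr=pearson` switches the measure, ideally on log-transformed counts so a few highly expressed genes do not dominate.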