Question: How to treat paired-end files as single-end when aligning RNA-Seq data?
0
20 months ago by Grace_G (20)
Grace_G wrote:

For no special reason, suppose I want to treat paired-end files (so two xxx.fastq.gz files per sample) as single-end for processing.

  1. Does that mean double the sequence coverage?
  2. For the alignment step with STAR, should I merge the two files before mapping (and can I merge them simply with cat samplex_R1.fastq.gz samplex_R2.fastq.gz), or should I discard one of the two files? Or should I first map each file separately as single-end (I'm worried about this approach, since the contents of the two files differ) and then merge the BAM files?
  3. I'm confused about why wc -l samplex_R1.fastq.gz gives a different result from wc -l samplex_R2.fastq.gz, given that they are a pair.

Any ideas would be greatly appreciated!

rna-seq • 967 views
modified 12 months ago by Biostar ♦♦ 20 • written 20 months ago by Grace_G (20)
1

Treat each file of the pair as a single-end FASTQ, and after quantification take the average of the TPM/RPKM/FPKM values from the two mates.

Cheers!

written 20 months ago by Praneet Chaturvedi (110)
1

While you can do that, that doesn't mean that one should do that (one shouldn't).

written 20 months ago by Devon Ryan (96k)

Useful view, thanks!

written 20 months ago by Grace_G (20)

My data consist of m + n samples. The m samples are only a small part of the total, but for them we received paired-end RNA-Seq data from the company. For the remaining n samples we got single-end data, because the experimentalists changed their minds for some reason. We therefore plan to process all samples as single-end. I don't fully understand what it means to treat the paired-end RNA-Seq data of the m samples as single-end, but I guess it starts at the mapping step, so concretely the question is what to give STAR as input.

modified 20 months ago • written 20 months ago by Grace_G (20)
1

In that case they want you to only use the R1 files.
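
As a rough sketch of what "only use the R1 files" could look like with STAR: the index directory, thread count, and output prefix below are hypothetical placeholders, not values from this thread. The command is only printed unless RUN_STAR=1 is set, so the snippet can be tried without STAR installed.

```shell
set -eu

# Hypothetical paths; adjust to your own index and data.
genome_dir="star_index"
r1="samplex_R1.fastq.gz"

# Only R1 is given to --readFilesIn, so STAR treats the sample as single-end.
# --readFilesCommand zcat lets STAR read the gzipped FASTQ directly.
star_args="--runThreadN 4 --genomeDir $genome_dir --readFilesIn $r1 --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outFileNamePrefix samplex_SE_"

# Print the command by default; set RUN_STAR=1 to actually execute it.
echo "STAR $star_args"
if [ "${RUN_STAR:-0}" = "1" ]; then
  # Word splitting is intended here; the placeholder paths contain no spaces.
  STAR $star_args
fi
```

The R2 file is simply never passed to the aligner, so the paired-end samples go through the same single-end settings as the others.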

written 20 months ago by Devon Ryan (96k)

Thanks! That sounds reasonable, but I'm not sure whether it's right. Have you processed data this way?

modified 20 months ago • written 20 months ago by Grace_G (20)
1

First of all, if you have paired-end data, why do you want to process them as single-end? Paired-end data gives you better mappability and alignment because two mates support each mapping. I offered that solution for cases where the pairs were not sequenced correctly or where you have a mix of paired-end and single-end data to work with. The solution I suggested is not a regular analysis step.

Hope this helps.

Cheers!

written 20 months ago by Praneet Chaturvedi (110)

I see, thank you. I think my reason is that I have a mix of paired-end and single-end data to work with. Best.

modified 20 months ago • written 20 months ago by Grace_G (20)

No, absolutely don't. This may sound logical but cases where one mate is mapped and the other isn't would greatly affect the result as it would effectively divide the counts of the first mate by two (because one adds + 0 from mate two). If you want to use single-end reads, take the forward reads of each sample and proceed with standard tools such as salmon for quantification or star for alignment. Don't do any custom/untested procedures. That only creates bias.
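
The halving effect described above can be illustrated with a toy calculation. The counts are invented purely for illustration and do not come from any real sample.

```shell
set -eu

# Hypothetical counts for one gene when each mate file is quantified separately.
mate1_count=100   # forward mates (R1) that mapped to the gene
mate2_count=0     # reverse mates (R2) that failed to map

# Averaging the two values halves the signal that R1 alone supports.
averaged=$(( (mate1_count + mate2_count) / 2 ))
echo "R1 only: $mate1_count, averaged over mates: $averaged"
```

With these numbers the averaged value is 50, even though 100 fragments from the gene were observed.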

written 12 months ago by ATpoint (36k)
6
20 months ago by Devon Ryan (96k)
Freiburg, Germany
Devon Ryan wrote:
  1. Only if you lie to yourself.
  2. Yes, just cat them, though note that if you need to use featureCounts later, you'll have to use an unstranded counting method, with the problems inherent to that.
  3. Either one is corrupt or your files were preprocessed.
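
A minimal sketch of the concatenation in point 2, using tiny made-up FASTQ files (the demo_* names are hypothetical). It relies on the fact that gzip files can be concatenated directly and still decompress as a single stream.

```shell
set -eu

tmpdir=$(mktemp -d)

# Two tiny illustrative FASTQ files with one read each (not real data).
printf '@frag1/1\nACGT\n+\nIIII\n' | gzip -c > "$tmpdir/demo_R1.fastq.gz"
printf '@frag1/2\nTTGA\n+\nIIII\n' | gzip -c > "$tmpdir/demo_R2.fastq.gz"

# Concatenated gzip members form a valid gzip stream, so no decompression is needed.
cat "$tmpdir/demo_R1.fastq.gz" "$tmpdir/demo_R2.fastq.gz" > "$tmpdir/demo_merged.fastq.gz"

# The merged file should hold the reads of both inputs (here 1 + 1 = 2).
merged_reads=$(( $(gzip -cd "$tmpdir/demo_merged.fastq.gz" | wc -l) / 4 ))
echo "reads in merged file: $merged_reads"

rm -rf "$tmpdir"
```

The merged file contains both mates as independent reads, which is why any strand information from the library prep is mixed afterwards.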
modified 20 months ago • written 20 months ago by Devon Ryan (96k)
2

Will wc -l used on gzipped files return the right answer? I'd try zcat samplex_R1.fastq.gz | wc -l before deciding the files are broken
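
A small sketch of that check, on a made-up two-read file (demo_R1.fastq.gz is a hypothetical name). It uses gzip -cd, which behaves like zcat for .gz files, and divides the line count by 4 because each FASTQ record spans four lines.

```shell
set -eu

# Build a tiny two-read FASTQ in a temporary directory (illustrative data only).
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$tmpdir/demo_R1.fastq"
gzip "$tmpdir/demo_R1.fastq"

# Decompress to stdout and count lines; wc -l on the .gz itself would only
# count newline bytes in the compressed data, which is meaningless here.
n_lines=$(( $(gzip -cd "$tmpdir/demo_R1.fastq.gz" | wc -l) ))
n_reads=$(( n_lines / 4 ))
echo "lines: $n_lines, reads: $n_reads"

rm -rf "$tmpdir"
```

Running the same count on the R1 and R2 files of a sample should give identical read numbers if the pair is intact.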

written 20 months ago by swbarnes2 (8.0k)
1

I kind of assumed they were actually using zcat somewhere and just didn't show it, but you're right that it's good to ask :)

written 20 months ago by Devon Ryan (96k)

Thank you Ryan, I have admired your expertise for a long time. Before your suggestion I preferred to discard one file; now I'll try to cat them and then map with STAR as a test. In my view, though, even if we can cat them into one file, it is still different from a real single-end file.

written 20 months ago by Grace_G (20)
1

STAR supports paired-end alignment IMO, Grace_G.

written 20 months ago by cpad0112 (13k)

Yes, of course. But I have been told to process these samples' paired-end FASTQ data as single-end, and I don't really understand how to do that.

written 20 months ago by Grace_G (20)
1

You should ask them why they (whoever is telling you to do this) want you to process things in such an unusual way.

written 20 months ago by Devon Ryan (96k)

What's more, if I cat the paired files together, won't the single file contain reads in two orientations? Can STAR accept that and give correct results? I look forward to your view, thanks!

written 20 months ago by Grace_G (20)
1

STAR's aligner will not have a problem. If you use GeneCounts to count reads hitting genes, the results from that might be confused, because your results will definitely be a mix of forward and reverse reads, even if the prep was stranded. Let me reiterate that FWIW, I strongly agree with Devon; analyzing these as single end is a bad idea, it will result in you giving the submitters distorted data, and an inaccurate picture of what their experiment really shows. Having 10 pairs of reads aligning to a gene is not at all the same as having 20 single end reads that align.

written 20 months ago by swbarnes2 (8.0k)

Thanks for your consideration, that is very helpful too. So for the STAR step, how about processing the m samples directly as paired-end and the n samples as single-end, since once they reach the BAM stage they are handled the same way in the downstream steps?

written 20 months ago by Grace_G (20)
