How to treat paired-end files as single-end when aligning RNA-Seq reads?
1
0
4.3 years ago
Grace_G ▴ 20

For no special reason, suppose I want to treat paired-end files (so two xxx.fastq.gz files per sample) as single-end for processing.

1. Does that mean double the sequence coverage?
2. For the alignment step with STAR, should I merge the two files before mapping (and can I merge them just with cat samplex_R1.fastq.gz samplex_R2.fastq.gz), or should I discard one of the two? Or should I first map each file separately as single-end (I'm worried about this approach, since the contents of the two files are different) and then merge the BAM files?
3. I'm confused about why wc -l samplex_R1.fastq.gz gives a different result from wc -l samplex_R2.fastq.gz, since they are a pair.

Any ideas would be greatly appreciated!

RNA-Seq • 2.6k views
1

Treat each file of the pair as a single-end FASTQ and, after quantification, take the average of the TPM/RPKM/FPKM values from the two files.

Cheers !!

1

While you can do that, that doesn't mean that one should do that (one shouldn't).

0

Useful view, thanks!

0

My data consists of (m+n) samples. The m samples are only a small part of the total, and for those we got paired-end RNA-Seq data from the company. For the next n samples we got single-end data, since the experimenters changed their minds for some reason, so we plan to process all of them as single-end. I don't fully understand what it means to treat the paired-end RNA-Seq data of the m samples as single-end, but I guess it should start at the mapping step, so the concrete question is what to give to STAR.

1

In that case they want you to only use the R1 files.
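
As a rough sketch of what "only use the R1 files" could look like with STAR, the snippet below just builds the command line for a single-end run on R1 and does not execute it. The genome index path, output prefix and thread count are placeholders (assumptions, not from this thread); `--readFilesIn`, `--readFilesCommand`, `--outSAMtype` and `--outFileNamePrefix` are regular STAR options.

```python
import shlex

def star_single_end_cmd(r1_fastq, genome_dir, out_prefix, threads=4):
    """Build a STAR command that aligns only the R1 file as single-end reads."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,        # placeholder: path to your STAR index
        "--readFilesIn", r1_fastq,        # a single file means single-end mode
        "--readFilesCommand", "zcat",     # input is gzipped
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]

# Hypothetical paths, for illustration only.
cmd = star_single_end_cmd("samplex_R1.fastq.gz", "star_index/", "samplex_")
print(shlex.join(cmd))
```

Once the paths are real, the list can be passed to `subprocess.run(cmd, check=True)`. The R2 file is simply never given to the aligner.
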

0

Thanks! Sounds reasonable, but I'm not sure whether it's right or not. Have you processed data this way?

1

First of all, if you have paired-end data, why do you want to process them as single-end? Paired-end data gives you better mappability and alignment, as you have two mates to support the mapping. I provided a solution for the case where the pairs were not sequenced correctly or you have a mix of paired- and single-end data to work with. The solution I SUGGESTED is not a regular analysis step.

Hope this helps.

Cheers !!!

0

I see, thank you. I think my reason is having a mix of paired- and single-end data to work with. Best.

0

No, absolutely don't. This may sound logical but cases where one mate is mapped and the other isn't would greatly affect the result as it would effectively divide the counts of the first mate by two (because one adds + 0 from mate two). If you want to use single-end reads, take the forward reads of each sample and proceed with standard tools such as salmon for quantification or star for alignment. Don't do any custom/untested procedures. That only creates bias.
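
To make the "divide by two" effect concrete, here is a tiny arithmetic sketch. The counts are made-up toy numbers for a single gene, not real data, and `average_of_mates` is a hypothetical helper standing in for the averaging procedure described above.

```python
def average_of_mates(r1_count, r2_count):
    """Average the per-gene counts from R1 and R2 quantified separately."""
    return (r1_count + r2_count) / 2

# Toy numbers: 100 fragments come from one gene.
# Case A: both mates map for every fragment.
both_map = average_of_mates(100, 100)
# Case B: R1 maps for every fragment, but R2 never maps.
only_r1_maps = average_of_mates(100, 0)

print(both_map, only_r1_maps)
```

The same 100 fragments end up reported as either 100 or 50 depending only on whether the second mate happened to map, which is the bias being warned about.
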

6
4.3 years ago
1. Only if you lie to yourself.
2. Yes, just cat them, though note that if you need to use featureCounts later, you'll have to use an unstranded counting method, with the problems inherent to that.
3. Either one is corrupt or your files were preprocessed.
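
On point 2, a quick way to convince yourself that plain concatenation of .gz files is safe: gzip allows several compressed members back to back in one file, and readers decompress them in sequence. The sketch below checks this with Python's gzip module on two toy one-read files (file names and reads are made up for illustration).

```python
import gzip
import os
import shutil
import tempfile

def cat_files(inputs, output):
    """Byte-wise concatenation, equivalent to `cat a b > out`."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)

tmp_dir = tempfile.mkdtemp()
r1 = os.path.join(tmp_dir, "toy_R1.fastq.gz")
r2 = os.path.join(tmp_dir, "toy_R2.fastq.gz")
merged = os.path.join(tmp_dir, "toy_merged.fastq.gz")

# One toy read per mate file.
with gzip.open(r1, "wt") as handle:
    handle.write("@read1/1\nACGT\n+\nIIII\n")
with gzip.open(r2, "wt") as handle:
    handle.write("@read1/2\nTTGA\n+\nIIII\n")

cat_files([r1, r2], merged)

# The merged file decompresses to the R1 records followed by the R2 records.
with gzip.open(merged, "rt") as handle:
    merged_lines = handle.read().splitlines()
print(len(merged_lines) // 4, "reads in merged file")
```

This only shows that the merged file is readable; it says nothing about whether merging is a good idea for the analysis.
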
2

Will wc -l used on gzipped files return the right answer? It counts newline bytes in the compressed data, not lines of FASTQ. I'd try zcat samplex_R1.fastq.gz | wc -l before deciding the files are broken.
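
If you'd rather do the check in a script, a minimal sketch of the same idea (decompress first, then count) is below. It assumes the usual 4-lines-per-read FASTQ layout with no wrapped sequence lines, and the toy files are made up for the demo.

```python
import gzip
import os
import tempfile

def count_fastq_reads(path):
    """Count reads in a gzipped FASTQ by decompressing it first.

    Assumes 4 lines per read (no wrapped sequence lines).
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4

# Toy demo with two made-up reads per mate file.
tmp_dir = tempfile.mkdtemp()
toy = {}
for mate in ("R1", "R2"):
    toy[mate] = os.path.join(tmp_dir, f"toy_{mate}.fastq.gz")
    with gzip.open(toy[mate], "wt") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n")

n_r1 = count_fastq_reads(toy["R1"])
n_r2 = count_fastq_reads(toy["R2"])
print(n_r1, n_r2, "match" if n_r1 == n_r2 else "MISMATCH")
```

If the decompressed read counts of R1 and R2 still differ, then something really is off with the files.
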

1

I kind of assumed they were actually using zcat somewhere and just didn't show it, but you're right that it's good to ask :)

0

Thank you Ryan, I've admired your skills for a long time. Before your suggestion I was leaning towards discarding one file; now I'll try to cat them and then map with STAR as a test. In my view, though, even if we can cat them into one file, it is still different from a real single-end file.

1

STAR supports paired-end alignment, IMO, Grace_G.

0

Yes, of course. But here I have been told to process these samples' paired-end FASTQ data as single-end, and I don't really understand how to do that.

1

You should ask them why they (whoever is telling you to do this) want you to process things in such an unusual way.

0

What's more, if I cat the paired files together, will the single file then contain reads in two directions? Can STAR accept that (and give the right result)? Looking forward to your view, thanks!

1

STAR's aligner will not have a problem. If you use GeneCounts to count reads hitting genes, the results from that might be confused, because your results will definitely be a mix of forward and reverse reads, even if the prep was stranded. Let me reiterate that FWIW, I strongly agree with Devon; analyzing these as single end is a bad idea, it will result in you giving the submitters distorted data, and an inaccurate picture of what their experiment really shows. Having 10 pairs of reads aligning to a gene is not at all the same as having 20 single end reads that align.
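
A toy illustration of the "10 pairs is not 20 single-end reads" point. The alignment records and gene name are made up, and this is not how STAR or featureCounts work internally; it only contrasts counting every mate as its own read with counting each fragment once.

```python
from collections import Counter

# Toy alignments: (read_name, mate, gene). 10 fragments, both mates hit GENE_A.
alignments = [(f"frag{i}", mate, "GENE_A") for i in range(10) for mate in (1, 2)]

def count_as_single_end(alns):
    """Every aligned read counts once, as if mates were independent reads."""
    return Counter(gene for _, _, gene in alns)

def count_as_fragments(alns):
    """Each fragment (read name) counts once per gene, however many mates hit it."""
    unique_hits = {(name, gene) for name, _, gene in alns}
    return Counter(gene for _, gene in unique_hits)

single = count_as_single_end(alignments)
frags = count_as_fragments(alignments)
print(single["GENE_A"], frags["GENE_A"])
```

The two mates of a pair come from the same molecule, so the doubled number does not represent twice as much independent evidence for the gene.
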

0

Thanks for your consideration, it's very helpful too. So for this STAR step, how about directly processing the m samples as paired-end and the n samples as single-end, since once they reach the BAM step they will be the same for downstream processing?
