Paired end sequencing in case of a short biological insert
3.7 years ago
Aspire ▴ 300

I am working on data (RNA-Seq) that was sequenced as 151 bp PE using NovaSeq.

Looking at the QC that was generated after library construction (the library was constructed using TruSeq RNA Access), the average fragment size is given as ~300 bp. Since this is measured after library construction, I understand that it includes the adapters as well.

I'm still new to bioinformatics, but I completely fail to understand the logic behind sequencing 151 bp paired-end on a fragment that should be only around 170 bp after adapter removal. As far as I understand, R2 would mostly overlap with R1. While 151 bp single-read sequencing could make sense, I fail to see the logic behind paired-end sequencing of such a fragment.

The answer I received from the company as to why 150 bp paired-end sequencing was performed is:

"From what we checked, 150PE sequencing is agreed on the quotation and we followed the settings. As the fragments size is longer than 150bp, 150PE sequencing seems acceptable. Additionally the sequencing result shows high quality(e.g. Q30 is higher than 90). If you want it to be 100PE sequencing, please let me know. We will trim the data and send it back to you."

1) Am I correct that sequencing paired-end (as opposed to single-read) was completely useless in this case?

2) Does the person to whom the data belongs have a moral case for asking for a refund? While he might not have a legal case (as the company states, 150 PE is agreed on the quotation), I think they should definitely have warned him that 150 bp paired-end is not needed if the biological fragment is 170 bp long. He could have saved part of the money (I guess a large part, though I don't know) by sequencing single-end.

paired-end RNA-Seq • 2.3k views

Am I correct that sequencing paired-end (as opposed to single-read) was completely useless in this case?

That question does not have a right answer. Could one have made do with just single-end sequencing? Sure. But hindsight is 20/20.

Does the person to whom the data belongs have a moral case for asking a refund?

Depends. Who made the libraries: the submitter or the sequencing provider? If the submitter submitted pre-made libraries, then it is their fault that the inserts are not long. If the provider made the libraries, you could request that they re-make them, if they advertise/guarantee a 300-400 bp insert size. This would also depend on the initial material that was submitted. If it was not of good quality/intact, this is the best result you are going to get.


That question does not have a right answer. Could one have made do with just single-end sequencing? Sure. But hindsight is 20/20.

The sequencing provider most definitely knew that the fragment is ~300 bp including adapters. I think that rpolicastro's comment is correct: R1 and R2 would be redundant almost "by definition", so to speak, not in hindsight.

Depends. Who made the libraries: the submitter or the sequencing provider? If the submitter submitted pre-made libraries, then it is their fault that the inserts are not long. If the provider made the libraries, you could request that they re-make them, if they advertise/guarantee a 300-400 bp insert size

I do not know who made the libraries, but even if it was the submitter, the sequencing provider should have at least warned him that it is senseless (and costly!!) to sequence 150 from each end of the fragment. I assume that the less the sequence length, the less the cost.


Cost-effectiveness with NovaSeq comes from pooling samples on the same flowcell together with many other libraries, and 2x150 is a common read length setup. The other samples (e.g. exomes, WGS...) might have benefited from that setup, and they simply included your libraries since there was free space.


Was this 150bp for both the forward and reverse read (for a total of 300bp), or was it 150bp split in half between the forward and reverse read (so 75bp in either direction)?


The first. 150 for each of the reads, not split between them.


If the average fragment size was ~170 bp, then it did not make sense to sequence 150bp in both directions. The R1 and R2 reads would be nearly identical in the best case, and would be reading into the adapter on the other side in the worst case.
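To put rough numbers on this, here is a minimal Python sketch of the arithmetic. It assumes a single fixed insert length (the ~170 bp figure from the question) and 151 bp reads; a real library has a distribution of insert sizes rather than one value. The function name `read_pair_geometry` is only for illustration.

```python
def read_pair_geometry(insert_len: int, read_len: int) -> dict:
    """Describe how R1 and R2 cover an insert of a given length.

    Assumes both reads have the same length and that the insert is
    flanked directly by adapter sequence on both sides.
    """
    # R1 covers insert positions [0, min(read_len, insert_len)).
    r1_end = min(read_len, insert_len)
    # R2 (sequenced from the other end) covers [max(0, insert_len - read_len), insert_len).
    r2_start = max(0, insert_len - read_len)

    # Insert bases seen by both reads.
    overlap = max(0, r1_end - r2_start)
    # Insert bases that only R2 contributes.
    new_bases_from_r2 = insert_len - r1_end if insert_len <= 2 * read_len else read_len
    # Bases at the 3' end of each read that run into the adapter.
    adapter_readthrough = max(0, read_len - insert_len)

    return {
        "overlap": overlap,
        "new_bases_from_r2": new_bases_from_r2,
        "adapter_readthrough_per_read": adapter_readthrough,
    }


print(read_pair_geometry(170, 151))
print(read_pair_geometry(120, 151))
```

With a 170 bp insert, 132 of the 151 bases in R2 repeat what R1 already covered and only 19 are new. With a 120 bp insert, R2 adds nothing new and each read runs 31 bases into the adapter.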

3.7 years ago
GenoMax 141k

the sequencing provider should have at least warned him that it is senseless (and costly!!) to sequence 150 from each end of the fragment

Should the sequencing provider have pointed out that the reads will overlap? Sure. Many times people are so focused on cost savings that things like this get overlooked.

As said by @ATPoint below, the price break that you get from large sequencing providers for 2 x 150 sequencing is because they can multiplex a large number of samples. If you had asked for a shorter read length (and the provider was offering that option), they may have simply trimmed the data down to that length and given you that result.

I assume that the less the sequence length, the less the cost.

If you have enough samples to fill a whole flowcell, or if the provider has other samples of the same shorter length, then yes. That said, the cost of sequencing 2 x 75 bp is not half that of 2 x 150 bp, so the savings only go so far. For counting applications like RNA-seq, 50 bp reads are generally enough (see read length versus unique alignment rate).


If you change your comments into an answer, I will accept the answer.

3.7 years ago
igor 13k

If the actual fragments are 170 bp on average, then roughly half the fragments are larger. Those would theoretically benefit, although the difference will be marginal. Keep in mind that the library size measured pre-sequencing does not always perfectly match what ends up being sequenced, so you may want to also determine the fragment length from the actual reads.
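One way to get the fragment length from the reads themselves, without aligning anything, is to look for the overlap between R1 and the reverse complement of R2. The sketch below is a naive illustration of that idea, not what any particular tool does. The function names, the `min_overlap` and `max_mismatch_rate` parameters, and the scoring rule are all assumptions made for this example.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment_length_from_pair(r1, r2, min_overlap=20, max_mismatch_rate=0.1):
    """Estimate the fragment length from one read pair, or return None.

    Slides the reverse complement of R2 along R1 and keeps the best-scoring
    overlap. A negative offset means the fragment is shorter than the read,
    i.e. both reads run into the adapter.
    """
    r2_rc = reverse_complement(r2)
    best_len, best_score = None, 0
    for offset in range(-(len(r2_rc) - min_overlap), len(r1) - min_overlap + 1):
        a = r1[max(0, offset):]        # part of R1 inside the overlap
        b = r2_rc[max(0, -offset):]    # part of revcomp(R2) inside the overlap
        n = min(len(a), len(b))
        if n < min_overlap:
            continue
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches > max_mismatch_rate * n:
            continue
        # Hypothetical scoring: reward long overlaps, penalise mismatches.
        score = n - 2 * mismatches
        if score > best_score:
            best_len, best_score = offset + len(r2_rc), score
    return best_len
```

Running this over a few thousand pairs and looking at the distribution of returned lengths gives an estimate of the sequenced insert size. Pairs that return `None` either do not overlap (the fragment is longer than about twice the read length minus `min_overlap`) or are too noisy to call.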

Regardless, you don't need to guess. You can trim the reads to any length and see if it actually makes a noticeable difference for your particular experiment.
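Trimming to a fixed length is simple enough to do by hand if you want to try it. A minimal sketch, assuming plain 4-line FASTQ records (no wrapped sequence lines); the function name `trim_fastq` is made up for this example:

```python
def trim_fastq(lines, length):
    """Yield FASTQ lines with sequence and quality truncated to `length` bases.

    Assumes strict 4-line records: header, sequence, '+' line, quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record hold the sequence and quality string.
        if i % 4 in (1, 3):
            line = line[:length]
        yield line


record = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
for out in trim_fastq(record, 5):
    print(out)
```

Because no reads are dropped, applying the same function to the R1 and R2 files keeps the pairs in sync. You can then rerun the alignment and counting at, say, 50, 75 and 100 bp and compare the results against the full-length data.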
