Illumina technology output, 16S amplicon
3.4 years ago
Jelo • 0

Hello everyone, I am new to NGS and confused about the sequencing output.

If a DNA fragment (e.g. GACTACACGGGTATCTAATCCCGTTCGCTCCCCTGGCTTTCGCGCCTCAG) occurs only once after library preparation (more than one copy would be considered duplicates), how many times will this sequence be read by the Illumina sequencer?

I have 100,000 reads per sample for 16S amplicon sequencing, sequenced on an Illumina MiSeq. The second reads do not have good quality scores. If I remove the low-quality reads (e.g. QC score below 24), will removing those reads affect the diversity of the microbial community?

Thank you in advance.

next-gen sequencing • 776 views

This is a tricky question to answer. If no further amplification was done, there is one copy of the sequence to start with, so, assuming adapters were successfully added to it, it should lead to a single cluster (after bridge amplification on the Illumina flowcell) and thus one final sequence. If there are amplification steps in between and multiple copies of the sequence were made, they can lead to multiple clusters. Each of those clusters, in theory, will produce the same sequence.
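As a rough illustration of the point about multiple clusters (a sketch only, not part of any Illumina software): if several copies of a fragment each seed their own cluster, the same sequence appears several times in the output, while a fragment present once gives a singleton. The read list below is made up, and `count_identical_reads` is a hypothetical helper name.

```python
from collections import Counter

def count_identical_reads(reads):
    """Return a Counter mapping each distinct sequence to the number of reads carrying it."""
    return Counter(reads)

# Hypothetical reads: the fragment from the question seen three times
# (three clusters), plus one other sequence seen once (a singleton).
fragment = "GACTACACGGGTATCTAATCCCGTTCGCTCCCCTGGCTTTCGCGCCTCAG"
reads = [fragment, fragment, "ACGTACGTAC", fragment]

counts = count_identical_reads(reads)
singletons = [seq for seq, n in counts.items() if n == 1]
print(counts[fragment], len(singletons))
```

Real reads of the same fragment may still differ by sequencing errors, so exact-match counting like this is only a first approximation.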

Low nucleotide diversity or very short inserts can lead to problems with quality scores. Poor quality with read 2 in your case could be indicative of either.
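For the filtering part of the question, here is a minimal sketch of what "removing reads below a QC score of 24" could look like. It assumes the score is the mean Phred quality of a read and that qualities use the Phred+33 encoding of current Illumina FASTQ files; the thread does not say how the score was computed, and the function names and example records are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]

def mean_quality(quality_string, offset=33):
    """Mean Phred score of one read (assumes a non-empty quality string)."""
    scores = phred_scores(quality_string, offset)
    return sum(scores) / len(scores)

def filter_by_mean_quality(records, min_mean_q=24):
    """Keep (sequence, quality) pairs whose mean Phred score is >= min_mean_q."""
    return [(seq, qual) for seq, qual in records if mean_quality(qual) >= min_mean_q]

# Made-up records: 'I' decodes to Q40 and '#' to Q2 under Phred+33.
records = [
    ("ACGTACGT", "IIIIIIII"),  # mean Q40, kept
    ("ACGTACGT", "IIII####"),  # mean Q21, dropped
]
kept = filter_by_mean_quality(records)
print(len(kept))
```

Comparing the number of distinct sequences before and after such a filter is one simple way to see how much the filtering changes what goes into the diversity estimate.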

3.4 years ago

Hi,

Assuming that you do not lose that sequence when loading the library onto the sequencer, I would say that if you have one unique sequence (a sequence that appears only once) after the library prep, you should get one single read corresponding to that sequence, i.e., a singleton. As far as I understand, if you have a unique sequence after the library prep, the sequencer will read this sequence only once.


EDIT

T̶h̶e̶ ̶q̶u̶a̶l̶i̶t̶y̶ ̶o̶f̶ ̶t̶h̶e̶ ̶r̶e̶v̶e̶r̶s̶e̶ ̶r̶e̶a̶d̶s̶ ̶(̶I̶ ̶g̶u̶e̶s̶s̶ ̶t̶h̶a̶t̶ ̶b̶y̶ ̶s̶e̶c̶o̶n̶d̶ ̶r̶e̶a̶d̶s̶,̶ ̶y̶o̶u̶ ̶m̶e̶a̶n̶ ̶"̶r̶e̶v̶e̶r̶s̶e̶"̶)̶ ̶i̶s̶ ̶a̶l̶w̶a̶y̶s̶ ̶w̶o̶r̶s̶e̶ ̶t̶h̶a̶n̶ ̶f̶o̶r̶w̶a̶r̶d̶ ̶r̶e̶a̶d̶s̶,̶ ̶t̶h̶i̶s̶ ̶i̶s̶ ̶b̶e̶c̶a̶u̶s̶e̶ ̶t̶h̶e̶ ̶s̶e̶q̶u̶e̶n̶c̶e̶-̶b̶y̶-̶s̶y̶n̶t̶h̶e̶s̶i̶s̶ ̶p̶r̶o̶c̶e̶s̶s̶ ̶i̶s̶ ̶a̶n̶ ̶e̶n̶z̶y̶m̶a̶t̶i̶c̶ ̶p̶r̶o̶c̶e̶s̶s̶ ̶t̶h̶a̶t̶ ̶u̶s̶u̶a̶l̶l̶y̶ ̶i̶s̶ ̶w̶o̶r̶s̶e̶ ̶i̶n̶ ̶t̶h̶e̶ ̶r̶e̶v̶e̶r̶s̶e̶ ̶(̶3̶'̶ ̶t̶o̶ ̶5̶'̶)̶ ̶d̶i̶r̶e̶c̶t̶i̶o̶n̶.̶ ̶T̶h̶e̶r̶e̶f̶o̶r̶e̶,̶ ̶t̶h̶a̶t̶ ̶i̶s̶ ̶n̶o̶t̶ ̶s̶u̶r̶p̶r̶i̶s̶i̶n̶g̶.̶ ̶A̶s̶s̶u̶m̶i̶n̶g̶ ̶t̶h̶a̶t̶ ̶a̶l̶l̶ ̶y̶o̶u̶r̶ ̶s̶a̶m̶p̶l̶e̶s̶ ̶w̶e̶r̶e̶ ̶s̶e̶q̶u̶e̶n̶c̶e̶d̶ ̶i̶n̶ ̶t̶h̶e̶ ̶s̶a̶m̶e̶ ̶s̶e̶q̶u̶e̶n̶c̶i̶n̶g̶ ̶r̶u̶n̶,̶ ̶I̶ ̶w̶o̶u̶l̶d̶ ̶e̶x̶p̶e̶c̶t̶ ̶a̶ ̶r̶e̶l̶a̶t̶i̶v̶e̶l̶y̶ ̶s̶i̶m̶i̶l̶a̶r̶ ̶n̶u̶m̶b̶e̶r̶ ̶o̶f̶ ̶b̶a̶d̶ ̶b̶a̶s̶e̶s̶ ̶o̶r̶ ̶b̶a̶d̶ ̶r̶e̶a̶d̶s̶ ̶̶p̶e̶r̶̶ ̶s̶a̶m̶p̶l̶e̶,̶ ̶m̶e̶a̶n̶i̶n̶g̶ ̶t̶h̶a̶t̶,̶ ̶a̶l̶t̶h̶o̶u̶g̶h̶ ̶t̶h̶e̶ ̶r̶e̶m̶o̶v̶a̶l̶ ̶o̶f̶ ̶t̶h̶e̶s̶e̶ ̶r̶e̶a̶d̶s̶ ̶w̶i̶l̶l̶ ̶a̶f̶f̶e̶c̶t̶ ̶t̶h̶e̶ ̶d̶i̶v̶e̶r̶s̶i̶t̶y̶ ̶-̶ ̶i̶t̶'̶l̶l̶ ̶d̶e̶c̶r̶e̶a̶s̶e̶ ̶-̶,̶ ̶t̶h̶i̶s̶ ̶w̶o̶u̶l̶d̶ ̶a̶f̶f̶e̶c̶t̶ ̶i̶n̶ ̶a̶ ̶s̶i̶m̶i̶l̶a̶r̶ ̶m̶a̶n̶n̶e̶r̶ ̶a̶l̶l̶ ̶t̶h̶e̶ ̶s̶a̶m̶p̶l̶e̶s̶.̶ ̶O̶n̶ ̶t̶h̶e̶ ̶o̶t̶h̶e̶r̶ ̶h̶a̶n̶d̶,̶ ̶i̶f̶ ̶y̶o̶u̶ ̶d̶o̶n̶'̶t̶ ̶r̶e̶m̶o̶v̶e̶ ̶b̶a̶d̶ ̶r̶e̶a̶d̶s̶ ̶t̶h̶a̶t̶ ̶c̶o̶u̶l̶d̶ ̶i̶n̶f̶l̶a̶t̶e̶ ̶t̶h̶e̶ ̶d̶i̶v̶e̶r̶s̶i̶t̶y̶,̶ ̶b̶u̶t̶ ̶t̶h̶o̶s̶e̶ ̶r̶e̶a̶d̶s̶ ̶d̶o̶ ̶n̶o̶t̶ ̶r̶e̶p̶r̶e̶s̶e̶n̶t̶ ̶t̶r̶u̶e̶ ̶b̶i̶o̶l̶o̶g̶i̶c̶a̶l̶ ̶s̶e̶q̶u̶e̶n̶c̶e̶s̶.̶ ̶

Please see the explanation provided by GenoMax below. That is the right answer to your second question. Apologies for my mistake.


I hope this answers your question!

António


"this is because the sequence-by-synthesis process is an enzymatic process that usually is worse in the reverse (3' to 5') direction."

This is not true. Synthesis always happens in the 5' --> 3' direction for both ends.
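A small sketch may make this concrete. In a simplified model of paired-end sequencing, read 1 is the start of one strand and read 2 is the start of the complementary strand, each written 5' --> 3', so read 2 comes out as the reverse complement of the far end of the fragment. The function names are hypothetical, and treating the strand given in the question as the one read 1 reports is an assumption.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, written 5' --> 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def paired_end_reads(fragment, read_length):
    """Simplified model: both reads are the 5' start of their own strand."""
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2

fragment = "GACTACACGGGTATCTAATCCCGTTCGCTCCCCTGGCTTTCGCGCCTCAG"
read1, read2 = paired_end_reads(fragment, 10)
print(read1)  # GACTACACGG
print(read2)  # CTGAGGCGCG
```

Reverse-complementing read 2 gives back the last ten bases of the fragment as written, which is why the reverse read looks "backwards" relative to the forward strand even though it was synthesized 5' --> 3'.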

What is true is that reagents, even though they are kept at 4-6 °C, still degrade over time. While this has become less of an issue of late, it used to be more prominent in the past, when runs went on for ~7 days (original GAII). Clusters get fatter over time and the software starts having problems keeping track of them, so Q scores start to suffer. There is therefore a general trend of lower Q values later in the run. Even this has been addressed to a large extent with the use of patterned flowcells and better software.
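One way to look for this trend in your own data is to average the quality score at each cycle (base position) across reads. The sketch below assumes Phred+33 quality strings; the function name and the three example strings are made up for illustration.

```python
def mean_quality_per_cycle(quality_strings, offset=33):
    """Mean Phred score at each cycle; reads may differ in length."""
    totals, counts = [], []
    for qual in quality_strings:
        for i, ch in enumerate(qual):
            if i == len(totals):
                # First read that reaches this cycle: start a new column.
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]

# Made-up qualities: 'I' is Q40, 'F' is Q37, '5' is Q20 under Phred+33.
quals = ["IIIIF5", "IIIFF5", "IIII55"]
per_cycle = mean_quality_per_cycle(quals)
print(per_cycle)
```

Running this separately on the read 1 and read 2 files would show whether the drop is concentrated in the later cycles of read 2.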


Thank you @GenoMax for the correction.

I always thought that the synthesis of the reverse reads was worse due to the enzymatic nature of the process, although I know that phasing can be an issue too.

I'm really amazed that the synthesis of the reverse read is done 5' --> 3'. Do you have or know of any further text/paper/document describing this that you could share, please? I would be very interested in reading more about that.

Even when I look at videos on YouTube about the Illumina sequencing process, the way they color the adapters and demonstrate the whole process suggests that the reverse reads are read in the 3' --> 5' direction (at least that is my interpretation): https://www.youtube.com/watch?v=fCd6B5HRaZ8. For instance, if the direction is always 5' --> 3', the SBS process in the video should be inverted for the reverse read, right? Starting from the bottom (attached to the flowcell) to the top of the read?

I also found this blog post stating the same as you: https://www.cureffi.org/2012/12/19/forward-and-reverse-reads-in-paired-end-sequencing/.

I'll edit the post then.


That is definitely a clear answer regarding the second reads. Yes, I mean the reverse reads, because I used paired-end sequencing.

Thank you antonioggsousa
