Question: Illumina sequencing dual index query
3 months ago, prasundutta87 wrote:

Hi,

In an Illumina run that uses dual indexes (i5 and i7), the i7 index read contains a 9 bp molecular tag (UMI), written as 'N's, in addition to the unique 8 bp sample index. N can be any base. I am aware that when bcl2fastq is used for demultiplexing, both the i7 and i5 index sequences are supplied as parameters, and it is advised not to add the Ns after the i7 sequence. Since N can be any base, what happens to the Ns when the index/barcode is removed from the read and placed at the end of the read name?
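To make "N can be any base" concrete, here is a minimal Python sketch. It assumes the layout described in the question (an 8 bp sample index followed by a 9 bp UMI); the helper name `template_to_regex` and the example index reads are made up for illustration and do not come from any real kit or tool.

```python
import re

# Assumed layout from the question: 8 bp sample index + 9 bp UMI.
SAMPLE_INDEX_LEN = 8
UMI_LEN = 9

def template_to_regex(template: str) -> re.Pattern:
    """Turn an index template such as 'TACTAGTANNNNNNNNN' into a regex in
    which every N matches any single called base (A, C, G or T)."""
    return re.compile(
        "".join("[ACGT]" if base == "N" else base for base in template)
    )

# Number of distinct sequences that a 9 bp stretch of Ns can stand for.
possible_umis = 4 ** UMI_LEN
print(possible_umis)  # 262144

pattern = template_to_regex("TACTAGTA" + "N" * UMI_LEN)

# Made-up index reads: same sample index, different UMIs.
print(bool(pattern.fullmatch("TACTAGTAACGTTGCAA")))  # True
print(bool(pattern.fullmatch("TACTAGTAGGGTCATCC")))  # True
# Different sample index, so it does not match the template.
print(bool(pattern.fullmatch("GGCTAGTAACGTTGCAA")))  # False
```

In other words, the Ns in a protocol sheet are placeholders: the actual bases differ from molecule to molecule and only become known once each molecule's index read has been sequenced.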


AFAIK, UMIs are trimmed and transferred to the read header.

modified 3 months ago • written 3 months ago by genomax

Not really, I just have 'i7 index'+'i5 index' at the end of the read header, and the i7 index there is just 8 bp long. I don't understand why sequencing protocols write the UMI as Ns. Isn't the whole point that the UMI should be known once sequencing is done? Is there a document or website that explains whether there is a historical reason for this?

written 3 months ago by prasundutta87

Please provide some reproducible examples, be it screenshots or a read example.

written 3 months ago by ATpoint

So, basically, the i7 index looks like this: TACTAGTANNNNNNNNN, and the i5 index looks like this: GATCGACA

My read header looks like this:

@______:__:_________-_____:_:____:_____:____ _:_:_:TACTAGTA+GATCGACA

Please let me know if this information is enough.

As I understand it, N can be any base. Why does a sequencing protocol have such Ns? What is the purpose of a UMI if its bases are not known?

Secondly, bcl2fastq does not allow Ns to be added along with the i7 index. Where does the NNNNNNNNN go? Is it something basic I should know about?
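For reference, the index field at the end of such a header can be pulled apart with a few lines of Python. This is only a sketch: the function name `parse_index_field` is hypothetical, and the header below is a made-up Illumina-style example, since the real one above is masked.

```python
def parse_index_field(header: str) -> tuple[str, str]:
    """Return (i7, i5) from the last colon-separated field of a FASTQ header.

    Assumes the header ends in '<i7>+<i5>', as in the example above.
    """
    index_field = header.rstrip().rsplit(":", 1)[-1]
    i7, _, i5 = index_field.partition("+")
    return i7, i5

# Made-up header; only the trailing index field mirrors the post.
header = "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:TACTAGTA+GATCGACA"

i7, i5 = parse_index_field(header)
print(i7, len(i7))  # TACTAGTA 8
print(i5, len(i5))  # GATCGACA 8
```

Only the 8 bp sample index shows up for i7 here, which matches the observation that the 9 UMI bases are not carried into the header by default.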

written 3 months ago by prasundutta87

These are not standard Illumina i7 indexes, correct? Are these from a different provider? bcl2fastq can only deal with UMIs that are part of read 1 and read 2 (not index reads).

modified 3 months ago • written 3 months ago by genomax

I just made them up to illustrate the concept. To answer your question, though: they are from a different provider, but the sequencing was done on an Illumina machine. Still, shouldn't the UMIs be known? Should I ask the protocol provider for them directly?

modified 3 months ago • written 3 months ago by prasundutta87

Take a look at these adapters from IDT that do have UMIs. These would not be processed by bcl2fastq, since it can only deal with UMIs that are in-line. See the answer in this thread on how data generated with IDT adapters would be processed: "bcl2fastq with xGen Dual Index UMI Adapters to produce 3 read and 2 index fastqs"

If you need them added to the FASTQ header, you will need to do some additional work: see "How to append the cell barcode and UMI information to the fastq header in paired-end single-cell RNA-seq data?" (and a couple of others)
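As a rough idea of what that additional work could look like, here is a minimal Python sketch. It assumes you already have the full 17-base i7 index read for each record (8 bp sample index followed by a 9 bp UMI, as in the made-up example earlier in the thread) and that you want the UMI appended to the read ID with an underscore. The function names, the separator and the example header are all illustrative assumptions, so check what your downstream tool expects.

```python
def split_index_read(index_seq: str, sample_index_len: int = 8,
                     umi_len: int = 9) -> tuple[str, str]:
    """Split a full i7 index read into (sample_index, umi).

    Assumes the UMI directly follows the sample index.
    """
    expected = sample_index_len + umi_len
    if len(index_seq) != expected:
        raise ValueError(f"expected {expected} bases, got {len(index_seq)}")
    return index_seq[:sample_index_len], index_seq[sample_index_len:]

def append_umi_to_header(header: str, umi: str, sep: str = "_") -> str:
    """Append the UMI to the read ID (the part before the first space)."""
    read_id, space, comment = header.partition(" ")
    return f"{read_id}{sep}{umi}{space}{comment}"

# Made-up record: header of a read and its 17-base i7 index read.
header = "@INSTR:1:FLOWCELL:1:1101:1000:2000 1:N:0:TACTAGTA+GATCGACA"
index_read = "TACTAGTAACGTTGCAA"

sample_index, umi = split_index_read(index_read)
new_header = append_umi_to_header(header, umi)
print(sample_index)  # TACTAGTA
print(umi)           # ACGTTGCAA
print(new_header)
# @INSTR:1:FLOWCELL:1:1101:1000:2000_ACGTTGCAA 1:N:0:TACTAGTA+GATCGACA
```

Putting the UMI in the read ID rather than in the comment keeps it attached to the read even if a later tool drops everything after the first space.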

modified 3 months ago • written 3 months ago by genomax

Thanks a lot, let me go through these documents and pages. Will come back here in case of any doubts.

written 3 months ago by prasundutta87