Question

Illumina sequencing dual index query

0

4.2 years ago

prasundutta87 ▴ 660

Hi,

In Illumina sequencing where dual indexes are used (i5 and i7), the i7 indexing read contains a 9 bp molecular tag (UMI), written as Ns, in addition to the unique 8 bp sample index. N can be any base. I am aware that when bcl2fastq is used for demultiplexing, both the i7 and i5 index sequences are given as parameters, and it is advised not to add the Ns after the i7 sequence. Since N can be any base, what happens to the Ns when the index/barcode is chopped off the read and put at the end of the read name?

sequencing • 3.1k views

ADD COMMENT • link 4.2 years ago by prasundutta87 ▴ 660

0

AFAIK, UMIs are trimmed and transferred to the read header.

ADD REPLY • link 4.2 years ago by GenoMax 141k

0

Not really, I just have 'i7 index'+'i5 index' at the end of the read header, and the i7 index is just 8 bp long. I don't understand why sequencing protocols write the UMI as Ns. Isn't the whole point that the UMI should be known once sequencing is done? Is there any document/website where I can find out whether there is a historical reason for this?

ADD REPLY • link 4.2 years ago by prasundutta87 ▴ 660

0

Please provide some reproducible examples, be it screenshots or a read example.

ADD REPLY • link 4.2 years ago by ATpoint 81k

0

So, basically, the i7 index looks like this: TACTAGTANNNNNNNNN, and the i5 index looks like this: GATCGACA

My read header looks like this:

@______:__:_________-_____:_:____:_____:____ _:_:_:TACTAGTA+GATCGACA

Please let me know if this information is enough.

As I understand it, N can be any base. Why does a sequencing protocol have such Ns? What is the purpose of a UMI if its bases are not known?

Secondly, bcl2fastq does not allow Ns to be added along with the i7 index. Where does the NNNNNNNNN go? Is this something basic I should know about?

ADD REPLY • link 4.2 years ago by prasundutta87 ▴ 660

0

These are not standard Illumina i7 indexes, correct? Are they from a different provider? bcl2fastq can only deal with UMIs that are part of read 1 and read 2 (not index reads).

ADD REPLY • link 4.2 years ago by GenoMax 141k

0

I just made them up to illustrate the concept. But, to answer your question, they are from a different provider, although the sequencing was done on an Illumina machine. Still, shouldn't the UMIs be known? Should I ask the protocol provider for them directly?

ADD REPLY • link 4.2 years ago by prasundutta87 ▴ 660

1

Take a look at these adapters from IDT that do have UMIs. These would not be processed by bcl2fastq, since it can only deal with UMIs that are in-line. See the answer in this thread for how data using IDT adapters would be processed: bcl2fastq with xGen Dual Index UMI Adapters to produce 3 read and 2 index fastqs
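On what the Ns mean: in an adapter layout they are placeholders for random bases, and the sequencer reads real bases at those cycles, so every UMI is known after sequencing even though it cannot be listed in advance. A minimal Python sketch of that idea, assuming the made-up 8 bp index + 9 bp UMI layout from this thread (the function name and the example read are hypothetical and are not part of bcl2fastq or any vendor tool):

```python
def split_index_read(index_read, layout="TACTAGTANNNNNNNNN"):
    """Split an observed i7 index read into (sample_index, umi).

    `layout` is the adapter design: fixed bases are the sample index,
    N positions are the random UMI bases.
    """
    if len(index_read) != len(layout):
        raise ValueError("index read length does not match the layout")
    # Collect the observed bases at fixed positions and at N positions separately.
    sample_index = "".join(b for b, l in zip(index_read, layout) if l != "N")
    umi = "".join(b for b, l in zip(index_read, layout) if l == "N")
    return sample_index, umi


# Hypothetical observed read: 8 bp sample index followed by 9 random bases.
sample_index, umi = split_index_read("TACTAGTAACGTTGCAG")
print(sample_index, umi)
```

Only the fixed 8 bp part is used to assign the read to a sample; the 9 observed bases are what later distinguishes PCR duplicates from independent molecules.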

If you need them added to the fastq header, then you will need to do some additional work: How to append the cell barcode and UMI information to the fastq header in paired-end single-cell RNA-seq data? (and a couple of others)
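A rough sketch of that additional work in plain Python, assuming the UMI bases have already been written to their own FASTQ in the same read order as read 1. The helper names, the `_` separator, and the example records are all assumptions for illustration; check what your deduplication tool expects before settling on a header format.

```python
from itertools import islice


def fastq_records(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of FASTQ lines."""
    it = (line.rstrip("\n") for line in lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def append_umi(read_lines, umi_lines, sep="_"):
    """Yield read records whose read ID has the matching UMI appended."""
    for read, umi in zip(fastq_records(read_lines), fastq_records(umi_lines)):
        header, seq, plus, qual = read
        read_id, _, comment = header.partition(" ")
        # Both files must list the same reads in the same order.
        if read_id != umi[0].partition(" ")[0]:
            raise ValueError("read and UMI files are out of sync")
        new_header = read_id + sep + umi[1]
        if comment:
            new_header += " " + comment
        yield new_header, seq, plus, qual


# Made-up records; in practice these would be lines from opened FASTQ files.
reads = ["@r1 1:N:0:TACTAGTA+GATCGACA", "ACGTACGT", "+", "IIIIIIII"]
umis = ["@r1 2:N:0:TACTAGTA+GATCGACA", "ACGTTGCAG", "+", "IIIIIIIII"]
tagged = list(append_umi(reads, umis))
print(tagged[0][0])
```

The UMI goes onto the read ID rather than the comment field because many aligners drop everything after the first space in the header.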

ADD REPLY • link 4.2 years ago by GenoMax 141k

0

Thanks a lot, let me go through these documents and pages. Will come back here in case of any doubts.

ADD REPLY • link 4.2 years ago by prasundutta87 ▴ 660