Question

How important is the second column in an NGS (Illumina) read name for an assembler?

6.2 years ago

avneeshbt • 0

Hello everyone, could someone explain, or point me to a source that explains, how exactly the second column of an NGS read name line (i.e. the line starting with "@") is used by downstream software, be it an assembler or a mapping tool? Thanks

Assembly next-gen sequencing • 998 views

updated 6.2 years ago by lieven.sterck 15k • written 6.2 years ago by avneeshbt • 0

What exactly do you mean by "second column"?

Given a read identifier like this

@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1

Are you referring to the 1:N:18:1 part or to the 136 (assuming ':' as the column separator)?

In the former case, this field holds the read-pair information: its first element is the read number (1:N:18:1 = read 1, 2:N:18:1 = read 2; see lieven.sterck's answer). In the latter case, it is the run ID and does not matter for downstream analysis. Also consider that when you upload your reads to, e.g., SRA, the read names are replaced with a header much shorter than the original one.
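To make the two parts of the header concrete, here is a minimal Python sketch that splits a header like the one above into named fields. It assumes the Illumina CASAVA 1.8+ layout (instrument:run:flowcell:lane:tile:x:y, then read:filtered:control:index after the space). The names `IlluminaHeader` and `parse_header` are made up for illustration and do not come from any tool mentioned in this thread.

```python
from typing import NamedTuple


class IlluminaHeader(NamedTuple):
    # First column (before the space): identifies the cluster
    instrument: str
    run_id: str
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    # Second column (after the space): describes the read itself
    read: int          # 1 or 2, i.e. which mate of the pair
    is_filtered: bool  # "Y" means the read failed the filter
    control: int       # control bits, 0 if none are set
    barcode: str       # index field (sample number or index sequence)


def parse_header(line: str) -> IlluminaHeader:
    """Parse one '@...' FASTQ header line (assumed CASAVA 1.8+ layout)."""
    text = line.strip()
    if text.startswith("@"):
        text = text[1:]
    name, comment = text.split(maxsplit=1)
    instrument, run_id, flowcell, lane, tile, x, y = name.split(":")
    read, filtered, control, barcode = comment.split(":")
    return IlluminaHeader(
        instrument, run_id, flowcell, int(lane), int(tile), int(x), int(y),
        int(read), filtered == "Y", int(control), barcode,
    )


header = parse_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1")
print(header.run_id, header.read)
```

Headers in other layouts (older `/1` and `/2` suffixes, or renamed SRA reads) will not have these eleven fields, so this sketch would raise a `ValueError` on them.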

6.2 years ago by cschu181 ★ 2.8k

I am referring to 1:N:18:1. We already tell the program which file is P1 and which is P2 for paired-end data, so why is this still required? For example, with SPAdes, if I use read files without the 1:N:18:2 column, it gives an error, even if I specify P1 and P2 as separate files.

6.2 years ago by avneeshbt • 0

Mainly because any cleverly implemented software will double-check that what you enter on the command line actually matches the data. Also, how else would it know which read is left and which one is right?
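As an illustration of what such a double check could look like, here is a rough Python sketch that walks two FASTQ inputs in lockstep and compares read names and read numbers. It is not the check SPAdes or any other tool actually performs; `read_names`, `split_header` and `check_pairs` are hypothetical helper names, and the sketch assumes plain 4-line FASTQ records.

```python
from itertools import zip_longest


def read_names(lines):
    """Yield the header line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 0 and line.strip():
            yield line.strip()


def split_header(header):
    """Return (base_name, read_number), read_number being '1', '2' or None."""
    parts = header[1:].split()  # drop the leading '@'
    name = parts[0]
    if name.endswith(("/1", "/2")):          # old-style suffix
        return name[:-2], name[-1]
    if len(parts) > 1 and parts[1][:2] in ("1:", "2:"):  # second column
        return name, parts[1][0]
    return name, None


def check_pairs(lines1, lines2):
    """Return a list of human-readable problems (empty if the pairs agree)."""
    problems = []
    pairs = zip_longest(read_names(lines1), read_names(lines2))
    for n, (h1, h2) in enumerate(pairs, start=1):
        if h1 is None or h2 is None:
            problems.append(f"record {n}: files differ in number of reads")
            break
        (name1, r1), (name2, r2) = split_header(h1), split_header(h2)
        if name1 != name2:
            problems.append(f"record {n}: names differ ({name1} vs {name2})")
        elif (r1, r2) not in (("1", "2"), (None, None)):
            problems.append(f"record {n}: unexpected read numbers {r1}/{r2}")
    return problems
```

With real files you would pass open file handles, e.g. `check_pairs(open("reads_1.fastq"), open("reads_2.fastq"))`, where both file names are placeholders.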

6.2 years ago by lieven.sterck 15k

From the order of files specified, but you're right.

6.2 years ago by cschu181 ★ 2.8k

That is indeed strange, especially considering that some public data, e.g. from SRA, will no longer contain the second column. What kind of error are you getting? It might also be possible to add /1 and /2 to the read names in order to reconstruct the pair information.
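If you want to try the /1 and /2 route, a small Python sketch along these lines could do the renaming. It assumes plain 4-line FASTQ records; `add_pair_suffix` is a hypothetical helper, and whether a given assembler accepts the result is not something this thread confirms.

```python
def add_pair_suffix(lines, read_number):
    """Yield FASTQ lines with '/<read_number>' appended to each read name."""
    suffix = f"/{read_number}"
    for i, line in enumerate(lines):
        # Only the first line of every 4-line record is a header
        if i % 4 == 0 and line.startswith("@"):
            fields = line.rstrip("\n").split(maxsplit=1)
            name = fields[0]
            if not name.endswith(suffix):  # do not add the suffix twice
                name += suffix
            rest = " " + fields[1] if len(fields) > 1 else ""
            yield name + rest + "\n"
        else:
            yield line


# Example with an in-memory record; the read name is made up.
record = ["@SRR0000001.1 length=4\n", "ACGT\n", "+\n", "IIII\n"]
print("".join(add_pair_suffix(record, 1)), end="")
```

Run it once with `read_number=1` on the forward file and once with `read_number=2` on the reverse file, writing each result to a new file.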

6.2 years ago by cschu181 ★ 2.8k

score 0 · Answer 1 · 2018-03-09

6.2 years ago

lieven.sterck 15k

The main use of those name lines is to link the pairs in paired-end or mate-pair data, I would say. I don't think any tool uses the name as such.
