How important is the second column in an NGS (Illumina) read name for an assembler?
6.2 years ago
avneeshbt • 0

Hello everyone, could someone please explain, or point me to a resource that explains, exactly how the second column of an NGS read name line (i.e. the line starting with "@") is used by downstream software such as assemblers or mapping tools? Thanks

Assembly next-gen sequencing • 998 views

What exactly do you mean by "second column"?

Given a read identifier like this

@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1

Are you referring to the 1:N:18:1 part or to the 136 (assuming ':' as the column separator)?

In the former case, this field carries the read pair information (1:N:18:1 = read 1, 2:N:18:1 = read 2; see lieven.sterck's answer). In the latter case, it is the run ID and does not matter for downstream analysis. Also keep in mind that when you upload your reads to e.g. SRA, the read names are replaced with a header much shorter than the original one.
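To make the two "columns" concrete, here is a minimal Python sketch that splits an Illumina-style (Casava 1.8+) header into its parts. The function and field names are my own and purely illustrative; the field meanings follow the usual Illumina layout (instrument, run ID, flowcell, lane, tile, x, y, then read number, filter flag, control bits, index).

```python
from typing import NamedTuple, Optional


class IlluminaHeader(NamedTuple):
    # First column (before the space), split on ':'
    instrument: str
    run_id: str
    flowcell: str
    lane: str
    tile: str
    x: str
    y: str
    # Second column (after the space); None when the header has no second column
    read: Optional[str] = None
    is_filtered: Optional[str] = None
    control: Optional[str] = None
    index: Optional[str] = None


def parse_header(line: str) -> IlluminaHeader:
    """Split one FASTQ '@' line into its colon-separated fields (illustrative helper)."""
    line = line.rstrip("\n")
    if line.startswith("@"):
        line = line[1:]
    name, _, comment = line.partition(" ")
    name_fields = name.split(":")
    if len(name_fields) != 7:
        raise ValueError(f"expected 7 fields in the read name, got {len(name_fields)}")
    comment_fields = comment.split(":") if comment else []
    if comment_fields and len(comment_fields) != 4:
        raise ValueError(f"expected 4 fields in the second column, got {len(comment_fields)}")
    return IlluminaHeader(*name_fields, *comment_fields)


header = parse_header("@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1")
print(header.run_id)  # the '136' part
print(header.read)    # the leading '1' of '1:N:18:1', i.e. read 1 of the pair
```

Headers that were renamed (for example by SRA) will not have seven colon-separated fields, so this sketch will reject them rather than guess.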


I am referring to 1:N:18:1. We already tell the program which file is P1 and which is P2 for paired-end data, so why is this still required? For example, with SPAdes, if I use read files without the 1:N:18:2 column, it gives an error, even if I specify P1 and P2 as separate files.


Mainly because any cleverly implemented software will double-check that what you enter on the command line is actually true of the data. Also, how else would it know which read is left and which one is right?
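As a rough idea of what such a double check could look like, here is a Python sketch that compares the headers of two FASTQ files record by record. This is only an illustration and not how SPAdes or any other tool actually does it; the function names are made up, and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
from itertools import zip_longest


def header_lines(fastq_lines):
    """Yield the header (every 4th line) of a four-line-per-record FASTQ."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:
            yield line.rstrip("\n")


def check_pairing(r1_lines, r2_lines):
    """Return a list of problems found when comparing R1 and R2 headers."""
    problems = []
    pairs = zip_longest(header_lines(r1_lines), header_lines(r2_lines))
    for n, (h1, h2) in enumerate(pairs, start=1):
        if h1 is None or h2 is None:
            problems.append(f"record {n}: the two files have different numbers of reads")
            break
        name1, _, comment1 = h1.partition(" ")
        name2, _, comment2 = h2.partition(" ")
        # The first column must be identical for both mates
        if name1 != name2:
            problems.append(f"record {n}: read names differ ({name1} vs {name2})")
        # Only check the read number when both headers carry a second column
        elif comment1 and comment2:
            read1 = comment1.split(":")[0]
            read2 = comment2.split(":")[0]
            if (read1, read2) != ("1", "2"):
                problems.append(f"record {n}: expected reads 1 and 2, got {read1} and {read2}")
    return problems


r1 = ["@EAS139:136:FC706VJ:2:2104:15343:197393 1:N:18:1", "ACGT", "+", "IIII"]
r2 = ["@EAS139:136:FC706VJ:2:2104:15343:197393 2:N:18:1", "TGCA", "+", "IIII"]
print(check_pairing(r1, r2))  # an empty list means no problems were found
```

With real files you would pass open file handles instead of lists, e.g. check_pairing(open("reads_1.fastq"), open("reads_2.fastq")), where the file names are placeholders.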


From the order of files specified, but you're right.


That is indeed strange, especially given that some public data, e.g. from SRA, no longer contains the second column. What kind of error are you getting? It might also be possible to add /1 and /2 to the read names in order to reconstruct the pair information.
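If you want to try the /1 and /2 route, something like the following Python sketch could do the renaming. The helper name is made up, it assumes four-line FASTQ records, and I have not checked whether a particular assembler accepts the result.

```python
def add_pair_suffix(fastq_lines, read_number):
    """Yield FASTQ lines with '/<read_number>' appended to each read name."""
    suffix = f"/{read_number}"
    for i, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if i % 4 == 0:  # header line of a four-line record
            # Only the first whitespace-delimited token is the read name
            name, sep, rest = line.partition(" ")
            if not name.endswith(suffix):
                name += suffix
            line = name + sep + rest
        yield line


record = ["@SRR000001.1", "ACGT", "+", "IIII"]
for out_line in add_pair_suffix(record, 1):
    print(out_line)
```

Run it once with read_number=1 on the first file and once with read_number=2 on the second; the accession in the example is just a placeholder.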

6.2 years ago

I would say the main use of those name lines is to link the pairs in paired-end or mate-pair data. I don't think any tool uses the name as such.
