Question: Illumina Read Names: /2 Vs. /3
Newvin340 wrote:

The sequencing core at my university performed paired-end RNA-seq on some of our lab's samples using Illumina sequencing technology. My understanding is that the forward and reverse read names are generally designated with trailing /1 and /2, e.g.

D5KHLFN1_0181:1:1101:1209:2028#0/1

D5KHLFN1_0181:1:1101:1209:2028#0/2

However, our results came back with /1 and /3 suffixes instead. The sequencing core claims this is an artifact of "tru-seq" sequencing. I was wondering if anyone could confirm this and/or elaborate.

(The main reason for my interest here is that certain de novo transcriptome assemblers actually require the "/2" rather than "/3", forcing me to do a sed search/replace)

Thanks!

ADD COMMENTlink written 8.2 years ago by Newvin340
Ido Tamir5.0k wrote:

On request, I am expanding on my previous answer, and I also want to push illumina2bam.

Illumina allows combining multiple libraries into one lane using multiplexing. It does this with an additional index read, which reads a short sequence within the adapter after the first read has been completed. The reads are numbered in the order they are sequenced, giving read1:index-read for single-end runs and read1:index-read:read2 for paired-end runs. So in a paired-end run with an index read, read2 becomes /3. It seems that in your case an indexed read was specified; maybe it was necessary for other lanes, or the wrong program was chosen.

An index read is superior to simply adding a short barcode at the beginning of the product because you have fewer problems with basecalling (normal complexity at the start of reads).
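The numbering described above can be sketched in a few lines of Python. This is only an illustration of the "reads are numbered in sequencing order" idea, not code from any Illumina tool; the function name `read_suffixes` and the labels are made up for this example.

```python
def read_suffixes(paired_end, indexed):
    """Map each read of a run to its /N suffix, in sequencing order.

    Hypothetical helper for illustration only: the index read, when
    present, is sequenced right after read 1 and takes the next number.
    """
    order = ["read1"]
    if indexed:
        order.append("index-read")
    if paired_end:
        order.append("read2")
    # Suffixes are simply 1-based positions in the sequencing order.
    return {label: f"/{n}" for n, label in enumerate(order, start=1)}
```

With `paired_end=True, indexed=True` the second mate lands on `/3`; without the index read it lands on the usual `/2`.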

I suggest that everyone involved in collecting data from the machine have a look at BAM as the primary output format instead of FASTQ, and maybe push for it:

  1. You have fewer problems with the scale of the quality values. This has changed 4 times by now.
  2. More important: all the provenance information is saved within the file, and if you have a correctly working pipeline set up (I am far from that :-( ), all programs save the transformations they apply to the data in the file. You know exactly what happened (which parameters, which version, etc.).

To my knowledge, two possibilities exist:

* [illumina2bam], which reads directly from the saved bcl files, and it's **easy** to use!
* Picard's [IlluminaBasecallsToSam], which I think starts from the qseq files.

In the case of illumina2bam there is a great pipeline that takes the basecalls and puts the index read into the tags of the read in the BAM file. Easy to parse, easy to split, merge, etc.
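As a rough sketch of what "easy to parse" can look like, here is a small Python function that pulls a barcode tag out of one SAM-format text line (for example, a line produced by `samtools view`). I am assuming the index read is stored in the `BC` tag, which is the tag the SAM specification reserves for barcode sequences; check which tag your pipeline actually writes. The function name `barcode_from_sam_line` is invented for this example.

```python
def barcode_from_sam_line(line, tag="BC"):
    """Return the value of an optional tag from one SAM text line, or None.

    Assumption: the index read sits in the BC tag (the SAM spec's barcode
    tag). Pass a different tag name if your pipeline uses another one.
    """
    fields = line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional TAG:TYPE:VALUE fields follow.
    for optional in fields[11:]:
        name, _type, value = optional.split(":", 2)
        if name == tag:
            return value
    return None
```

Once the barcode is available per read like this, splitting a lane by sample is a matter of grouping lines on the returned value.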

ADD COMMENTlink modified 8.2 years ago • written 8.2 years ago by Ido Tamir5.0k

I rewrote the answer. I had started with my agenda of promoting BAM.

ADD REPLYlink written 8.2 years ago by Ido Tamir5.0k

Errrrm ... I did not get that. Care to explain a bit more in-depth?

ADD REPLYlink written 8.2 years ago by Bach550

Thank you very much for that ... I learned something really new and valuable today.

Although I am not sure I like the BAM idea. FASTQ is nice because one can already do a lot of tricks on the command line with head, tail, sed, etc., and that is no longer possible with BAM.

ADD REPLYlink written 8.2 years ago by Bach550
User 124420 wrote:

I had the same issue. I simply used sed to convert my reads to /2 and moved on. Calling it a sequencing artifact is absurd; the real cause is explained better in the previous answer.
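For anyone who prefers a script over sed, the same rename can be sketched in Python. This is a minimal example, assuming plain four-line FASTQ records whose read names end in /3 as in the question; the function name `fix_mate_suffix` is made up here. It only touches the header line and the "+" line of each record, so a quality string that happens to end in "/3" is left alone.

```python
import re

# Matches a read-name suffix of /3 at the very end of a line.
_MATE_SUFFIX = re.compile(r"/3$")


def fix_mate_suffix(lines):
    """Yield FASTQ lines with a trailing /3 in read names changed to /2.

    Assumes strict four-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        text = line.rstrip("\n")
        position = i % 4
        is_header = position == 0 and text.startswith("@")
        is_separator = position == 2 and text.startswith("+")
        if is_header or is_separator:
            text = _MATE_SUFFIX.sub("/2", text)
        yield text + "\n"
```

To apply it to a file, open the input for reading and the output for writing, then pass the input file object to `fix_mate_suffix` and write out each yielded line.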

ADD COMMENTlink written 8.1 years ago by User 124420