Question: How can I find out an unknown Illumina barcode?
5.1 years ago by
samsara570
The Earth
samsara570 wrote:

I have a FASTQ file of undetermined reads from an Illumina HiSeq. The reads in the FASTQ file contain the Illumina barcode. I do not know which barcode was used in the lab, and there is no way I can find out. Is it possible to work it out from the FASTQ file I received? Are there any tools that show the most repeated sub-sequence across the reads of a FASTQ file?

[14-01-2014] Edit1: I have the Illumina base call files (raw data from the sequencer). I can convert the .bcl files to .fastq using bcl2fastq, provided by Illumina.

reads illumina fastq hiseq • 12k views
5.1 years ago by
BruceB320
Cambridge, UK
BruceB320 wrote:

Assuming the demultiplexing was carried out specifying the index sequence location, then this could be used:

grep ^@HISEQ lane1_Undetermined_L001_R1_001.fastq | cut -d":" -f10 | uniq -c | sort -nr > barcodes.txt

This writes a text file listing the barcodes found in the FASTQ headers, each with a count of how many times it was seen, with the most frequent at the top of the list. That could help you find the barcode.
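For anyone who prefers a script to a shell pipeline, here is a minimal Python sketch of the same idea. It assumes the index sequence is the last colon-separated field of each header line (as in the command above) and that the file has exactly four lines per record. The function name `count_barcodes` and the example records are made up for illustration.

```python
from collections import Counter


def count_barcodes(fastq_lines):
    """Count the last colon-separated field of every FASTQ header line."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        # Assumption: 4 lines per record, so headers are lines 0, 4, 8, ...
        # Using the position avoids mistaking a quality line that starts
        # with "@" for a header.
        if i % 4 == 0 and line.startswith("@"):
            counts[line.strip().rsplit(":", 1)[-1]] += 1
    return counts


# Made-up records, for illustration only.
example = [
    "@HISEQ:1:FLOWCELL:1:1101:1000:2000 1:N:0:ACGTAC\n", "ACGT\n", "+\n", "IIII\n",
    "@HISEQ:1:FLOWCELL:1:1101:1001:2001 1:N:0:TTGCAA\n", "ACGT\n", "+\n", "@III\n",
    "@HISEQ:1:FLOWCELL:1:1101:1002:2002 1:N:0:ACGTAC\n", "ACGT\n", "+\n", "IIII\n",
]

# Print barcodes from most to least frequent.
for barcode, n in count_barcodes(example).most_common():
    print(n, barcode)
```

For a real file, an open file handle can be passed in place of the list, since the function only iterates over lines. A Counter does not need sorted input, so the ordering of the reads does not matter here.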


I think you should sort before counting with uniq, because uniq -c only merges adjacent identical lines. Then sort again to put the most frequent at the top of the list:

grep ^@HISEQ lane1_Undetermined_L001_R1_001.fastq | cut -d":" -f10 | sort -r | uniq -c | sort -nrk1,1 > barcodes.txt

written 3.3 years ago by henryvuong
5.1 years ago by
IV1.2k
USA
IV1.2k wrote:

I think that you'd be thrilled to see what minion can do for you.

http://ebi.edu.au/ftp/contrib/enrightlab/kraken/reaper/src/reaper-13-100/doc/minion.html

You can run it as follows:

minion search-adapter -do $millions -show 10 -i $file

Where $millions is the number of reads you want to use for the sequence analysis and $file is the filename to analyze.

Cheers,

IV

5.1 years ago by
Steve Lianoglou5.0k
US
Steve Lianoglou5.0k wrote:

If it isn't a custom adapter/barcode, simply running FastQC on the FASTQ file might be all that you need.

The program is distributed with a "contaminant_list.txt" file that lists many of the known adapters from different library prep methods. FastQC looks for over-represented sequences in the FASTQ file. If they match any of the known adapters in that list, it will report the match. If there is no match, you will still see the over-represented sequence, which you might consider stripping out anyway (you may want to BLAST the unknown sequence against the reference genome first, to make sure you aren't removing any "signal" from your assay).


I ran FastQC to check for over-represented sequences before posting this question, but it did not report any.

written 5.1 years ago by samsara

minion is a lot more sensitive, especially if you use all the available sequences in the analysis.

For instance, if you have 44.3M sequences in your file, you can run:

minion search-adapter -do 44 -show 10 -i sequences.fastq

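FastQC only tracks sequences that first appear near the start of the file (the first 100,000 reads in recent versions, according to its documentation), so it can miss sequences that minion picks up. The only problem with minion is that afterwards you have to manually inspect the 10 over-represented sequences.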
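To make "over-represented sequence" concrete, here is a naive Python sketch that counts how many reads contain each k-mer and reports the most common ones. This is only an illustration of the general idea, not the algorithm used by minion or FastQC. The function name `top_kmers` and its parameters are hypothetical, and it assumes a plain four-line-per-record FASTQ.

```python
from collections import Counter
from itertools import islice


def top_kmers(fastq_lines, k=8, max_reads=100000, show=10):
    """Return the `show` k-mers that occur in the largest number of reads."""
    counts = Counter()
    # Assumption: 4 lines per record, so sequences are lines 1, 5, 9, ...
    sequences = islice(fastq_lines, 1, None, 4)
    for seq in islice(sequences, max_reads):
        seq = seq.strip()
        # Count each k-mer at most once per read, so a repeat inside a
        # single read cannot dominate the totals.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts.most_common(show)


# Made-up reads sharing the 6-mer ACGTAC, for illustration only.
example = [
    "@r1\n", "TTACGTACGG\n", "+\n", "IIIIIIIIII\n",
    "@r2\n", "GGACGTACTT\n", "+\n", "IIIIIIIIII\n",
    "@r3\n", "CCCCCCCCCC\n", "+\n", "IIIIIIIIII\n",
]

for kmer, n in top_kmers(example, k=6, show=3):
    print(n, kmer)
```

Memory use grows with the number of distinct k-mers, so on a real file it makes sense to keep `max_reads` modest. As with minion, the resulting candidates still have to be inspected by hand.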

written 5.1 years ago by IV
2.1 years ago by
elgart0
elgart0 wrote:

I found that for Nextera-type barcodes (where there are two different barcodes on Forward and Reverse reads which together identify the sample) none of the above solutions work. So here is my take on it (python 2.7 code without dependencies):


Are you sure? What does your undetermined FASTQ look like? You should be able to use the same grep-based command (the chosen answer) on all the different barcode configurations. For dual-index Nextera, they would come out as XXXXXXXX-XXXXXXXX (similar to single index, but with a dash).
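As a rough sketch of how the dual-index case could be tallied in Python: the code below takes header lines, reads the last colon-separated field, and splits it into two indexes. The dash separator is the one described above. Accepting "+" as well is an assumption on my part, since some newer demultiplexing output uses it. The function name `count_index_pairs` and the example headers are made up.

```python
from collections import Counter


def count_index_pairs(headers):
    """Count (index1, index2) pairs taken from FASTQ header lines."""
    counts = Counter()
    for header in headers:
        index_field = header.strip().rsplit(":", 1)[-1]
        # "-" is the separator described in the thread; "+" is an assumed
        # alternative for other software versions.
        for sep in ("-", "+"):
            if sep in index_field:
                index1, index2 = index_field.split(sep, 1)
                break
        else:
            # Single-index header: there is no second index.
            index1, index2 = index_field, None
        counts[(index1, index2)] += 1
    return counts


# Made-up headers, for illustration only.
example_headers = [
    "@HISEQ:1:FLOWCELL:1:1101:1000:2000 1:N:0:ACGTACGT-TTGGCCAA",
    "@HISEQ:1:FLOWCELL:1:1101:1001:2001 1:N:0:ACGTACGT-TTGGCCAA",
    "@HISEQ:1:FLOWCELL:1:1101:1002:2002 1:N:0:ACGTACGT+TTGGCCAA",
    "@HISEQ:1:FLOWCELL:1:1101:1003:2003 1:N:0:GGCCTTAA",
]

for (index1, index2), n in count_index_pairs(example_headers).most_common():
    print(n, index1, index2)
```

Keeping the two indexes as a pair makes it easy to see which combinations are common, rather than looking at each index on its own.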

modified 2.1 years ago • written 2.1 years ago by igor