Demultiplexing FASTQ without a barcode read FASTQ
7.5 years ago
tonja.r ▴ 600

It seems that I am missing something, so I will just describe my problem. I have paired-end Illumina reads in FASTQ format. In a .txt file I have the forward and reverse primer sequences and the tag for each experiment. I will attach an example file. The reads have the following format: tag-primer-fragment. I need to demultiplex the reads by experiment and remove the adapters, primers and experiment tags. There are two QIIME scripts that could do this:

split_libraries_fastq.py - but I do not have the barcode read FASTQ files.

demultiplex_fasta.py - it only works on FASTA, but I do not want to lose the quality information, since later I might want to filter by quality.

Is there any other way I could demultiplex without losing quality information?
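As an illustration of demultiplexing that keeps quality scores, here is a minimal sketch (not a QIIME script) assuming the layout described above: each read starts with an exact tag immediately followed by the primer, and `samples` is a hypothetical mapping from experiment name to (tag, primer) built from the .txt file. The key point is that the sequence and quality strings are trimmed by the same offset, so they stay aligned. The function names are hypothetical, and mismatches in the tag or primer are not tolerated.

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, TextIO, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream with exactly 4 lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def assign(seq: str, samples: Dict[str, Tuple[str, str]]) -> Optional[Tuple[str, int]]:
    """Return (experiment, prefix length) if the read starts with tag+primer exactly."""
    for name, (tag, primer) in samples.items():
        prefix = tag + primer
        if seq.startswith(prefix):
            return name, len(prefix)
    return None


def demultiplex(records: Iterable[Record],
                samples: Dict[str, Tuple[str, str]]) -> Dict[str, List[Record]]:
    """Bin records by experiment, trimming tag+primer from sequence AND quality."""
    bins: Dict[str, List[Record]] = defaultdict(list)
    for header, seq, qual in records:
        hit = assign(seq, samples)
        if hit is None:
            bins["unassigned"].append((header, seq, qual))
            continue
        name, cut = hit
        # Same offset on both strings keeps each base paired with its quality score.
        bins[name].append((header, seq[cut:], qual[cut:]))
    return dict(bins)


def write_fastq(records: Iterable[Record], handle: TextIO) -> None:
    """Write records back out in 4-line FASTQ format."""
    for header, seq, qual in records:
        handle.write(f"{header}\n{seq}\n+\n{qual}\n")


if __name__ == "__main__":
    # Hypothetical tags/primers; replace with the values from your .txt file.
    samples = {"exp1": ("ACGT", "GGTT"), "exp2": ("TGCA", "GGTT")}
    fq = io.StringIO("@r1\nACGTGGTTAAAA\n+\nIIIIIIIIABCD\n@r2\nTTTTTTTT\n+\nIIIIIIII\n")
    for name, recs in demultiplex(read_fastq(fq), samples).items():
        print(name, recs)
```

For real data you would pass an open file instead of `io.StringIO` (for gzipped input, `gzip.open(path, "rt")`) and write each bin to its own file with `write_fastq`. If one tag is a prefix of another, dictionary order decides the match, so check for that. For paired-end data the R1 and R2 files would have to be read in lockstep so both mates land in the same bin; that part is not shown here.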


If the tag is upstream of the sequencing primer, it would not be captured in the reads (unless I am missing something here). Perhaps the "primer" in your scheme is something other than the sequencing primer? Can you see the tags at the beginning of the reads?

If the construct is logically correct (and you do see the tags in the reads), then this thread may help: Count and location of strings in fastq file reads
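One way to answer the "can you see the tags?" question is to tally where each tag first occurs in each read. The sketch below assumes exact string matches and a FASTQ file with 4 lines per record; the tag names and sequences are hypothetical. Mostly position 0 suggests the tags really are at the read start; mostly -1 (not found) suggests they sit upstream of the sequencing primer and were never sequenced.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of each FASTQ record."""
    for line in islice(lines, 1, None, 4):
        yield line.rstrip("\n")


def tag_positions(seqs: Iterable[str], tags: Dict[str, str]) -> Dict[str, Counter]:
    """For each tag, count the 0-based offset of its first exact match per read (-1 = absent)."""
    positions = {name: Counter() for name in tags}
    for seq in seqs:
        for name, tag in tags.items():
            positions[name][seq.find(tag)] += 1
    return positions


if __name__ == "__main__":
    # Hypothetical usage: with open("reads_R1.fastq") as fh: seqs = fastq_sequences(fh)
    demo = ["@r1\n", "ACGTGG\n", "+\n", "IIIIII\n"]
    print(tag_positions(fastq_sequences(demo), {"exp1": "ACGT"}))
```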

7.5 years ago
charbo24 ▴ 40

Are you using barcode and tag interchangeably? So you have reads that are:

unknownbarcodesequence-amplificationprimer-fragment

If so, STACKS has a demultiplexing program (process_radtags) that will do what you want, but it needs a list of barcodes. Whoever did your library preps should have that list and know which experiment each barcode belongs to.

If that metadata is gone forever, you should still be able to recover the list of barcodes with a BASH script:

  1. Pull out everything from the start of each read (^) up to the primer sequence (awk should work for this)

  2. Sort the list and count each unique sequence ( sort | uniq -c )

That will give you every unique barcode, which will almost certainly be more barcodes than you used, because some will be sequencing errors. Any barcodes with only 1 or 2 reads associated with them are probably just errors and can be discarded; what's left is your likely barcode list.
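The same recovery procedure (take the prefix before the primer, count unique prefixes, drop rare ones) can be sketched in Python for anyone who prefers it to awk/sort/uniq. This assumes an exact, non-degenerate primer string and that the primer does not also occur inside a barcode; the function names and the `min_reads=3` default (which discards barcodes seen in only 1 or 2 reads, as suggested above) are illustrative choices, not a standard tool.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def candidate_barcodes(seqs: Iterable[str], primer: str) -> Counter:
    """Count the sequence preceding the first exact primer match in each read."""
    counts: Counter = Counter()
    for seq in seqs:
        idx = seq.find(primer)
        if idx > 0:  # primer found and at least one base precedes it
            counts[seq[:idx]] += 1
    return counts


def likely_barcodes(counts: Counter, min_reads: int = 3) -> List[Tuple[str, int]]:
    """Keep barcodes supported by at least min_reads reads, most frequent first."""
    return [(bc, n) for bc, n in counts.most_common() if n >= min_reads]


if __name__ == "__main__":
    # Hypothetical reads and primer, for illustration only.
    reads = ["ACGTGGTTAA", "ACGTGGTTCC", "ACGTGGTTGG", "ACGAGGTTAA"]
    print(likely_barcodes(candidate_barcodes(reads, "GGTT")))
```

Reads with a sequencing error inside the primer are silently skipped by the exact match, so the counts are a lower bound; the `min_reads` cutoff will likely need tuning to the run depth.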

Of course, that won't help you associate the barcodes with your experimental variable, but if I understand your question correctly, that is probably unrecoverable.
