3.7 years ago

Hi everyone!

I have two FASTQ files containing the forward and reverse reads, respectively, from 54 samples. I want to demultiplex the reads into their respective samples based on the 4-nucleotide barcodes present on each forward and reverse read. I have multiple samples that may contain the same forward and reverse barcodes, but every sample has unique barcodes when the forward and reverse barcodes are combined. Is there an easy way to separate each sample's forward and reverse reads into two different files? Any help will be appreciated.

Thank you


I have multiple samples that may contain the same forward and reverse barcodes, but every sample has unique barcodes when the forward and reverse barcodes are combined.

Can you provide an example of this?

I would recommend doing some quality control first, such as removing low-quality sequences. Then merge the read pairs and demultiplex based on the barcode for each sample.


I used to do it the way you suggest. This time, I am trying to use the forward and reverse reads of each sample separately, without merging them, to find OTUs with CD-HIT-OTU-MiSeq. Regarding the example: one sample has the F1R1 barcode combination while another has F1R2, i.e. the same forward barcode but a different combination. I hope that makes it clear.

Thank you


Have you looked into QIIME?


I am a QIIME user, but I don't think QIIME has an option to do what I want. QIIME can demultiplex merged reads based on barcodes. But I want to demultiplex before merging. Do you have any ideas? Thank you
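For illustration, demultiplexing on the barcode pair before merging could be sketched in plain Python roughly as follows. This is only a sketch under several assumptions that are not stated in the thread: the barcodes sit at the very start of each read, they must match exactly, both files list the reads in the same order, and the files are uncompressed. The function names, the output file naming, and the `unassigned` bin are all made up for the example.

```python
from contextlib import ExitStack

def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ file."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:          # empty header line means end of file
            return
        yield tuple(record)

def demultiplex(r1_path, r2_path, barcode_pairs, barcode_len=4, out_dir="."):
    """Split paired reads by the (forward, reverse) barcode at the start of each read.

    barcode_pairs maps (forward_barcode, reverse_barcode) -> sample name.
    Pairs that match no sample go to the hypothetical "unassigned" bin.
    Returns the number of read pairs written per sample.
    """
    counts = {}
    with ExitStack() as stack:
        r1 = stack.enter_context(open(r1_path))
        r2 = stack.enter_context(open(r2_path))
        outputs = {}               # sample name -> (R1 handle, R2 handle)
        for fwd, rev in zip(read_fastq(r1), read_fastq(r2)):
            key = (fwd[1][:barcode_len], rev[1][:barcode_len])
            sample = barcode_pairs.get(key, "unassigned")
            if sample not in outputs:
                outputs[sample] = (
                    stack.enter_context(open(f"{out_dir}/{sample}_R1.fastq", "w")),
                    stack.enter_context(open(f"{out_dir}/{sample}_R2.fastq", "w")),
                )
            for rec, out in zip((fwd, rev), outputs[sample]):
                # Trim the barcode from both the sequence and the quality string
                out.write(f"{rec[0]}\n{rec[1][barcode_len:]}\n{rec[2]}\n{rec[3][barcode_len:]}\n")
            counts[sample] = counts.get(sample, 0) + 1
    return counts
```

Calling `demultiplex("R1.fastq", "R2.fastq", {("ACGT", "TTAG"): "sampleA", ("ACGT", "GGCA"): "sampleB"})` with placeholder barcodes would write `sampleA_R1.fastq`, `sampleA_R2.fastq` and so on. Two samples can share a forward barcode here because the lookup key is the pair, not either barcode alone.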


So dual indexing, right?


Same problem but on a NextSeq with 6bp barcodes. All of these reads got dumped into "Undetermined_S0_L00X_RX_001.fastq.gz"

How do I fix this?


Have you checked the index sequences (which you can find in the FASTQ headers of that Undetermined file)? People often think they know what the index sequences should be, but what you see in this file is what the sequencer actually saw during the run.
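One way to look at those index sequences is to tally the last colon-separated field of each header line. The sketch below assumes the usual Illumina header layout, where the header ends in something like `1:N:0:ACGTAC` (or `ACGTAC+TTGCAA` for two indexes). The function name and the `max_reads` cap are arbitrary choices for the example.

```python
import gzip
from collections import Counter

def count_indexes(fastq_gz_path, max_reads=1_000_000):
    """Tally the index field at the end of each FASTQ header line."""
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:           # headers are every 4th line
                continue
            if line_number // 4 >= max_reads:  # stop after max_reads records
                break
            # Assumed layout: "@... 1:N:0:INDEX"; keep what follows the last colon
            counts[line.rstrip().rsplit(":", 1)[-1]] += 1
    return counts
```

Printing `count_indexes(path).most_common(20)` for the Undetermined file should show the most frequent indexes the sequencer actually read, which can then be compared against the sample sheet.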

A common mistake people make when creating the SampleSheet.csv file, which is used for demultiplexing, is to provide the wrong indexes, provide them reverse-complemented, or (in a 2D run) switch the i7 and i5 indexes.
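To check which of these mistakes applies, an index pair from the sample sheet can be compared against an observed index under each transformation. This is a minimal sketch that assumes a two-index run where the observed index is written as `i7+i5`. The function names and labels are invented for the example, and the sequences in the usage note are placeholders.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return seq.translate(COMPLEMENT)[::-1]

def explain_mismatch(expected_i7, expected_i5, observed):
    """List which transformations of the expected pair equal the observed 'i7+i5' string."""
    obs_i7, obs_i5 = observed.split("+")
    candidates = {
        "as entered": (expected_i7, expected_i5),
        "i5 reverse-complemented": (expected_i7, reverse_complement(expected_i5)),
        "i7 reverse-complemented": (reverse_complement(expected_i7), expected_i5),
        "both reverse-complemented": (reverse_complement(expected_i7),
                                      reverse_complement(expected_i5)),
        "i7 and i5 swapped": (expected_i5, expected_i7),
    }
    return [label for label, pair in candidates.items() if pair == (obs_i7, obs_i5)]
```

For example, `explain_mismatch("ATCACG", "TTAGGC", "ATCACG+GCCTAA")` returns `["i5 reverse-complemented"]`, which would suggest entering the i5 index in the other orientation in the sample sheet.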

Once you identify a cause, it should be easy to edit the SampleSheet.csv file and then re-run the demultiplexing (on instrument, BaseSpace or using bcl2fastq).

Note: Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.