How to deal with demultiplexed MiSeq paired-end (2*250bp) 16S data using QIIME?
6.5 years ago
nkuyfq ▴ 60

I have sequenced several samples on an Illumina MiSeq, generating paired-end reads (2*250bp) spanning the V3-V4 region of the 16S rDNA.

I want to analyze these samples with QIIME v1.8. The QIIME script split_libraries_fastq.py aims to demultiplex and quality-filter raw fastq sequences, taking separate fastq files for the sequence reads and the barcode reads as input.

However, at present I only have the demultiplexed paired-end fastq files that the MiSeq generated by default when sequencing completed, and I don't have the barcode fastq files required by QIIME. How can I use split_libraries_fastq.py to process my demultiplexed files?

Many thanks!

sequencing next-gen • 15k views

Do you mean you don't have the barcodes used for the different samples at all? In that case, as far as I understand, there is not much you can do.


I have the barcode sequences for all samples, but I don't have the so-called barcode fastq files required by QIIME.

The MiSeq instrument has already demultiplexed all samples for me, so there is no need to use QIIME's demultiplexing function at all. I only want to use the quality control function of split_libraries_fastq.py.


For split_libraries_fastq.py, just pass the flag --barcode_type 'not-barcoded', and for the -m flag make a simple mapping file with a linker primer sequence.

6.5 years ago

I have exactly the same problem as nkuyfq: demultiplexed fastq files (one per library) containing the reads (the barcodes have been removed), and no barcode files. Here is an outline of what my mapping file looks like.

ID1    ACGATACACT    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID1
ID2    AGTCAGACGC    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID2
ID3    GCTGACAGAG    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID3
ID4    ATGTCATGCT    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID4

I have tried to run the following command (simplified for ease of reading):

split_libraries_fastq.py -m /home/chris/Bioinformatics/Data_Analysis/Guam_2013/MiSeq/3_QIIME/Pdam_metadata.csv -i file1,file2,file3,file4 --barcode_type 'not-barcoded' --sample_ids ID1,ID2,ID3,ID4  -o /home/chris/Bioinformatics/Data_Analysis/Guam_2013/MiSeq/3_QIIME/3_split_libraries

However, I get an error message saying that there are errors in the mapping file. I re-checked the mapping file with validate_mapping_file.py and no errors were found.

Any ideas?

Thanks!

Chris


Hi, sheridan! I seem to have found a solution.

The mapping file (one per sample) should look like this:

ID1        ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID1

You may notice that the field corresponding to 'BarcodeSequence' is empty. That is intentional! You can use the following command to check this mapping file:

validate_mapping_file.py -o dir -m your_map_file -b
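
A minimal sketch of writing one such mapping file per sample, assuming the usual tab-separated QIIME 1.x header columns (#SampleID, BarcodeSequence, LinkerPrimerSequence, Description). The helper name `write_mapping_file`, the output file names, and the temporary directory are made up for illustration; the primer is the one shown in the mapping file line above.

```python
import os
import tempfile

# Assumed header: the standard QIIME 1.x mapping file columns.
HEADER = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"]


def write_mapping_file(path, sample_id, linker_primer):
    """Write a one-sample mapping file whose BarcodeSequence field is empty."""
    row = [sample_id, "", linker_primer, sample_id]
    with open(path, "w") as handle:
        handle.write("\t".join(HEADER) + "\n")
        handle.write("\t".join(row) + "\n")


primer = "ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA"
out_dir = tempfile.mkdtemp()  # hypothetical output location
map_paths = []
for sample_id in ["ID1", "ID2", "ID3", "ID4"]:
    path = os.path.join(out_dir, "map_%s.txt" % sample_id)
    write_mapping_file(path, sample_id, primer)
    map_paths.append(path)
```

Each generated file can then be checked with the validate_mapping_file.py command above before it is passed to split_libraries_fastq.py.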

Assuming you have four mapping files, you can then handle your demultiplexed fastq files like this:

split_libraries_fastq.py -i file1,file2,file3,file4 --sample_id ID1,ID2,ID3,ID4 -o raw_fastq_qc/ -m map1.txt,map2.txt,map3.txt,map4.txt -q 19 --barcode_type 'not-barcoded'

Now QIIME should be able to process your fastq files. If there are any bugs, please inform me. Thanks!
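
With many samples, typing the comma-separated lists by hand gets error-prone. Here is a rough sketch that assembles the same command from parallel lists. The flags are copied from the command above and have not been checked against any particular QIIME release; the function name `build_split_libraries_command` is made up for illustration.

```python
def build_split_libraries_command(fastq_files, sample_ids, map_files,
                                  out_dir, min_qual=19):
    """Build the argument list for the split_libraries_fastq.py call above."""
    if not (len(fastq_files) == len(sample_ids) == len(map_files)):
        raise ValueError("need one sample ID and one mapping file per fastq file")
    return [
        "split_libraries_fastq.py",
        "-i", ",".join(fastq_files),
        "--sample_id", ",".join(sample_ids),
        "-o", out_dir,
        "-m", ",".join(map_files),
        "-q", str(min_qual),
        "--barcode_type", "not-barcoded",
    ]


cmd = build_split_libraries_command(
    ["file1", "file2", "file3", "file4"],
    ["ID1", "ID2", "ID3", "ID4"],
    ["map1.txt", "map2.txt", "map3.txt", "map4.txt"],
    "raw_fastq_qc/",
)
print(" ".join(cmd))
```

The returned list can be handed to `subprocess.call(cmd)` on a machine where QIIME is installed; the length check catches the common mistake of listing fewer sample IDs than input files.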



4.9 years ago

Hi all,

I know this is an old question, but many people run into this same mapping file issue fairly often. This is a link to QIIME's support site explaining how to deal with Illumina paired-end output (just to extend nkuyfq's answer):

4.8 years ago
tovia • 0

Hi all, I am following the same protocol as discussed above. After running the same split_libraries_fastq.py command, I am looking at the seqs.fna output.

The headers of the fasta sequences have this annotation

>S1,S2_0 M...
>S1,S2_1 M...

I used only two samples as a test, and both sample IDs appear in the header of every FASTA sequence. Is this correct? Thanks, Tom
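
One way to check this is to tally the sample ID part of every header, assuming the `>SampleID_counter` label layout seen in the headers above. A small sketch follows; the function name `count_sample_ids` and the placeholder sequence lines are made up for illustration.

```python
from collections import Counter


def count_sample_ids(fasta_lines):
    """Count reads per sample ID, taken from headers like '>S1,S2_0 M...'."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if not fields:
            continue  # skip an empty header
        label = fields[0]                    # e.g. "S1,S2_0"
        sample_id = label.rsplit("_", 1)[0]  # drop the trailing read counter
        counts[sample_id] += 1
    return counts


# Headers as in the post; the sequence lines are placeholders.
example = [">S1,S2_0 M...", "ACGT", ">S1,S2_1 M...", "ACGT"]
counts = count_sample_ids(example)
print(dict(counts))
```

On a real file, pass the open file handle for seqs.fna instead of the example list. If the tally shows a single ID `S1,S2` rather than separate `S1` and `S2` entries, the comma-separated list was probably taken as one sample ID. That reading is a guess, not confirmed QIIME behaviour.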