Question

How to deal with demultiplexed MiSeq paired-end (2 × 250 bp) 16S data using QIIME?

nkuyfq · 9.6 years ago

I have sequenced several samples on an Illumina MiSeq, generating paired-end reads (2 × 250 bp) spanning the V3-V4 region of the 16S rDNA.

I want to analyze these samples with QIIME v1.8. The QIIME script split_libraries_fastq.py demultiplexes and quality-filters raw FASTQ sequences, and it takes separate FASTQ files for the sequence reads and the barcode reads as input.

However, at present I only have the demultiplexed paired-end FASTQ files that the MiSeq generated by default when sequencing finished, and I don't have the barcode FASTQ files that QIIME requires. How can I run split_libraries_fastq.py on these demultiplexed files?

Many thanks!

Do you mean that you don't have the barcodes used for the different samples at all? In that case, as far as I understand, there is not much you can do.

tommivat · 9.6 years ago · last updated by Ram 2.3 years ago

I have the barcode sequences for all samples, but I don't have the so-called barcode FASTQ files that QIIME requires.

The MiSeq instrument has already demultiplexed all samples for me, so there is no need to use QIIME's demultiplexing function at all. I only want to use the quality-control function of `split_libraries_fastq.py`.

nkuyfq · 9.6 years ago · last updated by Ram 2.3 years ago
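When the reads are already split into one file per sample, split_libraries_fastq.py can be run with --barcode_type 'not-barcoded'. In that mode the files go to -i and one sample ID per file, in the same order, goes to --sample_ids. Below is a minimal bash sketch that assembles those two comma-separated lists. It assumes MiSeq Reporter style file names (<SampleName>_S<n>_L001_R1_001.fastq.gz) and uses only the R1 files. The helper names, map.txt and split_out are hypothetical. The sketch only prints the command so you can inspect it first, and it does not cover what to do with the R2 reads.

```shell
#!/usr/bin/env bash
# Sketch: list per-sample R1 FASTQ files and their sample IDs, then print a
# split_libraries_fastq.py command for already-demultiplexed data.
# ASSUMPTION: MiSeq Reporter style names, <SampleName>_S<n>_L001_R1_001.fastq[.gz];
# the part before "_S<n>" is used as the QIIME sample ID.

# Print "path<TAB>sample_id" for every R1 file in directory $1.
list_samples() {
  local f base
  for f in "$1"/*_R1_001.fastq*; do
    [ -e "$f" ] || continue            # glob matched nothing
    base=$(basename "$f")
    printf '%s\t%s\n' "$f" "${base%_S[0-9]*}"
  done
}

# Join paths and IDs into the comma-separated lists (same order, one ID per file).
# map.txt and split_out are placeholder names.
print_split_command() {
  local files ids
  files=$(list_samples "$1" | cut -f1 | paste -sd, -)
  ids=$(list_samples "$1" | cut -f2 | paste -sd, -)
  echo "split_libraries_fastq.py -i $files --sample_ids $ids" \
       "--barcode_type not-barcoded -m map.txt -o split_out"
}
```

For example, `print_split_command reads` (a hypothetical directory) prints one command covering every R1 file in that directory.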

Ram · Answer 1 · 2014-09-28

For split_libraries_fastq.py, just add the flag --barcode_type 'not-barcoded', and for the -m flag make a simple mapping file with a linker primer sequence.

marina.v.yurieva · 9.6 years ago · last updated by Ram 2.3 years ago
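Below is a minimal bash sketch of such a simple mapping file. It writes the usual QIIME 1 columns (#SampleID, BarcodeSequence, LinkerPrimerSequence, and Description as the last column), separated by tabs as QIIME 1 expects. The helper name, the sample IDs, the placeholder barcodes and map.txt are hypothetical. The primer CCTACGGGNGGCWGCAG (the forward primer of Illumina's 16S V3-V4 protocol) is only a stand-in for whatever primer you actually used.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal tab-delimited QIIME 1 mapping file.
# Usage: write_mapping_file OUT PRIMER SAMPLE:BARCODE [SAMPLE:BARCODE ...]
# Barcodes are still listed even though the reads are already demultiplexed;
# all IDs, barcodes and the output name below are placeholders.
write_mapping_file() {
  local out="$1" primer="$2" pair id
  shift 2
  printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tDescription\n' > "$out"
  for pair in "$@"; do
    id=${pair%%:*}                     # text before the first ":"
    printf '%s\t%s\t%s\t%s\n' "$id" "${pair#*:}" "$primer" "$id" >> "$out"
  done
}

# Example: two hypothetical samples, with the 341F V3-V4 primer as a stand-in.
write_mapping_file map.txt CCTACGGGNGGCWGCAG S1:ACGTACGTAC S2:TGCATGCATG
```

It is still worth running validate_mapping_file.py on the result before using it.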

Ram · Answer 2 · 2014-09-30

I have exactly the same problem as nkuyfq: demultiplexed FASTQ files (one per library) containing the reads (barcodes already removed), and no barcode files. Here is an outline of my mapping file:

#SampleID    BarcodeSequence    LinkerPrimerSequence                         Description
ID1          ACGATACACT         ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID1
ID2          AGTCAGACGC         ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID2
ID3          GCTGACAGAG         ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID3
ID4          ATGTCATGCT         ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID4

I have tried running the following command (simplified for readability):

split_libraries_fastq.py \
  -m /home/chris/Bioinformatics/Data_Analysis/Guam_2013/MiSeq/3_QIIME/Pdam_metadata.csv \
  -i file1,file2,file3,file4 \
  --barcode_type 'not-barcoded' \
  --sample_ids ID1,ID2,ID3,ID4  \
  -o /home/chris/Bioinformatics/Data_Analysis/Guam_2013/MiSeq/3_QIIME/3_split_libraries

However, I get an error message saying that there are errors in the mapping file. I re-checked the mapping file with validate_mapping_file.py, and no errors were found.

Any ideas?

Thanks!

Chris
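Since validate_mapping_file.py reported no problems, the thread leaves the cause open. The checks below are guesses worth ruling out, not a diagnosis. First, the command points to a file named Pdam_metadata.csv, while QIIME 1 mapping files are tab-delimited. Second, every ID passed to --sample_ids should presumably appear in the #SampleID column. Below is a minimal bash sketch of those checks. check_mapping is a hypothetical helper and does not replace validate_mapping_file.py.

```shell
#!/usr/bin/env bash
# Sketch: quick sanity checks on a QIIME 1 mapping file.
# Usage: check_mapping MAPPING_FILE ID1,ID2,...   (the --sample_ids value)
check_mapping() {
  local map="$1" ids="$2" id status=0
  # QIIME 1 mapping files are tab-delimited; a header without tabs suggests a CSV export.
  if ! head -n 1 "$map" | grep -q "$(printf '\t')"; then
    echo "header line contains no tab characters"; status=1
  fi
  # Carriage returns (Windows or old Mac line endings) may confuse line-based parsing.
  if grep -q "$(printf '\r')" "$map"; then
    echo "file contains carriage returns"; status=1
  fi
  # Every non-empty row should have as many tab-separated fields as the header.
  if ! awk -F'\t' 'NR == 1 { n = NF } NF > 0 && NF != n { bad = 1 } END { exit bad }' "$map"; then
    echo "rows differ in column count from the header"; status=1
  fi
  # Each ID given to --sample_ids should appear in the #SampleID column.
  for id in $(printf '%s' "$ids" | tr ',' ' '); do
    if ! cut -f1 "$map" | grep -qxF "$id"; then
      echo "sample ID not found in mapping file: $id"; status=1
    fi
  done
  return "$status"
}
```

For example: `check_mapping Pdam_metadata.csv ID1,ID2,ID3,ID4`. The function returns a non-zero status and prints one line per problem it finds.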

score 1 · Answer 3 · 2016-05-25

montoya.oscar · 7.9 years ago

Hi all,

I know this is an old question, but many people run into this same mapping-file issue. Here is a link to QIIME's documentation on processing Illumina paired-end output (just to extend nkuyfq's answer):

http://qiime.org/tutorials/processing_illumina_data.html

GenoMax · Answer 4 · 2016-06-13

Hi all, I am following the same protocol as discussed above. After running split_libraries_fastq.py the same way, I'm looking at the seqs.fna output.

The FASTA headers look like this:

>S1,S2_0 M...
>S1,S2_1 M...

I used only two samples as a test, and every header of every FASTA sequence contains both sample IDs. Is this correct? Thanks, Tom
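In the seqs.fna written by split_libraries_fastq.py, each header begins with the sample ID, an underscore and a running sequence number (as in >S1,S2_0). Tallying that prefix therefore shows which sample IDs were actually assigned. A label containing a comma, such as S1,S2, means the whole string was used as one sample ID rather than as two, which is probably not what was intended. Below is a minimal bash sketch of that tally. count_per_sample is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: count sequences per sample ID in a QIIME 1 seqs.fna file.
# Header format assumed: ">SampleID_N rest-of-line"; the "_N" counter is stripped.
count_per_sample() {
  awk '/^>/ {
         id = substr($1, 2)          # drop the leading ">"
         sub(/_[0-9]+$/, "", id)     # drop the trailing "_<sequence number>"
         n[id]++
       }
       END { for (id in n) print id "\t" n[id] }' "$1" | sort
}
```

For example, `count_per_sample split_out/seqs.fna` (hypothetical path) prints one "sample ID, tab, count" line per distinct ID.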