Question: How to process demultiplexed MiSeq paired-end (2x250 bp) 16S data with QIIME?
nkuyfq (China) wrote, 5.0 years ago:

I have sequenced several samples on an Illumina MiSeq, generating paired-end reads (2x250 bp) spanning the V3-V4 region of the 16S rDNA.

I want to analyze these samples with QIIME v1.8. The script `split_libraries_fastq.py` in QIIME demultiplexes and quality-filters raw FASTQ sequences, and it takes separate FASTQ files for the sequence reads and the barcode reads as input.

However, I only have the demultiplexed paired-end FASTQ files that the MiSeq produces by default when the run finishes; I don't have the barcode FASTQ files that QIIME requires. How can I use `split_libraries_fastq.py` to process these demultiplexed files?

Many thanks!

Do you mean that you don't have the barcodes used for the different samples at all? If so, there is not much you can do, as far as I understand.

(comment by tommivat, 5.0 years ago)

I have the barcode sequences for all samples, but not the separate barcode FASTQ files that QIIME requires.

The MiSeq has already demultiplexed all samples for me, so there is no need to use QIIME's demultiplexing at all. I only want to use the quality-filtering function of `split_libraries_fastq.py`.

(comment by nkuyfq, 5.0 years ago)

For `split_libraries_fastq.py`, just pass the flag `--barcode_type 'not-barcoded'`, and for the `-m` flag make a simple mapping file containing the linker primer sequence.

(comment by marina.v.yurieva, 5.0 years ago)

sheridan.christopher (Belgium) wrote, 5.0 years ago:

I have exactly the same problem as nkuyfq: demultiplexed FASTQ files (one per library) containing the reads, with the barcodes already removed, and no barcode files. Here is an outline of my mapping file:

#SampleID    BarcodeSequence    LinkerPrimerSequence   Description
ID1    ACGATACACT    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID1
ID2    AGTCAGACGC    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA   ID2
ID3    GCTGACAGAG    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID3
ID4    ATGTCATGCT    ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA   ID4

I tried running the following command (simplified for readability):

split_libraries_fastq.py -m /home/chris/Bioinformatics/Data_Analysis/Guam_2013/MiSeq/3_QIIME/Pdam_metadata.csv -i file1,file2,file3,file4 --barcode_type 'not-barcoded' --sample_ids ID1,ID2,ID3,ID4  -o /home/chris/Bioinformatics/Data_Analysis/Guam_2013/MiSeq/3_QIIME/3_split_libraries

However, I get an error message saying that there are errors in the mapping file, even though `validate_mapping_file.py` finds no errors when I re-check it.

Any ideas?

Thanks!

Chris

Hi sheridan! I think I have found a solution.

The mapping file (one per sample) should look like this:

#SampleID    BarcodeSequence    LinkerPrimerSequence   Description
ID1        ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA    ID1

Note that the 'BarcodeSequence' field is deliberately left empty. You can check such a mapping file with the following command (`-b` tells the validator that the data are not barcoded):

validate_mapping_file.py -o dir -m your_map_file -b
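
A minimal Python sketch for producing one such file per sample instead of editing them by hand. It assumes the primer from sheridan's mapping file above (swap in your own); the function name `write_unbarcoded_maps` and the output names `map1.txt`, `map2.txt`, ... are hypothetical choices that happen to match the command below. Each file is tab-delimited with an empty BarcodeSequence field, and can then be checked with `validate_mapping_file.py -b`.

```python
from pathlib import Path

# Primer copied from the mapping file earlier in this thread; replace with yours.
PRIMER = "ACACTGACGACATGGTTCTACAGTGCCAGCMGCCGCGGTAA"
HEADER = ["#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description"]


def write_unbarcoded_maps(sample_ids, primer=PRIMER, out_dir="."):
    """Write map1.txt, map2.txt, ... (one per sample) into out_dir.

    Each file is tab-delimited and has an empty BarcodeSequence field,
    following the per-sample layout suggested in this thread.
    Returns the written paths, in the same order as sample_ids.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, sample_id in enumerate(sample_ids, start=1):
        # "" leaves the BarcodeSequence column blank; Description reuses the ID.
        row = [sample_id, "", primer, sample_id]
        path = out / f"map{i}.txt"
        path.write_text("\t".join(HEADER) + "\n" + "\t".join(row) + "\n")
        paths.append(path)
    return paths
```

For example, `write_unbarcoded_maps(["ID1", "ID2", "ID3", "ID4"])` would write `map1.txt` to `map4.txt` in the current directory.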

Assuming you have four mapping files, you can process your demultiplexed FASTQ files like this:

split_libraries_fastq.py -i file1,file2,file3,file4 --sample_ids ID1,ID2,ID3,ID4 -o raw_fastq_qc/ -m map1.txt,map2.txt,map3.txt,map4.txt -q 19 --barcode_type 'not-barcoded'
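
With many samples it is easy to get the `-i`, `--sample_ids` and `-m` lists out of step, since each needs one entry per input file in the same order. A hypothetical helper (`build_split_libraries_cmd` is not part of QIIME) that assembles the same command from parallel Python lists, checks that their lengths agree, and rejects sample IDs containing commas or whitespace, which would break the comma-separated list. It only builds the argument list; actually running it with `subprocess.run(cmd, check=True)` assumes QIIME 1.8 is on your PATH.

```python
import shlex


def build_split_libraries_cmd(fastqs, sample_ids, map_files,
                              out_dir="raw_fastq_qc/", q=19):
    """Return the split_libraries_fastq.py argv used in this thread.

    fastqs, sample_ids and map_files must be parallel lists (one entry
    per demultiplexed FASTQ file, in the same order).
    """
    if not (len(fastqs) == len(sample_ids) == len(map_files)):
        raise ValueError("need one sample ID and one mapping file per FASTQ file")
    for sid in sample_ids:
        # IDs are joined with commas, so commas/whitespace would corrupt the list.
        if "," in sid or any(ch.isspace() for ch in sid):
            raise ValueError(f"sample ID {sid!r} must not contain commas or whitespace")
    return [
        "split_libraries_fastq.py",
        "-i", ",".join(str(f) for f in fastqs),
        "--sample_ids", ",".join(sample_ids),
        "-o", str(out_dir),
        "-m", ",".join(str(m) for m in map_files),
        "-q", str(q),
        "--barcode_type", "not-barcoded",
    ]


cmd = build_split_libraries_cmd(["file1", "file2"], ["ID1", "ID2"],
                                ["map1.txt", "map2.txt"])
print(shlex.join(cmd))  # shell-quoted form, for copy-pasting into a terminal
```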

QIIME should now be able to process your FASTQ files. If you run into any problems, please let me know. Thanks!

(reply by nkuyfq, 5.0 years ago)

montoya.oscar wrote, 3.3 years ago:

Hi all,

I know this is an old question, but many people run into this same mapping-file issue. To extend nkuyfq's answer, here is a page on the QIIME website explaining how to handle Illumina paired-end output:

http://qiime.org/tutorials/processing_illumina_data.html

tovia wrote, 3.3 years ago:

Hi all, I am following the same protocol discussed above. After running `split_libraries_fastq.py` the same way, I am looking at the seqs.fna output.

The FASTA headers look like this:

>S1,S2_0 M...
>S1,S2_1 M...

I used only two samples as a test, and every header of every FASTA sequence contains both sample IDs. Is this correct? Thanks, Tom
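
One way to check what happened is to tally the sample IDs in seqs.fna. A minimal sketch, assuming QIIME's usual header layout `>SampleID_N ...` (as in the example above), where the sample ID is everything before the last underscore of the first whitespace-delimited token. If the per-sample split worked you should see separate counts for S1 and S2; a single `S1,S2` entry means every read received the combined label.

```python
from collections import Counter


def reads_per_sample(fasta_path):
    """Count reads per sample ID in a QIIME seqs.fna file.

    Assumes headers like '>SampleID_N rest-of-header', where N is a running
    read index; the sample ID is the part before the last underscore.
    """
    counts = Counter()
    with open(fasta_path) as fh:
        for line in fh:
            if not line.startswith(">"):
                continue  # sequence line
            fields = line[1:].split()
            if fields:
                sample_id = fields[0].rsplit("_", 1)[0]
                counts[sample_id] += 1
    return counts
```

For instance, `reads_per_sample("raw_fastq_qc/seqs.fna")` would return the counts for the output directory used in nkuyfq's command.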
