Hi,
I have MiSeq data: paired-end reads of about 300 bp each; the barcode is trnL(c)/UAA(h).
The first step I am attempting is to use pandaseq to assemble the paired-end reads.
This is the command line:
pandaseq -f lane1-s001-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-V-1_S1_L001_R1_001.fastq -r lane1-s001-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-V-1_S1_L001_R2_001.fastq -o 50 -F -N -A simple_bayesian > test.fastq
I was expecting:
[forward primer][barcode][reverse primer]
However, I noticed that the assembled sequences looked like this:
[forward primer][some sequence][forward primer][some sequence]
and there was no match to the reverse primer.
I am not sure how sequences of this form are generated, and I hope you can shed some light on it.
forward read: https://drive.google.com/open?id=1WlY0mNUgqemqHAiasxyhmQpbPPZXIQig
reverse read: https://drive.google.com/open?id=1-LhK3lS7hr7eB4fvdnVc02y-q1z_MFI0
output: https://drive.google.com/open?id=1keIISY-rPQEXynBiT_1ve2XWAdNKx12c
xp
Can you post some command output? Were there any warnings or errors returned in the logs?
Where exactly are you looking when you see the following?
[forward primer][some sequence][forward primer][some sequence]
Your results file, i.e., the assembled sequences, is test.fastq.
Hi Kevin, I just added the links to the reads. Hope you can take a look at them.
I experimented with bbmerge and pandaseq, both with and without trimming (Trimmomatic 0.36).
The number of raw reads in R1 is 194543.
No trimming:
$BBMerge in1=V1_R1.fastq in2=V1_R2.fastq out=bbmap_notrim_merged.fq outu=bbmap_notrim_unmerged.fq \
adapters=NexteraPE-PE.fa ihist=ihist_notrim.txt ecct extend2=20 iterations=5 k=62
bbmerge generated a merged FASTQ with 175003 reads, about 10% loss. The sequences begin with the trnL(c) primer, whereas no UAA(h) primer could be found.
pandaseq -f V1_R1.fastq -r V1_R2.fastq -A simple_bayesian -l 100 -N -F -t 0.8 -w pandaseq_merged.fq
pandaseq generated a merged FASTQ with 14238 reads, about 93% loss, and I observed the [trnL(c) primer][some sequence][trnL(c) primer][some sequence] pattern.
With trimming:
java -jar $Trim PE V1_R1.fastq V1_R2.fastq V1_R1_paired.fastq V1_R1_unpaired.fastq V1_R2_paired.fastq V1_R2_unpaired.fastq \
ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100
$BBMerge in1=V1_R1_paired.fastq in2=V1_R2_paired.fastq out=bbmap_notrim_merged.fq outu=bbmap_notrim_unmerged.fq \
adapters=NexteraPE-PE.fa ihist=ihist_notrim.txt ecct extend2=20 iterations=5 k=62
bbmerge generated 162668 merged reads, about 16% loss. The merged sequences begin with the trnL(c) primer, whereas no UAA(h) primer could be found.
java -jar $Trim PE V1_R1.fastq V1_R2.fastq V1_R1_paired.fastq V1_R1_unpaired.fastq V1_R2_paired.fastq V1_R2_unpaired.fastq \
ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100
pandaseq -f V1_R1_paired.fastq -r V1_R2_paired.fastq -A simple_bayesian -l 100 -N -F -t 0.8 -w pandaseq_merged_trim.fq
pandaseq generated 162242 reads, so about 16% loss. The merged sequences begin with the trnL(c) primer, whereas no UAA(h) primer could be found.
I suspect that the low quality of the reads affects pandaseq's matching algorithm more strongly than it affects bbmerge's.
xp
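The loss percentages quoted above can be recomputed from read counts. The following is a minimal sketch, assuming uncompressed FASTQ files with the standard four lines per record; the helper names count_reads and pct_loss are made up for illustration and are not part of any of the tools mentioned.

```shell
#!/usr/bin/env bash
# Count reads in an uncompressed FASTQ file, assuming four lines per
# record (no wrapped sequence or quality lines).
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Percentage of reads lost between a raw count and a retained count.
pct_loss() {
    awk -v raw="$1" -v kept="$2" \
        'BEGIN { printf "%.1f\n", 100 * (raw - kept) / raw }'
}

# Counts reported in this thread (194543 raw reads in R1).
pct_loss 194543 175003   # bbmerge, no trimming
pct_loss 194543 14238    # pandaseq, no trimming
```

In practice the two counts would come from the files themselves, e.g. pct_loss "$(count_reads V1_R1.fastq)" "$(count_reads pandaseq_merged.fq)".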
Hi Xiao, yes, I noticed the huge read loss (after PANDAseq) by just looking at the file sizes of your files.
Did you also look at general quality using FastQC? It also runs on Java and can help to identify systematic problems with your reads.
Hi Kevin,
Here are the FastQC reports. The quality at the 3' end is not good, as it normally is for NGS: Forward QC, Reverse QC
What do you think would be the best practice here for analyzing metabarcoding data? Assemble first or trim first?
Thanks for your help
Xiao
Hi Xiao,
The quality of the reads is very poor at the 3' end. You definitely need to trim these reads prior to using PANDAseq. To ensure high quality reads, you could use (with trimmomatic):
However, the best parameters will be decided through experimentation.
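One way to organize that experimentation is to sweep a single parameter and compare the surviving read counts afterwards. The sketch below only prints the commands (a dry run). All Trimmomatic settings are copied from the commands earlier in this thread; the sliding-window cutoffs, the jar path default, and the q-prefixed output names are assumptions chosen for illustration, not recommended values.

```shell
#!/usr/bin/env bash
# Path to the Trimmomatic jar; an assumption, adjust to your installation.
Trim="${Trim:-trimmomatic-0.36.jar}"

# Print (not run) a Trimmomatic PE command for one sliding-window
# quality cutoff. Output file names are hypothetical.
trim_cmd() {
    local q="$1"
    echo "java -jar $Trim PE V1_R1.fastq V1_R2.fastq" \
         "q${q}_R1_paired.fastq q${q}_R1_unpaired.fastq" \
         "q${q}_R2_paired.fastq q${q}_R2_unpaired.fastq" \
         "ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true" \
         "LEADING:3 TRAILING:3 SLIDINGWINDOW:4:${q} MINLEN:100"
}

# Dry run: one command per cutoff; the cutoffs are illustrative only.
for q in 15 20 25 30; do
    trim_cmd "$q"
done
```

Each printed command can be run as-is, and the resulting paired files passed to pandaseq to see which cutoff retains the most merged reads.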
Hi Kevin, thanks for the reply. One more question: Should the output of pandaseq assembly look like [forward primer][sequence][reverse primer] without trimming the primers?
Hi Xiao, yes, that is what the output should be, as per Figure 1 of the published work:
[source: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-31]
Try again with the higher quality reads (after filtering with Trimmomatic), and see what happens. Also, check that the orientation of the reads is correct and that the insert size (expected gap between matching pairs) is not too large.
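When checking orientation, keep in mind that a merged read is written in the orientation of the forward read, so the reverse primer is expected near the 3' end as its reverse complement, not as it is written on the primer order sheet. A minimal sketch for searching both forms is below. The primer string and file name are hypothetical placeholders, and the helpers revcomp and count_motif are made up for illustration; exact matching is used, so primers with mismatches or IUPAC ambiguity codes would be missed.

```shell
#!/usr/bin/env bash
# Reverse-complement a DNA sequence (A/C/G/T/N only; other IUPAC
# ambiguity codes are not handled in this sketch).
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTNacgtn' 'TGCANtgcan'
}

# Count reads whose sequence line contains a motif (exact match).
# Assumes four-line FASTQ records: sequences sit on lines 2, 6, 10, ...
count_motif() {
    local motif="$1" fastq="$2"
    awk -v m="$motif" 'NR % 4 == 2 && index($0, m) { n++ } END { print n + 0 }' "$fastq"
}

# Hypothetical placeholder: replace with the actual UAA(h) primer sequence.
REV_PRIMER="ACCTGA"
# Hypothetical file name: any merged FASTQ output.
MERGED="merged.fastq"

if [ -f "$MERGED" ]; then
    echo "as written:         $(count_motif "$REV_PRIMER" "$MERGED")"
    echo "reverse complement: $(count_motif "$(revcomp "$REV_PRIMER")" "$MERGED")"
fi
```

If neither form is found in the merged reads, searching the raw R2 file for the primer as written can show whether it is present before merging at all.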