Question

Confusion about PANDAseq assembly output

6.4 years ago

xioli2013 ▴ 10

Hi,

I have MiSeq data: paired-end reads, each about 300 bp long. The barcode is trnL(c)/UAA(h).

The first step I am attempting is to use PANDAseq to assemble the paired-end reads.

This is the command line:

pandaseq -f lane1-s001-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-V-1_S1_L001_R1_001.fastq -r lane1-s001-indexN716-B-S502-B-ACTCGCTA-CTCTCTAT-V-1_S1_L001_R2_001.fastq -o 50 -F -N -A simple_bayesian > test.fastq

I was expecting:

[forward primer][barcode][reverse primer]

However, I noticed that the assembled sequences looked like this:

[forward primer][some sequence][forward primer][some sequence]

and the reverse primer could not be found.

I am not sure how sequences of this form are generated, and I hope you can shed some light on it.

forward read: https://drive.google.com/open?id=1WlY0mNUgqemqHAiasxyhmQpbPPZXIQig
reverse read: https://drive.google.com/open?id=1-LhK3lS7hr7eB4fvdnVc02y-q1z_MFI0
output: https://drive.google.com/open?id=1keIISY-rPQEXynBiT_1ve2XWAdNKx12c

xp

pandaseq Miseq • 3.0k views

ADD COMMENT • link updated 6.4 years ago by Kevin Blighe 87k • written 6.4 years ago by xioli2013 ▴ 10

Can you post some command output? Were there any warnings or errors returned in the logs?

Where exactly are you looking when you see: [forward primer][some sequence][forward primer][some sequence] ?

Your results file, i.e., the assembled sequences, is test.fastq

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Hi Kevin, I just added the links to the reads. Hope you can take a look at them.

ADD REPLY • link 6.4 years ago by xioli2013 ▴ 10

I experimented with BBMerge and PANDAseq, both with and without trimming by Trimmomatic 0.36.

The number of raw reads in R1 is 194543.

No trimming:

$BBMerge in1=V1_R1.fastq in2=V1_R2.fastq out=bbmap_notrim_merged.fq outu=bbmap_notrim_unmerged.fq \
adapters=NexteraPE-PE.fa ihist=ihist_notrim.txt ecct extend2=20 iterations=5 k=62

BBMerge generated a merged FASTQ with 175003 reads, about a 10% loss. The sequences begin with the trnL(c) primer, whereas no UAA(h) primer could be found.

pandaseq -f V1_R1.fastq -r V1_R2.fastq -A simple_bayesian -l 100 -N -F -t 0.8 -w pandaseq_merged.fq

PANDAseq generated an assembled FASTQ with 14238 reads, about a 93% loss, and I observed the pattern [trnL(c) primer][some sequence][trnL(c) primer][some sequence].

With trimming:

java -jar $Trim PE V1_R1.fastq V1_R2.fastq V1_R1_paired.fastq V1_R1_unpaired.fastq V1_R2_paired.fastq V1_R2_unpaired.fastq \
ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100

$BBMerge in1=V1_R1_paired.fastq in2=V1_R2_paired.fastq out=bbmap_notrim_merged.fq outu=bbmap_notrim_unmerged.fq \
adapters=NexteraPE-PE.fa ihist=ihist_notrim.txt ecct extend2=20 iterations=5 k=62

BBMerge generated 162668 merged reads, about a 16% loss. The merged sequences begin with the trnL(c) primer, whereas no UAA(h) primer could be found.

java -jar $Trim PE V1_R1.fastq V1_R2.fastq V1_R1_paired.fastq V1_R1_unpaired.fastq V1_R2_paired.fastq V1_R2_unpaired.fastq \
ILLUMINACLIP:NexteraPE-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:100

pandaseq -f V1_R1_paired.fastq -r V1_R2_paired.fastq -A simple_bayesian -l 100 -N -F -t 0.8 -w pandaseq_merged_trim.fq

PANDAseq generated 162242 reads, about a 16% loss. The merged sequences begin with the trnL(c) primer, whereas no UAA(h) primer could be found.

I suspect that the low quality of the reads affects PANDAseq's matching algorithm more strongly.
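The loss percentages quoted above can be re-derived from the reported read counts. The following is a minimal Python sketch of that arithmetic only; the labels and the helper name `percent_loss` are made up for illustration, and the numbers are the ones given in this post. Note that for the trimmed runs the loss is measured against the raw read count, so it also includes pairs dropped by Trimmomatic.

```python
# Read counts as reported in the post; nothing here is measured independently.
RAW_READS = 194543

merged_counts = {
    "BBMerge, no trimming": 175003,
    "PANDAseq, no trimming": 14238,
    "BBMerge, after Trimmomatic": 162668,
    "PANDAseq, after Trimmomatic": 162242,
}


def percent_loss(raw: int, kept: int) -> float:
    """Percentage of raw read pairs that are missing from the merged output."""
    return 100.0 * (raw - kept) / raw


# Loss per run, rounded to one decimal place.
losses = {
    label: round(percent_loss(RAW_READS, kept), 1)
    for label, kept in merged_counts.items()
}

for label, loss in losses.items():
    print(f"{label}: {loss}% of raw pairs lost")
```

The untrimmed PANDAseq run stands out: it loses roughly nine in ten pairs, while the other three runs lose between about 10% and 17%.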

xp

ADD REPLY • link 6.4 years ago by xioli2013 ▴ 10

Hi Xiao, yes, I noticed the huge read loss (after PANDAseq) just by looking at the sizes of your files.

Did you also look at general quality using FastQC? It also runs on Java and can help to identify systematic problems with your reads.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Hi Kevin,

Here are the FastQC reports (Forward QC, Reverse QC). The quality at the 3' end is not good, as is normally the case for NGS.

What do you think is the recommended practice here for analyzing metabarcoding data: assemble first or trim first?

Thanks for your help

Xiao

ADD REPLY • link 6.4 years ago by xioli2013 ▴ 10

Hi Xiao,

The quality of the reads is very poor at the 3' end. You definitely need to trim these reads prior to using PANDAseq. To ensure high-quality reads, you could use the following Trimmomatic steps:

LEADING:20
TRAILING:20
SLIDINGWINDOW:4:30
MINLEN:50

However, the best parameters will be decided through experimentation.
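As a rough illustration of why SLIDINGWINDOW:4:30 is stricter than the SLIDINGWINDOW:4:20 used earlier in the thread, here is a simplified Python sketch of sliding-window trimming followed by a MINLEN check. It is not Trimmomatic's actual implementation, which may choose a slightly different cut position; the function name `sliding_window_trim` and the quality profile are hypothetical.

```python
def sliding_window_trim(quals, window=4, min_mean=30):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    mean quality is below min_mean. Simplified sketch, not Trimmomatic's code.
    """
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean:
            return start
    return len(quals)


# Hypothetical 100 bp read: 60 good bases (Q38), then a poor 3' tail (Q20).
quals = [38] * 60 + [20] * 40

keep_strict = sliding_window_trim(quals, window=4, min_mean=30)
keep_lenient = sliding_window_trim(quals, window=4, min_mean=20)

MINLEN = 50  # reads shorter than this after trimming would be discarded
survives_strict = keep_strict >= MINLEN

print("bases kept with 4:30:", keep_strict)
print("bases kept with 4:20:", keep_lenient)
print("passes MINLEN:50 after 4:30:", survives_strict)
```

In this toy profile the 4:20 window never triggers, because a Q20 tail has a mean of exactly 20, so the whole low-quality tail is passed on to the merger. The 4:30 window removes it.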

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

Hi Kevin, thanks for the reply. One more question: Should the output of pandaseq assembly look like [forward primer][sequence][reverse primer] without trimming the primers?

ADD REPLY • link 6.4 years ago by xioli2013 ▴ 10

Hi Xiao, yes, that is what the output should be, as per Figure 1 of the published work:

[source: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-31]

Try again with the higher quality reads (after filtering with Trimmomatic), and see what happens. Also, check that the orientation of the reads is correct and that the insert size (expected gap between matching pairs) is not too large.
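One way to check orientation is to inspect the primer layout of a few assembled reads. A merged read is normally reported in the orientation of the forward read, so the reverse primer is expected at the 3' end as its reverse complement, and a search for the literal reverse primer would find nothing even in a correct assembly. The Python sketch below shows such a check under that assumption; the primer sequences are made-up placeholders (not the real trnL c/h primers) and `primer_layout` is a hypothetical helper.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_layout(merged: str, fwd_primer: str, rev_primer: str) -> dict:
    """Summarize where the primers occur in one merged read (exact matches only)."""
    return {
        "starts_with_forward": merged.startswith(fwd_primer),
        "ends_with_reverse_rc": merged.endswith(reverse_complement(rev_primer)),
        "contains_reverse_literal": rev_primer in merged,
        "forward_primer_copies": merged.count(fwd_primer),
    }


# Placeholder primers and insert, for illustration only.
FWD = "GGATCCAA"
REV = "TTCAGGCA"
INSERT = "ACTGACTGAC"

# Expected layout: [forward primer][sequence][reverse primer, reverse-complemented]
expected_read = FWD + INSERT + reverse_complement(REV)
# Layout described in the question: [forward primer][seq][forward primer][seq]
odd_read = FWD + INSERT + FWD + INSERT

expected_layout = primer_layout(expected_read, FWD, REV)
odd_layout = primer_layout(odd_read, FWD, REV)

print(expected_layout)
print(odd_layout)
```

Real reads contain sequencing errors, so an exact string match like this will undercount; a tool that tolerates mismatches would be more appropriate for a full data set.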

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k