Question

PacBio amplicon sequencing

0

6.1 years ago

williamsbrian5064 ▴ 510

Hi,

Someone in my lab recently had PacBio sequencing done on a single amplicon, and we're trying to figure out how to even look at the data. The company we used did some of the work for us and generated a CCS-filtered FASTQ file. They also gave us .bam files. We just don't know what to do with the data, and there doesn't seem to be much documentation on handling amplicon data. They say you can use IGV for their data, but IGV seems to be struggling on our Mac laptops. I think we have a newer version of Java that IGV doesn't like.

genome next-gen sequencing assembly • 1.5k views

ADD COMMENT • link updated 6.1 years ago by colindaven 6.4k • written 6.1 years ago by williamsbrian5064 ▴ 510

0

What kind of analysis are you planning to do? If you have CCS-filtered FASTQ files, you can use those for downstream analysis. Aligning them with minimap2 is one option.
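A minimal sketch of that alignment route, assuming minimap2 and samtools are installed. The `run` and `align_ccs` helpers and all file names are hypothetical placeholders, not anything the sequencing provider supplies. The result is a sorted, indexed BAM, which is what genome browsers such as IGV expect.

```shell
#!/usr/bin/env bash
# Sketch: align CCS reads to a reference and produce a sorted, indexed BAM.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: align_ccs <reference.fa> <reads.fastq> <output_prefix>
align_ccs() {
    local ref="$1" reads="$2" prefix="$3"
    # map-pb is minimap2's PacBio preset; releases from 2.19 on also offer
    # map-hifi, which is aimed at CCS reads.
    run minimap2 -ax map-pb -o "${prefix}.sam" "$ref" "$reads"
    # Sort by coordinate, then index so a browser can load the file.
    run samtools sort -o "${prefix}.sorted.bam" "${prefix}.sam"
    run samtools index "${prefix}.sorted.bam"
}
```

For example, `DRY_RUN=1 align_ccs reference.fa ccs.fastq amplicon` prints the three commands so they can be checked before anything is run.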

ADD REPLY • link 6.1 years ago by GenoMax 142k

0

Here's another issue we have: the organism we are working with has gaps in its reference genome. This amplicon is about 75% unmapped territory. We know the start and the end of the sequence, but we don't know if we can align it to the reference genome with such a wide "grey" area.

ADD REPLY • link 6.1 years ago by williamsbrian5064 ▴ 510

1

If you have enough coverage (which you should), then you can try an assembly first to create your own reference.

ADD REPLY • link 6.1 years ago by GenoMax 142k

0

Hmm, that doesn't sound too bad. Any suggestions on software to use for the assembly?

ADD REPLY • link 6.1 years ago by williamsbrian5064 ▴ 510

score 0 · Answer 1 · 2018-04-19

What I do with long-read amplicon data (the high coverage and high error rates are the problem):

Downsample (reduce) the number of reads so you can actually view the data in a genome browser, as you might have 10,000x+ coverage: samtools view -b -s 0.01 x_100k.fa_s.bam > x_100k.fa_s_one_percent.bam
Call SNVs - use the PacBio pipeline if possible and installed, ...... otherwise.....
Self-correct the data, e.g. with the assembler Canu; however, this will remove any reliable phasing information.
Realign the corrected data, e.g. with minimap2.
Call SNVs and indels, e.g. with FreeBayes, BBMap callvariants, or Strelka.
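
The steps above could be strung together roughly as follows. This is only a sketch, assuming samtools, Canu, minimap2 and FreeBayes are installed. The `run` and `amplicon_pipeline` helpers, every file name, the seed, and the genomeSize value are placeholders chosen for illustration. The Canu options follow the 1.x command line and are likely to need tuning for short amplicons. The PacBio pipeline step is left out.

```shell
#!/usr/bin/env bash
# Sketch of the downsample / correct / realign / call workflow.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: amplicon_pipeline <reference.fa> <aligned.bam> <reads.fastq> <amplicon_size>
amplicon_pipeline() {
    local ref="$1" bam="$2" reads="$3" size="$4"

    # Keep about 1% of the alignments (seed 42) as BAM for viewing in a browser.
    run samtools view -b -s 42.01 -o sub.bam "$bam"
    run samtools index sub.bam

    # Self-correct the reads with Canu. Canu 1.x syntax; 2.x uses -pacbio
    # in place of -pacbio-raw. Phasing information is not reliable afterwards.
    run canu -correct -p amp -d canu_out "genomeSize=${size}" -pacbio-raw "$reads"

    # Realign the corrected reads, then sort and index the result.
    run minimap2 -ax map-pb -o corrected.sam "$ref" canu_out/amp.correctedReads.fasta.gz
    run samtools sort -o corrected.sorted.bam corrected.sam
    run samtools index corrected.sorted.bam

    # Call SNVs and indels with FreeBayes (the reference needs a .fai index).
    run samtools faidx "$ref"
    run freebayes -f "$ref" -v calls.vcf corrected.sorted.bam
}
```

Running `DRY_RUN=1 amplicon_pipeline reference.fa aligned.bam reads.fastq 5k` first lists the eight commands without touching any data, which makes it easy to adapt names and parameters before a real run.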