Someone in my lab recently got some PacBio sequencing done on a single amplicon, and we're trying to figure out how to even look at the data. The company we used did some of the work for us and generated a CCS-filtered FASTQ file. They also gave us a few .bam files. We just don't know what to do with the data, and there doesn't seem to be much documentation on how to handle amplicon data. They say you can use IGV for their data, but IGV seems to be struggling on our Mac laptops. I think we have a newer version of Java that IGV doesn't like.
What kind of analysis are you planning to do? If you have CCS-filtered FASTQ files, you can use those for downstream analysis. Aligning them with minimap2 is one option.
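For what it's worth, a minimap2 alignment of CCS reads could look roughly like the sketch below. The function name align_ccs, the default output name and the DRY_RUN switch are made up for illustration. The map-hifi preset needs a reasonably recent minimap2; on older versions another preset would have to be substituted.

```shell
# Sketch only: align_ccs, the default output name and DRY_RUN are
# hypothetical. Paths containing spaces are not handled.
align_ccs() {
  ref=$1; reads=$2; out=${3:-ccs_aligned.bam}
  # Align CCS reads, sort the alignments, then index the sorted BAM.
  cmd="minimap2 -ax map-hifi $ref $reads | samtools sort -o $out - && samtools index $out"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$cmd"   # only show what would be run
  else
    eval "$cmd"
  fi
}
```

Calling it with DRY_RUN=1 set prints the command line without running anything, which is handy for checking paths first.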
Here's another issue we have: the organism we are working with has gaps in its reference genome, and about 75% of this amplicon falls in unmapped territory. We know the start and the end of the sequence, but we don't know if we can align it to the reference genome with such a wide "grey" area.
What I do with long-read amplicon data, where the high coverage and the high error rates are the main problems:
1. Downsample (reduce) the number of reads so you can actually view them in a genome browser, as you might have 10000x+ coverage: samtools view -b -s 0.01 x_100k.fa_s.bam > x_100k.fa_s_one_percent.bam (note -b, so that the output really is BAM rather than SAM)
2. Call SNVs - use the PacBio pipeline if possible and installed, ...... otherwise.....
3. Self-correct the data, e.g. with the assembler Canu; however, this will remove any reliable phasing information
4. Realign the corrected data, e.g. with minimap2
5. Call SNVs and indels, e.g. with FreeBayes, BBMap callvariants or Strelka
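The steps above could be strung together roughly as follows. This is only a sketch under several assumptions: the reads are first aligned with minimap2 to get a BAM to downsample, all file names, the amplicon prefix and the genomeSize default are placeholders, and the run helper with its DRY_RUN switch is hypothetical. Canu 2.x takes -pacbio for raw reads; older releases used -pacbio-raw instead.

```shell
# Sketch only: every file name, the genomeSize default and the DRY_RUN
# helper are placeholders. Paths containing spaces are not handled.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    eval "$*"
  fi
}

# usage: amplicon_snvs READS REF [FRACTION] [GENOME_SIZE]
amplicon_snvs() {
  reads=$1; ref=$2; frac=${3:-0.01}; gsize=${4:-5k}
  # Align the raw reads, then keep a fraction of them for the genome browser.
  run "minimap2 -ax map-pb $ref $reads | samtools sort -o raw.bam -" || return 1
  run "samtools view -b -s $frac raw.bam > raw_subsample.bam" || return 1
  run "samtools index raw_subsample.bam" || return 1
  # Self-correct the reads with Canu (correction stage only).
  run "canu -correct -p amplicon -d canu_out genomeSize=$gsize -pacbio $reads" || return 1
  # Realign the corrected reads.
  run "minimap2 -ax map-pb $ref canu_out/amplicon.correctedReads.fasta.gz | samtools sort -o corrected.bam -" || return 1
  run "samtools index corrected.bam" || return 1
  # Call SNVs and indels on the corrected alignments.
  run "freebayes -f $ref corrected.bam > variants.vcf" || return 1
}
```

Setting DRY_RUN=1 before calling amplicon_snvs reads.fastq ref.fa lists the seven commands without executing them, so the paths and the genomeSize guess can be checked before committing to a Canu run.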
If you have enough coverage (which you should), you can try an assembly first to create your own reference.
Hmm, that doesn't sound too bad. Any suggestions on software to use for the assembly?