PacBio Pipeline and Tools for Variant Calling
12 months ago
Kiran ▴ 80

Hi,

I am new to long-read sequencing and am trying to call variants on the GIAB trio samples from PacBio data.

So far I have aligned the reads with pbmm2, called variants with DeepVariant 1.5, and phased them with WhatsHap.

My questions, coming from Illumina NextSeq/NovaSeq data, are as follows:

Which quality parameters should be taken care of (adapter trimming, read filtering, minimum data size)? How should adapters be trimmed, and which tools should be used for QC of FASTQ/BAM input from PacBio?

After WhatsHap, the phased variants are represented with "|" in the VCF. Does that mean I should filter out the variants that have "/" before downstream analysis and annotation? Are they bad-quality variants?

longread pacbio whatshap pbmm2 • 1.3k views
12 months ago

Which quality parameters should be taken care of (adapter trimming, read filtering, minimum data size)? How should adapters be trimmed, and which tools should be used for QC of FASTQ/BAM input from PacBio?

All of this happens on the instrument, upstream of the datasets you're using. If you want to characterize the datasets, we often look at insert length (read length) histograms and read quality (rq tag) histograms.
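As a rough illustration of the two histograms mentioned above, here is a minimal sketch in plain Python. It assumes the reads have been dumped to SAM text (for example with samtools view on the HiFi BAM) and that each record carries the PacBio rq:f: tag. The function name summarize_reads and the 5 kb bin width are hypothetical choices, not part of any PacBio tool.

```python
from collections import Counter


def summarize_reads(sam_lines, length_bin=5000):
    """Tally read lengths and rq values from SAM-formatted text lines.

    Assumption: input is SAM text, one record per line, with the read
    sequence in column 10 and an optional 'rq:f:<float>' tag.
    """
    length_hist = Counter()  # bin start (bp) -> number of reads
    rq_hist = Counter()      # rq rounded to 3 decimals -> number of reads
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        seq = fields[9]
        if seq != "*":
            length_hist[(len(seq) // length_bin) * length_bin] += 1
        for tag in fields[11:]:
            if tag.startswith("rq:f:"):
                rq_hist[round(float(tag[5:]), 3)] += 1
                break
    return length_hist, rq_hist
```

Running this on an unaligned BAM, or on primary alignments only, avoids counting hard-clipped supplementary records as short reads.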

After WhatsHap, the phased variants are represented with "|" in the VCF. Does that mean I should filter out the variants that have "/" before downstream analysis and annotation? Are they bad-quality variants?

You should not do any filtering on these GT separator characters. Phased genotypes use |, unphased genotypes use /. You can read more about this in the VCF specification.

Description of the GT field from VCF specification
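To make the separator convention concrete, here is a small sketch that splits a GT string into its alleles and reports whether it is phased. The helper name describe_gt is hypothetical; it only reflects the "|" versus "/" rule described above.

```python
def describe_gt(gt):
    """Split a VCF GT string into allele indices and a phased flag."""
    phased = "|" in gt                          # '|' marks a phased genotype
    alleles = gt.replace("|", "/").split("/")   # allele indices as strings
    return {"alleles": alleles, "phased": phased}
```

For example, describe_gt("0|1") and describe_gt("0/1") return the same alleles and differ only in the phased flag, which says nothing about call quality.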


Hi William, thank you for the explanation. From the VCF, should I omit variants with "/" and keep only the phased ones with "|" for further annotation and variant reporting? In variant calls from GATK HaplotypeCaller I don't see phased versus unphased variants; all the calls in the VCF are unphased "/". It is just a confusion on my part, and sorry if this is a very naive question: why is there a separate phasing step for PacBio variant calls from DeepVariant but not for GATK (Illumina), where everything is "/"?


There's no reason to omit unphased genotypes, in general. You'll notice that many of the unphased (/) genotypes are homozygous. These are still valid, high-quality genotypes; it's just that it isn't meaningful to assign them to a haplotype because they are present on both haplotypes.
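One way to check this on your own data is to tally genotypes by zygosity and phasing status. The sketch below reads uncompressed VCF text lines in plain Python; the function name tally_genotypes and the category labels are hypothetical, and it assumes the FORMAT column contains a GT key.

```python
from collections import Counter


def tally_genotypes(vcf_lines, sample_index=0):
    """Count (zygosity, phasing) pairs for one sample in VCF text lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        keys = fields[8].split(":")                  # FORMAT column
        values = fields[9 + sample_index].split(":")  # chosen sample column
        gt = dict(zip(keys, values)).get("GT", ".")
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            zygosity = "missing"
        elif len(set(alleles)) == 1:
            zygosity = "hom"   # e.g. 0/0 or 1/1
        else:
            zygosity = "het"   # e.g. 0/1, 0|1, 1|2
        phasing = "phased" if "|" in gt else "unphased"
        counts[(zygosity, phasing)] += 1
    return counts
```

If the explanation above holds for your VCF, most of the ("hom", "unphased") entries should be ordinary calls rather than low-quality ones.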

Unlike GATK HaplotypeCaller, there isn't a post-calling filter step applied to variant calls from DeepVariant. You'll see in the VCF that there are some sites with the RefCall FILTER. From the DeepVariant documentation:

RefCalls are candidates that were determined to match the reference and are therefore not variants, although they are included in the VCF file (see FILTER column)

You can potentially apply your own filters, based on QUAL, GQ, or other features, using bcftools filter, but phased/unphased doesn't reflect the quality in any way.
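As a sketch of what such a filter might look like, the function below keeps a record only if FILTER is PASS (which drops RefCall sites), QUAL meets a threshold, and the sample GQ meets a threshold. The name passes_filters and the cutoffs of 20 are hypothetical placeholders, not recommended values; it assumes a single-sample VCF with GQ in the FORMAT column. Note that the GT separator is never consulted.

```python
def passes_filters(vcf_line, min_qual=20.0, min_gq=20):
    """Return True if a single-sample VCF record passes simple quality cuts."""
    fields = vcf_line.rstrip("\n").split("\t")
    if fields[6] != "PASS":          # FILTER column; excludes RefCall sites
        return False
    if fields[5] == "." or float(fields[5]) < min_qual:  # QUAL column
        return False
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    gq = sample.get("GQ", ".")
    return gq != "." and int(gq) >= min_gq
```

With bcftools, a roughly similar selection could be written as bcftools view -f PASS followed by bcftools filter -i 'QUAL>=20 && FMT/GQ>=20', again with thresholds chosen for your own data.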


Thanks a lot, William. I am clear now.
