What are the "must-have" data types to extract from WGS .fq data?
17 months ago

I have multiple whole genomes in .fq format, prepared for variant calling. Rerunning the whole pipeline is quite expensive, so I wonder which data types are must-haves to extract in a single pass.

Right now I am planning to extract the following:

1. Small variants (SNPs and indels), with GATK HaplotypeCaller
2. Structural variants (<1000 bp long), with Manta
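The two items above end up as different kinds of VCF records. A minimal sketch of telling them apart from the REF, ALT and INFO columns follows. It is an illustrative heuristic only, not how GATK or Manta classify anything, and it assumes SV callers such as Manta mark their records with an `SVTYPE` INFO key; `parse_info` and `classify` are hypothetical helpers.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column such as 'SVTYPE=DEL;SVLEN=-60' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag keys (no '=') map to ''
    return fields


def classify(ref: str, alt: str, info: str) -> str:
    """Rough class of a VCF record (heuristic; only the first ALT allele is used)."""
    first_alt = alt.split(",")[0]
    svtype = parse_info(info).get("SVTYPE")
    if svtype:
        return "SV:" + svtype
    # Symbolic alleles (<DEL>) and breakend notation (G]17:198982]) are SVs too
    if first_alt.startswith("<") or "[" in first_alt or "]" in first_alt:
        return "SV:symbolic"
    if len(ref) == 1 and len(first_alt) == 1:
        return "SNP"
    if len(ref) == len(first_alt):
        return "MNP"
    return "indel"


if __name__ == "__main__":
    # Hypothetical records, for illustration only
    examples = [
        ("A", "G", "."),
        ("AT", "A", "DP=10"),
        ("G", "<DEL>", "SVTYPE=DEL;SVLEN=-25000"),
        ("G", "G]17:198982]", "SVTYPE=BND"),
    ]
    for ref, alt, info in examples:
        print(ref, alt, "->", classify(ref, alt, info))
```

Preferring `SVTYPE` over allele lengths means a caller's own label wins whenever it is present. The length rules are only a fallback for records without it.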

What other "must-haves" could I extract?


Why only <1000 bp?


It's the maximum SV length that Manta produced for my dataset (~100 bp reads).


Do you have a reference for that?


Nope, I measured the SV lengths manually after converting to PLINK format. Now that I think about it, there may be a flaw in my logic, because PLINK limits the length of allele names.
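The flaw is easy to see on a few records. Measuring the allele strings only works when the full sequence is written out. A symbolic ALT such as `<DEL>` is 5 characters long whatever the event size, and the real size sits in the `SVLEN` INFO tag (negative for deletions). The records below are hypothetical. They assume, as is common for SV callers, that small events carry explicit sequence and large ones a symbolic allele. The helper names are illustrative only.

```python
def info_value(info, key):
    """Return the raw value of `key` in a VCF INFO string, or None if absent."""
    for item in info.split(";"):
        name, _, value = item.partition("=")
        if name == key:
            return value
    return None


def allele_string_length(ref, alt):
    """What you get by measuring allele names (e.g. after conversion to PLINK)."""
    return max(len(ref), len(alt))


def svlen(info):
    """Event size from SVLEN (deletions are negative, so take abs); None if absent."""
    value = info_value(info, "SVLEN")
    if value in (None, "", "."):
        return None
    return abs(int(value.split(",")[0]))


if __name__ == "__main__":
    # Hypothetical Manta-style records: (REF, ALT, INFO)
    records = [
        ("T" + "A" * 60, "T", "SVTYPE=DEL;SVLEN=-60"),
        ("G", "<DEL>", "SVTYPE=DEL;SVLEN=-25000"),
        ("C", "<DUP:TANDEM>", "SVTYPE=DUP;SVLEN=3200"),
    ]
    for ref, alt, info in records:
        print(alt[:12], "allele string:", allele_string_length(ref, alt),
              "SVLEN:", svlen(info))
```

With symbolic alleles the allele-string length says nothing about event size, so a cap observed that way is not evidence that Manta stops at ~1000 bp.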

bcftools query -f '%INFO/SVLEN\n'  in.vcf | tr -d '-' | sort -n | tail
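The one-liner prints `SVLEN` for each record, strips the minus sign from deletions, sorts numerically and shows the largest values. A rough Python equivalent is sketched below for checking the same thing without bcftools. It also skips records that have no `SVLEN` (for example BND breakends, for which `bcftools query` prints `.`). It assumes a plain or gzip-compressed VCF with INFO in the 8th column. The function names are hypothetical.

```python
import gzip
import sys


def svlens_from_lines(lines):
    """Yield |SVLEN| for every VCF data line that has a numeric SVLEN."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # malformed or truncated line
        for item in columns[7].split(";"):
            if item.startswith("SVLEN="):
                # SVLEN may hold one value per ALT allele
                for value in item[len("SVLEN="):].split(","):
                    if value not in ("", "."):
                        yield abs(int(value))
                break


def longest_svs(lines, n=10):
    """Like `tr -d '-' | sort -n | tail`: the n largest absolute SVLEN values."""
    return sorted(svlens_from_lines(lines))[-n:]


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for value in longest_svs(handle):
            print(value)
```

Usage (hypothetical script name): `python longest_svs.py in.vcf`.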