17 months ago
optimistsso4co3 ▴ 100
I have multiple whole-genome .fq files prepared for variant calling. Repeating the whole pipeline is quite expensive, so I wonder which data types are must-haves to extract.
Right now I am planning to extract the following:
- Variants (indels, SNPs), with HaplotypeCaller
- Structural variants (<1000 bp long), with Manta
What other "must-haves" could I extract?
<1000 bp long?
It's the maximum length generated by Manta for my dataset (~100 bp reads).
Do you have a reference for that?
Nope, I manually measured the lengths of the structural variants after conversion to PLINK format. Now that I think about it, there might be a flaw in my logic because of PLINK's limits on allele name length.
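One way to check this without going through PLINK is to read the lengths straight from the structural variant VCF. The sketch below is only an illustration: it assumes the records carry the standard `SVLEN` and/or `END` INFO keys from the VCF specification, and the function names are made up for this example. Breakend records, which have neither key, come back as `None`.

```python
import gzip


def parse_info(info):
    """Turn a VCF INFO string into a dict (flag-only keys map to True)."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def sv_length(vcf_line):
    """Return the absolute SV length in bp, or None if it cannot be derived.

    Assumption: the length is taken from INFO/SVLEN when present,
    otherwise from INFO/END minus POS.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    pos = int(fields[1])
    info = parse_info(fields[7])

    svlen = info.get("SVLEN")
    if isinstance(svlen, str):
        # SVLEN is negative for deletions and may list one value per ALT allele.
        values = [abs(int(v)) for v in svlen.split(",") if v not in ("", ".")]
        if values:
            return max(values)

    end = info.get("END")
    if isinstance(end, str):
        return int(end) - pos
    return None


def sv_lengths(path):
    """Collect lengths for every record in a plain or gzipped VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [sv_length(line) for line in handle if not line.startswith("#")]
```

Calling `sv_lengths` on the SV VCF (the path is whatever your run produced) and taking the maximum of the non-`None` values shows whether any call really exceeds 1000 bp before the PLINK conversion.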