ARTIC protocol
4 months ago
juanjo75es ▴ 130

I am trying to apply the ARTIC bioinformatics protocol to a simulated SARS-CoV-2 raw sequencing dataset.

The dataset was simulated from a manually edited sequence containing these variants:

#CHROM  POS ID  REF ALT
sars3   14599   .   CT  CTAGATAT
sars3   16842   .   AG  AAGATATG
sars3   18215   .   GT  GAGATATT


The simulation was obtained with this pbsim2 command:

pbsim --depth 1000 --hmm_model data/P6C4.model [raw_fastq_file]


That generates a FASTQ file with the reads. Then I run:

artic minion ncov-2019 results/my_id --read-file results/my_id.fastq --medaka --medaka-model r941_prom_variant_g360


The resulting VCF file shows a lot of nonexistent SNPs and indels, and contains only the first of the three variants that I actually introduced.

Can I improve my test? Which parameters does the protocol actually use for SARS-CoV-2? I don't understand why parameters are left for the user to choose (and are poorly documented) if this is meant to be a protocol. Or could the problem be in the simulation?

EDIT: I now see that no variant at all passes the filter. merged.vcf contains many nonexistent variants and one real one, but none of them passes the filter. The other two real variants do not appear in merged.vcf at all.
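To make "found", "missing" and "nonexistent" concrete, the simulated variants can be compared against a call set with a small script. The following is only a minimal sketch: the function names (trim_alleles, read_variants, compare) are hypothetical, and the rows in CALLS are invented for illustration and are not real pipeline output. It assumes plain-text VCF lines, ignores the CHROM column (the simulated reference is named sars3, which may differ from the name used by the primer scheme reference), and trims shared bases so that an insertion written as CT > CTAGATAT compares equal to the shorter form C > CTAGATA. It does not left-align indels, so a variant reported at a shifted position inside a repeat would still be counted as missed.

```python
def trim_alleles(pos, ref, alt):
    """Trim shared trailing, then shared leading bases, keeping one base per allele."""
    # Drop a common suffix while both alleles still have more than one base.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Drop a common prefix, moving the position forward for each removed base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def read_variants(lines, passing_only=False):
    """Return a set of (pos, ref, alt) tuples from VCF-like text lines."""
    variants = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        pos, ref, alts = int(fields[1]), fields[3], fields[4]
        # FILTER is the 7th column; the truth table above has no such column.
        filt = fields[6] if len(fields) > 6 else "."
        if passing_only and filt != "PASS":
            continue
        for alt in alts.split(","):  # handle multi-allelic records
            variants.add(trim_alleles(pos, ref, alt))
    return variants


def compare(truth, calls):
    """Split variants into found, missed and spurious sets (as sorted lists)."""
    return {
        "found": sorted(truth & calls),
        "missed": sorted(truth - calls),
        "spurious": sorted(calls - truth),
    }


# The three simulated variants, copied from the table in this post.
TRUTH = """#CHROM  POS ID  REF ALT
sars3   14599   .   CT  CTAGATAT
sars3   16842   .   AG  AAGATATG
sars3   18215   .   GT  GAGATATT
""".splitlines()

# Invented example calls (NOT real output), only to exercise the functions.
CALLS = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "MN908947.3\t14599\t.\tC\tCTAGATA\t30\t.\t.",
    "MN908947.3\t21000\t.\tA\tG\t12\t.\t.",
]

result = compare(read_variants(TRUTH), read_variants(CALLS))
for label, items in result.items():
    print(label, items)
```

To use it on a real run, the CALLS list would be replaced by the lines of the merged or filtered VCF (opened in text mode, via gzip if the file is compressed), with passing_only=True to count only records whose FILTER column is PASS.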

variant-calling artic nanopore sars-cov-2 medaka