ARTIC protocol
4 months ago
juanjo75es

I am trying to apply the ARTIC bioinformatics protocol to a simulated SARS-CoV-2 raw sequencing dataset.

The dataset was simulated from a manually edited sequence containing these variants (all three are 6 bp insertions):

sars3   14599   .   CT  CTAGATAT     
sars3   16842   .   AG  AAGATATG     
sars3   18215   .   GT  GAGATATT
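To make explicit what the caller has to recover, here is a small Python sketch that parses the three truth lines above and classifies each one from its REF/ALT lengths. It is only an illustration: `parse_variants`, `classify` and `Variant` are hypothetical helpers, not part of the ARTIC pipeline or of any VCF library.

```python
from typing import List, NamedTuple

# The three truth variants exactly as listed in the post (CHROM POS ID REF ALT).
TRUTH_VCF_LINES = """\
sars3   14599   .   CT  CTAGATAT
sars3   16842   .   AG  AAGATATG
sars3   18215   .   GT  GAGATATT
"""


class Variant(NamedTuple):
    chrom: str
    pos: int  # 1-based VCF coordinate of the first REF base
    ref: str
    alt: str


def parse_variants(text: str) -> List[Variant]:
    """Parse whitespace-separated VCF-like lines, skipping blanks and headers."""
    variants = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        chrom, pos, _vid, ref, alt = fields[:5]
        variants.append(Variant(chrom, int(pos), ref, alt))
    return variants


def classify(v: Variant) -> str:
    """Coarse variant type derived only from REF/ALT lengths."""
    if len(v.ref) == len(v.alt):
        return "SNP" if len(v.ref) == 1 else "MNP"
    return "insertion" if len(v.alt) > len(v.ref) else "deletion"


for v in parse_variants(TRUTH_VCF_LINES):
    # Net length change: positive for insertions, negative for deletions.
    print(v.pos, classify(v), len(v.alt) - len(v.ref))
```

Running it prints `insertion 6` for each of the three positions, i.e. the test set contains no SNPs at all, only insertions.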

The simulation was obtained with this pbsim2 command:

pbsim --depth 1000 --hmm_model data/P6C4.model [edited_reference_fasta]
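One way to double-check that the edited sequence really carries the three insertions before it is handed to pbsim2 (which simulates reads from an input reference sequence) is to rebuild it programmatically. The sketch below is hypothetical: `apply_variants` and `to_fasta` are illustrative names, not pbsim2 or ARTIC functions. It verifies each REF allele against the sequence and applies the edits right to left so that earlier coordinates are not shifted by the inserted bases.

```python
from typing import List, Tuple

# (pos, ref, alt) with 1-based VCF positions.
Edit = Tuple[int, str, str]


def apply_variants(seq: str, edits: List[Edit]) -> str:
    """Apply non-overlapping VCF-style edits to seq, checking each REF first."""
    out = seq
    prev_start = None  # 0-based start of the edit applied just before (to the right)
    # Right-to-left order keeps everything left of the current edit unchanged.
    for pos, ref, alt in sorted(edits, key=lambda e: e[0], reverse=True):
        start = pos - 1
        end = start + len(ref)
        if prev_start is not None and end > prev_start:
            raise ValueError(f"edit at {pos} overlaps the edit to its right")
        found = out[start:end]
        if found != ref:
            raise ValueError(f"REF mismatch at {pos}: expected {ref}, found {found}")
        out = out[:start] + alt + out[end:]
        prev_start = start
    return out


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a single sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

With the real reference loaded into a string, the three edits from the post (`(14599, "CT", "CTAGATAT")` and so on) can be applied and the result written with `to_fasta("sars3", ...)` to the file given to pbsim. A REF mismatch error would indicate the positions do not refer to the reference being edited.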

This generates a FASTQ file with the reads. Then I run:

artic minion ncov-2019 results/my_id --read-file results/my_id.fastq --medaka --medaka-model r941_prom_variant_g360

The resulting VCF file shows many nonexistent SNPs and indels, and only the first of the real variants is found.

Can I improve my test? Which parameters are actually used in the protocol for SARS-CoV-2? I don't understand why parameters are left to the user (and poorly documented) if this is a protocol. Or is the problem in the simulation?

EDIT: I now see that it actually doesn't find any variant that passes the filter. merged.vcf contains many nonexistent variants and one real one, but none of them passes the filter. The other two real variants do not even appear in merged.vcf.
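To make this kind of check repeatable, the calls can be scored against the known truth set. The Python sketch below is an illustration under stated assumptions, not part of ARTIC: it treats a record as passing only when its FILTER column is exactly `PASS` (the VCF convention), matches calls by (CHROM, POS, REF, ALT) after trimming bases shared by REF and ALT, and reports TP/FP/FN counts. `trim`, `read_keys` and `compare` are hypothetical helpers.

```python
from typing import Dict, Set, Tuple

Key = Tuple[str, int, str, str]

# Truth set from the post (CHROM POS ID REF ALT).
TRUTH = """\
sars3   14599   .   CT  CTAGATAT
sars3   16842   .   AG  AAGATATG
sars3   18215   .   GT  GAGATATT
"""


def trim(pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Drop shared trailing, then leading, bases (keeping >= 1 base each).

    Only a partial normalization: left-aligning indels inside repeats needs
    the reference sequence (e.g. `bcftools norm -f ref.fa`).
    """
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def read_keys(vcf_text: str, pass_only: bool = False) -> Set[Key]:
    """Collect normalized variant keys; optionally keep only FILTER == PASS."""
    keys = set()
    for line in vcf_text.splitlines():
        f = line.split()
        if not f or f[0].startswith("#"):
            continue
        filt = f[6] if len(f) > 6 else "."  # truth lines have no FILTER column
        if pass_only and filt != "PASS":
            continue
        for alt in f[4].split(","):  # split multi-allelic records
            pos, ref, a = trim(int(f[1]), f[3], alt)
            keys.add((f[0], pos, ref, a))
    return keys


def compare(truth_text: str, calls_text: str, pass_only: bool = False) -> Dict[str, int]:
    """Count true positives, false positives and false negatives."""
    truth = read_keys(truth_text)
    calls = read_keys(calls_text, pass_only=pass_only)
    return {"TP": len(truth & calls), "FP": len(calls - truth), "FN": len(truth - calls)}
```

Calling `compare(TRUTH, text)` on the contents of merged.vcf, once with `pass_only=False` and once with `pass_only=True`, separates "not called at all" from "called but filtered out". Because matching is exact, an insertion reported at a shifted position within a repeat would still count as a miss unless both files are normalized against the reference first.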

variant-calling artic nanopore sars-cov-2 medaka
