Processing and calling SVs from PacBio data
18 days ago
adarsh_pp ▴ 40


I am very new to long-read sequencing data processing.

I have downloaded raw data from NCBI SRA from this paper:

My study also involves finding structural variants involved in alpha-thalassemia from long-read data, so I took this dataset as an example. However, the data processing steps are not clearly described in the paper.

Only the following details are given in the paper:

After purification and quantification, the pooled library was converted to a SMRTbell library with Sequel Binding and Internal Ctrl Kit 3.0 (Pacific Biosciences) and sequenced on the Sequel II platform (Pacific Biosciences) under CCS mode. Then raw subreads were analyzed by CCS software (Pacific Biosciences) to generate CCS reads, debarcoded by lima in the Pbbioconda package (Pacific Biosciences) and aligned to genome build hg38 by pbmm2 (Pacific Biosciences). Finally, structural variations were identified according to the HbVar, Ithanet and LOVD databases. SNVs and indels were identified by FreeBayes 1.3.4.

I have now downloaded the data from SRA, which is in FASTQ format.

I would like to know which aligner and structural variant caller would be suitable for this data. I have already tried alignment with ngmlr and pbmm2, and variant calling with sniffles2 and pbsv, but I could not replicate the results.
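
For reference, the align-then-call steps mentioned above roughly correspond to the commands sketched below. This is only a sketch under assumptions: the file names, the sample name and the `build_commands` helper are hypothetical, and the options are the ones shown in the pbmm2, pbsv and Sniffles2 READMEs as far as I know, not the exact invocation used in the paper. The helper only builds and prints the command lines; it does not run anything.

```python
import shlex
from typing import List


def build_commands(ref: str, fastq: str, sample: str) -> List[List[str]]:
    """Build (not run) command lines for CCS alignment and SV calling.

    All file names derived from `sample` are hypothetical placeholders.
    """
    bam = f"{sample}.hg38.bam"
    svsig = f"{sample}.svsig.gz"
    # FASTQ input carries no read group, so one is passed explicitly.
    # The literal "\t" is kept as backslash-t, as in the pbmm2 README example.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        # Align CCS reads to the reference and write a sorted BAM.
        ["pbmm2", "align", ref, fastq, bam,
         "--preset", "CCS", "--sort", "--rg", read_group],
        # pbsv works in two stages: collect SV signatures, then call.
        ["pbsv", "discover", bam, svsig],
        ["pbsv", "call", "--ccs", ref, svsig, f"{sample}.pbsv.vcf"],
        # Sniffles2 calls SVs directly from the aligned BAM.
        ["sniffles", "--input", bam, "--vcf", f"{sample}.sniffles.vcf"],
    ]


if __name__ == "__main__":
    for cmd in build_commands("hg38.fa", "sample.fastq", "sample"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them against each tool's `--help` output for the installed version before running them.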

Please suggest some methods.

PacBio long-read variant-calling • 190 views
18 days ago
Michael 54k

In my understanding, structural variant detection is currently about where SNP detection was maybe 10 years ago. This Benchmark Paper may help. Their comparison led the authors to combine the results from all the SV callers they investigated into one meta-caller, which they call combiSV.

Just to add to what Michael said, you generally see at least two SV callers used, and the consensus between their calls is what is taken forward. Another option that I like for merging calls is SURVIVOR; I'm unfamiliar with combiSV. I am unsure what the output formats of long-read callers are, but I suspect it's VCF in most cases.
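
To illustrate the consensus idea, here is a minimal sketch that keeps calls from one caller only if a second caller reports the same SV type on the same chromosome with both breakpoints within a distance threshold. It is not how SURVIVOR or combiSV work internally; the `SV` tuple, the function names and the 1000 bp default are my own assumptions, and it assumes each VCF record carries `SVTYPE` (and, where relevant, `END`) in the INFO column.

```python
from typing import Dict, Iterable, List, NamedTuple


class SV(NamedTuple):
    chrom: str
    pos: int
    svtype: str
    end: int


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string like 'SVTYPE=DEL;END=5000' into a dict."""
    out = {}
    for field in info.split(";"):
        key, _, value = field.partition("=")
        out[key] = value
    return out


def read_svs(vcf_lines: Iterable[str]) -> List[SV]:
    """Parse SV records from VCF lines, skipping headers and blank lines."""
    svs = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        pos = int(cols[1])
        # Records without END (e.g. breakends) fall back to POS.
        end = int(info.get("END", pos))
        svs.append(SV(cols[0], pos, info.get("SVTYPE", "NA"), end))
    return svs


def consensus(a: List[SV], b: List[SV], max_dist: int = 1000) -> List[SV]:
    """Return calls from `a` that have a matching call in `b`."""
    shared = []
    for x in a:
        for y in b:
            if (x.chrom == y.chrom and x.svtype == y.svtype
                    and abs(x.pos - y.pos) <= max_dist
                    and abs(x.end - y.end) <= max_dist):
                shared.append(x)
                break  # one supporting call is enough
    return shared
```

For example, `consensus(read_svs(open("sniffles.vcf")), read_svs(open("pbsv.vcf")))` (file names hypothetical) would return the Sniffles2 calls that pbsv also supports. For real analyses a dedicated merging tool is the safer choice, since it handles genotypes, strands and multi-sample VCFs.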

