Processing and calling SVs from PacBio data
8 months ago
adarsh_munna ▴ 50

Hi,

I am very new to long-read sequencing data processing.

I have downloaded raw data from NCBI SRA from this paper: https://doi.org/10.1016/j.gene.2022.146438

My study also involves finding structural variants involved in alpha-thalassemia from long-read data, so I took this as an example dataset. However, the data processing steps are not clearly described in the paper.

Only the following details are given in the paper:

After purification and quantification, the pooled library was converted to a SMRTbell library with Sequel Binding and Internal Ctrl Kit 3.0 (Pacific Biosciences) and sequenced on the Sequel II platform (Pacific Biosciences) under CCS mode. Then raw subreads were analyzed by CCS software (Pacific Biosciences) to generate CCS reads, debarcoded by lima in the Pbbioconda package (Pacific Biosciences) and aligned to genome build hg38 by pbmm2 (Pacific Biosciences). Finally, structural variations were identified according to the HbVar, Ithanet and LOVD databases. SNVs and indels were identified by FreeBayes 1.3.4.

The data I downloaded from SRA is in FASTQ format.

I would like to know which aligner and structural variant caller would be suitable for this data. I have already tried alignment with ngmlr and pbmm2, and variant calling with sniffles2 and pbsv, but I could not replicate the results.

Please suggest some methods.

PacBio long-read variant-calling • 404 views
8 months ago
Michael 55k

In my understanding, the state of the art in structural variant detection is currently about where SNP detection was maybe 10 years ago. This benchmark paper may help: their comparison led the authors to combine the results from all the investigated SV callers into one meta-caller, which they call combiSV.


Just to add to what Michael said: you generally see at least two SV callers used, and only the consensus between their calls is taken forward. Another option that I like for merging calls is SURVIVOR; I'm unfamiliar with combiSV. I am not sure what the output formats of the long-read callers are, but I suspect it's VCF in most cases.
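
To make the "consensus between callers" idea concrete, here is a minimal Python sketch. It is not SURVIVOR's or combiSV's actual algorithm, only an illustration of the general idea: two calls are treated as the same event if they share chromosome and SV type, start within some distance of each other, and have similar lengths. The `SV` record, the function names, the thresholds (`max_dist=500`, `min_len_ratio=0.7`) and the coordinates in the example are all made up for illustration; in practice you would parse the calls from each caller's VCF.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (fields you would read from a VCF)."""
    chrom: str
    pos: int      # start position
    svtype: str   # e.g. "DEL", "INS", "DUP", "INV"
    svlen: int    # length in bp (sign is ignored below)


def same_event(a, b, max_dist=500, min_len_ratio=0.7):
    """True if two calls plausibly describe the same SV.

    The thresholds are arbitrary illustrative defaults, not recommendations.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    shorter, longer = sorted((abs(a.svlen), abs(b.svlen)))
    return longer == 0 or shorter / longer >= min_len_ratio


def consensus(calls_a, calls_b, **kwargs):
    """Keep the calls from calls_a that have a matching call in calls_b."""
    return [a for a in calls_a
            if any(same_event(a, b, **kwargs) for b in calls_b)]


# Toy example with invented coordinates (not real alpha-globin breakpoints).
sniffles_calls = [SV("chr16", 165000, "DEL", 19300),
                  SV("chr16", 50000, "INS", 300)]
pbsv_calls = [SV("chr16", 165120, "DEL", 19250),
              SV("chr16", 90000, "DEL", 4000)]

shared = consensus(sniffles_calls, pbsv_calls)
print(shared)
```

In this toy input only the large deletion is reported by both call sets, so it is the only call kept. Real merging tools handle more cases (breakpoint confidence intervals, strand, more than two callers), so treat this as a way to reason about the thresholds rather than a replacement for them.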
