Current challenges in SARS-CoV-2 analysis?
13 months ago
Hernán ▴ 200

Hi.

As a bioinformatician who has never worked on analyzing virus sequences before, I'd like to know which challenges currently exist in the analysis of SARS-CoV-2, at any of these stages: 1) detection & diagnosis, 2) prevention, and 3) treatment & therapeutics.

I saw that the people at Nextstrain did an impressive phylogenetic analysis for genomic epidemiology. The group behind the ARTIC network set up a platform for nanopore analysis of sequences. And there are other APIs, for example for geovisualization.

But is there a typical "standard workflow" for a virus strain? What is the current bioinformatics bottleneck in analyzing COVID-19 sequences? Is it finding better primer design sets? Is "low-level RNA detection", like scRNA-seq analysis or dPCR, already covered, with nothing left to do there? RNA modification analysis? Transcriptome data enrichment? High-throughput (HT) SELEX? Is network analysis or integrative analysis being done?

As you can see, I am a little bit (completely) lost in this virology-for-bioinformatics territory, but I would love to read some guidance from an expert.

Cheers,

Hernán

RNA-Seq next-gen covid-19 Transcriptomics
13 months ago

There are many challenges here.

@gb, it was not wrong of you to mention regulations; those are one of the biggest challenges with Covid-19 bioinformatics. Due to privacy regulations, people change their experimental design and processing pipelines, and restrict data-sharing, to reduce the risk of human data leaking out, which can cause severe problems in the data processing and analysis phases. For analysis, this affects the amount of effort required, the results, and the timeframe, all of which are crucial in a pandemic.

There are additionally challenges related to efficiently sequencing viral reads, but those take a back-seat to the challenges caused by regulation, since they are possible to solve by the efforts of a single person. Problems that require an institution are essentially intractable in the timeframe of a crisis, if the institution is not agreeable. Problems that just require a capable individual or small group will almost always be solved.

Partly due to privacy concerns, and partly due to cost, Covid is currently sequenced primarily via primer amplification, which causes a host of issues. To summarize, I think coverage spikiness and strand bias are the worst, followed by chimeric primer ligation to real reads (or something like that; a lot of the reads are chimeric), then the potential of viruses, due to a lack of error-correction, to spew nonviable copies that get sequenced but don't actually propagate (I'm not sure how much of this occurs compared to sample-prep-induced errors). Then maybe PCR artifacts due to low sample volume.

If you look at the end result in IGV, it's a huge mess with tons of artifacts and a lot of regions that look heterozygous, which generally should not be the case in this relatively slowly-mutating virus. I'm currently ignoring read ends via local alignment soft-clipping, a practice I highly recommend against in normal shotgun sequencing. But in the Covid samples I've seen, read ends are chimeric so often that you need to soft-clip them to increase the signal-to-noise ratio.
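For illustration, here is a minimal Python sketch (the function name and the example CIGAR string are invented for this example, not taken from any tool) of one way to quantify how much of a read ends up soft-clipped, which is a rough proxy for how bad the chimeric-end problem is in a sample:

```python
import re

def softclip_fraction(cigar: str) -> float:
    """Fraction of a read's bases that are soft-clipped ('S' operations),
    parsed from its SAM CIGAR string."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    # The read sequence is consumed by M, I, S, = and X operations.
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    clipped = sum(int(n) for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0

# A read whose first 20 and last 5 bases were soft-clipped by local alignment:
softclip_fraction("20S70M5S")  # -> 25/95, about 0.263
softclip_fraction("100M")      # -> 0.0 (fully aligned end to end)
```

Averaging this over all reads in a BAM (e.g. via pysam) would show whether an amplicon sample has the elevated end-clipping described above.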


> Partly due to privacy concerns, and partly due to cost,

I believe the fraction of viral RNA vs total human RNA in a typical sample will be very small, so shotgun sequencing would require very high coverage.
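To put a number on that, here is a back-of-envelope Python sketch; the 0.1% viral fraction is an assumed illustrative value, not a measurement from any study:

```python
def total_reads_needed(target_depth, genome_len, read_len, viral_fraction):
    """Back-of-envelope: total reads to sequence so that the viral genome
    reaches target_depth, when only viral_fraction of all reads are viral."""
    viral_reads = target_depth * genome_len / read_len
    return viral_reads / viral_fraction

# ~30 kb SARS-CoV-2 genome, 150 bp reads, 1000x target depth,
# assuming only 0.1% of reads are viral:
total_reads_needed(1000, 30_000, 150, 0.001)  # -> about 2e8 (200 million) reads
```

That is why primer amplification (or capture enrichment) is used instead of plain shotgun sequencing on such samples.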

13 months ago

I think there are far larger wet-lab/logistic challenges than bioinformatic/analysis challenges.


It is toooo dangerous.

13 months ago
gb ★ 1.9k

There is (as far as I know) no bottleneck in bioinformatics. The biggest bottleneck in solving this issue is regulations. A virus genome is also relatively small, so it is possible to process a lot of data at once.

EDIT:

By the way, most of the things you are mentioning are not typical bioinformatics work.


Thanks. Yes, I know; since this is not a typical situation, I wondered which types of tools are used to analyze this kind of virus (I still wonder which specific tools are used). Would you mind sharing some of the current regulatory challenges you mention?


I think it was wrong of me to mention regulations, also because you asked about bioinformatics specifically. There are regulations such as that you cannot "just" test vaccines on random people. You could see it as a time bottleneck, but again, I now think I was wrong to mention it. And it is an important thing that you should not, and cannot, avoid.

A typical analysis that I do is variant calling. Virus samples are sequenced really deeply, and after that you compare them with a reference. Each position needs a coverage of at least around 5000 reads, and you already call variants at a frequency of 1 percent. Other scientists use this data to monitor the virus and the types of mutations. Those depth and frequency numbers are determined by organisations like the FDA (https://www.fda.gov/media/129126/download).
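The depth and frequency thresholds above can be sketched as a simple filter. This is a toy illustration in Python, not a real variant caller; the `pileup` structure, function name, and example counts are invented for the example, while the ~5000x / 1% thresholds follow the figures in the reply:

```python
def call_low_freq_variants(pileup, min_depth=5000, min_freq=0.01):
    """Report non-reference alleles whose frequency reaches min_freq,
    at positions whose total coverage reaches min_depth.
    pileup maps position -> (ref_base, {base: count})."""
    calls = []
    for pos, (ref, counts) in pileup.items():
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too shallow to trust a 1% allele frequency
        for base, n in counts.items():
            if base != ref and n / depth >= min_freq:
                calls.append((pos, ref, base, n / depth))
    return calls

# Toy pileup: position 1059 has 60 alt reads out of 6000 (exactly 1%);
# position 2000 is below the depth cutoff and is skipped.
pileup = {1059: ("C", {"C": 5940, "T": 60}),
          2000: ("A", {"A": 4000, "G": 100})}
call_low_freq_variants(pileup)  # -> [(1059, 'C', 'T', 0.01)]
```

In practice the per-position counts would come from something like `samtools mpileup` output rather than a hand-built dict.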

There are many papers and tutorials about variant-calling tools, pipelines, and methods. So if you want to know which tools are being used, you can start your research by looking at variant-calling methods. At its most basic, it is a mapping tool like BWA or Bowtie plus variant-calling tools like FreeBayes and GATK. You need (much) more, but that is the basic idea.

EDIT:

I came across this tutorial, where you can easily try it yourself without much effort or setup (not viral data): https://galaxyproject.github.io/training-material/topics/variant-analysis/tutorials/non-dip/tutorial.html
