Question

Conceptual questions regarding computational genome/annotation versions and their impact on the actual truth of sequencing?

19 months ago

Pratik ★ 1.0k

Hello Biostars Community,

I have some general questions about sequencing platforms, plus a few related questions about sequencing itself:

1. Could using a newer/the newest computational genome/annotation (for example, presently, Ensembl 107 or the newest Gencode version) adversely affect the actual truth of what was sequenced?
2. When sequencing is done on an Illumina machine or another big-name company's machine, are those sequencing platforms completely independent of the genome or DNA/cDNA being sequenced?
3. What happens if, for example, "famous gene ABC" and "low-profile gene XYZ" are found by some new discovery to have different 3' and/or 5' ends? In Illumina sequencing, would adapters still link to them to perform the bridge PCR reactions on the flow cell lanes, or would all the data previously published on "famous gene ABC" and "low-profile gene XYZ" need to be revisited? Or is it like question #2, "completely independent": are even adapters independent of genes?

I was doing some reading, and I guess the gene sequence really only matters for probe-based sequencing (chips and arrays?). Hopefully this question could be a good resource for others?

Thank you in advance.

Pratik

wes sequencing wgs rna-seq • 804 views

ADD COMMENT • link updated 19 months ago by benformatics 3.9k • written 19 months ago by Pratik ★ 1.0k

score 5 · Accepted Answer · 2022-09-21

using a newer/the newest computational genome/annotation (for example, presently, Ensembl 107 or the newest Gencode version) adversely affect the actual truth of what was sequenced?

No. Annotation represents the current understanding of the genome of an organism. Could it have errors? More than likely. But those errors get corrected over time (patch releases). The sequence you got from a run is not going to change; it is independent of the annotation/reference you use. The reference you use will influence your conclusions, though, so it does play a vital role in the final outcome.
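To make that distinction concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the read coordinates, the gene models "ABC" and "XYZ", and the `assign_reads` helper are invented for illustration and do not come from any real annotation or tool. The reads stay fixed; only the gene model for ABC changes between the two "annotation versions".

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def assign_reads(reads: List[Interval], annotation: Dict[str, Interval]) -> Dict[str, int]:
    """Count reads overlapping each gene; reads hitting no gene are 'unassigned'."""
    counts = {gene: 0 for gene in annotation}
    counts["unassigned"] = 0
    for r_start, r_end in reads:
        hits = [
            gene
            for gene, (g_start, g_end) in annotation.items()
            if r_start < g_end and g_start < r_end  # interval overlap test
        ]
        if hits:
            for gene in hits:
                counts[gene] += 1
        else:
            counts["unassigned"] += 1
    return counts


# The sequenced (and aligned) reads: these never change.
reads = [(100, 150), (480, 530), (560, 610), (900, 950)]

# Two hypothetical annotation versions; the newer one extends the 3' end of ABC.
annotation_old = {"ABC": (50, 500), "XYZ": (800, 1000)}
annotation_new = {"ABC": (50, 650), "XYZ": (800, 1000)}

counts_old = assign_reads(reads, annotation_old)
counts_new = assign_reads(reads, annotation_new)
print(counts_old)
print(counts_new)
```

The same four reads give different per-gene counts under the two gene models, which is the sense in which the reference shapes the conclusion without touching the data.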

are those sequencing platforms completely independent from the genome or DNA/cDNA being sequenced?

This will be influenced by the limits/characteristics of the sequencing technology. For example, some platforms may not be able to sequence through homopolymer runs longer than a certain number of bases (most platforms have some limitation in this respect). It may also be difficult to get representation of certain areas of the genome because they are hard to convert into sequenceable libraries.
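If you want to see where homopolymers might be an issue in a sequence of interest, a rough sketch like the following finds the longest run of a single base. The example sequence and the `ASSUMED_MAX_RUN` cutoff are made up for illustration; the cutoff is not a specification of any platform.

```python
from itertools import groupby
from typing import Tuple


def longest_homopolymer(seq: str) -> Tuple[str, int]:
    """Return (base, length) for the longest run of one repeated base."""
    best_base, best_len = "", 0
    for base, run in groupby(seq.upper()):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_base, best_len = base, run_len
    return best_base, best_len


ASSUMED_MAX_RUN = 8  # hypothetical cutoff, for illustration only

example = "ACG" + "T" * 10 + "GCA"
base, length = longest_homopolymer(example)
is_flagged = length > ASSUMED_MAX_RUN
print(base, length, is_flagged)
```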

adapters still link to them to perform those bridge PCR reactions on the flow cell lanes

Remember that you add the adapters to create the necessary flowcell-compatible ends. Fragments that do not have these ends will not bind and will not be sequenced. As long as your fragments have compatible ends (T overhang, etc.), they will be made into sequenceable libraries.
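A toy model of this, as a sketch only: the adapter strings below are placeholders, not real Illumina adapter sequences, and `ligate` / `can_bind_flowcell` are hypothetical helpers that ignore the actual chemistry. The point is just that binding depends on the adapter ends, not on what the insert is.

```python
# Placeholder adapter sequences (NOT real vendor sequences).
ADAPTER_5 = "AAAACCCC"
ADAPTER_3 = "GGGGTTTT"


def ligate(fragment: str) -> str:
    """Attach the same adapters to any fragment, regardless of its sequence."""
    return ADAPTER_5 + fragment + ADAPTER_3


def can_bind_flowcell(molecule: str) -> bool:
    """In this toy model, only molecules carrying both adapter ends can bind."""
    return molecule.startswith(ADAPTER_5) and molecule.endswith(ADAPTER_3)


# Hypothetical fragments from "gene ABC", "gene XYZ", and an unannotated region.
fragments = ["ATGCGTACGTTAGC", "TTGACCGATAGGCA", "CGCGATATATCGCG"]

library = [ligate(f) for f in fragments]
print([can_bind_flowcell(m) for m in library])    # adapter-ligated molecules
print([can_bind_flowcell(f) for f in fragments])  # bare fragments
```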

gene sequence really only matters for probe-based sequencing

It may matter for technologies where you need to be able to unwind the two strands. The sequence may form secondary structures that are hard to resolve/sequence through.
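As a very crude proxy for "could this fold back on itself", one can look for inverted repeats, i.e. a stretch whose reverse complement occurs further downstream. This is only a sketch: the `stem` and `min_loop` values are arbitrary assumptions, and real secondary-structure prediction needs thermodynamic models rather than exact string matching.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def has_inverted_repeat(seq: str, stem: int = 6, min_loop: int = 3) -> bool:
    """True if some stem-length window has its reverse complement downstream,
    separated by at least min_loop bases (a potential hairpin)."""
    seq = seq.upper()
    for i in range(len(seq) - stem + 1):
        arm = seq[i:i + stem]
        partner = reverse_complement(arm)
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False


hairpin_like = "CCGATG" + "TTTT" + reverse_complement("CCGATG")
plain_repeat = "ACACACACACACACAC"
print(has_inverted_repeat(hairpin_like), has_inverted_repeat(plain_repeat))
```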

score 3 · Accepted Answer · 2022-09-22

A newer genome annotation/curation is independent of, and would not affect, the actual nucleic acids sequenced by the machine (unless they were captured using a targeted approach prior to sequencing, e.g. whole exome sequencing). Whole genome sequencing will include ALL fragments present (regardless of genome/annotation), assuming they are physically "able to be sequenced" (see homopolymers, etc. in the comment above).
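The targeted-capture exception can be sketched like this. The coordinates, the probe design, and the helper names are all hypothetical; real capture is probabilistic hybridization, not a clean overlap filter.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def overlaps(a: Interval, b: Interval) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def wgs_library(fragments: List[Interval]) -> List[Interval]:
    """Whole genome sequencing: every fragment is a candidate for sequencing."""
    return list(fragments)


def capture_library(fragments: List[Interval], probes: List[Interval]) -> List[Interval]:
    """Targeted capture: only fragments overlapping a probe target are retained."""
    return [f for f in fragments if any(overlaps(f, p) for p in probes)]


fragments = [(100, 400), (450, 750), (5000, 5300)]
probes_old_design = [(90, 420)]  # hypothetical probes based on an older exon model

wgs = wgs_library(fragments)
captured = capture_library(fragments, probes_old_design)
print(len(wgs), captured)
```

Here the fragment at (450, 750) is present in the WGS library but is never captured, because the probe design predates any knowledge of that region.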

Adapters are independent of genes.

was doing some reading, and I guess the gene sequence really only matters for probe-based sequencing (chips and arrays?). Hopefully this question could be a good resource for others?

You are correct.

score 2 · Accepted Answer · 2022-09-22

Excellent responses by GenoMax to wonderful questions by Pratik. I echo what GenoMax says.

IMHO, the platforms are inherent to capture technologies as well. One would not necessarily, or every time, intend to run the same samples on different sequencing platforms/technologies unless the lab is rich. Even then, I wouldn't recommend it. Taken together, the need of the hour is to employ machine learning heuristics to predict such outcomes on a large scale. Whether "completely independent" could in fact be dependent gives us some interesting questions to explore.

Just my two cents,
Prash