How accurate is the sequence from a genome project, such as goat or dog?
5.8 years ago
Murphy • 0

I am working on an uncommon bovine gene. Although the full sequence is available in GenBank from whole-genome sequencing, the status of this gene is still "predicted", which means no cloning or expression experiment has been done yet.

The question is: how much should I trust the genome data?

Have they already sequenced and assembled the genome several times to ensure that the sequence is accurate enough (at least for the biosample they collected)?

Or is the genome only a draft reference, so that in most cases researchers should redo the cDNA cloning and sequencing to get an accurate gene sequence for their specific study?

Which one is right?

genome • 978 views

Your text talks about a bovine gene, but your title mentions goat or dog. Can you clarify?

5.8 years ago
Emily 23k

The quality of a genome assembly can be assessed with metrics such as coverage and N50. The higher both numbers are, the better.
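For readers unfamiliar with N50, here is a minimal sketch of how it is usually computed: sort the contig (or scaffold) lengths from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are made up for illustration and do not come from any real assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly, lengths in bp (not real data).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

A larger N50 means the assembly is made of longer pieces, so a gene is less likely to be split across contigs.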

Predicted genes are often annotated by mapping genes from other species onto the genome. We assume that the location identified is the orthologue of the original gene.


Thank you! One additional question: within a regular genome project, what is the mean coverage of a normal region (~10x, ~100x, or even higher)? Is it possible that 30% of the reads indicate "A" while the other 70% indicate "G"? How should one judge in such cases? Sorry, I am not working in this field, so maybe my question is very stupid. :)


There are no "regular" genome projects: each uses different strategies, with funding probably being the most important factor.

For Bos taurus, the three most recent assemblies vary a lot in coverage, in the sequencing technologies used, in the assembly software, and so on. You have to read the papers (or white papers, or dedicated sites) describing each particular assembly. You can find very condensed and incomplete information on the NCBI Assembly page for ARS-UCD1.2 (80x coverage), UMD_3.1.1 (9x coverage) and Btau_5.0.1 (19x coverage).

Is there any possibility that 30% of the reads indicate "A" , while 70% others indicate "G"?

In general, such variable sites are left out of assemblies, at least FASTA assemblies. For some organisms, there are SNP databases.
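To make the 30% "A" versus 70% "G" case concrete, here is a rough sketch of how one might summarise the bases that reads show at a single position. It is only an illustration: the function names and the `min_minor_fraction` cutoff of 0.2 are arbitrary assumptions, and real variant callers also weigh base quality, mapping quality and depth.

```python
from collections import Counter

VALID_BASES = {"A", "C", "G", "T"}


def allele_fractions(bases):
    """Fraction of reads supporting each base at one position."""
    counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


def classify_site(bases, min_minor_fraction=0.2):
    """Label a site as a plain consensus or a possible heterozygous site.

    min_minor_fraction is an arbitrary illustrative cutoff, not a standard.
    """
    fractions = allele_fractions(bases)
    if not fractions:
        return "no data"
    ranked = sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)
    major = ranked[0][0]
    if len(ranked) > 1 and ranked[1][1] >= min_minor_fraction:
        return f"possible heterozygous site: {major}/{ranked[1][0]}"
    return f"consensus {major}"


# Hypothetical pileup column: 7 reads show G, 3 reads show A.
column = "G" * 7 + "A" * 3
print(allele_fractions(column))
print(classify_site(column))
```

A FASTA reference would keep only one base at such a position, which is why the second allele has to be looked up elsewhere, for example in a SNP database.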

5.8 years ago
h.mon 35k

TL;DR: the status of your gene of interest is predicted, so trust, but verify. For example, check if RNAseq mapping confirms the gene annotation - there is plenty of bovine RNAseq data available.

Longer explanation:

The genome data and its annotation are two different, but inter-related, entities. The sequence of the genome will depend on coverage, sequencing technologies, assembler used, heterozygosity rate and / or number of individuals sequenced, and so on. Usually, the paper describing the genome in question, and the NCBI genome page, have information about its sequencing and assembly, and are good sources of information about its quality.

The annotation is performed on a particular genome assembly, and thus depends in part on the quality of that assembly. If an assembly is fragmented, its gene predictions will also be incomplete and fragmented. However, as Emily_Ensembl said, annotations rely a lot on mapping genes and proteins from other species, so genomes from poorly studied taxa will have poorer annotations, because there are less experimental data validating good gene models. The availability of good RNAseq data from several tissues helps annotation as well.

A further point is that most genomes are assembled from a single individual. While this helps produce a better quality assembly, it also means the reference genome sequence may differ from your sample of interest.


Thank you very much for your comprehensive reply. Indeed, I had already done the RNAseq mapping for this specific gene. Theoretically, the gene is only expressed in the stomach, and luckily there are quite a few stomach transcriptome archives available. Through local BLAST, I picked out the RNAseq reads matching this gene and examined the sequences. Unfortunately, they cover only 40% of the full sequence (~1000 bp). There are two mismatches (identical in all RNAseq reads, but distinct from the genome data). However, in the translated protein sequence, both sites appear to be synonymous (silent) substitutions, that is, the amino acids are identical. So in my specific case, I'd accept that the quality is likely to be acceptable, but intraspecific gene polymorphism should be treated with caution.
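The check described above (whether a nucleotide mismatch changes the encoded amino acid) can be sketched in a few lines. This is a minimal illustration that assumes both sequences are already aligned, in frame and of equal length, and that the standard genetic code applies. The helper names and the toy sequences are invented for the example and are not the actual gene.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def normalise(seq):
    """Upper-case a sequence and treat RNA 'U' as DNA 'T'."""
    return seq.upper().replace("U", "T")


def classify_substitutions(ref_cds, alt_cds):
    """Compare two in-frame coding sequences codon by codon.

    Returns one tuple per differing codon:
    (codon number, ref codon, alt codon, ref aa, alt aa, kind).
    Codons with ambiguous bases translate to 'X'.
    """
    ref, alt = normalise(ref_cds), normalise(alt_cds)
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned and of equal length")
    results = []
    for i in range(0, len(ref) - len(ref) % 3, 3):
        ref_codon, alt_codon = ref[i:i + 3], alt[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE.get(ref_codon, "X")
        alt_aa = CODON_TABLE.get(alt_codon, "X")
        if ref_aa == alt_aa:
            kind = "synonymous"
        elif alt_aa == "*":
            kind = "nonsense"
        else:
            kind = "missense"
        results.append((i // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, kind))
    return results


# Hypothetical toy sequences: genome-derived vs RNAseq-derived.
genome_cds = "ATGGCCAAATGG"
rnaseq_cds = "ATGGCTAAGTGG"
for change in classify_substitutions(genome_cds, rnaseq_cds):
    print(change)
```

Substitutions reported as synonymous leave the protein unchanged, which matches the situation described in this reply. A nonsense substitution, by contrast, would introduce a premature stop codon.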
