Why is the number of CDSs smaller than the number of corresponding genes? [M. tb]
23 months ago
Alewa ▴ 150

This seems weird. Could someone please help me understand why it happens, and how I might resolve it?

[sta55@cbsukim tb_genes_fasta]$ esearch -db nuccore -query 'Mycobacterium tuberculosis H37Rv[Organism] AND NC_000962.3[ACCN]' | efilter -feature gene | efetch -format gene_fasta | grep "^>" | wc -l
4008
[sta55@cbsukim tb_genes_fasta]$ esearch -db nuccore -query 'Mycobacterium tuberculosis H37Rv[Organism] AND NC_000962.3[ACCN]' | efilter -feature gene | efetch -format fasta_cds_aa | grep "^>" | wc -l
3906

Background

I'm extracting the nucleotide sequences of M. tuberculosis genes and their corresponding CDS (protein) sequences. https://www.ncbi.nlm.nih.gov/nuccore/NC_000962#locus_448814763

NCBI entrez bash genes • 734 views

At a guess, one explanation could be that some features annotated as genes are RNAs etc. that don't code for proteins, so there are more genes than CDSs (i.e. more functionally annotated "things" than just proteins).


Joe - thanks for chiming in. But in my case there were fewer CDSs than genes. Or maybe I'm not doing the gene filtering right? :(


That's what I said, no? You have fewer annotated CDSs than genes. Remember what "CDS" actually means: coding sequences.

This is usually taken to mean they give rise to a functional protein, but the definition of a gene is broader these days and can include non-coding RNAs.

Hence: number of CDSs + number of non-CDS functional elements = number of "genes".

Or more simply: gene != CDS.

This is still a guess on my part, as it could be due to any number of annotation artefacts etc., but I don't see an obvious problem here: the numbers you've retrieved make intuitive sense.
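One way to check the guess is to tally the feature types on the record and see what accounts for the 4008 - 3906 = 102 genes that have no CDS. The following is only a rough sketch: `count_feature_keys` is a made-up helper name, and it assumes the input is NCBI's tab-separated 5-column feature table, where each feature line carries start, stop and feature key in the first three columns and qualifier lines begin with empty columns. The `efetch -format ft` call in the trailing comment is likewise an assumption about how to obtain that table with EDirect and has not been run against this record.

```shell
# Hypothetical helper: tally feature keys (gene, CDS, tRNA, rRNA, ...)
# from an NCBI 5-column feature table read on stdin.
count_feature_keys() {
  awk -F '\t' '
    # Feature lines have a non-empty start (column 1) and key (column 3).
    # The ">Feature" header, qualifier lines and extra-interval lines
    # lack one or the other, so they are skipped.
    $1 != "" && $3 != "" { n[$3]++ }
    END { for (k in n) printf "%s\t%d\n", k, n[k] }
  ' | sort
}

# Assumed usage (needs EDirect and network access):
# efetch -db nuccore -id NC_000962.3 -format ft | count_feature_keys
```

If the guess is right, the keys other than gene and CDS (tRNA, rRNA, ncRNA and so on) should roughly cover the gap. Any remainder would need another explanation.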
