count protein-coding genes per contig
11 months ago
Diego ▴ 110

Hi everyone,

Any idea how to count only the protein-coding genes per contig/scaffold/chromosome?

Thanks, Diego

number protein-coding contig genes genome • 692 views
11 months ago
iraun 6.2k

If you have an annotation file, such as the following human GTF:

1   havana  gene    11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene";
1   havana  transcript  11869   14409   .   +   .   gene_id "ENSG00000223972"; gene_version "5"; transcript_id "ENST00000456328"; transcript_version "2"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; transcript_name "DDX11L1-202"; transcript_source "havana"; transcript_biotype "lncRNA"; tag "basic"; transcript_support_level "1";

You can get the number of protein-coding genes per chromosome with:

awk '$3=="gene"' Homo_sapiens.GRCh38.98.chr.gtf | grep 'gene_biotype "protein_coding"' | cut -f1 | sort | uniq -c
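
The same count can also be done in a single awk pass, which is sometimes handier for scripting. The following is only a sketch: it assumes a tab-separated, Ensembl-style GTF in which gene rows carry a `gene_biotype` attribute (GENCODE files name it `gene_type` instead, so the pattern would need adjusting). The function name `count_pc_genes` and the tiny sample file are made up for illustration.

```shell
# Hypothetical helper: count protein-coding genes per sequence (chr/contig/scaffold).
# Assumes Ensembl-style attributes, i.e. gene_biotype "protein_coding" in column 9.
count_pc_genes() {
    awk -F'\t' '
        /^#/ { next }    # skip header/comment lines
        $3 == "gene" && $9 ~ /gene_biotype "protein_coding"/ { n[$1]++ }
        END { for (c in n) print c "\t" n[c] }
    ' "$1" | sort -k1,1
}

# Tiny made-up GTF, for demonstration only.
tmp=$(mktemp)
{
    printf '#!genome-build demo\n'
    printf '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n'
    printf '1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n'
    printf '1\thavana\tgene\t1000\t1900\t.\t-\t.\tgene_id "G2"; gene_biotype "lncRNA";\n'
    printf '1\thavana\tgene\t2000\t2900\t.\t+\t.\tgene_id "G3"; gene_biotype "protein_coding";\n'
    printf '2\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "G4"; gene_biotype "protein_coding";\n'
} > "$tmp"

# Prints one line per sequence: name, then count (here: 1 -> 2, 2 -> 1).
count_pc_genes "$tmp"
rm -f "$tmp"
```

Unlike `uniq -c`, this prints the sequence name first and the count second, separated by a tab, which is easier to load into a spreadsheet or R.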

Thanks! It worked perfectly!


Glad it helped :). Please consider marking the answer as "accepted" :).
