Exclude non-coding genes
I have HTSeq read counts for around 60,000 genes (with Ensembl IDs).
I only want to keep protein coding genes.
How can I remove non-coding genes?
Go to http://www.ensembl.org/biomart/martview
Database: Ensembl Genes; Dataset: Human genes
Attributes: Gene stable ID
Filters: Gene -> Gene type "protein_coding"
Export the results as a text file (e.g. coding.txt)
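If the exported file has a header line or versioned IDs, a small cleanup step may help before filtering. Below is a minimal sketch, assuming a tab-separated export whose first line is a column header (usually "Gene stable ID") and whose first column holds the IDs. The function name make_coding_list and the input name mart_export.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Build a clean, one-ID-per-line list from a BioMart export.
# Assumptions (hypothetical): tab-separated export, header on line 1,
# gene IDs in column 1.
make_coding_list() {
  local export_file="$1" out_file="$2"
  tail -n +2 "$export_file" |   # drop the header line
    cut -f1 |                   # keep only the first (ID) column
    sed 's/\.[0-9]*$//' |       # strip a version suffix like .12, if present
    sort -u > "$out_file"       # one unique ID per line
}
```

Usage: `make_coding_list mart_export.txt coding.txt`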
Then filter your list with:
grep -w -F -f coding.txt your_ids.txt
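`grep -w` matches an ID anywhere on a line. A stricter alternative (an awk approach swapped in here, not part of the answer above) compares only the first column of the count file and ignores version suffixes such as ENSG00000000003.14, which HTSeq output built from GENCODE annotations may carry. HTSeq's summary rows (e.g. `__no_feature`) are dropped because they are not in the ID list. A minimal sketch, assuming a tab-separated count file and one ID per line in the list; the function name filter_counts and the file name counts.txt are hypothetical.

```shell
#!/usr/bin/env bash
# Keep only HTSeq-count rows whose gene ID (column 1) is in a list of
# protein-coding IDs. Version suffixes (e.g. ".14") are ignored on both sides.
# Assumptions (hypothetical): tab-separated counts, one ID per line in the list.
filter_counts() {
  local id_list="$1" counts="$2"
  awk -F'\t' '
    # First file: load IDs, stripping any ".N" version suffix.
    NR == FNR { id = $1; sub(/\.[0-9]+$/, "", id); keep[id] = 1; next }
    # Second file: print the original row if its base ID is in the list.
    { id = $1; sub(/\.[0-9]+$/, "", id); if (id in keep) print }
  ' "$id_list" "$counts"
}
```

Usage: `filter_counts coding.txt counts.txt > coding_counts.txt`. Note that the `NR == FNR` idiom assumes the ID list is not empty.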