                    5.2 years ago
        Rob
        
    
Hello, I have HTSeq read counts for around 60000 genes (with Ensembl IDs). I only want to keep the protein-coding genes. How can I remove the non-coding genes?
Thanks
Thanks a lot, Pierre. It was awesome and so easy. I just do not understand your last point: "filter your list with grep -w -f coding.txt you_ids.txt"
How should I do this? Sorry, I am new to this field and need more explanation.
You can do this in the Terminal if you are on a Mac or Linux, since grep is a standard Unix command-line tool.

First, cd to the folder/directory your files are in (the list of protein-coding genes and your list containing the 60000 genes):

    cd /Users/your/directory/

Then run the grep command to filter your_ids.txt (the 60000-gene file) using coding.txt (the list from Ensembl):

    grep -w -f coding.txt your_ids.txt

This will print the result to the terminal. You can save the output to a file by adding > output.txt:

    grep -w -f coding.txt your_ids.txt > output.txt

Hello, I am using a Windows system. Is there any method for Windows?
Hello, I'm not familiar with Windows. There is the option to install WSL (Windows Subsystem for Linux) which lets you run the above commands on your Windows machine. Perhaps someone else can suggest a better alternative.
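If Python is installed on the Windows machine, the same filtering can be sketched without grep. This is only a rough equivalent, not the command given above: the function name filter_coding is made up for illustration, and it assumes the gene ID is the first whitespace-separated field on each line of your_ids.txt (which is how htseq-count writes its output) and that coding.txt holds one Ensembl ID per line.

```python
from pathlib import Path


def filter_coding(coding_path, counts_path, output_path):
    """Write the lines of counts_path whose first field is listed in coding_path.

    Hypothetical helper for illustration; returns the number of lines kept.
    """
    # Assumption: one Ensembl gene ID per line; blank lines are skipped.
    coding_ids = {
        line.strip()
        for line in Path(coding_path).read_text().splitlines()
        if line.strip()
    }

    kept = []
    for line in Path(counts_path).read_text().splitlines():
        fields = line.split()
        # Assumption: the gene ID is the first whitespace-separated field.
        if fields and fields[0] in coding_ids:
            kept.append(line)

    # Write the surviving lines back out, one per line.
    Path(output_path).write_text("".join(row + "\n" for row in kept))
    return len(kept)
```

Calling filter_coding("coding.txt", "your_ids.txt", "output.txt") from the folder that holds both files would write the protein-coding rows to output.txt. One difference from grep -w: this compares the whole first field exactly, so versioned IDs such as ENSG00000141510.11 would not match an unversioned list unless the version suffix is stripped first.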