Question: How to match gene names and source (lincRNA, antisense, protein_coding, ...) with a specific list of gene IDs?
M K460 wrote:

I have a list of gene IDs and I want to match them with the gene names and source (lincRNA, antisense, protein_coding, ...) from an Ensembl GTF file. For example, this is a small part of the list:

Gene id strand
ENSG00000242959 1
ENSG00000160396 -1
ENSG00000229494 1
ENSG00000230262 -1
ENSG00000229240 -1
ENSG00000223569 1
   
   

I got help before with an awk command that matches gene IDs with gene names. How can we include the source as well?

rna-seq next-gen R • 2.1k views
modified 4.4 years ago by geek_y • written 4.4 years ago by M K
geek_y9.6k wrote:

Take only column 1 (the gene IDs).

awk '{ print $1 }' input_list | sort | uniq > gene_names

Now take the gene IDs and grep each one against the GTF file.

while read -r line; do grep -F "$line" genes.gtf; done < gene_names > gene_names.gtf

This is slow, because it scans the whole GTF once per gene ID, but it does the job. If you want a super fast program, you may need to wait for a Perl/Python script.
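As a rough idea of what such a Python script could look like, here is a minimal sketch that loads the IDs into a set and reads the GTF only once. The function names are made up for illustration, it assumes the IDs start with "ENSG" (which is how it skips the header row), and the GTF lines in the demo are invented toy data, not real annotation.

```python
import io


def load_gene_ids(handle):
    """Collect the first column of the gene list into a set."""
    ids = set()
    for line in handle:
        fields = line.split()
        # Assumption: real IDs start with "ENSG"; this also skips the header row.
        if fields and fields[0].startswith("ENSG"):
            ids.add(fields[0])
    return ids


def filter_gtf(gtf_handle, wanted):
    """Yield the GTF lines whose gene_id attribute is in `wanted`."""
    for line in gtf_handle:
        if line.startswith("#"):
            continue  # comment / header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        # Column 9 looks like: gene_id "ENSG..."; gene_name "...";
        for attr in cols[8].split(";"):
            key, _, value = attr.strip().partition(" ")
            if key == "gene_id" and value.strip('"') in wanted:
                yield line
                break


# Toy demo: coordinates and gene names below are invented for illustration.
gene_list = io.StringIO("Gene id strand\nENSG00000242959 1\nENSG00000160396 -1\n")
toy_gtf = io.StringIO(
    '1\tlincRNA\texon\t100\t200\t.\t+\t.\tgene_id "ENSG00000242959"; gene_name "TOY1";\n'
    '1\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id "ENSG00000000001"; gene_name "TOY2";\n'
)
for record in filter_gtf(toy_gtf, load_gene_ids(gene_list)):
    print(record, end="")
```

With real files you would pass open("input_list") and open("genes.gtf") instead of the in-memory handles. Matching on the parsed gene_id value rather than on the raw line also avoids hits where an ID happens to appear elsewhere in a record.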

 

 

modified 4.4 years ago • written 4.4 years ago by geek_y

Hi Geek,

Komal helped me with the following awk command:

awk '{
    # Print the value that follows each gene_id / gene_name key
    for (i = 1; i <= NF; i++) {
        if ($i ~ /gene_id|gene_name/) {
            printf "%s ", $(i+1)
        }
    }
    print ""
}' Homo_sapiens.GRCh37.70.gtf | sed -e 's/"//g' -e 's/;//g' -e 's/ /\t/' | sort -k1,1 | uniq > Homo_sapiens.GRCh37.70.txt

It works very well, and I merged the result file with my file using R. So I wonder if we can add the source column to this command.

written 4.4 years ago by M K

M K, I have replied to you on the previous question. Also, please do not duplicate your posts.

written 4.4 years ago by komal.rathi
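For readers who land here looking for the source column itself, one possible approach is sketched below in Python. It is not the answer from the other thread. It assumes that the label the question calls "source" (lincRNA, antisense, protein_coding, ...) sits in column 2 of the GTF, as the question describes; if your file stores it in an attribute instead, read it from the parsed attributes rather than from column 2. The function names are hypothetical, and the GTF lines in the demo are invented toy data.

```python
import io


def parse_attributes(field):
    """Turn 'gene_id "X"; gene_name "Y";' into {'gene_id': 'X', 'gene_name': 'Y'}."""
    attrs = {}
    for item in field.split(";"):
        key, _, value = item.strip().partition(" ")
        if key:
            attrs[key] = value.strip('"')
    return attrs


def gene_table(gtf_handle):
    """Map gene_id -> (gene_name, source), keeping the first record seen per gene."""
    table = {}
    for line in gtf_handle:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        if gene_id and gene_id not in table:
            # Assumption: column 2 (cols[1]) carries the source label.
            table[gene_id] = (attrs.get("gene_name", ""), cols[1])
    return table


def annotate(list_handle, table, missing="NA"):
    """Yield (gene_id, strand, gene_name, source) for each row of the gene list."""
    for line in list_handle:
        fields = line.split()
        # Assumption: IDs start with "ENSG"; this also skips the header row.
        if len(fields) >= 2 and fields[0].startswith("ENSG"):
            name, source = table.get(fields[0], (missing, missing))
            yield fields[0], fields[1], name, source


# Toy demo: coordinates and gene names below are invented for illustration.
toy_gtf = io.StringIO(
    '1\tlincRNA\texon\t100\t200\t.\t+\t.\tgene_id "ENSG00000242959"; gene_name "TOY1";\n'
    '1\tprotein_coding\texon\t300\t400\t.\t-\t.\tgene_id "ENSG00000160396"; gene_name "TOY2";\n'
)
gene_list = io.StringIO("Gene id strand\nENSG00000242959 1\nENSG00000160396 -1\n")
for row in annotate(gene_list, gene_table(toy_gtf)):
    print("\t".join(row))
```

Because only the first record per gene is kept, a gene whose transcripts carry different labels in column 2 would need a different rule, for example collecting all labels into a set. The merge with the gene list happens in the same pass, so the separate R step is not needed for this part.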