Question: extract or recode a gtf file based on a gene id list
berge2015 wrote (2.1 years ago):

Hi,

Does anyone here know how to extract lines from a gtf file using a list/subset of gene ids obtained from the same gtf file? I basically want a 'recoded' (in vcf terminology) gtf file containing only the records for the genes I am interested in.

I tried awk (awk 'FNR==NR {a[$0];next} {for (i in a) if (i~$1) print i}') and grep (grep -Fwf), but neither has yielded what I want. Thank you for your help.

Tags: snp, rna-seq, gtf, gene

Comment from shenwei356 (2.1 years ago): Can you please paste some sample data?
berge2015 wrote (2.1 years ago):

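After playing with awk for a while, I came up with a solution:

awk -F'"' 'FNR==NR {block[$0];next} $2 in block' gene_id_list.txt ref_CDS.gtf > out.txt

Note the double-quote field delimiter: with -F'"', $2 is the first quoted value on each line, which is the gene id when gene_id is the first attribute.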

While not the most elegant solution, this does what's asked in the question. Hope it helps anyone else looking for something similar.
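A slightly more defensive variant of the same idea, in case gene_id is not the first quoted attribute on every line. This is only a sketch: the function name filter_gtf_by_gene_id and the sample records are made up for illustration, and it assumes a tab-separated GTF with the attributes in column 9 and a list file holding one bare gene id per line.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the GTF lines whose
# gene_id attribute appears in the id list.
#   $1 = list file, one bare gene id per line
#   $2 = tab-separated GTF file, attributes assumed to be in column 9
filter_gtf_by_gene_id() {
    awk -F'\t' '
        # First file: load the ids, dropping a trailing CR and empty lines.
        FNR == NR { sub(/\r$/, ""); if ($0 != "") wanted[$0]; next }
        # Second file: skip comment/header lines.
        /^#/ { next }
        # Pull the value out of: gene_id "VALUE"  (9 chars before the value).
        match($9, /gene_id "[^"]+"/) {
            id = substr($9, RSTART + 9, RLENGTH - 10)
            if (id in wanted) print
        }
    ' "$1" "$2"
}

# Made-up sample data; the second record lists transcript_id before gene_id.
tmp=$(mktemp -d)
printf 'geneB\n' > "$tmp/ids.txt"
printf 'chr1\tsrc\tCDS\t1\t9\t.\t+\t0\tgene_id "geneA"; transcript_id "tA";\n' > "$tmp/in.gtf"
printf 'chr1\tsrc\tCDS\t20\t29\t.\t+\t0\ttranscript_id "tB"; gene_id "geneB";\n' >> "$tmp/in.gtf"
filter_gtf_by_gene_id "$tmp/ids.txt" "$tmp/in.gtf"
rm -r "$tmp"
```

On the sample data this prints only the geneB record, even though gene_id is the second attribute there. Like the one-liner above, it relies on FNR==NR, so it misbehaves if the id list is empty.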

shenwei356 wrote (2.1 years ago):

Try the powerful CSV/TSV toolkit csvtk; see the usage of csvtk grep:

csvtk grep -H -t -f 1 -r -P gene_id_list.txt ref_CDS.gtf

You may need to change the column index (-f 1) to the column that holds the gene id (in a standard GTF, the attributes, including gene_id, are in column 9).
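For comparison, roughly the same filtering can be sketched with plain grep by turning each id into the fixed string gene_id "ID" before matching, so that an id cannot match as part of a longer id or in another field. The function name grep_gtf_by_gene_id is made up for illustration, and the sketch assumes one bare gene id per line with Unix line endings.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): grep a GTF for exact gene_id values.
#   $1 = list file, one bare gene id per line
#   $2 = GTF file
grep_gtf_by_gene_id() {
    patterns=$(mktemp)
    # Drop empty lines, then wrap each id as the literal text: gene_id "ID"
    sed -e '/^$/d' -e 's/.*/gene_id "&"/' "$1" > "$patterns"
    # -F: patterns are fixed strings, not regular expressions
    # -f: read one pattern per line from the file
    grep -F -f "$patterns" "$2"
    rc=$?
    rm -f "$patterns"
    return $rc
}

# Usage, with the file names from this thread:
#   grep_gtf_by_gene_id gene_id_list.txt ref_CDS.gtf > out.gtf
```

Because the closing quote is part of each pattern, an id such as g1 does not pull in records for g10.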
