Question: Extract a random number of transcripts and their exons from a GENCODE annotation
24 months ago by
zoegward90
zoegward90 wrote:

Hi, I want to extract, say, 1000 transcripts and their related exons from my gencode.gtf. Does anyone know how to do this? I'm a beginner and have used grep to extract only the 'transcript' and 'exon' lines, but since the number of exons per transcript varies, I'm not sure how to count lines so that I end up with exactly 1000 transcripts.

At the moment I have:

 grep 'protein_coding' gencode.v27.gtf | awk '{if($3=="transcript" || $3=="exon")print$0}'

Many thanks!

zoegward90 wrote:

Okay, I worked out a somewhat dirty way of doing it, so any more elegant solutions are welcome. I found the ID of the 1000th transcript and then used sed to print everything up to this ID:

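## keep the protein-coding transcript and exon lines
grep 'protein_coding' gencode.v27.gtf | awk '$3=="transcript" || $3=="exon"' > test.gtf
## to get the id of the 1000th transcript
awk '$3=="transcript"' test.gtf | sed -n -e '1000p'
### print everything from the first line to the 1000th transcript id
### (the range ends at the first line containing that id, so the exons of the 1000th transcript are left out)
sed -n '1,/<TRANSCRIPT_ID FROM THE PREVIOUS LINE OF CODE>/ p' test.gtf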
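
The same idea can probably be done in a single awk pass by counting transcript lines and stopping once the limit is passed. The sketch below is only one way to do it: the function name `first_n_transcripts` is made up for illustration, and it assumes a tab-separated GTF in which every transcript line comes before its own exon lines (as in the GENCODE files).

```shell
# Hypothetical helper (name is illustrative, not from any tool).
# Usage: first_n_transcripts N GTF > OUT_GTF
first_n_transcripts() {
    awk -F '\t' -v n="$1" '
        # same filter as the grep above: skip lines without protein_coding
        !/protein_coding/ { next }
        # count transcript lines; stop reading once transcript number n+1 starts
        $3 == "transcript" { if (++count > n) exit }
        # print transcript and exon lines belonging to the first n transcripts
        count > 0 && ($3 == "transcript" || $3 == "exon")
    ' "$2"
}
```

It would be called as `first_n_transcripts 1000 gencode.v27.gtf > first1000.gtf` (the output name is arbitrary). Unlike the sed range, this keeps the exons of the last transcript as well.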
24 months ago by
michael.ante3.6k
Austria/Vienna
michael.ante3.6k wrote:

Hi Zoegward,

I think you need to extract 1000 transcript IDs from the gtf. Something like

awk '$3=="transcript" && match ($0, /protein_coding/){for(i=1;i<NF;i++){if(match($i,/transcript_id/)){print $(i+1)}}}' gencode.v27.gtf > transcript-list.txt

This extracts all transcript IDs from protein-coding genes. To get 1000 IDs, just use head -n 1000 or tail -n 1000; if you want a random selection, shuffle the list with shuf before using head or tail.
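
As a rough sketch of the shuffling step: GNU `shuf` can shuffle and truncate in one go with `-n`, so a separate head is not strictly needed. The wrapper name `sample_ids` is hypothetical and only there to make the two inputs explicit.

```shell
# Hypothetical helper (name is illustrative).
# Usage: sample_ids N ID_LIST > SAMPLED_LIST
sample_ids() {
    # shuf -n N prints N lines of the input, chosen at random without repeats
    shuf -n "$1" "$2"
}
```

For example, `sample_ids 1000 transcript-list.txt > 1000-transcript-ids.txt` would write 1000 randomly chosen IDs to the file used in the grep step. The selection differs on every run.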

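The resulting list of 1000 IDs (saved here as 1000-transcript-ids.txt) is then used as the pattern file for a grep search; -F treats each ID as a fixed string rather than a regular expression:

grep -F -f 1000-transcript-ids.txt gencode.v27.gtf > out.gtf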
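
Note that a grep on the IDs returns every record of those transcripts (CDS, UTR, start_codon and so on), not only the transcript and exon lines asked for. A possible alternative, sketched below, does the ID lookup and the feature filter in one awk call. The function name `filter_gtf_by_ids` is made up for illustration; the sketch assumes a tab-separated GTF with a `transcript_id "..."` attribute in column 9, and an ID list with one ID per line, with or without the quotes and semicolon left by the awk extraction above.

```shell
# Hypothetical helper (name is illustrative).
# Usage: filter_gtf_by_ids ID_LIST GTF > OUT_GTF
filter_gtf_by_ids() {
    awk -F '\t' '
        # first file: load the IDs, stripping quotes, semicolons and whitespace
        NR == FNR { gsub(/[" ;\t\r]/, ""); if ($0 != "") keep[$0] = 1; next }
        # second file: look only at transcript and exon records
        $3 == "transcript" || $3 == "exon" {
            # extract the transcript_id value from the attribute column
            if (match($9, /transcript_id "[^"]+"/)) {
                id = substr($9, RSTART + 15, RLENGTH - 16)
                if (id in keep) print
            }
        }
    ' "$1" "$2"
}
```

Usage would be `filter_gtf_by_ids 1000-transcript-ids.txt gencode.v27.gtf > out.gtf`. Because the IDs are compared as whole strings through an array lookup, an ID can never match as a substring of a longer one.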

Cheers,

Michael
