Hi,
I am quite new to bioinformatics. I have a GTF file and also a BED file that contains the transcript name and its start and end positions. For each transcript's start and end position, I would like to extract all the exons located between the start and end coordinates of that transcript. For example, imagine a GTF file like the following:
Scaffold1 cuff transcript 344  540  100 + geneid "cuff_45"
Scaffold1 cuff exon 344  400  100 + geneid "cuff_45"
Scaffold1 cuff exon 484  540  100 + geneid "cuff_45"
Scaffold1 cuff transcript 800  1200  100 + geneid "cuff_46"
Scaffold1 cuff exon 800  928  100 + geneid "cuff_46"
Scaffold1 cuff exon 980  1100  100 + geneid "cuff_46"
Scaffold1 cuff exon 1100  1200  100 + geneid "cuff_46"
Scaffold2 cuff transcript 1 500 1000 - gene_id "cuff_47"
Scaffold2 cuff exon 1 500 1000 - gene_id "cuff_47"
and a BED file like the following:
Scaffold1 344 540
Then I would like to extract the Scaffold1 transcript and its exons from the GTF file, like this:
Scaffold1 cuff transcript 344  540  100 + geneid "cuff_45"
Scaffold1 cuff exon 344  400  100 + geneid "cuff_45"
Scaffold1 cuff exon 484  540  100 + geneid "cuff_45"
Can someone suggest a tool to achieve this?
Thanks in advance.
Use a Unix utility:

grep "Scaffold1" your.gtf > scaf1.gtf

Do you also want a BED file from your.gtf, or do you already have that?

Thanks for your reply. I already tried that with Unix tools, but it also includes all the other entries that have Scaffold1, whereas I need to extract only the entries between the transcript start and end. I have updated the GTF file in my question. Kindly take a look and guide me.
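grep matches the text anywhere on a line, so it cannot select by position. awk can additionally compare the start (column 4) and end (column 5) of each line. Below is a minimal sketch for a single interval given on the command line, assuming the interval uses the same 1-based, inclusive coordinates as the GTF (as in the example above). The function name filter_interval is hypothetical, and the demo simply writes the example annotation to a temporary file.

```shell
#!/bin/sh
# Sketch: print the GTF lines on one scaffold whose start (column 4) and
# end (column 5) both lie inside [lo, hi], inclusive.
# Assumption: the interval uses the same 1-based, inclusive coordinates as
# the GTF, as in the example in the question.
filter_interval() {
    # $1 = scaffold, $2 = interval start, $3 = interval end, $4 = GTF file
    awk -v chr="$1" -v lo="$2" -v hi="$3" '
        # "+ 0" forces a numeric comparison of the coordinates
        $1 == chr && $4 + 0 >= lo + 0 && $5 + 0 <= hi + 0
    ' "$4"
}

# Demo: write the example annotation from the question to a temporary file.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cat > "$tmpdir/example.gtf" <<'EOF'
Scaffold1 cuff transcript 344  540  100 + geneid "cuff_45"
Scaffold1 cuff exon 344  400  100 + geneid "cuff_45"
Scaffold1 cuff exon 484  540  100 + geneid "cuff_45"
Scaffold1 cuff transcript 800  1200  100 + geneid "cuff_46"
Scaffold1 cuff exon 800  928  100 + geneid "cuff_46"
Scaffold1 cuff exon 980  1100  100 + geneid "cuff_46"
Scaffold1 cuff exon 1100  1200  100 + geneid "cuff_46"
Scaffold2 cuff transcript 1 500 1000 - gene_id "cuff_47"
Scaffold2 cuff exon 1 500 1000 - gene_id "cuff_47"
EOF

# Prints the cuff_45 transcript line and its two exons.
filter_interval Scaffold1 344 540 "$tmpdir/example.gtf"
```

Because awk splits on runs of whitespace by default, this works for both the space-separated example and a normal tab-separated GTF, as long as the first five columns contain no spaces.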
There are 7 lines that have Scaffold1 in your example, and the grep above would return all 7. So instead of all 7, you just want the lines that fall within the interval in your BED file?

Exactly. I just need to extract all the exons confined within the transcript start and end, as shown above.
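A minimal sketch of the whole task in awk, reading the intervals from the BED file instead of typing them in. It assumes the BED file has at least three columns (scaffold, start, end) and, as in the example, uses the same 1-based, inclusive coordinates as the GTF. Every GTF line (transcript or exon) lying entirely inside one of the BED intervals is printed. The function name extract_from_bed and the demo files are hypothetical.

```shell
#!/bin/sh
# Sketch: for every interval in a BED file (scaffold, start, end), print the
# GTF lines (transcripts and exons) that lie completely inside it.
# Assumption: as in the question, BED start/end use the same 1-based,
# inclusive coordinates as the GTF. For a standard 0-based BED file, store
# "$2 + 1" instead of "$2 + 0" as the interval start.
extract_from_bed() {
    # $1 = BED file, $2 = GTF file
    awk '
        FILENAME == ARGV[1] {
            # First file: skip header/comment/short lines, store the intervals.
            if ($1 ~ /^(#|track|browser)/ || NF < 3) next
            n++; chr[n] = $1; lo[n] = $2 + 0; hi[n] = $3 + 0
            next
        }
        {
            # Second file: print a GTF line once if any interval contains it.
            for (i = 1; i <= n; i++)
                if ($1 == chr[i] && $4 + 0 >= lo[i] && $5 + 0 <= hi[i]) {
                    print
                    next
                }
        }
    ' "$1" "$2"
}

# Demo with the example files from the question (temporary copies).
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
printf 'Scaffold1 344 540\n' > "$tmpdir/example.bed"
cat > "$tmpdir/example.gtf" <<'EOF'
Scaffold1 cuff transcript 344  540  100 + geneid "cuff_45"
Scaffold1 cuff exon 344  400  100 + geneid "cuff_45"
Scaffold1 cuff exon 484  540  100 + geneid "cuff_45"
Scaffold1 cuff transcript 800  1200  100 + geneid "cuff_46"
Scaffold1 cuff exon 800  928  100 + geneid "cuff_46"
Scaffold1 cuff exon 980  1100  100 + geneid "cuff_46"
Scaffold1 cuff exon 1100  1200  100 + geneid "cuff_46"
Scaffold2 cuff transcript 1 500 1000 - gene_id "cuff_47"
Scaffold2 cuff exon 1 500 1000 - gene_id "cuff_47"
EOF

# Prints the cuff_45 transcript line and its two exons.
extract_from_bed "$tmpdir/example.bed" "$tmpdir/example.gtf"
```

On real files this would be called as, for example, extract_from_bed regions.bed your.gtf > selected.gtf (file names hypothetical). The selection is purely by coordinates: strand is ignored, and exons of another transcript that lie inside the same interval are printed too. The loop tests every interval for every GTF line, so for large files bedtools intersect with -f 1.0 (require the GTF feature to be fully covered) is likely faster. Note that bedtools reads BED as 0-based, so with it the cuff_45 interval would need to be written as 343 540.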