Question: Retrieve the sequence based on the start and end position in the cuffmerged.gtf
Chao.wang2 (Canada) wrote 3.8 years ago:

Hi guys,

Does anyone know how to retrieve a gene's sequence from its start and end positions in the cuffmerged.gtf file? For some genes, only tracking IDs and start and end positions are available. I want to retrieve these sequences and annotate them. I would really appreciate your help.

Thanks a lot

rna-seq • 1.6k views

Thanks very much

Sounds helpful.

I will try it tomorrow.

Comment written 3.8 years ago by Chao.wang2 (modified 15 months ago by RamRS)
igor (United States) wrote 3.8 years ago:

A one-line solution:

bedtools getfasta -fi genome.fa -bed cuffmerged.gtf -fo out.fa

Yes, the -bed parameter can actually take BED/GFF/VCF files. Full documentation here

Modified 15 months ago by RamRS

Yes, this is brilliant.

Comment written 3.8 years ago by Chao.wang2

Hi igor,

Thanks for your solution. However, I want to extract a single sequence for each Cufflinks tracking ID, and bedtools getfasta returns several exon sequences per tracking ID. Do you think there is a way around that? I also checked the Cufflinks website and found the gffread utility, which was designed to handle Cufflinks output. However, it extracts transcript sequences based on the transcript ID in cuffmerged.gtf, not the gene ID. Do you think there is a way to change that?

Thanks very much

Comment written 3.8 years ago by Chao.wang2 (modified 15 months ago by RamRS)
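
One possible way around the per-exon output, sketched here rather than taken from the thread: collapse the exon records of each gene_id into a single interval first, then pass that interval file to bedtools getfasta. The function name gtf_to_gene_bed and the output file names are made up for illustration. The sketch assumes a tab-separated GTF whose ninth column contains a gene_id "..." attribute and whose records are all exon features, as cuffmerge output normally is. It yields the full genomic span of each gene, introns included, not a spliced transcript.

```shell
# Collapse a cuffmerge GTF to one BED interval per gene_id.
# Hypothetical helper: the name and the output layout are illustrative only.
gtf_to_gene_bed() {
  awk -F'\t' '
    $3 == "exon" && match($9, /gene_id "[^"]+"/) {
      # Strip the leading `gene_id "` (9 chars) and the closing quote.
      id = substr($9, RSTART + 9, RLENGTH - 10)
      if (!(id in chrom)) {
        chrom[id] = $1; strand[id] = $7; lo[id] = $4; hi[id] = $5
      }
      # Track the smallest start and largest end seen for this gene.
      if ($4 + 0 < lo[id] + 0) lo[id] = $4
      if ($5 + 0 > hi[id] + 0) hi[id] = $5
    }
    END {
      # GTF is 1-based and inclusive; BED is 0-based and half-open.
      for (id in chrom) {
        printf "%s\t%d\t%d\t%s\t0\t%s\n", chrom[id], lo[id] - 1, hi[id], id, strand[id]
      }
    }
  ' "$1" | sort -k1,1 -k2,2n
}
```

It could then be used roughly like this: run gtf_to_gene_bed cuffmerged.gtf > genes.bed, followed by bedtools getfasta -fi genome.fa -bed genes.bed -name -fo genes.fa. The -name option labels each FASTA record with the BED name column (here the gene_id), though the exact header format differs between bedtools versions, so check the output of your installed release.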