Question: Extract FASTA sequences based on the genomic range of each exon
4 months ago, yaghoub.amraei wrote:

Hello everyone. I have a cuffcompare output in which several lines share the same gene name and transcript name, but each line covers a different genomic range. How can I get the FASTA sequence of each of these ranges with bedtools or gffread?

1   Cufflinks   exon    58474   61195   .   +   .   gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1   Cufflinks   exon    61423   61573   .   +   .   gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "2"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1   Cufflinks   exon    61669   61794   .   +   .   gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "3"; gene_name "Os01g0101150"; oId "CUFF.865.1"; nearest_ref "Os01t0101150-00"; class_code "o"; tss_id "TSS17";
1   Cufflinks   exon    163041  164107  .   +   .   gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1"; gene_name "Os01g0102850"; oId "CUFF.36.1"; nearest_ref "Os01t0102850-00"; class_code "o"; tss_id "TSS43";
1   Cufflinks   exon    163041  164107  .   +   .   gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1"; gene_name "Os01g0102850"; oId "CUFF.36.1"; nearest_ref "Os01t0102850-00"; class_code "o"; tss_id "TSS43";

Your GFF file is reporting exons, so for each line you have the genomic range for an exon of a particular gene. Do you want to get the sequence for each individual exon? Or do you want the sequence of the entire gene region on DNA?

written 4 months ago by rpolicastro

My goal is lncRNA detection. The GTF file in which I want to identify lncRNAs has about 50,000 lines, but when I extract a FASTA file with gffread I get only about 17,000 transcripts. That is, gffread writes the FASTA file per transcript, not per genomic range of each transcript.

written 4 months ago by yaghoub.amraei
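
The gap between the two numbers in the comment above is what you would expect if each GTF line is one exon and a transcript is made of several exons. A minimal Python sketch for checking this on your own file is shown here; the helper name exon_counts is made up for illustration, and the four example lines are the exon records from the question with their attributes shortened.

```python
import re
from collections import Counter

# Matches GTF attributes of the form: key "value"
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')

def exon_counts(gtf_lines):
    """Return a Counter mapping transcript_id -> number of exon lines."""
    counts = Counter()
    for line in gtf_lines:
        # Split on any whitespace; the 9th field keeps the whole attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue  # skip comments, blank lines and non-exon features
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        counts[attrs["transcript_id"]] += 1
    return counts

# Exon lines from the question (attributes shortened for readability).
example = [
    '1 Cufflinks exon 58474 61195 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1";',
    '1 Cufflinks exon 61423 61573 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "2";',
    '1 Cufflinks exon 61669 61794 . + . gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "3";',
    '1 Cufflinks exon 163041 164107 . + . gene_id "XLOC_000015"; transcript_id "TCONS_00007850"; exon_number "1";',
]

counts = exon_counts(example)
print(sum(counts.values()), "exon lines,", len(counts), "transcripts")
```

On a real file you could pass an open file handle instead of the list. If the number of distinct transcripts is close to 17,000, gffread is simply joining the exons of each transcript into one sequence.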

The getfasta command from bedtools might help you.

written 4 months ago by cpad0112
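
One way to follow this suggestion and get one sequence per exon is to turn each exon line into a BED record first, so that every FASTA entry carries a name that tells the exons apart. The Python sketch below is only an illustration under some assumptions: the function name gtf_exon_to_bed and the naming scheme transcript_id plus exon number are invented here, and the file names in the trailing comment are placeholders.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')

def gtf_exon_to_bed(line):
    """Convert one GTF exon line to a BED6 line; return None for other lines."""
    fields = line.split(None, 8)
    if len(fields) < 9 or fields[2] != "exon":
        return None
    chrom, start, end, strand = fields[0], int(fields[3]), int(fields[4]), fields[6]
    attrs = dict(ATTRIBUTE.findall(fields[8]))
    # Hypothetical naming scheme: transcript id plus exon number.
    name = "{}_exon{}".format(attrs["transcript_id"], attrs.get("exon_number", "NA"))
    # GTF coordinates are 1-based inclusive; BED is 0-based half-open,
    # so only the start is shifted by one.
    return "\t".join([chrom, str(start - 1), str(end), name, "0", strand])

line = ('1   Cufflinks   exon    58474   61195   .   +   .   '
        'gene_id "XLOC_000007"; transcript_id "TCONS_00022739"; exon_number "1";')
bed_line = gtf_exon_to_bed(line)
print(bed_line)

# The resulting BED file could then be passed to bedtools, for example:
#   bedtools getfasta -fi genome.fa -bed exons.bed -s -name -fo exons.fa
# (genome.fa, exons.bed and exons.fa are placeholder file names; the exact
# FASTA header produced by -name differs between bedtools versions.)
```

The -s option makes bedtools reverse-complement minus-strand features, which matters once the file contains exons on both strands.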

hello. amazing, as usual.

written 4 months ago by yaghoub.amraei