Question: How to find the sequence of a new gene identified with RNA-seq and Cufflinks
Dear all, I have done an RNA-seq project and have some questions about Cufflinks and the related programs that work with it, such as cuffmerge.

After running the Cufflinks package we get files such as cds.diff and gene_exp.diff, which contain these columns:

test_id, gene_id, gene, locus, sample_1, sample_2, status, value_1, value_2, log2(fold_change), test_stat, p_value, q_value, significant

XLOC_000302, XLOC_000302, -, 1:9748739-9749918, D, Q, OK, 1.35346, 25.6511, 4.2443, 4.96161, 5.00E-05, 0.000162672, yes

My question is: how can I find the sequence of this differentially expressed gene?

The sequences of these genes are really important to me.

Thanks, all.


See the section on "Extracting transcript sequences" here.

modified 2.7 years ago • written 2.7 years ago by GenoMax94k

In this command: gffread -w transcripts.fa -g /path/to/genome.fa transcripts.gtf

transcripts.fa: is this my raw RNA-seq data?

transcripts.gtf: is this a GTF file that I download from the internet, or the file that I get from Cufflinks?

And how do I get the output file?

Thanks for your answers.

written 2.7 years ago by mra818720

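-w filename is the output file, containing the spliced exons for each transcript, so transcripts.fa is written by gffread and is not your raw reads. -g points to the reference genome FASTA. transcripts.gtf is the file that contains the XLOC id you are interested in. If you only want one XLOC id, you could make a subset of that file.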
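Making that subset can be sketched as below. This is only an illustration: subset_gtf is a made-up helper name, merged.gtf and the output file names are placeholders, and it assumes the GTF stores the XLOC id in its gene_id attribute (as the GTF written by cuffmerge usually does).

```shell
# Hypothetical helper (not part of Cufflinks or gffread):
# keep only the GTF lines whose gene_id attribute equals the given id.
subset_gtf() {
    # $1 = gene id, e.g. XLOC_000302
    # $2 = input GTF (assumed to carry XLOC ids in gene_id)
    # The closing quote and semicolon stop XLOC_000302 from also
    # matching longer ids such as XLOC_0003021.
    grep -F "gene_id \"$1\";" "$2"
}

# Assumed usage; the file names are placeholders:
#   subset_gtf XLOC_000302 merged.gtf > XLOC_000302.gtf
#   gffread -w XLOC_000302.fa -g /path/to/genome.fa XLOC_000302.gtf
```

The resulting FASTA should then hold one record per transcript of that gene.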

modified 2.7 years ago • written 2.7 years ago by GenoMax94k
  • transcripts.fa - output sequences in FASTA format.
  • transcripts.gtf - transcripts of interest from the analysis.
  • reference_sequence.fa - reference sequence in FASTA format. Index the genome sequence before you proceed. Example code:

    $ samtools faidx reference_sequence.fa

Then, on Linux, run:

    $ gffread -w transcripts.fa -g reference_sequence.fa transcripts.gtf
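If you want every significant gene rather than a single one, the ids can be pulled out of the .diff table first. A rough sketch, assuming the table is tab-separated with a header line that names the gene_id and significant columns; significant_ids is a made-up helper name, and gene_exp.diff, merged.gtf and the other file names are placeholders:

```shell
# Hypothetical helper: print the gene_id of every row marked
# "yes" in the significant column of a cuffdiff-style table.
significant_ids() {
    # $1 = tab-separated table with a header line
    awk -F'\t' '
        NR == 1 {
            # Find the columns by name instead of hard-coding positions.
            for (i = 1; i <= NF; i++) {
                if ($i == "gene_id")     g = i
                if ($i == "significant") s = i
            }
            next
        }
        s && g && $s == "yes" { print $g }
    ' "$1"
}

# Assumed usage; the file names are placeholders:
#   significant_ids gene_exp.diff | sed 's/.*/gene_id "&";/' > patterns.txt
#   grep -F -f patterns.txt merged.gtf > significant.gtf
#   gffread -w significant.fa -g reference_sequence.fa significant.gtf
```
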
modified 2.7 years ago • written 2.7 years ago by cpad011214k
