Question: cuffcompare output results
hana180 wrote (6.1 years ago):


I'm interested in identifying potential novel isoforms from my RNA-seq data. After running cuffcompare, how can I get a list of only the novel isoforms (class codes "u" and "j"), extract their sequences, and validate them?


Thank you.



Tag: rna-seq
Manvendra Singh (Berlin, Germany) wrote (6.1 years ago):

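Actually, the class codes "x" (exonic overlap with a reference transcript on the opposite strand, i.e. cis-antisense), "i" (falls entirely within a reference intron), "u" (unknown, intergenic) and "j" (potentially novel isoform sharing at least one splice junction with a reference transcript) that cuffcompare assigns mark transcripts that are not annotated in the GTF file you provided for the RABT (reference annotation based transcript) assembly.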
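Before choosing which codes to keep, it can help to see how many transcripts fall into each class. The sketch below assumes each line of the combined GTF carries a `class_code "..."` attribute (which the awk filter in this answer also relies on) and counts each transcript_id once. The toy transcripts are made-up data for illustration only; point `GTF` at your own cuffcompare_combined.gtf instead.

```bash
# Count distinct transcripts per cuffcompare class code.
# Toy input only (hypothetical IDs and coordinates); replace GTF with your own file.
workdir=$(mktemp -d)
GTF="$workdir/cuffcompare_combined.gtf"
printf 'chr1\tCufflinks\texon\t%s\t%s\t.\t+\t.\t%s\n' \
  100  250  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  400  520  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  1000 1150 'gene_id "XLOC_2"; transcript_id "TCONS_2"; class_code "u";' \
  1500 1700 'gene_id "XLOC_3"; transcript_id "TCONS_3"; class_code "=";' \
  2000 2300 'gene_id "XLOC_3"; transcript_id "TCONS_4"; class_code "j";' \
  > "$GTF"

awk -F'\t' '
  match($9, /class_code "[^"]*"/) {
    code = substr($9, RSTART + 12, RLENGTH - 13)        # value between the quotes
    if (match($9, /transcript_id "[^"]*"/)) {
      tid = substr($9, RSTART + 15, RLENGTH - 16)
      if (!(tid in seen)) { seen[tid] = 1; n[code]++ }  # count each transcript once
    }
  }
  END { for (c in n) print c "\t" n[c] }
' "$GTF" | sort > "$workdir/class_code_counts.txt"

cat "$workdir/class_code_counts.txt"
```

Counting transcripts rather than lines matters because a multi-exon transcript contributes one GTF line per exon.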

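You can fetch these transcripts by their class code, e.g. the alternatively spliced ones ("j"). Matching the class_code attribute itself is safer than testing a fixed column such as $22: the field number of the attribute can differ between lines, and /j/ on its own matches any value that merely contains the letter j.

awk '/class_code "j"/' cuffcompare_combined.gtf > Alternatively_spliced.gtf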
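Since the question asks for both "u" and "j", the two codes can be pulled out in one pass and reduced to a plain list of transcript IDs. This is a sketch: it matches the quoted class_code value exactly, uses toy data in place of your own cuffcompare_combined.gtf, and relies on `grep -o`, which GNU and BSD grep support although POSIX does not require it.

```bash
workdir=$(mktemp -d)
GTF="$workdir/cuffcompare_combined.gtf"
# Toy input (hypothetical IDs); use your own combined GTF instead.
printf 'chr1\tCufflinks\texon\t%s\t%s\t.\t+\t.\t%s\n' \
  100  250  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  400  520  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  1000 1150 'gene_id "XLOC_2"; transcript_id "TCONS_2"; class_code "u";' \
  1500 1700 'gene_id "XLOC_3"; transcript_id "TCONS_3"; class_code "=";' \
  2000 2300 'gene_id "XLOC_3"; transcript_id "TCONS_4"; class_code "j";' \
  > "$GTF"

# Exon lines of transcripts whose class code is exactly "u" or "j".
grep -E 'class_code "(u|j)"' "$GTF" > "$workdir/novel_u_j.gtf"

# One transcript ID per line: the list of candidate novel transcripts.
grep -oE 'transcript_id "[^"]+"' "$workdir/novel_u_j.gtf" \
  | cut -d'"' -f2 | sort -u > "$workdir/novel_ids.txt"

cat "$workdir/novel_ids.txt"
```

The GTF subset keeps the exon structure for the later steps, while the ID list is convenient for looking the same transcripts up in the .tracking file.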

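Now you need to do some filtering, e.g. on length (more than 200 nt). Keep in mind that each GTF line is one exon, so this one-liner checks the length of each exon line rather than of the whole transcript (GTF coordinates are 1-based and inclusive, hence the +1):

awk '($5 - $4 + 1) > 200' Alternatively_spliced.gtf > Alternatively_spliced_200.gtf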
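To filter on the length of the whole transcript instead, the exon lengths have to be summed per transcript_id first. A minimal two-pass awk sketch, assuming every exon line carries a transcript_id attribute and that 200 nt (MIN_LEN, a hypothetical variable) is the cut-off you want; the toy file stands in for Alternatively_spliced.gtf:

```bash
workdir=$(mktemp -d)
IN="$workdir/Alternatively_spliced.gtf"
OUT="$workdir/Alternatively_spliced_200.gtf"
MIN_LEN=200
# Toy input: TCONS_1 has two short exons (151 + 121 nt) but 272 nt once spliced.
printf 'chr1\tCufflinks\texon\t%s\t%s\t.\t+\t.\t%s\n' \
  100  250  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  400  520  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  1000 1150 'gene_id "XLOC_2"; transcript_id "TCONS_2"; class_code "j";' \
  2000 2300 'gene_id "XLOC_3"; transcript_id "TCONS_4"; class_code "j";' \
  > "$IN"

# The file is read twice: pass 1 sums exon lengths per transcript,
# pass 2 prints the exon lines of transcripts longer than MIN_LEN.
awk -F'\t' -v min="$MIN_LEN" '
  $3 == "exon" && match($9, /transcript_id "[^"]+"/) {
    tid = substr($9, RSTART + 15, RLENGTH - 16)
    if (FNR == NR) { len[tid] += $5 - $4 + 1; next }  # pass 1
    if (len[tid] > min) print                         # pass 2
  }
' "$IN" "$IN" > "$OUT"

cat "$OUT"
```

On the same toy file, the per-exon one-liner would discard both exons of TCONS_1, because neither exon alone exceeds 200 nt, even though the spliced transcript does.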

You also get a separate file, cuffcompare.tracking (the name follows the prefix given with -o), containing the FPKM value of each transcript in each input sample.

You can then set an FPKM threshold and filter out the less abundant transcripts.
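A sketch of that FPKM filter, assuming the tracking layout described in the Cufflinks manual: tab-separated, transcript (TCONS) ID in column 1, class code in column 4, and from column 5 on one entry per sample of the form q1:gene|transcript|FMI|FPKM|conf_lo|conf_hi|cov|len, or "-" where the transcript was not detected. Please check a few lines of your own file before trusting the "FPKM is the 4th field" assumption. MIN_FPKM and the toy rows are hypothetical; the kept IDs are then used to subset a GTF.

```bash
workdir=$(mktemp -d)
TRACKING="$workdir/cuffcompare.tracking"
GTF="$workdir/novel.gtf"
MIN_FPKM=1   # hypothetical threshold; choose one that suits your data

# Toy tracking rows (two samples) and matching GTF lines.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  TCONS_1 XLOC_1 - j 'q1:CUFF.1|CUFF.1.1|100|5.2|3.1|7.3|10.5|272' - \
  TCONS_2 XLOC_2 - u 'q1:CUFF.2|CUFF.2.1|100|0.3|0.1|0.5|1.0|151' 'q2:CUFF.7|CUFF.7.1|100|0.4|0.1|0.6|1.2|151' \
  TCONS_4 XLOC_3 - j - 'q2:CUFF.9|CUFF.9.1|100|2.0|1.0|3.0|4.1|301' \
  > "$TRACKING"
printf 'chr1\tCufflinks\texon\t%s\t%s\t.\t+\t.\t%s\n' \
  100  250  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  400  520  'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  1000 1150 'gene_id "XLOC_2"; transcript_id "TCONS_2"; class_code "u";' \
  2000 2300 'gene_id "XLOC_3"; transcript_id "TCONS_4"; class_code "j";' \
  > "$GTF"

# Keep a transcript if its highest FPKM across samples reaches MIN_FPKM.
awk -F'\t' -v min="$MIN_FPKM" '
  {
    best = 0
    for (i = 5; i <= NF; i++) {
      if ($i == "-") continue                      # not detected in this sample
      if (split($i, f, "|") >= 4 && f[4] + 0 > best)
        best = f[4] + 0                            # field 4 = FPKM (assumed layout)
    }
    if (best >= min) print $1
  }
' "$TRACKING" > "$workdir/expressed_ids.txt"

# Exact patterns: the closing quote stops TCONS_1 from also matching TCONS_10.
sed 's/.*/transcript_id "&"/' "$workdir/expressed_ids.txt" > "$workdir/patterns.txt"
grep -F -f "$workdir/patterns.txt" "$GTF" > "$workdir/expressed.gtf"

cat "$workdir/expressed.gtf"
```

Taking the maximum across samples keeps a transcript that is expressed in at least one condition; requiring the threshold in every sample would be a stricter alternative.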

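Convert the resulting file to BED format and fetch the sequences with bedtools:

bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> -s

-fi is the reference genome FASTA, -bed gives the coordinates to extract, -fo names the output FASTA, and -s forces strandedness (features on the minus strand are reverse-complemented). Note that getfasta writes one sequence per record, so a GTF or a BED6 file with one line per exon yields exon sequences; to get spliced transcript sequences, use a BED12 file with one line per transcript and add -split.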
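For the spliced sequences, one option is to collapse the exon lines of each transcript into a single BED12 record and pass it to getfasta with -split (which concatenates the blocks) and -name (which puts the BED name column into the FASTA headers). A minimal awk sketch, assuming the exons of a transcript do not overlap and lie on one chromosome and strand; genome.fa and the toy GTF are placeholders, and the getfasta call is left commented out because it needs your reference FASTA:

```bash
workdir=$(mktemp -d)
GTF="$workdir/novel.gtf"
BED="$workdir/novel.bed12"
# Toy input; the exons of TCONS_1 are deliberately out of order.
printf 'chr1\tCufflinks\texon\t%s\t%s\t.\t%s\t.\t%s\n' \
  400  520  + 'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  100  250  + 'gene_id "XLOC_1"; transcript_id "TCONS_1"; class_code "j";' \
  1000 1150 - 'gene_id "XLOC_2"; transcript_id "TCONS_2"; class_code "u";' \
  > "$GTF"

# Sort by chromosome and start so each transcript's blocks come out in ascending order.
sort -t "$(printf '\t')" -k1,1 -k4,4n "$GTF" | awk -F'\t' -v OFS='\t' '
  match($9, /transcript_id "[^"]+"/) {
    t = substr($9, RSTART + 15, RLENGTH - 16)
    if (!(t in nex)) { order[++n] = t; chr[t] = $1; strand[t] = $7; start[t] = $4 - 1 }
    k = ++nex[t]
    bs[t, k] = $4 - 1        # GTF is 1-based inclusive, BED is 0-based half-open
    be[t, k] = $5
    stop[t] = $5             # after sorting, the last exon ends the transcript
  }
  END {
    for (i = 1; i <= n; i++) {
      t = order[i]; sizes = ""; starts = ""
      for (k = 1; k <= nex[t]; k++) {
        sizes  = sizes  (be[t, k] - bs[t, k]) ","
        starts = starts (bs[t, k] - start[t]) ","
      }
      print chr[t], start[t], stop[t], t, 0, strand[t], start[t], stop[t], 0, nex[t], sizes, starts
    }
  }
' > "$BED"

cat "$BED"

# With a reference FASTA (genome.fa is a placeholder):
# bedtools getfasta -fi genome.fa -bed "$BED" -s -split -name -fo novel_transcripts.fa
```

Building BED12 in awk keeps the whole pipeline in the same tools used above; a dedicated GTF-to-BED converter would work just as well if you already have one installed.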


