Question: cuffcompare output results
4.3 years ago by
hana170
Malaysia
hana170 wrote:

Hi

I'm interested in identifying potential novel isoforms from my RNA-seq data. After running cuffcompare, how can I get only the list of novel isoforms (class codes "u" and "j"), extract their sequences, and validate them?

 

thank you

 

 

rna-seq • 4.4k views
4.3 years ago by
Manvendra Singh2.0k
Berlin, Germany
Manvendra Singh2.0k wrote:

Actually, the transcripts that cuffcompare marks with class codes "x" (cis antisense), "i" (intronic), "u" (intergenic) and "j" (alternatively spliced) are the ones that are not annotated in the GTF file you provided during the RABT assembly.

You can fetch these transcripts by their class codes, e.g. for the alternatively spliced ones:

awk '/class_code "j"/ { print }' cuffcompare_combined.gtf > Alternatively_spliced.gtf
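Since the question asks for both "u" and "j", here is a minimal sketch that selects several class codes in one pass. It reads the class_code attribute by name instead of by field number, because the field position can shift when optional attributes such as gene_name or nearest_ref are absent. The function name select_class_codes is made up for this example, and the file names are the ones used above.

```shell
# Hypothetical helper: print GTF records whose class_code is one of the
# single-letter codes given in the first argument.
# Usage: select_class_codes uj cuffcompare_combined.gtf > novel.gtf
select_class_codes() {
    codes="$1"
    shift
    awk -v codes="$codes" '
        # Locate the attribute written as: class_code "j";
        match($0, /class_code "[^"]*"/) {
            # Skip the 12 characters of the key and opening quote,
            # and drop the closing quote.
            value = substr($0, RSTART + 12, RLENGTH - 13)
            if (length(value) == 1 && index(codes, value) > 0) print
        }
    ' "$@"
}
```

With no file argument the function reads from standard input, so it can also sit in the middle of a pipe.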

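Now you need to do some filtering, e.g. on length. The line below keeps records spanning more than 200 bp; the combined GTF normally lists exon records, so the test is applied exon by exon rather than to whole transcripts:

awk '{ if ($5-$4>200) print $0 }' Alternatively_spliced.gtf > Alternatively_spliced_200.gtf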
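If what you want is the length of the whole spliced transcript, one option is to sum the exon lengths per transcript_id first. The sketch below does that by reading the file twice; filter_by_transcript_length is a made-up name, and it assumes a tab-separated GTF whose exon records carry a transcript_id attribute. Exon length is counted as end - start + 1 because GTF coordinates are 1-based and inclusive.

```shell
# Hypothetical helper: keep the exon records of transcripts whose summed
# exon length is greater than a minimum.
# Usage: filter_by_transcript_length 200 Alternatively_spliced.gtf
filter_by_transcript_length() {
    min_len="$1"
    gtf="$2"
    # The same file is passed twice: the first pass totals exon lengths,
    # the second pass prints the records of transcripts that qualify.
    awk -F '\t' -v min_len="$min_len" '
        function transcript_id(attrs) {
            if (match(attrs, /transcript_id "[^"]*"/))
                return substr(attrs, RSTART + 15, RLENGTH - 16)
            return ""
        }
        $3 != "exon" { next }
        NR == FNR { total[transcript_id($9)] += $5 - $4 + 1; next }
        total[transcript_id($9)] > min_len
    ' "$gtf" "$gtf"
}
```

A transcript made of two exons of 101 bp and 151 bp passes a 200 bp cutoff here, whereas the per-record filter would drop both exons.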

You also get a separate file, cuffcompare.tracking, which holds the FPKM values of each tracked transcript in every sample.

You can then set an FPKM threshold and filter out the less abundant transcripts.
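A rough sketch of such an FPKM filter follows. It assumes the usual tab-separated tracking layout: transcript ID in column 1, class code in column 4, then one column per sample of the form qJ:gene_id|transcript_id|FMI|FPKM|conf_lo|conf_hi|cov|len, with "-" where the transcript is missing from a sample. Please check this against your own file before relying on it. The name filter_tracking_by_fpkm is made up for this example.

```shell
# Hypothetical helper: list transcripts with a wanted class code whose highest
# per-sample FPKM reaches a minimum. Output columns: ID, class code, max FPKM.
# Usage: filter_tracking_by_fpkm 1.0 uj cuffcompare.tracking
filter_tracking_by_fpkm() {
    min_fpkm="$1"
    codes="$2"
    shift 2
    awk -F '\t' -v min_fpkm="$min_fpkm" -v codes="$codes" '
        index(codes, $4) == 0 { next }       # column 4 holds the class code
        {
            best = 0
            for (i = 5; i <= NF; i++) {
                if ($i == "-") continue      # transcript absent from this sample
                n = split($i, part, "[|]")
                # The 4th pipe-separated value is assumed to be the FPKM.
                if (n >= 4 && part[4] + 0 > best) best = part[4] + 0
            }
            if (best >= min_fpkm) print $1 "\t" $4 "\t" best
        }
    ' "$@"
}
```

The IDs in the first output column can then be used to pull the matching records out of the GTF. Taking the maximum across samples is only one possible choice; a mean or a per-sample cutoff may suit your design better.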

Convert the resulting file to BED format and fetch the sequences with bedtools:

bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> -s

-fo names the output FASTA file, which will hold the sequences for the coordinates given with -bed, taken from the reference sequence given with -fi; -s makes the extraction strand-aware (minus-strand features are reverse-complemented).
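For the BED conversion step, a minimal sketch could look like this. It writes one BED6 line per exon record and uses the transcript_id as the name column; gtf_to_bed6 is a made-up name. The start coordinate is shifted by one because GTF is 1-based while BED is 0-based.

```shell
# Hypothetical helper: convert GTF exon records to BED6
# (chrom, start, end, name, score, strand).
# Usage: gtf_to_bed6 Alternatively_spliced_200.gtf > Alternatively_spliced_200.bed
gtf_to_bed6() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        $3 == "exon" && match($9, /transcript_id "[^"]*"/) {
            name = substr($9, RSTART + 15, RLENGTH - 16)
            # GTF start is 1-based, BED start is 0-based; the end is unchanged.
            print $1, $4 - 1, $5, name, 0, $7
        }
    ' "$@"
}
```

Keep in mind that getfasta on such a file returns one sequence per exon, not one per spliced transcript. If you need full transcript sequences, the gffread utility that ships with Cufflinks can, as far as I know, write them directly from a GTF with its -w and -g options.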

 

HTH
