Question: Cufflinks gffread utility
0
gravatar for Thomas
4.7 years ago by
Thomas100
Cambridge, UK
Thomas100 wrote:

Hello,

This is a question regarding the following utility:

http://cole-trapnell-lab.github.io/cufflinks/file_formats/#the-gffread-utility

If I use gffread for genomic features labelled as existing on the negative strand, will gffread find the reverse complement automatically when extracting the sequence from the fasta file, or will I have to implement an additional step to get that?

Any info on this would be greatly appreciated.

ADD COMMENTlink modified 2.4 years ago by madzayasodara0 • written 4.7 years ago by Thomas100

Hello! Any ideas why I'm getting a seg fault when I run this?

gffread BF_annotation.gtf -W -O -E -L -F -w BF_transcripts.fa -g BF_genome.fa
GFF Warning: merging overlapping/adjacent feature segment exon (8436614-8437382) with exon (8436535-8436613) for GFF ID KDM4A on scaffold_1
GFF Warning: merging overlapping/adjacent feature segment exon (319524-319638) with exon (319006-319523) for GFF ID ZFYVE28 on scaffold_199
GFF Warning: merging overlapping/adjacent feature segment exon (1352880-1353047) with exon (1351726-1352879) for GFF ID FLAD1 on scaffold_218
Segmentation fault (core dumped)

and when I run without certain arguments: gffread -w BF_transcripts.fa -W -O -E -g ./BF_genome.fa BF_annotation.gtf

it outputs warnings:

GFF Warning: merging overlapping/adjacent feature segment exon (8436614-8437382) with exon (8436535-8436613) for GFF ID KDM4A on scaffold_1
GFF Warning: merging overlapping/adjacent feature segment exon (319524-319638) with exon (319006-319523) for GFF ID ZFYVE28 on scaffold_199
GFF Warning: merging overlapping/adjacent feature segment exon (1352880-1353047) with exon (1351726-1352879) for GFF ID FLAD1 on scaffold_218
Error (GFaSeqGet): subsequence cannot be larger than 1560
Error getting subseq for BLEC (1..1561)!

I hadn't seen this post before and made a related post: C: Get sequences for all genes in a GTF file

Thanks!

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by madzayasodara0

I cleaned the .gtf from some genes (4 of them) and it run without warnings, but I'm not clear this is how I should have proceeded.

ADD REPLYlink written 2.4 years ago by madzayasodara0

In the warning messages, it is just saying that some of your gene exon co-ordinates overlap. For example:

GENE1
___________________________________________
TRANSCRIPTION --> 
                                 _______________________________________________________
                                 TRANSCRIPTION -->               
                                                                                   GENE2

You can see that there is some overlap in the middle. This is the situation that you have on 3 of your scaffolds (1, 199, 218).

This is not necessarily a problem. Even in the human transcriptome CDS FASTA file from GENCODE, many genes have the same sequences because they also overlap. It may be a problem depending on what are your next analysis steps.

ADD REPLYlink modified 2.4 years ago • written 2.4 years ago by Kevin Blighe69k
5
gravatar for Kevin Blighe
3.2 years ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

Sorry the the late response.

Yes, it certainly will read the reverse complement and take strand-specific information into account:

cat mergedAll.gtf 
BacterialGenome.1   Cufflinks   exon    2   500 .   +   .   gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; oId "CUFF.10.1"; tss_id "TSS1";
BacterialGenome.1   Cufflinks   exon    2   500 .   -   .   gene_id "XLOC_000628"; transcript_id "TCONS_00000630"; exon_number "1"; oId "CUFF.1.1"; tss_id "TSS630";

.

gffread -w test.fasta -W -O -E -L -F -g BacterialGenomeReference.fasta mergedAll.gtf

.

cat test.fasta 
>TCONS_00000001 gene=XLOC_000001 loc:BacterialGenome.1|2-500|+ exons:2-500 segs:1-499 oId=CUFF.10.1 tss_id=TSS1
CATCACAGACACTACGCATAAGGGGCTATCACCCTCTATGGCCGGACTTTCCATTCCGTTCTGCTTCTTC
TGCAACAATCACGGGCCTGTTCCGCGTTCGCTCGCCACTACTAGCGGAATCTCAATTGATGTCTTTTCCT
CCGGGTACTTAGATGTTTCAGTTCCCCGGGTTCGCCTCATGCCCCTATGTATTCAGAACATGATACCCAT
CGCTGGGTGGGTTGCCCCATTCAGATATCCACGGATCAAAGCCTGCTCGCGGCTCCCCATGGCTTTTCGC
AGCGTGCCACGTCTTTCATCGCCTCCTGGTGCCAAGGCATCCACCGAATGCCCTTATCGCGCTCATTCAC
CACACATGCACAGGAGCCATCCACCCTGGGGCAGACAGTCCCGCACATAAGAAAGTGCAGTTCATCTCAC
GACCACTCTATTCTCTTTACGTCGTCCATGTTCGCTTACGCCACATCGCACAGCACGCAATACCCCGGTT
CCCCGGACC
>TCONS_00000630 gene=XLOC_000628 loc:BacterialGenome.1|2-500|- exons:2-500 segs:1-499 oId=CUFF.1.1 tss_id=TSS630
GGTCCGGGGAACCGGGGTATTGCGTGCTGTGCGATGTGGCGTAAGCGAACATGGACGACGTAAAGAGAAT
AGAGTGGTCGTGAGATGAACTGCACTTTCTTATGTGCGGGACTGTCTGCCCCAGGGTGGATGGCTCCTGT
GCATGTGTGGTGAATGAGCGCGATAAGGGCATTCGGTGGATGCCTTGGCACCAGGAGGCGATGAAAGACG
TGGCACGCTGCGAAAAGCCATGGGGAGCCGCGAGCAGGCTTTGATCCGTGGATATCTGAATGGGGCAACC
CACCCAGCGATGGGTATCATGTTCTGAATACATAGGGGCATGAGGCGAACCCGGGGAACTGAAACATCTA
AGTACCCGGAGGAAAAGACATCAATTGAGATTCCGCTAGTAGTGGCGAGCGAACGCGGAACAGGCCCGTG
ATTGTTGCAGAAGAAGCAGAACGGAATGGAAAGTCCGGCCATAGAGGGTGATAGCCCCTTATGCGTAGTG
TCTGTGATG
ADD COMMENTlink written 3.2 years ago by Kevin Blighe69k

In gffread, if I need only TCONS_00033653 and gene_id "XLOC_028018 ID. what is command I should use?

Actually, I used gffread -w ECvsEL_transcript.fasta -g v.corymbosum_GDV_reftransV1.fasta EC_vs_EL.gtf

But I get only TCONS_00033653 ID

Thank you so much kanjana

ADD REPLYlink written 15 months ago by worarado.kan10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1332 users visited in the last hour
_