Exon count and GC content
6 months ago
Varsha • 0

I have a list of novel transcripts after filtering, along with the GTF and FASTA files for these transcripts and their chromosome locations. How do I calculate the GC content and the exon count of each transcript? This is how my FASTA file looks:

>NET.31408.1 loc:17|336737-341315|+ exons:336737-337034,338681-339015,339449-339828,341216-341315 segs:1-298,299-633,634-1013,1014-1113
CACAAACCACCCTTGCATTCTGCCAGCCACCCCTCTGCAAAACTGGTGCTCCAGCCACCAGGACCTTGAG
GCATTGCAGCCGCCCAGCCTTGTCTCCGGCACCCTCTCACCTGGAACGCCTTTGGTTCACGCTGTCTACC
TCCTCCACCTGGGGGTGGCCCAGCACCACCTCCCCTGGGAATTCTCCAGCTCCTCCCATCAGGCTCCCAT
TCGGCTTGAGCCCACCGCCCTCCCGTCAGCATTTCATTCCGCCCGCATCCTCGGGGGCATTTACCTGTTA
CCCCGATGCCCAGACATGAAACGCAGGCCTGCTTCCATTTACGTGATGTATGTGGGTCTGGAAAGTCCCA
GGCAGGCAGAATCTTTGCAGAGGAAACCTGATTTCGGCTCCCACCTGGGAACTGCTTGTTGAAGGAGCCC
AAGAGAAACCTCTCCATGAAGCAGAGAAGCTTCTAGGGAAAAAGAAGCCTCAACCCTCCTCACCCGCTTG
GAAAAGGCCCAGTCCTCAGGTGTGCTGAGGGCGGTGCTCCAGGCCCCGGGGGGCAGCGTCCCACACCCCT
GCCTCCGCCAGCAGCTTCTGCACGGCCCAGCCCAGACTCCAGCTCCCAGGTGGCTCTCCGCGGGTCCTGC
CAGCCTGACCCTGCACTACCAAACTGGGAGAGGAAGAAGCCGCCTCCATGGGTGCTGCCCACCTGCCAGG
TGCCCGCCACTGGCTGACCAACTGGAATCATCACAAGCCCCAGAGGACGACGTGATCATCACTCCTTTCA
GAAAAGAAGAAACCAGCTCGAGAGGGGCAGCCACGTGCCCAAGGCCCCATAAGCTGGCACCAGGTGCCCA
GTTTGGCCCAACGGAGCTGGGCTGAGCCCAGGTGCTTTCTATCCCCCTCCTCCTCCCAAGGCGTCGGGTT
GCAGGTGCGGTGCCTACAGGTGCCTAACGAAAGCAATGAGCCGGGTATTCTCCGAGCACCTGCCACACAC
CCAGCAGCGGGGAGCACAGAGTTCCCAGAAACTAACTTCGCAGTTTCTGGTGAACGTGTCTGACCTCTCC
TACTGGACCAAACACTCCCTCAGAGCAGGATGCCTCCTGCCCATATGGTACTGAAAACTGTGG
>NET.31408.2 loc:17|336737-342935|+ exons:336737-337034,338681-339015,342853-342935 segs:1-298,299-633,634-716
CACAAACCACCCTTGCATTCTGCCAGCCACCCCTCTGCAAAACTGGTGCTCCAGCCACCAGGACCTTGAG
GCATTGCAGCCGCCCAGCCTTGTCTCCGGCACCCTCTCACCTGGAACGCCTTTGGTTCACGCTGTCTACC
TCCTCCACCTGGGGGTGGCCCAGCACCACCTCCCCTGGGAATTCTCCAGCTCCTCCCATCAGGCTCCCAT
TCGGCTTGAGCCCACCGCCCTCCCGTCAGCATTTCATTCCGCCCGCATCCTCGGGGGCATTTACCTGTTA
CCCCGATGCCCAGACATGAAACGCAGGCCTGCTTCCATTTACGTGATGTATGTGGGTCTGGAAAGTCCCA
GGCAGGCAGAATCTTTGCAGAGGAAACCTGATTTCGGCTCCCACCTGGGAACTGCTTGTTGAAGGAGCCC
AAGAGAAACCTCTCCATGAAGCAGAGAAGCTTCTAGGGAAAAAGAAGCCTCAACCCTCCTCACCCGCTTG
GAAAAGGCCCAGTCCTCAGGTGTGCTGAGGGCGGTGCTCCAGGCCCCGGGGGGCAGCGTCCCACACCCCT
GCCTCCGCCAGCAGCTTCTGCACGGCCCAGCCCAGACTCCAGCTCCCAGGTGGCTCTCCGCGGGTCCTGC
CAGAGAATTTATAGAGTCTCATTGACCAACCAGCCAGACATGATGCTAATCTGGGTTCCAAAAACAAGAA
ACACCACGACAGATCA

and the GTF file:

17  StringTie   transcript  6670990 6676823 1000    -   .   gene_id "NET.31822"; transcript_id "NET.31822.1"; 
17  StringTie   exon    6670990 6671996 1000    -   .   gene_id "NET.31822"; transcript_id "NET.31822.1"; exon_number "1"; 
17  StringTie   exon    6676715 6676823 1000    -   .   gene_id "NET.31822"; transcript_id "NET.31822.1"; exon_number "2"; 
8   StringTie   transcript  140349489   140350371   1000    +   .   gene_id "NET.78699"; transcript_id "NET.78699.2"; 
8   StringTie   exon    140349489   140349957   1000    +   .   gene_id "NET.78699"; transcript_id "NET.78699.2"; exon_number "1"; 
8   StringTie   exon    140350234   140350371   1000    +   .   gene_id "NET.78699"; transcript_id "NET.78699.2"; exon_number "2"; 
3   StringTie   transcript  14136345    14137669    1000    +   .   gene_id "NET.53089"; transcript_id "NET.53089.5"; 
3   StringTie   exon    14136345    14136680    1000    +   .   gene_id "NET.53089"; transcript_id "NET.53089.5"; exon_number "1"; 
3   StringTie   exon    14137357    14137669    1000    +   .   gene_id "NET.53089"; transcript_id "NET.53089.5"; exon_number "2"; 
20  StringTie   transcript  58657036    58659388    1000    -   .   gene_id "NET.49267"; transcript_id "NET.49267.2"; 
20  StringTie   exon    58657036    58657089    1000    -   .   gene_id "NET.49267"; transcript_id "NET.49267.2"; exon_number "1"; 
20  StringTie   exon    58658947    58659388    1000    -   .   gene_id "NET.49267"; transcript_id "NET.49267.2"; exon_number "2"; 
8   StringTie   transcript  79826214    79827927    1000    -   .   gene_id "NET.77436"; transcript_id "NET.77436.1"; 
8   StringTie   exon    79826214    79826726    1000    -   .   gene_id "NET.77436"; transcript_id "NET.77436.1"; exon_number "1"; 
8   StringTie   exon    79827716    79827927    1000    -   .   gene_id "NET.77436"; transcript_id "NET.77436.1"; exon_number "2"; 
17  StringTie   transcript  336737  341315  1000    +   .   gene_id "NET.31408"; transcript_id "NET.31408.1"; 
17  StringTie   exon    336737  337034  1000    +   .   gene_id "NET.31408"; transcript_id "NET.31408.1"; exon_number "1"; 
17  StringTie   exon    338681  339015  1000    +   .   gene_id "NET.31408"; transcript_id "NET.31408.1"; exon_number "2"; 
17  StringTie   exon    339449  339828  1000    +   .   gene_id "NET.31408"; transcript_id "NET.31408.1"; exon_number "3"; 
17  StringTie   exon    341216  341315  1000    +   .   gene_id "NET.31408"; transcript_id "NET.31408.1"; exon_number "4"; 
3   StringTie   transcript  171240150   171244033   1000    +   .   gene_id "NET.56472"; transcript_id "NET.56472.1"; 
3   StringTie   exon    171240150   171240201   1000    +   .   gene_id "NET.56472"; transcript_id "NET.56472.1"; exon_number "1"; 
3   StringTie   exon    171243772   171244033   1000    +   .   gene_id "NET.56472"; transcript_id "NET.56472.1"; exon_number "2"; 
2   StringTie   transcript  8701295 8702421 1000    -   .   gene_id "NET.41416"; transcript_id "NET.41416.1"; 
Exon-count GC-content • 417 views

I have difficulty reading this (formatting lost in translation?), but unless you're trying to do this programmatically as an exercise, I would just use the available tools, maybe seqkit, FastQC, Subread, or bedtools?
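If you do want to script it yourself, here is a minimal Python sketch (not a replacement for the tools above). It assumes the FASTA holds the spliced transcript sequences, that exon rows in the GTF carry a `transcript_id "..."` attribute as in the question, and that the source column contains no spaces. The function names and the tiny in-memory demo data are made up for illustration.

```python
import re
from collections import Counter


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq):
    """Percentage of G and C among A/C/G/T bases (N and other symbols are ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else 0.0


def exon_counts_from_gtf(lines):
    """Count 'exon' rows per transcript_id in an iterable of GTF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        # Split on any whitespace so tab- and space-separated pastes both work;
        # the ninth field keeps the whole attribute string.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical demo data; replace with open("your.fa") / open("your.gtf").
demo_fasta = [">tx1 loc:17|1-13|+", "GGCC", "AATT"]
demo_gtf = [
    '17\tStringTie\ttranscript\t1\t13\t1000\t+\t.\tgene_id "g1"; transcript_id "tx1";',
    '17\tStringTie\texon\t1\t4\t1000\t+\t.\tgene_id "g1"; transcript_id "tx1"; exon_number "1";',
    '17\tStringTie\texon\t10\t13\t1000\t+\t.\tgene_id "g1"; transcript_id "tx1"; exon_number "2";',
]

exons = exon_counts_from_gtf(demo_gtf)
for header, seq in read_fasta(demo_fasta):
    tx_id = header.split()[0]  # first word of the header is the transcript ID
    print(tx_id, exons.get(tx_id, 0), round(gc_percent(seq), 2), sep="\t")
```

To run it on real data, pass open file handles instead of the demo lists; the output is one tab-separated line per transcript with the ID, exon count, and GC percentage.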
