Question: Obtaining Fastas From .Gtf File For Splicing Variants
Asked 5.8 years ago by Anima Mundi (Italy):

Hello, I have an RNASeq.gtf file containing the splicing variants of a long series of genes. I would like to obtain:

  • a) a text file listing all the spliced FASTA sequences for every variant;
  • b) a text file listing, for every gene, the spliced FASTA sequences that are common between its splicing variants.

For point a), I adjusted the input file format for the UCSC Table Browser, uploaded the file as a custom track, and downloaded the sequences of all the subregions of the track listed as exons. Although the overall results look fine, some sequences (once BLATed at Ensembl) appear strongly 3'-truncated. Could this simply be due to inaccuracies in the RNASeq file?
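To make point a) concrete, here is a minimal Python sketch of what I mean by a spliced sequence per variant. It is only an illustration under assumptions: the function names `parse_exons` and `spliced_sequence` are hypothetical (not from any library), the genome is assumed to be already loaded as a dict of chromosome name to sequence string, and the .gtf is assumed to have standard `exon` rows with a `transcript_id` attribute.

```python
import re

# Translation table used to complement bases on the minus strand.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def parse_exons(gtf_lines):
    """Group exon intervals by transcript_id.

    Returns {transcript_id: (chrom, strand, [(start, end), ...])}.
    """
    transcripts = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # keep only exon rows
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])  # GTF is 1-based, inclusive
        entry = transcripts.setdefault(match.group(1), (chrom, strand, []))
        entry[2].append((start, end))
    return transcripts

def spliced_sequence(genome, chrom, strand, exons):
    """Concatenate exons in genomic order; reverse-complement on the '-' strand."""
    seq = "".join(genome[chrom][s - 1:e] for s, e in sorted(exons))
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq
```

Writing one FASTA record per transcript would then be a loop over the result of `parse_exons`, printing `>transcript_id` followed by the output of `spliced_sequence`. Comparing such an output against the Table Browser download might help show whether the 3' truncation comes from the .gtf itself or from the extraction step.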

For point b), I was thinking that extracting a consensus from the .gtf file would output a list of all the unspliced FASTA sequences common between the splicing variants of every gene (one way would probably be to use SAMtools, but I currently do not know how to do this). Repeating the exon extraction as done for point a) would then, if this is correct, give me the b) list.
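To clarify what I mean by "common" in point b), here is a minimal Python sketch that intersects the exon intervals of all the variants of one gene. This is my own assumption of how a "consensus" could be defined, not an established method: the function names `intersect_intervals` and `common_exonic_regions` are hypothetical, coordinates are 1-based and inclusive as in GTF, and exons within a single transcript are assumed not to overlap each other.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of non-overlapping (start, end) intervals."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:
            result.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result

def common_exonic_regions(transcript_exons):
    """Return the regions that are exonic in every transcript of one gene.

    transcript_exons: {transcript_id: [(start, end), ...]} for a single gene.
    """
    exon_lists = [sorted(exons) for exons in transcript_exons.values()]
    if not exon_lists:
        return []
    common = exon_lists[0]
    for exons in exon_lists[1:]:
        common = intersect_intervals(common, exons)
    return common

# Example with three hypothetical variants of one gene.
variants = {
    "t1": [(1, 10), (20, 30), (40, 50)],
    "t2": [(5, 10), (20, 25), (40, 50)],
    "t3": [(1, 10), (40, 45)],
}
print(common_exonic_regions(variants))  # [(5, 10), (40, 45)]
```

The resulting intervals could then be used to fetch sequence in the same way as the exons in point a).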

In summary, I am asking:

  • Is the approach I am using valid? Are there better alternatives?
  • How can I extract a consensus file from a .gtf file?

Thanks in advance.

gtf splicing samtools rna-seq • 1.9k views
modified 5.8 years ago by Giovanni M Dall'Olio • written 5.8 years ago by Anima Mundi