Making GTF File
2.1 years ago

I am trying to analyze some data for aberrant splicing, following a previously published protocol. The protocol describes building a custom GTF file from both the downloadable UCSC GTF and the authors' own sequencing data, for use with rMATS. I have aligned my reads to GRCh38 with STAR and generated GTF files from the resulting BAM files with StringTie, but I am unsure how to combine the Ensembl GTF with my StringTie GTFs for use with rMATS. I would greatly appreciate any advice. The StringTie output looks like this:

seqname source      feature     start   end     score   strand  frame attributes
      chrX    StringTie   transcript  281394  303355  1000    +       .     gene_id "ERR188044.1"; transcript_id "ERR188044.1.1"; reference_id "NM_018390"; ref_gene_id "NM_018390"; ref_gene_name "PLCXD1"; cov "101.256691"; FPKM "530.078918"; TPM "705.667908";
      chrX    StringTie   exon        281394  281684  1000    +       .     gene_id "ERR188044.1"; transcript_id "ERR188044.1.1"; exon_number "1"; reference_id "NM_018390"; ref_gene_id "NM_018390"; ref_gene_name "PLCXD1"; cov "116.270836";
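For anyone who wants to inspect these records programmatically, here is a minimal Python sketch that splits one GTF line into its nine columns and turns the attributes column into a dictionary. It assumes the real file is tab-separated (the forum rendering above shows spaces) and that no attribute value contains a semicolon. The function name `parse_gtf_line` is just an illustrative choice, not part of StringTie or rMATS.

```python
def parse_gtf_line(line):
    """Split one GTF record into its nine columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attributes look like: key "value"; key "value";
    # Assumption: values never contain ';', which holds for the sample above.
    attributes = {}
    for item in attr_text.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')

    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# The exon record from the post, shortened and rebuilt with real tab characters.
example = (
    "chrX\tStringTie\texon\t281394\t281684\t1000\t+\t.\t"
    'gene_id "ERR188044.1"; transcript_id "ERR188044.1.1"; exon_number "1";'
)
record = parse_gtf_line(example)
print(record["seqname"], record["feature"], record["attributes"]["transcript_id"])
```

Something like this makes it easy to check which `gene_id` and `transcript_id` values StringTie assigned before handing a GTF to a downstream tool.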
rMATS StringTie splicing STAR • 817 views

You may want to include more information on what exactly that "previously published protocol" does, especially with respect to the GTF, as well as what you did, which GTF you downloaded, etc. With the information you have provided, I would have to do a lot of guesswork.

A classic problem with GTF/GFF files is that the sequence identifiers in column 1 don't match the expected format, for example UCSC versus NCBI identifiers for the human genome. If that's your issue, vkkodali_ncbi's cthreepo might help you.
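A quick way to check whether you have this problem is to compare the column 1 identifiers of the two files. The Python sketch below is only a rough illustration: the function names are made up for this example, and the input can be any iterable of GTF lines (for instance an open file handle).

```python
def seqnames(gtf_lines):
    """Collect the distinct column-1 identifiers from an iterable of GTF lines."""
    names = set()
    for line in gtf_lines:
        # Skip comment/header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_seqnames(reference_lines, assembled_lines):
    """Report which sequence identifiers are shared and which appear in one file only."""
    ref = seqnames(reference_lines)
    asm = seqnames(assembled_lines)
    return {
        "shared": sorted(ref & asm),
        "only_reference": sorted(ref - asm),
        "only_assembled": sorted(asm - ref),
    }


# Tiny in-memory example: a reference using "X" and an assembly using "chrX".
reference = ["#!genome-build example\n", "X\tsrc\texon\t1\t10\t.\t+\t.\tgene_id \"g1\";\n"]
assembled = ["chrX\tStringTie\texon\t1\t10\t1000\t+\t.\tgene_id \"s1\";\n"]
report = compare_seqnames(reference, assembled)
print(report)
```

If `shared` comes back empty or nearly empty while both "only" lists are long, the two files are using different naming conventions and need to be harmonized before they can be combined.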


From Dolatshad et al., Leukemia (2016): reads were aligned using STAR (ref. 29) against the human genome assembly (NCBI build37 (hg19) UCSC transcripts). Non-uniquely mapped reads and reads that were identified as PCR duplicates using Samtools (ref. 30) were discarded. The aligned reads were reconstructed into transcripts using Cufflinks (ref. 31) and were then merged into a single assembly, along with known isoforms from the NCBI build37 (hg19) UCSC transcripts. This reference-guided assembly was then used as the transcripts annotation by rMATS.

The GTF I downloaded is hsGRCH38_HapScaf.gtf.
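The merge step in the quoted protocol was done with Cufflinks. With StringTie output, one possible equivalent is StringTie's merge mode, which takes the per-sample GTFs plus a reference annotation passed with `-G`. The Python sketch below only builds that command line, so nothing is executed. The sample file names and the output name `merged.gtf` are placeholders, and whether this matches what the paper's authors actually did is an assumption on my part.

```python
import shlex


def stringtie_merge_command(reference_gtf, sample_gtfs, output_gtf):
    """Build (but do not run) a `stringtie --merge` command as an argument list."""
    if not sample_gtfs:
        raise ValueError("need at least one StringTie GTF to merge")
    # -G: reference annotation to include in the merge; -o: merged output GTF.
    return ["stringtie", "--merge", "-G", reference_gtf, "-o", output_gtf, *sample_gtfs]


# Placeholder sample names; the reference is the file named in the comment above.
cmd = stringtie_merge_command(
    "hsGRCH38_HapScaf.gtf",
    ["sample1.gtf", "sample2.gtf"],
    "merged.gtf",
)
print(shlex.join(cmd))
```

If StringTie is on your PATH, the list can be passed to `subprocess.run(cmd, check=True)`, and the resulting file would then be the annotation given to rMATS through its `--gtf` option. The merge is only meaningful if the reference and the StringTie GTFs use the same column 1 sequence identifiers.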
