Question

Analyzing types of Junctions from BAM and GTF files

7.9 years ago

luisccleto ▴ 30

Hello, I'm rather new to this field so I'm mostly lost and looking for pointers in the right direction regarding information and documentation sources.

My end goal is to be able to determine the impact of (aberrant) alternative splicing events on the transcriptome of an individual. To do this I have come up with the following steps:

1) From a reference GTF file (from gencodegenes), extract all unique junctions.
2) From a set of mapped reads (a .bam file from ENCODE), extract the spliced reads and junction information, recording the number of reads per junction for filtering purposes.
3) Based on the reads and junctions, distinguish the following situations: annotated junction, exon skipping, alternative 3', alternative 5', and novel junctions.
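The classification step can be sketched in plain Python. This is only an illustration under stated assumptions, not an established method: a junction is represented by its intron coordinates (1-based, inclusive), the `Junction` type and `classify_junction` function are hypothetical names, and an unannotated junction joining two known splice sites is labelled as exon skipping, which is a simplification.

```python
from collections import namedtuple

# Hypothetical record: intron coordinates, 1-based and inclusive.
Junction = namedtuple("Junction", "chrom strand start end")


def classify_junction(junction, annotated):
    """Label one observed junction against a set of annotated junctions."""
    if junction in annotated:
        return "annotated"
    key = (junction.chrom, junction.strand)
    known_starts = {j.start for j in annotated if (j.chrom, j.strand) == key}
    known_ends = {j.end for j in annotated if (j.chrom, j.strand) == key}
    start_known = junction.start in known_starts
    end_known = junction.end in known_ends
    if start_known and end_known:
        # Assumption: a new pairing of two known sites is treated as exon skipping.
        return "exon_skipping"
    if not start_known and not end_known:
        return "novel"
    # Exactly one site is known. On the + strand the left (start) site is the
    # donor (5' splice site); on the - strand it is the acceptor (3' splice site).
    left_is_donor = junction.strand == "+"
    novel_site_is_donor = (not start_known) if left_is_donor else (not end_known)
    return "alternative_5prime" if novel_site_is_donor else "alternative_3prime"


annotated = {Junction("chr1", "+", 101, 200), Junction("chr1", "+", 301, 400)}
print(classify_junction(Junction("chr1", "+", 101, 400), annotated))
```

The known-site sets are rebuilt on every call to keep the sketch short; for a whole-genome annotation they would be precomputed once per chromosome and strand.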

However, I'm not familiar with the bioinformatics field at all, and I'm at a loss as to which tools and file types to use for each of the steps I'm considering. I apologize if my question is improper or out of scope for what should be asked here, but I've been pulling my hair out for the past couple of weeks and getting nowhere. (If that is the case, I'd appreciate a pointer towards where to start learning more about this field.)

RNA-Seq junctions splicing tophat samtools • 4.2k views

ADD COMMENT • link 7.9 years ago by luisccleto ▴ 30

score 2 · Answer 1 · 2016-05-26

7.9 years ago

Jeffin Rockey ★ 1.3k

Aligning with STAR (https://github.com/alexdobin/STAR) produces an SJ.out.tab file, which should help with points 1 and 2. Please see section 4.4 of the STAR manual (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf), especially columns 6, 7 and 8: whether the junction is annotated, the number of uniquely mapping reads crossing it, and the number of multi-mapping reads crossing it.
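As a rough illustration of how SJ.out.tab could be read afterwards, here is a minimal Python sketch. It assumes the nine tab-separated columns described in the STAR manual; the `SpliceJunction` type, the function names, and the `min_unique` threshold are hypothetical choices for this example.

```python
from typing import Iterable, Iterator, List, NamedTuple


class SpliceJunction(NamedTuple):
    chrom: str
    start: int          # first base of the intron (1-based)
    end: int            # last base of the intron (1-based)
    strand: int         # 0 = undefined, 1 = +, 2 = -
    motif: int          # intron motif code
    annotated: bool     # column 6
    unique_reads: int   # column 7
    multi_reads: int    # column 8
    max_overhang: int   # column 9


def parse_sj_out(lines: Iterable[str]) -> Iterator[SpliceJunction]:
    """Yield one record per well-formed SJ.out.tab line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # skip blank or truncated lines
        n = [int(x) for x in fields[1:9]]
        yield SpliceJunction(fields[0], n[0], n[1], n[2], n[3],
                             bool(n[4]), n[5], n[6], n[7])


def supported_unannotated(records: Iterable[SpliceJunction],
                          min_unique: int = 5) -> List[SpliceJunction]:
    """Keep unannotated junctions with enough uniquely mapping reads."""
    return [r for r in records
            if not r.annotated and r.unique_reads >= min_unique]
```

With a real file this could be called as `supported_unannotated(parse_sj_out(open("SJ.out.tab")))`; the threshold of 5 is arbitrary and should be tuned to the sequencing depth.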

ADD COMMENT • link 7.9 years ago by Jeffin Rockey ★ 1.3k

So, if I understand correctly, running STAR with --sjdbGTFfile pointing to my GTF and --inputBAMfile to my BAM file should generate the output file I want for steps 1 and 2? Could I also generate a .bed with the unique junctions from the gtf using tophat's gtf_juncs and feed it to STAR using the --sjdbFileChrStartEnd, achieving the same using --outSJfilterReads=Unique?
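For the first step on its own (unique junctions from the GTF), a small plain-Python sketch is shown here as one possible alternative to a dedicated tool. It assumes a standard nine-column GTF with `exon` features carrying a `transcript_id` attribute; `junctions_from_gtf` is a hypothetical helper name. Junctions are reported as intron coordinates (first and last intronic base, 1-based), the same convention SJ.out.tab uses.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def junctions_from_gtf(lines):
    """Return the set of unique (chrom, strand, intron_start, intron_end)."""
    exons = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match is None:
            continue
        key = (fields[0], fields[6], match.group(1))
        exons[key].append((int(fields[3]), int(fields[4])))

    junctions = set()
    for (chrom, strand, _transcript), spans in exons.items():
        spans.sort()
        # Each pair of consecutive exons in a transcript flanks one intron.
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            if right_start - left_end > 1:
                junctions.add((chrom, strand, left_end + 1, right_start - 1))
    return junctions
```

Because the result is a set, a junction shared by several transcripts appears only once.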

I'm still browsing through the manual and the colossal number of options available, so I haven't been able to install STAR and play with it yet, but it seems very promising. Thank you for the quick reply!

ADD REPLY • link 7.9 years ago by luisccleto ▴ 30

What I was suggesting was something like the following:
1) From the GitHub page mentioned, go to the releases section and download the latest release.
2) Go to your environment, which I presume is a Linux distribution.
3) tar -xvzf should do the extraction.
4) yourSTARLocation/bin/Linux_x86_64_static/STAR can then be used directly.
5) To start with, create the genome index (a one-time activity), then do the alignment using that index.
6) For genome index creation, I would suggest starting with the options in section 2.1 (Basic options). If you need to limit RAM usage, also use --limitGenomeGenerateRAM.
7) Then do the alignments using the options in section 3.1 (Basic options).
8) STAR is quite fast; unless the data is extremely large, alignment should finish within a few hours at most using multiple threads.
9) Then go through the SJ.out.tab produced, keeping in mind the explanations of columns 6, 7 and 8 from the manual.
10) Once you get this far, add any further options your study requires.
11) https://groups.google.com/forum/#!forum/rna-star will be of great help if you keep using STAR. Specific options and their uses are frequently discussed there.
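Steps 5 to 7 can be sketched as two command lines, assembled here in Python so they can be inspected before anything is run. The helper names and all paths are hypothetical; the flags are the basic ones from the STAR manual, and `--sjdbOverhang` is set to read length minus one as the manual recommends.

```python
def genome_generate_cmd(star, genome_dir, fasta, gtf, read_length,
                        threads=4, ram_limit_bytes=None):
    """Build the one-time genome index command (step 5 and 6)."""
    cmd = [star,
           "--runThreadN", str(threads),
           "--runMode", "genomeGenerate",
           "--genomeDir", genome_dir,
           "--genomeFastaFiles", fasta,
           "--sjdbGTFfile", gtf,
           "--sjdbOverhang", str(read_length - 1)]
    if ram_limit_bytes is not None:
        # Optional cap on RAM used during index generation, in bytes.
        cmd += ["--limitGenomeGenerateRAM", str(ram_limit_bytes)]
    return cmd


def align_cmd(star, genome_dir, fastq_files, out_prefix, threads=4):
    """Build the alignment command (step 7); SJ.out.tab is written under out_prefix."""
    return [star,
            "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", *fastq_files,
            "--outFileNamePrefix", out_prefix]


# Placeholder paths, for illustration only.
index_cmd = genome_generate_cmd("STAR", "star_index", "genome.fa",
                                "annotation.gtf", read_length=101)
print(" ".join(index_cmd))
```

Either list can then be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.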

But if you only have aligned BAM files instead of raw reads, my answer may not be useful to you. I haven't used STAR with BAM input, though I suppose it has some options for that as well. If that's the case, please specify exactly what you have as inputs, which would help you get a more helpful answer.

ADD REPLY • link 7.9 years ago by Jeffin Rockey ★ 1.3k

My inputs are an annotation (.gtf) file and a file with mapped reads (a .bam file from ENCODE).
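With only an aligned BAM as input, junctions can in principle be recovered from the alignments themselves, since spliced alignments carry an `N` operation in their CIGAR string. The sketch below is a minimal illustration that reads SAM text lines (for example the output of `samtools view file.bam`) rather than the binary BAM; `junction_counts` is a hypothetical name, and strand is not inferred because that depends on the library protocol or on aligner-specific tags.

```python
import re
from collections import Counter

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def junction_counts(sam_lines):
    """Count alignments supporting each (chrom, intron_start, intron_end)."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 0x4 or cigar == "*":
            continue  # unmapped read or missing CIGAR
        ref_pos = pos  # 1-based position of the next reference base
        for length, op in _CIGAR_OP.findall(cigar):
            length = int(length)
            if op == "N":
                # Skipped region: first and last intronic base, 1-based.
                counts[(chrom, ref_pos, ref_pos + length - 1)] += 1
            if op in _REF_CONSUMING:
                ref_pos += length
    return counts
```

Note that this counts alignments rather than reads, so secondary alignments and both mates of a pair each contribute; filtering on the FLAG field would be needed for stricter counts.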

ADD REPLY • link 7.9 years ago by luisccleto ▴ 30