Question: Analyzing types of Junctions from BAM and GTF files
luisccleto wrote:

Hello, I'm rather new to this field so I'm mostly lost and looking for pointers in the right direction regarding information and documentation sources.

My end goal is to be able to determine the impact of (aberrant) alternative splicing events on the transcriptome of an individual. To do this I have come up with the following steps:

  1. From a reference GTF file, extract all unique junctions (.gtf file from gencodegenes).
  2. From a set of mapped reads, extract the spliced reads and their junction information, also recording the number of reads per junction for filtering purposes (this is a .bam file with mapped reads from ENCODE).
  3. Based on the reads and junctions, distinguish the following situations: annotated junction, exon skipping, alternative 3', alternative 5' and novel junctions.
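Step 3 can be sketched in a few lines once both junction sets exist. The following is a minimal sketch, not an established method: the `Junction` type, the label names and the rule "both splice sites known but the pairing is new means an exon-skipping candidate" are all assumptions made for illustration. A stricter check would confirm that an annotated exon actually lies inside the new intron.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Junction(NamedTuple):
    chrom: str
    start: int   # first intronic base (1-based)
    end: int     # last intronic base (1-based)
    strand: str  # "+" or "-"


Site = Tuple[str, str, int]  # (chrom, strand, position)


def splice_sites(j: Junction) -> Tuple[Site, Site]:
    """Return (donor, acceptor); on the minus strand the donor is the higher coordinate."""
    if j.strand == "-":
        donor, acceptor = j.end, j.start
    else:
        donor, acceptor = j.start, j.end
    return (j.chrom, j.strand, donor), (j.chrom, j.strand, acceptor)


def classify_all(observed: Iterable[Junction],
                 annotated: Iterable[Junction]) -> Dict[Junction, str]:
    """Label each observed junction relative to the annotated set (hypothetical labels)."""
    annotated = set(annotated)
    donors, acceptors = set(), set()
    for j in annotated:
        d, a = splice_sites(j)
        donors.add(d)
        acceptors.add(a)

    labels = {}
    for j in observed:
        d, a = splice_sites(j)
        if j in annotated:
            labels[j] = "annotated"
        elif d in donors and a in acceptors:
            # Known sites, new pairing: typical signature of exon skipping (assumption).
            labels[j] = "exon_skipping_candidate"
        elif d in donors:
            labels[j] = "alternative_3prime"   # known donor, new acceptor
        elif a in acceptors:
            labels[j] = "alternative_5prime"   # known acceptor, new donor
        else:
            labels[j] = "novel"
    return labels


# Invented example: a three-exon gene (exons 100-200, 300-400, 500-600) on the plus strand.
ANNOTATED = [Junction("chr1", 201, 299, "+"), Junction("chr1", 401, 499, "+")]
OBSERVED = [Junction("chr1", 201, 299, "+"), Junction("chr1", 201, 499, "+")]
for junction, label in classify_all(OBSERVED, ANNOTATED).items():
    print(junction, label)
```

Keeping strand in the splice-site key matters because the donor and acceptor swap ends on the minus strand; without strand information the alternative 3' and 5' labels cannot be told apart.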

However, I'm not familiar with the bioinformatics field at all and I'm at a loss as to which tools and file types to use for each of the steps I'm considering. I apologize if my question is improper or out of scope for this site, but I've been pulling my hair out for the past couple of weeks and getting nowhere. (If that is the case, I'd appreciate a pointer towards where to start learning more about this field.)

modified 4.0 years ago • written 4.0 years ago by luisccleto
Jeffin Rockey wrote:

Alignment with STAR (https://github.com/alexdobin/STAR) will output an SJ.out.tab file, which should help with points 1 and 2. Please see section 4.4 of the STAR manual, available here (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf), especially the description of columns 6, 7 and 8.
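As a rough illustration of reading that file: SJ.out.tab is tab-separated with nine columns, and to my understanding of the manual columns 6, 7 and 8 are the annotation flag (0 or 1), the number of uniquely mapping reads crossing the junction, and the number of multi-mapping reads. The sketch below assumes that layout; the `SJRecord` type, the `min_unique` threshold and the example rows are invented for illustration.

```python
from typing import Iterable, Iterator, List, NamedTuple

# Column 4 encodes strand as 0 (undefined), 1 (+) or 2 (-).
STRAND = {"0": ".", "1": "+", "2": "-"}


class SJRecord(NamedTuple):
    chrom: str
    intron_start: int   # column 2, first intronic base (1-based)
    intron_end: int     # column 3, last intronic base (1-based)
    strand: str         # column 4, decoded
    motif: int          # column 5, intron motif code
    annotated: bool     # column 6
    unique_reads: int   # column 7
    multi_reads: int    # column 8
    max_overhang: int   # column 9


def parse_sj_out_tab(lines: Iterable[str]) -> Iterator[SJRecord]:
    """Yield one record per non-empty line of an SJ.out.tab file."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        f = line.split("\t")
        yield SJRecord(f[0], int(f[1]), int(f[2]), STRAND[f[3]], int(f[4]),
                       f[5] == "1", int(f[6]), int(f[7]), int(f[8]))


def well_supported(records: Iterable[SJRecord], min_unique: int = 3) -> List[SJRecord]:
    """Keep junctions with at least min_unique uniquely mapping reads (threshold is arbitrary)."""
    return [r for r in records if r.unique_reads >= min_unique]


# Invented rows, for illustration only.
EXAMPLE = [
    "chr1\t201\t299\t1\t1\t1\t25\t2\t40",
    "chr1\t201\t499\t1\t1\t0\t4\t0\t33",
    "chr1\t250\t450\t2\t0\t0\t1\t3\t12",
]
for record in well_supported(parse_sj_out_tab(EXAMPLE)):
    print(record.chrom, record.intron_start, record.intron_end, record.annotated)
```

With a real run, the lines would come from `open("SJ.out.tab")` instead of the example list.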

written 4.0 years ago by Jeffin Rockey

So, if I understand correctly, running STAR with --sjdbGTFfile pointing to my GTF and --inputBAMfile pointing to my BAM file should generate the output file I want for steps 1 and 2? Could I also generate a .bed file with the unique junctions from the GTF using TopHat's gtf_juncs and feed it to STAR using --sjdbFileChrStartEnd, achieving the same result with --outSJfilterReads Unique?

I'm still browsing through the manual and the colossal number of options available, so I haven't been able to install STAR and play with it yet, but it seems very promising. Thank you for the quick reply!

written 4.0 years ago by luisccleto

What I was suggesting was something like the following:

  1. From the GitHub page mentioned above, go to the releases section and download the latest release.
  2. Go to your environment, which I presume is a Linux distribution.
  3. tar -xvzf should do the extraction.
  4. yourSTARLocation/bin/Linux_x86_64_static/STAR can then be used directly.
  5. To start with, create a genome index (a one-time activity), then do the alignment using that index.
  6. For genome index creation, I would suggest starting with the options in section 2.1 (Basic options). If you need to limit RAM usage, use --limitGenomeGenerateRAM as well.
  7. Then do alignments using the options in section 3.1 (Basic options).
  8. STAR is quite fast; unless the data is extremely big, alignment should be over in a few hours at most using multiple threads.
  9. Then go through the SJ.out.tab produced, keeping in mind the explanations of columns 6, 7 and 8 from the manual.
  10. Once you have reached this far, add any further options you find you need for your study.
  11. https://groups.google.com/forum/#!forum/rna-star will be of great help if you keep using STAR. Specific options and their uses are frequently discussed there.
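The index-then-align sequence described here could be scripted roughly as below. This is only a sketch: every path, the thread count and the RAM limit are placeholders, and the option set is a minimal one chosen for illustration, so check each option against the manual for your STAR version before running anything. The functions only build the argument lists; passing them to `subprocess.run` would actually launch STAR.

```python
import shlex
from typing import List, Optional, Sequence


def genome_generate_cmd(star: str, genome_dir: str, fasta: str, gtf: str,
                        overhang: int = 100, threads: int = 8,
                        ram_limit: Optional[int] = None) -> List[str]:
    """Argument list for the one-time genome index creation."""
    cmd = [star, "--runMode", "genomeGenerate",
           "--genomeDir", genome_dir,
           "--genomeFastaFiles", fasta,
           "--sjdbGTFfile", gtf,
           "--sjdbOverhang", str(overhang),  # usually read length minus 1
           "--runThreadN", str(threads)]
    if ram_limit is not None:
        # Optional cap on RAM (in bytes) used while building the index.
        cmd += ["--limitGenomeGenerateRAM", str(ram_limit)]
    return cmd


def align_cmd(star: str, genome_dir: str, reads: Sequence[str],
              prefix: str, threads: int = 8) -> List[str]:
    """Argument list for aligning reads; SJ.out.tab is written under the given prefix."""
    return [star, "--genomeDir", genome_dir,
            "--readFilesIn", *reads,
            "--outFileNamePrefix", prefix,
            "--runThreadN", str(threads)]


# Placeholder paths, for illustration only.
STAR_BIN = "yourSTARLocation/bin/Linux_x86_64_static/STAR"
index = genome_generate_cmd(STAR_BIN, "star_index", "genome.fa", "annotation.gtf",
                            ram_limit=32_000_000_000)
align = align_cmd(STAR_BIN, "star_index", ["reads_1.fastq", "reads_2.fastq"], "sample1_")
print(shlex.join(index))
print(shlex.join(align))
```

Building the commands as lists rather than one shell string avoids quoting problems when paths contain spaces.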

But if you only have aligned BAM files instead of raw reads, my answers may not be useful to you. I haven't used STAR with BAM input, though I suppose it has some options for that as well. If that's the case, please specify exactly what you have as inputs, which would help you get a more helpful answer.

written 4.0 years ago by Jeffin Rockey

My inputs are an annotation (.gtf) file and a file with mapped reads (a .bam file from ENCODE).

written 4.0 years ago by luisccleto