Question: Annotate chromosome coordinates in SJ.out.tab files generated by the STAR aligner with transcript IDs
akhan wrote:

Hi,

I aligned some of my RNA-seq samples using the STAR aligner. The command I used to align the samples is shown below:

#to create count files and splice junction files.
STAR --genomeDir ./mouse_genome_test --runThreadN 12 --runMode alignReads --outSAMtype BAM SortedByCoordinate Unsorted \
--outFilterMultimapNmax 1 --quantMode GeneCounts --twopassMode Basic \
--outFileNamePrefix ./star_A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC \
--readFilesIn A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC_R1_fastq.gz A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC_R2_fastq.gz \
--readFilesCommand zcat --sjdbGTFfile ./mouse_genome_test/Mus_musculus.GRCm38.95.gtf --sjdbOverhang 74

After alignment, several files were generated successfully, including the star_A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC.SJ.out.tab file. In the SJ.out.tab file I don't see any transcript IDs alongside the chromosome coordinates. A snapshot of the data in the file is shown below. Is there any way I can annotate each set of chromosome coordinates with a valid transcript ID? I would greatly appreciate your help with this issue.

Note: The genome was generated using the following command:

STAR --runMode genomeGenerate --genomeDir ./mouse_genome_test --genomeFastaFiles ./Mus_musculus.GRCm38.dna.primary_assembly.fa \
--sjdbGTFfile ./mouse_genome_test/Mus_musculus.GRCm38.95.gtf --sjdbOverhang 74


1   3144863 3207067 2   2   1   3   0   34
1   3207318 3213431 2   2   1   3   0   29
1   3207318 3213438 2   2   1   78  0   37
1   3207318 3213608 2   2   1   2   0   21
1   3207318 3215989 2   2   1   1   0   18

swbarnes2 wrote:

> In the SJ.out.tab file I don't see any transcript IDs along with chromosome coordinates.

The manual is pretty clear about what goes in SJ.out.tab, and it's not transcript IDs.
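For reference, here is a minimal sketch of reading one SJ.out.tab row into named fields, following the column order described in the STAR manual (chromosome, first and last intron base in 1-based coordinates, strand, intron motif, annotated flag, uniquely mapping reads, multi-mapping reads, maximum spliced overhang). The `Junction` type and `parse_sj_line` helper are names made up for this illustration and are not part of STAR.

```python
from typing import NamedTuple

# Code tables as described in the STAR manual for columns 4 and 5.
STRAND = {"0": ".", "1": "+", "2": "-"}
MOTIF = {
    "0": "non-canonical", "1": "GT/AG", "2": "CT/AC", "3": "GC/AG",
    "4": "CT/GC", "5": "AT/AC", "6": "GT/AT",
}


class Junction(NamedTuple):
    """One row of SJ.out.tab (hypothetical helper type)."""
    chrom: str
    start: int          # first base of the intron, 1-based
    end: int            # last base of the intron, 1-based
    strand: str
    motif: str
    annotated: bool     # True if the junction was in the GTF given to STAR
    unique_reads: int
    multi_reads: int
    max_overhang: int


def parse_sj_line(line: str) -> Junction:
    """Split one whitespace/tab separated SJ.out.tab row into named fields."""
    f = line.split()
    return Junction(
        f[0], int(f[1]), int(f[2]), STRAND[f[3]], MOTIF[f[4]],
        f[5] == "1", int(f[6]), int(f[7]), int(f[8]),
    )


# Third row of the snapshot posted in the question.
print(parse_sj_line("1   3207318 3213438 2   2   1   78  0   37"))
```

Read this way, every row in the posted snapshot is a minus-strand junction that STAR already flags as annotated (column 6 is 1), so the file records where introns are but not which transcript they belong to.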

> Is there any way I can Annotate each chromosome coordinates to a valid transcript ID.

Probably some trickery with bedtools and your gtf, but why do you want to annotate the splice junction file?
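As one possible alternative to bedtools, the lookup can be sketched in plain Python. The idea: SJ.out.tab gives intron coordinates (1-based, first and last intron base), while the GTF gives exons (1-based, inclusive), so the intron between two consecutive exons of a transcript runs from `exon_end + 1` to `next_exon_start - 1`. Matching on those exact coordinates should recover transcript IDs for junctions that come from the GTF; novel junctions will simply not match. The function names and the toy GTF lines (`TX_A`, `G1`) are invented for the example and are not real Ensembl records.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def introns_from_gtf(gtf_lines):
    """Map (chrom, intron_start, intron_end) -> set of transcript IDs."""
    exons = defaultdict(list)  # (chrom, transcript_id) -> [(start, end), ...]
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        m = TRANSCRIPT_ID.search(f[8])
        if m:
            exons[(f[0], m.group(1))].append((int(f[3]), int(f[4])))

    introns = defaultdict(set)
    for (chrom, tid), ex in exons.items():
        ex.sort()
        # Each pair of neighbouring exons defines one intron.
        for (_, left_end), (right_start, _) in zip(ex, ex[1:]):
            if right_start - 1 >= left_end + 1:  # skip touching/overlapping exons
                introns[(chrom, left_end + 1, right_start - 1)].add(tid)
    return introns


def annotate(sj_lines, introns):
    """Yield each SJ.out.tab row with a transcript-ID column appended ('.' if none)."""
    for line in sj_lines:
        f = line.split()
        if len(f) < 9:
            continue
        key = (f[0], int(f[1]), int(f[2]))
        yield f + [",".join(sorted(introns.get(key, ()))) or "."]


# Toy GTF (hypothetical records) built around one junction from the question.
TOY_GTF = [
    '1\ttoy\texon\t3207200\t3207317\t.\t-\t.\tgene_id "G1"; transcript_id "TX_A";',
    '1\ttoy\texon\t3213439\t3213600\t.\t-\t.\tgene_id "G1"; transcript_id "TX_A";',
]
SJ_ROWS = [
    "1\t3207318\t3213438\t2\t2\t1\t78\t0\t37",
    "1\t3207318\t3215989\t2\t2\t1\t1\t0\t18",
]

for row in annotate(SJ_ROWS, introns_from_gtf(TOY_GTF)):
    print("\t".join(row))
```

With a real annotation you would pass the open GTF file and the open SJ.out.tab file instead of the toy lists. A junction shared by several transcripts gets a comma-separated list, which is one reason a junction does not map to a single transcript ID.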


Hi,

Thanks for your reply. I have tried bedtools, BEDOPS and the GTF file to get transcript IDs for the chromosome coordinates in the SJ.out.tab file, but the resulting BED files contain completely different coordinates. I also tried using BEDOPS to map the BED file to the GTF, with no luck getting the same coordinates as in the SJ.out.tab file. I was trying to use SUPPA, a package written in Python, to analyze splice junctions with the SJ.out.tab file as input, but it seems SUPPA requires annotated chromosome coordinates. I am very new to this type of analysis (especially splice junctions). I would greatly appreciate input on how I should analyze this SJ.out.tab file.

Note: I have about 256 files, and each file seems to have a different number of rows with differing chromosome coordinates.

Thanks.

Arshad
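
On the point about files having different numbers of rows: that is expected, since each sample only reports the junctions it actually observed. A rough sketch of one way to line the samples up is to key every junction by (chromosome, start, end, strand) and build a union table, filling in 0 where a sample has no row. The function name and the sample names `s1` and `s2` are placeholders for this illustration; the counts used are the uniquely mapping reads from column 7.

```python
from collections import defaultdict


def merge_junction_counts(samples):
    """Build a union table of junctions across samples.

    samples: dict mapping sample name -> iterable of SJ.out.tab lines.
    Returns (names, table), where table maps (chrom, start, end, strand)
    to a list of unique-read counts in the same order as names.
    """
    names = sorted(samples)
    table = defaultdict(lambda: [0] * len(names))
    for i, name in enumerate(names):
        for line in samples[name]:
            f = line.split()
            if len(f) < 9:
                continue  # skip blank or truncated lines
            key = (f[0], int(f[1]), int(f[2]), f[3])
            table[key][i] += int(f[6])  # column 7: uniquely mapping reads
    return names, dict(table)


# Two made-up samples that share one junction and differ on another.
demo = {
    "s1": ["1\t3207318\t3213438\t2\t2\t1\t78\t0\t37"],
    "s2": [
        "1\t3207318\t3213438\t2\t2\t1\t5\t0\t30",
        "1\t3144863\t3207067\t2\t2\t1\t3\t0\t34",
    ],
}
names, table = merge_junction_counts(demo)
print("junction", *names, sep="\t")
for key in sorted(table):
    print(":".join(map(str, key)), *table[key], sep="\t")
```

For real data, each value in `samples` could be the lines of one SJ.out.tab file, for example read with `pathlib.Path(path).read_text().splitlines()`.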


Reply written 8 weeks ago by akhan