Question

Annotate with transcript IDs the chromosome coordinates in SJ.out.tab files generated by the STAR aligner

0


5.8 years ago

akhan • 0

Hi,

I aligned some of my RNA-seq samples using the STAR aligner. The command I used to align the samples is shown below:

#to create count files and splice junction files.
STAR --genomeDir ./mouse_genome_test --runThreadN 12 --runMode alignReads --outSAMtype BAM SortedByCoordinate Unsorted \
--outFilterMultimapNmax 1 --quantMode GeneCounts --twopassMode Basic \
--outFileNamePrefix ./star_A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC \
--readFilesIn A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC_R1_fastq.gz A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC_R2_fastq.gz \
--readFilesCommand zcat --sjdbGTFfile ./mouse_genome_test/Mus_musculus.GRCm38.95.gtf --sjdbOverhang 74

After alignment, several files were generated successfully, including the star_A04_162_164_415_BXD84_RwwJ_M_Cocaine_NAC.SJ.out.tab file. In the SJ.out.tab file I don't see any transcript IDs alongside the chromosome coordinates. A snapshot of the data in the file is shown at the end of this post. Is there any way I can annotate each set of chromosome coordinates with a valid transcript ID? I would greatly appreciate your help with this issue.

Note: The genome index was generated using the following command:

STAR --runMode genomeGenerate --genomeDir ./mouse_genome_test --genomeFastaFiles ./Mus_musculus.GRCm38.dna.primary_assembly.fa \
--sjdbGTFfile ./mouse_genome_test/Mus_musculus.GRCm38.95.gtf --sjdbOverhang 74


1   3144863 3207067 2   2   1   3   0   34
1   3207318 3213431 2   2   1   3   0   29
1   3207318 3213438 2   2   1   78  0   37
1   3207318 3213608 2   2   1   2   0   21
1   3207318 3215989 2   2   1   1   0   18


RNA-Seq genome alignment sequencing assembly • 1.4k views

updated 5.8 years ago by GenoMax 152k • written 5.8 years ago by akhan • 0

score 0 · Answer 1 · 2019-09-12

0


5.8 years ago

swbarnes2 15k

In the SJ.out.tab file I don't see any transcript IDs along with chromosome coordinates.

The manual is pretty clear about what goes in SJ.out.tab, and it's not transcript IDs.
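For reference, the STAR manual describes SJ.out.tab as nine tab-separated columns: chromosome, first and last base of the intron (1-based), strand, intron motif, annotated flag, uniquely mapping read count, multi-mapping read count, and maximum spliced alignment overhang. The sketch below is one way to read those columns in Python; the field names and the `read_sj` helper are labels chosen here for illustration, not anything STAR provides.

```python
import io
from typing import Iterable, Iterator, NamedTuple

# Column 4 and column 5 codes as described in the STAR manual.
STRAND = {0: ".", 1: "+", 2: "-"}
MOTIF = {0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
         4: "CT/GC", 5: "AT/AC", 6: "GT/AT"}


class Junction(NamedTuple):
    """One SJ.out.tab row; field names are illustrative labels."""
    chrom: str
    intron_start: int   # first base of the intron, 1-based
    intron_end: int     # last base of the intron, 1-based
    strand: str
    motif: str
    annotated: bool
    unique_reads: int
    multi_reads: int
    max_overhang: int


def read_sj(lines: Iterable[str]) -> Iterator[Junction]:
    """Yield one Junction per non-empty line of an SJ.out.tab file."""
    for line in lines:
        if not line.strip():
            continue
        chrom, *rest = line.split()
        start, end, strand, motif, annot, uniq, multi, overhang = map(int, rest[:8])
        yield Junction(chrom, start, end, STRAND[strand], MOTIF[motif],
                       bool(annot), uniq, multi, overhang)


# One row taken from the snapshot in the question.
example = "1\t3207318\t3213438\t2\t2\t1\t78\t0\t37\n"
for junction in read_sj(io.StringIO(example)):
    print(junction)
```

For a real file, pass an open file handle to `read_sj` instead of the `io.StringIO` example.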

Is there any way I can Annotate each chromosome coordinates to a valid transcript ID.

Probably some trickery with bedtools and your gtf, but why do you want to annotate the splice junction file?
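As an alternative to bedtools, the lookup can be sketched directly in Python. SJ.out.tab reports intron coordinates, while the GTF lists exons, so the idea is to derive introns from consecutive exons of each transcript and match them exactly. This is a minimal sketch, assuming an Ensembl-style GTF with `transcript_id "..."` attributes on `exon` lines; `intron_index` and `annotate` are hypothetical helper names, and the toy IDs `T1` and `T2` are made up. A junction can belong to several transcripts, and junctions absent from the GTF get an empty list.

```python
import re
from collections import defaultdict

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def intron_index(gtf_lines):
    """Map (chrom, intron_start, intron_end) -> set of transcript IDs.

    Coordinates are 1-based and inclusive, matching SJ.out.tab columns 1-3.
    """
    exons = defaultdict(list)  # (chrom, transcript_id) -> [(start, end), ...]
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match:
            exons[(fields[0], match.group(1))].append((int(fields[3]), int(fields[4])))

    index = defaultdict(set)
    for (chrom, transcript_id), spans in exons.items():
        spans.sort()  # genomic order, regardless of strand
        for (_, left_end), (right_start, _) in zip(spans, spans[1:]):
            # The intron is the gap between two consecutive exons.
            index[(chrom, left_end + 1, right_start - 1)].add(transcript_id)
    return index


def annotate(sj_lines, index):
    """Yield (SJ fields, sorted transcript IDs) for each SJ.out.tab line."""
    for line in sj_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        key = (fields[0], int(fields[1]), int(fields[2]))
        yield fields, sorted(index.get(key, ()))


# Toy data with made-up IDs: two transcripts sharing the intron 201-300.
toy_gtf = [
    '1\ttoy\texon\t100\t200\t.\t-\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\ttoy\texon\t301\t400\t.\t-\t.\tgene_id "G1"; transcript_id "T1";\n',
    '1\ttoy\texon\t100\t200\t.\t-\t.\tgene_id "G1"; transcript_id "T2";\n',
    '1\ttoy\texon\t301\t350\t.\t-\t.\tgene_id "G1"; transcript_id "T2";\n',
]
toy_sj = ["1\t201\t300\t2\t2\t1\t5\t0\t30\n"]
for fields, transcripts in annotate(toy_sj, intron_index(toy_gtf)):
    print("\t".join(fields + [",".join(transcripts) or "NA"]))
```

With many samples, the index only needs to be built once from the GTF and can then be reused for every SJ.out.tab file.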

5.8 years ago by swbarnes2 15k

0


Hi,

Thanks for your reply. I have tried using bedtools and BEDOPS with the GTF file to get transcript IDs for the chromosome coordinates in the SJ.out.tab file, but the resulting BED files contain completely different coordinates. I also tried mapping the BED file to the GTF with BEDOPS, with no luck getting the same coordinates as in the SJ.out.tab file. I was trying to use SUPPA, a package written in Python, to analyze splice junctions using the SJ.out.tab file as input, but it seems SUPPA requires annotated chromosome coordinates. I am very new to this type of analysis (especially splice junctions). I would greatly appreciate input on how I should analyze this SJ.out.tab file.

Note: I have about 256 files, and each file seems to have a different number of rows with differing chromosome coordinates.

Thanks.

Arshad
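One possible cause of the coordinate mismatch described above, offered as a guess rather than a diagnosis: SJ.out.tab uses 1-based, inclusive intron coordinates, whereas BED is 0-based and half-open, and GTF records describe exons rather than introns. The sketch below converts a single SJ.out.tab row to a BED6 line; `sj_to_bed` and the `chrom:start-end` naming scheme are illustrative choices, not part of STAR, bedtools, or BEDOPS.

```python
# SJ.out.tab column 4 strand codes mapped to BED strand symbols.
STRAND = {"0": ".", "1": "+", "2": "-"}


def sj_to_bed(line: str) -> str:
    """Convert one SJ.out.tab row to a BED6 line.

    BED start = SJ start - 1 (0-based); BED end = SJ end (half-open).
    The score column carries the uniquely mapping read count (SJ column 7).
    """
    chrom, start, end, strand, _motif, _annotated, unique_reads = line.split()[:7]
    name = f"{chrom}:{start}-{end}"  # keeps the original 1-based coordinates
    return "\t".join([chrom, str(int(start) - 1), end, name, unique_reads, STRAND[strand]])


# One row taken from the snapshot in the question.
print(sj_to_bed("1\t3207318\t3213438\t2\t2\t1\t78\t0\t37"))
```

Keeping the original coordinates in the name column makes it easy to join any downstream result back to the SJ.out.tab rows.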

5.8 years ago by akhan • 0