What Is The Most Direct Way To Extract The Splice Junctions From A Sam File?
2
1
Entering edit mode
11.3 years ago

Hi!

I need to get the introns coordinates (chr:strat-end and the strand) of the spliced reads wich are in SAM file.

I have no experience with this kind of format, so I plan to transform the SAMs files to BED12 files with this pipe line:

samtools view -bS -o file.bam file.sam && bamToBed -bed12 -i file.bam > file.bed12


And then, use galaxy or own python script (which I haven't wrote yet) to extract the introns from the spliced reads.

The problem with this method is weight of the files... it'll take some time to convert to BED12 and then extract the introns...

Do you know a more direct way to extract the introns coordinates from the SAMs files?

sam intron rna • 11k views
0
Entering edit mode

The intron coordinates aren't really there, if anything, they are implicit in (many of) the alignments. Are you sure you want to roll your own instead of running tophat or mgene or similar?

0
Entering edit mode

Yes, I know there are implicit in bed12... But I'm not worry about that, because I wrote a scrip to extract introns coordinates from psl alignment format, so I'll try to adapt to bed12 inputs, or do as brentp proposes...

16
Entering edit mode
11.3 years ago
brentp 24k

It's not clear exactly what you want to do, but you can extract only the spliced alignments with:

samtools view -S file.sam | awk '($6 ~ /N/)'  and you may be able to pipe that command directly to bamToBed as: samtools view -hS file.sam | awk '($6 ~ /N/)' \
| samtools view -bS - \
| bamToBed -bed12 > file.bed12


From there, you can get introns by subtracting exons:

so:

bed12ToBed6 -i file.bed12 > file.bed6
subtractBed -a file.bed12 -b file.bed6 -s | cut -f 1-6 > file.introns.bed

1
Entering edit mode

Geparada, I updated the post to get you all the way to a file with the introns.

0
Entering edit mode

ha thanks Brent, I didn't know this syntax for awk.

0
Entering edit mode

Hi brentp!... I not only want to get the spliced reads... I want to get the genomics coordinates of the introns in those reads, I mean where they are in the genome (chromosome, start, end and strand). From a bed12 you can easily extract this information, but I wonder if are a script to extract direct from the SAM file.

0
Entering edit mode

0
Entering edit mode

I tried convert the SAMs files to bed12 as you propose and I noticed that the samtools view -bS - need the SAM's header... so the -h flag must be present in samtools view -S file.sam to to keep the headers... thanks you so much for your help!

0
Entering edit mode

I tried convert the SAMs files to bed12 as you propose and I noticed that the [samtools view -bS -] need the SAM's header... so the -h flag must be present in [samtools view -S file.sam] to to keep the headers... thanks you so much for your help!!

0
Entering edit mode

I added the -h as you suggested.

1
Entering edit mode
11.3 years ago
Malcolm.Cook ★ 1.3k

if you swing with R, a slight change to my post to account for strandedness should get you there...