I am currently working on RNA-seq data analysis using Leafcutter. During this process, I noticed that regtools requires the "-s" option. To understand its impact on the results, I conducted some tests. The "-s" option is intended to specify the strandness of the data. Ideally, the junction file (e.g., RNA.NA06984_CEU.chr1.bam_1.junc) should include only '+' or '-' annotations, depending on whether "-s 1" or "-s 2" is used. However, I am encountering unexpected outcomes where both strands are appearing in each file, which is perplexing (I used sample data on Leafcutter).
Additionally, I processed another BAM file similarly and observed similar outcomes. I am unsure whether these results indicate a problem or not. If my interpretation of regtools and strandness is incorrect, please advise.
The code I used is based on a sample code I found on a website (https://davidaknowles.github.io/leafcutter/articles/Usage.html). Here is the adapted code:
for bamfile in `ls run/geuvadis/*chr1.bam`; do
echo Converting $bamfile to $bamfile.junc
samtools index $bamfile
regtools junctions extract -s 1 -a 8 -m 50 -M 500000 $bamfile -o ${bamfile}_1.junc
regtools junctions extract -s 2 -a 8 -m 50 -M 500000 $bamfile -o ${bamfile}_2.junc
echo ${bamfile}_1.junc >> test_juncfiles.txt
echo ${bamfile}_2.junc >> test_juncfiles.txt
done
python ../leafcutter/clustering/leafcutter_cluster_regtools.py -j test_juncfiles.txt -m 50 -o testYRIvsEU -l 5000000
zcat testYRIvsEU_perind_numers.counts.gz | head -n10