Question: featureCounts junctions giving me NA for junction strand
0
gravatar for tassa.saldi
12 days ago by
tassa.saldi0 wrote:

Hi all,

I am trying to count cryptic splicing junctions using the featureCounts -J option. The program is working great in that I am getting the same number of junctions in the output file as I see on the genome browser screen. However, a large number of my junction counts have a strand designation of "NA". I have checked individual junctions and they are not of mixed strand (i.e. all the junctions spanning a specific interval have the same orientation). The junctions I need (and the ones missing strand information) and not annotated in the GTF file. Does anyone have a fix for this problem or know what causes it?

Here is the command I am using:

featureCounts -Q 1 -J -G  genome.fa --splitOnly --primary -a ~/Datafiles/human/genes.gtf -o $bam_file\_junctions $bam_file\_filtered_sorted

My libraries were made using the stranded RNAseq kit from Kapa (which give the reverse orientation to the actual transcript) and are paired-end 150bp reads. The RNA used as input was isolated nascent RNA- hence the high level of cryptic/novel splice junctions. An example from the output I am seeing is below with the problem entries in bold:

chr1    881666  881782  ENSG00000188976 33      -/
chr1    881925  883511  ENSG00000188976 20      -/
**chr1    882895  882986  ENSG00000188976 1       NA**/
chr1    883612  883870  ENSG00000188976 33      -/
chr1    883983  886507  ENSG00000188976 29      -/
chr1    886618  887380  ENSG00000188976 22      -/
chr1    887519  887792  ENSG00000188976 20      -/
chr1    887980  888555  ENSG00000188976 27      -/
chr1    888668  889162  ENSG00000188976 15      -/
chr1    889272  889384  ENSG00000188976 14      -/
chr1    889462  891303  ENSG00000188976 14      -/
chr1    889903  891303  ENSG00000188976 2       -/
chr1    891393  891475  ENSG00000188976 24      -/
chr1    891595  892274  ENSG00000188976 43      -/
chr1    892405  892479  ENSG00000188976 35      -/
chr1    892653  894309  ENSG00000188976 20      -/
chr1    896180  896673  ENSG00000187961 2       +/
chr1    896180  897009  ENSG00000187961 1       +/
**chr1    896217  896265  ENSG00000187961 1       NA**/
**chr1    896932  897009  ENSG00000187961 10      NA**/
chr1    897130  897206  ENSG00000187961 7       +/
chr1    897851  898084  ENSG00000187961 11      +/
chr1    898293  898489  ENSG00000187961 3       +/
chr1    898297  898412  ENSG00000187961 3       +/
chr1    898297  898489  ENSG00000187961 8       +/
chr1    898633  898717  ENSG00000187961 2       +/
chr1    898884  899300  ENSG00000187961 4       +/
chr1    899388  899487  ENSG00000187961 7       +/
chr1    899547  899729  ENSG00000187961 10      +/
chr1    899560  899729  ENSG00000187961 7       +/
chr1    899910  900343  ENSG00000187961 18      +/
**chr1    900940  901053  ENSG00000187961 1       NA**/

Thanks!

rna-seq • 101 views
ADD COMMENTlink modified 10 days ago • written 12 days ago by tassa.saldi0

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.
code_formatting

ADD REPLYlink written 11 days ago by RamRS24k

Nice. I didn't know about the code option; I'll use that in the future. Thanks so much.

ADD REPLYlink written 11 days ago by tassa.saldi0
0
gravatar for tassa.saldi
10 days ago by
tassa.saldi0 wrote:

Just as an update for this post, I decided to filter the bam file for only unpaired reads (they were a mix and I was afraid that was the problem). I then filtered for reads only on the positive strand (so the only thing that could be in the file are (+) reads), but I am still getting several NAs and several (-) values for the strand. Now I am really confused. Has anyone struggled with this before?

samtools view -bT ~/Datafiles/human/genome.fa -bq 1 -F 0X2 WT.bam > WT_unpaired.bam
samtools view -bT ~/Datafiles/human/genome.fa -f 16 WT_unpaired.bam > WT_unpaired_pos.bam
samtools view -bT ~/Datafiles/human/genome.fa -F 16 WT_unpaired.bam > WT_unpaired_neg.bam
samtools sort WT_unpaired_pos.bam > WT_unpaired_pos_sorted.bam
samtools sort WT_unpaired_neg.bam > WT_unpaired_neg_sorted.bam
samtools index WT_unpaired_pos_sorted.bam
samtools index WT_unpaired_neg_sorted.bam

Out put of featureCounts -J on WT_unpaired_pos_sorted.bam:

chr1    896932  897009  ENSG00000187961 2       NA
chr1    897130  897206  ENSG00000187961 2       +
chr1    897851  898084  ENSG00000187961 9       +
chr1    898293  898489  ENSG00000187961 2       +
chr1    898297  898412  ENSG00000187961 2       +
chr1    898297  898489  ENSG00000187961 3       +
chr1    898633  898717  ENSG00000187961 2       +
chr1    898884  899300  ENSG00000187961 3       +
chr1    899388  899487  ENSG00000187961 2       +
chr1    899547  899729  ENSG00000187961 5       +
chr1    899560  899729  ENSG00000187961 2       +
chr1    899910  900343  ENSG00000187961 10      +
chr1    900940  901053  ENSG00000187961 1       NA
chr1    906588  906704  ENSG00000187583 1       +
chr1    916992  942254  ENSG00000187642 1       -
chr1    916992  972576  ENSG00000188157 1       -
chr1    934812  934906  ENSG00000188290 1       -
chr1    936643  937587  NA      1       +
chr1    938371  996602  ENSG00000217801 1       NA
chr1    941833  941921  NA      1       +
chr1    942714  943398  NA      3       NA
chr1    943659  943850  NA      1       +
chr1    943659  943951  NA      1       +
chr1    948956  949364  ENSG00000187608 1       +
ADD COMMENTlink written 10 days ago by tassa.saldi0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1274 users visited in the last hour