Hello,
I've run featureCounts and roughly 40% of my fragments are reported as Unassigned_FragmentLength. My reads are paired-end 150 bp, aligned with STAR (~90-93% uniquely mapped), with an average input read length of ~299 for all my samples. Here are the options I ran featureCounts with and the summary output.
featureCounts \
-p --countReadPairs -B -P \
-F "GTF" \
-J \
-C \
-T 16 \
-g gene_id \
-t exon \
-a $annotation_file \
--extraAttributes "gene_type" \
-o processing/counts/output_file_name \
processing/mapping/star/*.bam
Summary output:
Assigned 14367477 13814907 15921754 14159227 13658284 14979822 15491455 13762178 15848496 13407749 14694807 13113942
Unassigned_Unmapped 0 0 0 0 0 0 0 0 0 0
Unassigned_Read_Type 0 0 0 0 0 0 0 0 0 0
Unassigned_Singleton 0 0 0 0 0 0 0 0 0 0
Unassigned_MappingQuality 0 0 0 0 0 0 0 0 0
Unassigned_Chimera 0 0 0 0 0 0 0 0 0 0
Unassigned_FragmentLength 11617180 10800353 12659685 11053441 11040810 9806741 12528930 11007146 11982978 10008445 11223062 9763096
Unassigned_Duplicate 0 0 0 0 0 0 0 0 0 0
Unassigned_MultiMapping 0 0 0 0 0 0 0 0 0 0
I know that Unassigned_FragmentLength can result from fragments being longer than 600 or shorter than 50 under the default settings, but based on the BAM summary that does not seem to be the case. What else could cause this, and how might I resolve it?
Please let me know if there is any other information that would be helpful to resolve this!
Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or use fenced code blocks for multi-line code. Fenced code blocks are useful for syntax highlighting. If your code has a long single command, break it into multiple lines with proper escape sequences so it is easier to read and still runs when copy-pasted. I've done it for you this time.

Run without the -P flag. Does this then include the problematic alignments?
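
If dropping `-P` does recover the pairs, it may help to see how many pairs fall outside the 50 to 600 window that `-P` enforces by default (the bounds are set with `-d` and `-D`). Below is a minimal sketch, assuming the SAM TLEN field (column 9) is a reasonable proxy for the fragment length that featureCounts checks. `tlen_summary` and `sample.bam` are hypothetical names for illustration, not part of featureCounts or samtools. Note that for spliced STAR alignments TLEN spans introns, so a pair can exceed 600 even when the sequenced insert is short.

```shell
# Hypothetical helper: read SAM records on stdin and count how many have
# |TLEN| (column 9) below, within, or above the given bounds.
# Usage: tlen_summary MIN MAX
tlen_summary() {
  awk -v min="$1" -v max="$2" '
    !/^@/ {                              # skip header lines if present
      t = ($9 < 0) ? -$9 : $9            # TLEN is signed; take absolute value
      if (t < min) below++
      else if (t > max) above++
      else within++
    }
    END { printf "below=%d within=%d above=%d\n", below, within, above }
  '
}

# Example (assumes samtools is installed; the BAM name is a placeholder).
# -f 0x42 keeps the first mate of properly paired reads, so each pair is counted once.
# samtools view -f 0x42 processing/mapping/star/sample.bam | tlen_summary 50 600
```

If most pairs land in `above`, raising `-D` while keeping `-P` may be an alternative to removing the check entirely.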