Question

Decent Uniquely mapped reads but no features

4 months ago

ju_ra • 0

Hi guys,

I am analysing an mRNA-seq dataset from extracellular vesicles. Mapping with STAR gives decent results:

                          Number of input reads |   32143093
                      Average input read length |   100
                                    UNIQUE READS:
                   Uniquely mapped reads number |   22862852
                        Uniquely mapped reads % |   71.13%
                          Average mapped length |   103.01
                       Number of splices: Total |   269874
            Number of splices: Annotated (sjdb) |   125578
                       Number of splices: GT/AG |   235728
                       Number of splices: GC/AG |   4253
                       Number of splices: AT/AC |   243
               Number of splices: Non-canonical |   29650
                      Mismatch rate per base, % |   0.42%
                         Deletion rate per base |   0.02%
                        Deletion average length |   1.79
                        Insertion rate per base |   0.01%
                       Insertion average length |   1.64
                             MULTI-MAPPING READS:
        Number of reads mapped to multiple loci |   4125490
             % of reads mapped to multiple loci |   12.83%
        Number of reads mapped to too many loci |   575277
             % of reads mapped to too many loci |   1.79%
                                  UNMAPPED READS:
  Number of reads unmapped: too many mismatches |   0
       % of reads unmapped: too many mismatches |   0.00%
            Number of reads unmapped: too short |   4016547
                 % of reads unmapped: too short |   12.50%
                Number of reads unmapped: other |   562927
                     % of reads unmapped: other |   1.75%
                                  CHIMERIC READS:
                       Number of chimeric reads |   0
                            % of chimeric reads |   0.00%

But after featureCounts, most of the reads are not assigned to any feature:

Assigned    1766684
Unassigned_Unmapped 5154751
Unassigned_Read_Type    0
Unassigned_Singleton    0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 22295015
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   20977810
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    118358
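To put these counts in perspective, here is a minimal Python sketch (standard library only) that works out what share of the uniquely mapped reads ended up in each category. The function names `parse_summary` and `unique_read_breakdown` are made up for illustration and are not part of featureCounts. The sketch assumes that Assigned, Unassigned_NoFeatures and Unassigned_Ambiguity together cover the uniquely mapped reads; for the numbers above their sum is 22862852, the same as STAR's uniquely mapped read count.

```python
# Illustrative helpers; names are hypothetical, not featureCounts API.
# Non-zero categories copied from the summary above (zero rows omitted).
SUMMARY = """\
Assigned 1766684
Unassigned_Unmapped 5154751
Unassigned_MultiMapping 22295015
Unassigned_NoFeatures 20977810
Unassigned_Ambiguity 118358
"""


def parse_summary(text):
    """Return {category: count} from featureCounts summary-style text."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and any header row whose second field is not a number.
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def unique_read_breakdown(counts):
    """Total of the three unique-read categories and each category's share of it.

    Assumption: these three categories together hold the uniquely mapped reads.
    """
    keys = ("Assigned", "Unassigned_NoFeatures", "Unassigned_Ambiguity")
    unique_total = sum(counts.get(k, 0) for k in keys)
    shares = {k: counts.get(k, 0) / unique_total for k in keys}
    return unique_total, shares


counts = parse_summary(SUMMARY)
unique_total, shares = unique_read_breakdown(counts)
print(f"uniquely mapped reads seen by featureCounts: {unique_total}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

For this sample, only about 7.7% of the uniquely mapped reads are assigned to a feature, and roughly 92% overlap no annotated exon at all.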

I have never run into this issue before. Do you have any idea what a possible explanation could be?

STAR featureCounts • 688 views

ADD COMMENT • link updated 4 months ago by Ram 43k • written 4 months ago by ju_ra • 0

Please show us your STAR and featureCounts commands.

ADD REPLY • link 4 months ago by Ram 43k

Both STAR and featureCounts were run on the Galaxy server with the standard settings.

STAR  --runThreadN ${GALAXY_SLOTS:-4} --genomeLoad NoSharedMemory --genomeDir tempstargenomedir   --readFilesIn '/data/dnb09/galaxy_db/files/6/6/2/dataset_6629b28a-9531-4be8-b3e8-1f6be2dc35b8.dat'   --readFilesCommand zcat   --outSAMtype BAM SortedByCoordinate  --twopassMode None ''  --quantMode -   --outSAMattrIHstart 1 --outSAMattributes NH HI AS nM ch  --outSAMprimaryFlag OneBestScore  --outSAMmapqUnique 60   --outSAMunmapped Within    --outBAMsortingThreadN ${GALAXY_SLOTS:-4} --outBAMsortingBinsN 50 --winAnchorMultimapNmax 50 --limitBAMsortRAM $((${GALAXY_MEMORY_MB:-0}*1000000))

featureCounts  -a '/data/dnb05/galaxy_db/files/4/f/e/dataset_4fe52113-e454-4b36-97aa-085252740c72.dat' -F "GTF"  -o "output" -T ${GALAXY_SLOTS:-2}  -s  0  -Q  0     -t 'exon' -g 'gene_id'            --minOverlap  1 --fracOverlap 0 --fracOverlapFeature 0     '/data/dnb09/galaxy_db/files/1/2/2/dataset_12263332-93bc-4b5e-94b5-864f484a5c87.dat'

ADD REPLY • link 4 months ago by ju_ra • 0

Do the sequence identifiers in the annotation and reference genome files match? A mismatch there is generally a prime cause of counting problems. Have you examined the alignments to make sure reads are piling up under exons?

Since you are working with a "dataset of extracellular vesicles", this could have unique/odd characteristics compared to plain RNAseq, assuming something special was done to isolate the structures before making libraries.
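The identifier check mentioned above can be sketched in a few lines of Python (standard library only). This is only a rough illustration: the function names are hypothetical, the toy GTF and header lines are invented, and in practice the header lines would come from the BAM file (for example the text printed by `samtools view -H`) and the GTF lines from the annotation given to featureCounts.

```python
def gtf_seqnames(lines):
    """Collect the first column (sequence name) of every GTF feature line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def sam_header_seqnames(lines):
    """Collect the SN: values of the @SQ lines in a SAM/BAM header."""
    names = set()
    for line in lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def compare_seqnames(gtf_names, bam_names):
    """Report which sequence names are shared and which appear on one side only."""
    return {
        "shared": sorted(gtf_names & bam_names),
        "gtf_only": sorted(gtf_names - bam_names),
        "bam_only": sorted(bam_names - gtf_names),
    }


# Invented toy input: a "chr1" vs "1" naming mismatch.
TOY_GTF = [
    "#!genome-build toy",
    'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1";',
    'chr2\ttoy\texon\t300\t400\t.\t-\t.\tgene_id "g2";',
]
TOY_HEADER = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@SQ\tSN:1\tLN:1000",
    "@SQ\tSN:2\tLN:1000",
]

report = compare_seqnames(gtf_seqnames(TOY_GTF), sam_header_seqnames(TOY_HEADER))
print(report)
```

An empty "shared" list would mean that no read can ever overlap a feature, whereas a full overlap of names points the search elsewhere, such as the library itself.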

ADD REPLY • link 4 months ago by GenoMax 141k

Yes, they match (I test-ran an old sample with the same reference and annotation datasets and it worked out fine). Visualising the alignments, the reads do not seem to pile up only under annotated regions but also outside them (see image). The top track is the aligned BAM file.

The main question for me is whether this is valid data from the extracellular vesicles or a technical error...

ADD REPLY • link 4 months ago by ju_ra • 0

This does not look like RNAseq data.

ADD REPLY • link updated 4 months ago by Ram 43k • written 4 months ago by GenoMax 141k