Question

STARsolo detected only one gene and Features.tsv file is empty

0

22 months ago

MYousry ▴ 20

Hi,

I am trying to use STARsolo on scRNA-seq data for A. thaliana (GSM4423536) to produce the count matrix. However, every time I run STARsolo it detects only one gene, and the features.tsv file is therefore empty. As a beginner, I am quite lost. I tried different A. thaliana datasets, but the same issue occurs each time. I am not sure what I am doing wrong or what I have misunderstood. The published results, produced with 10x Cell Ranger, detect many genes along with their IDs.

One more question: does this affect the next steps, such as clustering? I mean, would it cause a problem? My sense is that it should, because there is only one gene and we need multiple genes to cluster cells based on their expression.

Here is the command I ran:

STAR \
  --runThreadN 4 \
  --genomeDir reference_genome/STAR_annotated-index/ \
  --readFilesIn FASTQ_data/SRR13040580_2.fastq FASTQ_data/SRR13040580_1.fastq \
  --outFileNamePrefix STARsolo_results/ \
  --outReadsUnmapped Fastx \
  --outSAMattributes NH HI NM MD CB UB sM sS sQ \
  --outFilterMultimapNmax 1 \
  --outFilterMatchNmin 30 \
  --outFilterMismatchNmax 4 \
  --alignIntronMax 1 \
  --alignSJDBoverhangMin 999 \
  --soloType CB_UMI_Simple \
  --soloCellFilter EmptyDrops_CR \
  --soloCBwhitelist CB_whitelist/3M-february-2018.txt \
  --outSAMtype BAM SortedByCoordinate

STARsolo • 1.2k views

ADD COMMENT • link 21 months ago by MYousry ▴ 20

1

Seems like your reference data/index is missing information STARsolo expects.
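One way to check whether the annotation is the culprit is to look at which feature types the GTF actually contains. With its default settings, STAR takes gene models from rows whose third column is "exon" and whose attributes include gene_id and transcript_id. The following is a minimal sketch, not a definitive diagnostic; the helper names and the example path are hypothetical.

```shell
# Hypothetical helper: tally the feature types (column 3) of a GTF file.
gtf_feature_counts() {
  awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Hypothetical helper: count exon rows that carry both gene_id and transcript_id,
# which are the attributes STAR looks for by default.
gtf_usable_exons() {
  awk -F '\t' '!/^#/ && $3 == "exon" && $9 ~ /gene_id/ && $9 ~ /transcript_id/ { c++ } END { print c + 0 }' "$1"
}

# Usage (the path is a placeholder):
# gtf_feature_counts reference_genome/annotation.gtf
# gtf_usable_exons reference_genome/annotation.gtf
```

If the second count is zero or very small compared with the number of annotated genes, the index was probably built without usable gene models.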

One more question: does this affect the next steps, such as clustering? I mean, would it cause a problem? My sense is that it should, because there is only one gene and we need multiple genes to cluster cells based on their expression.

Again, I'd highly, highly recommend reading OSCA and/or speaking to a local expert. You appear to be flying blind here, so to speak, and it will only lead you to frustration and wasted effort.

ADD REPLY • link 22 months ago by jared.andrews07 ★ 16k

0

I downloaded the genome FASTA file and the annotation file from Ensembl. Is there something I should do to solve the missing information issue?

Regarding OSCA, I have started reading it, but I also want to apply what I learn as I go, so I am trying to figure this step out in order to have a matrix to work on.

Thank you!

ADD REPLY • link 22 months ago by MYousry ▴ 20

0

Hi!

I was able to fix this issue. And yes, the annotation file needed to be filtered. I wrote the solution below.

Thank you:)

ADD REPLY • link 21 months ago by MYousry ▴ 20

0

Please do not post screenshots of things unless necessary. Copying and pasting data should do just fine. Use the 101010 button to format your data as code, which will maintain formatting.

ADD REPLY • link 22 months ago by GenoMax 142k

score 1 · Accepted Answer · 2022-07-30

1

21 months ago

MYousry ▴ 20

The annotation file from Ensembl needs to be filtered for exons before it is passed to STAR to build the genome index. For that I used 10x Genomics's tool "cellranger mkgtf".

After that step, and repeating the other steps as before, the issue was solved.
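For anyone following along, the two steps could look roughly like the sketch below. The answer does not say which attributes were passed to mkgtf, so the gene_biotype:protein_coding filter is an assumption borrowed from common 10x usage, and the function names, paths, and thread count are placeholders.

```shell
# Sketch only: the attribute filter, function names, and paths are assumptions.

filter_gtf() {
  # $1 = GTF downloaded from Ensembl, $2 = filtered GTF to write
  cellranger mkgtf "$1" "$2" --attribute=gene_biotype:protein_coding
}

build_star_index() {
  # $1 = genome FASTA, $2 = filtered GTF, $3 = output index directory
  mkdir -p "$3"
  STAR --runMode genomeGenerate \
    --runThreadN 4 \
    --genomeDir "$3" \
    --genomeFastaFiles "$1" \
    --sjdbGTFfile "$2"
}

# Usage (file names are placeholders):
# filter_gtf reference_genome/annotation.gtf reference_genome/annotation.filtered.gtf
# build_star_index reference_genome/genome.fa reference_genome/annotation.filtered.gtf reference_genome/STAR_annotated-index/
```

The STARsolo command from the question can then be rerun against the rebuilt index.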

ADD COMMENT • link 21 months ago by MYousry ▴ 20