How to get per-cell isoform counts using alevin (salmon)?
12 months ago
ruiyan_hou • 0

Hi, I have some 10x Genomics scRNA-seq data. I used alevin to map the reads to the transcriptome, and I want to get per-isoform counts. However, when I run the following code, I get a CB × gene_id matrix. How can I get a CB × transcript_id matrix instead? Thank you in advance.

nohup salmon alevin -l ISR  \
-1 /mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz \
/mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz \
-2 /mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz \
/mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz \
--chromiumV3 \
-i /mnt/sda1/houruiyan/humanRef/salmon_index/salmon_index_v34/  \
-p 20 \
-o alevin_output \
--tgMap /mnt/sda1/houruiyan/humanRef/hg38txp2gene.tsv &


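from vpolo.alevin import parser
import pandas as pd
# Show every column when printing the DataFrame.
pd.set_option('display.max_columns', None)
# Assumed loading step (missing from the original post): read the alevin
# count matrix from the -o directory used in the command above.
alevindf = parser.read_quants_bin("alevin_output")
print(alevindf)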


                  ENSG00000259376.1  ENSG00000259755.1  ENSG00000287892.1  \
TATCGCCTCTCCCAAC                0.0                0.0                0.0
CACTTCGTCACCTACC                0.0                0.0                0.0
TCCTCTTAGCCAAGGT                0.0                0.0                0.0
CCTCCAAAGGCCCGTT                0.0                0.0                0.0
AAGACAACAGATCACT                0.0                0.0                0.0

RNA-Seq • 455 views

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the code button in the formatting bar to format it as a code block. I've done it for you this time.


OK, thank you very much!

12 months ago
Rob 4.8k

Hi @ruiyan_hou,

Alevin does not perform transcript-level quantification from tagged-end single-cell RNA-seq data. As far as I am aware, there is no tool that outputs transcript-level abundances (i.e. a transcript × cell matrix) from such data.

This is because tagged-end single-cell data (at least with the current protocols) suffer from low per-cell read counts and a tremendous degree of sequence ambiguity: reads often align equally well to many transcripts. Further, because fragments are sequenced from the ends of transcripts and not across the full transcript body, it is less likely that you will observe unique, distinguishing splice junctions, and you cannot rely on properties such as transcript length to help disambiguate reads that align to multiple transcripts.

Thus, to make the results more robust, alevin produces a gene-level abundance matrix from tagged-end single-cell data. Even at the gene level, there is considerable multi-mapping in tagged-end single-cell data, and alevin attempts to resolve this through a combination of parsimonious UMI resolution and an EM algorithm.

In the single-cell context, full-length protocols (like Smart-seq2, etc.) can permit transcript-level abundance estimation, and that data can be processed with a tool such as salmon. Of course, the trade-off there is that these full-length protocols typically assay many fewer cells.
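To make the ambiguity argument above concrete, here is a toy Python sketch. It is not alevin's algorithm. It only counts how many reads can be assigned uniquely at the transcript level versus the gene level. The transcript names, the `tx2gene` mapping, the `read_compat` reads, and the `summarize` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical transcript-to-gene map (plays the same role as the --tgMap file).
tx2gene = {"tx1a": "geneA", "tx1b": "geneA", "tx1c": "geneA", "tx2a": "geneB"}

# Hypothetical reads: each entry is the set of transcripts a read aligns to equally well.
read_compat = [
    {"tx1a", "tx1b", "tx1c"},  # 3' end shared by all geneA isoforms
    {"tx1a", "tx1b"},          # shared by two geneA isoforms
    {"tx2a"},                  # unique to one transcript
    {"tx1c", "tx2a"},          # ambiguous even between genes
]

def summarize(read_compat, tx2gene):
    """Count uniquely assignable and ambiguous reads at both levels."""
    tx_unique, gene_unique = Counter(), Counter()
    tx_ambiguous = gene_ambiguous = 0
    for txs in read_compat:
        genes = {tx2gene[t] for t in txs}
        if len(txs) == 1:
            tx_unique[next(iter(txs))] += 1
        else:
            tx_ambiguous += 1
        if len(genes) == 1:
            gene_unique[next(iter(genes))] += 1
        else:
            gene_ambiguous += 1
    return tx_unique, tx_ambiguous, gene_unique, gene_ambiguous

tx_unique, tx_ambiguous, gene_unique, gene_ambiguous = summarize(read_compat, tx2gene)
print("transcript level:", dict(tx_unique), "ambiguous reads:", tx_ambiguous)
print("gene level:", dict(gene_unique), "ambiguous reads:", gene_ambiguous)
```

In this toy input, three of the four reads are ambiguous at the transcript level but only one remains ambiguous at the gene level. That leftover gene-level multi-mapping is the kind of case that needs further resolution, which the sketch does not attempt.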


I appreciate your help very much!