Question: How to get per-isoform counts in each cell using alevin (salmon)?
ruiyan_hou wrote (15 days ago):

Hi, I have some 10x Genomics scRNA-seq data, which I mapped to the transcriptome with alevin. I want to get the counts for each isoform. However, when I run the following code, I get a CB × gene_id (cell barcode × gene) matrix. How can I get a CB × transcript_id matrix instead? Thank you in advance.

nohup salmon alevin -l ISR  \
    -1 /mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R1_001.fastq.gz \
    /mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R1_001.fastq.gz \
    -2 /mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L001_R2_001.fastq.gz \
    /mnt/sda1/houruiyan/1kPBMC/fastqfile/pbmc_1k_v3_fastqs/pbmc_1k_v3_S1_L002_R2_001.fastq.gz \
    --chromiumV3 \
    -i /mnt/sda1/houruiyan/humanRef/salmon_index/salmon_index_v34/  \
    -p 20 \
    -o alevin_output \
    --tgMap /mnt/sda1/houruiyan/humanRef/hg38txp2gene.tsv &
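
For context on why the output is gene-level: the `--tgMap` file maps every transcript in the index to its gene (a headerless, two-column, tab-separated file: transcript ID, then gene ID). As we understand it, alevin uses this map to project the set of transcripts each read is compatible with onto the corresponding set of genes, so isoform identity is collapsed before counting. Below is a minimal sketch of that projection only, using a hypothetical toy map; the IDs (`txA1`, `geneA`, ...) and the helper names `load_tgmap` and `to_gene_set` are made up for illustration and are not part of alevin.

```python
import csv
import io

# Hypothetical toy map in the alevin --tgMap layout (assumption for illustration):
# transcript_id <TAB> gene_id, one pair per line, no header.
TOY_TGMAP = "txA1\tgeneA\ntxA2\tgeneA\ntxB1\tgeneB\n"


def load_tgmap(text):
    """Parse a headerless transcript-to-gene TSV into a {transcript: gene} dict."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row[1] for row in reader if row}


def to_gene_set(compatible_txps, txp2gene):
    """Project the transcripts a read is compatible with onto their genes."""
    return frozenset(txp2gene[t] for t in compatible_txps)


txp2gene = load_tgmap(TOY_TGMAP)

# Ambiguous between two isoforms of geneA, but unambiguous at the gene level.
print(sorted(to_gene_set({"txA1", "txA2"}, txp2gene)))  # ['geneA']
# Ambiguous between genes: still a multi-gene set after projection.
print(sorted(to_gene_set({"txA2", "txB1"}, txp2gene)))  # ['geneA', 'geneB']
```

A read that cannot be assigned to one isoform often becomes unambiguous once projected to genes, which is one reason a gene-level matrix is more robust for tagged-end data.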

from vpolo.alevin import parser
import pandas as pd
pd.set_option('display.max_columns',None)
alevindf=parser.read_quants_bin('/mnt/sda1/houruiyan/1kPBMC/alevin_output')
print(alevindf)

                  ENSG00000259376.1  ENSG00000259755.1  ENSG00000287892.1  \
TATCGCCTCTCCCAAC                0.0                0.0                0.0   
CACTTCGTCACCTACC                0.0                0.0                0.0   
TCCTCTTAGCCAAGGT                0.0                0.0                0.0   
CCTCCAAAGGCCCGTT                0.0                0.0                0.0   
AAGACAACAGATCACT                0.0                0.0                0.0

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

(written 15 days ago by RamRS)

OK, thank you very much!

(written 15 days ago by ruiyan_hou)
Rob wrote (15 days ago):

Hi @ruiyan_hou,

Alevin does not perform transcript-level quantification from tagged-end single-cell RNA-seq data. As far as I am aware, there is no tool that outputs transcript-level abundances (i.e. a transcript × cell matrix) from such data. This is because tagged-end single-cell data (at least from the current protocols) suffer from low per-cell read counts and a tremendous degree of sequence ambiguity: reads often align equally well to many transcripts. Further, because fragments are sequenced from the ends of transcripts rather than across the full transcript body, it is less likely that a read will cover a unique, distinguishing splice junction, and it is harder to rely on properties such as transcript length to help disambiguate reads that align to multiple transcripts.

Thus, to make the results more robust, alevin produces a gene-level abundance matrix from tagged-end single-cell data. Even at the gene level, there is considerable multi-mapping in tagged-end single-cell data, and alevin attempts to resolve this through a combination of parsimonious UMI resolution and an EM algorithm.

In the single-cell context, full-length protocols (like Smart-seq2) can permit transcript-level abundance estimation, and that data can be processed with a tool such as salmon. Of course, the trade-off is that these full-length protocols typically assay many fewer cells.
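
To illustrate the EM idea mentioned above, here is a generic, textbook EM that splits ambiguous counts across genes within one cell. It is a sketch under simplifying assumptions, not alevin's implementation: molecules are assumed to be already deduplicated (the UMI resolution step is skipped), each group of molecules is keyed by the exact set of genes it is compatible with (often called an equivalence class), and there is no length or bias correction. The function name `em_gene_counts` and the example counts are hypothetical.

```python
def em_gene_counts(eq_classes, max_iter=1000, tol=1e-8):
    """Toy EM (illustration only, not alevin's code) that splits ambiguous
    counts across genes.

    eq_classes maps a frozenset of gene IDs to the number of (assumed already
    deduplicated) molecules compatible with exactly that set of genes.
    Returns the expected number of molecules assigned to each gene.
    """
    genes = sorted({g for gene_set in eq_classes for g in gene_set})
    total = sum(eq_classes.values())
    if total == 0:
        return {g: 0.0 for g in genes}
    # Start from uniform relative abundances.
    abundance = {g: 1.0 / len(genes) for g in genes}
    counts = {g: 0.0 for g in genes}
    for _ in range(max_iter):
        # E-step: share each class's count among its genes in proportion
        # to the current abundance estimates.
        counts = {g: 0.0 for g in genes}
        for gene_set, n in eq_classes.items():
            denom = sum(abundance[g] for g in gene_set)
            for g in gene_set:
                counts[g] += n * abundance[g] / denom
        # M-step: new abundances are the normalized expected counts.
        new_abundance = {g: counts[g] / total for g in genes}
        delta = max(abs(new_abundance[g] - abundance[g]) for g in genes)
        abundance = new_abundance
        if delta < tol:
            break
    return counts


# Hypothetical single-cell example: 8 molecules unique to geneA, 2 unique to
# geneB, and 10 compatible with both.
classes = {
    frozenset({"geneA"}): 8,
    frozenset({"geneB"}): 2,
    frozenset({"geneA", "geneB"}): 10,
}
print({g: round(c, 3) for g, c in em_gene_counts(classes).items()})
# {'geneA': 16.0, 'geneB': 4.0}
```

Here the 10 shared molecules end up split 8:2, matching the ratio of the unique evidence for each gene. The same kind of ambiguity among many isoforms, with far less unique evidence per isoform, is what makes transcript-level estimates from tagged-end data unreliable.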


I appreciate your help very much!

(written 15 days ago by ruiyan_hou)