Question: How do I make the correct RNA-seq count file for the cnvkit import-rna command?
Asked 6 months ago by lhaiyan3 (United States)

Hi, all:

I am trying to use the cnvkit import-rna function. I can run the TCGA test dataset successfully, but when I run my own data I get an error message. I quantified my RNA-seq data with both Salmon and Kallisto and extracted a 2-column file from each. My file format is similar to the TCGA test dataset, but the command always fails on my data. Can anyone please give me some suggestions? Thanks.

HY

Here is my script for the Salmon and cnvkit runs; one of my files and the error message are attached.

module load salmon/0.9.1 || exit 1
salmon index -t /fdb/salmon/ensembl/release-100/cdna_fasta/Homo_sapiens.GRCh38.cdna.all.fa.gz -i GRCh38.index
salmon quant -i GRCh38.index  -l A -1 23_109MES_RP2_R1.fastq.gz -2 23_109MES_RP2_R2.fastq.gz -p $SLURM_CPUS_PER_TASK -o 23_109MES_RP2_quant
cut -f1,5 23_109MES_RP2_quant/quant.sf | sed '1d' > 23_109MES_RP2.txt

module load cnvkit
cnvkit.py import-rna --gene-resource /usr/local/apps/cnvkit/0.9.7.b1/data/ensembl-gene-info.hg38.tsv \
--correlations /usr/local/apps/cnvkit/0.9.7.b1/data/tcga-skcm.cnv-expr-corr.tsv \
--output out-summary.tsv --output-dir /data/$USER/out/ *.txt

Error message:

Dropping 68092 / 178517 rarely expressed genes from input samples
Loading gene metadata and TCGA gene expression/CNV profiles
Loaded /usr/local/apps/cnvkit/0.9.7.b1/data/ensembl-gene-info.hg38.tsv with shape: (221323, 9)
Loaded /usr/local/apps/cnvkit/0.9.7.b1/data/tcga-skcm.cnv-expr-corr.tsv with shape: (19177, 4)
Resetting 2846 ambiguous genes' correlation coefficients to default 0.100000
Trimmed gene info table to shape: (63966, 13)
Aligning gene info to sample gene counts
Weighting genes with below-average read counts
/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py:267: FutureWarning: clip_upper(threshold) is deprecated, use clip(upper=threshold) instead
  weights = [np.sqrt((gene_counts / gene_counts.quantile(.75)).clip_upper(1))]
Calculating normalized gene read depths
Traceback (most recent call last):
  File "/usr/local/apps/cnvkit/0.9.7.b1/bin/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/commands.py", line 1535, in _cmd_import_rna
    args.normal, args.do_gc, args.do_txlen, args.max_log2)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/import_rna.py", line 39, in do_import_rna
    gene_info, sample_counts, tx_lengths, normal_ids)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py", line 272, in align_gene_info_to_samples
    normal_ids)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py", line 308, in normalize_read_depths
    assert sample_depths.values.sum() > 0
AssertionError
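
The traceback ends on `assert sample_depths.values.sum() > 0`, so the summed sample depths were not positive at that point. A first check that may help is whether each input file has any non-zero counts and what kind of IDs it carries. The following is only a minimal sketch, assuming a whitespace- or tab-separated two-column file like the ones described above; `summarize_counts` is a hypothetical helper, not part of CNVkit, and it does not reproduce CNVkit's own parsing.

```python
import re
from collections import Counter


def summarize_counts(lines):
    """Summarize a two-column (ID, count) table given as an iterable of text lines."""
    kinds = Counter()
    total = 0.0
    nonzero = 0
    versioned = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        ident, value = fields
        try:
            count = float(value)
        except ValueError:
            continue  # skip a header row or any non-numeric value
        total += count
        nonzero += count > 0
        # Classify the ID by its Ensembl prefix
        if ident.startswith("ENSG"):
            kinds["gene"] += 1
        elif ident.startswith("ENST"):
            kinds["transcript"] += 1
        else:
            kinds["other"] += 1
        # A trailing ".<digits>" is treated as an Ensembl version suffix
        versioned += bool(re.search(r"\.\d+$", ident))
    return {
        "rows": sum(kinds.values()),
        "total": total,
        "nonzero": nonzero,
        "kinds": dict(kinds),
        "versioned": versioned,
    }


if __name__ == "__main__":
    # File name taken from the script above; adjust as needed.
    with open("23_109MES_RP2.txt") as handle:
        print(summarize_counts(handle))
```

A file whose total is zero, or whose IDs are all of a kind that the gene resource does not list, would be worth looking at first; whether either applies here cannot be told from the thread alone.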

"My file format is similar with the TCGA test dataset"

Can you post a few lines of both?

(reply written 6 months ago by igor)

This is the Salmon output:

ENST00000631435.1       0.000000
ENST00000415118.1       0.000000
ENST00000434970.2       0.000000
ENST00000448914.1       0.000000
ENST00000632524.1       0.000000
ENST00000633009.1       0.000000
ENST00000634070.1       0.000000
ENST00000632963.1       0.000000
ENST00000633030.1       0.000000
ENST00000633765.1       0.000000
ENST00000632619.1       0.000000
ENST00000633159.1       0.000000
ENST00000631871.1       0.000000
ENST00000633010.1       0.000000
ENST00000633379.1       0.000000
(reply written 6 months ago by lhaiyan3)
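
The Salmon lines above are keyed by versioned transcript IDs (ENST...), while the file passed to `--gene-resource` is an Ensembl gene-info table. If import-rna expects gene-level counts (an assumption, not confirmed anywhere in this thread), the transcript counts would have to be summed per gene first. A minimal sketch of that step, assuming a transcript-to-gene mapping is available from elsewhere; `tx2gene`, `strip_version` and `sum_to_genes` are hypothetical names, and the IDs in the usage lines are toy placeholders rather than real data.

```python
from collections import defaultdict


def strip_version(ensembl_id):
    """Drop a trailing version suffix, e.g. 'ENST00000631435.1' -> 'ENST00000631435'."""
    return ensembl_id.split(".", 1)[0]


def sum_to_genes(tx_counts, tx2gene):
    """Sum transcript-level counts into gene-level counts.

    tx_counts: iterable of (transcript_id, count) pairs
    tx2gene:   dict mapping unversioned transcript IDs to gene IDs (hypothetical input)
    Returns (gene_counts, unmapped); unmapped lists transcripts missing from tx2gene.
    """
    gene_counts = defaultdict(float)
    unmapped = []
    for tx_id, count in tx_counts:
        gene_id = tx2gene.get(strip_version(tx_id))
        if gene_id is None:
            unmapped.append(tx_id)
            continue
        gene_counts[gene_id] += float(count)
    return dict(gene_counts), unmapped


if __name__ == "__main__":
    # Toy placeholder IDs, for illustration only
    toy_map = {"TX1": "GENE_A", "TX2": "GENE_A"}
    genes, unmapped = sum_to_genes([("TX1.1", 2), ("TX2.4", 3), ("TX3.1", 1)], toy_map)
    print(genes, unmapped)
```

Keeping the list of unmapped transcripts makes it easy to spot a mismatch between the annotation release used for quantification and the one behind the mapping.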

Hi, igor:

I also tried STAR alignment followed by htseq-count to produce the input files, and that failed as well. Here is my htseq-count output:

ENSG00000000003 0
ENSG00000000005 0
ENSG00000000419 0
ENSG00000000457 0
ENSG00000000460 0
ENSG00000000938 0
ENSG00000000971 0
ENSG00000001036 0
ENSG00000001084 0
ENSG00000001167 0
ENSG00000001460 0
ENSG00000001461 0
ENSG00000001497 0
ENSG00000001561 0
ENSG00000001617 0
ENSG00000001626 0
ENSG00000001629 0
ENSG00000001630 0
ENSG00000001631 0
ENSG00000002016 0
ENSG00000002079 0
ENSG00000002330 0
(reply written 6 months ago by lhaiyan3)
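
Every htseq-count line shown above has a count of 0. That alone proves nothing, since only the first rows are listed, but it may be worth checking how many reads were assigned to genes overall. htseq-count appends special counters whose names start with `__` (such as `__no_feature` and `__ambiguous`) to its output, so the assigned fraction can be estimated from the file itself. A minimal sketch, assuming the standard two-column htseq-count output; `summarize_htseq` is a hypothetical helper and the file name in the usage lines is a placeholder.

```python
def summarize_htseq(lines):
    """Split htseq-count output into the assigned total and the special '__' counters."""
    assigned = 0
    special = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        name, value = fields
        if name.startswith("__"):
            special[name] = int(value)  # e.g. __no_feature, __ambiguous
        else:
            assigned += int(value)  # reads assigned to a gene
    total = assigned + sum(special.values())
    fraction = assigned / total if total else 0.0
    return assigned, special, fraction


if __name__ == "__main__":
    # Placeholder file name; substitute the real htseq-count output
    with open("sample.htseq.txt") as handle:
        assigned, special, fraction = summarize_htseq(handle)
    print(assigned, special, round(fraction, 3))
```

If the special counter rows are passed to import-rna along with the gene rows, removing them first may also be sensible; whether CNVkit filters them itself is not something this thread establishes.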