Question

How to make the correct RNA-seq count file for the cnvkit import-rna command?

3.7 years ago

lhaiyan3 ▴ 80

Hi, all:

I am trying to use the cnvkit import-rna command. It runs successfully on the test TCGA dataset, but it fails with an error on my own data. I quantified my RNA-seq data with both Salmon and Kallisto and produced two-column files from the output. My file format looks similar to the TCGA test dataset, but the import always fails on my data. Can anyone give me some suggestions? Thanks.

HY

Here is my script for the Salmon and cnvkit runs; one of my input files is attached, and the error message follows.

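module load salmon/0.9.1 || exit 1
# Build the transcriptome index from the Ensembl release-100 cDNA FASTA
salmon index -t /fdb/salmon/ensembl/release-100/cdna_fasta/Homo_sapiens.GRCh38.cdna.all.fa.gz -i GRCh38.index
# Quantify the paired-end reads; -l A lets Salmon infer the library type
salmon quant -i GRCh38.index -l A -1 23_109MES_RP2_R1.fastq.gz -2 23_109MES_RP2_R2.fastq.gz -p "$SLURM_CPUS_PER_TASK" -o 23_109MES_RP2_quant
# Keep columns 1 (Name) and 5 (NumReads) of quant.sf and drop the header line
cut -f1,5 23_109MES_RP2_quant/quant.sf | sed '1d' > 23_109MES_RP2.txt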

module load cnvkit
cnvkit.py import-rna --gene-resource /usr/local/apps/cnvkit/0.9.7.b1/data/ensembl-gene-info.hg38.tsv \
--correlations /usr/local/apps/cnvkit/0.9.7.b1/data/tcga-skcm.cnv-expr-corr.tsv \
--output out-summary.tsv --output-dir /data/$USER/out/ *.txt

Error message:

Dropping 68092 / 178517 rarely expressed genes from input samples
Loading gene metadata and TCGA gene expression/CNV profiles
Loaded /usr/local/apps/cnvkit/0.9.7.b1/data/ensembl-gene-info.hg38.tsv with shape: (221323, 9)
Loaded /usr/local/apps/cnvkit/0.9.7.b1/data/tcga-skcm.cnv-expr-corr.tsv with shape: (19177, 4)
Resetting 2846 ambiguous genes' correlation coefficients to default 0.100000
Trimmed gene info table to shape: (63966, 13)
Aligning gene info to sample gene counts
Weighting genes with below-average read counts
/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py:267: FutureWarning: clip_upper(threshold) is deprecated, use clip(upper=threshold) instead
  weights = [np.sqrt((gene_counts / gene_counts.quantile(.75)).clip_upper(1))]
Calculating normalized gene read depths
Traceback (most recent call last):
  File "/usr/local/apps/cnvkit/0.9.7.b1/bin/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/commands.py", line 1535, in _cmd_import_rna
    args.normal, args.do_gc, args.do_txlen, args.max_log2)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/import_rna.py", line 39, in do_import_rna
    gene_info, sample_counts, tx_lengths, normal_ids)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py", line 272, in align_gene_info_to_samples
    normal_ids)
  File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py", line 308, in normalize_read_depths
    assert sample_depths.values.sum() > 0
AssertionError
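
The assertion that fails, `sample_depths.values.sum() > 0`, suggests that the depths summed to zero. That may happen if none of the IDs in the count files matched the gene table, or if the counts themselves were all zero. A minimal sketch of a sanity check on one input file is below. The file `example_counts.txt` and its contents are made up for illustration; the real check would be run on each of the actual `*.txt` inputs.

```shell
# Hypothetical two-column count file (ID <tab> count), for illustration only.
workdir=$(mktemp -d)
printf 'ENST00000631435.1\t0.000000\nENST00000415118.1\t12.500000\nENST00000434970.2\t3.000000\n' \
    > "$workdir/example_counts.txt"

# Summarize the file: number of rows, rows with a nonzero count, total count,
# and how many IDs look like Ensembl transcripts (ENST) versus genes (ENSG).
summary=$(awk -F'\t' '
    { n++; total += $2 }
    $2 > 0       { nonzero++ }
    $1 ~ /^ENST/ { tx++ }
    $1 ~ /^ENSG/ { gene++ }
    END { printf "rows=%d nonzero=%d total=%.1f ENST=%d ENSG=%d", n, nonzero, total, tx, gene }
' "$workdir/example_counts.txt")
echo "$summary"
```

If the total is zero, or every ID is an ENST rather than an ENSG, the problem is probably in the input files rather than in cnvkit itself.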

RNA-Seq • 878 views

updated 3.7 years ago by GenoMax 141k • written 3.7 years ago by lhaiyan3 ▴ 80

My file format is similar with the TCGA test dataset

Can you post a few lines of both?

3.7 years ago by igor 13k

This is the Salmon output:

ENST00000631435.1       0.000000
ENST00000415118.1       0.000000
ENST00000434970.2       0.000000
ENST00000448914.1       0.000000
ENST00000632524.1       0.000000
ENST00000633009.1       0.000000
ENST00000634070.1       0.000000
ENST00000632963.1       0.000000
ENST00000633030.1       0.000000
ENST00000633765.1       0.000000
ENST00000632619.1       0.000000
ENST00000633159.1       0.000000
ENST00000631871.1       0.000000
ENST00000633010.1       0.000000
ENST00000633379.1       0.000000
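
These IDs are versioned Ensembl transcript IDs (ENST...), while the table passed to `--gene-resource` is a gene-level resource (`ensembl-gene-info`), so the counts may need to be summed to gene level (ENSG...) before import. That import-rna wants per-gene counts is my reading of the situation, not something confirmed in this thread. A rough sketch of the collapse is below, assuming a hypothetical two-column transcript-to-gene map `tx2gene.tsv` (not part of my pipeline; it would have to be built from the same Ensembl release as the index). All IDs and counts in it are placeholders.

```shell
workdir=$(mktemp -d)

# Hypothetical mapping file: transcript ID <tab> gene ID (placeholder IDs).
printf 'ENST00000000001\tENSG00000000001\nENST00000000002\tENSG00000000001\nENST00000000003\tENSG00000000002\n' \
    > "$workdir/tx2gene.tsv"

# Made-up transcript-level counts with versioned IDs, as in the listing above.
printf 'ENST00000000001.1\t10.5\nENST00000000002.3\t4.5\nENST00000000003.2\t7\n' \
    > "$workdir/tx_counts.txt"

awk -F'\t' -v OFS='\t' '
    NR == FNR { gene_of[$1] = $2; next }          # first file: load the map
    {
        sub(/\.[0-9]+$/, "", $1)                  # strip the ".N" version suffix
        if ($1 in gene_of) sum[gene_of[$1]] += $2 # add the count to its gene
    }
    END { for (g in sum) print g, sum[g] }        # one row per gene
' "$workdir/tx2gene.tsv" "$workdir/tx_counts.txt" | sort > "$workdir/gene_counts.txt"

cat "$workdir/gene_counts.txt"
```

Transcripts missing from the map are silently skipped in this sketch, so it is worth comparing the row counts before and after.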

updated 3.7 years ago by GenoMax 141k • written 3.7 years ago by lhaiyan3 ▴ 80

Hi, igor:

I also tried STAR alignment followed by htseq-count to generate the input files, and that failed as well. Here is my htseq-count output:

ENSG00000000003 0
ENSG00000000005 0
ENSG00000000419 0
ENSG00000000457 0
ENSG00000000460 0
ENSG00000000938 0
ENSG00000000971 0
ENSG00000001036 0
ENSG00000001084 0
ENSG00000001167 0
ENSG00000001460 0
ENSG00000001461 0
ENSG00000001497 0
ENSG00000001561 0
ENSG00000001617 0
ENSG00000001626 0
ENSG00000001629 0
ENSG00000001630 0
ENSG00000001631 0
ENSG00000002016 0
ENSG00000002079 0
ENSG00000002330 0
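
Every gene shown here has a count of 0, so it may be worth checking whether htseq-count assigned any reads at all. htseq-count appends special counter rows starting with `__` (such as `__no_feature` and `__not_aligned`) at the end of its output, and these show where the reads went. A small sketch is below; the file contents are invented for illustration and the variable names are arbitrary.

```shell
workdir=$(mktemp -d)

# Invented htseq-count output: two gene rows plus the special counter rows.
printf 'ENSG00000000003\t0\nENSG00000000005\t0\n__no_feature\t950\n__ambiguous\t0\n__too_low_aQual\t0\n__not_aligned\t50\n__alignment_not_unique\t0\n' \
    > "$workdir/htseq_counts.txt"

# Show the special counters to see why reads were not assigned to genes.
grep '^__' "$workdir/htseq_counts.txt"

# Keep only the gene rows and total up the reads assigned to genes.
grep -v '^__' "$workdir/htseq_counts.txt" > "$workdir/genes_only.txt"
gene_rows=$(grep -c . "$workdir/genes_only.txt")
assigned=$(awk -F'\t' '{ s += $2 } END { print s + 0 }' "$workdir/genes_only.txt")
echo "gene rows: $gene_rows, reads assigned to genes: $assigned"
```

A total of 0 with a large `__no_feature` value would point to a problem in the counting step (for example, the annotation or strandedness setting) rather than in the file format.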

updated 3.7 years ago by GenoMax 141k • written 3.7 years ago by lhaiyan3 ▴ 80