3.7 years ago
lhaiyan3
Hi, all:
I tried to use the CNVkit import-rna command. I can run the TCGA test dataset successfully, but when I run my own data I get an error. I used both Salmon and Kallisto to quantify the RNA-seq data and produced a two-column file. My file format is similar to that of the TCGA test dataset, but the run always fails on my data. Can anyone give me some suggestions? Thanks.
HY
Here is my script for the Salmon and CNVkit runs; the attachment contains one of my files and the error message.
module load salmon/0.9.1 || exit 1
salmon index -t /fdb/salmon/ensembl/release-100/cdna_fasta/Homo_sapiens.GRCh38.cdna.all.fa.gz -i GRCh38.index
salmon quant -i GRCh38.index -l A -1 23_109MES_RP2_R1.fastq.gz -2 23_109MES_RP2_R2.fastq.gz -p $SLURM_CPUS_PER_TASK -o 23_109MES_RP2_quant
cut -f1,5 23_109MES_RP2_quant/quant.sf | sed '1d' > 23_109MES_RP2.txt
module load cnvkit
cnvkit.py import-rna --gene-resource /usr/local/apps/cnvkit/0.9.7.b1/data/ensembl-gene-info.hg38.tsv \
--correlations /usr/local/apps/cnvkit/0.9.7.b1/data/tcga-skcm.cnv-expr-corr.tsv \
--output out-summary.tsv --output-dir /data/$USER/out/ *.txt
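Salmon's quant.sf has one row per transcript (the Name column holds the transcript ID taken from the FASTA header), so the cut -f1,5 step above yields transcript-level read counts. The log below reports 178517 input rows, several times the number of annotated human genes, which may mean (an assumption, not confirmed in this thread) that import-rna is receiving transcript IDs where it expects Ensembl gene IDs. A minimal sketch of collapsing quant.sf to one row per gene, assuming the Ensembl cDNA FASTA headers carry a "gene:ENSG..." token and that the gene-resource table uses unversioned gene IDs. The script name salmon_to_genes.py and the rounding to integers are hypothetical, illustrative choices, not CNVkit requirements:

```python
import csv
import gzip
import sys
from collections import defaultdict


def strip_version(ensembl_id):
    """Drop a trailing '.N' version, e.g. 'ENSG00000223997.1' -> 'ENSG00000223997'."""
    return ensembl_id.split(".", 1)[0]


def tx2gene_from_headers(lines):
    """Map transcript ID -> gene ID from Ensembl cDNA FASTA header lines.

    Assumes headers look like '>ENST... cdna ... gene:ENSG... ...'; sequence
    lines and headers without a 'gene:' token are ignored.
    """
    mapping = {}
    for line in lines:
        if not line.startswith(">"):
            continue
        fields = line[1:].split()
        for field in fields[1:]:
            if field.startswith("gene:"):
                mapping[fields[0]] = field[len("gene:"):]
                break
    return mapping


def gene_counts_from_quant(quant_lines, tx2gene):
    """Sum Salmon's NumReads column per unversioned gene ID."""
    counts = defaultdict(float)
    for row in csv.DictReader(quant_lines, delimiter="\t"):
        gene = tx2gene.get(row["Name"])
        if gene is None:
            continue  # transcript not found in the FASTA headers; skipped
        counts[strip_version(gene)] += float(row["NumReads"])
    return dict(counts)


def write_two_column(counts, path):
    """Write 'gene_id<TAB>count' with no header; estimates are rounded (a choice, not a requirement)."""
    with open(path, "w") as out:
        for gene in sorted(counts):
            out.write(f"{gene}\t{int(round(counts[gene]))}\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python salmon_to_genes.py cdna.all.fa.gz quant.sf out.txt
    fasta_path, quant_path, out_path = sys.argv[1:]
    with gzip.open(fasta_path, "rt") as fa:
        mapping = tx2gene_from_headers(fa)
    with open(quant_path) as quant:
        write_two_column(gene_counts_from_quant(quant, mapping), out_path)
```

It would replace the cut/sed line, e.g. python salmon_to_genes.py Homo_sapiens.GRCh38.cdna.all.fa.gz 23_109MES_RP2_quant/quant.sf 23_109MES_RP2.txt. The Bioconductor package tximport is a more established way to do the same transcript-to-gene aggregation.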
Error message:
Dropping 68092 / 178517 rarely expressed genes from input samples
Loading gene metadata and TCGA gene expression/CNV profiles
Loaded /usr/local/apps/cnvkit/0.9.7.b1/data/ensembl-gene-info.hg38.tsv with shape: (221323, 9)
Loaded /usr/local/apps/cnvkit/0.9.7.b1/data/tcga-skcm.cnv-expr-corr.tsv with shape: (19177, 4)
Resetting 2846 ambiguous genes' correlation coefficients to default 0.100000
Trimmed gene info table to shape: (63966, 13)
Aligning gene info to sample gene counts
Weighting genes with below-average read counts
/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py:267: FutureWarning: clip_upper(threshold) is deprecated, use clip(upper=threshold) instead
weights = [np.sqrt((gene_counts / gene_counts.quantile(.75)).clip_upper(1))]
Calculating normalized gene read depths
Traceback (most recent call last):
File "/usr/local/apps/cnvkit/0.9.7.b1/bin/cnvkit.py", line 9, in <module>
args.func(args)
File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/commands.py", line 1535, in _cmd_import_rna
args.normal, args.do_gc, args.do_txlen, args.max_log2)
File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/import_rna.py", line 39, in do_import_rna
gene_info, sample_counts, tx_lengths, normal_ids)
File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py", line 272, in align_gene_info_to_samples
normal_ids)
File "/usr/local/Anaconda/envs_app/cnvkit/0.9.7.b1/lib/python3.6/site-packages/cnvlib/rna.py", line 308, in normalize_read_depths
assert sample_depths.values.sum() > 0
AssertionError
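The traceback ends at assert sample_depths.values.sum() > 0 in normalize_read_depths, i.e. the normalized read depths summed to zero. One plausible reading (an assumption, not confirmed in this thread) is that few or none of the IDs in the input files matched the gene IDs in the gene-resource table. A small sketch for checking that overlap before rerunning import-rna; it assumes the reference IDs are supplied one per line (for example extracted from ensembl-gene-info.hg38.tsv after checking with head which column holds the Ensembl gene IDs, since the column layout is not shown here). The script name check_ids.py is hypothetical:

```python
import sys


def strip_version(ensembl_id):
    """Drop a trailing '.N' version suffix from an Ensembl ID."""
    return ensembl_id.split(".", 1)[0]


def first_column_ids(lines):
    """Collect IDs from the first tab-separated column.

    Skips blank lines and HTSeq-style summary rows such as '__no_feature'.
    """
    ids = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("__"):
            continue
        ids.append(line.split("\t")[0])
    return ids


def overlap_report(sample_ids, reference_ids):
    """Count sample IDs found in the reference, with and without version suffixes."""
    sample = set(sample_ids)
    reference = set(reference_ids)
    return {
        "sample_ids": len(sample),
        "exact_matches": len(sample & reference),
        "matches_ignoring_version": len(
            {strip_version(i) for i in sample} & {strip_version(i) for i in reference}
        ),
        # Rows keyed by transcript IDs suggest the counts were never aggregated to genes.
        "looks_like_transcripts": sum(i.startswith("ENST") for i in sample),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_ids.py 23_109MES_RP2.txt gene_ids.txt
    with open(sys.argv[1]) as sample_file, open(sys.argv[2]) as ref_file:
        print(overlap_report(first_column_ids(sample_file), first_column_ids(ref_file)))
```

If exact_matches is near zero but matches_ignoring_version is high, version suffixes are the likely mismatch; if both are near zero and looks_like_transcripts is large, the file probably still holds transcript-level rows.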
Can you post a few lines of both?
This is the Salmon output:
Hi, igor:
I also tried STAR alignment followed by htseq-count to generate the input files, but that also failed. Here is my htseq-count output: