Hello,
I am attempting transcript quantification using Salmon. I have indexed the genome. I am trying to figure out the library type for my paired-end RNA-seq fastq files. I found that rseqc infer_experiment.py can figure this out, but I am confused about which inputs it needs. I have: a sorted bam file made by aligning the fastq files to the reference genome using Hisat2, although I did not specify strandedness when creating this bam file (can I even use this bam file for rseqc?); the reference genome file in fasta format; and the gene model file in gff format. However, the genome bed file is not available to download.
I am not sure which file to use for rseqc infer_experiment.py. Thank you!
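For reference, infer_experiment.py reports two fractions for paired-end data: reads explained by "1++,1--,2+-,2-+" and reads explained by "1+-,1-+,2++,2--". A minimal sketch of how those two numbers could be turned into a Salmon library type string is below. It assumes a standard inward-facing paired-end library; the function name and the 0.75 cutoff are hypothetical choices for illustration, not part of either tool.

```python
def salmon_libtype(frac_same, frac_opposite, threshold=0.75):
    """Suggest a Salmon library type from infer_experiment.py fractions.

    frac_same:     fraction of reads explained by "1++,1--,2+-,2-+"
                   (read 1 on the same strand as the gene)
    frac_opposite: fraction of reads explained by "1+-,1-+,2++,2--"
                   (read 1 on the opposite strand to the gene)
    threshold:     hypothetical cutoff for calling a library stranded
    """
    if frac_same >= threshold:
        return "ISF"  # inward, stranded, read 1 from the forward strand
    if frac_opposite >= threshold:
        return "ISR"  # inward, stranded, read 1 from the reverse strand
    return "IU"       # inward, unstranded


# Example with made-up fractions
print(salmon_libtype(0.02, 0.95))
```

If the two fractions are close to each other, the library is most likely unstranded, and borderline values are worth checking by eye rather than trusting a fixed cutoff.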
Thank you! I missed that part in the manual. I downloaded the files from SRA. I am not sure where to find the methods section.
Can you share a link so I can have a look?
Thank you. I checked a few other links where this information is given in the methods, but it is not provided for the BioProject I am working on. I checked the research article as well, but no luck. Anyway, Salmon produced the results.
I have another related question. I want to compare my results with another expression dataset that was generated from the same input files. The difference is that the other dataset reports FPKM values, whereas Salmon reports TPM values. Is there a way to get FPKM values from Salmon instead of TPM? If not, can you recommend any other software that takes raw reads and outputs FPKM values?
Generally you want to use tximport to aggregate the transcript-level estimates that salmon produces into gene-level counts. These raw counts then typically go into a differential analysis. See the manuals of e.g. edgeR or DESeq2 for differential analysis tutorials. You generally cannot compare two different datasets: there is too much batch effect between samples, which masks the biological signal. In other words, most of the differences you will see are due to technical artifacts, not biological differences. DESeq2 can use the output of tximport to generate FPKM and other kinds of normalized counts; check the manual. Again, do not compare, say, cell type A from study A with cell type B from study B; the results will be nonsense.
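To make the relationship between the two units concrete, here is a minimal sketch of the textbook FPKM formula and the FPKM-to-TPM conversion. It assumes you have the NumReads and EffectiveLength columns from salmon's quant.sf as plain lists, and it uses the sum of NumReads as the "per million" denominator. The function names are hypothetical, and this is not how DESeq2 computes its FPKM values internally.

```python
def fpkm_from_counts(num_reads, eff_lengths):
    """FPKM = reads * 1e9 / (effective length in bp * total assigned reads)."""
    total = sum(num_reads)
    if total <= 0:
        return [0.0 for _ in num_reads]
    return [
        n * 1e9 / (length * total) if length > 0 else 0.0
        for n, length in zip(num_reads, eff_lengths)
    ]


def tpm_from_fpkm(fpkm):
    """Rescale FPKM values so that they sum to one million."""
    total = sum(fpkm)
    if total <= 0:
        return [0.0 for _ in fpkm]
    return [f / total * 1e6 for f in fpkm]


# Toy example with made-up numbers for three transcripts
num_reads = [100, 200, 700]
eff_lengths = [1000, 2000, 1000]

fpkm = fpkm_from_counts(num_reads, eff_lengths)
tpm = tpm_from_fpkm(fpkm)
print(fpkm)
print(tpm)
```

Within one sample, TPM and FPKM differ only by a constant scaling factor, so the ranking of transcripts is the same in both. The factor differs between samples, which is one more reason not to compare values across studies directly.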