Question

rseqc bed file for library type paired end RNA fastq files

0

4.1 years ago

evelyn ▴ 230

Hello,

I am attempting transcript quantification using Salmon. I have indexed the genome. I am trying to figure out the library type for my paired-end RNA-seq FASTQ files. I found that RSeQC's infer_experiment.py can determine this, but I am confused about the inputs. I have: a sorted BAM file made by aligning the FASTQ files to the reference genome with HISAT2, although I did not specify strandedness when creating it (can I even use this BAM file for RSeQC?); the reference genome in FASTA format; and a gene model file in GFF format. However, a BED file for the genome is not available to download.

I am not sure which file to use for rseqc infer_experiment.py. Thank you!

sequence • 1.1k views

ADD COMMENT • link 4.1 years ago by evelyn ▴ 230

score 1 · Answer 1 · 2020-03-12

1

4.1 years ago

ATpoint 82k

You can set the library type option in Salmon to A (automatic), and it will then infer the most likely library type. There is no need for external software. Besides that, it is typically best to simply check with the people who produced the library. Did you produce it or did you download it? If you downloaded it, check the methods section of the publication for the kit that was used.
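As a rough illustration of the automatic option, here is a minimal Python sketch that builds a paired-end `salmon quant` command with `-l A` and then reads back the library type Salmon settled on. The helper names, paths, and thread count are hypothetical. The `lib_format_counts.json` file and its `expected_format` and `compatible_fragment_ratio` fields are what I recall Salmon writing to its output directory, so please check them against your Salmon version.

```python
import json
from pathlib import Path


def build_salmon_quant_cmd(index_dir, reads_1, reads_2, out_dir, threads=4):
    """Return the argument list for a paired-end `salmon quant` run.

    `-l A` asks Salmon to infer the library type automatically.
    All paths here are placeholders supplied by the caller.
    """
    return [
        "salmon", "quant",
        "-i", str(index_dir),   # Salmon index directory
        "-l", "A",              # automatic library type detection
        "-1", str(reads_1),     # mate 1 FASTQ
        "-2", str(reads_2),     # mate 2 FASTQ
        "-p", str(threads),     # number of threads
        "-o", str(out_dir),     # output directory
    ]


def read_detected_libtype(out_dir):
    """Read the inferred library type from Salmon's output directory.

    Assumes Salmon wrote lib_format_counts.json with the fields
    'expected_format' and 'compatible_fragment_ratio'; missing fields
    come back as None rather than raising.
    """
    with open(Path(out_dir) / "lib_format_counts.json") as handle:
        report = json.load(handle)
    return report.get("expected_format"), report.get("compatible_fragment_ratio")
```

The command list can be passed to `subprocess.run(cmd, check=True)`. Once the run finishes, `read_detected_libtype(out_dir)` returns something like a library type string (for example `ISR` or `IU`) together with the fraction of fragments compatible with it.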

ADD COMMENT • link 4.1 years ago by ATpoint 82k

0

Thank you! I missed that part in the manual. I downloaded the files from SRA. I am not sure where to find the methods section.

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

0

Can you share a link so I can have a look?

ADD REPLY • link 4.1 years ago by ATpoint 82k

0

Thank you. I checked a few other links where this information is given in the methods, but it is not provided for the BioProject I am working on. I checked the research article as well, but no luck. Anyway, Salmon produced the results.

I have another related question. I want to compare the results with another expression dataset generated from the same input files. The difference is that the other dataset has FPKM values, whereas Salmon gives TPM values. Is there a way to get FPKM values from Salmon instead of TPM? If not, can you recommend other software that can take raw reads and output FPKM values?

ADD REPLY • link 4.1 years ago by evelyn ▴ 230

0

Generally you want to use tximport to aggregate the transcript-level estimates that Salmon produces into gene-level counts. These counts then typically go into a differential analysis; see the manuals of e.g. edgeR or DESeq2 for tutorials. You generally cannot compare two different datasets: there is too much batch effect between them, which masks the biological signal. In other words, most of the differences you will see are due to technical artifacts, not biological differences. DESeq2 can use the output of tximport to generate FPKM and other kinds of normalized counts; check the manual. Again, do not compare e.g. cell type A from study A with cell type B from study B, the results will be nonsense.
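For reference, the arithmetic that links read counts, FPKM, and TPM can be sketched in a few lines of Python. This is only an illustration at the transcript level, not the tximport/DESeq2 route described above. It assumes a standard Salmon `quant.sf` with the columns `Name`, `EffectiveLength`, and `NumReads`, and it uses the effective length and the total of assigned reads as denominators. Other pipelines may use the annotated length or a different library size, so the numbers will not necessarily match theirs. The function names are made up for this example.

```python
import csv


def fpkm_from_quant_sf(path):
    """Return {transcript_name: FPKM} computed from a Salmon quant.sf file.

    FPKM_i = NumReads_i * 1e9 / (EffectiveLength_i * total NumReads),
    i.e. reads per kilobase of effective length per million assigned reads.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    total_reads = sum(float(row["NumReads"]) for row in rows)
    fpkm = {}
    for row in rows:
        eff_len = float(row["EffectiveLength"])
        reads = float(row["NumReads"])
        if total_reads == 0 or eff_len <= 0:
            fpkm[row["Name"]] = 0.0  # avoid division by zero
        else:
            fpkm[row["Name"]] = reads * 1e9 / (eff_len * total_reads)
    return fpkm


def tpm_from_fpkm(fpkm):
    """Rescale FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm.values())
    return {name: (value / total * 1e6 if total else 0.0)
            for name, value in fpkm.items()}
```

The second function shows why the two units rank transcripts identically within one sample: TPM is just FPKM rescaled to sum to one million. That rescaling does nothing about batch effects between studies.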

ADD REPLY • link 4.1 years ago by ATpoint 82k