Gene FPKM values from Cufflinks
9 days ago
arsala521 ▴ 10

Hi,

I am new to RNA-seq data analysis. I need to identify differentially expressed genes (DEGs) between human and chimpanzee in one tissue type. I have comparable RNA-seq data (reads in FASTQ format) for the two species. Each species has 2 biological replicates, each with 3 technical replicates, so six runs per species.

I understand that DEG identification with the Cufflinks package (cuffdiff) is meant for two conditions that share the same reference genome. To identify DEGs between different species, I have to use edgeR or DESeq.

I intend to obtain FPKM values for all genes in all 12 runs (6 runs per species) and then use this FPKM dataset to identify DEGs with an R package (edgeR or DESeq). Is this approach okay?

Second, my main question is about the FPKM values I am getting in the Cufflinks output. To run Cufflinks, I am following the step-by-step protocol in the Cufflinks protocol paper (https://www.nature.com/articles/nprot.2012.016).

First I ran TopHat with the following command:

tophat -p 8 -G hg38.ncbiRefSeq.gtf -o Human_B1_T1 hg38 SRRxxx_1.fastq SRRxxx_2.fastq

Then I ran Cufflinks as below:

cufflinks -p 8 -o Clout_Human_B1_T1 Human_B1_T1/accepted_hits.bam

The 'genes.fpkm_tracking' file in the Cufflinks output starts with the following lines:

tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status
CUFF.1 - - CUFF.1 - - chr1:151793-152723 - - 1.57259 0.924969 2.22021 OK
CUFF.2 - - CUFF.2 - - chr1:153030-158982 - - 0.924186 0 6.23538e+06 OK
CUFF.3 - - CUFF.3 - - chr1:633736-634228 - - 12.1477 9.07784 15.2175 OK

Could someone please tell me what CUFF.1, CUFF.2 (and so on) mean? The same value appears in both the 1st (tracking_id) column and the 4th (gene_id) column. How can I get FPKM values along with gene names? There are no gene names in this file.
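For reference, IDs of the form CUFF.N are assigned by Cufflinks itself to loci it assembles. The cufflinks command above was run without a reference annotation (the -G option was given only to tophat), which is the likely reason gene_short_name is "-" in every row. The sketch below only shows how to pull the relevant columns out of a tracking file by header name. It assumes the file is tab-separated with the header shown above. The file name demo.fpkm_tracking and the function extract_fpkm are made up for illustration, and the three rows are copied from the post.

```shell
# Recreate the rows from the post as a tab-separated demo file.
# demo.fpkm_tracking is a hypothetical name, not a real Cufflinks output path.
printf 'tracking_id\tclass_code\tnearest_ref_id\tgene_id\tgene_short_name\ttss_id\tlocus\tlength\tcoverage\tFPKM\tFPKM_conf_lo\tFPKM_conf_hi\tFPKM_status\n' > demo.fpkm_tracking
printf 'CUFF.1\t-\t-\tCUFF.1\t-\t-\tchr1:151793-152723\t-\t-\t1.57259\t0.924969\t2.22021\tOK\n' >> demo.fpkm_tracking
printf 'CUFF.2\t-\t-\tCUFF.2\t-\t-\tchr1:153030-158982\t-\t-\t0.924186\t0\t6.23538e+06\tOK\n' >> demo.fpkm_tracking
printf 'CUFF.3\t-\t-\tCUFF.3\t-\t-\tchr1:633736-634228\t-\t-\t12.1477\t9.07784\t15.2175\tOK\n' >> demo.fpkm_tracking

# Print gene_id, gene_short_name, locus and FPKM for every data row.
# Columns are looked up by header name, so column order does not matter.
extract_fpkm() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    { print $col["gene_id"], $col["gene_short_name"], $col["locus"], $col["FPKM"] }
  ' "$1"
}

extract_fpkm demo.fpkm_tracking
```

With the rows above, the second output column is "-" throughout, so there are no gene names to recover from this particular file. They would have to come from an annotation supplied at quantification time.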

I found this post (https://biostar.usegalaxy.org/p/17760/), which seems relevant, but couldn't find a clear answer there.

TIA

PS: For the hg38 genes.gtf file, I used 'hg38.ncbiRefSeq.gtf' downloaded from the UCSC portal.

Cufflink RNA-seq fpkm • 370 views
9 days ago
dsull ★ 4.2k

Your approach is not OK. Do not use Cufflinks and do not use TopHat. They're outdated, and better methods exist that give faster and more accurate, more reliable quantification.

Use kallisto or salmon for your purposes -- you can feed the results from those programs into edgeR/DESeq2/sleuth/swish/whatever.


Correct. Even the TopHat team themselves say so in their latest release notes: https://ccb.jhu.edu/software/tophat/index.shtml


Thank you. Cufflinks was already installed on the system, so I proceeded with that. I will look into the newer programs as well. Could you please also tell me what these codes in the Cufflinks output mean, and how to get Cufflinks output that shows gene names alongside FPKM?


Cufflinks is no longer maintained and people don't use it anymore, so it's difficult to find someone to help you. Furthermore, you cannot use the FPKMs that Cufflinks outputs with edgeR/DESeq2, which use negative binomial regression to perform differential expression (FPKMs don't follow a negative binomial distribution). So, stop using Cufflinks (the only useful thing about Cufflinks now is managing assemblies [e.g. from Trinity]). For what it's worth, I currently work in the lab where Cufflinks was first developed. All of us use kallisto now.

Just install kallisto (or similar software) on your system (installing locally is fine, if you don't have superuser access).
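To illustrate the point about FPKMs versus counts: FPKM divides each gene's fragment count by the gene length (in kilobases) and by the library size (in millions of fragments), so the result is a normalized, generally non-integer value that no longer says how many fragments were actually observed. A minimal sketch with invented numbers. The file toy_counts.tsv, the gene names, and the function counts_to_fpkm are hypothetical and not part of any tool mentioned in this thread.

```shell
# Toy table: gene, length in bp, raw fragment count (all values invented).
printf 'geneA\t1000\t500\ngeneB\t2000\t500\ngeneC\t4000\t1000\n' > toy_counts.tsv

# FPKM = count * 1e9 / (length_bp * total_fragments).
# The file is read twice: the first pass sums the library size,
# the second pass prints gene, raw count and FPKM.
counts_to_fpkm() {
  awk -F'\t' '
    NR == FNR { total += $3; next }
    { printf "%s\t%d\t%.1f\n", $1, $3, $3 * 1e9 / ($2 * total) }
  ' "$1" "$1"
}

counts_to_fpkm toy_counts.tsv
```

In this toy table geneA and geneB have the same raw count but different FPKMs, while geneB and geneC have different raw counts but the same FPKM. Count-based tools need the raw counts, which cannot be read back from FPKM values alone.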


Got it. Thank you so much for the very helpful and detailed reply.
