Forum: How to calculate FPKM values of interested gene
3
gravatar for Ginsea Chen
4.9 years ago by
Ginsea Chen120
Chinese Academy of Tropical Agricultural Sciences, Danzhou, China
Ginsea Chen120 wrote:

FPKM, Fragments Kilobase of exon model per millon mapped reads, which can be used to indicate the expression (abundance) characteristics of genes. Now I want to describe the operation about obtaining interested gene FPKM value.

1.Software Download

1).fastq-dump: convert sra file to fastq file.

  website:http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software 

2).bowtie:an ultrafast and memory efficient tool for aligning sequencing reads to long reference sequences.

  website:http://bowtie-bio.sourceforge.net/bowtie2/index.shtml 

3).cufflinks:assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.

  website:http://cufflinks.cbcb.umd.edu/ 

4).gffread: convert gff3 file to gtf file.

  website:http://cufflinks.cbcb.umd.edu/ (This program is included with cufflinks package)

2. Operation

1) Download genome.fa and genes.gff3 file from genome website; Download sra file from NCBI

2) Format conversion

  $ fastq-dump -I --split-files SRR123456789.sra # convert sra file to fastq file

  $ gffread -E genes.gff3 -o genes.gtf # convert gff3 file to gtf file

3) Index files

  $bowtie2-build genome.fa genome

4) Alignment

  $bowtie2 -x genome -1 SRR123456789_1.fastq -2 SRR123456789_2.fastq -S SRR123456789.sam

  $samtools view -bS SRR123456789.sam > SRR123456789.bam

  $samtools sort SRR123456789.bam SRR123456789

5) FPKM values

  $cufflinks SRR123456789.bam -G genes.gtf -o result

After these operations, we can extract FPKM values from genes.frkm_tracking file based on gene ID.

Notes: If you find some bugs, please contact me.

ADD COMMENTlink modified 4.9 years ago by Charles Warden6.6k • written 4.9 years ago by Ginsea Chen120
3
gravatar for dariober
4.9 years ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

I just wanted to point out that transcript abundance expressed as FPKM/RPKM has some biases and it is less easy to interpret than "transcript per million" or TPM. See, among other sources, Wagner et al. 2012 (Measurement of mRNA abundance using RNA-seq data: RPKM measure is inconsistent among samples) for a good explanation.

ADD COMMENTlink written 4.9 years ago by dariober10k

Hi dariober

Thanks for your suggestion, I am reading this paper now.

ADD REPLYlink written 4.9 years ago by Ginsea Chen120
1
gravatar for Charles Warden
4.9 years ago by
Charles Warden6.6k
Duarte, CA
Charles Warden6.6k wrote:

Why use Bowtie2 instead of TopHat2?

ADD COMMENTlink written 4.9 years ago by Charles Warden6.6k

Hi Warden

Thanks for your return! In fact, Tophat was usually used in many papers. Before this, I tried to use tophat to map reads to the reference genome, but the system always told me no enough RAM space and can't complete tophat analysis in my computer.  As the introduction of Tophat, I knew it is a software based on bowtie. So bowtie2 was used in this operation. Do you mean I shouldn't use Bowtie2 instead of Tophat2? 

Thank you again, sir!

ADD REPLYlink written 4.9 years ago by Ginsea Chen120

Bowtie2 might be better than Bowtie1, but TopHat is specifically designed to handle alignments across exon junctions for a chromsome-based reference.  If you were using transcripts as a reference, I think Bowtie would be OK (but then you would typically be using something like eXpress rather than cufflinks as a reference).  However, the general consensus is that TopHat is preferred for genomic alignments (such as against hg19, mm9, etc).

I would recommending seeing if your institution has a large, shared Linux server to perform the alignments, which shouldn't have the RAM problem.  I rarely run alignments on my desktop.  Not sure if that is your problem.

ADD REPLYlink modified 4.9 years ago • written 4.9 years ago by Charles Warden6.6k

Hi, is it OK to use bowtie instead of TopHat if I am working with prokaryotic genomes which don't have introns like eukaryotic ones?

ADD REPLYlink written 3.3 years ago by liuyifan201460
0
gravatar for Ginsea Chen
4.9 years ago by
Ginsea Chen120
Chinese Academy of Tropical Agricultural Sciences, Danzhou, China
Ginsea Chen120 wrote:

 

Thanks for your suggestion, Linux server should be a good choice, but there were no equipment for me.

In fact, I am not sure the error of tophat is for no enough RAM space, the run log of tophat is pasted as follows:

[2014-07-03 23:13:54] Beginning TopHat run (v2.0.11)
-----------------------------------------------
[2014-07-03 23:13:54] Checking for Bowtie
          Bowtie version:     2.2.3.0
[2014-07-03 23:13:54] Checking for Samtools
        Samtools version:     0.1.19.0
[2014-07-03 23:13:54] Checking for Bowtie index files (genome)..
[2014-07-03 23:13:54] Checking for reference FASTA file
[2014-07-03 23:13:54] Generating SAM header for genome
[2014-07-03 23:13:54] Reading known junctions from GTF file
[2014-07-03 23:13:55] Preparing reads
     left reads: min. length=120, max. length=120, 19269414 kept reads (53310 discarded)
    right reads: min. length=120, max. length=120, 19263005 kept reads (59719 discarded)
[2014-07-03 23:32:02] Building transcriptome data files result/tmp/gene
[2014-07-03 23:32:07] Building Bowtie index from gene.fa
[2014-07-03 23:32:41] Mapping left_kept_reads to transcriptome gene with Bowtie2 
[sam_read1] missing header? Abort!
    [FAILED]
Error running bowtie:
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
Error while flushing and closing output
terminate called recursively
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

I tried to find many ways to solve it but no successful, many people of forum (such as SEQanswers) thought that it is for no enough RAM space.

I am a new of RNA-seq, Thanks for your help again.

ADD COMMENTlink written 4.9 years ago by Ginsea Chen120

I typically build the bowtie2 reference from source for TopHat2 analysis - if I remember, I think there are some issues with the reference that is provided.  For example, is there actually any sequence in the .fasta file?

ADD REPLYlink modified 4.9 years ago • written 4.9 years ago by Charles Warden6.6k

This problem may be a bug of tophat, I have solved it through addition of line "bowtie_header_cmd += ["-x"]" in the tophat code as the description of Google forum:  https://groups.google.com/forum/#!topic/tuxedo-tools-users/vG5Rn0IgxoA

Thanks for your help! 

ADD REPLYlink written 4.9 years ago by Ginsea Chen120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1044 users visited in the last hour