Question

Generating Counts Data From Fastq Sequence Files

1

10.1 years ago

josph.sh ▴ 10

I'm new to sequencing and currently have several FASTQ files from sequencing experiments (sequenced on an Illumina MiSeq).

I was hoping to carry out some expression analysis (probably with edgeR) on this data, but first I need to generate a counts matrix from it. Could somebody explain how to generate counts data from a FASTQ file?

sequencing fastq rna-seq counts differential-expression • 10k views

updated 10.1 years ago by Xingyu Yang ▴ 280 • written 10.1 years ago by josph.sh ▴ 10

Ram · Answer 1 · 2014-03-19

2

10.1 years ago

Ashutosh Pandey 12k

You will first have to align those FASTQ files against the reference genome to produce SAM/BAM files. TopHat, STAR and many other splice-aware RNA-seq aligners are available for this task. It is always good to preprocess your read data beforehand, including QC and trimming off low-quality bases.
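To make the trimming step concrete, here is a minimal Python sketch that cuts low-quality bases off the 3' end of each read. It assumes plain four-line FASTQ records with Phred+33 quality encoding, and the function names and the thresholds (`min_q=20`, `min_len=1`) are hypothetical choices for illustration. Dedicated trimmers use more careful strategies (sliding windows, adapter removal), so treat this only as a picture of the idea, not a replacement for them.

```python
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line): stop reading
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_3prime(seq: str, qual: str, min_q: int = 20, offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases whose Phred score is below min_q (assumes Phred+33)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_fastq(in_handle: TextIO, out_handle: TextIO,
               min_q: int = 20, min_len: int = 1) -> int:
    """Write trimmed reads that are still at least min_len long; return how many were kept."""
    kept = 0
    for header, seq, qual in read_fastq(in_handle):
        seq, qual = trim_3prime(seq, qual, min_q=min_q)
        if len(seq) >= min_len:
            out_handle.write(f"{header}\n{seq}\n+\n{qual}\n")
            kept += 1
    return kept
```

It could be used along the lines of `with open("reads.fastq") as fin, open("trimmed.fastq", "w") as fout: trim_fastq(fin, fout)`, where both file names are placeholders.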
Then you need a tool that will generate count data for you. Basically, you provide the aligned BAM file and a gene annotation file (GFF3, GTF or BED format) for your reference genome. HTSeq and Cufflinks are some of the tools available for this task (for edgeR you want raw read counts, which HTSeq's htseq-count reports; Cufflinks reports FPKM estimates instead). Search Biostar and you will find the names of other tools.
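Once each sample has been counted, the per-sample results still have to be combined into one genes-by-samples matrix. The Python sketch below shows one way this could be done, assuming each sample's counts are in the two-column, tab-separated layout that htseq-count writes (gene ID, count), with summary rows such as `__no_feature` prefixed by two underscores. The function names and sample labels are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'gene_id<TAB>count' lines, skipping the '__' summary rows."""
    counts: Dict[str, int] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        gene_id, value = fields[0], fields[1]
        if gene_id.startswith("__"):
            continue  # e.g. __no_feature, __ambiguous
        counts[gene_id] = int(value)
    return counts


def build_count_matrix(
    per_sample: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (genes, samples, matrix) with one row per gene and one column per sample."""
    samples = list(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    # A gene missing from a sample's file is treated as having zero counts.
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


def format_matrix(genes: List[str], samples: List[str], matrix: List[List[int]]) -> str:
    """Render the matrix as tab-separated text with a header row."""
    lines = ["gene_id\t" + "\t".join(samples)]
    for gene, row in zip(genes, matrix):
        lines.append(gene + "\t" + "\t".join(str(v) for v in row))
    return "\n".join(lines) + "\n"
```

Each sample's file would be read with something like `parse_htseq_counts(open("sample1.counts.txt"))` (a placeholder path). The tab-separated text from `format_matrix` can then be saved and loaded in R, for example with `read.delim`, before building an edgeR `DGEList` from the count columns.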

updated 4.5 years ago by Ram 43k • written 10.1 years ago by Ashutosh Pandey 12k

0

Thank you for your reply.

written 10.0 years ago by josph.sh ▴ 10

score 1 · Answer 2 · 2014-03-19

1

10.1 years ago

Xingyu Yang ▴ 280

See the Nature Protocols paper on RNA-seq expression analysis with TopHat and Cufflinks: http://www.nature.com/nprot/journal/v7/n3/abs/nprot.2012.016.html

0

Thanks for the link.

written 10.0 years ago by josph.sh ▴ 10