How can I be sure that raw read counts are well processed from fastq files?
6 weeks ago
Simon Ahn ▴ 10

Hi. I'm new to bioinformatics and am trying to process fastq files to get a raw read count matrix.

I downloaded fastq files from

  1. I used fasterq-dump to download the fastq files for the SRR runs

  2. Aligned the fastq files, without any trimming, using the ENSEMBL reference files Homo_sapiens.GRCh38.104.chr.gtf (annotation) and Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa (genome)

  3. Extracted the raw count matrix from the BAM files using featureCounts

To check whether my data were processed correctly, I normalized my read count matrix to CPM,

since I could get a normalized data matrix from

I compared my data with the normalized count data from,

but the results differ much more than I expected.
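For clarity, this is the kind of CPM scaling meant above. It is only a minimal sketch, not necessarily the exact code behind either matrix: the function name, sample name, gene names and counts are made up for illustration, and it assumes the library size is simply the sum of the counted reads in each sample.

```python
def counts_per_million(counts_by_sample):
    """Scale raw counts so that each sample's values sum to one million.

    counts_by_sample maps sample name -> {gene name -> raw count}.
    Assumption: library size = total counted reads in that sample.
    """
    cpm = {}
    for sample, gene_counts in counts_by_sample.items():
        library_size = sum(gene_counts.values())
        if library_size == 0:
            raise ValueError(f"sample {sample!r} has no counted reads")
        cpm[sample] = {
            gene: count * 1_000_000 / library_size
            for gene, count in gene_counts.items()
        }
    return cpm


# Hypothetical toy input, not real data.
raw = {"sample_1": {"GENE_A": 50, "GENE_B": 150, "GENE_C": 800}}
cpm = counts_per_million(raw)
print(cpm["sample_1"])
```

If the other matrix was scaled with a different library size, the two sets of values will not match even when the raw counts agree.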

I expected the results to be a little different, since I used different tools, but

look at some of the results:

enter image description here

The left one is my data and the right one is the normalized data. For the A1BG gene, for example, there is a huge difference between the two.

What can I do to fix this problem? It does not seem reasonable to have to use exactly the same tools every time I extract raw counts from fastq files.
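To put a number on the difference instead of eyeballing the two tables, one option is a per-gene log2 ratio over the genes that both matrices share. The sketch below is only an illustration under stated assumptions: the function name, the pseudocount of 1, and the gene names and values are all hypothetical, and both inputs are assumed to be on the same scale for a single sample.

```python
import math


def log2_ratios(mine, other, pseudocount=1.0):
    """Return {gene: log2((mine + p) / (other + p))} for genes in both inputs.

    The pseudocount (an assumption of this sketch) avoids division by zero
    and damps ratios for genes with very low values.
    """
    shared = sorted(set(mine) & set(other))
    return {
        gene: math.log2((mine[gene] + pseudocount) / (other[gene] + pseudocount))
        for gene in shared
    }


# Hypothetical toy values for one sample, not real data.
mine = {"GENE_A": 7.0, "GENE_B": 31.0, "GENE_C": 3.0}
other = {"GENE_A": 7.0, "GENE_B": 3.0, "GENE_D": 12.0}

ratios = log2_ratios(mine, other)
# Gene with the largest absolute discrepancy between the two matrices.
worst = max(ratios, key=lambda gene: abs(ratios[gene]))
print(worst, ratios[worst])
```

Genes present in only one matrix are dropped here, so it is worth checking how many genes are shared before reading anything into the ratios.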

fastq RNAseq raw-count • 123 views

