How can I verify that raw read counts were generated correctly from FASTQ files?
2.5 years ago
Simon Ahn ▴ 10

Hi. I'm new to bioinformatics and am trying to process FASTQ files to obtain a raw read count matrix.

I downloaded fastq files from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452

  1. I used fasterq-dump to download the FASTQ files for the SRR accessions

  2. Aligned the FASTQ files, without any trimming, against the Ensembl reference files Homo_sapiens.GRCh38.dna_rm.primary_assembly.fa (genome) and Homo_sapiens.GRCh38.104.chr.gtf (annotation)

  3. Extracted the raw count matrix from the BAM files using featureCounts
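For reference, the table produced in step 3 can be loaded with a few lines of Python. This is only a minimal sketch: it assumes the standard featureCounts layout (a leading `#` comment line, then the columns Geneid, Chr, Start, End, Strand, Length, followed by one column per BAM file). The function name `read_featurecounts` and the tiny table are made up for illustration and are not taken from GSE63452.

```python
import csv
import io


def read_featurecounts(handle):
    """Parse a featureCounts table into (sample_names, {gene_id: [counts]})."""
    # Skip the leading comment line that records the command used.
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    header = next(rows)
    # Columns 0-5 are Geneid, Chr, Start, End, Strand, Length; the rest are samples.
    samples = header[6:]
    counts = {row[0]: [int(v) for v in row[6:]] for row in rows if row}
    return samples, counts


# Made-up table standing in for a real featureCounts output file.
example = io.StringIO(
    "# Program:featureCounts (hypothetical example)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample1.bam\tsample2.bam\n"
    "geneA\t1\t100\t2100\t+\t2000\t12\t30\n"
    "geneB\t1\t5000\t14000\t-\t9000\t0\t4\n"
)
samples, counts = read_featurecounts(example)
print(samples)
print(counts)
```

With a real file, `open("counts.txt")` (a hypothetical file name) would replace the `io.StringIO` object.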

To check whether my results were processed correctly, I normalized my read count matrix to CPM, since a normalized data matrix is available from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452.
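To make the normalization step concrete, here is a minimal sketch of CPM (counts per million) for a single sample: each count is divided by the sample's total assigned reads and multiplied by one million. The gene names and counts are made up. The post does not say how the GEO table was normalized, so it is an assumption that CPM is the right scale for the comparison.

```python
def cpm(counts):
    """Counts per million for one sample: count / library size * 1e6."""
    library_size = sum(counts.values())
    if library_size == 0:
        raise ValueError("sample has no assigned reads")
    return {gene: c * 1_000_000 / library_size for gene, c in counts.items()}


# Made-up counts for one sample (not taken from GSE63452).
sample = {"geneA": 50, "geneB": 150, "geneC": 800}
sample_cpm = cpm(sample)
print(sample_cpm)
```

Note that the library size here is the sum of counted reads, so the CPM values of one sample always add up to one million.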

I compared my data with the normalized count data from the same GEO record, but the results differ much more than I expected. I expected small differences, since I used different tools to get my result, but some values are far apart, as in the example below:

[Image: my data (left) next to the GEO normalized data (right)]

The left side shows my data and the right side shows the normalized data from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63452. For the A1BG gene, for example, there is a huge difference between the two datasets.

What can I do to fix this problem? It does not seem reasonable to have to use exactly the same tools every time I extract raw counts from FASTQ files.
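Rather than comparing single genes by eye, the overall agreement between the two tables could be summarized with a correlation over the genes they share. The sketch below is one possible check, not an established procedure for this dataset: it computes a Pearson correlation on log2(value + 1) for one sample. The function name `log_pearson` and all values are made up, and it assumes both tables use the same kind of gene identifier (an Ensembl GTF typically yields Ensembl gene IDs, while A1BG is a gene symbol, so IDs may need mapping first).

```python
import math


def log_pearson(a, b):
    """Pearson correlation of log2(x + 1) over genes present in both tables."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared genes")
    x = [math.log2(a[g] + 1) for g in shared]
    y = [math.log2(b[g] + 1) for g in shared]
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant values")
    return cov / math.sqrt(var_x * var_y), shared


# Made-up normalized values for one sample from each source.
mine = {"geneA": 10.0, "geneB": 200.0, "geneC": 3000.0, "geneD": 5.0}
published = {"geneA": 12.0, "geneB": 180.0, "geneC": 3500.0, "geneE": 7.0}
r, shared = log_pearson(mine, published)
print(f"{len(shared)} shared genes, r = {r:.3f}")
```

A low correlation across all genes would point to a systematic difference (for example, a different normalization or mismatched sample columns), whereas a high correlation with a few outlying genes would point to gene-specific effects.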

fastq RNAseq raw-count • 475 views