Question

Biased GC content distribution in miRNA Illumina sequencing data

2.0 years ago

madhuratathode • 0

Hi All,

I am analysing small RNA (miRNA) sequencing data from a human glioblastoma cell line and am reviewing the FastQC results for my samples. The samples have already been adapter-trimmed. However, the GC content distribution differs markedly from the theoretical distribution. The per-base sequence quality also looks odd, and I am unable to interpret it.

Should I filter the reads with a quality-control tool (such as NGS QC Toolkit or another) before proceeding with the alignment? (Attached Image 1)
How do I interpret the GC content pattern in this case? (Attached Image 3)
The number of N's also increases after the 22nd base position. How should I handle this? (Attached Image 2)

I would like to understand whether I can use these samples for downstream analysis.

[Image 1: Per base sequence quality] [Image 2: Per base N content] [Image 3: Per base GC content]
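To cross-check the FastQC plots outside the report, the per-position N fraction and per-read GC percentage can be recomputed directly from the FASTQ records. The following is only a minimal Python sketch, not FastQC's own implementation. The function names and the tiny in-memory example are hypothetical, and computing GC over all bases of a read (N's included in the denominator) is an assumption.

```python
import io
from collections import Counter


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or a blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual


def per_position_n_fraction(seqs):
    """Return, for each 0-based position, the fraction of reads with an N there.

    Only reads long enough to cover a position count towards its total.
    """
    totals = Counter()
    n_counts = Counter()
    for seq in seqs:
        for i, base in enumerate(seq):
            totals[i] += 1
            if base.upper() == "N":
                n_counts[i] += 1
    return [n_counts[i] / totals[i] for i in sorted(totals)]


def gc_percent(seq):
    """Percentage of G and C bases in one read (assumption: N's stay in the denominator)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(seq)


# Hypothetical three-read example standing in for a real FASTQ file.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nNNGC\n+\nIIII\n@r3\nAC\n+\nII\n")
seqs = [seq for _, seq, _ in read_fastq(example)]
n_by_pos = per_position_n_fraction(seqs)
gc_values = [gc_percent(s) for s in seqs]
print(n_by_pos)
print(gc_values)
```

For a real file, `example` would be replaced by an open file handle (for a gzipped FASTQ, `gzip.open(path, "rt")`). A sharp rise in `n_by_pos` beyond position 22 would match what the Per base N content plot shows.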

smallRNA fastqc analysis • 530 views

There is something seriously wrong with this dataset. There should not be this many N's, nor such poor quality. Did you get this data from a public source, or is this your own dataset?

2.0 years ago by GenoMax • 142k

Hi. This is our own dataset. I am really not sure why this happened. I agree that there is something wrong with this dataset. Could I try trimming off the bases after position 22?
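For illustration, trimming after position 22 could look roughly like the Python sketch below. It is not a recommendation and does not address whatever caused the N's. The function name `trim_and_filter` and the thresholds `min_len=16` and `max_n=0` are hypothetical choices, and records are assumed to be (name, sequence, quality) tuples.

```python
def trim_and_filter(records, max_len=22, min_len=16, max_n=0):
    """Hard-trim reads to max_len bases, then drop short or N-containing reads.

    records: iterable of (name, sequence, quality) tuples.
    min_len and max_n are illustrative thresholds, not established defaults.
    """
    for name, seq, qual in records:
        # Keep only the first max_len bases and their quality characters.
        seq, qual = seq[:max_len], qual[:max_len]
        if len(seq) < min_len:
            continue  # too short after trimming
        if seq.upper().count("N") > max_n:
            continue  # still contains uncalled bases
        yield name, seq, qual


# Hypothetical reads: one long, one containing an N, one very short.
reads = [
    ("r1", "ACGT" * 6 + "AC", "I" * 26),
    ("r2", "ACGTNCGTACGTACGTACGT", "I" * 20),
    ("r3", "ACGT", "IIII"),
]
kept = list(trim_and_filter(reads))
for name, seq, qual in kept:
    print(name, seq, len(seq))
```

Note that a hard cut at 22 bases would also truncate any genuinely longer small RNAs, so the result is worth comparing against the untrimmed reads.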

24 months ago by madhuratathode • 0