2.0 years ago
madhuratathode
Hi All,
I am analysing small RNA (miRNA) sequencing data from a human glioblastoma cell line and am checking the FastQC results for my samples. The samples have already been adapter-trimmed. However, the GC content distribution looks very different from the theoretical distribution. The per-base sequence quality also looks odd, and I am not able to interpret it.
- Should I filter the reads with a quality control tool (such as NGS QC Toolkit or another) before proceeding with the alignment? (Attached Image 1)
- How do I interpret the GC content pattern in this case? (Attached Image 3)
- The number of N's also increases after the 22nd base position. How should I handle this? (Attached Image 2)
I would like to understand whether I can use these samples for downstream analysis.
There is something seriously wrong with this dataset. There should not be this many N's or such poor quality. Did you get this data from a public source, or is this your own dataset?
Hi. This is our own dataset. I am really not sure why this happened, but I agree that there is something wrong with it. Can I try trimming the bases after position 22?
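For reference, the trimming idea asked about here could be sketched roughly as follows in plain Python. This is only an illustration, not a recommended pipeline: the function names `parse_fastq` and `trim_and_filter`, the `min_len` cutoff of 15, and the choice to discard any read that still contains an N are all assumptions made for the example. It also assumes simple four-line FASTQ records. Trimming like this only hides the symptom and does not explain why the N's and low qualities appear in the first place.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one four-line FASTQ record
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from an iterable of FASTQ lines (four lines per record)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line, not needed here
        qual = next(it).rstrip("\n")
        yield header, seq, qual


def trim_and_filter(
    records: Iterable[FastqRecord],
    max_len: int = 22,  # keep only the first 22 bases, as asked in the thread
    min_len: int = 15,  # hypothetical lower bound, chosen only for illustration
) -> Iterator[FastqRecord]:
    """Truncate each read to max_len bases and drop short or N-containing reads."""
    for header, seq, qual in records:
        seq, qual = seq[:max_len], qual[:max_len]
        if len(seq) < min_len or "N" in seq.upper():
            continue
        yield header, seq, qual
```

To apply it to a file, one could open the FASTQ with `open(path)` and pass the file object to `parse_fastq`, then write each surviving record back out as four lines. Comparing the number of reads before and after gives a quick sense of how much data the N's are costing.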