Question: FastQC results with abnormal GC content and Overrepresented sequences in RIP-seq datasets
My mouse RIP-seq data have an abnormal GC content. The FastQC 'GC content' plot has 2 peaks: the first is at about 55% on the x axis, with a y value of about 500,000; the second is at about 85% on the x axis, with a y value of about 1,600,000, and it is a very sharp peak. Besides, the per-base content is not about 25% each for A, T, G and C; the lines look like an electrocardiogram from 0 bp to 150 bp. I checked the unmapped reads in my BAM files and BLASTed them against the NCBI nr database, which showed that these reads came from mouse. I don't think this is contamination with other sequences, and I really can't figure out how this happened.

There are 2 replicate samples for each condition in my RIP-seq experiment, and another sample performed better than this one. I mapped my reads to mm10 with HISAT2; the overall mapping rate was about 69%, and nearly 50% of total reads aligned concordantly exactly 1 time. The sequencer was an Illumina HiSeq 3000. I am wondering whether this is caused by RNA degradation or PCR bias?

Since you have RIP-seq data, your libraries don't follow the assumptions that FastQC makes. You amplify a limited set of sequences, which do not have the diversity of, e.g., whole-genome sequencing data. You may want to have a look at the "common reasons for warnings" section of FastQC's documentation (e.g. here).
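If you want to see whether the sharp peak at about 85% GC comes from a handful of heavily amplified sequences, you could tally the high-GC reads yourself. The following is only a minimal sketch: it assumes a plain 4-line FASTQ (optionally gzipped), and the function names, the 80% cutoff and the file name in the usage comment are made up for illustration, not part of FastQC.

```python
import gzip
from collections import Counter
from itertools import islice


def read_sequences(path, limit=200_000):
    """Yield up to `limit` read sequences from a 4-line FASTQ file.

    Assumption: records are not line-wrapped, so the sequence is always
    the 2nd line of each 4-line record.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in islice(handle, 1, limit * 4, 4):
            yield line.strip().upper()


def gc_percent(seq):
    """GC percentage of one read, ignoring N calls; None if nothing was called."""
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def summarize(seqs, gc_cutoff=80.0, top_n=10):
    """Return (reads counted, high-GC reads, most frequent high-GC sequences)."""
    total = 0
    high_gc = Counter()
    for seq in seqs:
        gc = gc_percent(seq)
        if gc is None:
            continue  # read consisted only of N (or was empty)
        total += 1
        if gc >= gc_cutoff:
            high_gc[seq] += 1
    n_high = sum(high_gc.values())
    return total, n_high, high_gc.most_common(top_n)


# Hypothetical usage (the file name is a placeholder):
# total, n_high, top = summarize(read_sequences("sample_R1.fastq.gz"))
# print(f"{n_high}/{total} reads have GC >= 80%")
# for seq, count in top:
#     print(count, seq)
```

If a few sequences account for most of the high-GC reads, that would fit the low-diversity explanation above; you could then BLAST those top sequences to see what they are.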

On the other hand, you may check whether the sequencer was a NextSeq machine. With 2-colour chemistry, you may find poly-G sequences; see here for more information.
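In 2-colour chemistry a cycle with no signal is called as G, so poly-G artefacts typically show up as long runs of G at the 3' end of reads. A rough way to check for them, whichever instrument produced the data, is to count reads ending in such a run. This is just a sketch; the function names and the 20-base threshold are arbitrary choices for illustration.

```python
def polyg_tail_length(seq):
    """Length of the uninterrupted run of G's at the 3' end of a read."""
    return len(seq) - len(seq.rstrip("G"))


def polyg_fraction(seqs, min_run=20):
    """Fraction of reads whose 3' end is a run of at least `min_run` G's."""
    total = 0
    flagged = 0
    for seq in seqs:
        total += 1
        if polyg_tail_length(seq.upper()) >= min_run:
            flagged += 1
    return flagged / total if total else 0.0
```

`polyg_fraction` accepts any iterable of read sequences, for example the second line of every 4-line FASTQ record. A fraction close to zero suggests that poly-G tails are not what drives the high-GC peak.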



