Paired-end data with different read lengths
2
0
5.6 years ago

I have two paired-end ChIP-seq datasets. One is 2x36 bp and the other is 2x100 bp. I want to compare them to find differential regions, ... but I am concerned that the difference in read length might affect downstream analysis.
Should I trim the 100 bp reads to 36 bp? Or, as long as both datasets are PE, does it not matter that much? Would you please share your ideas with me? Thanks

3
1) Why on earth would you throw away 64% of your data by trimming? 2) If you want help, you're going to have to edit your question and share a lot more information about what you're trying to accomplish here.

3
5.6 years ago
Fidel ★ 2.0k

Don't trim but be careful with mappability and GC bias:

1. Longer reads can map unambiguously to repetitive regions where shorter reads cannot. In your case, this means that a fraction of the 2x100 reads will map uniquely to regions where the 36 bp reads map ambiguously. This most likely will not be an issue unless you are interested in repetitive regions of the genome.

2. GC bias: 36 bp reads were usually produced by the Illumina Genome Analyzer, while 100 bp reads are produced by newer machines. Thus, the fragments in the two samples may have been amplified with different polymerases. Older polymerases introduced a marked GC bias that could affect results (promoters of mammalian genes tend to be GC-rich).

3. Check duplication rates. Older data tend to be highly duplicated.
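
To make the three checks above concrete, here is a minimal toy sketch in plain Python. It is not how a real pipeline would do it. All function names (`unique_kmer_fraction`, `gc_fraction`, `duplication_rate`, `toy_genome`) are made up for illustration. The "mappability" measure is a simplification that only counts exact k-mer matches on the forward strand of a small random sequence with one 60 bp repeat. Fragments are assumed to be already reduced to (chrom, start, end) tuples.

```python
import random
from collections import Counter


def unique_kmer_fraction(genome, k):
    """Fraction of k-mer start positions whose k-mer occurs exactly once.

    Crude stand-in for mappability: forward strand only, exact matches only.
    """
    n = len(genome) - k + 1
    if n <= 0:
        return 0.0
    kmers = [genome[i:i + k] for i in range(n)]
    counts = Counter(kmers)
    return sum(1 for km in kmers if counts[km] == 1) / n


def gc_fraction(seq):
    """Fraction of G/C among unambiguous A/C/G/T bases (N and others ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def duplication_rate(fragments):
    """Fraction of fragments that repeat an already seen (chrom, start, end)."""
    counts = Counter(fragments)
    total = sum(counts.values())
    return (total - len(counts)) / total if total else 0.0


def toy_genome(seed=0, flank=200, repeat_len=60):
    """Hypothetical test sequence: random flanks around two copies of a repeat."""
    rng = random.Random(seed)

    def rand_seq(n):
        return "".join(rng.choice("ACGT") for _ in range(n))

    repeat = rand_seq(repeat_len)
    return rand_seq(flank) + repeat + rand_seq(flank) + repeat + rand_seq(flank)


if __name__ == "__main__":
    genome = toy_genome()
    for k in (36, 100):
        # 36-mers can fall entirely inside the 60 bp repeat; 100-mers cannot.
        print(k, unique_kmer_fraction(genome, k))
    print(gc_fraction("GGCCAATT"))
    print(duplication_rate([("chr1", 10, 200), ("chr1", 10, 200), ("chr2", 5, 150)]))
```

The point of the first function is only that a read longer than a repeat always carries some unique flanking sequence, which is why the 2x100 data can cover regions the 2x36 data cannot. For real data, compare per-sample GC profiles and duplication rates with established tools rather than with this sketch.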

0
5.6 years ago
winter_li ▴ 60

Hi, you could read up on the principles of Illumina sequencing. I think you can do that, but you'd better make sure that the quality of the reads is good enough for your research.

