Question: Long runs of A's, G's, C's or T's in a scaffolded assembly (ABySS-1.3.4/SOAPdenovo2)
As I understand it, before scaffolding with long mate-pair reads, runs of N's or of a single base (A, G, C or T) cannot exceed the length of the reads used for assembly (in this case 101 bp). After scaffolding there are long series of N's where contigs have been joined together.

Can there also be long runs of A's, G's, C's or T's after scaffolding? This is what I am seeing. My guess is that when two contigs ending in the same homopolymer repeat are joined, the gap might be filled with that base instead of N's, for example with A's if both contigs ended in long runs of A.

ABySS-1.3.4 was used for assembly and SOAPdenovo2 was used for scaffolding.



Did you enable the gap-filling option during SOAPdenovo2 scaffolding? As I recall, the SOAPdenovo gap filler does not hesitate to fill a gap with the same k-mer many times over in order to resolve tandem repeats.
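
If it helps to narrow this down, you could list every single-base run longer than your read length in the scaffolds and compare the output with and without gap filling. Below is a rough sketch in plain Python, not part of ABySS or SOAPdenovo2. The function names, the `scaffolds.fa` file name and the 102 bp threshold (one base longer than your 101 bp reads) are my own assumptions for illustration.

```python
import re
import sys


def read_fasta(path):
    """Yield (name, sequence) pairs from a FASTA file (hypothetical helper)."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the record name.
                name, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def long_homopolymer_runs(seq, min_len=102):
    """Return (start, end, base, length) for runs of one base >= min_len.

    Coordinates are 0-based and end-exclusive. min_len=102 is an assumed
    threshold: one base longer than the 101 bp reads in the question.
    """
    # One base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGTN])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.end(), m.group(1), m.end() - m.start())
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python find_runs.py scaffolds.fa
    for name, seq in read_fasta(sys.argv[1]):
        for start, end, base, length in long_homopolymer_runs(seq):
            print(name, start, end, base, length, sep="\t")
```

Runs of N are reported too, so you can see whether the long A/C/G/T runs sit where you would otherwise expect a gap. If they only appear in the gap-filled output, the gap filler is the likely source.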

