In our wet lab, we are losing about 40% of our data, probably due to low-diversity libraries. We are using genome reduction techniques like GT-seq for plant variety identification. The outcome is that many reads do not have proper i7+i5 barcoding (mostly Ns) and they are kicked out of the pipeline. Is there any way I can troubleshoot the issue, or trace the reads back to the Illumina cluster they came from? I came across this old paper that talks about recovering some of the reads by "Deferred Cluster Calling", basically using the GOAT pipeline (General Oligo Analysis Tool) and the Illumina Sequencing Control Software (SCS). I am not sure if those pipelines are compatible with the raw BCL files that we use today.
Any other suggestion or insight will be greatly appreciated.
The outcome is that many reads do not have proper i7+i5 barcoding
It sounds like low library diversity by itself is not the reason you are losing data. Ns in the index reads indicate a problem with quantitation and loading. It is likely that you are overloading your samples. The indexes should have no diversity issues (assuming you are using standard Illumina indexes). If you are referring to the data you lose to PhiX (since one needs to spike that in for low-diversity libraries), there is not much to be done there.
You have probably already conferred with Illumina tech support on this issue. If you have not, then I highly suggest that the relevant people in your lab do that. Illumina application scientists can help come up with a strategy if your lab is not doing this optimally.
Since the images are no longer saved, there is no way to redo any kind of base calling after the fact.
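Re-calling bases is out, but the FASTQ headers still record where each read sat on the flow cell, so you can at least check whether the reads with Ns in their indexes pile up on particular lanes or tiles. Below is a minimal Python sketch of that idea. It assumes Casava 1.8+ style headers (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`) with the index sequence in the last field; the function names are my own, not part of any Illumina tool, and older header styles are not handled.

```python
import gzip
from collections import Counter


def parse_header(header):
    """Split a Casava 1.8+ style FASTQ header into position and index fields.

    Assumed layout: @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
    """
    name, _, comment = header.strip().lstrip("@").partition(" ")
    fields = name.split(":")
    comment_fields = comment.split(":")
    # The index sequence (i7+i5 for dual indexing) is the 4th comment field.
    index = comment_fields[3] if len(comment_fields) > 3 else ""
    return {
        "lane": int(fields[3]),
        "tile": int(fields[4]),
        "x": int(fields[5]),
        "y": int(fields[6]),
        "index": index,
    }


def n_index_fraction_by_tile(headers):
    """Return {(lane, tile): fraction of reads whose index contains an N}."""
    total, with_n = Counter(), Counter()
    for header in headers:
        rec = parse_header(header)
        key = (rec["lane"], rec["tile"])
        total[key] += 1
        if "N" in rec["index"].upper():
            with_n[key] += 1
    return {key: with_n[key] / total[key] for key in total}


def fastq_headers(path):
    """Yield the header line of every record in a plain or gzipped FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # records are 4 lines; the header is first
                yield line
```

To use it, pass the headers of the undetermined reads to the tally, for example `n_index_fraction_by_tile(fastq_headers("undetermined.fastq.gz"))` (the path here is a placeholder). If the fraction is high everywhere, that fits a loading problem across the run; if it is concentrated on a few tiles, that is worth mentioning to tech support.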