blakegonzales.com
Job Scheduler Failure Resiliency
Adding resiliency to your job scheduler can make a real difference in the overall reliability of your cluster. With shared memory systems, a single hardware failure can bring your entire system do…