AI, ML, analytics and database applications often suffer from tail latency, where the slowest I/O transaction determines overall completion time. Left unmitigated, it can cause unacceptably slow response times and stall the completion of vital tasks. With projections that AI applications alone will grow 20x by 2025, tail latency will become increasingly problematic and could drag down performance across the enterprise. Recent SNIA and NVM Express standards define approaches to I/O determinism that are suitable for most application workloads, including predictable latency mode, NVM sets and local modes, but today’s new scale-out architectures demand faster options.
Excelero’s patent covers a mechanism for setting a timeout on an I/O operation at the moment it is invoked. It allows product engineers to identify a specific read transaction on an NVMe drive that is likely to have a hard time finishing quickly, and mark it to “fail fast.” Once the request is tagged, if the drive determines that it cannot fulfil the request in the required time, it notifies the requestor, which can then choose a different action without delay, such as reading the data from an alternative location or storing it elsewhere.
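The fail-fast behavior described above can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not Excelero’s implementation and not an NVMe API: the `SimulatedDrive` class, its `estimated_ms` field, the `FailFast` exception and the `read_with_fallback` helper are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


class FailFast(Exception):
    """Hypothetical signal: the drive cannot meet the requested deadline."""


@dataclass
class SimulatedDrive:
    name: str
    estimated_ms: float  # assumption: the drive can estimate its own service time
    data: bytes = b"payload"

    def read(self, deadline_ms: float) -> bytes:
        # Decline immediately instead of letting the request stall.
        if self.estimated_ms > deadline_ms:
            raise FailFast(f"{self.name} cannot finish within {deadline_ms} ms")
        return self.data


def read_with_fallback(replicas, deadline_ms):
    """Try each replica in order; return (drive name, data) from the first that accepts."""
    for drive in replicas:
        try:
            return drive.name, drive.read(deadline_ms)
        except FailFast:
            continue  # the requestor chooses an alternative location
    raise TimeoutError("no replica could meet the deadline")


busy = SimulatedDrive("drive-a", estimated_ms=12.0)
idle = SimulatedDrive("drive-b", estimated_ms=0.2)
source, payload = read_with_fallback([busy, idle], deadline_ms=1.0)
print(source)  # drive-b
```

In this sketch the busy drive rejects the tagged read at once, so the requestor moves on to the second copy rather than waiting for the slow drive to finish.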
A concept that evolved from the company’s extensive experience in web-scale shared NVMe deployments, Excelero’s “fail fast” tag option can help large private cloud operators build scale-out networks that even out latency enterprise-wide, even as highly demanding, ultra-low latency workloads proliferate.
“The industry needs a better answer to the tail latency issue, where entire workflows can get hung up by the slowest element’s completion time,” said Yaniv Romem, CTO and co-founder of Excelero. “With this and the 11 other patents we have pending, Excelero is advancing our ability to keep latency low across the board, and help customers squeeze more from their storage architectures – and budgets.”