Full-Time
Posted on 6/27/2024
Advanced clock synchronization for distributed systems
No salary listed
Palo Alto, CA, USA
In Person
Clockwork Systems provides clock synchronization technology for mission-critical distributed systems, ensuring precise timing across operations. Its solutions run in both cloud and on-premises environments and are delivered through software licensing, subscriptions, and professional services. The company differentiates itself with deep timing expertise and end-to-end timing across networks, and it offers tools such as Latency Sensei for cloud latency monitoring. Its goal is to help customers achieve reliable, accurate synchronization that improves performance and reduces timing-related issues in time-sensitive applications.
Company Size
1-10
Company Stage
Early VC
Total Funding
$41.6M
Headquarters
Palo Alto, California
Founded
2018
Competitive Salary
Clockwork.io has launched TorchPass Workload Fault Tolerance, a software solution that eliminates costly GPU training failures through Live GPU Migration technology. The system allows AI training workloads to continue running through hardware failures, network disruptions and node crashes without requiring checkpoint restarts. The company claims TorchPass can save over $6 million annually in a typical 2,048-GPU deployment by reducing wasted training progress by 95%. In large clusters, it cuts lost time from approximately three hours per day to under ten minutes. Independent testing by SemiAnalysis found TorchPass delivered faster fault-tolerant performance than standard checkpoint-restart approaches and higher Model FLOPs Utilisation than leading open-source alternatives. The solution typically completes recovery in approximately three minutes whilst training continues uninterrupted. TorchPass is now available as part of Clockwork.io's FleetIQ platform.
Clockwork Raises $20.57 Million in Funding
Stanford spinout Clockwork has raised $20.6 million, led by NEA with participation from notable investors, to address AI's GPU inefficiency. The funding coincides with the launch of FleetIQ, a software solution aimed at enhancing GPU performance by improving communication between GPUs, clusters, and clouds. This innovation seeks to reduce crashes, shorten restarts, and increase utilization rates, making AI infrastructure more efficient and sustainable.