FASHN - One faulty GPU worker failed to process requests – Incident details

One faulty GPU worker failed to process requests

Resolved
Operational
Started about 1 month ago
Lasted about 8 hours

Affected

API

Partial outage from 5:01 PM to 1:10 AM

Web App

Partial outage from 5:01 PM to 1:10 AM

Updates
  • Resolved
    • One GPU worker entered a bad state; it was restarted and returned to normal operation.

    • A mechanism to detect and auto-restart such states was developed and deployed.

  • Investigating

    Some requests to the nightly endpoint (experimental in the app) are returning errors.
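
The Resolved update mentions a mechanism that detects and auto-restarts workers in a bad state, but it does not say how that mechanism works. As a rough illustration only, one common approach is a watchdog that restarts a worker after several consecutive failed health checks. Everything in this sketch is an assumption, not FASHN's actual implementation: the class name `WorkerWatchdog`, the `restart` hook, the worker ID `gpu-worker-1`, and the threshold of 3.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WorkerWatchdog:
    """Hypothetical watchdog: restart a worker after repeated failed health checks."""

    restart: Callable[[str], None]  # assumed hook that restarts a worker by ID
    max_failures: int = 3           # assumed threshold, not taken from the incident report
    _failures: Dict[str, int] = field(default_factory=dict)

    def record(self, worker_id: str, healthy: bool) -> bool:
        """Record one health-check result; return True if a restart was triggered."""
        if healthy:
            # A healthy check clears the consecutive-failure streak.
            self._failures[worker_id] = 0
            return False
        count = self._failures.get(worker_id, 0) + 1
        if count >= self.max_failures:
            # Threshold reached: restart the worker and start counting afresh.
            self.restart(worker_id)
            self._failures[worker_id] = 0
            return True
        self._failures[worker_id] = count
        return False


# Usage: simulate one healthy check, three failures, then recovery.
restarted: List[str] = []
watchdog = WorkerWatchdog(restart=restarted.append, max_failures=3)
for ok in [True, False, False, False, True]:
    watchdog.record("gpu-worker-1", ok)
print(restarted)
```

Requiring consecutive failures, rather than restarting on the first error, keeps a single transient failure from triggering an unnecessary restart. The right threshold would depend on how often health checks run.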