I had a hardware failure earlier this week - this wouldn't have caught it, but it did make me think more about what I'd do if other systems had storage die.
Currently, there doesn't seem to be anything which collects SSD endurance information - I want to throw together a plugin to collect that information from smartctl.
Although I don't have an immediate need, it could later be extended to collect SSD endurance from racadm and other similar vendor-specific utils.
Activity
23-Jun-22 18:54
assigned to @btasker
23-Jun-22 19:47
The plugin seems to be working (commits should show up soon).
You can pull out the last read with the following Flux
23-Jun-22 19:59
mentioned in commit github-mirror/telegraf-plugins@7dd2a896f34f82aeb87c4bd37a2693d80d154f2d
Message
Add initial implementation of ssd endurance plugin for utilities/telegraf-plugins#7
Currently untested (and the README needs some work too)
24-Jun-22 07:25
It might be better to use a normalised output so we don't have to mess about parsing the newer human-readable output.
If we do
Then the output should be the same across all....
Nope
I guess the second output format is NVMe-specific then
24-Jun-22 07:54
Just found https://github.com/influxdata/telegraf/issues/8701, I'll pass some info on.
Could probably chuck a PR in, but I need to figure out how to deal with the NVMe output. It looks like we'd probably need to add a regex (I just don't see a better way around it).
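As a rough sketch of the regex approach (this is not the code from the plugin or the PR): for NVMe devices, smartctl's health output includes a `Percentage Used:` line, so a pattern along these lines could pull the value out. The function name and the sample text are made up for illustration.

```python
import re
from typing import Optional

# Matches the NVMe health line, e.g. "Percentage Used:                    3%"
PERCENTAGE_USED_RE = re.compile(r"^Percentage Used:\s+(\d+)%", re.MULTILINE)


def parse_nvme_percentage_used(smartctl_output: str) -> Optional[int]:
    """Return the NVMe 'Percentage Used' value, or None if the line is absent."""
    match = PERCENTAGE_USED_RE.search(smartctl_output)
    return int(match.group(1)) if match else None


# Hypothetical sample, trimmed to the kind of lines smartctl prints for an NVMe device
SAMPLE = """\
SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Available Spare:                    100%
Percentage Used:                    3%
"""

if __name__ == "__main__":
    print(parse_nvme_percentage_used(SAMPLE))
```

Returning `None` when the line is missing lets the caller fall through to whatever handles the attribute-table output from SATA devices, rather than treating a non-match as an error.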
26-Jun-22 13:38
PR at https://github.com/influxdata/telegraf/pull/11391