Message boards : Wish list : Using a GPU ID to determine faulty cards

Nognlite
Message 13268 - Posted: 24 Oct 2009 | 14:09:56 UTC

Is there a way to identify which GPU processed a particular WU? This would be a good feature to add, as it would let volunteers with multi-GPU systems identify GPUs that produce errors.

This in turn would let them keep an eye on those GPUs for failure and perhaps stop them from processing GPUGrid WUs, switching them to other, less intensive projects.

It was always a pain to get 50-90% through a WU only to have it error out because of a card failure. I've replaced four GTX 280s. Although they all failed, they did not fail at the same time, and it was difficult to work out which card was faulty at any given moment, especially when a WU had to run 50-90% of the way before producing an error.

MarkJ
Message 13276 - Posted: 25 Oct 2009 | 9:37:44 UTC - in response to Message 13268.
Last modified: 25 Oct 2009 | 9:41:19 UTC


If you look at your results, that information is in there. The output lists the cards found and then says which one it's using. It also lists the clock speed, as one of the common problems is downclocking on some cards.

Here is one of mine:
<core_client_version>6.10.13</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 275"
# Clock rate: 1.40 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"

</stderr_txt>
]]>
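
If you have a lot of results to go through, you could pull those lines out with a small script. The following is only a rough sketch in Python, assuming the stderr text always uses the same "# Using CUDA device N", "# Device N:" and "# Clock rate:" lines as the result above. The function names (parse_gpu_info, tally_failures) are made up for illustration and are not part of BOINC or GPUGrid, and you would have to copy the stderr text out of your results yourself.

```python
import re
from collections import Counter

# Sample stderr text, copied from the result shown above.
SAMPLE_STDERR = '''# Using CUDA device 0
# There is 1 device supporting CUDA
# Device 0: "GeForce GTX 275"
# Clock rate: 1.40 GHz
# Total amount of global memory: 939196416 bytes
# Number of multiprocessors: 30
# Number of cores: 240
MDIO ERROR: cannot open file "restart.coor"
'''


def parse_gpu_info(stderr_text):
    """Return (used_device_index, {index: {"name", "clock_ghz"}}).

    Hypothetical helper: it assumes the line layout seen in the
    sample above and ignores anything it does not recognise.
    """
    used = None
    devices = {}
    current = None
    for raw in stderr_text.splitlines():
        line = raw.strip()
        m = re.match(r'#\s*Using CUDA device (\d+)', line)
        if m:
            used = int(m.group(1))
            continue
        m = re.match(r'#\s*Device (\d+): "(.*)"', line)
        if m:
            current = int(m.group(1))
            devices[current] = {"name": m.group(2), "clock_ghz": None}
            continue
        m = re.match(r'#\s*Clock rate: ([\d.]+) GHz', line)
        if m and current is not None:
            # Attach the clock rate to the most recently listed device.
            devices[current]["clock_ghz"] = float(m.group(1))
    return used, devices


def tally_failures(results):
    """Count failed results per (device index, device name).

    `results` is an iterable of (stderr_text, failed) pairs that you
    collect by hand; this format is an assumption of the sketch.
    """
    counts = Counter()
    for stderr_text, failed in results:
        used, devices = parse_gpu_info(stderr_text)
        if failed and used is not None:
            name = devices.get(used, {}).get("name", "unknown")
            counts[(used, name)] += 1
    return counts


if __name__ == "__main__":
    used, devices = parse_gpu_info(SAMPLE_STDERR)
    print("Device used:", used, devices.get(used))
    print(tally_failures([(SAMPLE_STDERR, True)]))
```

On a multi-GPU host, a card that keeps showing up in the failure tally, or whose reported clock rate is lower than expected, would be the first one to check.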

