
Message boards : News : Old Noelia WUs

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Message 29045 - Posted: 7 Mar 2013 | 13:59:37 UTC

We have checked the error statistics and they are too high to be normal, so we are going to abort them.

They work perfectly over here, so it's not clear what the problem is. We might need to run a few in beta to try to understand it.

gdf

Profile nate
Message 29046 - Posted: 7 Mar 2013 | 14:21:55 UTC

I have put up a whole bunch of simulations on the long queue to replace the cancelled ones. It is a system I have been meaning to run more simulations of, to get more statistics. They should pose no problems, but I'll be keeping an eye on them. As always, please let us know if that turns out not to be the case.

GPUGRID
Message 29048 - Posted: 7 Mar 2013 | 14:52:44 UTC

Thank you guys for the very fast intervention. I have 13 of the new NATHAN units processed halfway so far, and they are all running just perfectly. I will let you know in exactly two hours whether they all finish successfully here.

Profile algabe
Message 29049 - Posted: 7 Mar 2013 | 16:04:28 UTC - in response to Message 29048.
Last modified: 7 Mar 2013 | 16:21:08 UTC

I am very disappointed with this project lately. Yesterday, two NOELIA units with 8 hours of processing each failed with execution errors; today, another two NOELIA units with 10 hours of processing each had to be aborted by the user. This is unacceptable and not very professional.
Now I am processing two NATHAN units. If these errors persist with them, then, much as I regret it, I will definitely stop processing for this project. With the crisis here I am paying a lot of money in electricity bills, and that effort is wasted.


Greetings.
____________

Profile microchip
Message 29050 - Posted: 7 Mar 2013 | 16:09:24 UTC - in response to Message 29049.

I am very disappointed with this project lately. Yesterday, two NOELIA units with 8 hours of processing each failed with execution errors; today, another two NOELIA units with 10 hours of processing each had to be aborted by the user. This is unacceptable and not very professional.
Now I am processing two NATHAN units. If these errors persist with them, then, much as I regret it, I will definitely stop processing for this project. With the crisis here I am paying a lot of money in electricity bills, and that effort is wasted.


Greetings.


I agree with you. Lots of WU problems recently, even on the short queue. I don't know if I'll continue to support this project if these problems keep going.
____________

Team Belgium

GPUGRID
Message 29052 - Posted: 7 Mar 2013 | 17:05:35 UTC

The new NATHAN units are processing just fine. A small 23.88 MB result, 70,800 credits; very good ones.

Richard Haselgrove
Message 29053 - Posted: 7 Mar 2013 | 17:24:34 UTC

Yes, I've just reported my first completed one - task 6588199 - on the same host, same settings, same session (no reboot) as the one which failed a Noelia this morning.

GPUGRID
Message 29054 - Posted: 7 Mar 2013 | 17:35:25 UTC

All my first 13 units were processed without any issue. The second batch is processing. I'd say we are back in business; I'm glad.

Profile Beyond
Message 29061 - Posted: 7 Mar 2013 | 21:43:36 UTC - in response to Message 29046.
Last modified: 7 Mar 2013 | 21:44:06 UTC

I have put up a whole bunch of simulations on the long queue to replace the cancelled ones. It is a system I have been meaning to run more simulations of, to get more statistics. They should pose no problems, but I'll be keeping an eye on them. As always, please let us know if that turns out not to be the case.

Thanks Nate. I want to mention that these NATHAN WUs are running great on my four GTX 460 768 MB cards too, so I think you've done some honing. Thanks again, appreciate it.

mhhall
Message 29065 - Posted: 7 Mar 2013 | 22:12:20 UTC

I've been seeing multiple cases where new WUs are not checkpointing
(or showing progress in BOINC Manager). I have aborted a couple
without improvement. Running on client v7.0.27 under Linux x86.

GPUGRID
Message 29066 - Posted: 7 Mar 2013 | 22:21:56 UTC

Yummy, there are some TONYs on the pipe aswell... tasty wu´s, gimme gimme....crunch them all and gladly pay the energy bills when all works like a charm....

Profile Bikermatt
Message 29070 - Posted: 8 Mar 2013 | 2:33:50 UTC - in response to Message 29065.

I've been seeing multiple cases where new WUs are not checkpointing
(or showing progress in BOINC Manager). I have aborted a couple
without improvement. Running on client v7.0.27 under Linux x86.


Yes, I am still having problems on Linux also. I thought it was just the NOELIAs, but the new NATHANs are doing it too. The tasks will lock up or remain at 0%, and the system has to be rebooted for the GPU to work again on any project. It might be the new app; my Linux systems had not needed a reboot in months before this.

Profile Mumak
Message 29072 - Posted: 8 Mar 2013 | 7:25:17 UTC

I switched to short ones only and recently got a short NOELIA task which after 24 hours was stuck at 0%. Too bad I was away from the machine and realized it too late.
So the short ones are problematic as well.

TJ
Message 29077 - Posted: 8 Mar 2013 | 16:27:43 UTC

The short Noelia's run fine on my 550Ti. They take a little longer than the previous 4.2 ones (100-200 sec. more), use a little more CPU, and grant less credit: 8,700 now versus 10,500 previously.
____________
Greetings from TJ

Profile microchip
Message 29079 - Posted: 8 Mar 2013 | 17:54:35 UTC - in response to Message 29077.
Last modified: 8 Mar 2013 | 17:55:13 UTC

The short Noelia's run fine on my 550Ti. They take a little longer than the previous 4.2 ones (100-200 sec. more), use a little more CPU, and grant less credit: 8,700 now versus 10,500 previously.


Yup, crunch "fine" on my GTX 560. I noticed though that they often crash the NV driver and they also show CUDA errors when looking at task details, but they complete fine here and I get valid results
____________

Team Belgium

Profile Mumak
Message 29080 - Posted: 8 Mar 2013 | 18:40:01 UTC
Last modified: 8 Mar 2013 | 18:42:29 UTC

They did run well for few days, but this one:
http://www.gpugrid.net/result.php?resultid=6583089
was stuck, so I tried to relaunch it and after that I had to abort it.
Other computers returned error with this WU too.

wiyosaya
Message 29088 - Posted: 9 Mar 2013 | 3:46:23 UTC - in response to Message 29079.
Last modified: 9 Mar 2013 | 3:48:36 UTC

The short Noelia's run fine on my 550Ti. They take a little longer than the previous 4.2 ones (100-200 sec. more), use a little more CPU, and grant less credit: 8,700 now versus 10,500 previously.


Yup, crunch "fine" on my GTX 560. I noticed though that they often crash the NV driver and they also show CUDA errors when looking at task details, but they complete fine here and I get valid results

Long queue Noelia's run fine on my GTX 460 and my GTX 580 both running 310.70 driver.
____________

Profile microchip
Message 29094 - Posted: 9 Mar 2013 | 13:52:28 UTC - in response to Message 29088.
Last modified: 9 Mar 2013 | 13:52:54 UTC

The short Noelia's run fine on my 550Ti. They take a little longer than the previous 4.2 ones (100-200 sec. more), use a little more CPU, and grant less credit: 8,700 now versus 10,500 previously.


Yup, crunch "fine" on my GTX 560. I noticed though that they often crash the NV driver and they also show CUDA errors when looking at task details, but they complete fine here and I get valid results

Long queue Noelia's run fine on my GTX 460 and my GTX 580 both running 310.70 driver.


Well, I've disabled long ones for the time being, as the last 2 long WUs I crunched errored out, so I'm crunching only short ones at the moment. I mostly get short NOELIAs and, so far, so good. I'm able to report valid results.
____________

Team Belgium

John C MacAlister
Message 29096 - Posted: 9 Mar 2013 | 15:06:24 UTC
Last modified: 9 Mar 2013 | 15:07:49 UTC

All short NOELIA runs for me too, with my two GTX 650 Ti GPUs: no problems.

Bedrich Hajek
Message 29100 - Posted: 9 Mar 2013 | 21:45:02 UTC - in response to Message 29046.
Last modified: 9 Mar 2013 | 21:45:38 UTC

I have put up a whole bunch of simulations on the long queue to replace the cancelled ones. It is a system I have been meaning to run more simulations of, to get more statistics. They should pose no problems, but I'll be keeping an eye on them. As always, please let us know if that turns out not to be the case.



These ran smoothly until yesterday evening when one failed:

http://www.gpugrid.net/result.php?resultid=6595932

It was the same thing that was happening with the last bunch of TONI units.

Today, I had an adventure with this unit:

http://www.gpugrid.net/result.php?resultid=6600488

It finished successfully, but barely. When it was about 25% done, I got an error message saying that acemd.2865.exe had failed, and the unit wasn't crunching, so I suspended it before it produced a computation error in BOINC Manager. The video card's speed and settings had been reset (to a slower speed), so I rebooted the computer, resumed the unit, and it continued to crunch. At 90%+ completion the computer froze, so I had to unplug it and restart. It finished successfully!
But the subsequent unit refused to start crunching, and the video card speed and settings were again reset to a slower speed. I had to suspend that unit and reboot. It is running okay right now, and hopefully it won't crash.

GPUGRID
Message 29108 - Posted: 10 Mar 2013 | 23:37:34 UTC

True. Not 100%, but doable.

GPUGRID
Message 29109 - Posted: 10 Mar 2013 | 23:38:23 UTC
Last modified: 10 Mar 2013 | 23:38:41 UTC

If I babysit the machines, I mean... I will be traveling in two days, and then the worst is expected.

Dylan
Message 29110 - Posted: 11 Mar 2013 | 0:29:01 UTC - in response to Message 29109.
Last modified: 11 Mar 2013 | 0:29:08 UTC

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

GPUGRID
Message 29111 - Posted: 11 Mar 2013 | 1:06:02 UTC - in response to Message 29110.

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

Exactly what I do on my tablet. Problem is, when the big rig starts to reboot, I can't access it. Hope it won't happen.

Profile Mumak
Message 29113 - Posted: 11 Mar 2013 | 8:43:15 UTC

Now I'm getting more problems even with the short Noelia tasks. They were stuck, caused errors, or crashed the app. A reboot was needed to start a new GPU task.
I have ordered a new GPU for GPUGrid, but I think I'll suspend this whole project (and switch to another one) until these problems are solved.

STE\/E
Message 29114 - Posted: 11 Mar 2013 | 9:08:08 UTC

Same here. I've got 5 boxes running the shorter ones, and I think all 5 have hung WUs right now, one at 37 hours...
____________
STE\/E

John C MacAlister
Message 29115 - Posted: 11 Mar 2013 | 10:43:09 UTC
Last modified: 11 Mar 2013 | 10:47:18 UTC

No problems with short NOELIA tasks. I have not attempted any long NOELIAs for about a week.

PC #1 AMD 1090T with Acer GTX 650 Ti
PC #2 AMD A10 5800K with Acer GTX 650 Ti
____________
John

Ken_g6
Message 29116 - Posted: 11 Mar 2013 | 16:55:18 UTC

Short Noelias were going fine, until I had to abort this one, which was restarting repeatedly with error:

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.

Bedrich Hajek
Message 29122 - Posted: 12 Mar 2013 | 11:24:18 UTC
Last modified: 12 Mar 2013 | 11:34:37 UTC

226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one!


After running flawlessly, I got a few units with this error, on the latest set of betas.

http://www.gpugrid.net/result.php?resultid=6611952

http://www.gpugrid.net/result.php?resultid=6610530

http://www.gpugrid.net/result.php?resultid=6610707

Profile nate
Message 29123 - Posted: 12 Mar 2013 | 11:47:08 UTC

SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.


It looks like most of the major errors are gone (severe error % is good), but this one does seem to be occurring more frequently than we would like. We'll see if we can find a cause.

cciechad
Message 29124 - Posted: 12 Mar 2013 | 12:28:12 UTC - in response to Message 29123.

dmesg output from the beta WUs:

[400033.132826] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400049.637834] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400054.854423] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400066.358868] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400082.863901] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400099.368878] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400115.873938] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400119.305177] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400133.382624] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400136.664677] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400149.890962] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400166.399277] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400182.904290] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400198.412211] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400215.917612] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400220.224939] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400244.929342] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400260.437256] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400276.942267] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400293.450605] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400308.955195] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400325.463524] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400341.968561] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400358.476864] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400369.667884] NVRM: Xid (0000:01:00): 13, 0001 00000000 000090c0 00001b0c 00000000 00000000
[400382.174156] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400397.678751] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400414.183758] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400430.692078] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400446.196682] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400461.704604] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400464.387651] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400484.212040] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400500.218499] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400516.723568] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400533.231872] NVRM: Xid (0000:01:00): 8, Channel 00000001
[400535.747891] NVRM: Xid (0000:01:00): 31, Ch 00000001, engmask 00000101, intr 10000000
[400555.739274] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401174.487665] NVRM: Xid (0000:01:00): 8, Channel 00000001
[401189.992293] NVRM: Xid (0000:01:00): 8, Channel 00000001

I suspect I will have to reboot to recover from these.

Bedrich Hajek
Message 29125 - Posted: 12 Mar 2013 | 12:30:39 UTC - in response to Message 29122.

226 (0xffffffffffffff1e) ERR_TOO_MANY_EXITS error on the latest beta units. This is a new one!


After running flawlessly, I got a few units with this error, on the latest set of betas.

http://www.gpugrid.net/result.php?resultid=6611952

http://www.gpugrid.net/result.php?resultid=6610530

http://www.gpugrid.net/result.php?resultid=6610707


Is it my imagination, or did you change the error message for these units?


cciechad
Message 29126 - Posted: 12 Mar 2013 | 12:38:32 UTC - in response to Message 29124.

Verified that the beta WUs hang the GPU in some manner. rmmod-ing and modprobe-ing the nvidia module does not resolve it. The system must be rebooted to recover from whatever the WU is causing. On NVIDIA 313.26.

Jacob Klein
Message 29127 - Posted: 12 Mar 2013 | 13:28:05 UTC
Last modified: 12 Mar 2013 | 13:38:17 UTC

I wanted to chime in to say I just had 12 NOELIA tasks fail hard on the "ACEMD beta version v6.49 (cuda42)" app, using Windows 8 Pro x64, BOINC v7.0.55 x64 beta, nVidia 314.14 beta drivers, GTX 660 Ti (which usually works on GPUGRID) and GTX 460 (which usually works on World Community Grid)

The tasks resulted in "Driver stopped responding" errors, and Windows restarted the drivers to recover. But the failures also appear to have caused other GPUs (which were working on entirely different projects, like World Community Grid)... to also fail.

I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app,
Jacob

================================================
PS: The 12 that failed were:

063ppx43-NOELIA_063pp_equ-1-2-RND4865_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

148px44-NOELIA_148p_equ-1-2-RND1140_2
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

216px20-NOELIA_216p_equ-1-2-RND7557_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

041px45-NOELIA_041p_equ-1-2-RND6478_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

041px33-NOELIA_041p_equ-1-2-RND8614_2
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

255px9-NOELIA_255p_equ-1-2-RND6395_1
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

063ppx29-NOELIA_063pp_equ-1-2-RND2517_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

148nx39-NOELIA_148n_equ-1-2-RND5760_1
SWAN : FATAL : Cuda driver error 1 in file 'swanlibnv2.cpp' in line 1330.
Assertion failed: a, file swanlibnv2.cpp, line 59

063ppx16-NOELIA_063pp_equ-1-2-RND8732_1
The system cannot find the path specified.
(0x3) - exit code 3 (0x3)

063ppx18-NOELIA_063pp_equ-1-2-RND6787_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

109nx31-NOELIA_109n_equ-1-2-RND1501_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

148nx37-NOELIA_148n_equ-1-2-RND2228_0
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

ETQuestor
Message 29128 - Posted: 12 Mar 2013 | 14:53:25 UTC

These NOELIA acemdbeta WUs are all hanging for me. They get stuck at a "Current CPU Time" of between 1 and 5 seconds. I had to abort them.


http://www.gpugrid.net/result.php?resultid=6610160
http://www.gpugrid.net/result.php?resultid=6610894

http://www.gpugrid.net/show_host_detail.php?hostid=43352

TJ
Message 29135 - Posted: 12 Mar 2013 | 23:28:41 UTC
Last modified: 12 Mar 2013 | 23:34:32 UTC

On my system (Vista 32-bit, BOINC 6.10.58, nVidia 314.7), the latest Noelia beta errored out after more than 11 hours. It is this one:
http://www.gpugrid.net/workunit.php?wuid=4248935
____________
Greetings from TJ

flashawk
Message 29137 - Posted: 13 Mar 2013 | 0:45:34 UTC - in response to Message 29127.


I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app


They would need 10 to 15 computers (dual-booting or virtual PCs) with every operating system on them, plus all the different versions of BOINC everyone's running, not to mention the different video cards. They'll never be able to please everyone. I always suspend other jobs or clear them out if I know I'm going to beta test, but that's just me, not 20/20 hindsight. What I'm trying to say is that even if they did do some limited testing, who's to say which OS they would choose? It certainly wouldn't be Windows 8; it's turning out to be a flop and a real disappointment for Microsoft and their vendors. I don't want to sound too harsh (if I do, I apologize), but that's what beta testing is all about, right?

TJ
Message 29138 - Posted: 13 Mar 2013 | 10:34:33 UTC - in response to Message 29137.


I know this is the beta app, but...
Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.

ie: Many unnecessary communications, errors with unrelated projects, time spent reporting avoidable bugs, etc.

Looking for more stability, even in the beta app


They would need 10 to 15 computers (dual-booting or virtual PCs) with every operating system on them, plus all the different versions of BOINC everyone's running, not to mention the different video cards. They'll never be able to please everyone. I always suspend other jobs or clear them out if I know I'm going to beta test, but that's just me, not 20/20 hindsight. What I'm trying to say is that even if they did do some limited testing, who's to say which OS they would choose? It certainly wouldn't be Windows 8; it's turning out to be a flop and a real disappointment for Microsoft and their vendors. I don't want to sound too harsh (if I do, I apologize), but that's what beta testing is all about, right?


I agree with you flashawk. We crunchers need to do the testing with all the different set-ups and platforms. Win8 is a pain indeed.

____________
Greetings from TJ

Profile nate
Message 29139 - Posted: 13 Mar 2013 | 10:49:48 UTC

Devs, do you run some of these tasks before issuing them to us? If not, you should, because when the bugged tasks get to us, the failures waste many more resources than they would if you tested them locally first.


We do test them locally, to the extent we can. Part of the issue is that running locally for us vs. running on BOINC are not comparable. We do have an in-house fake BOINC project, but even that isn't exactly comparable to sending to you users. Additionally, we have very limited ability to test on Windows. In the future we will improve there, but we have limited resources right now.

What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it.

Fred Bayliss
Message 29140 - Posted: 13 Mar 2013 | 10:57:02 UTC - in response to Message 29139.

I'm running these on Win7 with a GTX 670 and often get a Windows message that the Nvidia driver stopped working.
Hope this helps.

TJ
Message 29141 - Posted: 13 Mar 2013 | 11:01:13 UTC

The previous bunch of Noelia betas did well on my WinVista 32-bit PC with driver 314.7 and BOINC 6.10.58. The batch from the last few days errors out after hours, with the message that the acemd driver stopped and has recovered from an unexpected error. I am now trying the long runs from Nathan on my GTX 550 Ti.
____________
Greetings from TJ

Oktan
Message 29142 - Posted: 13 Mar 2013 | 11:18:51 UTC - in response to Message 29139.

Hi there, I'm having problems on my Linux box; I haven't been able to run any work at all for 3-4 days.

Mvh/ Oktan

Jacob Klein
Message 29143 - Posted: 13 Mar 2013 | 11:22:00 UTC - in response to Message 29139.
Last modified: 13 Mar 2013 | 11:31:32 UTC

Thanks for the reply, Nate. I'm glad to hear that you guys are looking to improve the testability for Windows, even before issuing tasks on the Beta application to us Beta users.

Regarding your request for info, my previously mentioned NOELIA task failures are happening on Windows 8 Pro x64, using BOINC v7.0.55 x64 beta, running nVidia drivers 314.14 beta, using 2 video cards, GTX 660 Ti and GTX 460.

It appears to me that, when a GPUGrid task causes the nVidia driver to stop responding, Windows catches the error and restarts the driver (instead of BSOD), giving a Taskbar balloon to the effect of "The nVidia driver had a problem and has been restarted successfully." (I'm not sure of the exact text). When this happens, in addition to the GPUGrid task erroring out on my main video card, crunching on my other GPU (which is usually doing World Community Grid Help Conquer Cancer work) also results in its tasks erroring out.

I believe the next tasks that get processed after that driver recovery, are successful, unless another NOELIA task on the beta app causes an additional driver crash and recovery.

If you have any more resources to test these tasks out more, locally, it would save us a huge headache. I understand I signed up for these beta tasks, and I understand that seeing these errors is part of the gig, and so... If you find a way to replicate the error locally, then I'd politely ask that you also remove the bugged tasks from the beta queue. If you cannot yet reproduce the problem locally, then we'll keep erroring them for you, as part of our obligation.

Not sure if this much info helps, but that's the behavior I'm seeing on my Windows 8 x64 PC, and if you need anything more, feel free to ask.

Kind regards,
Jacob Klein

Richard Haselgrove
Message 29144 - Posted: 13 Mar 2013 | 11:24:51 UTC - in response to Message 29139.

What we are thinking is that this might be related to the Windows application. Has anyone who experiences these problems seen them on a linux box? Is it only Windows? The more we know, the more quickly we can improve. The last thing we want is to crash your machines. A failed WU is one thing. Locking up cruncher machines is much, much worse. Please let us know so we can fix it.

I've just aborted one of your long run tasks which looked as if it was going bad - http://www.gpugrid.net/workunit.php?wuid=4246107 (replication _6 is always a bad sign).

The first cruncher to try it was running Linux.

Killer 69
Message 29146 - Posted: 13 Mar 2013 | 15:50:08 UTC

All NOELIA tasks at the moment freeze my Linux box pretty much totally, to the point that I have to restart the computer. What's worse, I did de-select beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back to the reboot cycle.

Richard Haselgrove
Message 29147 - Posted: 13 Mar 2013 | 16:03:24 UTC - in response to Message 29146.
Last modified: 13 Mar 2013 | 16:04:34 UTC

All NOELIA tasks at the moment freeze my Linux box pretty much totally, to the point that I have to restart the computer. What's worse, I did de-select beta tasks, but after the reboot BOINC downloads more of those tasks from the ACEMDBETA queue and I'm back to the reboot cycle.

Deselect

Run test applications?
This helps us develop applications, but may cause jobs to fail on your computer

as well.
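
For crunchers who want a client-side guard in addition to that web preference, something along these lines in cc_config.xml should keep the GPUs away from the beta application. This is only a minimal sketch; the "acemdbeta" short name is assumed from how the beta queue is referred to in this thread, so check your client_state.xml for the exact application name:

<cc_config>
  <options>
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <app>acemdbeta</app>   <!-- assumed short name of the "ACEMD beta version" app -->
    </exclude_gpu>
  </options>
</cc_config>

After saving it in the BOINC data directory, restart the client (or tell it to re-read the config files) for the exclusion to take effect; note that <exclude_gpu> needs a 7.x client.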

Killer 69
Message 29150 - Posted: 13 Mar 2013 | 17:20:32 UTC - in response to Message 29147.

OK, I still had test applications selected; after deselecting that and resetting the project I have now got a NATHAN long-run task, which is also pretty odd, because I have only short runs enabled at the moment.

Richard Haselgrove
Message 29151 - Posted: 13 Mar 2013 | 17:29:49 UTC - in response to Message 29150.

OK, I still had test applications selected; after deselecting that and resetting the project I have now got a NATHAN long-run task, which is also pretty odd, because I have only short runs enabled at the moment.

There aren't any short run tasks available today. Might you have had

If no work for selected applications is available, accept work from other applications?

selected as well?

Profile skgiven
Volunteer moderator
Volunteer tester
Message 29154 - Posted: 13 Mar 2013 | 18:14:17 UTC - in response to Message 29151.

109nx33-NOELIA_109n_equ-1-2-RND6949_0 4248581 139265 12 Mar 2013 | 8:04:33 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 58,742.39 1.73 --- ACEMD beta version v6.49 (cuda42)

This long WU hung after 16 h on a W7 system with a GTX 660 Ti. The GPU sat at zero usage and the app stayed running (crashed), preventing new work units from starting or a backup GPU project from running. It also prevented an additional CPU core from being used by a CPU project. I saw the usual CUDA driver pop-up error.

Stderr output

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
MDIO: cannot open file "output.restart.coor"
SWAN : FATAL : Cuda driver error 999 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

The next two WU's also failed:
148px38-NOELIA_148p_equ-1-2-RND3814_6 4249317 139265 13 Mar 2013 | 17:29:07 UTC 13 Mar 2013 | 17:32:46 UTC Error while computing 31.09 1.76 --- ACEMD beta version v6.49 (cuda42)
216px36-NOELIA_216p_equ-1-2-RND0721_0 4249016 139265 12 Mar 2013 | 9:48:07 UTC 13 Mar 2013 | 14:48:02 UTC Error while computing 12.60 1.81 --- ACEMD beta version v6.49 (cuda42)

I don't see the point in testing a WU 7 or more times, especially if it's one of a batch of hundreds.

Again, I suggest you start up an alpha project to test on properly - Beta testing shouldn't crash systems, hang drivers or banjax the OS!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Martin Aliger
Message 29156 - Posted: 13 Mar 2013 | 20:07:36 UTC
Last modified: 13 Mar 2013 | 20:14:10 UTC

All of these betas failed on my machine. Moreover, I opted out of beta and updated, but I am still receiving them (and only them).

I also observed that my W7 always restarts the driver a few seconds after the acemd application is killed.

And all those WUs failed immediately. If you see some run times in the statistics, that's because there is a crash message box on screen which counts toward the running time. Sometimes it's on screen for hours...

Dylan
Message 29158 - Posted: 13 Mar 2013 | 20:18:33 UTC - in response to Message 29156.

Martin, did you follow this thread on how to completely opt out of beta tasks?


http://www.gpugrid.net/forum_thread.php?id=3272

Jim1348
Message 29159 - Posted: 13 Mar 2013 | 20:20:05 UTC
Last modified: 13 Mar 2013 | 20:22:23 UTC

Seven in a row of the 6.49 ACEMD beta NOELIAs failed for me also, all in 8 seconds or less, so I am giving it a rest for now. That was on a Kepler GTX 650 Ti card, and I will try a Fermi GTX 560 tomorrow to see if that does any better. This is on Win7 64-bit, and BOINC 7.0.56 x64. Those cards have been basically error free for the last several days, since the last Noelia errors.

Tsukiouji
Message 29164 - Posted: 14 Mar 2013 | 11:08:10 UTC

A few NOELIA WUs failed recently on my system too.
I'm running GTS450 (314.14, Win7 x64).

Jacob Klein
Message 29165 - Posted: 14 Mar 2013 | 12:39:22 UTC - in response to Message 29164.

Tsukiouji,
When I clicked your link, I got a page that says "No access". For your account settings, in GPUGRID preferences, do you have "Should GPUGRID show your computers on its web site?" set to yes?

Profile nenym
Message 29166 - Posted: 14 Mar 2013 | 14:49:35 UTC - in response to Message 29165.

The problem is the link. A filter like "http://www.gpugrid.net/results.php?userid=94436" can be set, but those results can be seen by the owner only. There is no problem with "host" filters, e.g. http://www.gpugrid.net/results.php?hostid=144019.

Jacob Klein
Message 29170 - Posted: 14 Mar 2013 | 15:54:42 UTC - in response to Message 29166.

Thanks for the explanation -- I was able to find the user's tasks by clicking on their name, and looking at the tasks for the only computer. Link: http://www.gpugrid.net/results.php?hostid=144019

Anyway... The way I look at this issue is...

The project admins have already made a decision whether they want the beta testers to suffer through and to "process all these failures".

So, what I do is look at the server status page here http://www.gpugrid.net/server_status.php ... and just keep praying that the "Unsent" task count for the "ACEMD beta version" app goes down quickly.

Good news - it's pretty much exhausted - Now maybe my system will be stable again!

Martin Aliger
Message 29174 - Posted: 15 Mar 2013 | 4:14:14 UTC - in response to Message 29158.

Martin, did you follow this thread on how to completely opt out of beta tasks?


http://www.gpugrid.net/forum_thread.php?id=3272


No, but I'm in all other queues, so there is (plenty of) other work.

But the problem is solved now. Admins cancelled existing beta tasks and no others are waiting. I'll opt in to beta again to help test on Win platform.

Profile AdamYusko
Message 29185 - Posted: 16 Mar 2013 | 23:53:19 UTC

Now, I am not sure whether this is an error with my machine (it has been offline for a few weeks), or whether it is due to a bug in the Noelia tasks it got earlier today.

Both of the errors it sent out had issues with the file "restart.coor".

One had this output:

<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1841.
MDIO: cannot open file "restart.coor"


The other had a much shorter but similar output:

<message>
process exited with code 255 (0xff, -1)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"

____________

GPUGRID
Message 29189 - Posted: 17 Mar 2013 | 12:36:08 UTC
Last modified: 17 Mar 2013 | 12:36:36 UTC

I'm still having the BSOD/reboot thing on my triple-690 rig every two days, even with the NATHAN long units. Only aborting the full cache and getting clean units will solve it, but then in two days another one will come.

idimitro
Message 29281 - Posted: 29 Mar 2013 | 13:01:30 UTC

After working for a few days, the Nathan packages are now also crashing the application and my driver.
I like the cause of this project, but I simply cannot allow it to crash my computer and interrupt my work.
Hasta la vista.

GPUGRID
Message 29287 - Posted: 30 Mar 2013 | 15:37:56 UTC - in response to Message 29189.
Last modified: 30 Mar 2013 | 15:39:08 UTC

I'm still having the BSOD/reboot thing on my triple-690 rig every two days, even with the NATHAN long units. Only aborting the full cache and getting clean units will solve it, but then in two days another one will come.


On my end, I have suspicions that one of the 690s is not that strong. Taking its overclock off seems to improve the machine's stability. This issue is probably a machine fault, because none of my other machines does it. Plus, no one else seems to have the same BSOD problem with the current units, so the problem is here. I just wanted to share that, because it's not a project fault.
BTW, I would like to have more news from the results front, so I can proudly share it with my family and friends, and maybe find some more volunteers for the cause.

Typo edited*

Jorge Alberto Ramos Olive...
Message 29303 - Posted: 31 Mar 2013 | 23:40:02 UTC - in response to Message 29287.

I'm still having the BSOD/reboot thing on my triple-690 rig every two days, even with the NATHAN long units. Only aborting the full cache and getting clean units will solve it, but then in two days another one will come.


On my end, I have suspicions that one of the 690s is not that strong. Taking its overclock off seems to improve the machine's stability. This issue is probably a machine fault, because none of my other machines does it. Plus, no one else seems to have the same BSOD problem with the current units, so the problem is here. I just wanted to share that, because it's not a project fault.
BTW, I would like to have more news from the results front, so I can proudly share it with my family and friends, and maybe find some more volunteers for the cause.

Typo edited*


BSODs Strike Back!

I don't have my 690s OC'ed, and my system crashed today with NATHAN units, e.g. http://www.gpugrid.net/workunit.php?wuid=4313870 (I deactivated the project before error reports from this unit could be assembled, as the system BSODs before BOINC notices it).

I had been working through them for a month or so without a BSOD, after experiencing the same crash reports seen elsewhere around here (e.g. http://www.gpugrid.net/forum_thread.php?id=3308&nowrap=true#29090).

I will be crunching my backup project until this is fixed.

Profile Bikermatt
Message 29330 - Posted: 6 Apr 2013 | 3:18:06 UTC

I just noticed I have two Noelia WUs on my Linux boxes for the first time in a few weeks. They were both stuck at 0%, and the boxes had to be rebooted to get the GPU running again.

flashawk
Message 29335 - Posted: 6 Apr 2013 | 7:36:26 UTC - in response to Message 29330.

I just noticed I have two Noelia WUs on my Linux boxes for the first time in a few weeks. They were both stuck at 0%, and the boxes had to be rebooted to get the GPU running again.


Exact same thing here, Windows XP Pro 64-bit. I had 3 NOELIAs come through; I caught one at 0% after 5 1/2 hours of crunching on a GTX 680. The GPU was at 99%, while the memory controller was at 0%, along with the CPU usage for that GPU. The other 2 caused a 2685 error, and one NOELIA hosed a CPDN work unit that I had over 250 hours on. I am not signed up to do beta testing; these came through the regular server (I also did a TONI without issue).

Interesting that they slipped them through like this; it makes me feel like they don't trust us.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Message 29338 - Posted: 6 Apr 2013 | 8:34:30 UTC - in response to Message 29335.

Interesting that they slipped them through like this, makes me feel like they don't trust us.

No, the way I understand it is that Noelia is testing new functionality, which had been added in the recent app update but wasn't used in previous WUs (except the infamous Noelias).

To me it looks like there's more alpha and beta testing needed here. And serious debugging.

MrS
____________
Scanning for our furry friends since Jan 2002

Trotador
Message 29339 - Posted: 6 Apr 2013 | 8:45:23 UTC

Same here. This morning the machine (Ubuntu 64, 2x 660 Tis) was hung. I rebooted to find a Noelia stuck at 0%, waited to see if it would progress... no way... a couple more reboots to finally abort it and get back to normality.

Weekends are not the best moments for new trials, imho.

flashawk
Message 29340 - Posted: 6 Apr 2013 | 9:02:35 UTC
Last modified: 6 Apr 2013 | 9:04:36 UTC

Well, I guess you're getting information through the moderators' lounge; I seriously didn't see any post about those work units coming through, or I would have been on the lookout.

I guess I got a little complacent doing the NATHANs for the last month. I just can't wrap my mind around the fact that she (NOELIA) always has problems with her work units, and it's tough for anyone to figure out why.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 29341 - Posted: 6 Apr 2013 | 9:02:42 UTC - in response to Message 29339.
Last modified: 6 Apr 2013 | 9:07:21 UTC

On 30th March I had a Short task sit for 18h before I spotted it doing nothing, 47x2-NOELIA_TRYP_0-2-3-RND8854_6 (6.52app). Since then I've had three Nathan tasks fail and one Noelia 148nx9xBIS-NOELIA_148n-1-2-RND8819_1 (all 6.18apps).

It bugs me too when tasks fail after 6h, run indefinitely or crash systems.

'moderators lounge' - ha!
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile nate
Message 29360 - Posted: 6 Apr 2013 | 22:14:52 UTC

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.

flashawk
Message 29362 - Posted: 6 Apr 2013 | 23:36:25 UTC - in response to Message 29360.

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.


Ya buddy, you've got the touch. Maybe you can work your magic on rebuilding the NOELIAs; you seem to have the "Right Stuff". I admit I have no idea what goes into writing these WUs; Noelia must be doing something fundamentally different than the rest of the scientists at GPUGRID. I'm hoping she'll get it right soon and this will all have been worth it.

Profile Mumak
Message 29363 - Posted: 7 Apr 2013 | 5:36:58 UTC

Please, NO MORE NEW LONG NOELIA tasks until they are really tested.
I had been running any tasks fine for a few weeks, but yesterday I got a new long Noelia and the same result again: a hang.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 29364 - Posted: 7 Apr 2013 | 6:28:57 UTC - in response to Message 29360.
Last modified: 18 Apr 2013 | 12:25:17 UTC

There have been some really odd errors in the last couple of months,
I11R10-NATHAN_dhfr36_3-26-32-RND2505_7
Stderr output

<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
- exit code 98 (0x62)
</message>
<stderr_txt>
MDIO: unexpected end-of-file for file "input.coor": reached end-of-file before reading 39350 coordinates
ERROR: file mdioload.cpp line 80: Unable to read bincoordfile

called boinc_finish

</stderr_txt>
]]>


Would like plenty of Noelia's NOELIA_Klebe_Equ WU's.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jacob Klein
Message 29395 - Posted: 9 Apr 2013 | 3:39:59 UTC - in response to Message 29360.

Nothing has changed with the NATHAN tasks. They have been running for weeks with historically low error rates, so they really shouldn't be a problem, as far as I can imagine. I know almost nothing at this point about the new NOELIA WUs, but I have suspended them for now considering the complaints.


Thank you Nate for suspending them. I really hope you guys can figure out the problems in your staging environment, before even sending them through the beta app. If there's anything I can do to help (like some sort of pre-Beta test, if possible), you can PM me. I really enjoy testing, especially when I know it might fail, but I expect the production apps to be near-error-free.

Regards,
Jacob

flashawk
Message 29406 - Posted: 11 Apr 2013 | 9:42:31 UTC

I just got another NOELIA long WU, and it gave me an error message after 30 seconds of run time. I had to reboot to get the GPU working again.

Profile skgiven
Volunteer moderator
Volunteer tester
Message 29407 - Posted: 11 Apr 2013 | 11:28:36 UTC - in response to Message 29406.

Had a NOELIA beta fail this morning, 291px1x1BIS-NOELIA_291p_beta-1-2-RND9212

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile The King's Own
Message 29409 - Posted: 11 Apr 2013 | 13:05:45 UTC

063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2
WU has run for 8 hr 20 min with another 8 hr 05 min projected.
Seems excessive on a GTX580
____________

Simba123
Message 29411 - Posted: 11 Apr 2013 | 13:17:21 UTC - in response to Message 29111.

If you travel, I would recommend getting an app on a mobile device to bring with you that will allow you to remote into the computers. An example would be teamviewer, which is free.


https://play.google.com/store/apps/details?id=com.teamviewer.teamviewer.market.mobile&hl=en

exactly what I do on my tablet. Problem is, when the big rig starts to reboot, I can´t access it, Hope it won´t happen.



You can set TeamViewer to start with Windows and log in automatically, so if the computer at home is set up this way and it reboots, you will still have access to it.

Profile The King's Own
Avatar
Send message
Joined: 25 Apr 12
Posts: 32
Credit: 945,543,997
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 29412 - Posted: 11 Apr 2013 | 15:57:06 UTC

Further to http://www.gpugrid.net/forum_thread.php?id=3318&nowrap=true#29409


063ppx1xBIS-NOELIA_063pp_beta-0-2-RND4224_2 crashed after 10+ hours, locking up the whole system and requiring a reboot.

The following error is from the task:


<core_client_version>7.0.31</core_client_version>
<![CDATA[
<message>
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
</message>
<stderr_txt>
MDIO: cannot open file "restart.coor"
SWAN : FATAL : Cuda driver error 702 in file 'swanlibnv2.cpp' in line 1574.
Assertion failed: a, file swanlibnv2.cpp, line 59

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

</stderr_txt>
]]>

____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,586,851
RAC: 8,766,111
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29413 - Posted: 11 Apr 2013 | 16:02:06 UTC

I aborted 063px1x1BIS-NOELIA_063p_beta-1-2-RND8034_1 after it had given the "acemd.2865P.exe has encountered a problem ..." popup error three times in succession.

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 29418 - Posted: 11 Apr 2013 | 19:20:51 UTC

I guess I should have clarified: the NOELIA that crashed on me came through the regular server. Richard, I always get the 2865P error; I thought it was a Windows XP thing.

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 29421 - Posted: 12 Apr 2013 | 2:48:56 UTC

I had another NOELIA sneak through on the non-beta long-run server. I didn't get the error message this time; it ran for 59 minutes and remained at 0%. The CPU usage was at 0%, the GPU usage was at 0% and the memory controller was at 0%, so I aborted it and had to reboot my computer to get my GTX 680 working again.

Windows XP Pro x64

2x EVGA GTX680 2GB

Running CPDN on the other 6 cores.

Profile Stoneageman
Avatar
Send message
Joined: 25 May 09
Posts: 224
Credit: 34,057,224,498
RAC: 231
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29422 - Posted: 12 Apr 2013 | 8:41:04 UTC
Last modified: 12 Apr 2013 | 8:44:52 UTC

I also just noticed on one of my Linux machines that a NOELIA beta task must have been sent via the non-beta long-run server; it had stalled 24 hours ago.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29562 - Posted: 25 Apr 2013 | 13:30:42 UTC

I got one NOELIA on a Vista Ultimate x86 system with a GTX 550 Ti. It took 93,686.14 seconds to complete, but it did complete, with almost 95,000 credits.
So not all NOELIA WUs error out!
____________
Greetings from TJ

Profile Mumak
Avatar
Send message
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 29830 - Posted: 11 May 2013 | 19:40:15 UTC

Just had two NOELIA tasks fail:
http://www.gpugrid.net/result.php?resultid=6852307
http://www.gpugrid.net/result.php?resultid=6849844

Others running well.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29898 - Posted: 13 May 2013 | 13:43:18 UTC - in response to Message 29830.
Last modified: 13 May 2013 | 14:00:32 UTC

Just had two Noelia tasks failed:
http://www.gpugrid.net/result.php?resultid=6852307
http://www.gpugrid.net/result.php?resultid=6849844

They both completed successfully on other machines after you posted, but I don't see any rhyme or reason for it. The machines that failed all did so quickly (in a few seconds). But they have a variety of GPU cards and operating systems, and I doubt they were all overclocked so much that they failed right away (though that is a possibility that should be checked), and they wouldn't have time to get too hot either.

I noticed though that my GTX 650 Ti would sometimes fail after only a few seconds, which I haven't yet seen on my GTX 660s (except those bad work units that everyone failed on). That suggests to me that some work units just won't run on some types of cards. I know that on Folding, it was found out a couple of years ago that some of the more complex work units would fail on cards with only 96 shaders, but would run fine with 192 shaders or more. I don't see that pattern here yet, but something else might become apparent.

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 29900 - Posted: 13 May 2013 | 14:56:06 UTC

Jim1348: I've not had any problems with NOELIAs on either my 650s or my 670. I am running XP SP3 with the beta 320 drivers, which have been completely stable for me. Actually, I even noticed a small performance improvement over the 314s. It might be worth a try on one of your problematic machines?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29901 - Posted: 13 May 2013 | 15:06:27 UTC - in response to Message 29900.
Last modified: 13 May 2013 | 15:15:26 UTC

Jim1348; Not had any problems with Noelia's with either my 650's nor 670. I am running xp sp3 with beta 320 drivers which have been completely stable for me. Actually I even noticed a small perf improvement over the 314's. Might be worth a try one of your problematic machines?

Not problematic; only an occasional failure at the outset on the GTX 650 Ti. But it was a factory-overclocked card, and I have now reduced the clock (and increased the core voltage) to the point where I don't think it gets even the occasional failure anymore.

But many of the cards are factory-overclocked now. As far as errors are concerned, that is the same as if you had used software to overclock the card; it is the chip specs from Nvidia that determine the default clock rate. If the work units fail quickly, it is not much of a problem and you will gain points overall with the faster clocks.

The real problem comes when they fail after a couple of hours; then you should get out MSI Afterburner and start reducing the clocks, or check the cooling. You will be points ahead in the end. Also, the work units change in difficulty; what starts out as a stable card can easily start failing later when (not if) the harder ones come along. So I just don't overclock, which saves a lot of troubleshooting later.

wdiz
Send message
Joined: 4 Nov 08
Posts: 20
Credit: 871,871,594
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 29902 - Posted: 13 May 2013 | 16:52:28 UTC - in response to Message 29421.

I had another NOELIA sneak through on the non-beta long run server, I didn't get the error message this time, it ran for 59 minutes and remained at 0%. The CPU usage was at 0%, the GPU usage was at 0% and the memory controller was at 0% so I aborted it and had to reboot my computer to get my GTX680 working again.

Windows XP Pro x64

2x EVGA GTX680 2GB

Running CPDN on the other 6 cores.


Same problem here.
A NOELIA task has been running for 10 hours and is only 3% done!
No CPU load, no GPU load.


Linux Arch - GTX680 - Driver 319.17

Profile Mumak
Avatar
Send message
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 29990 - Posted: 16 May 2013 | 6:01:07 UTC

Another NOELIA failed after 8 hours 45 minutes.
I think I'll take a break from GPUGrid...

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30019 - Posted: 16 May 2013 | 17:55:18 UTC

Mumak: I've been reading all the negative comments, yet I have not had any problems with NOELIAs. I see you have two machines, one with a 650 Ti which appears stable, the other with a 660 Ti which is causing you to lose your hair.
I'm starting to see a pattern and wondering if Nvidia's boost is causing stability issues here.
To state the obvious: as a test, I'd suggest decreasing your 660 Ti's power target to 72% and the clocks to Nvidia's defaults (928, 1500) and seeing how it goes.

Profile Mumak
Avatar
Send message
Joined: 7 Dec 12
Posts: 92
Credit: 225,897,225
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 30023 - Posted: 16 May 2013 | 18:33:46 UTC

I had no issues with other tasks, but NOELIAs have failed on the 650 Ti in the past too. It's not all of them that fail - I've just got another one, so we'll see how that goes...

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30028 - Posted: 16 May 2013 | 19:22:27 UTC
Last modified: 16 May 2013 | 19:23:30 UTC

I only want to ask how many of you have tried raising the GPU voltage by about 25 mV, as described in some forum threads. It is still needed on some cards at GPUGrid with some types of work units. Perhaps it helps some of you.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30052 - Posted: 17 May 2013 | 15:24:14 UTC

Just wanted to add that I too had a Noelia WU that ran for almost 7 hrs and was only 5% complete on GTX660Ti. I had to abort it => 291px6x2-NOELIA_klebe_run2-0-3-RND9489 http://www.gpugrid.net/workunit.php?wuid=4459890

I'm not here to complain, problems are to be expected (I left Seti project after a month of solid outages/problems), and this to me is just minor, business as usual. I just wanted to post so that maybe it can get corrected. Initially I was concerned since it was the very first task I ran on a new Linux build, but I'm currently about 40% done with I2HDQ_35R5-SDOERR_2HDQd-0-4-RND7274 @ 3hrs 45mins, so everything is looking normal.

Steve
____________

Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30063 - Posted: 18 May 2013 | 12:55:22 UTC

I had another one which, if I had let it run, would have taken 100 hours to complete. 041px44x4-NOELIA_klebe_run2-1-3-RND4186 ran ~6 hrs for ~6%, so I aborted again. I haven't seen this happening on any of my other machines. They are all Win7 and this system is Linux, so maybe there's something wrong specific to the Linux platform? I'll have to investigate further when I get a chance; too many other things going on right now. http://www.gpugrid.net/workunit.php?wuid=4464500

Steve

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30064 - Posted: 18 May 2013 | 13:50:25 UTC - in response to Message 30063.

This might just be an issue with these specific WU's; they don't run well on Linux.
There are some downclocking and CPU usage possibilities that might cause this:

If you are not using coolbits to increase the fan speed and the GPU is getting too hot, it would downclock.

The GPU might get downclocked by the OS if the GPU usage isn't perceived as being high enough.

If the CPU's clocks drop (to 1400 MHz) it might starve the GPU enough to cause the GPU's clocks to drop.

The problem with Linux is finding out what's going on. I would really like to be able to fill out the Useful Tools area for Linux...
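
One thing that does work from a terminal is nvidia-smi, which ships with the Linux driver and reports temperature and (on cards/drivers that expose it) utilisation. For anything scriptable, the same counters can be polled through NVML. Just as an illustration - my own sketch, not an official tool, assuming the nvml.h header and library that come with the driver (built with something like gcc gpumon.c -lnvidia-ml, where gpumon.c is only an example name):

#include <stdio.h>
#include <nvml.h>

int main(void)
{
    /* Initialise NVML; no other call works without this. */
    nvmlReturn_t rc = nvmlInit();
    if (rc != NVML_SUCCESS) {
        fprintf(stderr, "nvmlInit: %s\n", nvmlErrorString(rc));
        return 1;
    }

    nvmlDevice_t dev;
    nvmlUtilization_t util;
    unsigned int temp;

    /* Device 0 = first GPU; loop over nvmlDeviceGetCount() on multi-GPU rigs. */
    if (nvmlDeviceGetHandleByIndex(0, &dev) == NVML_SUCCESS &&
        nvmlDeviceGetUtilizationRates(dev, &util) == NVML_SUCCESS &&
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp) == NVML_SUCCESS) {
        printf("GPU %u%%, memory controller %u%%, %u C\n",
               util.gpu, util.memory, temp);
    }

    nvmlShutdown();
    return 0;
}

Run it in a loop (watch -n 5 ./a.out, for instance) while a suspect WU is crunching; a stalled task shows up immediately as 0% utilisation.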
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30067 - Posted: 18 May 2013 | 16:21:49 UTC

Has anyone had success in running the current Noelias under Linux? So far I've read a few posts saying it wouldn't work at all.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30068 - Posted: 18 May 2013 | 16:25:38 UTC - in response to Message 30067.
Last modified: 18 May 2013 | 16:35:21 UTC

People tend to complain when things aren't working, rather than when things are working.

There are some successful (and normal) Linux runs for NOELIA_klebe WU's out there:

005px12x2-NOELIA_klebe_run-1-3-RND5943_1 4441647 16 May 2013 | 12:36:44 UTC 16 May 2013 | 23:06:31 UTC Completed and validated 36,427.18 15,679.52 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/result.php?resultid=6878594

255px50x1-NOELIA_klebe_run2-0-3-RND5892_0 4458045 16 May 2013 | 17:37:23 UTC 17 May 2013 | 9:19:35 UTC Completed and validated 36,790.19 18,420.04 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4458045

306px36x4-NOELIA_klebe_run-2-3-RND3942_0 4447878 14 May 2013 | 2:20:57 UTC 14 May 2013 | 12:38:57 UTC Completed and validated 36,208.52 15,933.88 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4447878

All 3 on the same rig.


291px19x1-NOELIA_klebe_run2-0-3-RND1187_1 4459422 18 May 2013 | 1:41:50 UTC 18 May 2013 | 15:27:00 UTC Completed and validated 41,384.82 2,370.90 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4459422

148nx1x4-NOELIA_klebe_run2-0-3-RND6125_1 4455890 16 May 2013 | 21:26:28 UTC 17 May 2013 | 13:28:33 UTC Completed and validated 40,988.13 17,533.30 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4455890

290px25x1-NOELIA_klebe_run-2-3-RND9425_1 4444345 17 May 2013 | 16:11:58 UTC 18 May 2013 | 6:09:31 UTC Completed and validated 38,213.60 3,327.34 127,800.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
http://www.gpugrid.net/workunit.php?wuid=4444345

3 more different Linux rigs, and different WU's, and enough for me.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30077 - Posted: 18 May 2013 | 22:23:14 UTC - in response to Message 30068.

People tend to complain when things aren't working, rather than when things are working.

Sad but true, almost over the entire planet.

____________
Greetings from TJ

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30080 - Posted: 19 May 2013 | 4:39:08 UTC - in response to Message 30077.

People tend to complain when things aren't working, rather than when things are working.

Sad but true, almost over the entire planet.

Why would people complain when things are working?

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 39,662
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30082 - Posted: 19 May 2013 | 6:40:28 UTC
Last modified: 19 May 2013 | 6:41:50 UTC

I have successfully finished over 25 of the new NOELIAs in Linux on my PC with two GTX 660 Tis, but this morning I found one at 0% after six hours of processing. I stopped it and restarted, but still no progress; I rebooted and started the BOINC Manager, but the machine quickly became unusable and I had to restart again and abort the unit. These messages were in the log:

Sun 19 May 2013 08:15:45 AM CEST GPUGRID Task 216px32x1-NOELIA_klebe_run2-0-3-RND9100_4: no shared memory segment
Sun 19 May 2013 08:15:45 AM CEST GPUGRID Task 216px32x1-NOELIA_klebe_run2-0-3-RND9100_4 exited with zero status but no 'finished' file

The first one I hadn't noticed before. The second one is common for suspended and restarted units, but before I aborted the unit it appeared many times, so it seems like the unit was restarting itself again and again.

I've seen that all other wingmen have computation errors.

regards

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30083 - Posted: 19 May 2013 | 9:55:17 UTC - in response to Message 30082.
Last modified: 19 May 2013 | 9:58:45 UTC

Trotador, I would suggest you abort it, if you haven't already.

Beyond, I have no idea why 51% of people do anything they do - I don't even ask anymore.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30084 - Posted: 19 May 2013 | 11:24:37 UTC - in response to Message 30080.

Why would people complain when things are working?

That was worth getting up early to read.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30085 - Posted: 19 May 2013 | 12:09:50 UTC - in response to Message 30083.
Last modified: 19 May 2013 | 12:14:35 UTC

Beyond, I have no idea why 51% of people do anything they do - I don't even ask anymore.

Like electing (sort of) gwb twice?

I've had a couple of WUs seem to stall lately, and when I VNC to the machine there's an error message saying the acemd app has had an error. If I close that box the WU restarts from zero, but if I shut down BOINC, then hit the X on the box, and then either restart BOINC or reboot the PC, the WU progresses normally. It seems better to reboot, because restarting BOINC sometimes causes the WU to progress at about 1/2 speed; a restart gets the GPU running normally again. BTW, the order of the steps below is important:

1) Shut down BOINC.
2) Hit the X on the error message.
3) Restart BOINC or (preferably) reboot.

BTW, all these boxes are Win7-64.

Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30118 - Posted: 20 May 2013 | 14:39:21 UTC - in response to Message 30068.

I'm sorry if you think that I was complaining; I was under the impression that maybe I'd get some help here. I've had 3 successful SDOERR tasks complete normally in the expected amount of time, and 3 NOELIA_klebe tasks that ran painfully slowly and were most likely to end in error. I have a 4th NOELIA_klebe at 11% that's been running for 13 hrs 15 mins that I'm about to abort. I am in no way saying that they can't be successfully run on the Linux platform, just trying to find out what's going on so that I can get it corrected.

Here is a link to the problematic host's tasks
http://www.gpugrid.net/results.php?hostid=151979

Also, I did enable Coolbits (GPU temps are around 41 degrees C) and set PowerMizer to prefer maximum performance. I also decided against aborting the current NOELIA_klebe task, in hopes of using it to troubleshoot the problem. I've tried shutting down BOINC and rebooting; nothing's changed and it's still running slowly.

Any suggestions?

Thanks,
Steve

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30123 - Posted: 20 May 2013 | 17:49:12 UTC - in response to Message 30118.

Also, I did enable coolbits (GPU temps are around 41 degrees C) and set PowerMizer to prefer maximum performance. Also, I decided against aborting the current NOELIA_klebe task in hopes of using it for troubleshooting the problem. I've tried shutting down BOINC and rebooting, nothing's changed & still running slow.

Any suggestions?

Thanks,
Steve

Hi Steve, the temp suggests to me that the WU has stopped. That happens now and then on windows too. See my post just above and see if that gets the WU moving again (with or without the error message). I'd try the reboot option as the GPU may have gone into an idle state (slow but still slightly processing). Shut down BOINC first and THEN reboot. Hope it works for you.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,188,346,966
RAC: 10,548,139
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30134 - Posted: 21 May 2013 | 2:31:51 UTC

The latest NOELIAs seem to take more time to finish than the earlier units.

Here is a unit completed just a little while ago. It completed in just over 12 hours:

http://www.gpugrid.net/workunit.php?wuid=4468403

While a unit completed on May 9 finished in a little over 9 hours:

http://www.gpugrid.net/workunit.php?wuid=4438103

Anybody else notice this?

So far, I have not experienced a blue screen with these units, and I had only about 4 or 5 error out, but those were the "Too many errors (may have bug)" units, so that doesn't worry me. At last count, I have 97 completed and valid.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30143 - Posted: 21 May 2013 | 8:24:55 UTC - in response to Message 30134.

The latest Noelia seem to take more time to finish than an earlier units.

Here is a unit completed just a little while ago. It completed in just over 12 hours:

http://www.gpugrid.net/workunit.php?wuid=4468403

While a unit complete on May 9, finished in a little over 9 hours:

http://www.gpugrid.net/workunit.php?wuid=4438103

Anybody else notice this?

So far, I have not experienced the blue screen, with these unit, and I had only about 4 or 5 error out, but those units were the "Too many errors (may have bug)" units, so that doesn't worry me. At last count, I have 97 completed and valid.


On my GTX 285 and GTX 550 Ti they take between 41 and 42 hours; the previous ones took around 30 hours. No errors, but my systems only crunch a few, though.
The SDOERR ones take roughly 28 hours on my rigs, as yet without error as well.

____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30151 - Posted: 21 May 2013 | 10:02:03 UTC

Bedrich, your runtimes for the current NOELIAs vary from 33,000 to 43,000 seconds on the one host I looked at. That's a lot. I'd look at GPU utilization fluctuation, try to free some more CPU cores (if they're busy with other tasks) and see if GPU utilization stabilizes. I don't think this strong variation is inherent to the WUs.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30156 - Posted: 21 May 2013 | 13:52:26 UTC - in response to Message 30134.

The latest Noelia seem to take more time to finish than an earlier units.
Here is a unit completed just a little while ago. It completed in just over 12 hours:
While a unit complete on May 9, finished in a little over 9 hours:

Anybody else notice this?

So far, I have not experienced the blue screen, with these unit, and I had only about 4 or 5 error out, but those units were the "Too many errors (may have bug)" units, so that doesn't worry me. At last count, I have 97 completed and valid.

I haven't noticed them getting longer lately but they're definitely longer than is comfortable for my GPUs. In fact I move 4 of my cards to different projects when NOELIAS are the only WUs available. Not sure if the length is necessary or just an arbitrary choice. Strongly wish they were shorter though.

My observations on NOELIA WUs on my GPUs:

1) They're the longest running WUs I've seen at GPUGrid.
2) They're the most troublesome WUs I've seen at GPUGrid.
3) They have the lowest credits/hour of any long WUs.

Something does not compute (pun intended).

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30169 - Posted: 21 May 2013 | 19:14:37 UTC - in response to Message 30156.
Last modified: 21 May 2013 | 19:15:58 UTC

My observations on NOELIA WUs on my GPUs:

1) They're the longest running WUs I've seen at GPUGrid.
2) They're the most troublesome WUs I've seen at GPUGrid.
3) They have the lowest credits/hour of any long WUs.

Something does not compute (pun intended).

Too late to edit my above post. New NATHANs just hit and they're SLOWER than anything I've seen yet at least on my 4 GTX 460/768MB cards. Not sure how the credits/hour will play out but since they won't make 24 hours it won't be pretty (at least on the 460s). Haven't hit the 650 Ti GPUs yet. But really, I'll ask again: Is there a good reason that the WUs have to be this long or is it just an arbitrary setting?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30171 - Posted: 21 May 2013 | 20:55:31 UTC - in response to Message 30169.

Basically, the amount of information included in a model determines the runtime. The more info you put in, the longer it will take, but the more accurate and meaningful the results can be.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,188,346,966
RAC: 10,548,139
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30176 - Posted: 21 May 2013 | 23:28:30 UTC - in response to Message 30151.

Bedrich, your runtimes for the current Noelias vary from 33ks to 43ks on the one host I looked at. That's a lot.. I'd look at GPU utilization fluctuation, try to free some more CPU cores (if they're busy with other tasks) and see if GPu utilization stabilizes. I don't think this strong variation is inherent to the WUs.

MrS


I have 1 cpu dedicated to 1 gpu, and I haven't changed that.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30223 - Posted: 22 May 2013 | 18:56:12 UTC

The way I understand it is that the complexity of the WU determines the time for each time step (in the range of ms). The number of time steps should be rather arbitrary and chosen so that the server is not overloaded.

MrS
____________
Scanning for our furry friends since Jan 2002

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,188,346,966
RAC: 10,548,139
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30232 - Posted: 22 May 2013 | 21:41:21 UTC

This particular unit was a nasty one for me.

http://www.gpugrid.net/workunit.php?wuid=4452849

It caused my computer to shut down and it caused 2 other units, which were running well, to crash.

http://www.gpugrid.net/workunit.php?wuid=4475286

http://www.gpugrid.net/workunit.php?wuid=4474620

Profile Stephen Yeathermon
Avatar
Send message
Joined: 29 Apr 09
Posts: 9
Credit: 338,904,942
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 30234 - Posted: 22 May 2013 | 22:34:24 UTC - in response to Message 30169.

My observations on NOELIA WUs on my GPUs:

1) They're the longest running WUs I've seen at GPUGrid.
2) They're the most troublesome WUs I've seen at GPUGrid.
3) They have the lowest credits/hour of any long WUs.

Something does not compute (pun intended).

Too late to edit my above post. New NATHANs just hit and they're SLOWER than anything I've seen yet at least on my 4 GTX 460/768MB cards. Not sure how the credits/hour will play out but since they won't make 24 hours it won't be pretty (at least on the 460s). Haven't hit the 650 Ti GPUs yet. But really, I'll ask again: Is there a good reason that the WUs have to be this long or is it just an arbitrary setting?


I've done 7 of the new NATHANs in total, and they're giving 167,550 credits. Six of the tasks ran 47K seconds on GTX 660 Tis; the one on a GTX 560 Ti took 75K seconds.

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,188,346,966
RAC: 10,548,139
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30317 - Posted: 24 May 2013 | 20:50:12 UTC

Looks like I pulled this one out of the fire:

http://www.gpugrid.net/workunit.php?wuid=4473341


HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30366 - Posted: 25 May 2013 | 19:10:58 UTC - in response to Message 30317.

My 680 is crunching a NOELIA task in about 300,000 s. Other tasks (SDOERR and NATHAN) are OK. I have updated the driver to the latest, 319.23.

Are they really so long?

Zdenek

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 30368 - Posted: 25 May 2013 | 19:35:16 UTC

My 680s are taking right around 29,000 seconds; that's at 1175 MHz on Windows XP x64.

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30369 - Posted: 25 May 2013 | 19:53:31 UTC - in response to Message 30368.

My 680's are taking right around 29,000, that's at 1175MHz Windows XP x64.


I had times around 29,000 s a week ago. Maybe it's something with the driver.

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 30370 - Posted: 25 May 2013 | 20:31:06 UTC

What clock speed is your 680 running at? To be honest, I think you're in the pipe, 5x5 (just right).

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30372 - Posted: 25 May 2013 | 23:25:41 UTC - in response to Message 30366.

Zdenek, it might be a driver issue. Others have reported similar problems with 319.x on Linux.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30384 - Posted: 26 May 2013 | 9:00:05 UTC - in response to Message 30372.
Last modified: 26 May 2013 | 9:00:28 UTC

Zdenek, It might be a driver issue. Others have reported similar problems with 319.x on Linux.


Yes, you are right. Moved back to 310 and all is ok.

I have a problem with my own distrrtgen app as well. IMHO it gets stuck synchronizing between the CPU and GPU in cudaDeviceSynchronize().
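
For anyone curious what I mean by "stuck": the call has a return code that would normally report an error, but the app simply never comes back from it. A stripped-down sketch of the checking pattern, built with nvcc - a dummy kernel for illustration only, nothing from distrrtgen or acemd:

#include <stdio.h>
#include <cuda_runtime.h>

/* Placeholder kernel; stands in for whatever real work the app does. */
__global__ void dummyKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

int main(void)
{
    const int n = 1 << 20;
    int *d_data = NULL;

    cudaError_t err = cudaMalloc((void **)&d_data, n * sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }

    dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    /* Launch-time problems (bad configuration etc.) show up here... */
    err = cudaGetLastError();
    if (err != cudaSuccess)
        fprintf(stderr, "launch: %s\n", cudaGetErrorString(err));

    /* ...execution-time problems show up here. A hang is when this call
       never returns at all - e.g. a kernel that never finishes, or a
       problem below the application, in the runtime or driver. */
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "sync: %s\n", cudaGetErrorString(err));

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}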

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30387 - Posted: 26 May 2013 | 9:11:15 UTC - in response to Message 30384.
Last modified: 26 May 2013 | 9:11:48 UTC

I think there have generally been issues with this since about CUDA 4.2 dev.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 30397 - Posted: 26 May 2013 | 9:47:34 UTC - in response to Message 30387.

I have found that 6xx and Titan have problems. 5xx looks ok.

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 30408 - Posted: 26 May 2013 | 12:08:47 UTC

Two more NOELIA failures yesterday: one after 4 seconds, the other after 20,927 seconds. I will continue with 'short' tasks.


6895000 4475926 25 May 2013 | 16:07:19 UTC 25 May 2013 | 16:18:17 UTC Error while computing 4.10 3.21 --- Long runs (8-12 hours on fastest card) v6.18 (cuda42)
6894999 4475925 25 May 2013 | 16:07:19 UTC 25 May 2013 | 22:08:10 UTC Error while computing 20,927.20 9,396.42 --- Long runs (8-12 hours on fastest card) v6.18 (cuda42)
6894967 4475903 25 May 2013 | 16:18:17 UTC 25 May 2013 | 16:21:06 UTC Aborted by user 0.00 0.00 --- Long runs (8-12 hours on fastest card) v6.18 (cuda42)

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31087 - Posted: 28 Jun 2013 | 8:48:17 UTC

Hi

Just completed what looks like a brand new Noelia

http://www.gpugrid.net/result.php?resultid=6992175

87k seconds on a GTX 650 Ti, with poor GPU utilisation, despite a reboot halfway through when I thought there was a problem :(

Hope it's a one-off?

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31091 - Posted: 28 Jun 2013 | 12:56:11 UTC - in response to Message 31087.

Just completed what looks like a brand new Noelia

http://www.gpugrid.net/result.php?resultid=6992175

87k sec on a gtx 650 ti, poor gpu utilisation despite a reboot half way thinking there was a problem :(

Hope its a one off?

I got one of these too. The first guy aborted it and it took my OCed 650 Ti well over 24 hours (92,919.91 seconds) to run it in Win7-64 (vs yours in XP). GPU utilization was OK, but these are TOO LONG and to add insult to injury only give out about 1/2 the credits they should.

http://www.gpugrid.net/workunit.php?wuid=4527468

petebe
Send message
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 31092 - Posted: 28 Jun 2013 | 12:56:30 UTC - in response to Message 31087.

I had one, too - http://www.gpugrid.net/result.php?resultid=6992593

I aborted it when I noticed it at 16+ hours, 54% completed.
The GPU load was at 95%, but Mem Controller load was only 4% on a 660ti.

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31093 - Posted: 28 Jun 2013 | 13:13:57 UTC
Last modified: 28 Jun 2013 | 13:16:42 UTC

Hmm, that does not sound promising - I don't suppose anyone has noticed what the GPU memory utilisation is?
I was wondering if it went over 1 GB, as I've seen a 680 complete one in a third of the time.
I agree these are too long; in my opinion they should be in a separate 'bucket' with a clear minimum hardware spec requirement. The long and short descriptions are far too vague.

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31095 - Posted: 28 Jun 2013 | 14:20:05 UTC

There's one thing I'd like to say though: Nathan sure did a bang-up job on those NATHAN_KIDKIXc22's. I'm getting 98% GPU load and 35-38% memory controller utilization on my GTX 680s. This Bud's for you, Nathan! You should have named them KIDKIX_BUTTc22. You really should give a clinic for the fellow researchers (I'm sure they'll get it sorted, not complaining).

petebe
Send message
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 31097 - Posted: 28 Jun 2013 | 14:56:18 UTC - in response to Message 31093.

Hmm, Does not sound promising - don't suppose anyone has noticed what the gpu mem utilisation is?


Mine was around 1045 MB.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31099 - Posted: 28 Jun 2013 | 16:19:12 UTC
Last modified: 28 Jun 2013 | 16:19:38 UTC

Wow, does that mean these units don't run on 1 GB VRAM hardware? (I haven't tried.)
____________
DSKAG Austria Research Team: http://www.research.dskag.at



GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31100 - Posted: 28 Jun 2013 | 16:54:59 UTC

Thanks, at least we have a plausible explanation; I'm not so sure ruling out the mainstream will be good for GPUGrid. Pity we can't isolate WUs, as I can think of an addition to flashawk's naming convention - but let's not go there.

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31101 - Posted: 28 Jun 2013 | 17:14:39 UTC

Are you asking about GPU utilization or the size of the work units? I take it that these WU's are from the short queue, and if the work unit size is larger than the amount of GDDR memory on the video card, that would not only cause a massive slowdown in crunching times but also make your computer almost unresponsive (mouse, keyboard and such).

Beyond knows what I'm talking about, and if that were the case, I'm sure he would have mentioned it. I am confused by petebe's response - is he talking about the work unit size? I haven't done any short-queue tasks in some time, and I do know that Noelia's work units are set up differently than Nathan's; her WU's typically have lower CPU and GPU utilization.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31103 - Posted: 28 Jun 2013 | 18:04:00 UTC - in response to Message 31101.

That's what I understood too. All the scientists are working on different projects/amino acids and use different algorithms, thus the WU's differ. The latest one from Nathan seems almost optimal, as far as we can see, with error-free and rather fast cycles on the fastest cards.
But I would also like to mention that I have had very few problems with Noelia's WU's as well; only one beta failed, and one failed because Windows decided to update itself (that is now no longer possible).
____________
Greetings from TJ

petebe
Send message
Joined: 19 Nov 12
Posts: 31
Credit: 1,549,545,867
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwat
Message 31104 - Posted: 28 Jun 2013 | 18:13:35 UTC - in response to Message 31101.

flashawk, I was referring to the "Memory Used" figure as reported by GPU-Z.
Memory Used was 1045 MB and the Memory Controller Load was 4%.
In contrast, NATHANs usually run at around 250-450 MB Memory Used and 35% Mem Controller Load.

I don't know how this relates to a WU size - sorry if this was confusing.

This particular 660ti does GPUGrid crunching only - it's not connected to a monitor. One HT CPU reserved per GPU.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31106 - Posted: 28 Jun 2013 | 18:46:34 UTC - in response to Message 31101.

Are you asking about GPU utilization or the size of the work units? I take it that these wu's are from the short queue and if the work unit size is larger than the amount of GDDR memory on the video card, that would not only cause a massive slow down in crunching times it will also make your computer almost unresponsive (mouse, keyboard and such).

Beyond knows what I'm talking about and if that were the case, I'm sure he would have mentioned it. I am confused by petebe's response, is he talking about the work unit size? I haven't done any short queue tasks in sometime and I do know that Noelia's work units are setup differently than Nathans and her wu's typically have a lower CPU and GPU utilization.

They're long queue WUs yet they credit like the short queue. If I see any more I'll make like a Dalek: EXTERMINATE, EXTERMINATE!!!

BTW, like you mentioned: kudos to Nathan on the new KIX WUs. Nathan, give the other WU generators a class in WU design. Please?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31107 - Posted: 28 Jun 2013 | 18:47:01 UTC - in response to Message 31104.

NOELIA_Mg WUs are long runs. Most of Noelia's work has used >1GB GDDR and taken longer than other work.
207,850 credits would be about right in my opinion (including the 50% bonus).
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31108 - Posted: 28 Jun 2013 | 18:51:56 UTC - in response to Message 31107.

NOELIA_Mg WU are Long runs. Most of Noelia's work has used >1GB GDDR and taken longer than other work.
207850 credits would be about right in my opinion (including the 50% bonus).

The ones listed above scored just 69,875. Including only 25% bonus though since they're SO LONG :-(

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31110 - Posted: 28 Jun 2013 | 19:12:28 UTC - in response to Message 31104.
Last modified: 28 Jun 2013 | 19:19:50 UTC

Flashhawk, I was referring to the "Memory Used" figure as reported by GPU-Z.
Memory Used was 1045 mb and Memory Controller Load was 4%.
In contrast, NATHANs usually run around 250-450 mb Memory Used and 35% Mem Controller Load.

I don't know how this relates to a WU size - sorry if this was confusing.

This particular 660ti does GPUGrid crunching only - it's not connected to a monitor. One HT CPU reserved per GPU.


No petebe, I wasn't confused by anything on your part; I was confused because more people aren't complaining about unresponsive computers. If someone is using an older card with only 1GB of onboard GDDR, then the system RAM or swap file would be used, slowing the computer to a crawl.

No, you're fine buddy, sorry for the confusion; I should have been a little clearer. Frankly, I'm shocked I haven't seen more of this in the forum - that's a huge WU for not much credit. I guess I'll have to turn on the short queue and check them out; it is odd they aren't in the long queue.

Edit: I understand now (I'm pretty slow sometimes) - they're coming through the long queue; I just haven't seen one yet.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31114 - Posted: 28 Jun 2013 | 20:56:00 UTC

GPU-Z "only" reports the overall memory used, which includes the GPU-Grid WU and anything else running. If a card with 1024 MB shows 1045 MB used that won't slow the computer to a crawl. Everything except a whopping 21 MB still fit into the GPU memory. How often can this amount be transferred back and forth between system RAM and GPU at PCIe speeds? (rough answer: a damn lot)

It's only when the amount of memory needed significantly exceeds the amount of memory present on the card that things will become.. uncomfortable.
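
(A rough back-of-the-envelope, assuming PCIe 2.0 x16 at about 8 GB/s per direction - PCIe 3.0 would be roughly double:

2 x 21 MB / 8 GB/s ~= 5 ms per round trip, i.e. on the order of 200 round trips per second.)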

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31117 - Posted: 28 Jun 2013 | 22:16:45 UTC - in response to Message 31114.

GPU-Z "only" reports the overall memory used, which includes the GPU-Grid WU and anything else running. If a card with 1024 MB shows 1045 MB used that won't slow the computer to a crawl. Everything except a whopping 21 MB still fit into the GPU memory. How often can this amount be transferred back and forth between system RAM and GPU at PCIe speeds? (rough answer: a damn lot)

It's only when the amount of memory needed significantly exceeds the amount of memory present on the card that things will become.. uncomfortable.

MrS

It could also be other things. As some of you may remember from another thread, I was having problems with my new GTX 660 on an XFX mobo, with a laggy system at times, even with 2 GB RAM on the GPU and 12 GB on the mobo. It could have been the driver, or the driver in combination with another piece of software. I have since put the GTX 660 in another system and it works like a train (as we say in Dutch).

But how about the question from dskagcommunity about VRAM? His exact question:
Wow does that mean this units don't run on 1GB VRAM Hardware? (didn't tried)

As it can be swapped back and forth between system RAM and the GPU, the message that these WU's won't run on 1 GB VRAM cards would make no sense - or am I missing something important? I just want to learn here.


____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31120 - Posted: 28 Jun 2013 | 23:42:04 UTC - in response to Message 31117.
Last modified: 28 Jun 2013 | 23:42:34 UTC

This has been discussed before, and at some length - these WU's are only going to be slow on 768MB and 512MB cards: a few GTX460's, GTX450's and GT440's. Generally speaking, the relative performance of Noelia's WU's on mid-range cards should be better, as they won't be burdened with low bus widths/bandwidths. More's the pity they don't have better credit and can't finish inside 24h...
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31124 - Posted: 29 Jun 2013 | 9:54:18 UTC - in response to Message 31117.

You're right TJ, these WUs should run on cards with 1 GB VRAM. However, I think the signs are clear: nobody should buy or recommend a 1 GB card for GPU-Grid any more.

And there's the issue of algorithm selection. GDF once said that they've got (at least) two different algorithms; one is faster but needs more VRAM. The app selects the faster one if possible, meaning cards with "low" VRAM may see reduced crunching speed even before running out of VRAM completely.

Oh, and regarding the reference to your strange GTX 660 problem: the question here was not "what can make a system choppy" (lots of possibilities, I agree) but rather "would slightly exceeding the available VRAM make a system choppy".

MrS
____________
Scanning for our furry friends since Jan 2002

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31125 - Posted: 29 Jun 2013 | 11:23:52 UTC
Last modified: 29 Jun 2013 | 11:24:40 UTC

Ah, good to read. I hope the 1.28 GB on my 24h cruncher machines will keep working for a while longer without swapping; I only bought them a few months ago ^^
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31126 - Posted: 29 Jun 2013 | 12:27:32 UTC - in response to Message 31120.

This has been discussed before, and to some length - These WU's are only going to be slow on 768MB cards and 512MB cards; a few GTX460's, GTX450's and GT440's. Generally speaking the relative performance of Noelias' WU's on mid-range cards should be better as they won't be burdened with low bus/bandwidths. More the pity they don't have better credit and cant finish inside 24h...

EXCEPT that no one has been talking about these WUs on sub-1GB cards. The first reports were referring to 650 Ti GPUs, and so far the reports have been that they're running even worse on the 2GB 660 and 660 Ti cards. BTW, you can add a 1280MB 570 to the list of GPUs that don't like these new NOELIAs...

GPUGRID
Send message
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31355 - Posted: 8 Jul 2013 | 21:42:34 UTC

WELL..... new NOELIAs are filling the cache.... let's see how these ones go and hope for the best.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31361 - Posted: 9 Jul 2013 | 6:55:53 UTC - in response to Message 31355.

WELL..... new Noelias are filling the cache.... let´s see how these ones goes and hope for the best.

Well, my 650Ti doesn't seem to like them AT ALL! At least this one.

Slot: 0
Task: 063ppx8x1-NOELIA_klebe_run4-0-3-RND9577_0
Elapsed: 04:29
CPU time: 00:17
Percent done: 03.76
Estimated: 119:17
Remaining: 114:36

So, it will take something like 5 days to finish on my 650Ti! I wonder if there's a card out there that can finish these in the 24h window...
____________

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31362 - Posted: 9 Jul 2013 | 8:30:31 UTC

In Linux I can't check GPU utilization, so I can't tell how well this NOELIA is using my card. Judging by the temperature of the card, though (52C), utilization must be pretty low, as it normally goes up to 64-67C with NATHANs.

I'm wondering what to do: let it continue or abort it? It seems I'll finish it before its deadline, but it looks like such a waste of both time and resources, doesn't it?
____________

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31363 - Posted: 9 Jul 2013 | 9:17:15 UTC - in response to Message 30384.

Zdenek, It might be a driver issue. Others have reported similar problems with 319.x on Linux.


Yes, you are right. Moved back to 310 and all is ok.

I have problem with my own distrrtgen app also. IMHO It stucks on synchronizing between CPU and GPU in cudaDeviceSynchronize().


All drivers above 319 (incl. the 325 beta) under Linux still have the problem with NOELIA tasks on 6xx GPUs: very slow, with low CPU usage.

I recommend using 310 under Linux.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31364 - Posted: 9 Jul 2013 | 9:43:44 UTC - in response to Message 31363.

You're so right! I removed 319 and installed 310.44 and immediately CPU usage went up (40-45% from 15-20%) and the GPU temp is at the usual 65C! Also, estimated total time is dropping rapidly.

Thanks for the great tip!
____________

GPUGRID
Send message
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31365 - Posted: 9 Jul 2013 | 11:24:12 UTC - in response to Message 31361.
Last modified: 9 Jul 2013 | 11:25:02 UTC

WELL..... new Noelias are filling the cache.... let´s see how these ones goes and hope for the best.

Well, my 650Ti doesn't seem to like them AT ALL! At least this one.

Slot: 0
Task: 063ppx8x1-NOELIA_klebe_run4-0-3-RND9577_0
Elapsed: 04:29
CPU time: 00:17
Percent done: 03.76
Estimated: 119:17
Remaining: 114:36

So, it will take something like 5 days to finish on my 650Ti! I wonder if there's a card out there that can finish these in the 24h window...

They are ok here. Finishing without errors in the usual 8:30/9:00 hrs in my 690s and 770s. Driver 320.49

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31366 - Posted: 9 Jul 2013 | 12:43:22 UTC - in response to Message 31365.

Yeah, it seems it was the driver (319.17). I downgraded to 310.44 and it's working much better now. If my calculations are not off, it should take ~18.5h for my 650Ti to complete such a WU, which is very similar to NATHAN_KIDs. This particular NOELIA I'm crunching right now will of course take longer, as I did the first 6% very slowly.
____________

HA-SOFT, s.r.o.
Send message
Joined: 3 Oct 11
Posts: 100
Credit: 5,879,292,399
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31369 - Posted: 9 Jul 2013 | 14:32:22 UTC - in response to Message 31365.

They are ok here. Finishing without errors in the usual 8:30/9:00 hrs in my 690s and 770s. Driver 320.49


It is a Linux plus NOELIA tasks problem only. Windows is OK, and NATHAN tasks on Linux are OK as well.

captainjack
Send message
Joined: 9 May 13
Posts: 171
Credit: 2,321,929,288
RAC: 2,361,870
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31370 - Posted: 9 Jul 2013 | 15:45:14 UTC

This Noelia task http://www.gpugrid.net/result.php?resultid=7034770 appeared to lock up my Linux box so I aborted it (after two reboots). It had previously failed on two Windows boxes.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31371 - Posted: 9 Jul 2013 | 16:22:46 UTC

On all 3 of my GTX 460/768 GPUs:

CRASH!

Problem signature:
Problem Event Name: APPCRASH
Application Name: acemd.2865P.exe
Application Version: 0.0.0.0
Application Timestamp: 511b9dc5
Fault Module Name: acemd.2865P.exe
Fault Module Version: 0.0.0.0
Fault Module Timestamp: 511b9dc5
Exception Code: 40000015
Exception Offset: 00015ad1
OS Version: 6.1.7601.2.1.0.256.1
Locale ID: 1033
Additional Information 1: ef57
Additional Information 2: ef57694f685d7e60ac50a2030c6fbaf6
Additional Information 3: 907e
Additional Information 4: 907ef510ab2fa0efd4b93de2612b25ed

Seems they need at least 1GB, thanks for the heads up, not...

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31373 - Posted: 9 Jul 2013 | 20:44:53 UTC

They have not failed on my hosts so far.
Their credit per second and their GPU usage are OK (on WinXP x64 and x32).
They still don't use a full CPU thread (with Kepler GPUs); however, this does not decrease their GPU usage.
No complaints from me this time. So far. :)

pvh
Send message
Joined: 17 Mar 10
Posts: 23
Credit: 1,173,824,416
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31376 - Posted: 9 Jul 2013 | 22:02:47 UTC

I thought you were going to abort the current Noelia runs...?

I had one that was stuck at 6% and aborted it. But I keep getting other Noelia WUs as replacements. No Nathan runs anywhere in sight...

I am giving up on this project for now...

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31377 - Posted: 10 Jul 2013 | 0:40:12 UTC - in response to Message 31376.

I have 1 box that doesn't like these NOELIAs, so I'm going to swap in my 2 GTX670 backup cards and see if that works. I should just switch to Linux Debian now; I've been getting ready for some time. Microsoft is going to stop supporting Windows XP 32-bit in April and XP x64 in September 2014, even though XP is still running on 38% of the world's computers (Windows 7 is on 44%).

Profile Stoneageman
Avatar
Send message
Joined: 25 May 09
Posts: 224
Credit: 34,057,224,498
RAC: 231
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31378 - Posted: 10 Jul 2013 | 1:35:10 UTC

So far no issues with current Noelia tasks on Linux but this beta had run for 11+ hrs before I noticed it & required a reboot to get the 660 working again.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31380 - Posted: 10 Jul 2013 | 8:57:28 UTC - in response to Message 31361.

WELL..... new Noelias are filling the cache.... let's see how these ones go and hope for the best.

Well, my 650Ti doesn't seem to like them AT ALL! At least this one.

Slot: 0
Task: 063ppx8x1-NOELIA_klebe_run4-0-3-RND9577_0
Elapsed: 04:29
CPU time: 00:17
Percent done: 03.76
Estimated: 119:17
Remaining: 114:36

So, it will take something like 5 days to finish on my 650Ti! I wonder if there's a card out there that can finish these in the 24h window...


Reporting back on this. It turned out (with the help of HA-SOFT, s.r.o., thanks!) that NOELIAs have some trouble with driver 319 under Linux. I downgraded to 310.44 and the NOELIA I was currently crunching started progressing at a much faster rate. It finished in ~25h (previously estimated at 119h!) and, of course, I missed the 24h bonus, but only because I had lost ~7 hours with the newer driver.

The new NOELIA_klebe_run I got has an estimated 18:09, which is about the same as NATHANs on my GTX 650Ti. What's sweet about these NOELIAs is the CPU usage, about 40-45% of my i7 870.
____________

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31381 - Posted: 10 Jul 2013 | 10:58:30 UTC - in response to Message 31380.

The 304.88 repository driver works just fine. In my opinion there are too many issues with the ~320 drivers for both Windows and Linux.

83equ-NOELIA_7mg_restraint-0-1-RND2660_2 4581443 9 Jul 2013 | 22:39:33 UTC 10 Jul 2013 | 8:39:45 UTC Completed and validated 13,625.77 4,782.22 38,025.00 ACEMD beta version v6.49 (cuda42)

The times look similar to last weeks:

53equ-NOELIA_1MG-0-1-RND1933_0 4565739 3 Jul 2013 | 19:01:52 UTC 4 Jul 2013 | 1:19:44 UTC Completed and validated 13,469.70 4,359.40 38,025.00 ACEMD beta version v6.49 (cuda42)

I like that the Beta credit is in line with Long WU's even though the Betas are quite short.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31384 - Posted: 10 Jul 2013 | 14:32:07 UTC - in response to Message 31380.

It turned out (with the help of HA-SOFT, s.r.o., thanks!) that NOELIAs have some trouble with driver 319 under Linux. I downgraded to 310.44 and the NOELIA I was currently crunching started progressing at a much faster rate.

Don't know what's going on with NV drivers lately. Had to switch 3 of my GPUs to other projects because of the NOELIAs and found that while 2 ran fine at SETI, the 3rd did not. Looked at them and sure enough the 3rd had a newer driver (all are Win7-64). Reverted to 310.90 and SETI ran like a charm. So it's not just Linux, it's Windows too with NVidia driver problems.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31386 - Posted: 10 Jul 2013 | 15:33:37 UTC
Last modified: 10 Jul 2013 | 15:35:13 UTC

That's why the admins on several projects often say the latest drivers are perhaps good for gaming, but not always for crunching ;) I think the latest really stable, crunch-proof drivers are the 310.xx series. I'm very careful with driver updates because there have been too many problems often enough. But that's not only an NVidia thing. You can always hit the ground hard with current ATI/AMD drivers like 13.x too ;)
____________
DSKAG Austria Research Team: http://www.research.dskag.at



John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31387 - Posted: 10 Jul 2013 | 17:24:16 UTC
Last modified: 10 Jul 2013 | 17:25:04 UTC

My system builder has threatened me with at least death if I ever update an NVIDIA driver. He carefully selects the driver as he builds the machine and leaves it in place....

I am now running NOELIAs without a problem.

John

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31388 - Posted: 10 Jul 2013 | 17:45:09 UTC - in response to Message 31387.

John,

What driver do you (and your system builder) like these days?
I haven't noticed any problems with recent Nvidia drivers, but I can't say that they don't occur either.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31392 - Posted: 11 Jul 2013 | 7:45:13 UTC

Argh! These NOELIA_xMG_RUN WUs are taking too long on my 650Ti, around 44h!! I aborted two of them, hoping for a NOELIA_klebe or NATHAN, but nope, it was one of these beasts or nothing..

What times do you guys see for these WUs?
____________

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 1576
Credit: 5,600,586,851
RAC: 8,766,111
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31393 - Posted: 11 Jul 2013 | 8:16:03 UTC - in response to Message 31392.

Just completed 35x5-NOELIA_7MG_RUN-0-2-RND3709 - only a minor increase in runtime (less than 10%) compared to other recent tasks for host 132158.

But I did see that the final upload was ~150 MB - that's back to pre-compression sizes.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31394 - Posted: 11 Jul 2013 | 8:17:47 UTC

I have 3 NOELIAs running; the estimate is 12-13 hours, and one will definitely finish within that time. All run on Windows (Vista and 7) with 320.18 drivers. I had some NOELIA SRs in the previous days and they all finished okay.
Yesterday evening one resulted in an automatic system reboot. Checking WhoCrashed shows that it was the nVidia driver. I haven't changed it yet, as I want to see more crashes or bad results first.
____________
Greetings from TJ

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31395 - Posted: 11 Jul 2013 | 8:46:22 UTC
Last modified: 11 Jul 2013 | 8:51:23 UTC

Thanks for the responses guys!

You both have faster cards than my 650Ti, you have 660s and 670s.

TJ, it's one of your 660s that estimates to 12-13h, right?

Shouldn't my 650Ti estimate to about twice that, ~24h? Instead, my estimate is at 44h, which is more than 3x your 660's time!

I hope these NOELIAs don't have a problem with driver 310 under Linux!

Edit: TJ, I guess your estimates are from the BOINC manager, right? My 44h estimate is from a script I have that parses the slots' task state files. My BOINC manager shows ~24h estimate, as expected. Of course, the BOINC manager's estimates are almost always wrong for me...
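
For reference, a minimal sketch of this kind of script (not my exact one) could look like the code below. It assumes each slot's boinc_task_state.xml exposes checkpoint_elapsed_time and fraction_done, and that BOINC lives in the usual Linux data directory - adjust the path for your install:

#!/usr/bin/env python
# Minimal sketch: estimate the total runtime of each running task from the
# checkpointed progress in the slot's boinc_task_state.xml.
# Assumed fields: result_name, checkpoint_elapsed_time, fraction_done.
import glob
import xml.etree.ElementTree as ET

BOINC_DIR = "/var/lib/boinc-client"  # adjust to your BOINC data directory

for state_file in sorted(glob.glob(BOINC_DIR + "/slots/*/boinc_task_state.xml")):
    try:
        root = ET.parse(state_file).getroot()
        name = root.findtext("result_name", default="?")
        elapsed = float(root.findtext("checkpoint_elapsed_time", default="0"))
        done = float(root.findtext("fraction_done", default="0"))
    except (ET.ParseError, ValueError):
        continue  # state file mid-rewrite or malformed: skip this slot
    if done > 0:
        total = elapsed / done  # simple linear extrapolation to 100%
        print("%s: %.1f%% done, %.1f h elapsed, ~%.1f h estimated total"
              % (name, done * 100.0, elapsed / 3600.0, total / 3600.0))
    else:
        print("%s: no progress checkpointed yet" % name)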
____________

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31397 - Posted: 11 Jul 2013 | 9:21:59 UTC - in response to Message 31395.

Thanks for the responses guys!

You both have faster cards than my 650Ti, you have 660s and 670s.

TJ, it's one of your 660s that estimates to 12-13h, right?

Shouldn't my 650Ti estimate to about twice that, ~24h? Instead, my estimate is at 44h, which is more than 3x your 660's time!

I hope these NOELIAs don't have a problem with driver 310 under Linux!

Edit: TJ, I guess your estimates are from the BOINC manager, right? My 44h estimate is from a script I have that parses the slots' task state files. My BOINC manager shows ~24h estimate, as expected. Of course, the BOINC manager's estimates are almost always wrong for me...

Hello Vagelis,
Yes indeed, the 12-13 hour estimate is for the 660. I also have a 550Ti doing a NOELIA, and that one will take about 46 hours! 36.5 hours are already done.
I do the estimates myself: I look at what percentage has been done in how much time and extrapolate that to 100%. So: 100 divided by the percentage done, times the time it took to do that percentage.
You have to see whether a driver works by letting it do a few WUs. I don't switch drivers too often.
____________
Greetings from TJ

Kenneth Larsen
Send message
Joined: 11 Feb 09
Posts: 6
Credit: 162,131,296
RAC: 0
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31398 - Posted: 11 Jul 2013 | 9:24:10 UTC

I have been running GPUgrid for weeks without trouble; suddenly, a few days ago, all the work units I try to run don't utilize the GPU as before. Initially they are estimated to run for about 13 hours, but after those 13 hours they have only reached about 15% and the time to completion starts rising. At this point I abort them, if not before, as they don't seem to utilize more than a small part of the GPU.
I'm running BOINC 7.0.65 on Linux, nvidia-drivers 319.23.
So far I've wasted about 2-3 days of electricity trying to crunch. Am I the only one experiencing this? Is it due to the new WUs mentioned?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31399 - Posted: 11 Jul 2013 | 9:39:25 UTC - in response to Message 31392.
Last modified: 11 Jul 2013 | 9:57:14 UTC

Argh! These NOELIA_xMG_RUN WUs are taking too long on my 650Ti, around 44h!! I aborted two of them, hoping for a NOELIA_klebe or NATHAN, but nope, it was one of these beasts or nothing..

What times do you guys see for these WUs?

A little under 19 hours on my 650 Ti (980 MHz), using Win7 64-bit. Two have completed successfully, and one is in progress. (The only crash was when I changed a cc_config file on a work in progress; I think it would have completed normally otherwise.)

Did you leave a CPU core free to support the GPU?

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31400 - Posted: 11 Jul 2013 | 9:44:11 UTC - in response to Message 31398.

I have been running GPUgrid for weeks without trouble; suddenly, a few days ago, all the work units I try to run don't utilize the GPU as before. Initially they are estimated to run for about 13 hours, but after those 13 hours they have only reached about 15% and the time to completion starts rising. At this point I abort them, if not before, as they don't seem to utilize more than a small part of the GPU.
I'm running BOINC 7.0.65 on Linux, nvidia-drivers 319.23.
So far I've wasted about 2-3 days of electricity trying to crunch. Am I the only one experiencing this? Is it due to the new WUs mentioned?

Well, it's hard to say, I guess. They are all different NOELIA WUs. You can also read in this thread that the driver you are using could be the issue. The klebe run seems to be in line with the long runs, but on my 550Ti it has already been working for 36.6 hours at 79%. That 550Ti uses driver 320.18, but it could also be an issue with this particular WU.
I regularly see that an SR takes about 3 times as long on the 550Ti as on the 660, and an LR twice as long.
Now the NOELIA klebe is going to take 4 times as long. I'd like to see some klebe runs on my 660 first before deciding what to do with the driver.
____________
Greetings from TJ

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,188,346,966
RAC: 10,548,139
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31401 - Posted: 11 Jul 2013 | 10:07:05 UTC

The NOELIA_xMG_RUN WUs have a very large output file, approximately 147 MB. Are you using the previous application version again? The units are otherwise running fine, taking me between 10.5 and 11.5 hours to complete on my Windows 7 computer, so please don't cancel them like you did last time.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31402 - Posted: 11 Jul 2013 | 10:35:32 UTC - in response to Message 31401.

The units are running otherwise fine, taking me between 10.5 to 11.5 hours to complete on my windows 7 computer, so please don't cancel them, like you did last time.

I agree. They are running fine on both of my 660's and 650 Ti.
It is too early to see what the error rate is; it may be a little more than the Nathans, but not very much thus far.

Kenneth Larsen
Send message
Joined: 11 Feb 09
Posts: 6
Credit: 162,131,296
RAC: 0
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31403 - Posted: 11 Jul 2013 | 10:50:29 UTC

How long should I let them run, then? If they were using the GPU at 100% I'd have let them run, but they don't. That's why I cancelled them, fearing they would take days to run or eventually error out.
Unfortunately the only way I can see how much they use the GPU is by watching the temperature; usually it stays around 60-62 degrees, while with the new WUs it's around 50. Idle is 35-40. Oh, and ambient temperature has been higher than normal the last few days (35-38 degrees), so that's not the issue.
I'm not going to downgrade the nvidia-drivers, as I'm using the computers for many other things besides crunching, like games.

Anyway, I suppose I'll give it another go then and let it run to 100%, then report back here.
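
In the meantime, rather than guessing from the temperature, something like the sketch below that polls nvidia-smi should report utilization directly. It assumes a driver whose nvidia-smi supports the --query-gpu / --format options; older ones only offer the plain "nvidia-smi -q" report.

#!/usr/bin/env python
# Hypothetical sketch: poll GPU utilization and temperature via nvidia-smi
# instead of inferring load from temperature. Assumes the installed driver's
# nvidia-smi supports --query-gpu / --format.
import subprocess
import time

QUERY = ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,temperature.gpu",
         "--format=csv,noheader"]

while True:
    try:
        out = subprocess.check_output(QUERY).decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError) as err:
        print("nvidia-smi failed: %s" % err)
        break
    for line in out.strip().splitlines():
        print(line)      # e.g. "0, 93 %, 62"
    time.sleep(5)        # poll every 5 seconds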

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31404 - Posted: 11 Jul 2013 | 10:55:23 UTC - in response to Message 31388.
Last modified: 11 Jul 2013 | 10:58:08 UTC

Hi, Jim:

Both my NVIDIA GTX 650 Ti GPUs show driver 320.18 dated 12 May 2013.

John

John,

What driver do you (and your system builder) like these days?
I haven't noticed any problems with recent Nvidia drivers, but I can't say that they don't occur either.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31405 - Posted: 11 Jul 2013 | 11:11:27 UTC - in response to Message 31404.

Hi, Jim:

Both my NVIDIA GTX 650 Ti GPUs show driver 320.18 dated 12 May 2013.

John

John,

What driver do you (and your system builder) like these days?
I haven't noticed any problems with recent Nvidia drivers, but I can't say that they don't occur either.

Thanks. I was using 320.49 with no problems on my 650 Ti, but thought I would go back to 310.90 as a test.
But in general (unlike AMD drivers), the Nvidia ones all work the same for me.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31406 - Posted: 11 Jul 2013 | 12:36:49 UTC - in response to Message 31403.

I don't see a difference in temperature: still around 66°C, just like NATHAN's LRs.
But I have never seen any WU use 100% GPU load; NOELIA is now around 93%.
All still running fine, but a little slower than NATHAN's. Keep in mind that NOELIA uses different functionality, so these WUs can't be compared one to one, I guess.
____________
Greetings from TJ

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31407 - Posted: 11 Jul 2013 | 13:50:12 UTC - in response to Message 31392.

Argh! These NOELIA_xMG_RUN WUs are taking too long on my 650Ti, around 44h!! I aborted two of them, hoping for a NOELIA_klebe or NATHAN, but nope, it was one of these beasts or nothing..

What times do you guys see for these WUs?

Same here. I think Noelia has thrown us another curve without notice. Just as the NOELIA_klebe will not run on cards with less than 1GB, these NOELIA_xMG_RUN WUs look as if they run OK on 2GB cards but extremely slow on 1GB. Some of the earlier NATHAN WUs had a similar behavior on < 1GB GPUs and ran at 1/2 speed. He fixed them and the later NATHANs then ran fine on sub 1GB cards. I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31408 - Posted: 11 Jul 2013 | 14:26:04 UTC - in response to Message 31407.

Same here. I think Noelia has thrown us another curve without notice. Just as the NOELIA_klebe will not run on cards with less than 1GB, these NOELIA_xMG_RUN WUs look as if they run OK on 2GB cards but extremely slow on 1GB. Some of the earlier NATHAN WUs had a similar behavior on < 1GB GPUs and ran at 1/2 speed. He fixed them and the later NATHANs then ran fine on sub 1GB cards. I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

My experiences above were only on the NOELIA_klebe, so I don't know what problems will occur on the NOELIA_xMG_RUN. But my 660s have 2GB, and my 650 Ti has 1GB, so I guess I will find out.

Maybe they should have an opt-in for these larger sizes? I am sure there are plenty of cards around that can do them, it is just a question of getting the right work unit on the right card.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31409 - Posted: 11 Jul 2013 | 14:32:30 UTC - in response to Message 31408.

Same here. I think Noelia has thrown us another curve without notice. Just as the NOELIA_klebe will not run on cards with less than 1GB, these NOELIA_xMG_RUN WUs look as if they run OK on 2GB cards but extremely slow on 1GB. Some of the earlier NATHAN WUs had a similar behavior on < 1GB GPUs and ran at 1/2 speed. He fixed them and the later NATHANs then ran fine on sub 1GB cards. I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

My experiences above were only on the NOELIA_klebe, so I don't know what problems will occur on the NOELIA_xMG_RUN. But my 660s have 2GB, and my 650 Ti has 1GB, so I guess I will find out.

Maybe they should have an opt-in for these larger sizes? I am sure there are plenty of cards around that can do them, it is just a question of getting the right work unit on the right card.

We've asked and asked, and it should be simple to do. Maybe they don't know how, maybe they don't care? Who knows.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31410 - Posted: 11 Jul 2013 | 15:58:26 UTC
Last modified: 11 Jul 2013 | 16:00:20 UTC

These MG units are the first ones that run a bit differently compared to all the others before ^^ The single 560Ti 448 Cores in the Pentium 4 system runs MG units a bit faster than one of the two 570 cards in a Core2Duo system. I would suggest it is the card in the x4 slot. But I never saw the 570 with a higher runtime than the 560 before. It seems to me this is the first time that a bit of PCIe bandwidth is needed.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31411 - Posted: 11 Jul 2013 | 16:36:05 UTC - in response to Message 31410.

These MG units are the first ones that run a bit differently compared to all the others before ^^ The single 560Ti 448 Cores in the Pentium 4 system runs MG units a bit faster than one of the two 570 cards in a Core2Duo system. I would suggest it is the card in the x4 slot. But I never saw the 570 with a higher runtime than the 560 before. It seems to me this is the first time that a bit of PCIe bandwidth is needed.

Also looks like they run OK in 1279MB, so it seems they need more than 1024MB but somewhere less than 1279MB to run at an acceptable speed. Unfortunately that's too bad for most of us.

werdwerdus
Send message
Joined: 15 Apr 10
Posts: 123
Credit: 1,004,473,861
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31412 - Posted: 11 Jul 2013 | 16:46:11 UTC

I have a 7MG work unit running on 650 Ti on windows 7, GPU usage is 98% and it is 50% after 20 hours. Similar 7MG units are running on my 470 with about 85% done after 14 hours, and 660 Ti about 55% done after 5.5 hours, so I guess it is the GPU memory that is the problem. 650 Ti has 1GB, 470 has 1280MB, and 660 Ti has 2GB.
____________
XtremeSystems.org - #1 Team in GPUGrid

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31413 - Posted: 11 Jul 2013 | 17:05:42 UTC

It just took 8 hours 27 minutes for my 670 to finish a 7MG_RUN with a 151MB upload. I noticed that the NATHANs used over 95% of the CPU, while the NOELIAs use less than 50%. My 680s and 770 take about 7 hours 40 minutes. I have no choice but to use the 320.xx series drivers, otherwise the 770 won't work; 320.49 seems to be fine, but the other 320s are buggy (it's all over the internet).

It looks like we're going to have to buck up, bite the bullet and get through these work units. It's been in the mid 90s F here where I live in the Sierras, and I've had to shut down half my rigs for 6-8 hours every day. The San Joaquin Valley has been hitting 105-110°F, so I guess I'm lucky I live at 5000 feet; these are normal temperatures here in the summer, and it's tolerable with the humidity at 25%. I just wish it would cool off so I can keep all my rigs running 24/7.

Kenneth Larsen
Send message
Joined: 11 Feb 09
Posts: 6
Credit: 162,131,296
RAC: 0
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31414 - Posted: 11 Jul 2013 | 18:00:49 UTC

Just for your information, my graphics card is a GTX660 with 2GB of memory, and I'm still unable to run these WUs well. Maybe it's different on Linux than on Windows?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31415 - Posted: 11 Jul 2013 | 19:01:30 UTC - in response to Message 31414.

Kenneth... sorry there was no clear response before: nVidia driver 319 has been shown by at least 2 others to cause the issue you are describing. Downgrading to 310 has fixed it in both cases, so give it a try.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31418 - Posted: 11 Jul 2013 | 20:02:13 UTC
Last modified: 11 Jul 2013 | 20:03:01 UTC

A NOELIA 7MG is using 1329MB RAM on my GTX 480 (Win7 x64); another one (on a GTX 670, WinXP x64) started at 1188MB memory usage, and it's slowly rising. So these workunits won't fit in 1GB RAM. I had a stuck workunit; it made no progress after 6 hours, so I aborted it, but its page shows 0 sec runtime. The subsequent workunit also got stuck at 0% progress, but a system restart fixed this situation.

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31420 - Posted: 11 Jul 2013 | 20:50:59 UTC - in response to Message 31414.

Just for your information, my graphics card is a GTX660 with 2GB of memory, and I'm still unable to run these WUs well. Maybe it's different on Linux than on Windows?


One of the problems with Linux is the lack of good monitoring and GPU clock adjusting software. In Windows, when one WU finished and another started, especially when going from a NATHAN to a NOELIA, my GPU clock would change. Sometimes it would boost too high and cause errors; I am able to create profiles in PrecisionX and reset everything with one click.

I know there aren't very good apps for Linux (at least that I'm aware of) for doing this; having one would certainly help. I wish someone would write a good one soon, because I'll be switching to Linux when NVidia stops making drivers for XP x64 and Microsoft stops supporting it next year.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31422 - Posted: 11 Jul 2013 | 21:29:10 UTC - in response to Message 31407.

I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

Since all 9 of my NVidia GPUs are 1GB or less, and I can't get anything but these #*&$% NOELIA_1MG WUs, I'm off the project till something changes here. Think I'll have a lot of company,,, Sad :-(

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31423 - Posted: 11 Jul 2013 | 23:07:31 UTC - in response to Message 31422.
Last modified: 11 Jul 2013 | 23:29:59 UTC

This batch could well have a GDDR capacity issue for anything other than cards with 2GB GDDR (which suggests CUDA routine selection isn't working/doesn't exist in this run), and possibly a separate issue with Linux... I will plug in a rig tomorrow with a GTX650Ti and 304.88 to confirm this, but it's obviously going to take a while!

I would be reluctant to entirely blame the Linux drivers however - my GTX470 WU took 25h 20min (too long), and about 10min to upload, and I'm using the 304.88 driver (which was fine up to now, and on two different systems). The numbers don't look right for a ~2.4 times downclock (my GTX470 will only downclock to 405 or 50MHz), and NVidia X Server tells me that my 470 is at its FOC settings of 656MHz and 60°C (dual fan @72% and open case).

On Windows 7 and GPU's with 2GB GDDR I've had no issues (314.22 drivers).

The memory controller load is higher on these WU's (38% for a GTX660Ti and 31% on a GTX660, W7) and the app is 6.18 (an older one), so this looks like a basal (marker) run. I don't expect we will be seeing months of work on the 6.18 app. More likely a few days.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31424 - Posted: 11 Jul 2013 | 23:45:10 UTC - in response to Message 31422.

Since all 9 of my NVidia GPUs are 1GB or less, and I can't get anything but these #*&$% NOELIA_1MG WUs, I'm off the project till something changes here. Think I'll have a lot of company,,, Sad :-(

A NOELIA_1MG just crashed on my GTX 660 with 2 GB memory after 7 hours run time, so there is no guarantee that even more memory will fix it (Win7 64-bit, 314.22 drivers, supported by a virtual core of an i7-3770).

The other NOELIAs that I have received have been fine, though that is not the entire set. There are some good ones and some not-so-good ones.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31425 - Posted: 12 Jul 2013 | 6:33:21 UTC
Last modified: 12 Jul 2013 | 6:37:00 UTC

This 1MG also crashed on several machines, including my super-stable one: http://www.gpugrid.net/workunit.php?wuid=4583900
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31426 - Posted: 12 Jul 2013 | 8:45:32 UTC - in response to Message 31422.

I think maybe NOELIA has just knocked all 1GB GPUs off GPUGrid. Very sad indeed.

Since all 9 of my NVidia GPUs are 1GB or less, and I can't get anything but these #*&$% NOELIA_1MG WUs, I'm off the project till something changes here. Think I'll have a lot of company,,, Sad :-(

I agree 100% and have already moved my 650Ti to Einstein. Alas, Einstein's credit is SO lame!
____________

tbret
Send message
Joined: 2 Jan 13
Posts: 5
Credit: 233,329,525
RAC: 0
Level
Leu
Scientific publications
watwatwatwat
Message 31428 - Posted: 12 Jul 2013 | 11:41:28 UTC

This is just weird.

NVIDIA 320.18

WinXP computer with two reference 560tis is showing 14.5 hours elapsed, 3.5 hours to go, but 48% done. Ok..., that's not even close. GPU usage, 94-99%.

Another computer, another pair of 560Ti cards, another 320.18 driver -

Win7 Pro 28.25 hours elapsed, 2 hours to go, 92% done, GPU usage 97-99%.

Really? 30 hours on a 560Ti? 30 hours?

The 560s (no Ti) are warm and 96%, looks like they will take 18 hours. Those are 2GB cards.

Looks like 24 hours on a different SOC 560Ti but only 16 hours on a 560Ti-448. So, yeah, it looks like 1GB is just a little too little RAM, but 1.2GB is better. It's not great because a 560Ti-448 should fly compared to a 560.

The GTX470s (1.2GB) are doing them in about 15.3 hours. That makes them faster than the 560Ti-448. Yeah, that was a thing that made me go "Hmmmm."

The 660Tis and 670s are doing much, much better, of course; about 11.5 hours.

I've set NNT on my seven 560Tis. I've had multiple driver crashes and compute errors after 7 or 8 hours of crunching. I'll let what I've got either crunch or crash, but I don't want any more of these work units on a 560Ti and I don't want to have to change my drivers every time the work changes, so while I believe a downgrade might work, I'm unwilling. (call me lazy)

Ordinarily I'd say, "I don't care about the credits" but the fact is I'm in a little race with a friend so this time I do care. I don't care enough to get mad or upset, but I care a little.

Oh, all the CPU cores are idle other than feeding the GPUs in every case.

I'm just reporting-in.

GPUGRID
Send message
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31430 - Posted: 12 Jul 2013 | 13:30:57 UTC
Last modified: 12 Jul 2013 | 13:31:36 UTC

I didn't want to complain earlier, but with more than ten WUs failed since yesterday (no different than the other days) I feel compelled to do it. The NOELIAs, new and not so new ones, are failing on all my rigs and cards (690s and 770s).
Can we please change them?

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31432 - Posted: 12 Jul 2013 | 14:33:00 UTC

On my GTX550Ti with 1GB RAM, a NOELIA (klebe run) took 46h22m to finish without error. That's on a WinVista x86 rig with 2 CPU cores doing Einstein@home and nVidia driver 320.49.

On the 660s the NOELIAs have "normal" run-times. One is on Vista x64, also with driver 320.49; the other on Win7 x64 with driver 320.18.

With 1 error in 12 WUs I don't see the need to update the drivers just yet.

Happy crunching

____________
Greetings from TJ

tbret
Send message
Joined: 2 Jan 13
Posts: 5
Credit: 233,329,525
RAC: 0
Level
Leu
Scientific publications
watwatwatwat
Message 31434 - Posted: 12 Jul 2013 | 14:55:53 UTC

This isn't funny --- ok, so it is funny, but only because nothing burned-up:

I've now caught two computers, both Windows, both running Precision X, one running 3x 560 and one 3x 560Ti, resetting my manually set 100% fan speeds back to "auto" but only on ONE card. Really weird. Really strange. In both cases it was the middle (read: hottest) card (Device 1).

That's never happened before, but I'm guessing it is a driver-related failure caused by these new work units.

Oh, and the "time remaining" is increasing, so the 60% completed I reported earlier is probably better than the estimated time remaining.

AND as if that weren't enough, it's taking close to two hours to upload the 147MB results at 27.5kbps.

I'd say someone needs to take this work either:
A) back to the drawing board
B) out to the woodshed for a serious talking-to

GPUGRID
Send message
Joined: 12 Dec 11
Posts: 91
Credit: 2,730,095,033
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31435 - Posted: 12 Jul 2013 | 14:59:41 UTC - in response to Message 31434.
Last modified: 12 Jul 2013 | 15:01:01 UTC

This isn't funny --- ok, so it is funny, but only because nothing burned-up:

I've now caught two computers, both Windows, both running Precision X, one running 3x 560 and one 3x 560Ti, resetting my manually set 100% fan speeds back to "auto" but only on ONE card. Really weird. Really strange. In both cases it was the middle (read: hottest) card (Device 1).


Yeah. That happens when a WU fails on my machines too. But since it didn't automatically start a new WU, nothing burns. It will, however, lose all your Precision X presets and waste the processing done so far. Upsetting mode on.

Edit: I will say it again: can we (ok, you, the project guys) change the WUs? They aren't good and are upsetting users.

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,011,293
RAC: 1,606,503
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31436 - Posted: 12 Jul 2013 | 15:02:44 UTC

I can confirm the following:
GTX 570, Nvidia 311.06: 56x2-NOELIA_1MG_RUN1-0-2-RND1781_0 success with 71,288.79 s runtime.
GTX 670, Nvidia 311.06: 97x3-NOELIA_1MG_RUN-0-2-RND9119_0 success with 38,157.74 s runtime.
NOELIA_klebe tasks run on all three computers (GTX 650 Ti (2GB), GTX 570 and GTX 670) without major hiccups, except two that failed early on the GTX 570; since then no problem.

However, I noticed that these NOELIA 1MG and klebe tasks need around 900 to 1350 MB of GPU memory.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31446 - Posted: 12 Jul 2013 | 22:12:55 UTC

HELP! Nathan, where are you?

nanoprobe
Send message
Joined: 26 Feb 12
Posts: 184
Credit: 222,376,233
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31447 - Posted: 12 Jul 2013 | 22:42:04 UTC

These newer NOELIA klebe tasks seem to be taking longer and longer to finish. The old NOELIAs were 9-10hours. Then it went to 12-13 hours. This latest one is going to be in the 15-16 hour range using 750MB of memory on an MSI660TI PE.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31451 - Posted: 13 Jul 2013 | 0:20:50 UTC - in response to Message 31446.

HELP! Nathan, where are you?

He's hiding in the Short queue :)

These newer NOELIA klebe tasks seem to be taking longer and longer to finish. The old NOELIAs were 9-10hours. Then it went to 12-13 hours. This latest one is going to be in the 15-16 hour range using 750MB of memory on an MSI660TI PE.

Don't take it personally, the present Looooong NOELIA WU's don't like anyone.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31452 - Posted: 13 Jul 2013 | 0:29:24 UTC - in response to Message 31446.

HELP! Nathan, where are you?


He's on vacation. I see that downclocking my cards a little has helped reduce my error rate. There's only a finite number of WUs here; we've got to bite the bullet and chug through the weekend. I think Nathan will be back on Monday. He should be able to sort things out.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31455 - Posted: 13 Jul 2013 | 7:54:04 UTC - in response to Message 31452.
Last modified: 13 Jul 2013 | 7:55:34 UTC

HELP! Nathan, where are you?


He's on vacation, I see that down clocking my cards a little has helped reduce my error rate. There's only a finite amount of wu's here, we got to bite the bullet and chug through the weekend, I think Nathan will be back on Monday. He should be able to sort things out.

If I have read previous posts from Nathan correctly, every scientist runs her or his own WUs with different functionality, so Nathan would not interfere (at least not much).
I still have my clocks high, and the NOELIAs that do not error within the first minutes will finish, but take (a lot) more time. I call them ELRs, exceptionally long runs. Even on fast cards (770/690) they take long. I don't mind running them.
____________
Greetings from TJ

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31456 - Posted: 13 Jul 2013 | 9:06:22 UTC - in response to Message 31455.
Last modified: 13 Jul 2013 | 9:14:06 UTC

Nathan helps Noelia all the time; I guess you weren't following about a year ago. On my 680s they take 7.5 to 8 hours, which is not long to me. I didn't know you had a GTX770 or 690 - how long have you had those? My 770 is identical to my 680s time-wise; memory speed doesn't seem to make that much difference.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31458 - Posted: 13 Jul 2013 | 9:44:34 UTC

Guys, not all of us have high-end to super-high-end cards crunching! My 650Ti was able to chew all long WUs within 24h up until these latest NOELIAs (xMG_RUN) appeared, which take me more than 40h!! Not only is the credit low, the risk of losing too much work becomes greater!

What I can't understand is, why do NOELIAs have to consistently be so problematic? Can't they do some debugging and optimizations? If not, why don't they create another queue just for them - named ELR as TJ suggested - so people with the ultra capable cards can chew them and the rest of us continue as usual?

Of course, I know the answer: that queue would be taken up by very few people. Well, that's no excuse for force-feeding ALL of us with hard-to-digest WUs!
____________

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31460 - Posted: 13 Jul 2013 | 9:55:12 UTC - in response to Message 31456.
Last modified: 13 Jul 2013 | 10:03:21 UTC

Nathan helps Noelia all the time, I guess you weren't following about a year ago. On my 680's they take 7.5 hours to 8 hours, that's not long to me. I didn't know you had a GTX770 or 690, how long have you had those? My 770 is identical to my 680's time wise, memory speed doesn't seem to make that much difference.

I haven't got one yet, but a 770 is on the way. One can look at rigs that have them and compare those times with NATHAN's. That's what I did, and I saw they took longer.
____________
Greetings from TJ

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31461 - Posted: 13 Jul 2013 | 10:03:01 UTC

I must admit that I think Vagelis has a point. Having been around this project for a while, I see that only the best (and thus expensive) hardware can cope with the WUs lately. My GTX285 (a former workhorse) can no longer be used, as it would need 2.5-3 days to finish. The 550Ti is taking almost two days, so it is waiting for retirement as well.

I also just figured out that my second 660 sits in a MOBO that has PCIe 1.1 only, so that is (I think) the reason it takes more than a day to finish.
So staying with this project means that I (we) must invest heavily in hardware.
____________
Greetings from TJ

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31462 - Posted: 13 Jul 2013 | 10:36:43 UTC

It hasn't just been Noelia's wu's,

http://www.gpugrid.net/forum_thread.php?id=3116#26648

I have watched all the issues and problems very carefully, and when I chose components and built my 4 rigs, I went for horsepower rather than "bang for the buck". I didn't go all-out high end, but I built fairly powerful computers for this precise reason.

The work units are going to progress and become larger, that's inevitable, and as newer versions of the CUDA app are released, that will contribute to the WUs becoming larger still. It's going to take time for Noelia to get up to speed; we've gotten spoiled with Nathan because he's very good at this. Every time this happens, someone gets really upset and demands separate queues, or that the WUs get pulled and reworked, and I know it can be very frustrating when everything is going smoothly and then the bottom drops out.

You think I don't mind when a NOELIA crashes and takes out a CPDN model that had over 300 hours crunching? It's very frustrating, and it doesn't matter what kind of card I have. There are people who have had to quit altogether because the WUs were too big, and I've been trying to help someone out by sending him my old cards. I know how you guys feel.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31464 - Posted: 13 Jul 2013 | 11:57:31 UTC - in response to Message 31462.
Last modified: 13 Jul 2013 | 12:15:46 UTC

The best "bang for buck" cards were the mid-range cards, at least until the arrival of the GTX770, when the prices started to fall across the range. The GTX 670 might now be the best "bang for buck" card, or not far off (but it depends on the price and they change regularly even in the same country).

You need to make sure you get a 2GB model however! The faster algorithms require more memory and if you don't have the extra memory some WU's fail. There are 1GB and 2GB models of the GTX 650, GTX 650Ti. Anything above that is 2GB or more, with the exception of one OEM GTX 660 which is 1.5GB. Some of the lesser cards such as the GTX 645, GT 645 and GT 640 Rev. 2 are all 1GB only (so not recommended).

Again, Nate has work in the short queue!
If your runs are taking 40h and/or failing, set up a profile to get short tasks for that system. You might find fewer failures and quick returns quite refreshing, despite the credit difference. BTW, it might be the case that some work in the short queue is for a different project and will eventually result in its own publication badge (it has happened in the past).

In my opinion, some of Noelia's WUs are failing due to a CUDA bug that seems to occasionally raise its head, but it's better to continue developing than just to give up.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Trotador
Send message
Joined: 25 Mar 12
Posts: 103
Credit: 9,769,314,893
RAC: 39,662
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31467 - Posted: 13 Jul 2013 | 14:02:57 UTC

My experience with the latest NOELIAs on my Linux Ubuntu 12.04 box with Nvidia drivers 304.43 and two EVGA SC 660 Ti cards is satisfactory.

Only once did I have what seemed to be a driver crash, probably due to disabling one of the cards during the hottest hours of the day (hot summer here in Madrid), which involves restarting my 6.10.58 BOINC manager after changing cc_config.xml (see the sketch below).

NOELIA klebe take around 10.3 hours and NOELIA 1/7MG around 11 hours.
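
For reference, the kind of cc_config.xml entry involved is sketched below. This is the BOINC 7.x <exclude_gpu> syntax, which excludes one device for one project; my older 6.10.58 client uses a different option name, so treat this as an assumption and check the client configuration docs for your version:

<cc_config>
  <options>
    <!-- Sketch: stop using device 1 for GPUGRID only (BOINC 7.x syntax). -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>

After editing the file the client has to re-read it, which on my setup means restarting the manager.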

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31468 - Posted: 13 Jul 2013 | 14:47:01 UTC - in response to Message 31464.

I just have one crunching system with a single mid-range GPU; I'm no mega-cruncher like some of you guys. I may as well work only on the short queue until these NOELIAs disappear.

The whole thing does leave a bad taste in my mouth, though. My 650Ti on Ubuntu 12.04 crunched NATHAN, NOELIA_klebe, SDOERR WUs in ~18h, it's not like I was on the edge of being obsolete. These specific NOELIAs just KO'ed my card, and I'm pretty sure the vast majority of crunchers here.

I do understand the increasing complexity of the models and the increasing processing power available as time passes, these WUs though seem like a very aggressive step forward. IMHO, the researchers must take into account not only their research goals, but also the average (not the high-end) cruncher's crunching power.
____________

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31469 - Posted: 13 Jul 2013 | 14:52:06 UTC - in response to Message 31462.

You think I don't mind when a Noelia crashes and takes out a CPDN model that had over 300 hours crunching?

Interesting. My CPDN work is done on a different PC than the ones that do GPUGrid, and it looks like I will be keeping it that way. But I haven't really noticed that a Noelia crash takes out anything else (yet) on Win7 64-bit.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31470 - Posted: 13 Jul 2013 | 15:15:24 UTC - in response to Message 31468.

IMHO, the researchers must take into account not only their research goals, but also the average (not the high-end) cruncher's crunching power.

They do. That's why there are two queues here at GPUGrid.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31472 - Posted: 13 Jul 2013 | 15:45:53 UTC - in response to Message 31470.

IMHO, the researchers must take into account not only their research goals, but also the average (not the high-end) cruncher's crunching power.

They do. That's why there are two queues here at GPUGrid.

I don't want work units to crash, but what I really want is for my cards to be used efficiently. Some projects work so hard to be backward-compatible with older cards that you don't get the full value of your investment in a new card. At that point, I start looking for other projects.

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31473 - Posted: 13 Jul 2013 | 16:07:31 UTC - in response to Message 31469.

You think I don't mind when a Noelia crashes and takes out a CPDN model that had over 300 hours crunching?

Interesting. My CPDN work is done on a different PC than the ones that do GPUGrid, and it looks like I will be keeping it that way. But I haven't really noticed that a Noelia crash takes out anything else (yet) on Win7 64-bit.


I lost 4 models one day. It was the dreaded "ACEMD.2865P.exe*32 encountered an error and needs to close"; the CPDN models ranged from 328 hours, 256 hours and 198 hours down to 73 hours (I wrote them down). It's only happened twice; the other time it only took out 1 model.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31474 - Posted: 13 Jul 2013 | 17:14:28 UTC - in response to Message 31473.
Last modified: 13 Jul 2013 | 17:28:26 UTC

Unfortunately I can empathize with you all too well.
I would prefer these WU's were in a separate queue; Short queue, Long queue, Crashy the WU queue ;)
To be fair I've had 13 Noelia WU's finish and only 2 fail (both within a few minutes, which is a lot better than after 10h). That said I did edit the registry to try to prevent failures.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31475 - Posted: 13 Jul 2013 | 22:09:47 UTC - in response to Message 31474.

I may have identified the source of some problems with the present Noelia WU's. When I checked the Memory Controller Load it was 1% for a GTX 660Ti. The last time I looked it was around 40%. The GPU load was 98% and clocks were normal (high).


____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31476 - Posted: 13 Jul 2013 | 22:15:53 UTC - in response to Message 31475.

I may have identified the source of some problems with the present Noelia WU's. When I checked the Memory Controller Load it was 1% for a GTX 660Ti. The last time I looked it was around 40%. The GPU load was 98% and clocks were normal (high).

I saw that a couple of days ago with one of my cards, I think a 660. I exited BOINC as normal, and when it restarted, the Noelia errored out.

But that means the work unit could hang that way for a long time unless you manually intervene; not a fun thought.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31477 - Posted: 13 Jul 2013 | 22:35:04 UTC - in response to Message 31475.
Last modified: 13 Jul 2013 | 22:35:25 UTC

I may have identified the source of some problems with the present Noelia WU's. When I checked the Memory Controller Load it was 1% for a GTX 660Ti. The last time I looked it was around 40%. The GPU load was 98% and clocks were normal (high).

I have that too on my quad with the 660 still in it. I made some alterations with Precision X and did a reboot, but the MCU load stays at 1% and the GPU power sits around 62%. It has done 34% in 17 hours. The other 660, in the T7400, is doing great.

How did you fix this 1% MCU load problem, skgiven?
____________
Greetings from TJ

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31478 - Posted: 13 Jul 2013 | 22:43:16 UTC

Hello: It seems I have a problem on Linux/Ubuntu with the GTX 770 and NOELIA tasks: GPU performance is pitiful and there is no CPU usage.

I cannot go lower than driver 319.23, as older drivers do not support the 770.

Is there any forecast of when this issue will be resolved, or do we have to wait for a new driver from Nvidia?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31479 - Posted: 13 Jul 2013 | 22:46:11 UTC - in response to Message 31476.
Last modified: 13 Jul 2013 | 23:05:27 UTC

I've restarted Boinc, the system, suspended and resumed tasks to make them swap GPU and now both Noelia WU's are using 0 or 1% Memory Controller Load. The worrying thing is that one WU is at 52% after 24h, mostly on a GTX660Ti and the other is at 39% after 5h40min, but will no doubt take days since the memory controller load is banjaxed.
The GPU temperature, Fan speeds and Power targets are all down but the clocks are normal (high).

If I raise the Power Limit using Afterburner from 100% to 101% the GPU power drops from 65% to 56%, when I raise it to 102% it goes back to 65%. It appears that something is either being set to on or off. It doesn't matter what the percentage is, it changes to 65 then 56 and back to 65...

I'm going to dispose of the 314.22 drivers and try 310.90, but since I have not experienced the memory controller issues with other WU's I would say it's task related. I'm also seeing wonky driver restarts, but I've seen that before with Noelia WU's.

... No difference.
I will have to abort the WU's, as they will take days at 1% memory controller load.
Short queue here I come,
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31481 - Posted: 13 Jul 2013 | 23:01:29 UTC - in response to Message 31479.

Could it be hardware/software related? Your 660Ti isn't worse than my 660. Both 660s are exactly the same, both EVGA, not OC. One is in the T7400 with PCIe 2.0 and is doing well, with 93% GPU load, 65°C, 35% MCU load and 96% GPU power. It does a NOELIA in about 14 hours.
The other is in a quad core on PCIe 1.1 and shows 1% MCU load at 60% GPU power and 97% GPU load.
So there is something wrong. Your card is taking a lot of time to finish as well.
____________
Greetings from TJ

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31483 - Posted: 13 Jul 2013 | 23:06:19 UTC - in response to Message 31479.
Last modified: 13 Jul 2013 | 23:09:22 UTC

I'm going to dispose of the 314.22 drivers and try 306.97, but since I have not experienced the memory controller issues with other WU's I would say it's task related. I'm also seeing wonky driver restarts, but I've seen that before with Noelia WU's.

What kind of OS is running on this host?

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31485 - Posted: 13 Jul 2013 | 23:22:42 UTC - in response to Message 31483.

W7x64, but went to 310.90.

I aborted both WU's and started running short WU's. So far no issues, 6% in.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31486 - Posted: 13 Jul 2013 | 23:34:01 UTC

I have noticed recently that when exiting BOINC (7.0.64 x64) I have been getting crashes of the Nvidia drivers. But I have just upgraded to BOINC 7.2.4, and don't see this. Whether that has anything to do with the present Noelia problems is another matter, but it is worth watching.
http://boinc.berkeley.edu/dl/

(I am currently on the 311.06 drivers, a Windows update from the 310.90 drivers, but I think it happens on the other versions too.)

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31487 - Posted: 13 Jul 2013 | 23:54:36 UTC - in response to Message 31485.
Last modified: 13 Jul 2013 | 23:54:52 UTC

W7x64, but went to 310.90.

I have the feeling that Win7x64 is more prone to cause workunit errors (especially Noelia's) than WinXPx64.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31489 - Posted: 14 Jul 2013 | 0:18:53 UTC - in response to Message 31487.

I still think this might be a WU issue, but I've suspected for some time that hidden WDDM bugs could occasionally cause issues.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Chilean
Avatar
Send message
Joined: 8 Oct 12
Posts: 98
Credit: 385,652,461
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31491 - Posted: 14 Jul 2013 | 3:53:05 UTC - in response to Message 31486.

I have noticed recently that when exiting BOINC (7.0.64 x64) I have been getting crashes of the Nvidia drivers. But I have just upgraded to BOINC 7.2.4, and don't see this. Whether that has anything to do with the present Noelia problems is another matter, but it is worth watching.
http://boinc.berkeley.edu/dl/

(I am currently on the 311.06 drivers, a Windows update from the 310.90 drivers, but I think it happens on the other versions too.)


I think this is a bug with NOELIA's long WUs.

I switched over to short WUs only to avoid this NVIDIA driver crash everytime I suspend or exit BOINC which sometimes crashes my whole system and I'm forced to hard reboot.

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31493 - Posted: 14 Jul 2013 | 5:08:57 UTC

I've only had two or three that errored out. They were almost immediate, with one unit erroring for everyone. And the one that I just crashed on went to SAM, and he finished it.

The WUs are currently at 1.2 GB, AFAIK, and I haven't experienced any running for an abnormally long time. Although all my cards are on the high end side of things.

EDIT: I can say this. I thought the WU's were supposed to have swan_sync=0 enabled by default for 6xx+ cards? The latest Noelia units have not been doing this, and have been running at about 1/2 that, meaning a 2:1 GPU:CPU time ratio.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31495 - Posted: 14 Jul 2013 | 9:37:44 UTC - in response to Message 31493.
Last modified: 14 Jul 2013 | 9:41:28 UTC

These latest Noelia WU's use the older v6.18 Application. Previous Noelia WU's used v6.49. So that may explain behavior differences. Previous Noelia WU's did not use a full CPU core/thread (the only type of work that doesn't).

While some WU's complete successfully, there seem to be at least four types of problem:
Early failures after a few minutes,
Driver restarts that crash the work,
High GDDR memory usage that prevents some cards from being used,
A reduction in Memory Controller load which causes the tasks to appear to run normally (even faster going by GPU usage) but actually slows the work down massively (causing it to take days).

WU behavior may differ on different operating systems and with different drivers. GPU architectural differences may also be an issue, and these WU's might challenge the GPU in different ways, exposing weaknesses that were previously unseen with other WU's. It's not often you see 3 or 4 known good cards all fail a WU, and then see the WU succeed on another card.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31496 - Posted: 14 Jul 2013 | 9:51:40 UTC - in response to Message 31495.
Last modified: 14 Jul 2013 | 9:51:58 UTC

If I understand correctly, the 1% MCU load is a result of the WU and we cannot do anything about it?
Well, it has done 54% on the 660 in 24 hours, so aborting it seems a waste. I'll let it run for another day and then allow no new work for that rig.
____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31497 - Posted: 14 Jul 2013 | 10:19:53 UTC - in response to Message 31496.
Last modified: 14 Jul 2013 | 10:20:27 UTC

It will be interesting to see how that turns out.
Some WU's start normally and run normally for hours and then the Memory Controller load drops. From then on progress will be very slow.
This reminds me of what was happening in Linux for some WU's a few months back.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31501 - Posted: 14 Jul 2013 | 11:38:16 UTC - in response to Message 31497.

Since you mentioned it, I have kept a close eye on it, and I saw at least two Noelia WU's where the MCU was at 1% from the very start.
I could be wrong, but I have a bit of a feeling that it is hardware related as well. My quad has the "difficulties", while its CPU is not crunching at the moment. The 7-year-old high-end T7400 runs smoothly at steady loads. I can stop BOINC, reboot the system or power it down when I leave for longer, and the WU's keep going smoothly, taking about 1 hour longer than a Nathan did last week (on the 660).
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31505 - Posted: 14 Jul 2013 | 14:02:24 UTC - in response to Message 31487.

Zoltan wrote:
I have the feeling that Win7x64 is more prone to cause workunit errors (especially Noelia's) than WinXPx64.

This could well be. XP uses the old driver architecture versus WDDM on Vista/7/8, so they're actually on different branches now. Generally they should behave similarly, but corner cases, like bugs being triggered, would be expected to differ between them.

Carlesa25 wrote:
It seems that I have a problem on Linux / Ubuntu with the GTX 770 and Noelia tasks; performance on the GPU is pitiful and there is no CPU usage.
I cannot go back to anything older than 319.23, as earlier drivers do not support the 770.
Is there any forecast of when this issue will be resolved, or do we have to wait for a new driver from Nvidia?

Well, it's obviously a driver issue, since it works with the older versions. I can't see anything BOINC or GPU-Grid could do about this other than to inform nVidia and hope they'll fix it at some point. If the most recent beta drivers are still not working, chances are that nVidia doesn't yet know about this problem.

As a workaround you could switch the GTX770 to a Windows box, if you've got one. And... the issue applies to other WUs as well, doesn't it? Otherwise you could go for the short queue.

@1% MCU load: so far the only reports of this happening have been from SK and TJ. Are you guys just watching more closely than others.. or is the error only happening on your systems? In the latter case it could be the disabled driver watchdog (did you apply this registry change as well, TJ?). If something goes wrong in the GPU and normally the watchdog would reset the driver & GPU (with task failure or not, whatever)... and you disable the watchdog, then your GPU may just continue to do something in this strange state.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31507 - Posted: 14 Jul 2013 | 14:16:04 UTC - in response to Message 31505.
Last modified: 14 Jul 2013 | 14:17:13 UTC

@1% MCU load: so far the only reports of this happening have been from SK and TJ. Are you guys just watching more closely than others.. or is the error only happening on your systems? In the latter case it could be the disabled driver watchdog (did you apply this registry change as well, TJ?). If something goes wrong in the GPU and normally the watchdog would reset the driver & GPU (with task failure or not, whatever)... and you disable the watchdog, then your GPU may just continue to do something in this strange state.

MrS

No, I did not change this in the registry. I looked for it but didn't find it, so to not mess things up I left it alone. Yes, I am looking closely at these WU's at the moment, and I guess skgiven is too.

skgiven said:
To be fair I've had 13 Noelia WU's finish and only 2 fail (both within a few minutes, which is a lot better than after 10h). That said I did edit the registry to try to prevent failures.

Perhaps skgiven can give a hint about what needs to be changed. I suppose it is under Software, in the nVidia driver's key or the card manufacturer's?
____________
Greetings from TJ

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31511 - Posted: 14 Jul 2013 | 14:41:26 UTC

skgiven, do you mean the registry change that stops the Windows error message pop-ups from blocking the GPU/BOINC slot from continuing work? I added those on my systems too, or do you mean another regedit?
____________
DSKAG Austria Research Team: http://www.research.dskag.at



ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31513 - Posted: 14 Jul 2013 | 15:08:11 UTC - in response to Message 31511.

It's disabling the driver watchdog completely. It's supposed to stop errors from happening... but if there's a real error you'll have to go for the hard reboot.

Or was it increasing the watchdog timeout? Here SK suggested putting a 20 s tolerance in there (default 2 s). I tried it with 5 s and got a frozen screen for ~20 s (upon suspending a NOELIA) followed by the usual driver restart. Instead of also trying the 20 s setting I reverted the change completely.
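For reference, what is being edited here is (as far as I know) the Windows TDR watchdog, which lives under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers. A minimal sketch of raising the timeout, assuming the standard TdrDelay value documented by Microsoft (the value name and the 20 s figure are assumptions here, nothing GPUGRID-specific); run it elevated and reboot afterwards:

# Sketch only: raise the Windows GPU watchdog (TDR) timeout from its 2 s default.
# Assumes the standard TdrDelay DWORD; must run elevated, and a reboot is needed.
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
NEW_DELAY_SECONDS = 20  # the tolerance SK suggested; the driver default is 2

with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_READ | winreg.KEY_SET_VALUE) as key:
    try:
        old, _ = winreg.QueryValueEx(key, "TdrDelay")
        print("Current TdrDelay:", old)
    except FileNotFoundError:
        print("TdrDelay not set; driver default (2 s) in use")
    winreg.SetValueEx(key, "TdrDelay", 0, winreg.REG_DWORD, NEW_DELAY_SECONDS)
    print("TdrDelay set to", NEW_DELAY_SECONDS, "seconds")

Deleting the TdrDelay value again (or setting it back to 2) restores the default behaviour.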

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31514 - Posted: 14 Jul 2013 | 15:34:57 UTC - in response to Message 31505.
Last modified: 14 Jul 2013 | 15:37:05 UTC

Carlesa25 wrote:
It seems that I have a problem on Linux / Ubuntu with the GTX 770 and Noelia tasks; performance on the GPU is pitiful and there is no CPU usage.
I cannot go back to anything older than 319.23, as earlier drivers do not support the 770.
Is there any forecast of when this issue will be resolved, or do we have to wait for a new driver from Nvidia?

Well, it's obviously a driver issue, since it works with the older versions. I can't see anything BOINC or GPU-Grid could do about this other than to inform nVidia and hope they'll fix it at some point. If the most recent beta drivers are still not working, chances are that nVidia doesn't yet know about this problem.

As a work around you switch the GTX770 to a windows box, if you've got any.

I suggest a different workaround:
As the GTX 770 is basically a GTX 680 with higher clocks, the GTX 770 should work with the previous drivers if you include the appropriate line in the nv4_dispi.inf file (this method is for Windows only, so a Linux guru should tell us how to do it under Linux).
You should look for:
%NVIDIA_DEV.1180% = Section021, PCI\VEN_10DE&DEV_1180
You should copy the whole line below the original (the section number may be different), and then change both occurrences of 1180 to 1184:
%NVIDIA_DEV.1184% = Section021, PCI\VEN_10DE&DEV_1184
and then you should look for:
NVIDIA_DEV.1180 = "NVIDIA GeForce GTX 680"
copy the whole line below the original and then change 1180 to 1184, and 680 to 770:
NVIDIA_DEV.1184 = "NVIDIA GeForce GTX 770"
After these modifications the old driver should recognize the new card.
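If you'd rather not hand-edit the file, a throwaway script along these lines should do the same duplication. This is only a sketch, assuming the stock inf still contains the DEV_1180 lines quoted above; keep a backup, and note that a modified inf breaks the package signature, so Windows will warn about an unsigned driver:

# Sketch: duplicate the GTX 680 (DEV_1180) entries in nv4_dispi.inf as GTX 770 (DEV_1184).
# Assumes the inf is plain ASCII text laid out as quoted above; a .bak copy is written first.
from pathlib import Path

inf = Path(r"nv4_dispi.inf")  # point this at the inf inside the extracted driver package

original = inf.read_text(encoding="ascii", errors="ignore").splitlines(keepends=True)
patched = []
for line in original:
    patched.append(line)
    if "DEV_1180" in line or "NVIDIA_DEV.1180" in line:
        # copy the GTX 680 line and turn it into a GTX 770 one, as described above
        patched.append(line.replace("1180", "1184").replace("GTX 680", "GTX 770"))

inf.with_name(inf.name + ".bak").write_text("".join(original), encoding="ascii")
inf.write_text("".join(patched), encoding="ascii")
print("Added", len(patched) - len(original), "GTX 770 line(s)")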

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31515 - Posted: 14 Jul 2013 | 15:43:03 UTC - in response to Message 31505.

so far the only reports of this happening have been from SK and TJ. Are you guys just watching more closely than others.. or is the error only happening on your systems?


Yesterday I caught one that had been running for 10.5 hours and was at 0.00%, GPU at 99%, memory controller at 0%. I aborted it; it was a first run (ended in 0).

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31516 - Posted: 14 Jul 2013 | 16:37:14 UTC - in response to Message 31515.

Hello: What I do not understand is that short tasks work perfectly on my GTX770 (Ubuntu 13.04) while the long NOELIAs fail; so far I have not tried a different long task.

I think the problem is in the Noelia tasks.

Moreover, I finished seven Einstein tasks perfectly, no problems. Greetings.

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31517 - Posted: 14 Jul 2013 | 17:11:36 UTC

Einstein tasks are entirely different, and also use an older version of CUDA.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31518 - Posted: 14 Jul 2013 | 17:26:46 UTC - in response to Message 31517.
Last modified: 14 Jul 2013 | 17:28:03 UTC

Einstein tasks are entirely different, and also use an older version of CUDA.

That's true, but Einstein, Albert and Milkyway are nice projects to test your setup and drivers. Most WU's run fast, so you can see what happens.

I agree with Carlesa25 and others that this set of Noelia's shows strange behavior. I use drivers 320.49 and 320.18 for the Noelia's and most finish, but they take longer or error out almost immediately. The error rate is higher. I don't think a driver change will resolve all the problems, and Linux is more efficient than Windows when crunching.
____________
Greetings from TJ

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31519 - Posted: 14 Jul 2013 | 17:37:12 UTC

Such a pity about the NOELIA tasks: wasting energy and computer resources.....

John

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31520 - Posted: 14 Jul 2013 | 17:58:38 UTC - in response to Message 31519.

Hi: Noelia tasks in Windows 8 64-bit are working well so far on the GTX770, with 86% GPU load + 22% CPU.

In Ubuntu they keep failing.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31523 - Posted: 14 Jul 2013 | 19:52:08 UTC - in response to Message 31520.

Three people reported the 1% MCU load, and I changed drivers just in case they were the issue. You have to be using GPUZ to see the MCU load; it's not listed in Precision/Afterburner. I suggest anyone seeing a task run for ages check this in GPUZ.

I didn't disable the driver watchdog, I just gave WU's a bit more time before the system restarts the drivers. Prior to doing this, I had driver restarts for most task types and higher error rates. I think it has fixed some issues but not others. I still get driver restarts, but they are rarer. Occasionally I get some screen lag, but I'm now using the iGPU for display.

While the 319 drivers might have issues, I don’t think the drivers are to blame with these WU’s. The GTX770 would probably work with older Linux drivers. I’ve done this before on Linux and Windows and Boinc recognised the cards correctly in both cases.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31524 - Posted: 14 Jul 2013 | 20:03:21 UTC - in response to Message 31523.

Hello: I'm using EVGA Precision and it gives the same readings on all parameters as GPU-Z. Driver 320.49.

At present, after two hours of work it has done 27% of the task, which is fine.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31525 - Posted: 14 Jul 2013 | 22:01:43 UTC - in response to Message 31524.
Last modified: 14 Jul 2013 | 23:56:22 UTC

I don't think EVGA Precision gives a Memory Controller load reading, so you have to use GPUZ to get that.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31528 - Posted: 14 Jul 2013 | 23:31:20 UTC - in response to Message 31525.

I don't think EVGA Precision gives a Memory Controller load reading, so you have to use GPUZ to get that.


SK is right, you can't read the memory controller load in PrecisionX, you need GPU-Z or GPU-Shark to see the controller.

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31529 - Posted: 14 Jul 2013 | 23:44:28 UTC - in response to Message 31528.

I don't think EVGA Precision gives a Memory Controller load reading, so you have to use GPUZ to get that.


SK is right, you can't read the memory controller load in PrecisionX, you need GPU-Z or GPU-Shark to see the controller.


Hello: I have GPU-Z 0.72 and EVGA-PrecisionX 4.2.0.2143 installed and in use, and they show the same sensor readings as one another, so I do not really understand what you are saying.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31530 - Posted: 15 Jul 2013 | 0:01:54 UTC - in response to Message 31529.

GPUZ
Sensors Tab,
Memory Controller Load.

~28% is normal for a GTX660 and 37% for a GTX660Ti, but some WU types are a bit higher and some a bit lower. If the load is 1% it's really bad!

____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31535 - Posted: 15 Jul 2013 | 5:34:57 UTC
Last modified: 15 Jul 2013 | 5:35:17 UTC

That's correct, Precision X does not show MCU load or GPU power, but EVGA NV-Z does.
I use that most of the time.
____________
Greetings from TJ

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31537 - Posted: 15 Jul 2013 | 10:02:13 UTC - in response to Message 31535.
Last modified: 15 Jul 2013 | 10:07:43 UTC

Hi: The Noelia task in Windows 8 completed smoothly.

It took 28,675 sec, almost the same as the GTX680 I could compare it with.

Attached is a screenshot with the GPUZ and EVGA readings, which give the same data.

https://picasaweb.google.com/lh/photo/-bwUatfEBByRcJpZTfJikdMTjNZETYmyPJy0liipFm0?feat=directlink

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31539 - Posted: 15 Jul 2013 | 12:56:27 UTC

I was going to keep my GTX 650 Ti with 1GB memory on Longs even with the occasional crash, but it has occurred to me that this might give the wrong feedback to GPUGrid. That is, if they get 5 errors (or whatever their limit is), they might conclude that it is a bad work unit and discard it, when in fact it is merely due to 5 cards with not enough memory. Therefore, I am going to Shorts, but also enabling Beta testing, so they can try out their stuff (if they choose to) before releasing it. (I make sure my card is running at the default Nvidia chip speed rather than the card factory overclock, since a test of how unstable the card is does not do them any good in evaluating their work units).

But if they knew about the larger memory requirement, I think they should have communicated that to us in advance to save us all a lot of trouble. And if they didn't, then that is what the Betas are for.

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31540 - Posted: 15 Jul 2013 | 13:56:17 UTC

Ah, Carlesa. You leave K-Boost off, I see. That's why it shows your FB %. Then yes, they do show the same data.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31543 - Posted: 15 Jul 2013 | 19:40:41 UTC
Last modified: 15 Jul 2013 | 19:41:22 UTC

I've tried the latest (320.49) driver on my least reliable host (it has a WinXPx64), and it became completely unreliable :) Every (Noelia) task is stuck at 0% with 0% GPU usage using the 320.49 driver (btw it's CUDA 5.5).
I'm reverting to the driver downloaded by the Windows Update (307.9)

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31544 - Posted: 15 Jul 2013 | 19:50:11 UTC - in response to Message 31543.

I've tried the latest (320.49) driver on my least reliable host (it has a WinXPx64), and it became completely unreliable :) Every (Noelia) task is stuck at 0% with 0% GPU usage using the 320.49 driver (btw it's CUDA 5.5).
I'm reverting to the driver downloaded by the Windows Update (307.9)



Hello: In Windows 8 64-bit I am using 320.49 without problems, for both short tasks and Noelia's.

Profile Beyond
Avatar
Send message
Joined: 23 Nov 08
Posts: 1112
Credit: 6,162,416,256
RAC: 0
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31545 - Posted: 15 Jul 2013 | 20:17:21 UTC - in response to Message 31544.

One long WU is probably not enough to know it's running without problems, but you never can tell.

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,011,293
RAC: 1,606,503
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31546 - Posted: 15 Jul 2013 | 21:33:26 UTC

I just got one of the famous xMG WUs on my EVGA GTX 650 Ti with 2GB:
25x2-NOELIA_1MG_RUN-0-2-RND7421
OC Scanner says: GPU load 93%, MemLoad 1537 MB, MCU load 45% at 2.5% progress.
It's the first of those on this card; I will keep you informed.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31547 - Posted: 15 Jul 2013 | 21:38:29 UTC - in response to Message 31516.

What I do not understand is that short tasks work perfectly on my GTX770 (Ubuntu 13.04) while the long NOELIAs fail; so far I have not tried a different long task.

I think the problem is in the Noelia tasks.

The NOELIA tasks make the error in the driver appear, but they are not causing it. Noelia is testing new functionality; that's why the error doesn't appear with short queue tasks or other long runs.

Again, the exact problem you're describing has been seen by at least 2 others and has been solved by downgrading the driver. That's the best we can offer, including the .inf mod proposed by Zoltan. Or keep it running under Win, of course.

John C MacAlister wrote:
Such a pity about the NOELIA tasks: wasting energy and computer resources.....

Noelia is trying to make new functionality work, and GDF said they definitely need them. So while the execution of these tests may be lacking, the results are not worthless. I wonder if they'd be better off in the beta queue, though.

MrS
____________
Scanning for our furry friends since Jan 2002

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31548 - Posted: 15 Jul 2013 | 23:14:27 UTC - in response to Message 31547.
Last modified: 15 Jul 2013 | 23:16:37 UTC

The NOELIA tasks make the error in the driver appear, but they are not causing it. Noelia is testing new functionality; that's why the error doesn't appear with short queue tasks or other long runs.

Again, the exact problem you're describing has been seen by at least 2 others and has been solved by downgrading the driver. That's the best we can offer, including the .inf mod proposed by Zoltan. Or keep it running under Win, of course.


Hello: Thanks for the comment.

I'll wait for the final release of the Nvidia 325.08 driver for Linux (currently in beta, I do not think it will take long) and see if it solves it; in any case I can always move to Windows without problem.

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31550 - Posted: 16 Jul 2013 | 0:53:26 UTC - in response to Message 31547.

Many thanks:

Yes, I think these would be better in the Beta queue so we would be able to decide whether we want to take the risk of their failing. Good suggestion!

Profile ritterm
Avatar
Send message
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31551 - Posted: 16 Jul 2013 | 1:17:32 UTC - in response to Message 31546.

I just got one of the famous xMG WUs on my EVGA GTX 650 Ti with 2GB:
25x2-NOELIA_1MG_RUN-0-2-RND7421...

I, too, just got my first xMG (44x5-NOELIA_7MG_RUN-0-2-RND1940) for my GTX 570. Holding my breath...
____________

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31553 - Posted: 16 Jul 2013 | 10:10:39 UTC - in response to Message 31551.
Last modified: 16 Jul 2013 | 10:41:05 UTC

I see Nathan WU's back in the Long queue.

My last NOELIA worked fine on Linux 304.88:
041px50x3-NOELIA_klebe_run4-2-3-RND3727_0 4590724 15 Jul 2013 | 3:35:35 UTC 15 Jul 2013 | 20:47:59 UTC Completed and validated 48,089.27 20,649.70 135,300.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31560 - Posted: 16 Jul 2013 | 14:57:39 UTC - in response to Message 31544.
Last modified: 16 Jul 2013 | 15:02:36 UTC

I've tried the latest (320.49) driver on my least reliable host (it has a WinXPx64), and it became completely unreliable :) Every (Noelia) task is stuck at 0% with 0% GPU usage using the 320.49 driver (btw it's CUDA 5.5).
I'm reverting to the driver downloaded by the Windows Update (307.9)



Hello: In Windows 8 64-bit I am using 320.49 without problems, for both short tasks and Noelia's.

I have mentioned it before: my quad core with Vista x86 is using driver 320.49 with a GTX550Ti and has relatively the fewest errors of my rigs. It does the Noelia´s, but very slowly; that´s the card. It seems a bit of "bad luck". I have checked a lot of my WU´s and the wingmen (in case of error) and I saw a lot of "error while downloading", so those failed even before the Noelia WU itself.
____________
Greetings from TJ

Profile ritterm
Avatar
Send message
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31562 - Posted: 16 Jul 2013 | 20:02:57 UTC - in response to Message 31551.

I, too, just got my first xMG (44x5-NOELIA_7MG_RUN-0-2-RND1940) for my GTX 570. Holding my breath...

Okay, so that wasn't so bad:

Run time 65,793.69
CPU time 3,755.88
Validate state Valid
Credit 150,000.00

Per GPU-Z, Avg GPU Load = 85%, Max GPU Memory = 1050MB
____________

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31563 - Posted: 16 Jul 2013 | 20:28:23 UTC - in response to Message 31562.

No, the Noelia´s weren´t too bad. I have Nathan´s again and I see a GPU load between 82-88% and an MCU load of 28-30%, with GPU time and CPU time almost the same.
The Noelia´s had 93% GPU load and 37-45% MCU load and used little CPU.
____________
Greetings from TJ

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31565 - Posted: 16 Jul 2013 | 22:09:05 UTC

The Noelia _MG7_ on the GTX660 with 1% MCU load finished without error!
7043246 4587792 154818 12 Jul 2013 | 15:39:52 UTC 16 Jul 2013 | 21:56:27 UTC Completed and validated 192,242.79 83,933.54 100,000.00 Long runs (8-12 hours on fastest card) v6.18 (cuda42)

A run time of more than 2 days is of course no joy. I will try a short run; if the card does not crunch properly then I'll remove it from the quad core with x86 Vista.
____________
Greetings from TJ

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31566 - Posted: 16 Jul 2013 | 22:09:20 UTC

Sadly, I wish I could agree - the NOELIA_klebes have been running OK (just about), however the *MGs have been a nightmare. I noticed GpuGrid's total performance has dropped by some 15% in the last 3 months - so I guess the jury is out on that.
Unfortunately communication has been 'limited at best' on the new WU's: no hardware requirements, no details on the level of testing they have gone through, what they are for - nothing.
I just hope the quality of the science is up to a higher standard than the management of GpuGrid at the moment, otherwise I suspect we are all wasting our time and money.

Profile ritterm
Avatar
Send message
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31568 - Posted: 17 Jul 2013 | 0:46:37 UTC

And another one bites the dust... 4x2-NOELIA_1MG_RUN1-0-2-RND8035_3. I guess I shouldn't pile on.
____________

klepel
Send message
Joined: 23 Dec 09
Posts: 189
Credit: 4,196,011,293
RAC: 1,606,503
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31571 - Posted: 17 Jul 2013 | 1:52:19 UTC - in response to Message 31546.

I just got one of the famous xMG WUs on my EVGA GTX 650 Ti with 2GB:
25x2-NOELIA_1MG_RUN-0-2-RND7421
OC Scanner says: GPU load 93%, MemLoad 1537 MB, MCU load 45% at 2.5% progress.
It's the first of those on this card; I will keep you informed.

The above WU finished successfully! I checked memory load several times and it was always about the same: 1500 MB.

GPU time: 73,064.51 s, CPU time: 33,299.85 s; the low credit is due to my low upload speed - missed the 24 hour deadline again…

So I am pretty satisfied with my GTX 650 Ti with 2 GB memory. The only sad thing: I just purchased one of the two cards on Amazon "used – like new" at a price well below the 1 GB versions (new and used). Missed an opportunity…

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31574 - Posted: 17 Jul 2013 | 8:51:01 UTC

I'm currently crunching one of these dreaded NOELIAs on my 1GB 650Ti:

Slot: 0
Task: 35x11-NOELIA_7MG_RUN3-0-2-RND0597_0
Elapsed: 10:30
CPU time: 04:43
Percent done: 24.66
Estimated: 42:35
Remaining: 32:04

It's consuming 622MB on the card, CPU usage is ~45%, so it appears to be progressing normally - sorry, no GPU load info, I'm on Linux.
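(For the record, nvidia-smi can report both figures on Linux from the command line; here is a rough polling sketch, with the assumptions being that nvidia-smi is on the PATH and that its 'utilization.memory' counter corresponds to what GPU-Z calls Memory Controller Load:)

# Sketch: poll GPU core load and memory-controller load via nvidia-smi on Linux.
# Assumes nvidia-smi is on the PATH and the driver exposes these utilization counters.
import subprocess, time

QUERY = ["nvidia-smi",
         "--query-gpu=index,utilization.gpu,utilization.memory,memory.used",
         "--format=csv,noheader,nounits"]

while True:
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    for row in out.strip().splitlines():
        gpu, core, mcu, mem = [field.strip() for field in row.split(",")]
        print(f"GPU {gpu}: core {core}%, memory controller {mcu}%, {mem} MiB used")
    time.sleep(30)  # sample every 30 seconds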

Let's see how it goes!
____________

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31575 - Posted: 17 Jul 2013 | 9:25:20 UTC

Another 'wonderful' unannounced WU: NOELIA_2HRUN - 25% in 7.5hrs (30hrs estimated) on a 1gb 650ti.
Terrific..

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31577 - Posted: 17 Jul 2013 | 11:08:51 UTC - in response to Message 31575.

re: NOELIA_2HRUN - at least it appears to only be using 714MB GPU mem (of 1024MB); mem controller usage is low at 21% (normally 41% for non-NOELIA); GPU usage 99%, CPU 30%.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31578 - Posted: 17 Jul 2013 | 11:10:37 UTC

It is a good thing that I retired my 1GB GTX 650 Ti, since my 2GB GTX 660s are now getting only the larger work units.

I am presently running 2-NOELIA_2HRUN at 1406 MB, and a 44x1-NOELIA_1MG at 1135 MB. The former is running at only a 5% memory load, but I will just let it run and see if it completes. Maybe that is the way it is supposed to work? Since nobody tells us, we are left to figure this out for ourselves.

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31579 - Posted: 17 Jul 2013 | 11:25:20 UTC - in response to Message 31578.
Last modified: 17 Jul 2013 | 11:36:54 UTC

Maybe a memory allocation issue has been fixed. I noticed in the task properties that the NOELIA estimated app speed is 96.06 GFLOPs/sec (650ti) whereas a SANTI_bax is 516 GFLOPs/sec (670 @ 72% power). Normally my 650s are about 70% of the performance of my 670 - wondering if double precision is being more heavily used? Then again it is probably more related to cache size (256k v 512k).

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31583 - Posted: 17 Jul 2013 | 12:17:12 UTC

MW is the only project that needs DP.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31584 - Posted: 17 Jul 2013 | 12:35:24 UTC - in response to Message 31583.

If you have a look at the Acellera website, which I understand is the commercial spin-off of GpuGrid, they state otherwise:

"ACEMD uses a mixed-precision scheme, in which different parts of the MD computations are performed with different levels of precision depending on their numerical requirements. Double precision is used where required for numerical stability.
Because many GPUs have much higher performance for single-precision floating-point arithmetic than double precision, this mixed-precision scheme allows ACEMD to fully exploit the GPU without sacrificing the quality of simulation.

The validation tests for ACEMD demonstrate that its simulations converge to results comparable to that of an exclusively double-precision code such as NAMD. Other codes use a similar scheme. In their work describing the design of special purpose MD hardware, DE Shaw et al detail how numerical precision can be safely varied throughout the MD computations [link]."

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31585 - Posted: 17 Jul 2013 | 12:51:46 UTC - in response to Message 31584.
Last modified: 17 Jul 2013 | 12:54:03 UTC

If you have a look at the Acellera website, which I understand is the commercial spin-off of GpuGrid, they state otherwise:

"ACEMD uses a mixed-precision scheme, in which different parts of the MD computations are performed with different levels of precision depending on their numerical requirements. Double precision is used where required for numerical stability.

That makes sense. I have noticed that the three Noelia failures from my GTX 660s that were later completed successfully by others were all completed on higher-level cards (GTX 670, GTX 680 and a GTX 690). Those very likely have higher-performance floating point units than the GTX 660.

I think the light bulb is beginning to go on.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31586 - Posted: 17 Jul 2013 | 13:14:37 UTC

Huh, ok, thanks - that must be very new info; it's not long ago that there was no use of DP here O.o
____________
DSKAG Austria Research Team: http://www.research.dskag.at



5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31588 - Posted: 17 Jul 2013 | 14:20:46 UTC

Just taking a wild guess, and saying the DP, if any, could be sent to the CPU.

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31592 - Posted: 17 Jul 2013 | 16:23:57 UTC

If DP were a bottleneck for the Noelias then I would have expected the Fermi-based cards to equal or outperform their Kepler equivalents. I have noticed a 650ti is typically equal in performance to a 560ti with non-Noelias, but I'm not sure about Noelias.
I guess it is possible ACEMD executes DP on the CPU 'core', though I would have thought the GPU would still be faster even though it has been castrated in Kepler.
Still, I am not convinced DP is at fault here, as we know the Noelias are working on very large molecules - logically increasing the likelihood of execution stalls, which is why I suspect cache size may be so important.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31594 - Posted: 17 Jul 2013 | 16:48:43 UTC - in response to Message 31592.

Still, I am not convinced DP is at fault here, as we know the Noelias are working on very large molecules - logically increasing the likelihood of execution stalls, which is why I suspect cache size may be so important.

We users need a "super-size" category to know what to put on the project. Otherwise, the stalls will cause bonus points (and more importantly the work itself) to go out the window.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31595 - Posted: 17 Jul 2013 | 17:38:17 UTC - in response to Message 31594.

As far as I know, GPUGrid has not used and does not use fp64 (double precision). What the ACEMD application can do is a different question. Research groups that need CUDA and double precision could use ACEMD along with GK110 Titans or 780's (at least in theory), as those have superior double precision compared to the GK10x cards.

I was running a NOELIA_1MB for >14h on a GTX660Ti, but it was only at 22%. I saw that the GPU temperature was too low, looked in GPUZ and found that the memory controller load was once again at 1%.
I suspended the WU and forced it to run on the other GPU. After ~1 min I got a driver restart, and after that there was no progress.
I suspended and resumed again, and while the WU is progressing, GPUZ still says 1% MCU. The time remaining keeps rising, and going by the progress increase (about 1/5th of a normal WU), it was likely to take another 2.3 days to complete.
I aborted the WU because this is abnormal behavior.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile Carlesa25
Avatar
Send message
Joined: 13 Nov 10
Posts: 328
Credit: 72,619,453
RAC: 251
Level
Thr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31596 - Posted: 17 Jul 2013 | 20:27:52 UTC - in response to Message 31595.

Hello: Over an hour ago I started a new long NATHAN task in Ubuntu 13.04 and it is working perfectly on the GTX770, with 89% CPU.

As repeatedly said, it is clear that the NOELIAs (beyond the Nvidia driver etc...) have a problem that requires proper analysis if the project wants to keep work stable; otherwise it is just lost time and lost interest in the project.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31597 - Posted: 17 Jul 2013 | 21:12:54 UTC - in response to Message 31596.

As repeatedly said, it is clear that the NOELIAs (beyond the Nvidia driver etc...) have a problem that requires proper analysis if the project wants to keep work stable; otherwise it is just lost time and lost interest in the project.

Quite so. It is not clear whether this is a random problem with this batch of work, or whether a lesson has been learned that will prevent it from happening again. In fact, it is not even clear whether GPUGrid considers it to be a problem at all, or merely an acceptable cost of doing business. But for those of us who do not want to baby-sit our rigs, it would be of interest to know the answers.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31598 - Posted: 17 Jul 2013 | 21:15:25 UTC - in response to Message 31595.

SK wrote:
As far as I know, GPUGrid has not and does not use fp64 (double precision). What the ACEMD application can do is a different question.

I agree. And as long as GT240 and similar cards can still run the code (although too slowly nowadays) we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units).

Jim1348 wrote:
Those (GTX 670, GTX 680 and a GTX 690) very likely have higher-performance floating point units than the GTX 660.

No, they don't. They're using the exact same SMXs as building blocks, down to the smallest Kepler. What differs is just the number and clock speed of these units. The exception is GK110 (Titan and GTX780), which did indeed get more DP units (but again of the same type).

MrS
____________
Scanning for our furry friends since Jan 2002

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31599 - Posted: 17 Jul 2013 | 21:53:31 UTC - in response to Message 31598.
Last modified: 17 Jul 2013 | 21:54:10 UTC

And as long as GT240 and similar cards can still run the code (although too slowly nowadays) we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units).

We haven't checked the latest batch; the Keplers can't run them, so there is no guarantee about the GT 240. But the cause of the difference is not so important as the fact that it is there, in beta-test form, or maybe even alpha.

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31600 - Posted: 17 Jul 2013 | 22:24:40 UTC - in response to Message 31599.

My guess is that Noelia went back to a previous app, for scientific/testing/reassessment reasons.

If at first you don't succeed...

GL
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Betting Slip
Send message
Joined: 5 Jan 09
Posts: 670
Credit: 2,498,095,550
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31601 - Posted: 17 Jul 2013 | 22:28:41 UTC

Nice to see the interaction between project managers and the contributors that make that project possible.

Oh, sorry, there hasn't been any!
____________
Radio Caroline, the world's most famous offshore pirate radio station.
Great music since April 1964. Support Radio Caroline Team -
Radio Caroline

5pot
Send message
Joined: 8 Mar 12
Posts: 411
Credit: 2,083,882,218
RAC: 0
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31607 - Posted: 17 Jul 2013 | 23:36:21 UTC - in response to Message 31599.

And as long as GT240 and similar cards can still run the code (although too slowly nowadays) we can be sure that not a single DP instruction is needed from the hardware (those chips don't have any such units).

We haven't checked the latest batch; the Keplers can't run them, so there is no guarantee about the GT 240. But the cause of the difference is not so important as the fact that it is there, in beta-test form, or maybe even alpha.


My 680s ran all the Noelia's, save several that failed within 10-30s. However, for the ones that failed, I've successfully completed other tasks of the same WU type, as there were multiple Noelia WUs out and about.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31609 - Posted: 18 Jul 2013 | 0:04:50 UTC - in response to Message 31607.

My 680s ran all the Noelia's, save several that failed within 10-30s. However, for the ones that failed, I've successfully completed other tasks of the same WU type, as there were multiple Noelia WUs out and about.

I just completed a 2-NOELIA_2HRUN (a new type for me) in 25 hours 31 minutes. There is nothing necessarily wrong with that; you don't get so many bonus points, but so what. I just think they need to alert the users about the different requirements, so that you can base your GPU purchasing decisions accordingly. It seems to be a new era; maybe when the dust settles, they will offer some guidance. Otherwise, it is just hit-or-miss as to what cards will work on what work units.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31610 - Posted: 18 Jul 2013 | 0:07:47 UTC - in response to Message 31609.

It seems to be a matter of pure luck :-) I just got a Santi SR error after 5000 seconds on a GTX660, so it's not only Noelia.
____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31614 - Posted: 18 Jul 2013 | 8:42:46 UTC - in response to Message 31601.
Last modified: 18 Jul 2013 | 8:44:37 UTC

Nice to see the interaction between project managers and the contributors that make that project possible.

Oh, sorry, there hasn't been any!

Communications are limited to say the least, but at least 2 of the researchers are on leave and that means the others have to keep the project running, which is a challenge in itself.

I just think they need to alert the users about the different requirements, so that you can base your GPU purchasing decisions accordingly. It seems to be a new era; maybe when the dust settles, they will offer some guidance. Otherwise, it is just hit-or-miss as to what cards will work on what work units.

It's always been the case that the more expensive cards are faster and usually more reliable, however last time I looked they were not the best bang for buck. I don't know how the 670's and above are performing relative to the more mid-range cards such as the 650Ti and 660, but I'm seeing similar issues on my 660 and my 660Ti (mostly on Windows). On my Linux rig (650TiBoost) I've had no issues (using 304.88 drivers) but I have not run enough NOELIA WU's to say for sure... The one thing we do know is that 1GB GPU's struggle with WU's that require over 1GB GDDR. That's been noted and is stipulated in the Recommended GPU list, which does get updated when new GPU's arrive, and when we learn the hard way about task requirements.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

GoodFodder
Send message
Joined: 4 Oct 12
Posts: 53
Credit: 333,467,496
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31616 - Posted: 18 Jul 2013 | 9:26:37 UTC

re: NOELIA_2HRUN some positive news:

GTX 650ti (stable clock 1084, 1525) - using 714MB GPU mem (of 1024MB):
http://www.gpugrid.net/result.php?resultid=7053079

108,240.23 secs - which is about 25% longer than a 660 (appears to be shader bound).

For comparison Jim1348 mentioned his GTX 660 was using 1406 MB GPU mem (of 2048MB).
http://www.gpugrid.net/result.php?resultid=7053163
86,491.77 secs

Thus GPU memory allocation appears to be working and the performance is in line (unlike the *MG).
- It would be interesting to see what a 560ti runs them in.

Skgiven: I think we all appreciate the great work you moderators are doing, however I can't help thinking the sentiment on this forum would be a lot more positive if the researchers were a little more proactive in their communication.
All it would take is a small announcement in a dedicated thread when a new type of WU comes online - a very brief summary in layman's terms of what the WU is related to would be nice, so that volunteers feel a part of the project rather than just being 'used'; together with expected running times for a particular benchmark card or two - we are not all running this project for credits!

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31623 - Posted: 18 Jul 2013 | 11:03:08 UTC - in response to Message 31614.

It's always been the case that the more expensive cards are faster and usually more reliable, however last time I looked they were not the best bang for buck. I don't know how the 670's and above are performing relative to the more mid-range cards such as the 650Ti and 660, but I'm seeing similar issues on my 660 and my 660Ti (mostly on Windows). On my Linux rig (650TiBoost) I've had no issues (using 304.88 drivers) but I have not run enough NOELIA WU's to say for sure... The one thing we do know is that 1GB GPU's struggle with WU's that require over 1GB GDDR. That's been noted and is stipulated in the Recommended GPU list, which does get updated when new GPU's arrive, and when we learn the hard way about task requirements.

My conclusion is that it now takes at least a GTX 670 on the Longs in order to avoid the great slowdown we see. Even the 660s with 2 GB memory are not enough; I have learned that the hard way, and am going to put mine on Shorts. There is no point spending 20 hours or more grinding away when they could be doing more productive work. Even the higher-level cards may still have problems, but I think those will be worked out over time.

It is all in the way of scientific progress, which is fine with me, but they could have mentioned it to us.

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31625 - Posted: 18 Jul 2013 | 11:55:07 UTC - in response to Message 31623.
Last modified: 18 Jul 2013 | 11:55:56 UTC

Hi, Jim:

I agree with your conclusion. With my 650 Tis I will only process shorts from now on. I want to share these with Alzheimer's processing at Folding and this combination works for me.

Regards,

John


Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31626 - Posted: 18 Jul 2013 | 13:15:39 UTC - in response to Message 31625.

Earlier in this thread Zoltan said he had problems on one of his systems (GTX670's I think). Moving from 320 to 307.9 (on XP) seems to have resolved the issues.

There were (and are) several different problems. One was GDDR limitations on some cards. I still had problems on a 2GB GTX660Ti and a 2GB GTX660, as have others, but I didn't have problems with a GTX650TiBoost on Linux.

I've had fewer errors on 314 than on 320, and none on 304.88. That's hardly conclusive, and older drivers may only reduce some of the problems, but generally it appears that earlier drivers are better.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31627 - Posted: 18 Jul 2013 | 13:36:59 UTC - in response to Message 31625.

I have now been experimenting with my new GTX660 for about a month, and to me it also depends on the setup of the system. In some systems (it could be a weak PSU, the wrong MOBO or the wrong MOBO settings) it did not do great, and in others it does.
I got a GTX770 yesterday and it is now badly under-performing in an i7; that can hardly be the card itself. I will find out by swapping it into another system.
I still believe that the GTX660 is good for this project. There are a few things going on here with the WUs, so currently it is not a good period to value or de-value a GPU.
____________
Greetings from TJ

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31629 - Posted: 18 Jul 2013 | 15:47:39 UTC

Looked at from a purely PPD point of view on my GTX 660s, I just got 20,500 points from a I654-SANTI_baxbim1 in the Short queue, which took 3 hours 33 minutes, or about 139k PPD. In contrast, the last Noelia Long to complete was a 2-NOELIA_2HRUN which yielded 112,500.00 points in just over 24 hours. So with this card I may be slightly better off in the Shorts, though there will be some Nathans in the Long queue that would probably complete without incident in the usual times.

However, the main point is that you could get hung up even longer with the new work units; there is presently a 44x1-NOELIA_1MG that has been running for 28 1/2 hours and is at 87%, which works out to about 33 hours to complete, assuming that it completes at all. I think that if GPUGrid wants the support of the mid-range cards for the Long queue, they will need to adjust the work units accordingly.

But they may be happy enough with the higher-end cards for the Longs, in order to get new science done, and this is one way of achieving it. So it is not clear if it is a bug or a feature; they haven't told us one way or the other.
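
A minimal sketch of the arithmetic behind the figures above, in Python, purely for illustration; the inputs are just the numbers quoted in this post, not anything official from the project:

def ppd(credit, hours):
    # Scale the credit of a single WU up to a 24-hour rate.
    return credit * 24.0 / hours

def hours_to_finish(hours_elapsed, fraction_done):
    # Naive linear estimate of total run time from current progress.
    return hours_elapsed / fraction_done

print(round(ppd(20500, 3 + 33 / 60)))          # SANTI short: ~138,600 PPD
print(round(ppd(112500, 24.1)))                # NOELIA long: ~112,000 PPD
print(round(hours_to_finish(28.5, 0.87), 1))   # NOELIA_1MG: ~32.8 h in total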

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31630 - Posted: 18 Jul 2013 | 15:58:45 UTC - in response to Message 31626.
Last modified: 18 Jul 2013 | 15:59:23 UTC

Earlier in this thread Zoltan said he had problems on one of his systems (GTX670's I think). Moving from 320 to 307.9 (on XP) seems to have resolved the issues.

It did. I'm crunching NOELIA tasks error free.
I have 4 active hosts at the moment, every one has different drivers and OS.
1. WinXPx64, v310.33, 2xGTX680: 2 errors (in short time)
2. WinXPx64, v307.90, 2xGTX670: 11 errors due to the previous driver, and the experiments with the v320.49
3. WinXPx64, v314.22, GTX680+GTX670: 1 error: NATHAN_KIDKIXc22 :))
4. WinXPx86, v314.07, GTX680: 0 errors
Not active host:
5. Win7x64, v311.06, GTX480: 6 errors due to low GPU voltage (1000mV)
BTW: it seems that the long queue has nearly run out of NOELIA workunits, as my hosts have 8 NATHANs and 5 NOELIAs in their queue.

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31631 - Posted: 18 Jul 2013 | 17:31:04 UTC

Reporting back on the NOELIA_xMG_RUN I was crunching on my 1GB 650Ti: it took a long time, but it went fine! Here is the WU info.

Since I'm running on Linux, I can't tell what the GPU and MC loads were; all I know is that the temperature was normal (~62°C). Right from the start the estimated time was ~44h, and the unit completed in 42.5h, about 1.5h earlier. So this tells me that I didn't hit the "MC load at 1%" issue, or if I did, I hit it right from the very start! The WU consumed 622MB on the GPU.

137.5K of credit for 2 days of crunching is kind of lame, but I'm glad I can crunch these beasts! :D

I even stopped / started BOINC and suspended / resumed the WU a couple of times without issue :O Well, the joys of Linux, I guess!
____________

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31632 - Posted: 18 Jul 2013 | 18:00:08 UTC - in response to Message 31631.

I even stopped / started BOINC and suspended / resumed the WU a couple of times without issue :O Well, the joys of Linux, I guess!


No, no, I did that too on Vista x86 and it worked as well without failing the WU, even when rebooting the system. I tried everything I knew to get more than the 1% MCU load, but nothing helped. I guess you didn't have the 1% load, because when I had it, the estimated time to finish was wrong every time; it was updated several times by BOINC, but still wrong. Happy crunching!
____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31641 - Posted: 18 Jul 2013 | 21:19:35 UTC - in response to Message 31631.

Thanks Zoltan, that helps paint a picture.

Vagelis Giannadakis, 42.5h for a GTX650Ti (1023MB) is likely down to the GDDR limitation. Typical WUs for that sort of credit on that card take just under 24h.
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Bedrich Hajek
Send message
Joined: 28 Mar 09
Posts: 467
Credit: 8,188,346,966
RAC: 10,548,139
Level
Tyr
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31643 - Posted: 19 Jul 2013 | 1:56:01 UTC

I had 4 of these units fail simultaneously on my Windows 7 computer. I did a routine reboot without first suspending the units, and they all crashed.

http://www.gpugrid.net/result.php?resultid=7060754

http://www.gpugrid.net/result.php?resultid=7057895

http://www.gpugrid.net/result.php?resultid=7057887

http://www.gpugrid.net/result.php?resultid=7057605


This doesn't happen with XP, but on Windows 7 I guess you have to suspend the units before rebooting. This is a major issue!
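
For what it's worth, the "suspend before rebooting" step can be scripted with the boinccmd tool that ships with the BOINC client, rather than clicking through the Manager. A minimal sketch in Python, assuming boinccmd is on the PATH and talks to the local client; the script name and helper are made up for the example, and this is only an illustration, not an official GPUGRID workaround:

import subprocess
import sys

def set_gpu_mode(mode):
    # mode is one of "always", "auto" or "never"; the trailing "0" asks for
    # the mode to stay in effect until it is changed back, rather than for a
    # fixed number of seconds (boinccmd --set_gpu_mode <mode> <duration>).
    subprocess.run(["boinccmd", "--set_gpu_mode", mode, "0"], check=True)

if __name__ == "__main__":
    # e.g. "python gpu_mode.py never" before the reboot,
    #      "python gpu_mode.py auto" once the machine is back up.
    set_gpu_mode(sys.argv[1] if len(sys.argv) > 1 else "never")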

Vagelis Giannadakis
Send message
Joined: 5 May 13
Posts: 187
Credit: 349,254,454
RAC: 0
Level
Asp
Scientific publications
watwatwatwatwatwatwatwatwatwat
Message 31648 - Posted: 19 Jul 2013 | 10:18:06 UTC

alax117-NOELIA_UBQ1-0-1-RND6675_0: 16.3h on GTX 650Ti / Linux, 142.5K credit! Mmm, yummy!! :D
____________

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31690 - Posted: 20 Jul 2013 | 21:41:26 UTC

I have got a new type of Noelia: leux12-NOELIA_UBQ1-0-1-RND9216; it took almost 83,000 seconds to complete. I saw that its MCU load was only 15%.
I now have a new one running: prox80-NOELIA_UBQ1-0-1-RND00515_0, expected run time: 23 hours. That is better than the 1% load, but still not great for fast return times. GPU load is around 85% and the temperature is low: 53°C.
____________
Greetings from TJ

Profile ritterm
Avatar
Send message
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31701 - Posted: 22 Jul 2013 | 1:58:59 UTC
Last modified: 22 Jul 2013 | 2:00:05 UTC

You have got to be kidding me...

12x6-NOELIA_2HRUN-3-5-RND4863_0: Completed and validated, run time 86,791.58 s, CPU time 2,841.42 s, credit 112,500.00

That's 24.11 hours run time. Not even zero cache would have helped... Grrrrr. :-(
____________

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31703 - Posted: 22 Jul 2013 | 8:52:59 UTC - in response to Message 31701.

Well thanks, then it's not my hardware :)
____________
Greetings from TJ

Profile skgiven
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 23 Apr 09
Posts: 3968
Credit: 1,995,359,260
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31705 - Posted: 22 Jul 2013 | 11:59:58 UTC - in response to Message 31701.
Last modified: 22 Jul 2013 | 12:16:45 UTC

24.11h on a GTX570 does sound bad.
What else were you running, did the MCU drop to 1% and did the GPU downclock?
____________
FAQ's

HOW TO:
- Opt out of Beta Tests
- Ask for Help

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31706 - Posted: 22 Jul 2013 | 12:56:46 UTC
Last modified: 22 Jul 2013 | 12:58:02 UTC

Did the 2HRUN units have the same issue with GPU RAM? His Win7 desktop needs more of that memory, while my XP machine with 570s (or perhaps the desktop-less second card with empty VRAM) needed "only" between 60-62k secs.
____________
DSKAG Austria Research Team: http://www.research.dskag.at



Profile ritterm
Avatar
Send message
Joined: 31 Jul 09
Posts: 88
Credit: 244,413,897
RAC: 0
Level
Leu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31708 - Posted: 22 Jul 2013 | 14:18:53 UTC - in response to Message 31705.

24.11h on a GTX570 does sound bad.
What else were you running, did the MCU drop to 1% and did the GPU downclock?

Right now, I'm running CPDN, POGS, WCG, Einstein CPU tasks, nothing unusual that I haven't run before alongside GPUGrid. I didn't note any MCU reading.

I did notice a little while ago that the NATHAN_KIDKIX that's been running since the lengthy NOELIA_2HRUN completed was taking unusually long and the GPU load was maybe a little low. I rebooted and it seems to be back on track to finish in 15 hours.

Also, I see that I completed another NOELIA_2HRUN several days ago in a time that I would have expected (61.7 Ksec).

So, it seems I probably was a victim of downclocking. Must remember to check that! :D Thanks!
____________

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31712 - Posted: 22 Jul 2013 | 21:16:33 UTC - in response to Message 31708.

If BOINC suspends the Noelia (for whatever reason) you may get a driver crash, which may leave the card in some strange state. I've seen memory downclocking (which actually increases memory controller load, so quite the opposite of what SK and others are seeing) and downclocking of just the chip. In the first case it's enough to set proper clocks again; in the second only a reboot helps.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31716 - Posted: 22 Jul 2013 | 23:03:34 UTC

I just saw that the clock was down to 50%, with an MCU load of 17%, while doing a Santi LR.
Checking my tasks showed that a Santi LR had errored out before; that must then have caused the clock to downclock, as ETA says.
A reboot did help. This is not nice if I am at home and cannot reach the rigs at my office, or vice versa.
And I had a very long running Nathan, just like ritterm.
So NOELIAs are not the only WUs that have been behaving strangely lately.
____________
Greetings from TJ

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31758 - Posted: 27 Jul 2013 | 13:19:39 UTC - in response to Message 31716.
Last modified: 27 Jul 2013 | 13:19:52 UTC

The reason was likely a driver reset triggered by some error in the GPU. It's probably quite hot in your attic now, isn't it? Maybe a GPU clock offset of -13 MHz is in order.

MrS
____________
Scanning for our furry friends since Jan 2002

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31766 - Posted: 27 Jul 2013 | 15:19:24 UTC - in response to Message 31758.
Last modified: 27 Jul 2013 | 15:22:39 UTC

Yeah, way too hot: 33.7°C. I have taken one PC downstairs, but my girlfriend is not enjoying the noise, so I guess a bit less crunching in the next few days.
I still have my heat problems, as you said ;-) If it's not the PC, then it's the ambient temperature. Crunchers like winter...

Edit: by the way ETA, I see your RAC is very low for you; are you also under pressure from the heat wave?
____________
Greetings from TJ

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31771 - Posted: 27 Jul 2013 | 19:16:38 UTC

Happy you :P we have 38 degrees here ^^ tomorrow it should reach 39 or above :( :( :(
____________
DSKAG Austria Research Team: http://www.research.dskag.at



flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31772 - Posted: 27 Jul 2013 | 20:11:01 UTC - in response to Message 31771.

Happy you :P we have 38 degrees here


100° F in Austria? Is that a record?

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2343
Credit: 16,201,255,749
RAC: 7,520
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31773 - Posted: 27 Jul 2013 | 20:37:01 UTC - in response to Message 31771.

Happy you :P we have 38 degrees here ^^ tomorrow it should reach 39 or above :( :( :(

We're 1 day behind you here in Budapest. According to the local weather forecast we'll have 37°C tomorrow, and 39°C on Monday, so I've set my hosts not to request new tasks. Maybe I'll crunch only 1 workunit per GPU during the night until this heatwave is gone.

Profile dskagcommunity
Avatar
Send message
Joined: 28 Apr 11
Posts: 456
Credit: 817,865,789
RAC: 0
Level
Glu
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31774 - Posted: 27 Jul 2013 | 21:53:57 UTC - in response to Message 31772.

Happy you :P we have 38 degrees here


100° F in Austria? Is that a record?


No, the all-time hottest day was 39.7°C. So we're getting close tomorrow ^^ or even hotter :(
____________
DSKAG Austria Research Team: http://www.research.dskag.at



flashawk
Send message
Joined: 18 Jun 12
Posts: 297
Credit: 3,572,627,986
RAC: 0
Level
Arg
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 31775 - Posted: 27 Jul 2013 | 23:27:29 UTC

Wow, that's crazy for Central and even Eastern Europe; I never realized it got that hot there. Those temperatures are normal for the northern San Joaquin Valley here in Northern California. I live up in the Sierras by Yosemite, where it can get as high as 32°C, and I thought I was suffering even with my water cooling.

I know it's not good for the fish when the water gets warm in the rivers and streams; it lowers the oxygen saturation and starts killing them off. I feel for both of you guys; I'd much rather have extreme cold than heat.

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2705
Credit: 1,311,122,549
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31776 - Posted: 28 Jul 2013 | 13:16:56 UTC - in response to Message 31766.

Edit: by the way ETA, I see your RAC is very low for you, also under pressure by the heat wave?

My main GPU is actually suffering from a healthy supply of POEM WUs, GPU-Grid is "only" a backup ;)

And I switched to the short queue due to the recent Noelia problems.. but since I usually wasn't getting any credit bonus on the long runs any more this should be fine.

It's "only" ~30°C over here.. but was said to reach 38°C as well. Can't get the temperautres inside down any more without inviting all the spidies and mosquitos!

MrS
____________
Scanning for our furry friends since Jan 2002

John C MacAlister
Send message
Joined: 17 Feb 13
Posts: 181
Credit: 144,871,276
RAC: 0
Level
Cys
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 31777 - Posted: 28 Jul 2013 | 14:21:35 UTC
Last modified: 28 Jul 2013 | 14:21:57 UTC

Only high 20s here in Canada and in Sevilla, Spain. Last year in Sevilla during my visit: 42°C.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 819
Credit: 1,591,285,971
RAC: 0
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31778 - Posted: 28 Jul 2013 | 14:45:24 UTC - in response to Message 31776.

And I switched to the short queue due to the recent Noelia problems.. but since I usually wasn't getting any credit bonus on the long runs any more this should be fine.

I haven't gotten any errors on my GTX 660s on the longs for the last 10 days. I think it is safe to come home.

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31779 - Posted: 28 Jul 2013 | 14:56:26 UTC - in response to Message 31771.

Happy you :P we have 38 degrees here ^^ tomorrow it should reach 39 or above :( :( :(

No, no, not happy me :( that temperature is in my attic (34.1°C at the moment).
Outside it is okay at 24°C now, but it was 32°C for a couple of days in a row.
You do have a nice summer in Austria then, but it's not nice for computers that have to work hard.
____________
Greetings from TJ

TJ
Send message
Joined: 26 Jun 09
Posts: 815
Credit: 1,470,385,294
RAC: 0
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 31780 - Posted: 28 Jul 2013 | 15:04:40 UTC - in response to Message 31778.

And I switched to the short queue due to the recent Noelia problems.. but since I usually wasn't getting any credit bonus on the long runs any more this should be fine.

I haven't gotten any errors on my GTX 660s on the longs for the last 10 days. I think it is safe to come home.

I wish I could say that as well. It has been a long time since my 660 finished an LR; in the last few days all the Santi LRs crashed after running for a long time, near the finish. I like the NOELIAs: when they error, they do so very quickly. I even downgraded the drivers again on several people's advice. NATHANs still seem the best... for crunchers.
____________
Greetings from TJ

Stefan
Project administrator
Project developer
Project tester
Project scientist
Send message
Joined: 5 Mar 13
Posts: 348
Credit: 0
RAC: 0
Level

Scientific publications
wat
Message 31786 - Posted: 29 Jul 2013 | 12:20:42 UTC

Guys, I think it would be better to move this discussion to other subforums and keep this one for actual news. This thread in particular has 328 posts and I think it has served its purpose for a while now, so I will lock it and let it rest in peace.
If there are any objections to the lock just send me a pm.

Message boards : News : Old Noelia WUs
