Message boards : Number crunching : New app update (acemd3)

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51939 - Posted: 30 May 2019 | 16:01:47 UTC

I am testing the new acemd3 app. The app is entirely new: faster and more general. The idea is to replace the old one asap. We'll also try to make it more maintainable (a long-standing issue) by using the BOINC wrapper.

I've sent a handful of test WUs for now -- cuda 8.0, linux.

The goal is that it should work on properly configured machines, i.e. with relatively recent drivers, where the previous app was already working. So far we got one success, i.e. 20962989.

erik
Joined: 30 Apr 19
Posts: 54
Credit: 168,971,875
RAC: 714
Level
Ile
Message 51940 - Posted: 30 May 2019 | 19:36:59 UTC - in response to Message 51939.
Last modified: 30 May 2019 | 19:39:14 UTC

Do you mean this one? It was crunched in 6 or 7 minutes.

http://www.gpugrid.net/result.php?resultid=20962989

But I can't see which GPU was used to crunch this task.

zombie67 [MM]
Joined: 16 Jul 07
Posts: 174
Credit: 289,449,460
RAC: 327,693
Level
Asn
Message 51942 - Posted: 30 May 2019 | 21:32:24 UTC

How to select work for the new app? "New version of ACEMD" app is not a choice under project preferences. The current choices are only these:


    ACEMD short runs (2-3 hours on fastest card)
    ACEMD long runs (8-12 hours on fastest GPU)
    ACEMD Beta
    Quantum Chemistry (CPU)
    Quantum Chemistry (CPU, beta)
    Python Runtime



"ACEMD Beta" looks likely, but the name doesn't match "New version of ACEMD", which is how it is being reported over on wuprop. And also it does not match the name on the app page. In fact, the app page indicates that "ACEMD Beta" and "New version of ACEMD" are completely different apps.
____________
Reno, NV
Team: SETI.USA

rod4x4
Joined: 4 Aug 14
Posts: 95
Credit: 1,596,312,969
RAC: 1,347,351
Level
His
Message 51947 - Posted: 30 May 2019 | 23:38:22 UTC
Last modified: 31 May 2019 | 0:21:02 UTC

So far we got one success, i.e. 20962989.


The other 5 Test tasks seem "stuck". They have been in progress now for quite a while.

They must be really long, have errored, or hosts have downloaded the tasks and then been turned off.

Can our Linux crunchers check your Linux hosts for progress?

EDIT: The work unit behind the successful task above was also accepted by 2 Windows hosts ("New version of ACEMD v1.19"), where it failed. It also failed on 2 Linux hosts ("New version of ACEMD v2.00"). So it seems the test tasks are being accepted by both Windows and Linux hosts. The successful Linux host has Nvidia driver v430.14. The failed hosts had Nvidia drivers ranging from v375.70 to v418.19.

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 966
Level
Lys
Message 51948 - Posted: 31 May 2019 | 1:16:13 UTC - in response to Message 51942.

How to select work for the new app? "New version of ACEMD" app is not a choice under project preferences. The current choices are only these:


    ACEMD short runs (2-3 hours on fastest card)
    ACEMD long runs (8-12 hours on fastest GPU)
    ACEMD Beta
    Quantum Chemistry (CPU)
    Quantum Chemistry (CPU, beta)
    Python Runtime



"ACEMD Beta" looks likely, but the name doesn't match "New version of ACEMD", which is how it is being reported over on wuprop. And also it does not match the name on the app page. In fact, the app page indicates that "ACEMD Beta" and "New version of ACEMD" are completely different apps.



I've just selected everything including test apps with only Use GPUs selected. Nothing yet though but I would think that should be enough. Devs can sneak in about anything under the test apps options.

erik
Joined: 30 Apr 19
Posts: 54
Credit: 168,971,875
RAC: 714
Level
Ile
Message 51957 - Posted: 31 May 2019 | 19:01:43 UTC

Probably my next build (in 6-10 months) will have 4, 5 or 6 RTX cards. Hopefully the app will be mature enough by then to justify investing a couple of thousand euros for GPUGRID.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51966 - Posted: 3 Jun 2019 | 10:57:57 UTC - in response to Message 51957.

The number of failures, and the existence of one success, is odd. Doesn't seem to be explained by driver versions alone.

PappaLitto
Joined: 21 Mar 16
Posts: 507
Credit: 4,343,768,551
RAC: 3,059,220
Level
Arg
Message 51967 - Posted: 3 Jun 2019 | 12:20:32 UTC

Try sending out more experimental WUs and see whether the failures come down to one driver version.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51968 - Posted: 3 Jun 2019 | 13:19:29 UTC - in response to Message 51967.

Recent changes:
* sent 100 more test wus
* deprecated the windows "acemd3" app
* marked acemd3 as beta
* fixed its name in prefs

PappaLitto
Joined: 21 Mar 16
Posts: 507
Credit: 4,343,768,551
RAC: 3,059,220
Level
Arg
Message 51969 - Posted: 3 Jun 2019 | 14:16:23 UTC

Errored WUs on multiple different drivers and OSes:

http://www.gpugrid.net/results.php?userid=306281

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51970 - Posted: 3 Jun 2019 | 14:25:12 UTC

Multiple failures of this task on both windows and linux

http://www.gpugrid.net/workunit.php?wuid=16517304

<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
15:19:27 (30109): wrapper (7.7.26016): starting
15:19:27 (30109): wrapper (7.7.26016): starting
15:19:27 (30109): wrapper: running acemd3 (--boinc input --device 0)
# Engine failed: Error launching CUDA compiler: 32512
sh: 1: : Permission denied

15:19:28 (30109): acemd3 exited; CPU time 0.186092
15:19:28 (30109): app exit status: 0x1
15:19:28 (30109): called boinc_finish(195)

</stderr_txt>


Why is the app launching CUDA compiler?
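
A note on the numbers in this log. The value 32512 looks like a raw POSIX wait status of the kind returned by `system()`, where the exit code sits in the high byte; the thread does not confirm that this is how acemd3 reports it, so treat that as an assumption. The sketch below only shows the arithmetic, and the helper names are made up for illustration.

```python
def decode_wait_status(status: int) -> int:
    """Return the exit code stored in the high byte of a POSIX wait status."""
    return (status >> 8) & 0xFF


def as_signed_byte(code: int) -> int:
    """Reinterpret an exit code in 0..255 as a signed 8-bit value."""
    return code - 256 if code > 127 else code


# Assumption: 32512 is a wait status from launching the compiler via a shell.
compiler_exit = decode_wait_status(32512)

# The wrapper's exit code 195 is shown in the log as (0xc3, -61).
wrapper_exit_signed = as_signed_byte(195)

print(compiler_exit, hex(195), wrapper_exit_signed)
```

Under that assumption the compiler launch ended with exit code 127, which shells conventionally use when a command cannot be found. That would fit the `sh: 1: : Permission denied` line, where the command name is empty.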

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Message 51971 - Posted: 3 Jun 2019 | 14:50:42 UTC
Last modified: 3 Jun 2019 | 15:29:06 UTC

My host 43404 got one of WU 16517259.

Like all the others, it failed within one second:

03/06/2019 15:36:58 | GPUGRID | Starting task a27-TONI_TEST3-0-1-RND0985_6
03/06/2019 15:36:58 | GPUGRID | [cpu_sched] Starting task a27-TONI_TEST3-0-1-RND0985_6 using acemd3 version 119 (cuda80) in slot 0
03/06/2019 15:36:59 | GPUGRID | [sched_op] Deferring communication for 00:01:03
03/06/2019 15:36:59 | GPUGRID | [sched_op] Reason: Unrecoverable error for task a27-TONI_TEST3-0-1-RND0985_6
03/06/2019 15:36:59 | GPUGRID | Computation for task a27-TONI_TEST3-0-1-RND0985_6 finished
03/06/2019 15:36:59 | GPUGRID | Output file a27-TONI_TEST3-0-1-RND0985_6_0 for task a27-TONI_TEST3-0-1-RND0985_6 absent
03/06/2019 15:36:59 | GPUGRID | Output file a27-TONI_TEST3-0-1-RND0985_6_9 for task a27-TONI_TEST3-0-1-RND0985_6 absent

with no further information than

Incorrect function.
(0x1) - exit code 1 (0x1)

But I did capture all the specifications and downloaded files between download and run, so I can recreate the attempt offline and see what additional crash information I can collect. May take me a little time...

Windows 7/64, GTX 970, runs v9.22 just fine.

Edit - all I can get in offline runs is

ACEMD can run with Boinc only!

- even when I supply a dummy init_data.xml file which has worked in other standalone test environments. I'll go out for a walk and see if that activates the little grey cells.

zombie67 [MM]
Joined: 16 Jul 07
Posts: 174
Credit: 289,449,460
RAC: 327,693
Level
Asn
Message 51972 - Posted: 3 Jun 2019 | 15:18:12 UTC - in response to Message 51968.

Recent changes:
* sent 100 more test wus
* deprecated the windows "acemd3" app
* marked acemd3 as beta
* fixed its name in prefs


Can you please explain which app we have to select in our project preferences to get these tasks? The app name "New version of ACEMD" is not an option in the project preferences.
____________
Reno, NV
Team: SETI.USA

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51973 - Posted: 3 Jun 2019 | 15:22:39 UTC - in response to Message 51972.
Last modified: 3 Jun 2019 | 15:24:02 UTC

Should be called "ACEMD3 Beta". It's for Linux only (for now).
Windows machines should soon stop receiving it.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Message 51974 - Posted: 3 Jun 2019 | 15:23:46 UTC - in response to Message 51972.
Last modified: 3 Jun 2019 | 15:25:40 UTC

Can you please explain which app we have to select in our project preferences to get these tasks? The app name "New version of ACEMD" is not an option in the project preferences.

The computer I got a test app on has

If no work for selected applications is available, accept work from other applications?
yes

Nothing else out of the ordinary.

The app name appeared as 'acemd3'.

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51975 - Posted: 3 Jun 2019 | 16:08:19 UTC

I got 1 task but it failed.:-(

http://www.gpugrid.net/result.php?resultid=20974689

linux mint 19.1
GTX 1080
Driver: 390.116
Cuda version 9.1

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51976 - Posted: 3 Jun 2019 | 16:28:07 UTC

All but 2 of the libraries that were downloaded are marked as executable. Should libgcc and libOpenCL also be executable?


-rwxr-xr-x 1 boinc boinc 425056 Jun 3 11:52 libcudart.so.8.0.61.46fcfd92ffc5c805d076b5e2b17e9647
-rwxr-xr-x 1 boinc boinc 146772120 Jun 3 11:56 libcufft.so.8.0.61.b142ab8797d534b619ef19c7e98cffc7
-rwxr-xr-x 1 boinc boinc 1647707 Jun 3 11:53 libfftw3f.so.3.4.4.a4580ddf9efebaad56fab49847a8c899
-rwxr-xr-x 1 boinc boinc 31467 Jun 3 11:52 libfftw3f_threads.so.3.4.4.dd0c6fcfa550371acf730db2d9d5a270
-rw-r--r-- 1 boinc boinc 819744 Jun 3 11:52 libgcc_s.so.1.d7f787a9bf6c3633eaebb9015c6d9044
-rwxr-xr-x 1 boinc boinc 937656 Jun 3 11:52 libgomp.so.1.0.0.efdf718669edc7fff00e0c5f7f0b8791
-rwxr-xr-x 1 boinc boinc 9659424 Jun 3 11:54 libnvrtc-builtins.so.8.0.61.ef79235263e650333dd8c573faa47432
-rwxr-xr-x 1 boinc boinc 18517368 Jun 3 11:54 libnvrtc.so.8.0.61.1ac77468cd8086b8cd1a6c855da50f8c
-rw-r--r-- 1 boinc boinc 31696 Jun 3 11:52 libOpenCL.so.1.0.0.343dee45a7d7eb4b9016b6cd9d1bd8d5
-rwxr-xr-x 1 boinc boinc 655240 Jun 3 11:54 libOpenMMCPU.so.19849b4ff1cf4d33f75d9433b4d5c6bb
-rwxr-xr-x 1 boinc boinc 37096 Jun 3 11:53 libOpenMMCudaCompiler.so.aaed781fe4caa9d1099312d458a9b902
-rwxr-xr-x 1 boinc boinc 2774560 Jun 3 11:52 libOpenMMCUDA.so.8867021fdc0daf2e39f1b7228ece45af
-rwxr-xr-x 1 boinc boinc 2979224 Jun 3 11:52 libOpenMMOpenCL.so.6a31fa1ff5ae3a26ea64f2abfb5a66cc
-rwxr-xr-x 1 boinc boinc 80808 Jun 3 11:53 libOpenMMPME.so.3208e45e71567824e8390ab1c79c6a66
-rwxr-xr-x 1 boinc boinc 4062370 Jun 3 11:53 libOpenMM.so.5406dfd716045d08ad6369e2399a98e2
-rwxr-xr-x 1 boinc boinc 9536208 Jun 3 11:54 libstdc++.so.6.0.25.e344f48acfbd4f5abbf99b2c75cc5e50
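
For anyone who wants to run the same check on their own project directory, here is a minimal sketch that lists shared-library files lacking the owner execute bit. The function name is hypothetical, and matching on ".so" in the file name is an assumption based on the listing above. Whether a missing execute bit actually matters for loading these libraries is not established in this thread.

```python
import stat
from pathlib import Path


def non_executable_libs(directory):
    """Return sorted names of shared-library files without the owner execute bit."""
    missing = []
    for entry in Path(directory).iterdir():
        # Assumption: library files can be recognised by ".so" in their name,
        # as in the listing above (e.g. libgcc_s.so.1.<hash>).
        if entry.is_file() and ".so" in entry.name:
            if not entry.stat().st_mode & stat.S_IXUSR:
                missing.append(entry.name)
    return sorted(missing)
```

Calling `non_executable_libs("/var/lib/boinc/projects/www.gpugrid.net")` on a host laid out like the one above would be expected to report the libgcc_s and libOpenCL files.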

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51977 - Posted: 3 Jun 2019 | 16:45:41 UTC

regarding the error on my task:

# Engine failed: Error launching CUDA compiler: 32512
sh: 1: : Permission denied

Is this solution relevant?

https://github.com/pandegroup/openmm/issues/1352

erik
Joined: 30 Apr 19
Posts: 54
Credit: 168,971,875
RAC: 714
Level
Ile
Message 51978 - Posted: 3 Jun 2019 | 22:11:46 UTC

http://www.gpugrid.net/result.php?resultid=20974104

fail on msi gtx 1070, 8gb itx card, windows 10

rod4x4
Joined: 4 Aug 14
Posts: 95
Credit: 1,596,312,969
RAC: 1,347,351
Level
His
Message 51979 - Posted: 4 Jun 2019 | 0:04:55 UTC
Last modified: 4 Jun 2019 | 0:07:29 UTC

Hi Toni

are you explicitly naming the path to libnvrtc-builtins.so when compiling?

perhaps include boinc project folder in LD_LIBRARY_PATH

Nick Name
Joined: 3 Sep 13
Posts: 23
Credit: 968,483,244
RAC: 1,301,567
Level
Glu
Message 51980 - Posted: 4 Jun 2019 | 5:18:42 UTC

name a11-TONI_TEST3-0-1-RND0663
https://www.gpugrid.net/workunit.php?wuid=16517242
Failure on all machines.

My result here: https://www.gpugrid.net/result.php?resultid=20976177

My log:
50 GPUGRID 6/4/2019 2:44:30 AM Started download of acemd3.119.exe
51 GPUGRID 6/4/2019 2:44:30 AM Started download of boost_filesystem-vc140-mt-1_65_1.119.dll
52 GPUGRID 6/4/2019 2:44:32 AM Finished download of acemd3.119.exe
53 GPUGRID 6/4/2019 2:44:32 AM Started download of boost_system-vc140-mt-1_65_1.119.dll
54 GPUGRID 6/4/2019 2:44:33 AM Finished download of boost_filesystem-vc140-mt-1_65_1.119.dll
55 GPUGRID 6/4/2019 2:44:33 AM Finished download of boost_system-vc140-mt-1_65_1.119.dll
56 GPUGRID 6/4/2019 2:44:33 AM Started download of cufft64_80.119.dll
57 GPUGRID 6/4/2019 2:44:33 AM Started download of msvcp140.119.dll
58 GPUGRID 6/4/2019 2:44:38 AM Finished download of msvcp140.119.dll
59 GPUGRID 6/4/2019 2:44:38 AM Started download of nvrtc64_80.119.dll
60 GPUGRID 6/4/2019 2:45:06 AM Finished download of nvrtc64_80.119.dll
61 GPUGRID 6/4/2019 2:45:06 AM Started download of nvrtc-builtins64_80.119.dll
62 GPUGRID 6/4/2019 2:45:28 AM Finished download of nvrtc-builtins64_80.119.dll
63 GPUGRID 6/4/2019 2:45:28 AM Started download of OpenMMCPU.119.dll
64 GPUGRID 6/4/2019 2:45:30 AM Finished download of OpenMMCPU.119.dll
65 GPUGRID 6/4/2019 2:45:30 AM Started download of OpenMMCudaCompiler.119.dll
66 GPUGRID 6/4/2019 2:45:32 AM Finished download of OpenMMCudaCompiler.119.dll
67 GPUGRID 6/4/2019 2:45:32 AM Started download of OpenMMCUDA.119.dll
68 GPUGRID 6/4/2019 2:45:39 AM Finished download of OpenMMCUDA.119.dll
69 GPUGRID 6/4/2019 2:45:39 AM Started download of OpenMM.119.dll
70 GPUGRID 6/4/2019 2:45:48 AM Finished download of OpenMM.119.dll
71 GPUGRID 6/4/2019 2:45:48 AM Started download of OpenMMOpenCL.119.dll
72 GPUGRID 6/4/2019 2:45:54 AM Finished download of OpenMMOpenCL.119.dll
73 GPUGRID 6/4/2019 2:45:54 AM Started download of OpenMMPME.119.dll
74 GPUGRID 6/4/2019 2:45:58 AM Finished download of OpenMMPME.119.dll
75 GPUGRID 6/4/2019 2:45:58 AM Started download of psprolib.119.dll
76 GPUGRID 6/4/2019 2:46:00 AM Finished download of psprolib.119.dll
77 GPUGRID 6/4/2019 2:46:00 AM Started download of vcruntime140.119.dll
78 GPUGRID 6/4/2019 2:46:01 AM Finished download of vcruntime140.119.dll
79 GPUGRID 6/4/2019 2:46:01 AM Started download of a11-TONI_TEST3-0-conf_file_enc
80 GPUGRID 6/4/2019 2:46:02 AM Finished download of a11-TONI_TEST3-0-conf_file_enc
81 GPUGRID 6/4/2019 2:46:02 AM Started download of a11-TONI_TEST3-0-coor_file
82 GPUGRID 6/4/2019 2:46:03 AM Finished download of a11-TONI_TEST3-0-coor_file
83 GPUGRID 6/4/2019 2:46:03 AM Started download of a11-TONI_TEST3-0-vel_file
84 GPUGRID 6/4/2019 2:46:04 AM Finished download of a11-TONI_TEST3-0-vel_file
85 GPUGRID 6/4/2019 2:46:04 AM Started download of a11-TONI_TEST3-0-idx_file
86 GPUGRID 6/4/2019 2:46:05 AM Finished download of a11-TONI_TEST3-0-idx_file
87 GPUGRID 6/4/2019 2:46:05 AM Started download of a11-TONI_TEST3-0-xsc_file
88 GPUGRID 6/4/2019 2:46:06 AM Finished download of a11-TONI_TEST3-0-xsc_file
89 GPUGRID 6/4/2019 2:46:06 AM Started download of a11-TONI_TEST3-0-pdb_file
90 GPUGRID 6/4/2019 2:46:11 AM Finished download of a11-TONI_TEST3-0-pdb_file
91 GPUGRID 6/4/2019 2:46:11 AM Started download of a11-TONI_TEST3-0-psf_file
92 GPUGRID 6/4/2019 2:46:24 AM Finished download of a11-TONI_TEST3-0-psf_file
93 GPUGRID 6/4/2019 2:46:24 AM Started download of a11-TONI_TEST3-0-par_file
94 GPUGRID 6/4/2019 2:46:26 AM Finished download of a11-TONI_TEST3-0-par_file
95 GPUGRID 6/4/2019 2:46:26 AM Started download of a11-TONI_TEST3-0-prmtop_file
96 GPUGRID 6/4/2019 2:46:27 AM Finished download of a11-TONI_TEST3-0-prmtop_file
97 GPUGRID 6/4/2019 2:49:48 AM Finished download of cufft64_80.119.dll
98 GPUGRID 6/4/2019 2:49:49 AM Starting task a11-TONI_TEST3-0-1-RND0663_5
99 GPUGRID 6/4/2019 2:49:50 AM Computation for task a11-TONI_TEST3-0-1-RND0663_5 finished
100 GPUGRID 6/4/2019 2:49:50 AM Output file a11-TONI_TEST3-0-1-RND0663_5_0 for task a11-TONI_TEST3-0-1-RND0663_5 absent
101 GPUGRID 6/4/2019 2:49:50 AM Output file a11-TONI_TEST3-0-1-RND0663_5_9 for task a11-TONI_TEST3-0-1-RND0663_5 absent
____________
Team USA forum | Team USA page
Always crunching / Always recruiting

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51981 - Posted: 4 Jun 2019 | 15:49:25 UTC - in response to Message 51980.

I think I debugged it (app version 201). 100 new WUs sent. Progress bar should also work (please report if not).

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51982 - Posted: 4 Jun 2019 | 16:03:29 UTC - in response to Message 51981.
Last modified: 4 Jun 2019 | 17:21:54 UTC

There are many more successes now.

Edit.

The reason for failures is not really clear. Question for anybody who has seen a success: do you have the CUDA Toolkit installed?

PappaLitto
Joined: 21 Mar 16
Posts: 507
Credit: 4,343,768,551
RAC: 3,059,220
Level
Arg
Message 51983 - Posted: 4 Jun 2019 | 18:31:02 UTC - in response to Message 51982.
Last modified: 4 Jun 2019 | 18:32:31 UTC

There are many more successes now.

Edit.

The reason for failures is not really clear. Question for anybody who has seen a success: do you have the CUDA Toolkit installed?

Hello Toni, I have received many successes and when I typed "nvcc -V" to verify the CUDA Toolkit version, it says "The program 'nvcc' is currently not installed. You can install it by typing:

sudo apt install nvidia-cuda-toolkit"

My system seems to not have it installed.

This is the list of the successful tasks: http://www.gpugrid.net/results.php?userid=306281

Erich56
Joined: 1 Jan 15
Posts: 595
Credit: 3,086,512,044
RAC: 1,684,332
Level
Arg
Message 51984 - Posted: 4 Jun 2019 | 18:36:35 UTC - in response to Message 51983.

This is the list of the successful tasks: http://www.gpugrid.net/results.php?userid=306281

access denied :-(

PappaLitto
Joined: 21 Mar 16
Posts: 507
Credit: 4,343,768,551
RAC: 3,059,220
Level
Arg
Message 51985 - Posted: 4 Jun 2019 | 18:48:15 UTC - in response to Message 51984.

This is the list of the successful tasks: http://www.gpugrid.net/results.php?userid=306281

access denied :-(

Perhaps you can view a single WU? http://www.gpugrid.net/result.php?resultid=20978809

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51986 - Posted: 4 Jun 2019 | 19:06:44 UTC

I have 2 machines. Both have linux mint 19.1 installed, same nvidia driver (390.116), cuda toolkit release 9.1 (both tested as functional), same boinc version 7.14.2.

The hardware is different:

dual GTX 1080's on 2700X: All tasks are failing.

http://www.gpugrid.net/results.php?hostid=482792

dual GTX 1080 Ti's on E5-2690 v2: All tasks are completing successfully!

http://www.gpugrid.net/results.php?hostid=464987


There must be a clue here. Any ideas?

Aurum
Joined: 12 Jul 17
Posts: 110
Credit: 7,368,016,843
RAC: 3,316,045
Level
Tyr
Message 51987 - Posted: 4 Jun 2019 | 20:07:29 UTC - in response to Message 51982.

Question for anybody who has seen a success: do you have the CUDA Toolkit installed?


No. I installed the Nvidia 430.14 drivers as Linux metapackages. According to the Synaptic Package Manager I do not have the CUDA Toolkit installed.

____________

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51988 - Posted: 4 Jun 2019 | 21:58:14 UTC - in response to Message 51986.

I have 2 machines. Both have linux mint 19.1 installed, same nvidia driver (390.116), cuda toolkit release 9.1 (both tested as functional), same boinc version 7.14.2.

The hardware is different:

dual GTX 1080's on 2700X: All tasks are failing.

http://www.gpugrid.net/results.php?hostid=482792

dual GTX 1080 Ti's on E5-2690 v2: All tasks are completing successfully!

http://www.gpugrid.net/results.php?hostid=464987


There must be a clue here. Any ideas?


I can't find anything in the logs. I was running Rosetta on the machine that had the failed GPUGrid tasks. There was no other project running on the machine that had the successful GPUGrid tasks.

Keith Myers
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 51989 - Posted: 5 Jun 2019 | 0:05:22 UTC - in response to Message 51983.

There are many more successes now.

Edit.

The reason for failures is not really clear. Question for anybody who has seen a success: do you have the CUDA Toolkit installed?

Hello Toni, I have received many successes and when I typed "nvcc -V" to verify the CUDA Toolkit version, it says "The program 'nvcc' is currently not installed. You can install it by typing:

sudo apt install nvidia-cuda-toolkit"

My system seems to not have it installed.

This is the list of the successful tasks: http://www.gpugrid.net/results.php?userid=306281

Even though nvcc is actually present on my Jetson Nano, nvcc -V yielded program not found. It is located at /usr/local/cuda-10.0/bin/nvcc

I had to export the directory where nvcc was located for it to be found. That enabled a program to find nvcc.
keith@Nano:~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sun_Sep_30_21:09:22_CDT_2018
Cuda compilation tools, release 10.0, V10.0.166

But as soon as I rebooted, nvcc could not be found. So I ended up adding the library directory as an export in .bashrc and then I could find nvcc after reboots.
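
The lookup behaviour described here (a tool that exists on disk but is not found until its directory is exported) can be reproduced with a short sketch. It assumes the shell finds `nvcc` through the executable search path; the function name `find_tool` is hypothetical, and the directory in the usage note is the one quoted in the post.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Search extra_dirs first, then the current PATH, for an executable."""
    path_dirs = [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    search_path = os.pathsep.join(list(extra_dirs) + path_dirs)
    # shutil.which returns the full path of the first match, or None.
    return shutil.which(name, path=search_path)
```

On a host like the Jetson Nano above, `find_tool("nvcc")` would return `None` until the directory is on the search path, while `find_tool("nvcc", ["/usr/local/cuda-10.0/bin"])` would return the full path.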

mmonnin
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 966
Level
Lys
Message 51990 - Posted: 5 Jun 2019 | 0:34:07 UTC
Last modified: 5 Jun 2019 | 0:34:21 UTC

I completed one while 5 others had errors.
https://www.gpugrid.net/workunit.php?wuid=16520276

nvcc -V results
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51991 - Posted: 5 Jun 2019 | 1:13:16 UTC - in response to Message 51989.


But as soon as I rebooted, nvcc could not be found. So I ended up adding the library directory as an export in .bashrc and then I could find nvcc after reboots.


Another option is to place the cuda library path in a file in /etc/ld.so.conf.d.

you could name the file cuda.conf

then:

sudo ldconfig
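
A minimal sketch of that approach, writing one library directory per line into a configuration file. The function name is hypothetical and the library directory in the usage note is an assumption (the thread only gives the location of the nvcc binary), so adjust it to the actual install. Writing into /etc/ld.so.conf.d needs root, and `sudo ldconfig` still has to be run afterwards.

```python
from pathlib import Path


def write_ld_conf(conf_dir, file_name, library_dirs):
    """Write one library directory per line to conf_dir/file_name; return the path."""
    target = Path(conf_dir) / file_name
    target.write_text("".join(f"{directory}\n" for directory in library_dirs))
    return target
```

For example, `write_ld_conf("/etc/ld.so.conf.d", "cuda.conf", ["/usr/local/cuda-10.0/lib64"])`, where the lib64 path is a guess. Note that this changes where shared libraries are searched for; it does not by itself put the nvcc executable on the command search path.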



Keith Myers
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 51992 - Posted: 5 Jun 2019 | 6:00:16 UTC - in response to Message 51991.


But as soon as I rebooted, nvcc could not be found. So I ended up adding the library directory as an export in .bashrc and then I could find nvcc after reboots.


Another option is to place the cuda library path in a file in /etc/ld.so.conf.d.

you could name the file cuda.conf

then:

sudo ldconfig


Correct. That is the other method I researched as a popular solution.

So am I correct in understanding that one now has to install the CUDA toolkit to run the new acemd application?

That the wrapper download itself is insufficient?

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51993 - Posted: 5 Jun 2019 | 7:33:01 UTC - in response to Message 51992.

Hi all, thanks for the reports.

The app SHOULD not require the cuda toolkit (which includes nvcc), yet on SOME hosts it is looking for it, and fails (the error message is more or less the same).

I still don't understand the conditions under which this occurs. In particular, as biodoc's precious example shows, there is no clear relationship between card generation, driver, and success/failure.

@biodoc, can you see other obvious differences between the two machines? E.g.

- boinc installation method
- presence of the gcc package

Keith Myers
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 51994 - Posted: 5 Jun 2019 | 7:50:41 UTC

Well, I see I attempted to run a task that failed on one host. I looked over all the downloaded files and thought to do a sanity check on the executable. This is what ldd showed.

keith@Numbskull:~/Desktop/BOINC/projects/www.gpugrid.net$ ldd '/home/keith/Desktop/BOINC/projects/www.gpugrid.net/acemd.919-80.bin'
linux-vdso.so.1 (0x00007ffdf14d5000)
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007fa630a0c000)
libcudart.so.8.0 => not found
libcufft.so.8.0 => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa630808000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa6305e9000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa630260000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa62fec2000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa62fcaa000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa62f8b9000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fa62f6b1000)
libnvidia-fatbinaryloader.so.418.56 => /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.56 (0x00007fa62f463000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa631b63000)
keith@Numbskull:~/Desktop/BOINC/projects/www.gpugrid.net$


So right off the bat, the app had no chance of succeeding when it can't find its own downloaded libcudart.so.8.0 and libcufft.so.8.0 files in the project directory.

I don't think it would make any difference if/when all the files and work unit get copied into a slot.
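
The unresolved entries in ldd output like the above can be pulled out with a few lines. This is a sketch that assumes the usual `name => target` layout shown in the post; the function name is hypothetical.

```python
def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, separator, target = line.strip().partition(" => ")
        if separator and target.strip() == "not found":
            missing.append(name)
    return missing
```

Keep in mind that ldd resolves libraries with the search path of the shell it is run from, so a library that is only reachable through an LD_LIBRARY_PATH set at task start will show up as "not found" when ldd is run by hand in the project directory.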

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 51995 - Posted: 5 Jun 2019 | 7:51:07 UTC - in response to Message 51992.



So am I correct in understanding that one now has to install the CUDA toolkit to run the new acemd application?

That the wrapper download itself is insufficient?


You don't (shouldn't) need to install any additional software, if everything works as intended (not the wrapper, nor the cuda toolkit).

You may need to update the drivers, though.

Richard Haselgrove
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Message 51996 - Posted: 5 Jun 2019 | 8:02:27 UTC - in response to Message 51994.

libcudart.so.8.0 => not found
libcufft.so.8.0 => not found

So right off the bat, the app had no chance of succeeding when it can't find its own downloaded libcudart.so.8.0 and libcufft.so.8.0 files in the project directory.

If somebody can post or upload the three components of a test workunit specification:

* <app_version>
* <workunit>
* <result>

all from client_state.xml - make sure you get the right (latest) version of <app_version>, there will be several of them - I can proofread that there are no bugs in the BOINC deployment of the app files. This one could be a problem with the version renaming or copying.
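
A rough sketch of how the three pieces could be pulled out of client_state.xml. The child tag names used here (`app_name`, `version_num`, `name`, `wu_name`) are assumptions about the file layout and should be checked against a real file, which may also not parse cleanly as strict XML. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _text(element, tag):
    """Return the stripped text of a child tag, or '' if it is absent."""
    return (element.findtext(tag) or "").strip()


def extract_task_spec(xml_text, app_name, wu_name):
    """Collect the latest <app_version>, the <workunit> and its <result> entries."""
    root = ET.fromstring(xml_text)
    versions = [e for e in root.iter("app_version") if _text(e, "app_name") == app_name]
    # "Latest" is taken to mean the highest version_num (an assumption).
    latest = max(versions, key=lambda e: int(_text(e, "version_num") or 0), default=None)
    workunit = next((e for e in root.iter("workunit") if _text(e, "name") == wu_name), None)
    results = [e for e in root.iter("result") if _text(e, "wu_name") == wu_name]
    return {"app_version": latest, "workunit": workunit, "results": results}
```

Each returned element can be turned back into text for posting with `ET.tostring(element, encoding="unicode")`.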

biodoc
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 51997 - Posted: 5 Jun 2019 | 8:42:59 UTC - in response to Message 51993.

Hi all, thanks for the reports.

@biodoc, can you see other obvious differences between the two machines? E.g.

- boinc installation method
- presence of the gcc package


No, the boinc installation method is the same (repository meta package) and gcc is installed on both machines (build-essential package). I ran ldd on wrapper_26198_x86_64-pc-linux-gnu and acemd3.e72153abf98cb1fcd0f05fc443818dfc on both machines and the output is identical.

Working machine:

mark@x20-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./wrapper_26198_x86_64-pc-linux-gnu
linux-vdso.so.1 (0x00007ffc1bfab000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7ab23ba000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f7ab21a2000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7ab1f83000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7ab1b92000)
/lib64/ld-linux-x86-64.so.2 (0x00007f7ab2758000)
mark@x20-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./acemd3.e72153abf98cb1fcd0f05fc443818dfc
linux-vdso.so.1 (0x00007ffda9bfe000)
libOpenMM.so => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007ffb4cb37000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007ffb4c7ae000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007ffb4c410000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007ffb4c1f8000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007ffb4be07000)
/lib64/ld-linux-x86-64.so.2 (0x00007ffb4cd3b000)


machine with failures:

mark@x16-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./wrapper_26198_x86_64-pc-linux-gnu
linux-vdso.so.1 (0x00007ffd96952000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd300b09000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fd3008f1000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd3006d2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd3002e1000)
/lib64/ld-linux-x86-64.so.2 (0x00007fd300ea7000)
mark@x16-linux:/var/lib/boinc/projects/www.gpugrid.net$ ldd ./acemd3.e72153abf98cb1fcd0f05fc443818dfc
linux-vdso.so.1 (0x00007ffef0097000)
libOpenMM.so => not found
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fabe9b83000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fabe97fa000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fabe945c000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fabe9244000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fabe8e53000)
/lib64/ld-linux-x86-64.so.2 (0x00007fabe9d87000)



Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 51998 - Posted: 5 Jun 2019 | 9:16:14 UTC - in response to Message 51994.
Last modified: 5 Jun 2019 | 9:19:03 UTC


So right off the bat, the app had no chance of succeeding when it can't find its own downloaded libcudart.so.8.0 and libcufft.so.8.0 files in the project directory.

I don't think it would make any difference if/when all the files and work unit get copied into a slot.


We are distributing the two files with the app. They are copied (via copy_file) into the slot, and the slot is added to LD_LIBRARY_PATH. It works locally and on many machines; I am inclined to think it's not the problem.

The "permission denied" bit seems related to a later stage, possibly an attempt to compile the CUDA bytecode into the form required by the specific graphics card (done via nvrtc).

If anybody is able to capture the "progress.log" file before it's deleted, please post it. Thanks!

T

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 51999 - Posted: 5 Jun 2019 | 9:38:29 UTC

I did find a "messy" install of the nvidia driver on the offending machine. There seem to be remnants of a driver installed via direct download from nvidia. I'll clean that up.

'sudo apt search nvidia' showed significant differences between the 2 machines.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 52000 - Posted: 5 Jun 2019 | 10:06:38 UTC - in response to Message 51999.
Last modified: 5 Jun 2019 | 10:07:04 UTC

I did find a "messy" install of the nvidia driver on the offending machine. There seem to be remnants of a driver installed via direct download from nvidia. I'll clean that up.

'sudo apt search nvidia' showed significant differences between the 2 machines.



From what I know, "apt search" does not look at the packages installed in your system but those "accessible" online. So, the difference may be in the repository configurations.

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52001 - Posted: 5 Jun 2019 | 10:49:24 UTC - in response to Message 52000.


From what I know, "apt search" does not look at the packages installed in your system but those "accessible" online. So, the difference may be in the repository configurations.


Yeah, dpkg -l | grep -i nvidia is the right command.
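For anyone comparing two machines this way, here is a minimal Python sketch that filters `dpkg -l` output for installed and leftover packages. The sample text and package names are made up for illustration, not taken from either machine in this thread. The one dpkg detail it relies on is the status column: "ii" means installed, while "rc" means the package was removed but its config files remain, which is one way driver remnants show up.

```python
def installed_packages(dpkg_output, keyword="nvidia"):
    """Return (status, name, version) for `dpkg -l` rows whose name contains keyword."""
    rows = []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Package rows start with a short alphabetic status such as "ii" or "rc";
        # header rows ("||/ Name ...", "+++-===") fail this check and are skipped.
        if len(fields) < 3 or not fields[0].isalpha() or len(fields[0]) > 3:
            continue
        status, name, version = fields[0], fields[1], fields[2]
        if keyword.lower() in name.lower():
            rows.append((status, name, version))
    return rows


def leftovers(rows):
    """Names of packages that are listed but not fully installed (status != 'ii')."""
    return [name for status, name, _ in rows if status != "ii"]


# Hypothetical sample; on a real machine this would be the text printed by `dpkg -l`.
SAMPLE = """\
||/ Name               Version   Architecture Description
ii  nvidia-driver-418  418.56-1  amd64        NVIDIA driver metapackage
rc  nvidia-384         384.130-1 amd64        NVIDIA binary driver
ii  libc6              2.27-3    amd64        GNU C Library
"""

rows = installed_packages(SAMPLE)
print(rows)
print(leftovers(rows))
```

Running the same filter on both machines and diffing the two lists shows what is actually installed, independent of how the repositories are configured.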

I went ahead and purged everything nvidia and reinstalled the nvidia driver. I didn't install the cuda toolkit though.

UPDATE: tasks still failing on this machine.

PurpleHat
Send message
Joined: 4 Jun 19
Posts: 3
Credit: 11,999,700
RAC: 15
Level
Pro
Scientific publications
wat
Message 52002 - Posted: 5 Jun 2019 | 11:40:30 UTC - in response to Message 51998.
Last modified: 5 Jun 2019 | 12:39:34 UTC

Toni
host:
CUDA: NVIDIA GPU 0: GeForce GTX 1080 (driver version 418.56, CUDA version 10.1, compute capability 6.1, 4096MB, 3968MB available, 9718 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 1080 (driver version 418.56, device version OpenCL 1.2 CUDA, 8112MB, 3968MB available, 9718 GFLOPS peak)

Progress.log from a valid task:


#
# ACEMD version 3.2.0rc0-65-gdb8d7f8
#
# Copyright (C) 2017-2019 Acellera (www.acellera.com)
#
# When publishing, please cite:
# ACEMD: Accelerating Biomolecular Dynamics in the Microsecond Time Scale
# M. J. Harvey, G. Giupponi and G. De Fabritiis,
# J Chem. Theory. Comput. 2009 5(6), pp1632-1639
# DOI: 10.1021/ct9000685
#
# ACEMD is running in Boinc mode!
#
# Read input file: input
# Parse input file
# WARNING: Keyword "hydrogenscale" is deprecated: Hydrogen mass scaling enabled when timestep > 2.0
# WARNING: Keyword "rigidbonds" is deprecated: Rigid bonds set when timestep > 1.0
# WARNING: Keyword "exclude" is deprecated: 1-4 exclusion automatically set by force-field
# WARNING: Keyword "1-4scaling" is deprecated: 1-4 scaling automatically set by force-field
# WARNING: Keyword "pmegridsizex" is deprecated: Feature not supported
# WARNING: Keyword "pmegridsizey" is deprecated: Feature not supported
# WARNING: Keyword "pmegridsizez" is deprecated: Feature not supported
# WARNING: Keyword "pmefreq" is deprecated: MTS not supported
# WARNING: Deprecated keyword "langevin" is replaced with "thermostat"
# WARNING: Deprecated keyword "langevindamping" is replaced with "thermostatDamping"
# WARNING: Keyword "energyfreq" is deprecated: Energies are now output every trajectoryFreq steps
$
$# Forcefield configuration
$
$ parameters parameters
$
$# Initial State
$
$ structure structure.psf
$ coordinates structure.pdb
$ temperature 300.00 # K
$ celldimension 62.230000 62.230000 62.230000 # A
$
$# Output
$
$ trajectoryFile output.xtc
$ trajectoryFreq 25000
$
$# Electrostatics
$
$ PME on
$ cutoff 9.00 # A
$ switching on
$ switchDist 7.50 # A
$ implicit off
$
$# Temperature Control
$
$ thermostat on
$ thermostatTemp 298.15 # K
$ thermostatDamping 1.00 # /ps
$
$# Pressure Control
$
$ barostat off
$ barostatPressure 1.0000 # bar
$ useFlexibleCell off
$ useConstantArea off
$ useConstantRatio off
$
$# Integration
$
$ timestep 4.00 # fs
$
$# External forces
$
$
$# Restraints
$
$
$# Run Configuration
$
$ restart off
$ run 250000
# Topology reports 23558 atoms
# Initializing engine
# Version: 7.3.1
# WARNING: overriding the plugin path to /var/lib/boinc-client/slots/40 with ACEMD_PLUGIN_DIR
# Plugin directory: /var/lib/boinc-client/slots/40
# Loaded plugins
# libOpenMMCUDA
# libOpenMMPME
# libOpenMMOpenCL
# libOpenMMCPU
# libOpenMMCudaCompiler
# Available platforms
# CUDA
# OpenCL
# CPU
#
# Bonded interactions
# Harmonic bond interactions
# Number of terms: 16569
# Harmonic angle interactions
# Number of terms: 11584
# Urey-Bradley interactions
# Number of terms: 2117
# Proper dihedral interations
# Number of terms: 5621
# Number of skipped terms: 1379
# NOTE: the skipped terms have zero force constants
# Improper dihedral interations
# Number of terms: 408
# Number of skipped terms: 10
# NOTE: the skipped terms have zero force constants
# CMAP interactions
# Number of terms: 0
# NOTE: CMAP interations skipped
#
# Non-bonded interactions
# Number of exclusions: 34709
# Lennard-Jones terms
# Cutoff distance: 9.000 A
# Switching distance: 7.500 A
# Coulombic (PME) term
# Ewald tolerance: 0.000500
# No NBFIX
# No implicit solvent
#
# Constraining hydrogen (X-H) bonds
# Number of constrained bonds: 15267
# Making water molecules rigid
# Number of water molecules: 7023
# Number of constraints: 22290
#
# Repartitioning hydrogen atom mass
# New hydrogen mass: 4.032 au
# Number of hydrogen atoms: 15267
#
# Creating simulation system
# Number of particles: 23558
# Number of degrees of freedom 48381
# Periodic box size: 62.230 62.230 62.230 A
#
# Using Langevin integrator (with temperature control)
# Thermostat target temperature: 298.15 K
# Thermostat friction coeficient: 1.00 ps^-1
#


Slotfolder 40 zip: https://filebin.net/jfv8ec4c6q8uszuw/Slot_40.zip?t=tvn13kdj

On the failed host the slot folders are empty. Either BOINC wipes them at the crash or the application never adds files to the slot folder, so I could not grab progress.log.
A task that fails within 1 second is impossible to grab, and since nothing is stored for upload, it gets wiped out.
I am getting this error on an older OS (16.04) with a GTX 970 and driver 418.56. The same driver produces valid tasks on a later system (18.10).
So it looks to depend on the system, not the driver version. This compile issue still exists in the latest application but only affects old systems.

<core_client_version>7.6.31</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)
</message>
<stderr_txt>
14:22:07 (102554): wrapper (7.7.26016): starting
14:22:07 (102554): wrapper (7.7.26016): starting
14:22:07 (102554): wrapper: running acemd3 (--boinc input --device 0)
# Engine failed: Error launching CUDA compiler: 32512
sh: 1: : Permission denied

14:22:08 (102554): acemd3 exited; CPU time 0.132000
14:22:08 (102554): app exit status: 0x1
14:22:08 (102554): called boinc_finish(195)

</stderr_txt>
]]>
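A note on the numbers in this stderr output, as a sketch under assumptions. If 32512 is the raw status returned by a C `system()` call (an assumption; the log does not say how the compiler was launched), it can be split into an exit code and a signal number the way POSIX wait statuses are usually decoded. The "195 (0xc3, -61)" line is simply the same byte shown as unsigned decimal, hex, and signed 8-bit.

```python
def decode_wait_status(status):
    """Split a raw POSIX wait status into exit code (high byte) and signal (low 7 bits)."""
    return {"exit_code": (status >> 8) & 0xFF, "signal": status & 0x7F}


def as_signed_byte(code):
    """Reinterpret an unsigned 8-bit exit code as a signed 8-bit value."""
    return code - 256 if code > 127 else code


print(decode_wait_status(32512))      # {'exit_code': 127, 'signal': 0}
print(hex(195), as_signed_byte(195))  # 0xc3 -61
```

Under that reading the shell itself exited with 127, the code shells conventionally use when the command cannot be run. Together with the empty command name in "sh: 1: : Permission denied", this hints that the compiler path handed to the shell may have been empty, but that is a guess from the log and not something the log confirms.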

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 52003 - Posted: 5 Jun 2019 | 12:15:25 UTC - in response to Message 52002.

Aehm, to clarify: I can see the progress.log file of successful tasks only.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 52004 - Posted: 5 Jun 2019 | 13:38:42 UTC - in response to Message 52003.
Last modified: 5 Jun 2019 | 13:39:05 UTC

If anybody is so inclined, can they try to run the boinc client manually with the --exit_after_finish flag, so the slot directory is preserved on failure?


Thanks

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52005 - Posted: 5 Jun 2019 | 15:31:07 UTC - in response to Message 51996.


If somebody can post or upload the three components of a test workunit specification:

* <app_version>
* <workunit>
* <result>

all from client_state.xml - make sure you get the right (latest) version of <app_version>, there will be several of them - I can proofread that there are no bugs in the BOINC deployment of the app files. This one could be a problem with the version renaming or copying.


There is information in <app_version> but nothing for <workunit> or <result>.

<app_version>
<app_name>acemd3</app_name>
<version_num>202</version_num>
<platform>x86_64-pc-linux-gnu</platform>
<avg_ncpus>0.987442</avg_ncpus>
<flops>28742507251613.187500</flops>
<plan_class>cuda80</plan_class>
<api_version>7.7.0</api_version>
<file_ref>
<file_name>wrapper_26198_x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>acemd3.e72153abf98cb1fcd0f05fc443818dfc</file_name>
<open_name>acemd3</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>job.xml.1245cc127550a015dcc9b3e1c2c84e13</file_name>
<open_name>job.xml</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMOpenCL.so.6a31fa1ff5ae3a26ea64f2abfb5a66cc</file_name>
<open_name>libOpenMMOpenCL.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenCL.so.1.0.0.43d4300566ce59d77e0fa316f8ee5b02</file_name>
<open_name>libOpenCL.so.1</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libgomp.so.1.0.0.efdf718669edc7fff00e0c5f7f0b8791</file_name>
<open_name>libgomp.so.1.0.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMM.so.5406dfd716045d08ad6369e2399a98e2</file_name>
<open_name>libOpenMM.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMCUDA.so.8867021fdc0daf2e39f1b7228ece45af</file_name>
<open_name>libOpenMMCUDA.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libcudart.so.8.0.61.af43be839e6366e731accc514633bd1f</file_name>
<open_name>libcudart.so.8.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libfftw3f_threads.so.3.4.4.dd0c6fcfa550371acf730db2d9d5a270</file_name>
<open_name>libfftw3f_threads.so.3</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libgcc_s.so.1.d7f787a9bf6c3633eaebb9015c6d9044</file_name>
<open_name>libgcc_s.so.1</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libnvrtc-builtins.so.8.0.61.684f2f1d9f0934bcce91e77b69e17ec7</file_name>
<open_name>libnvrtc-builtins.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMCudaCompiler.so.aaed781fe4caa9d1099312d458a9b902</file_name>
<open_name>libOpenMMCudaCompiler.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libfftw3f.so.3.4.4.a4580ddf9efebaad56fab49847a8c899</file_name>
<open_name>libfftw3f.so.3</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMPME.so.3208e45e71567824e8390ab1c79c6a66</file_name>
<open_name>libOpenMMPME.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libnvrtc.so.8.0.61.ea3bff3d91151ddf671a0a1491635b57</file_name>
<open_name>libnvrtc.so.8.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libOpenMMCPU.so.19849b4ff1cf4d33f75d9433b4d5c6bb</file_name>
<open_name>libOpenMMCPU.so</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libcufft.so.8.0.61.889be25939bec6f9a2abec790772d28f</file_name>
<open_name>libcufft.so.8.0</open_name>
<copy_file/>
</file_ref>
<file_ref>
<file_name>libstdc++.so.6.0.25.e344f48acfbd4f5abbf99b2c75cc5e50</file_name>
<open_name>libstdc++.so.6</open_name>
<copy_file/>
</file_ref>
<coproc>
<type>NVIDIA</type>
<count>1.000000</count>
</coproc>
<gpu_ram>512.000000</gpu_ram>
<dont_throttle/>
</app_version>

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52006 - Posted: 5 Jun 2019 | 17:12:58 UTC - in response to Message 52005.

Thanks. The context was

libcudart.so.8.0 => not found
libcufft.so.8.0 => not found

So right off the bat, the app had no chance of succeeding when it can't find its own downloaded libcudart.so.8.0 and libcufft.so.8.0 files in the project directory.

Both files will be copied with the correct names into the slot directory, although they will be downloaded under a different (versioned) name. So a static test outside the running BOINC environment will fail to find them, but a dynamic test during running should be OK. I don't think this one will take us much further.
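To make the renaming concrete, here is a rough Python sketch of what a static test would have to reproduce: read the `<file_ref>` entries, then copy each downloaded file into a scratch directory under its `<open_name>`. This is not BOINC's own code. The function names and the scratch-directory idea are assumptions for illustration; only the XML element names come from the `<app_version>` posted above.

```python
import os
import shutil
import tempfile
import xml.etree.ElementTree as ET


def parse_file_refs(app_version_xml):
    """Return (file_name, open_name) pairs from an <app_version> element."""
    root = ET.fromstring(app_version_xml)
    pairs = []
    for ref in root.iter("file_ref"):
        file_name = ref.findtext("file_name")
        # Entries without <open_name> (the wrapper itself) keep their download name.
        open_name = ref.findtext("open_name") or file_name
        pairs.append((file_name, open_name))
    return pairs


def stage_into_slot(project_dir, slot_dir, pairs):
    """Copy files into slot_dir under their open_name; return download names that are missing."""
    missing = []
    for file_name, open_name in pairs:
        src = os.path.join(project_dir, file_name)
        if not os.path.isfile(src):
            missing.append(file_name)
            continue
        shutil.copy2(src, os.path.join(slot_dir, open_name))
    return missing


# Two entries copied from the <app_version> posted above.
EXAMPLE = """<app_version>
<file_ref>
<file_name>wrapper_26198_x86_64-pc-linux-gnu</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>libcudart.so.8.0.61.af43be839e6366e731accc514633bd1f</file_name>
<open_name>libcudart.so.8.0</open_name>
<copy_file/>
</file_ref>
</app_version>"""

pairs = parse_file_refs(EXAMPLE)
print(pairs)
```

With the files staged under their short names, running `ldd` on the staged acemd3 with `LD_LIBRARY_PATH` pointing at the scratch directory should be much closer to what the app sees inside a real slot than a bare `ldd` in the project directory.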

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52007 - Posted: 5 Jun 2019 | 17:44:05 UTC - in response to Message 52004.

If anybody is so inclined, can they try to run the boinc client manually with the --exit_after_finish flag, so the slot directory is preserved on failure?


Thanks

I just tried the manual run of the client with the suggested --exit_after_finish parameter but it did not preserve the slot contents.

05-Jun-2019 10:38:59 [GPUGRID] Starting task a3-TONI_TEST9-2-3-RND2847_2
05-Jun-2019 10:39:03 [GPUGRID] [sched_op] Deferring communication for 00:06:31
05-Jun-2019 10:39:03 [GPUGRID] [sched_op] Reason: Unrecoverable error for task a3-TONI_TEST9-2-3-RND2847_2
mv: cannot stat 'slots/8/output.coor': No such file or directory
mv: cannot stat 'slots/8/output.vel': No such file or directory
mv: cannot stat 'slots/8/output.idx': No such file or directory
mv: cannot stat 'slots/8/output.dcd': No such file or directory
mv: cannot stat 'slots/8/COLVAR': No such file or directory
mv: cannot stat 'slots/8/log.file': No such file or directory
mv: cannot stat 'slots/8/HILLS': No such file or directory
mv: cannot stat 'slots/8/output.vel.dcd': No such file or directory
mv: cannot stat 'slots/8/output.xtc': No such file or directory
mv: cannot stat 'slots/8/output.xsc': No such file or directory
mv: cannot stat 'slots/8/output.xstfile': No such file or directory
05-Jun-2019 10:39:03 [GPUGRID] Computation for task a3-TONI_TEST9-2-3-RND2847_2 finished
05-Jun-2019 10:39:03 [GPUGRID] Output file a3-TONI_TEST9-2-3-RND2847_2_9 for task a3-TONI_TEST9-2-3-RND2847_2 absent
05-Jun-2019 10:39:05 [GPUGRID] Started upload of a3-TONI_TEST9-2-3-RND2847_2_0
05-Jun-2019 10:39:07 [GPUGRID] Finished upload of a3-TONI_TEST9-2-3-RND2847_2_0
^C05-Jun-2019 10:39:11 [---] Received signal 2
05-Jun-2019 10:39:11 [---] Exiting
keith@Darksider:~/Desktop/BOINC$

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52008 - Posted: 5 Jun 2019 | 18:17:09 UTC

I thought that all the tasks I had downloaded had failed but I see I have one host that has been successfully processing the acemd3 tasks.

But I just aborted the cache thinking all the hosts were unsuccessful. Oops.

Now to try and compare what is different about that machine compared to the rest.

I believe the difference is that at one time I had installed the cuda toolkit on that host and then removed it long in the past.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52009 - Posted: 5 Jun 2019 | 18:33:55 UTC

Anybody successfully run the new acemd3 app on a Turing card yet? I just realized that I still had a gpu_exclude for my Turing card on the host that had been successfully processing tasks. I somehow had skipped over removing the exclusion from that machine while I had done so on all the other hosts with Turing cards.

Could this be the reason that app fails?

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 52010 - Posted: 5 Jun 2019 | 18:49:55 UTC

I see that there is a new version 2.02, which I just tried on my GTX 1070 (Ubuntu 16.04.6). I just use the Ubuntu repository driver, which is 396.54 (proprietary), without any CUDA toolkit that I know of.

It failed immediately.

GPUGRID 2.02 New version of ACEMD (cuda80) a67-TONI_TEST8-2-3-RND3156_0 00:00:03 (-) 0.00 100.000 - 6/10/2019 2:42:16 PM 0.985C + 1NV Computation error 0.00 MB i7-4790-G

http://www.gpugrid.net/results.php?hostid=482386

Explain to me (simply) what I should check, and I will do it.

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52011 - Posted: 5 Jun 2019 | 19:16:03 UTC - in response to Message 52009.

Anybody successfully run the new acemd3 app on a Turing card yet? I just realized that I still had a gpu_exclude for my Turing card on the host that had been successfully processing tasks. I somehow had skipped over removing the exclusion from that machine while I had done so on all the other hosts with Turing cards.

Could this be the reason that app fails?


I think the plan is to get a stable acemd3 app running on legacy hardware and then release a beta for turing cards.

@jim1348, I get the same error on one of my machines with dual GTX 1080 cards.

mdxi
Send message
Joined: 11 Feb 18
Posts: 1
Credit: 8,292,820
RAC: 0
Level
Ser
Scientific publications
wat
Message 52012 - Posted: 5 Jun 2019 | 19:27:21 UTC

I am also seeing failures due to the acemd binary not finding some libs:


[root@node02 www.gpugrid.net]# ldd acemd.919-80.bin
linux-vdso.so.1 (0x00007fff6a317000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f740db2b000)
libcudart.so.8.0 => not found
libcufft.so.8.0 => not found
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f740db26000)
libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f740db05000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f740d975000)
libm.so.6 => /usr/lib/libm.so.6 (0x00007f740d82d000)
libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f740d813000)
libc.so.6 => /usr/lib/libc.so.6 (0x00007f740d64e000)
librt.so.1 => /usr/lib/librt.so.1 (0x00007f740d644000)
libnvidia-fatbinaryloader.so.430.14 => /usr/lib/libnvidia-fatbinaryloader.so.430.14 (0x00007f740d3f6000)
/lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f740eca8000)


This is despite the libs being right there in the directory with the binary:


[root@node02 www.gpugrid.net]# ls -l libcu*
-rwxr-xr-x 1 boinc boinc 394472 May 17 18:34 libcudart.so.8.0
-rwxr-xr-x 1 boinc boinc 426680 Jun 4 18:26 libcudart.so.8.0.61.af43be839e6366e731accc514633bd1f
-rwxr-xr-x 1 boinc boinc 146745600 May 17 18:35 libcufft.so.8.0
-rwxr-xr-x 1 boinc boinc 146772424 Jun 4 18:28 libcufft.so.8.0.61.889be25939bec6f9a2abec790772d28f


This machine is running Arch linux. Boinc was compiled locally, from the github source. The NVIDIA drivers are from Arch, with no modifications.


[root@node02 www.gpugrid.net]# pacman -Ss nvidia | grep installed
extra/nvidia 430.14-6 [installed]
extra/nvidia-utils 430.14-1 [installed]
extra/opencl-nvidia 430.14-1 [installed]


This machine is currently successfully crunching GPGPU WUs for Primegrid and Einstein@Home, so its configuration is known good.
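A side note on the ldd listing in this post: on Linux the dynamic loader does not search the executable's own directory by default. It looks at any RPATH/RUNPATH baked into the binary, at `LD_LIBRARY_PATH`, at the ld.so cache, and at the default system directories, so libraries sitting next to the binary can still show up as "not found". A small Python sketch for pulling the unresolved names out of ldd output (the helper name is hypothetical; the sample lines are from the listing above):

```python
def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


LDD_OUTPUT = """\
linux-vdso.so.1 (0x00007fff6a317000)
libcuda.so.1 => /usr/lib/libcuda.so.1 (0x00007f740db2b000)
libcudart.so.8.0 => not found
libcufft.so.8.0 => not found
libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f740db26000)
"""

print(missing_libraries(LDD_OUTPUT))  # ['libcudart.so.8.0', 'libcufft.so.8.0']
```

Re-running ldd with `LD_LIBRARY_PATH` set to the directory that holds the libraries is a quick way to check whether the search path alone explains the "not found" lines.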

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52013 - Posted: 5 Jun 2019 | 19:34:07 UTC - in response to Message 52011.

Anybody successfully run the new acemd3 app on a Turing card yet? I just realized that I still had a gpu_exclude for my Turing card on the host that had been successfully processing tasks. I somehow had skipped over removing the exclusion from that machine while I had done so on all the other hosts with Turing cards.

Could this be the reason that app fails?


I think the plan is to get a stable acemd3 app running on legacy hardware and then release a beta for turing cards.

@jim1348, I get the same error on one of my machines with dual GTX 1080 cards.

OK, that is very different from my understanding of the wrapper app. I thought it was meant to allow use of the Turing cards.

I guess I should put the gpu_exclude back in play for the hosts that failed the tasks.

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52014 - Posted: 5 Jun 2019 | 19:55:12 UTC - in response to Message 52013.
Last modified: 5 Jun 2019 | 19:55:59 UTC

Anybody successfully run the new acemd3 app on a Turing card yet? I just realized that I still had a gpu_exclude for my Turing card on the host that had been successfully processing tasks. I somehow had skipped over removing the exclusion from that machine while I had done so on all the other hosts with Turing cards.

Could this be the reason that app fails?


I think the plan is to get a stable acemd3 app running on legacy hardware and then release a beta for turing cards.

@jim1348, I get the same error on one of my machines with dual GTX 1080 cards.

OK, that is very different from my understanding of the wrapper app. I thought it was meant to allow use of the Turing cards.

I guess I should put the gpu_exclude back in play for the hosts that failed the tasks.


See this post: http://www.gpugrid.net/forum_thread.php?id=4927&nowrap=true#51934

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52015 - Posted: 5 Jun 2019 | 20:04:13 UTC - in response to Message 52014.
Last modified: 5 Jun 2019 | 20:50:16 UTC

Thanks for the edification.

[Edit]This is the error for trying to run on a Turing card.

<core_client_version>7.15.0</core_client_version>
<![CDATA[
<message>
process exited with code 195 (0xc3, -61)</message>
<stderr_txt>
12:04:16 (22587): wrapper (7.7.26016): starting
12:04:16 (22587): wrapper (7.7.26016): starting
12:04:16 (22587): wrapper: running acemd3 (--boinc input --device 0)
# Engine failed: Error compiling program: nvrtc: error: invalid value for --gpu-architecture (-arch)

12:04:17 (22587): acemd3 exited; CPU time 0.164594
12:04:17 (22587): app exit status: 0x1
12:04:17 (22587): called boinc_finish(195)

</stderr_txt>
]]>
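A likely reading of this nvrtc error, offered as a sketch and not as a diagnosis from the project: the cuda80 app ships the CUDA 8.0 runtime compiler, and each toolkit release can only target GPU architectures that existed when it was released. The table below is written from memory of NVIDIA's release notes (8.0 tops out at Pascal, 9.0 adds Volta, 10.0 adds Turing), so treat the values as assumptions to verify. The function name is hypothetical.

```python
# Highest compute capability each toolkit release can target, as (major, minor).
# Values recalled from NVIDIA release notes; check them before relying on this.
MAX_COMPUTE_CAPABILITY = {
    "8.0": (6, 2),   # up to Pascal
    "9.0": (7, 0),   # adds Volta
    "10.0": (7, 5),  # adds Turing
}


def can_target(cuda_version, compute_capability):
    """True if the given toolkit release knows about the given compute capability."""
    return compute_capability <= MAX_COMPUTE_CAPABILITY[cuda_version]


print(can_target("8.0", (6, 1)))   # GTX 1080 (compute capability 6.1) -> True
print(can_target("8.0", (7, 5)))   # Turing card -> False
print(can_target("10.0", (7, 5)))  # Turing card with a CUDA 10 build -> True
```

Under those assumptions, a Turing card asking the CUDA 8.0 nvrtc for its own architecture would get exactly this kind of "invalid value for --gpu-architecture" message.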

mmonnin
Send message
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 966
Level
Lys
Scientific publications
wat
Message 52016 - Posted: 5 Jun 2019 | 22:20:05 UTC - in response to Message 51990.

I completed one while 5 others had errors.
https://www.gpugrid.net/workunit.php?wuid=16520276

nvcc -V results
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85


Same result on another PC but all tasks error on a 1080Ti
https://www.gpugrid.net/show_host_detail.php?hostid=477247
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52017 - Posted: 5 Jun 2019 | 23:11:10 UTC

I think I should take one of the hosts that fail the app and install the cuda toolkit and see if it changes anything.

I know Toni said the toolkit is supposedly unnecessary, but it might show something.

mmonnin
Send message
Joined: 2 Jul 16
Posts: 265
Credit: 647,845,139
RAC: 966
Level
Lys
Scientific publications
wat
Message 52018 - Posted: 6 Jun 2019 | 1:38:53 UTC

It won't hurt. One PC of mine with 1070/1070Ti works and another with 1080Ti doesn't. Both have the same nvcc -V results.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 52019 - Posted: 6 Jun 2019 | 8:59:43 UTC - in response to Message 52018.
Last modified: 6 Jun 2019 | 9:00:22 UTC

Misc answers:

- No turing support YET. If the app works, there will be many more possibilities
- I don't think installing the cuda toolkit will change anything, but who knows... but please don't break your systems (e.g. tweaking PATH) to install it.
- I'm fairly positive about library copying/renaming being ok.
- I'll be updating the app soon. It seems to be some system-specific, non-reproducible behavior.
- In any case, updated drivers won't hurt.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 52020 - Posted: 6 Jun 2019 | 9:31:43 UTC - in response to Message 52019.
Last modified: 6 Jun 2019 | 9:32:27 UTC

It seems to be working. At least several previously-failing hosts switched to success.

Also possibly working (please check)

* progress bar
* pause/resume

Will be un-marking beta soon (so more hosts get it).

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52021 - Posted: 6 Jun 2019 | 10:00:53 UTC

v2.03 works on both my computers!

What was the change between v2.02 and v2.03, if you don't mind me asking? :)

The progress bar works.
I didn't try pause/resume.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 52022 - Posted: 6 Jun 2019 | 10:10:39 UTC - in response to Message 52021.
Last modified: 6 Jun 2019 | 10:11:07 UTC

It was a cryptic bug in the order of loading shared libraries, or something like that. Otherwise it was inexplicably system-dependent.

I see VERY few failures now. The new app will be a huge step forward on several aspects, not least maintainability. We'll be transitioning gradually.

PurpleHat
Send message
Joined: 4 Jun 19
Posts: 3
Credit: 11,999,700
RAC: 15
Level
Pro
Scientific publications
wat
Message 52023 - Posted: 6 Jun 2019 | 13:28:45 UTC - in response to Message 52022.

My host that failed on 2.02 now works with 2.03.

Great work

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Scientific publications
watwatwatwat
Message 52024 - Posted: 6 Jun 2019 | 13:29:42 UTC - in response to Message 52023.

App 2.04 should support cuda10, if the scheduler cooperates.
Expect hiccups...

T

rod4x4
Send message
Joined: 4 Aug 14
Posts: 95
Credit: 1,596,312,969
RAC: 1,347,351
Level
His
Scientific publications
watwatwatwatwatwatwat
Message 52025 - Posted: 6 Jun 2019 | 13:32:31 UTC

Well done Toni!

Much appreciated!

DRSMT
Send message
Joined: 23 Feb 17
Posts: 20
Credit: 618,195,847
RAC: 45,914
Level
Lys
Scientific publications
wat
Message 52026 - Posted: 6 Jun 2019 | 13:38:03 UTC

Toni, tests are looking good on my machines, even on the 2080Ti!

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52027 - Posted: 6 Jun 2019 | 15:45:25 UTC

Two hosts have received 2.04 and processed on the 2080 along with a 1070 successfully.

The other host is still on 2.03 I assume because I had the gpu_exclude running on it for the 2080 card. Removing the gpu_exclude for all hosts now.

It looks like we have a winner.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 52028 - Posted: 6 Jun 2019 | 17:00:25 UTC - in response to Message 52024.

App 2.04 should support cuda10, if the scheduler cooperates.

Would it help the performance of a GTX 1070 (Ubuntu 16.04) to upgrade from CUDA 9.2 to CUDA 10 drivers?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52029 - Posted: 6 Jun 2019 | 18:18:40 UTC - in response to Message 52028.
Last modified: 6 Jun 2019 | 18:19:20 UTC

From what we've seen over at Seti with our special app, the CUDA9 application running the 410 series driver is fastest for Pascal cards. Of course if you have a Turing card, you are forced into using the 418 series drivers with CUDA10.

Not sure if the same observation applies here at GPUGrid with its apps.

My guess and suggestion would be to just stand pat until somebody runs both the 2.03 and 2.04 apps against the same tasks in the benchmark utilities and proves otherwise.

I assume all the kit 'n kaboodle of files needed for the wrapper would have to be put into the benchmark tool. Not as simple as testing on Seti tasks.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwat
Message 52030 - Posted: 6 Jun 2019 | 18:25:19 UTC - in response to Message 52029.

Yes, it is a bit premature. I will wait for the final app. Thanks for the input.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Scientific publications
wat
Message 52032 - Posted: 6 Jun 2019 | 18:33:15 UTC - in response to Message 52030.

Yes, it is a bit premature. I will wait for the final app. Thanks for the input.

Generally for crunching, a newer driver is never necessary. The normal reason for updated drivers is to add ever more compatibility with newer games. And the drivers get ever more bloated and slower.

Only when a new architecture arrives that requires newer drivers is it really necessary to update.

The low level math functions of the drivers have been pretty much static for years.

Of course the silicon gets ever faster which is where the most improvement occurs.

On one host with both a 2080 and 1070, the 2080 ran the test WU in 125 seconds versus 230 seconds with the 1070. Both on the 2.04 app.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,828,447,169
RAC: 2,498,715
Level
Trp
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52034 - Posted: 6 Jun 2019 | 20:43:22 UTC - in response to Message 52032.
Last modified: 6 Jun 2019 | 20:44:14 UTC

On one host with both a 2080 and 1070, the 2080 ran the test WU in 125 seconds versus 230 seconds with the 1070. Both on the 2.04 app.

I see 93 seconds on my RTX 2080 Ti (v2.04)
(The v2.03 had failed before on my host btw)
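For a rough sense of scale, the three run times quoted in this exchange can be turned into speed ratios. This is only arithmetic on the posted numbers. The 2080 Ti result comes from a different host, and nothing here confirms that the test WUs were identical, so the ratios are indicative at best.

```python
def speedup(baseline_seconds, seconds):
    """How many times faster a run is than the baseline run."""
    return baseline_seconds / seconds


# Run times in seconds as quoted above, all on the 2.04 app.
timings = {"GTX 1070": 230, "RTX 2080": 125, "RTX 2080 Ti": 93}
baseline = timings["GTX 1070"]

for card, seconds in timings.items():
    print(f"{card}: {seconds} s, {speedup(baseline, seconds):.2f}x the GTX 1070")
```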

klepel
Send message
Joined: 23 Dec 09
Posts: 161
Credit: 2,817,832,438
RAC: 582,906
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52053 - Posted: 7 Jun 2019 | 18:01:10 UTC

I might be a little bit confused: is there any timeline for when real production tasks for the 2.04 app will be supplied to Linux hosts? In the next days, weeks, months? I changed my BOINC settings to give GPUGRID tasks preference; however, if there are no tasks available in the near future, I would like to change the settings again to better serve other projects.

Toni, please answer as soon and as specifically as possible. Thanks!

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52054 - Posted: 7 Jun 2019 | 19:12:47 UTC

We'll be transitioning gradually.

That was Toni's last response. I would not expect the project to be as responsive as you request.
My suggestion is to move on to your other projects and occasionally check back in here for any news of non-beta apps and newly generated work units.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 52062 - Posted: 8 Jun 2019 | 14:58:15 UTC - in response to Message 52054.
Last modified: 8 Jun 2019 | 15:05:08 UTC

To summarize: acemd3 cuda80 and cuda100 are very satisfactory for linux. Very few failures, fast, blazing on rtx 2080. The win app will need work though.

There is definitely NO NEED TO install the CUDA toolkit for GPUGRID. It's not required, so don't complicate your set-up unnecessarily.

You just will want to have not-too-old drivers, however.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52063 - Posted: 8 Jun 2019 | 16:24:34 UTC - in response to Message 52062.

Toni, did you discover a minimum driver version below which the error rate goes up?

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 52066 - Posted: 8 Jun 2019 | 18:06:09 UTC

The most recent driver in the linux mint 19.1 repository is 390.116 (cuda 9).

You can get 4xx.xx proprietary drivers that support cuda 10 at this site by adding the PPA to your system. https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa

Or you can download and install the drivers from nvidia directly. The latter is a little more adventurous for linux users.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52067 - Posted: 8 Jun 2019 | 18:45:40 UTC - in response to Message 52066.

I was mainly asking if it was necessary to run the latest drivers like the 430 series. You have to run at least the 418 series to be compatible with Turing.

I have used the graphics-drivers ppa for years now. I always recommend that method to Linux noobs as the easiest way to install Nvidia drivers.

Aurum
Send message
Joined: 12 Jul 17
Posts: 110
Credit: 7,368,016,843
RAC: 3,316,045
Level
Tyr
Message 52071 - Posted: 8 Jun 2019 | 21:25:41 UTC - in response to Message 52066.

The most recent driver in the linux mint 19.1 repository is 390.116 (cuda 9).

My Linux Mint 19.1 computers have all been using CUDA 10 for a long time and were installed using the Driver Manager:

aurum@Rig-27:~$ nvidia-smi
Sat Jun 8 14:16:29 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.14       Driver Version: 430.14       CUDA Version: 10.2     |

Side note, I know just enough Linux to be dangerous. Someone once told me to install the CUDA Toolkit to get CUDA 10 and it caused problems with Linux Mint 19. This is anecdotal but I think my computers generally work better without CUDA Toolkit installed.
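
For anyone who wants to check this without the toolkit, here is a minimal Python sketch that reads the driver and CUDA versions out of the nvidia-smi header line shown above. The helper names are made up for illustration, and the 418 threshold is only the figure mentioned earlier in this thread for Turing cards, not something I have verified. Note that the "CUDA Version" in this header is the highest CUDA version the driver supports, not an installed toolkit.

```python
import re

# Header line as printed by nvidia-smi (copied from the output above).
HEADER = "| NVIDIA-SMI 430.14       Driver Version: 430.14       CUDA Version: 10.2     |"


def parse_header(line):
    """Return (driver_version, cuda_version) as strings, or None where missing."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", line)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", line)
    return (
        driver.group(1) if driver else None,
        cuda.group(1) if cuda else None,
    )


def driver_at_least(version, minimum_major):
    """Compare only the major part of a driver version such as '430.14'."""
    return int(version.split(".")[0]) >= minimum_major


driver, cuda = parse_header(HEADER)
# 418 is the minimum series quoted in this thread for Turing cards (assumption).
print(driver, cuda, driver_at_least(driver, 418))
```

Older nvidia-smi versions do not print a CUDA Version field, in which case the second value is simply None.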

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52072 - Posted: 8 Jun 2019 | 21:33:07 UTC

I agree, best to avoid the toolkit unless you need it to develop apps. They recently changed the way the toolkit handles the video drivers. You can get yourself into trouble when the toolkit installs its own version of the graphics drivers alongside the CUDA runtime that came with your usual graphics driver installation.

I have been reading this document which I just discovered.

https://docs.nvidia.com/deploy/cuda-compatibility/

Lots of good info about driver versions and CUDA version support among the various generations of cards.

PappaLitto
Send message
Joined: 21 Mar 16
Posts: 507
Credit: 4,343,768,551
RAC: 3,059,220
Level
Arg
Message 52073 - Posted: 9 Jun 2019 | 2:33:38 UTC - in response to Message 52062.

To summarize: acemd3 cuda80 and cuda100 are very satisfactory for linux. Very few failures, fast, blazing on rtx 2080. The win app will need work though.

There is definitely NO NEED TO install the CUDA toolkit for GPUGRID. It's not required, so don't complicate your set-up unnecessarily.

You just will want to have not-too-old drivers, however.

When can we expect the linux version of acemd3 to be released to the masses?

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52074 - Posted: 9 Jun 2019 | 2:46:07 UTC - in response to Message 52073.

My guess: not until Toni can figure out how to get the acemd3 Windows app working.

Jim1348
Send message
Joined: 28 Jul 12
Posts: 695
Credit: 1,371,992,468
RAC: 3
Level
Met
Message 52075 - Posted: 9 Jun 2019 | 12:35:58 UTC

I hope the CUDA 9.2 drivers are good enough. When I tried to update to the latest CUDA 10.x drivers for my GTX 1070 from the Nvidia repository (Ubuntu 16.04.6), it blanked out the desktop. BOINC still worked, but I had to reload the OS to get the desktop back. I suppose something in the X.Org server (if that is what they use) does not like the latest drivers, though the Linux experts will need to fix it.

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 70
Credit: 1,003,056,251
RAC: 33,337
Level
Met
Message 52076 - Posted: 9 Jun 2019 | 15:57:45 UTC - in response to Message 52075.

I hope the CUDA 9.2 drivers are good enough. When I tried to update to the latest CUDA 10.x drivers for my GTX 1070 from the Nvidia repository (Ubuntu 16.04.6), it blanked out the desktop. BOINC still worked, but I had to reload the OS to get the desktop back. I suppose something in the X.Org server (if that is what they use) does not like the latest drivers, though the Linux experts will need to fix it.


I have nVidia proprietary drivers ver 430.14 with cuda ver 10.2 installed for three GTX-1060's, working fine on my two FX-8350 Fedora 30 systems. I don't know where Ubuntu 16.x falls in the distro release line, but F30 is the latest Fedora release (released last month) and all is well.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52077 - Posted: 9 Jun 2019 | 21:56:08 UTC - in response to Message 52075.

I still think using the graphics-drivers ppa is the easiest solution for obtaining and installing the drivers. As long as you have a debian distro newer than 14, you are good to go with the drivers right up to the current 430 series.

Just add the ppa to your sources and make your choice of which driver you want. Either install from the command line or install from the Software Updater.

https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 436
Credit: 499,429,346
RAC: 280,109
Level
Gln
Message 52078 - Posted: 10 Jun 2019 | 4:59:16 UTC

I have a task that appears to be misbehaving.



Application: Long runs (8-12 hours on fastest card) 9.22 (cuda80)
Name: e82s77_e68s28p1f217-PABLO_v3Q9UM73_MOR_14_IDP-0-2-RND6181
State: Running
Received: 6/9/2019 3:14:13 AM
Report deadline: 6/14/2019 3:14:14 AM
Resources: 0.991 CPUs + 1 NVIDIA GPU
Estimated computation size: 5,000,000 GFLOPs
CPU time: 00:54:04
CPU time since checkpoint: 00:00:01
Elapsed time: 20:13:55
Estimated time remaining: ---
Fraction done: 73.853%
Virtual memory size: 1.11 GB
Working set size: 359.47 MB
Directory: slots/0
Process ID: 3684
Progress rate: 3.600% per hour
Executable: acemd-922-80.exe

The progress percentage has been frozen, or at least nearly frozen, for at least the last 10 hours.

The estimated remaining time has been --- for at least the last 10 hours.

How long should I let it run before aborting it? Is there anything else I should do to make it run properly?

http://www.gpugrid.net/workunit.php?wuid=16519044

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 52079 - Posted: 10 Jun 2019 | 8:55:56 UTC - in response to Message 52078.

Try to suspend and restart it. Or kill it.

Profile robertmiles
Send message
Joined: 16 Apr 09
Posts: 436
Credit: 499,429,346
RAC: 280,109
Level
Gln
Message 52082 - Posted: 11 Jun 2019 | 0:40:45 UTC - in response to Message 52079.

Try to suspend and restart it. Or kill it.

Suspending and restarting it set it back many hours, and after many more hours it only showed the same problem again.

Aborting it worked, though.

Azmodes
Send message
Joined: 7 Jan 17
Posts: 10
Credit: 602,937,915
RAC: 844,065
Level
Lys
Message 52085 - Posted: 15 Jun 2019 | 12:07:17 UTC

Any plans of sending out more beta tasks for Linux? I'm eager to try it on my 2070 and 1660 Ti.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52086 - Posted: 15 Jun 2019 | 16:46:13 UTC

This thread has been very quiet since the last posting by Toni. I should hope that means he is quietly working hard on polishing off the new apps for both Linux and Windows. And soon that new work with new apps will become available.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 52090 - Posted: 17 Jun 2019 | 13:18:06 UTC - in response to Message 52086.

Yes. The problem is now the windows app and its dependencies. We also need to set up some hardware to test it.

By the way, any experience with W10 bootable USBs for testing purposes?

T

erik
Send message
Joined: 30 Apr 19
Posts: 54
Credit: 168,971,875
RAC: 714
Level
Ile
Message 52091 - Posted: 17 Jun 2019 | 13:22:07 UTC - in response to Message 52090.

To make a bootable Win10 disk I use the WinToUSB software with a Win10 ISO image.
Then unplug the internal disk, set up the BIOS to boot from the USB disk, and you are good to go.

Or you can use UltraISO: choose "Write Disk Image" and then the USB-HDD+ option to write the ISO image to a disk or USB drive.

Toni
Volunteer moderator
Project administrator
Project developer
Project scientist
Send message
Joined: 9 Dec 08
Posts: 821
Credit: 4,294,282
RAC: 0
Level
Ala
Message 52092 - Posted: 17 Jun 2019 | 13:56:41 UTC - in response to Message 52091.

Thanks for the tips!

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2686
Credit: 1,164,652,899
RAC: 408,544
Level
Met
Message 52097 - Posted: 18 Jun 2019 | 21:58:30 UTC

Erik was referring to Win 10 installation from USB. Supposedly Win 10 can also boot from a stick - however, this requires certified sticks. I don't have any further experience with that.

MrS
____________
Scanning for our furry friends since Jan 2002

rod4x4
Send message
Joined: 4 Aug 14
Posts: 95
Credit: 1,596,312,969
RAC: 1,347,351
Level
His
Message 52100 - Posted: 19 Jun 2019 | 2:46:05 UTC
Last modified: 19 Jun 2019 | 2:47:09 UTC

Below is a link to a step by step guide with screenshots and links to the appropriate software to make the USB.
You will need an educational or enterprise license of Win10 to run it.

https://au.pcmag.com/windows-10-1/46896/how-to-run-windows-10-from-a-usb-drive

biodoc
Send message
Joined: 26 Aug 08
Posts: 160
Credit: 1,405,920,847
RAC: 396
Level
Met
Message 52104 - Posted: 19 Jun 2019 | 10:16:28 UTC
Last modified: 19 Jun 2019 | 10:18:44 UTC

I would consider purchasing a 250 GB SSD drive and setting up a Win/linux dual boot option using grub on one of your linux workstations. You'd still have to buy a license for Win10 pro. Building a "cheap" dedicated Win10 system would be even better. I would think Win10 on a USB stick would be a painful experience (slow).

erik
Send message
Joined: 30 Apr 19
Posts: 54
Credit: 168,971,875
RAC: 714
Level
Ile
Message 52105 - Posted: 19 Jun 2019 | 10:31:18 UTC - in response to Message 52097.

Erik was referring to Win 10 installation from USB. Supposedly Win 10 can also boot from a stick - however, this requires certified sticks. I don't have any further experience with that.

MrS

Nope, I was referring to how to make a bootable Win10 USB or portable disk. I was not talking about a bootable USB to install Win10, as you are thinking.

My suggestion was an already installed Win10 on a bootable USB or SSD (or whatever), made using WinToUSB or UltraISO.

rod4x4
Send message
Joined: 4 Aug 14
Posts: 95
Credit: 1,596,312,969
RAC: 1,347,351
Level
His
Message 52106 - Posted: 19 Jun 2019 | 11:29:30 UTC

I would think Win10 on a USB stick would be a painful experience (slow).

Agreed.

consider purchasing a 250 GB SSD drive and setting up a Win/linux dual boot option

+1
Due to the Win10 licensing model, the USB stick would not be portable between host PCs. It would go to setup mode every time it is plugged into a different host PC, and another Win 10 license would be needed.

erik
Send message
Joined: 30 Apr 19
Posts: 54
Credit: 168,971,875
RAC: 714
Level
Ile
Message 52111 - Posted: 19 Jun 2019 | 18:33:27 UTC - in response to Message 52106.

I would think Win10 on a USB stick would be a painful experience (slow).

Agreed.

consider purchasing a 250 GB SSD drive and setting up a Win/linux dual boot option

+1
Due to the Win10 licensing model, the USB stick would not be portable between host PCs. It would go to setup mode every time it is plugged into a different host PC, and another Win 10 license would be needed.

It is portable between all host machines. In my experience, the Enterprise and Education versions of Win10 are portable between Intel and AMD platforms. Only, when you switch between platforms, you need to activate Win10 again. But to be honest, who cares about activation when you are running it temporarily for test purposes?

ExtraTerrestrial Apes
Volunteer moderator
Volunteer tester
Avatar
Send message
Joined: 17 Aug 08
Posts: 2686
Credit: 1,164,652,899
RAC: 408,544
Level
Met
Message 52112 - Posted: 19 Jun 2019 | 20:47:33 UTC

Erik: thanks, I didn't even know this was possible.

I would think Win10 on a USB stick would be a painful experience (slow).

That's what the USB stick certification from MS is about: only allowing drives which are not painfully slow.

MrS
____________
Scanning for our furry friends since Jan 2002

rod4x4
Send message
Joined: 4 Aug 14
Posts: 95
Credit: 1,596,312,969
RAC: 1,347,351
Level
His
Message 52114 - Posted: 19 Jun 2019 | 23:38:15 UTC - in response to Message 52111.
Last modified: 20 Jun 2019 | 0:20:15 UTC

It is portable between all host machines.

Ahh... Nice to know.

It seems Microsoft don't insist on licensing each host individually, but rely on the PC being covered by Software Assurance.

This is for Windows 8.1; I assume it is the same for Win10.
https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-8.1-and-8/jj592680(v=ws.11)
See the heading "How is Windows To Go licensed?"

Profile JStateson
Avatar
Send message
Joined: 31 Oct 08
Posts: 154
Credit: 2,552,519,628
RAC: 2,615,305
Level
Phe
Message 52131 - Posted: 23 Jun 2019 | 14:54:31 UTC
Last modified: 23 Jun 2019 | 15:11:49 UTC

I am not sure if this was an anomaly or not but I did get a "usb windows 10" stick to boot and be activated.

Background: I bought several used HPZ-400 motherboards on eBay. They are all licensed for windows 7 due to having an SLIC2.1 in the bios. The activation code for windows 7 pro is the same generic HP windows 7 pro key so all these systems can legally run win7pro as the license is bound to the system and the key is listed all over the internet.

I installed Windows 7 on 2 mombos and did the still-free upgrade to 10. According to one of the Microsoft MVPs, it was not necessary to activate Windows 7 first; I could have installed 10 directly and simply used my Windows 7 HP key. While I did not try that, I did install 7 Pro and upgrade to 10 Pro using a USB stick on mombo #2. I then put that USB stick in the 3rd HPZ-400 mombo and was able to activate it. However, I had given the 3rd system the same name as the 2nd, and I had registered the 2nd one on Microsoft's web site as one of my devices. Later I changed the name of the 2nd one, and all three were working just fine with Win10 Pro.

I suspect if the motherboard has an SLIC2.1 license then a USB win10 stick can boot and be activated using the generic win7pro license from the original vendor (dell, hp, etc)

Just a guess.

STARBASEn
Avatar
Send message
Joined: 17 Feb 09
Posts: 70
Credit: 1,003,056,251
RAC: 33,337
Level
Met
Message 52161 - Posted: 29 Jun 2019 | 17:59:19 UTC

Any word on the status of acemd3? Last I saw there was a hold up due to a Windows (10?) issue.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52162 - Posted: 29 Jun 2019 | 19:12:27 UTC

Haven't heard a peep either. Still waiting on Windows app I guess.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52198 - Posted: 6 Jul 2019 | 7:20:05 UTC

Toni is trying out a new acemd3 application. 2.04 works as well as the previous 2.03 version. Still having issues getting the Windows app to work consistently.

He has a new thread. https://www.gpugrid.net/forum_thread.php?id=4955

Diplomat
Send message
Joined: 1 Sep 10
Posts: 9
Credit: 204,368,400
RAC: 140
Level
Leu
Message 52249 - Posted: 14 Jul 2019 | 2:42:52 UTC

We are still dreaming of Linux app :з

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,828,447,169
RAC: 2,498,715
Level
Trp
Message 52261 - Posted: 14 Jul 2019 | 16:44:44 UTC - in response to Message 52249.

We are still dreaming of Linux app :з

My RTX 2080 Tis are eagerly waiting for the acemd3 Linux app.
My ranking in SETI@home has risen by 43,140 in the meantime. (51,148 -> 8,008)

PurpleHat
Send message
Joined: 4 Jun 19
Posts: 3
Credit: 11,999,700
RAC: 15
Level
Pro
Message 52270 - Posted: 15 Jul 2019 | 20:55:06 UTC

We have a working new application on Linux, but we are forced to wait on the Windows version. That is not fair when our old application has expired.

My backup project is very happy, but my goal for my GPUs is lost.
(linux user)

Toby Broom
Send message
Joined: 11 Dec 08
Posts: 18
Credit: 197,494,943
RAC: 330,412
Level
Ile
Message 52614 - Posted: 11 Sep 2019 | 16:10:27 UTC

Hi,

Will the new version support the Titan V (CUDA compute capability 7.0)?

Thanks

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52616 - Posted: 11 Sep 2019 | 18:14:37 UTC - in response to Message 52614.

Hi,

Will the new version support the Titan V (CUDA compute capability 7.0)?

Thanks

I would assume so.

Toby Broom
Send message
Joined: 11 Dec 08
Posts: 18
Credit: 197,494,943
RAC: 330,412
Level
Ile
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwat
Message 52618 - Posted: 11 Sep 2019 | 20:18:23 UTC
Last modified: 11 Sep 2019 | 20:18:42 UTC

I have set up for acemd3; if any WUs drop, I'll see what happens.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 289
Credit: 238,117,713
RAC: 123,455
Level
Leu
Message 52619 - Posted: 11 Sep 2019 | 21:03:52 UTC - in response to Message 52618.

I have set up for acemd3; if any WUs drop, I'll see what happens.

The wrapper app is able to handle compute capabilities up to 7.5 (Turing cards), so your Titan V should be good to go.

Problem is getting some of the test tasks. I have been hammering the project since the announcement of the new acemd3 apps and I have yet to receive one.

I hope that Toni releases some more work so I can get back to crunching for this project.

I had no issues with the test versions back in July. Want some more work to get the latest apps.

gFreezer
Send message
Joined: 29 Nov 17
Posts: 4
Credit: 54,878,475
RAC: 513,968
Level
Thr
Message 52817 - Posted: 9 Oct 2019 | 10:23:14 UTC
Last modified: 9 Oct 2019 | 10:24:37 UTC

I'm really dissatisfied with the new acemd3 tasks for the following reasons:


  • The app always spins a whole CPU thread. With the old app, you could configure this behaviour using the SWAN_SYNC environment variable. It made almost no difference with the old app for me (like half a percent in GPU utilization), so I would prefer the app to leave some cycles for a CPU-only task instead.

  • The wrapper spawns the acemd3 process with the lowest priority possible (nice 19). If you configure BOINC to run GPU tasks with a higher priority, only the wrapper has higher priority this way. The wrapper should instead spawn the acemd3 process with the same nice value that it has itself.

  • The credit you get for these work units seems to be way off. I get only a little more than half of the credit per hour that I get with acemd2 work units.
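
To illustrate the second point: on Linux a child process inherits the nice value of its parent unless something changes it explicitly. The following is a minimal Python sketch of that behaviour (Unix only). Nothing in it is taken from the wrapper source, and `child_niceness` is a made-up helper name.

```python
import os
import subprocess
import sys


def child_niceness() -> int:
    """Start a child Python process and return the nice value it reports for itself."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getpriority(os.PRIO_PROCESS, 0))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Nice value of this (parent) process; pid 0 means "the calling process".
parent_niceness = os.getpriority(os.PRIO_PROCESS, 0)
print("parent:", parent_niceness, "child:", child_niceness())
```

Started under `nice -n 10`, both numbers should read 10; a child only ends up at 19 if something lowers its priority on purpose.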

erik
Send message
Joined: 30 Apr 19
Posts: 54
Credit: 168,971,875
RAC: 714
Level
Ile
Message 52818 - Posted: 9 Oct 2019 | 10:31:20 UTC

I have built 4 rigs (with 24 GPU cards: GTX 1070, GTX 1080, GTX 1080 Ti and RTX cards, all mixed together) specially for GPUGRID. But after the summer vacation I have the impression that this project is nearly dead, because since the summer I don't get any WUs to crunch (2 rigs with 12x GTX 1070 only), while before the summer there were no problems at all.

So I am leaving GPUGRID and going to F@H (Folding@home) with my hardware.

Bye bye GPUGRID

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,828,447,169
RAC: 2,498,715
Level
Trp
Message 52821 - Posted: 9 Oct 2019 | 11:59:51 UTC - in response to Message 52817.

I'm really dissatisfied with the new acemd3 tasks for the following reasons:
    *The app always spins a whole CPU thread. With the old app, you could configure this behaviour using the SWAN_SYNC environment variable. It made almost no difference with the old app for me (like half a percent in GPU utilization), so I would prefer the app to leave some cycles for a CPU-only task instead.

That's true under Windows Vista+ because of WDDM. Linux and Windows XP see a significant performance benefit from SWAN_SYNC.

    *The wrapper spawns the acemd3 process with the lowest priority possible (nice 19). If you configure BOINC to run GPU tasks with a higher priority, only the wrapper has higher priority this way. The wrapper should instead spawn the acemd3 process with the same nice value that it has itself.

This should be corrected.

    *The credit you get for these work units seems to be way off, I only get little more than half of the credit per hour that I get with acemd2 work units

That's because this is beta work, and work from the short queue.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Message 52823 - Posted: 9 Oct 2019 | 12:58:36 UTC - in response to Message 52821.

    *The wrapper spawns the acemd3 process with the lowest priority possible (nice 19). If you configure BOINC to run GPU tasks with a higher priority, only the wrapper has higher priority this way. The wrapper should instead spawn the acemd3 process with the same nice value that it has itself.

This should be corrected.

The wrapper app can handle that - see https://boinc.berkeley.edu/trac/wiki/WrapperApp#Thejobdescriptionfile: the final parameter in the job description file should be 2 (I think).

If you're running a test task, you should be able to see and change that value. If changing it doesn't correct the problem, we'll have to dig deeper.

Aurum
Send message
Joined: 12 Jul 17
Posts: 110
Credit: 7,368,016,843
RAC: 3,316,045
Level
Tyr
Scientific publications
wat
Message 52824 - Posted: 9 Oct 2019 | 15:21:14 UTC - in response to Message 52818.
Last modified: 9 Oct 2019 | 15:21:29 UTC

i am leaving gpugrid, going to F@H (folding @ home) with my hardware. bye bye gpugrid

You'll be back when you get tired of F@H's buggy software.

gFreezer
Send message
Joined: 29 Nov 17
Posts: 4
Credit: 54,878,475
RAC: 513,968
Level
Thr
Message 52825 - Posted: 9 Oct 2019 | 15:37:36 UTC - in response to Message 52821.

That's true under Windows Vista+ because of WDDM. Linux and Windows XP have significant performance benefit from SWAN_SYNC.

Well, when I played around with SWAN_SYNC about a year ago on Linux, there definitely was a small performance benefit, but it was so minimal that I decided it's not worth sacrificing half a CPU core for it for me.

That's because this is beta work, and work from the short queue.

Ah, so the tasks for acemd3 are just tasks that have been moved over from the "Short runs" application? Makes sense. Is there any particular reason why the "short runs" give much less credit than the "long runs" for the same runtime?

For the "TEST" work units, I figured that they give less credit because they are just tests. Is it possible to opt-out of test work units?

The wrapper app can handle that - see https://boinc.berkeley.edu/trac/wiki/WrapperApp#Thejobdescriptionfile: the final parameter in the job description file should be 2 (I think).

If you're running a test task, you should be able to see and change that value. If changing it doesn't correct the problem, we'll have to dig deeper.

This would solve the problem, but I'm pretty sure that BOINC verifies the files and re-downloads them if they have been altered...

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Message 52826 - Posted: 9 Oct 2019 | 15:53:03 UTC - in response to Message 52825.

The wrapper app can handle that - see https://boinc.berkeley.edu/trac/wiki/WrapperApp#Thejobdescriptionfile: the final parameter in the job description file should be 2 (I think).

If you're running a test task, you should be able to see and change that value. If changing it doesn't correct the problem, we'll have to dig deeper.

This would solve the problem, but I'm pretty sure that BOINC verifies the files and re-downloads them if they have been altered...

You could still inspect the file and confirm whether GPUGrid have deployed it properly. A local edit wouldn't be permanent, for the reasons you state, but an edit to a task that has downloaded but not yet started might last for long enough for you to observe and report back.

gFreezer
Send message
Joined: 29 Nov 17
Posts: 4
Credit: 54,878,475
RAC: 513,968
Level
Thr
Message 52827 - Posted: 9 Oct 2019 | 16:26:04 UTC - in response to Message 52826.


You could still inspect the file and confirm whether GPUGrid have deployed it properly. A local edit wouldn't be permanent, for the reasons you state, but an edit to a task that has downloaded but not yet started might last for long enough for you to observe and report back.

If the task isn't started yet, the files aren't in the "slots" directory yet. But suspending GPU, editing the file of the already started task and resuming GPU worked!

On Linux, setting priority to 5 achieves the desired result: The acemd3 process always has the exact priority defined in cc_config.xml. Now the question is whether this would actually increase the priority unwantedly on Windows...
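
The edit described above can be sketched in Python. Assumptions, stated loudly: the job description follows the `<job_desc>`/`<task>` layout from the BOINC wrapper documentation linked earlier in the thread, the sample XML and the `acemd3` application name are hypothetical stand-ins for whatever GPUGrid actually ships, and `set_task_priority` is an illustrative helper, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal job description; the real file from GPUGrid may differ.
JOB_XML = """<job_desc>
    <task>
        <application>acemd3</application>
    </task>
</job_desc>"""


def set_task_priority(job_xml: str, priority: int) -> str:
    """Add or update a <priority> element in every <task> and return the new XML."""
    root = ET.fromstring(job_xml)
    for task in root.iter("task"):
        node = task.find("priority")
        if node is None:
            # Tag was unset, so create it inside the task.
            node = ET.SubElement(task, "priority")
        node.text = str(priority)
    return ET.tostring(root, encoding="unicode")


# 5 is the value tried in this post; other values are accepted the same way.
print(set_task_priority(JOB_XML, 5))
```

As noted in the thread, BOINC may re-download an altered file, so a local change like this is only useful for a quick observation on a task that is already in its slot directory.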

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52828 - Posted: 9 Oct 2019 | 17:53:40 UTC - in response to Message 52827.

What figure did you find there, before you edited it?

gFreezer
Send message
Joined: 29 Nov 17
Posts: 4
Credit: 54,878,475
RAC: 513,968
Level
Thr
Message 52830 - Posted: 9 Oct 2019 | 20:05:15 UTC - in response to Message 52828.

It was unset. I added the "priority" xml tag myself.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 912
Credit: 2,197,798,745
RAC: 837,678
Level
Phe
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 52831 - Posted: 9 Oct 2019 | 21:36:47 UTC - in response to Message 52830.

It was unset. I added the "priority" xml tag myself.

OK, memo to project devs. I'd recommend value 2 for general use, rather than 5.

Profile Retvari Zoltan
Avatar
Send message
Joined: 20 Jan 09
Posts: 2048
Credit: 14,828,447,169
RAC: 2,498,715
Level
Trp
Message 52840 - Posted: 12 Oct 2019 | 22:23:41 UTC - in response to Message 52825.

Well, when I played around with SWAN_SYNC about a year ago on Linux, there definitely was a small performance benefit, but it was so minimal that I decided it's not worth sacrificing half a CPU core for it for me.

It also depends on the GPU. High-end GPUs gain more (up to 30% on a GTX 1080Ti). To optimize the performance of the GPUGrid app, you should not over-commit the CPU feeding the GPU(s). That means at most one CPU task per CPU core (on hyperthreaded CPUs you should reduce the number of CPU tasks to 50% of the threads). You can achieve the best GPUGrid performance (on high-end GPUs) if only one CPU task is running in parallel with the GPUGrid app (or none).
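
As a rough sketch of that rule of thumb in Python: the function name and the `reserved_for_gpu` parameter are my own illustration (reserving one core per GPU task is an assumption based on the app spinning a full CPU thread), not a BOINC setting.

```python
def max_cpu_tasks(logical_cpus: int, hyperthreaded: bool, reserved_for_gpu: int = 0) -> int:
    """Suggested number of CPU tasks so the CPU feeding the GPU(s) is not over-committed."""
    # On hyperthreaded CPUs only half of the logical CPUs are physical cores.
    physical_cores = logical_cpus // 2 if hyperthreaded else logical_cpus
    # Optionally keep cores free for the threads that feed the GPU tasks.
    return max(physical_cores - reserved_for_gpu, 0)


# 16 threads with hyperthreading, one core kept free for a single GPU task.
print(max_cpu_tasks(16, hyperthreaded=True, reserved_for_gpu=1))
```

The result can then be turned into the "use at most N% of the CPUs" figure in the BOINC computing preferences.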

Is there any particular reason why the "short runs" give much less credit than the "long runs" for the same runtime?

Because they were intended to be shorter than the "long run" workunits. The actual run times have become mixed since then; some "long" workunits take about the same time to process as a "short" one.

For the "TEST" work units, I figured that they give less credit because they are just tests. Is it possible to opt-out of test work units?

Yes. You can set the venues in your GPUGrid preferences not to receive "beta tasks", or you can deselect the entire ACEMD3 queue. But there are not enough beta and short tasks to make this worth bothering with.
