
Message boards : Graphics cards (GPUs) : Advice for GPU placement

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51217 - Posted: 9 Jan 2019 | 0:11:45 UTC
Last modified: 9 Jan 2019 | 0:20:17 UTC

Hey folks,

I bought an RTX 2080 on Amazon*, being delivered tomorrow, and was wondering if you might help confirm my plans for my GPUs.

I have 7 GPUs worth crunching with:
2080, 1050 Ti, 980 Ti, 980 Ti, 970, 660 Ti, 660 Ti.

I have 2 rigs each capable of housing 3 GPUs:
- 3-yr-old Alienware Area-51 R2 main gaming rig
- 9-yr-old XPS 730x basement cruncher.

My goals are, in this order:
- Use the 2080 for gaming
- Keep a 980 Ti for gaming and testing
- Optimize ability to crunch GPUGrid tasks on all GPUs, knowing that the 2080 isn't supported and may prevent tasks from being issued to the PC that has it until acemd is updated (right?)
- Use the GPUs to crunch other projects if GPUGrid isn't set up yet to support me

I'm thinking of doing this:

Area-51 R2:
- 2080 (for sure staying in this PC)
- 980 Ti (for sure staying in this PC)
- 660 Ti
* reasoning: Game on 2080, 980 Ti as backup/testing games, and all 3 crunch BOINC for non-GPU-Grid projects until GPUGrid gets fixed

XPS 730x:
- 1050 Ti
- 980 Ti
- 970
* reasoning: Maximum GPUGrid with remaining GPUs

Shelf:
- 660 Ti
- 460 (gets a friend on the shelf)

Does this sound like it'd "best optimize" my goals? Let me know. Thanks.

* A great deal, in my opinion: $850, on sale for $50 off ($800), plus a 5% ($40) coupon checkbox, for a final total of $760 pre-tax; includes 2 games, Battlefield V and Anthem.
https://www.amazon.com/gp/product/B07GHVK4KN

Aurum
Send message
Joined: 12 Jul 17
Posts: 97
Credit: 7,219,554,643
RAC: 51,900
Message 51218 - Posted: 9 Jan 2019 | 3:34:36 UTC

Also consider CUDA GPU Capability:
2080, 7.5
1050 Ti, 6.1
980 Ti, 5.2
970, 5.2
660 Ti, 3.0
460, 2.1
https://developer.nvidia.com/cuda-gpus
I'd put them:
980 Ti + 970
2080 + 1050 Ti
Capability is one of the factors the work server can consider in assigning WUs. This may get more of the same kind of work to all cards.
Just a thought.

BTW, I donate my legacy cards, etc to the New-2-U charity here.

mmonnin
Send message
Joined: 2 Jul 16
Posts: 255
Credit: 647,700,389
RAC: 39
Message 51221 - Posted: 9 Jan 2019 | 18:48:29 UTC
Last modified: 9 Jan 2019 | 18:51:12 UTC

Power supply/cooling capability/spacing would be a bigger concern for me if that were my setup.

You can tell BOINC to ignore GPUGrid for just the 2080 until Turing cards work with GPUGrid. There's no need to stop running GPUGrid on the other cards just because one doesn't work.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51222 - Posted: 9 Jan 2019 | 19:50:52 UTC - in response to Message 51221.
Last modified: 9 Jan 2019 | 19:51:31 UTC

Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work. Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080? If so, how?

Aurum
Send message
Joined: 12 Jul 17
Posts: 97
Credit: 7,219,554,643
RAC: 51,900
Message 51224 - Posted: 9 Jan 2019 | 20:56:53 UTC

Do you have this line in your cc_config.xml ???

<use_all_gpus>1</use_all_gpus>

If 1, use all GPUs (otherwise only the most capable ones are used). Requires a client restart.
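
For reference, here's a minimal cc_config.xml sketch showing where that option lives (structure per the BOINC client configuration docs; the surrounding tags are an illustration, not a complete config):

```xml
<cc_config>
  <options>
    <!-- 1 = let BOINC use every GPU; 0 (the default) uses only the most
         capable one(s). A client restart is required after editing. -->
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>
```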

Retvari Zoltan
Send message
Joined: 20 Jan 09
Posts: 2031
Credit: 14,717,071,269
RAC: 1,542,948
Message 51225 - Posted: 9 Jan 2019 | 20:58:50 UTC - in response to Message 51222.
Last modified: 9 Jan 2019 | 20:59:06 UTC

Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work.
Actually GPUGrid gives work to 20x0 cards, but it will fail on the host, because the app does not contain the code for CC7.5 (>6.1) cards.

Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080?
Yes.

If so, how?
You should put the following into the <options> section of your c:\ProgramData\BOINC\cc_config.xml:
<exclude_gpu>
    <url>http://gpugrid.net</url>
    <device_num>0</device_num>
</exclude_gpu>
You should check the device number of your RTX 2080 in the first ~20 lines of the BOINC client's event log, and put that number there (I guessed that it will be device 0).

mmonnin
Send message
Joined: 2 Jul 16
Posts: 255
Credit: 647,700,389
RAC: 39
Message 51226 - Posted: 9 Jan 2019 | 21:26:28 UTC - in response to Message 51222.

Hmm, I thought that wouldn't work because BOINC reports to the projects the "biggest card", so GPUGrid would think I have 3 2080 GPUs and thus wouldn't give me any work. Are you sure your proposal would work and still allow me to do BOINC work from other projects on the 2080? If so, how?


Yes, the BOINC exclusion goes by vendor and GPU index. If there's only one vendor, then it's just the index, as Retvari Zoltan suggested. It doesn't care what card it is. I have excluded a 980 Ti in a system (running FAH) and allowed just the 970 to crunch on several projects.
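
As a sketch of what mmonnin describes, an exclusion can also carry an optional <type> (vendor) filter, which matters on multi-vendor hosts; the URL and device number below are just the values from this thread, and the commented-out app name is a made-up example:

```xml
<cc_config>
  <options>
    <exclude_gpu>
      <url>http://gpugrid.net</url>
      <device_num>0</device_num>   <!-- GPU index from the event log -->
      <type>NVIDIA</type>          <!-- optional vendor filter -->
      <!-- <app>example_app</app>  optional per-app filter (hypothetical name) -->
    </exclude_gpu>
  </options>
</cc_config>
```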

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51227 - Posted: 9 Jan 2019 | 21:40:01 UTC - in response to Message 51226.

:) Did you guys know that I'm responsible for <exclude_gpu> being included in BOINC? I know how it works and how to use it.

I didn't know that GPUGrid was giving work to 20-series GPUs, even though it would fail on them. That ends up being good for me, in a way, because I can get GPUGrid work for the other 2 GPUs in the system, and BOINC should get work from other projects for the 2080.

Thanks for helping to clarify that for me. I'll definitely be adding the GPU exclusion.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 257
Credit: 234,046,463
RAC: 47
Message 51230 - Posted: 10 Jan 2019 | 3:01:12 UTC

I'm currently employing gpu_exclude statements for both GPUGrid and Einstein on one of my 4-card hosts with an RTX 2080. It works fine at preventing that card from being used.

But that causes issues with a project_max_concurrent statement for Seti.

It prevents all Seti CPU tasks from running, leaving only the four GPU tasks running.

I have a thread in the Linux/Unix section of the Questions and Answers forum at Seti.

https://setiathome.berkeley.edu/forum_thread.php?id=83645

For now I have to remove the project_max_concurrent statement from cc_config and use the cpu limitation in Local Preferences to limit the number of cores to 16.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51233 - Posted: 10 Jan 2019 | 12:19:06 UTC

Why do you have an exclude for Einstein?

Zalster
Send message
Joined: 26 Feb 14
Posts: 175
Credit: 4,013,368,076
RAC: 5,070
Message 51240 - Posted: 10 Jan 2019 | 15:52:58 UTC - in response to Message 51233.

Why do you have an exclude for Einstein?


Certain types of GPU work on Einstein fail. I believe the short-running work units do fine and the long-running ones fail, or it might be the reverse. Anyway, the terminology used to describe the work units (as defined by the users, not the scientists) is the wrong nomenclature, so I stopped paying attention to the discussion. Keith can fill you in on the specifics.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51241 - Posted: 10 Jan 2019 | 16:09:08 UTC
Last modified: 10 Jan 2019 | 16:09:23 UTC

Thanks. I confirmed that at least one of the Einstein task types failed immediately on my 2080, so I also added them to my GPU Exclusion list for that GPU. Pity, really.

Beyond
Send message
Joined: 23 Nov 08
Posts: 1104
Credit: 6,101,732,079
RAC: 542
Message 51244 - Posted: 10 Jan 2019 | 16:57:34 UTC - in response to Message 51227.

:) Did you guys know that I'm responsible for <exclude_gpu> being included in BOINC? I know how it works and how to use it.

I remember this. I'd like to thank you as I use <exclude_gpu> extensively. Some machines are running GPU Grid, Amicable Numbers and Enigma on dedicated GPUs. Also thanks for all the great work that you do debugging BOINC and testing new features.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51245 - Posted: 10 Jan 2019 | 17:07:16 UTC - in response to Message 51244.
Last modified: 10 Jan 2019 | 17:08:03 UTC

:) I'm a rock star at breaking things, for sure!
I am happy to hear that you find the feature as useful as I do!

Beyond
Send message
Joined: 23 Nov 08
Posts: 1104
Credit: 6,101,732,079
RAC: 542
Message 51248 - Posted: 10 Jan 2019 | 18:39:38 UTC - in response to Message 51221.
Last modified: 10 Jan 2019 | 18:43:36 UTC

Power supply/cooling capability/spacing would be a bigger concern for me if that were my setup.

Cooling is a major consideration in GPU placement for me. All my Ryzen 7 boxes are running 3 GPUs, usually 3 x 1060 cards. The top GPU is flexible, the middle GPU is a blower, and the bottom card that sits up against the blower is a short card that leaves the blower fan uncovered. If the machine still runs hotter than I like, I sometimes put a 1050 Ti in the lower position.

Another consideration is bus width. On X370/X470 boards, for instance, the two top PCIe slots run at PCIe 3.0 x8 (if both are used), while the bottom slot is PCIe 2.0 x4. The bottom slot handles a 1060 at full speed on a Ryzen 7, but not always on machines with slower processors. For example, I have an ITX box with a Celeron and PCIe 2.0 x4, and it constricts a 1060, but a 1050 Ti runs at full speed.

The Ryzen 7 machines also use far less CPU to run three 1060 cards at full blast; my slower boxes take a lot more CPU allocation to run two 1060 cards than the Ryzens do to handle three. In this regard I've also found that SWAN_SYNC helps noticeably on all my machines except the Ryzens, which seem to feed the GPUs fully without it.

BTW, the new Ryzens coming out mid-year will be PCIe 4.0, so again double the speed of PCIe 3.0. You'll need a 500-series motherboard for PCIe 4.0; on the older boards they'll still run at PCIe 3.0.
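
To put rough numbers on those slot differences, here's a back-of-the-envelope sketch (approximate one-direction bandwidth from per-lane transfer rates and line-encoding overhead; real-world throughput will be a bit lower):

```python
# Per-lane effective bandwidth in GB/s, accounting for line encoding.
# PCIe 1.x/2.x use 8b/10b encoding; PCIe 3.0 and later use 128b/130b.
PCIE_LANE_GBPS = {
    "2.0": 5.0 * 8 / 10 / 8,      # 5 GT/s * 0.8 / 8 bits  = 0.5 GB/s
    "3.0": 8.0 * 128 / 130 / 8,   # ~0.985 GB/s
    "4.0": 16.0 * 128 / 130 / 8,  # ~1.969 GB/s
}

def slot_bandwidth(gen: str, lanes: int) -> float:
    """Approximate one-direction bandwidth of a PCIe slot in GB/s."""
    return PCIE_LANE_GBPS[gen] * lanes

# The bottom X370/X470 slot (PCIe 2.0 x4) vs. the top slots (3.0 x8):
print(f"2.0 x4: {slot_bandwidth('2.0', 4):.1f} GB/s")   # 2.0 GB/s
print(f"3.0 x8: {slot_bandwidth('3.0', 8):.1f} GB/s")   # ~7.9 GB/s
print(f"4.0 x8: {slot_bandwidth('4.0', 8):.1f} GB/s")   # ~15.8 GB/s
```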

Of course performance per watt is another major consideration. I recently retired my pre-10xx NV GPUs. The 750 Ti cards use about the same power as a 1050 Ti, but the 1050 Ti is ~60% faster (it depends somewhat on the project). My 670 is still viable (24hr deadline-wise), but I replaced it because it's slower than a 1060 while drawing much more power and producing much more heat. You might find that good used GPUs are selling inexpensively now, as disillusioned miners seem to be fleeing the mines. Perhaps black lung disease? ;-)

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51250 - Posted: 10 Jan 2019 | 18:49:02 UTC - in response to Message 51248.

:)

I'm not too concerned with cooling. My cramped GPUs run hot, and I know it. For my main rig, I just don't do GPU work on it during the day, and instead let it run at night. Even with a max fan curve, it routinely runs around 75-80°C overnight, and the small office becomes very well heated. But I already stress-tested the overclocks and know it is 100% stable, and the GPU fans have proven to be very durable too.

What can I say? I like it hot and I like it loud. ;)

Beyond
Send message
Joined: 23 Nov 08
Posts: 1104
Credit: 6,101,732,079
RAC: 542
Message 51253 - Posted: 10 Jan 2019 | 19:24:37 UTC - in response to Message 51250.

There's something to be said for white noise...

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51257 - Posted: 10 Jan 2019 | 22:18:46 UTC

Power connectors.

The basement cruncher, my old XPS 730x, has four 6-pin connectors, and I previously made 2 more using Molex adapters, for a total of six 6-pin connectors, but zero 8-pin connectors.

My GPU additional power requirements are:
6+8 : EVGA GeForce RTX 2080 XC ULTRA GAMING
None: EVGA GeForce GTX 1050 Ti SSC GAMING
8+8 : EVGA GeForce GTX 980 Ti FTW GAMING ACX 2.0+
6+8 : Dell GTX 980 Ti
6+6 : EVGA GeForce GTX 970 FTW ACX 2.0
6+6 : EVGA GeForce GTX 660 Ti FTW+ 3GB w/Backplate
6+6 : MSI GTX 660 Ti TwinFrozr III OC 3GB

So, I think this means I'm going to:
Area 51 R2: 2080, 980 Ti, 980 Ti
XPS 730x: 970, 1050 Ti, 660 Ti
Shelf: 660 Ti

Fun!

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 257
Credit: 234,046,463
RAC: 47
Message 51264 - Posted: 11 Jan 2019 | 3:06:36 UTC - in response to Message 51245.
Last modified: 11 Jan 2019 | 3:12:55 UTC

:) I'm a rock star at breaking things, for sure!
I am happy to hear that you find the feature as useful as I do!

I'm about to join you as a "rock star" for breaking things too, apparently.

The client code commit that DA wrote to fix my original problem is going to cause major problems for anyone using a max_concurrent or project_max_concurrent statement.

The unintended consequence of the code change is that it prevents fetching replacement work until the host's cache is empty.

Only then does the host report all its finished work and ask for more to refill the cache. So that's the end of keeping your cache topped up at every 5-minute scheduler connection.

The PR2918 commit is close to being accepted into the master branch.

I have voiced my displeasure but since only DA usually authorizes pull requests into the master branch, that decision is up to him. Richard Haselgrove also has voiced his concerns.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 257
Credit: 234,046,463
RAC: 47
Message 51265 - Posted: 11 Jan 2019 | 3:18:18 UTC - in response to Message 51248.

BTW, the new Ryzens coming out mid year will be PCIe 4.0, so again double the speed of PCIe 3.0. You'll need a 500 series MB for PCIe 4.0, on the older boards they'll still run at PCIe 3.0.

There's talk from CES that PCIe 4.0 cards would still work in the first PCIe slot closest to the CPU on the existing X370/X470 motherboards, as the signaling requirements for PCIe 4.0 devices limit the signal path to 6 inches without redrivers.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51266 - Posted: 11 Jan 2019 | 3:37:07 UTC - in response to Message 51264.
Last modified: 11 Jan 2019 | 3:39:15 UTC

:) I'm a rock star at breaking things, for sure!
I am happy to hear that you find the feature as useful as I do!

I'm about to join you as a "rock star" for breaking things too, apparently.

The client code commit that DA wrote to fix my original problem is going to cause major problems for anyone using a max_concurrent or project_max_concurrent statement.

The unintended consequence of the code change is that it prevents fetching replacement work until the host's cache is empty.

Only then does the host report all its finished work and ask for more to refill the cache. So that's the end of keeping your cache topped up at every 5-minute scheduler connection.

The PR2918 commit is close to being accepted into the master branch.

I have voiced my displeasure but since only DA usually authorizes pull requests into the master branch, that decision is up to him. Richard Haselgrove also has voiced his concerns.


It sounds like you were using "max concurrent" to mean "only run this many at the same time, but allow fetch of more."

David is likely arguing that, if you can't run more than that many simultaneously, then why buffer more? Consider tasks that take 300 days to complete (yes, RNA World has them). If you're set to only run 3 as "max concurrent", then why would you want to get a 4th task that would sit there for 300 days?

You might consider asking for a separation of functionality --- "max_concurrent_to_schedule" [which is what you want] vs "max_concurrent_to_fetch" [which is what David is changing max_concurrent to mean]. Then you could set the first one to a value, and leave the second one unbound, and get back your desired behavior.

I hope this makes sense to you. Please feel free to add the text/info to the PR.

Note: I doubt it waits until the cache is completely exhausted of max_blah items before asking more. I'm betting, instead, that work fetch will still top off, even if you have some of that task type, but only up to the max_blah setting.
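
Purely as an illustration of that proposed split, it might look like this in an app_config.xml. Note that neither tag exists in BOINC today, and the app name is made up:

```xml
<app_config>
  <app>
    <name>example_app</name>
    <!-- hypothetical: cap simultaneously *running* tasks -->
    <max_concurrent_to_schedule>3</max_concurrent_to_schedule>
    <!-- hypothetical: cap *buffered* tasks; omit to leave fetch unbounded -->
    <!-- <max_concurrent_to_fetch>6</max_concurrent_to_fetch> -->
  </app>
</app_config>
```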

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 257
Credit: 234,046,463
RAC: 47
Message 51267 - Posted: 11 Jan 2019 | 3:54:32 UTC - in response to Message 51266.

:) I'm a rock star at breaking things, for sure!
I am happy to hear that you find the feature as useful as I do!

I'm about to join you as a "rock star" for breaking things too, apparently.

The client code commit that DA wrote to fix my original problem is going to cause major problems for anyone using a max_concurrent or project_max_concurrent statement.

The unintended consequence of the code change is that it prevents fetching replacement work until the host's cache is empty.

Only then does the host report all its finished work and ask for more to refill the cache. So that's the end of keeping your cache topped up at every 5-minute scheduler connection.

The PR2918 commit is close to being accepted into the master branch.

I have voiced my displeasure but since only DA usually authorizes pull requests into the master branch, that decision is up to him. Richard Haselgrove also has voiced his concerns.


It sounds like you were using "max concurrent" to mean "only run this many at the same time, but allow fetch of more."

David is likely arguing that, if you can't run more than that many simultaneously, then why buffer more? Consider tasks that take 300 days to complete (yes, RNA World has them). If you're set to only run 3 as "max concurrent", then why would you want to get a 4th task that would sit there for 300 days?

You might consider asking for a separation of functionality --- "max_concurrent_to_schedule" [which is what you want] vs "max_concurrent_to_fetch" [which is what David is changing max_concurrent to mean]. Then you could set the first one to a value, and leave the second one unbound, and get back your desired behavior.

I hope this makes sense to you. Please feel free to add the text/info to the PR.

Note: I doubt it waits until the cache is completely exhausted of max_blah items before asking more. I'm betting, instead, that work fetch will still top off, even if you have some of that task type, but only up to the max_blah setting.


Yes, I guess I generalized. I didn't wait to see if all my 1000 tasks finished before the work request was initiated.

From the testing by Richard and in the host emulator, I assumed that when the number of tasks fell below my <project_max_concurrent>16</project_max_concurrent> setting, the client would finally report all 485 completed tasks and ask for more work.

But from the client configuration document https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration

the intended purpose of max_concurrent and project_max_concurrent is:

max_concurrent
The maximum number of tasks of this application to run at a given time.



and:

project_max_concurrent
A limit on the number of running jobs for this project.


The original purpose of the max_concurrent parameters shouldn't be circumvented by the new commit code.

The key points that need to be emphasized are "run at a given time" and "number of running jobs".
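
For context, the two settings quoted above normally live in a per-project app_config.xml (in that project's folder under the BOINC data directory), along these lines; the app name below is only an example:

```xml
<app_config>
  <app>
    <name>example_app</name>                   <!-- hypothetical app name -->
    <max_concurrent>3</max_concurrent>         <!-- per-app running limit -->
  </app>
  <project_max_concurrent>16</project_max_concurrent>  <!-- project-wide limit -->
</app_config>
```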

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 257
Credit: 234,046,463
RAC: 47
Message 51269 - Posted: 11 Jan 2019 | 4:05:53 UTC
Last modified: 11 Jan 2019 | 4:26:26 UTC

Thanks for the comments Jacob. I have added your observations to my post and will await what Richard has to say about your new classifications when the new day for him begins.

[Edit] I should also comment that I have been using <project_max_concurrent> statements in all my hosts for years.

Their caches have never been limited to the N tasks in those statements. I have always maintained the server-side limit of 100 CPU tasks plus 100 tasks per GPU on all hosts, and I've never had any issues fetching replacement work to keep them topped up at the cache limit.

The problem in my original post occurred when I added the gpu_exclude statements, which prevented all CPU tasks from running.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51270 - Posted: 11 Jan 2019 | 4:22:43 UTC - in response to Message 51269.

You're welcome. Since max_concurrent already has a documented meaning, you might use that to suggest (gently push) keeping it the same, and putting any work-fetch limits into a new variable. I could see it ending up that way, maybe.

Richard Haselgrove
Send message
Joined: 11 Jul 09
Posts: 902
Credit: 2,140,463,345
RAC: 346,767
Message 51274 - Posted: 11 Jan 2019 | 10:49:35 UTC

Hi Keith and Jacob!

Slightly odd place for this conversation, but I hope other posters will excuse me joining in.

There's a problem with <max_concurrent>. I noticed a couple of years ago that it could prevent full use of your computer, if you tried to run both GPU and CPU versions of the same app at the same time. Last year, Keith fell over the same problem, and with two independent reports David started work on fixing it.

This work is explicitly about the proper definition of max_concurrent, but in fixing it, David realised that if you do less work concurrently, you'll do less work overall, and you might over-fill the cache and run into deadline problems.

So, for the time being, he's put in a rather crude kludge. I've seen and documented a case where a project ran completely dry before finally requesting new work. I think that there's universal agreement that this is the wrong sledgehammer, used to crack the wrong nut.

I'm trying to put forward the concept of "proportional work fetch". I tend to run a 0.25 day cache (partly because of the fast turnround required by this project). If you run on all four cores of a quad CPU, BOINC will interpret that as a 1.0 day CPU cache to keep all four cores busy for 0.25 days. A max_concurrent of 2 should limit that to 0.5 days, and so on. Unless anyone can suggest a better solution?
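
Richard's proportional-fetch arithmetic can be sketched in a few lines (a toy model, not actual BOINC client code; the function name is invented):

```python
def cpu_days_to_request(cache_days, ncpus, max_concurrent=None):
    """Scale the cache target by the number of instances that can actually
    run at once, rather than by the raw CPU count."""
    usable = ncpus if max_concurrent is None else min(ncpus, max_concurrent)
    return cache_days * usable

# Richard's example: a 0.25-day cache on a quad-core.
print(cpu_days_to_request(0.25, 4))     # current behaviour: 1.0 CPU-days
print(cpu_days_to_request(0.25, 4, 2))  # with max_concurrent = 2: 0.5 CPU-days
```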

I was on a conference call last night, where I heard other developers, in no uncertain terms, urge David to start work on a better work fetch solution as soon as possible. And, in particular, not to release a new client in the gap where the max_concurrent bug has been fixed but work fetch is broken.

We're getting very close to the completion of the max_concurrent fix. Keith's logs from two nights ago revealed a small bug, which I've reported, but should be easy to fix. And then we can move on to phase 2.

Contrary to what Keith said, under the new "Community Governance" management of BOINC, no developer is allowed to merge their own code without independent scrutiny - not even David. I got approval last night that, if Keith and I can say that the max_concurrent code works within its design limits (i.e. putting aside the work fetch bug for the moment), and if Juha can confirm that the C++ code has been written correctly, then we can collectively approve and merge. David wants that to happen so he has a clean code base before he starts on the work fetch part of the problem.

I'll keep an eye on progress, and it's very easy in Windows to test as we go along (replacement binaries are available automatically, within minutes of any code change, and if the code won't compile, that's also reported back to the developer automatically). But I don't run the sort of heavy metal that you guys do, so any help with the monitoring and testing process that you can contribute will be greatly appreciated.

Keith Myers
Send message
Joined: 13 Dec 17
Posts: 257
Credit: 234,046,463
RAC: 47
Message 51280 - Posted: 11 Jan 2019 | 16:02:53 UTC

Good morning Richard, sorry about the location of this discussion. Jacob has provided some useful information to enhance my understanding.

My concern was that David, in the PR2918 comments, said:

Not fetching work if max_concurrent is reached is the intended behavior.


and that is what alarmed me greatly.

It's very encouraging to hear that the other developers on your conference call also voiced their concerns about work fetching.

I guess I was getting overly worked up, thinking the commit was going to master soon with the unintended consequence of breaking work fetch. I thought that would cause massive complaints from everyone who noticed that their caches weren't being maintained.

Thanks for the clarification that even David has to have consensus from the other developers to merge code into master.

mmonnin
Send message
Joined: 2 Jul 16
Posts: 255
Credit: 647,700,389
RAC: 39
Message 51282 - Posted: 11 Jan 2019 | 19:58:55 UTC

I've never liked using those 2 commands when trying to limit a project to something like 50% of CPU threads while wanting another project to use the other 50%. I would often end up with a full queue from the project using the max-tasks command while the other threads sat idle because the queue was full. It's another situation where task run priority should be separate from work download priority, and another BOINC client instance ends up being the preferable way to finely tune BOINC management on a PC.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51440 - Posted: 7 Feb 2019 | 17:02:55 UTC
Last modified: 7 Feb 2019 | 17:03:35 UTC

Are you guys sure that RTX 2080 GPUs get work from GPUGrid?
From my testing, it seems my PC hasn't gotten a single task from that project since installing that GPU.
http://www.gpugrid.net/results.php?hostid=326413

PappaLitto
Send message
Joined: 21 Mar 16
Posts: 502
Credit: 4,277,198,401
RAC: 1,428,793
Message 51441 - Posted: 7 Feb 2019 | 17:31:44 UTC

The only 2080s that should get WUs at this time are the ones in the same machine as a 1000-series card or below. The WUs will not yet work on 2000-series cards.

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Message 51442 - Posted: 7 Feb 2019 | 17:43:54 UTC - in response to Message 51441.

My machine has a 2080, a 980 Ti, and a 980 Ti. And I have a GPU exclusion set up so GPUGrid doesn't run work on the 2080.

Yet GPUGrid never sends my PC any work.

Any idea why?

mmonnin
Send message
Joined: 2 Jul 16
Posts: 255
Credit: 647,700,389
RAC: 39
Message 51445 - Posted: 8 Feb 2019 | 0:35:13 UTC - in response to Message 51442.

My machine has 2080, 980 Ti, 980 Ti. And I have a GPU Exclusion setup so GPUGrid work doesn't run work on the 2080.

Yet, GPUGrid work fetch never gives any work for my PC.

Any idea why?


Do you have a device_num with your project URL exclusion? Otherwise all GPUs will be excluded instead of just the Turing card.

<device_num>0</device_num>
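
For reference, a complete exclusion stanza in cc_config.xml might look like this (a sketch only; the project URL and device number here are for Jacob's setup, where the 2080 is device 0 - adjust both to match your own host):

<cc_config>
  <options>
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>0</device_num>
    </exclude_gpu>
  </options>
</cc_config>

Without the <device_num> element, the exclusion applies to every GPU on the host, which would stop the project from fetching any GPU work at all.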

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 51446 - Posted: 8 Feb 2019 | 2:09:26 UTC

Yes, I have that set correctly.

2/7/2019 12:12:35 PM | | Starting BOINC client version 7.14.2 for windows_x86_64
2/7/2019 12:12:35 PM | | log flags: file_xfer, sched_ops, task, scrsave_debug, unparsed_xml
2/7/2019 12:12:35 PM | | Libraries: libcurl/7.47.1 OpenSSL/1.0.2g zlib/1.2.8
2/7/2019 12:12:35 PM | | Data directory: E:\BOINC Data
2/7/2019 12:12:35 PM | | Running under account jacob
2/7/2019 12:12:35 PM | | CUDA: NVIDIA GPU 0: GeForce RTX 2080 (driver version 418.81, CUDA version 10.1, compute capability 7.5, 4096MB, 3551MB available, 10687 GFLOPS peak)
2/7/2019 12:12:35 PM | | CUDA: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 418.81, CUDA version 10.1, compute capability 5.2, 4096MB, 3959MB available, 6060 GFLOPS peak)
2/7/2019 12:12:35 PM | | CUDA: NVIDIA GPU 2: GeForce GTX 980 Ti (driver version 418.81, CUDA version 10.1, compute capability 5.2, 4096MB, 3959MB available, 7271 GFLOPS peak)
2/7/2019 12:12:35 PM | | OpenCL: NVIDIA GPU 0: GeForce RTX 2080 (driver version 418.81, device version OpenCL 1.2 CUDA, 8192MB, 3551MB available, 10687 GFLOPS peak)
2/7/2019 12:12:35 PM | | OpenCL: NVIDIA GPU 1: GeForce GTX 980 Ti (driver version 418.81, device version OpenCL 1.2 CUDA, 6144MB, 3959MB available, 6060 GFLOPS peak)
2/7/2019 12:12:35 PM | | OpenCL: NVIDIA GPU 2: GeForce GTX 980 Ti (driver version 418.81, device version OpenCL 1.2 CUDA, 6144MB, 3959MB available, 7271 GFLOPS peak)
2/7/2019 12:12:35 PM | | Host name: Speed
2/7/2019 12:12:35 PM | | Processor: 16 GenuineIntel Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz [Family 6 Model 63 Stepping 2]
2/7/2019 12:12:35 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrandsyscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 smep bmi2
2/7/2019 12:12:35 PM | | OS: Microsoft Windows 10: Professional x64 Edition, (10.00.18329.00)
2/7/2019 12:12:35 PM | | Memory: 63.89 GB physical, 73.39 GB virtual
2/7/2019 12:12:35 PM | | Disk: 80.00 GB total, 59.08 GB free
2/7/2019 12:12:35 PM | | Local time is UTC -5 hours
2/7/2019 12:12:35 PM | | No WSL found.
2/7/2019 12:12:35 PM | | VirtualBox version: 5.2.27
2/7/2019 12:12:35 PM | GPUGRID | Found app_config.xml
2/7/2019 12:12:35 PM | GPUGRID | Your app_config.xml file refers to an unknown application 'acemdbeta'. Known applications: 'acemdlong', 'acemdshort'
2/7/2019 12:12:35 PM | GPUGRID | Config: excluded GPU. Type: all. App: all. Device: 0
2/7/2019 12:12:35 PM | Einstein@Home | Config: excluded GPU. Type: all. App: all. Device: 0
2/7/2019 12:12:35 PM | Albert@Home | Config: excluded GPU. Type: all. App: all. Device: 0
2/7/2019 12:12:35 PM | | Config: event log limit 20000 lines
2/7/2019 12:12:35 PM | | Config: use all coprocessors

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 51447 - Posted: 8 Feb 2019 | 2:11:19 UTC
Last modified: 8 Feb 2019 | 2:17:50 UTC

However, this message appears to be related; it shows up whenever a Short/Long task is available but not given to me.
I'll have to think about what it means, and whether I have something misconfigured.

Edit: My cache settings are for 10 days buffer, plus 0.5 additional. Since I believe BOINC interprets that as "may be disconnected for 10 days", it may be limiting my ability to get work based on a calculation involving that 10 days.

Time for me to rethink my cache settings (which I have intentionally set for other valid reasons), and then retest.


2/7/2019 3:36:39 PM | GPUGRID | Tasks won't finish in time: BOINC runs 85.2% of the time; computation is enabled 95.1% of that
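
The scheduler's reasoning behind that message can be approximated like this. This is a simplified sketch, not BOINC's actual client code: a task is only requested if its estimated completion time, scaled down by the fraction of wall time BOINC runs and computation is enabled, still fits within the deadline once the buffered queue is accounted for. The one-day runtime and five-day GPUGrid deadline below are illustrative assumptions.

```python
# Simplified sketch of a "will it finish in time?" check.
# Not the actual BOINC client logic; percentages mirror the log message.

def can_finish_in_time(est_runtime_days, deadline_days, buffer_days,
                       on_frac, active_frac):
    """Return True if a task is expected to complete before its deadline.

    Effective throughput is reduced by the fraction of wall time the
    client runs (on_frac) and the fraction of that during which
    computation is enabled (active_frac).
    """
    effective = on_frac * active_frac          # usable fraction of wall time
    completion = buffer_days + est_runtime_days / effective
    return completion <= deadline_days

# With a 10-day buffer, a 1-day task with a 5-day deadline is refused:
print(can_finish_in_time(1.0, 5.0, 10.0, 0.852, 0.951))  # False
# Dropping the buffer to 2 days makes it feasible:
print(can_finish_in_time(1.0, 5.0, 2.0, 0.852, 0.951))   # True
```

This matches Jacob's observation below: shrinking the work buffer was enough to make the scheduler start issuing tasks again.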

Jacob Klein
Send message
Joined: 11 Oct 08
Posts: 1111
Credit: 1,754,882,714
RAC: 889,089
Level
His
Scientific publications
watwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwatwat
Message 51453 - Posted: 8 Feb 2019 | 13:50:49 UTC - in response to Message 51447.
Last modified: 8 Feb 2019 | 13:51:36 UTC

After I changed my cache settings from 10.0d + 0.5d to 2.0d + 0.5d, the message went away, and I started getting GPUGrid work on this PC for the first time since installing the RTX 2080.

http://www.gpugrid.net/results.php?hostid=326413

Thanks.

mmonnin
Send message
Joined: 2 Jul 16
Posts: 255
Credit: 647,700,389
RAC: 39
Level
Lys
Scientific publications
wat
Message 51454 - Posted: 8 Feb 2019 | 23:03:06 UTC

Good to see. I don't think I've ever seen that message about not finishing in time.

Chris S
Send message
Joined: 18 Jan 09
Posts: 21
Credit: 1,262,690
RAC: 0
Level
Ala
Scientific publications
watwatwatwatwat
Message 51473 - Posted: 13 Feb 2019 | 8:16:25 UTC


2/7/2019 3:36:39 PM | GPUGRID | Tasks won't finish in time: BOINC runs 85.2% of the time; computation is enabled 95.1% of that


Well, there is your explanation, clearly set out above! You weren't being sent any work because BOINC knew that you wouldn't finish it in time. If it doesn't suit you to run BOINC 24/7, then the only way forward is to drop the cache levels.

Glad you got it sorted out.
____________

Aurum
Send message
Joined: 12 Jul 17
Posts: 97
Credit: 7,219,554,643
RAC: 51,900
Level
Tyr
Scientific publications
wat
Message 51474 - Posted: 13 Feb 2019 | 8:35:33 UTC - in response to Message 51230.

For now I have to remove the project_max_concurrent statement from cc_config and use the cpu limitation in Local Preferences to limit the number of cores to 16.
Why not use this in your cc_config???
<ncpus>16</ncpus>

Aurum
Send message
Joined: 12 Jul 17
Posts: 97
Credit: 7,219,554,643
RAC: 51,900
Level
Tyr
Scientific publications
wat
Message 51475 - Posted: 13 Feb 2019 | 9:00:38 UTC - in response to Message 51274.
Last modified: 13 Feb 2019 | 9:01:17 UTC

We're getting very close to the completion of the max_concurrent fix.

Richard, Not sure what you guys are up to but I sure hope you take the WCG MIP project into consideration before you roll it out. https://www.worldcommunitygrid.org/forums/wcg/viewpostinthread?post=569786
They coded the use of the L3 cache wrong, and it uses 4-5 MB per MIP WU. If you exceed that, I've seen BOINC performance cut in half. I have to use max_concurrent in my WCG app_config or I cannot run MIP simulations.
<app_config>
<app>
<name>mip1</name>
<!-- needs 5 MB L3 cache per mip1 WU, use 5-10 -->
<!-- Xeon E5-2699v4, L3 Cache = 55 MB -->
<max_concurrent>10</max_concurrent>
<fraction_done_exact>1</fraction_done_exact>
</app>
</app_config>
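
The arithmetic behind that max_concurrent choice can be sketched as follows (assuming the ~5 MB-per-WU figure from the post; the helper function and headroom of one WU are illustrative, not anything WCG or BOINC provides):

```python
# Sketch: derive a max_concurrent value for WCG MIP work units from
# L3 cache size, using the ~5 MB-per-WU figure cited in the post.

def mip_max_concurrent(l3_cache_mb, mb_per_wu=5.0, headroom_wus=1):
    """Largest number of MIP WUs whose working sets fit in L3 cache,
    minus a little headroom for everything else on the machine."""
    fit = int(l3_cache_mb // mb_per_wu)
    return max(1, fit - headroom_wus)

# Xeon E5-2699 v4: 55 MB L3 cache -> 11 WUs fit; keep one in reserve.
print(mip_max_concurrent(55))  # 10
```

That yields the max_concurrent of 10 used in the app_config above.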


Post to thread

Message boards : Graphics cards (GPUs) : Advice for GPU placement