
Message boards : Server and website : general server issue on 2 of 3 PCs

Author Message
Magiceye04
Joined: 1 Apr 09
Posts: 11
Credit: 16,689,372
RAC: 0
Level
Pro
Scientific publications
watwatwatwat
Message 50887 - Posted: 16 Nov 2018 | 21:05:36 UTC
Last modified: 16 Nov 2018 | 21:28:57 UTC

Hello,

Two of my PCs are having serious trouble connecting to the GPUGRID servers.
At first (2 or 3 days ago) the connection was slow and sometimes ended in a timeout, but a reload helped.
Now GPUGRID is completely dead: the website is unreachable and the WUs won't upload.

But a ping to www.gpugrid.net works, with only about 10% packet loss.
On the 3rd PC everything is fine.

All 3 PCs are connected to the same router.
The non-working PCs run Ubuntu 16 LTS and the working PC runs Ubuntu 18 LTS.

The rest of the internet also works fine.

Any ideas why 2 of my PCs can't talk to GPUGRID any more?
Rebooting didn't help.

One thing that might be part of the problem: I cloned the SSD of one PC a few days ago and used the clone in the other PC. But the PCs got different names, and the IDs in the project are also different. And at the beginning that was no problem: both PCs got work and delivered the results. Other projects can communicate with their servers without problems.

The traceroutes look quite similar.

Magiceye04
Message 50895 - Posted: 17 Nov 2018 | 9:51:19 UTC - in response to Message 50887.

Traceroute from the PC that cannot connect:
traceroute to www.gpugrid.net (84.89.134.145), 30 hops max, 60 byte packets
1 fritz.box (192.168.178.1) 0.448 ms 0.598 ms 1.278 ms
2 62.155.xxx.xxx (x) 19.643 ms 19.865 ms 21.690 ms
3 f-ed8-i.F.DE.NET.DTAG.DE (217.5.95.70) 27.551 ms 27.593 ms 27.604 ms
4 80.157.201.198 (80.157.201.198) 27.908 ms 35.696 ms 35.993 ms
5 be3187.ccr42.fra03.atlas.cogentco.com (130.117.1.118) 36.079 ms 36.515 ms 41.965 ms
6 be2799.ccr41.par01.atlas.cogentco.com (154.54.58.234) 51.110 ms 36.931 ms 42.366 ms
7 be3324.ccr52.bio02.atlas.cogentco.com (130.117.2.65) 49.509 ms be3325.ccr51.bio02.atlas.cogentco.com (130.117.48.205) 48.605 ms 48.388 ms
8 be3357.ccr31.mad05.atlas.cogentco.com (130.117.1.21) 58.225 ms be3358.ccr32.mad05.atlas.cogentco.com (130.117.1.97) 58.315 ms be3357.ccr31.mad05.atlas.cogentco.com (130.117.1.21) 58.335 ms
9 be3374.agr21.mad05.atlas.cogentco.com (130.117.2.62) 58.400 ms be3375.agr22.mad05.atlas.cogentco.com (130.117.50.202) 58.466 ms *
10 be3480.nr51.b015537-1.mad05.atlas.cogentco.com (154.25.1.18) 58.511 ms be3481.nr51.b015537-1.mad05.atlas.cogentco.com (154.25.1.110) 58.583 ms 65.188 ms
11 * 149.11.68.2 (149.11.68.2) 72.797 ms 149.11.68.50 (149.11.68.50) 72.670 ms
12 * 130.206.245.122 (130.206.245.122) 77.586 ms *
13 anella-val1-router.red.rediris.es (130.206.211.70) 74.310 ms 74.347 ms 74.362 ms
14 * * *
15 84.89.159.147 (84.89.159.147) 74.431 ms * 74.467 ms
16 * * *
17 * * *
18 grosso.upf.edu (84.89.134.145) 73.621 ms !X 68.978 ms !X *


Traceroute from the working PC:
traceroute to www.gpugrid.net (84.89.134.145), 30 hops max, 60 byte packets
1 fritz.box (192.168.178.1) 0.406 ms 1.125 ms 1.317 ms
2 62.155.xxx.xxx (xx) 17.419 ms 18.208 ms 19.095 ms
3 f-ed8-i.F.DE.NET.DTAG.DE (217.5.95.70) 26.236 ms 26.304 ms 29.490 ms
4 80.157.201.198 (80.157.201.198) 32.720 ms 40.808 ms 40.877 ms
5 be3186.ccr41.fra03.atlas.cogentco.com (130.117.0.1) 40.976 ms 41.038 ms 41.071 ms
6 be2799.ccr41.par01.atlas.cogentco.com (154.54.58.234) 54.945 ms be2800.ccr42.par01.atlas.cogentco.com (154.54.58.238) 40.347 ms be2799.ccr41.par01.atlas.cogentco.com (154.54.58.234) 40.352 ms
7 be3324.ccr52.bio02.atlas.cogentco.com (130.117.2.65) 55.123 ms 52.034 ms *
8 be3357.ccr31.mad05.atlas.cogentco.com (130.117.1.21) 59.177 ms be3358.ccr32.mad05.atlas.cogentco.com (130.117.1.97) 57.064 ms 57.268 ms
9 be3379.agr22.mad05.atlas.cogentco.com (154.54.39.146) 55.661 ms 57.581 ms be3375.agr22.mad05.atlas.cogentco.com (130.117.50.202) 52.864 ms
10 be3481.nr51.b015537-1.mad05.atlas.cogentco.com (154.25.1.110) 58.658 ms be3480.nr51.b015537-1.mad05.atlas.cogentco.com (154.25.1.18) 53.014 ms 57.263 ms
11 149.11.68.2 (149.11.68.2) 57.465 ms 149.11.68.50 (149.11.68.50) 57.372 ms 57.493 ms
12 130.206.245.122 (130.206.245.122) 65.125 ms * 66.149 ms
13 anella-val1-router.red.rediris.es (130.206.211.70) 72.201 ms 72.461 ms 73.529 ms
14 * * *
15 84.89.159.147 (84.89.159.147) 86.897 ms 69.216 ms 69.280 ms
16 * * *
17 * * *
18 grosso.upf.edu (84.89.134.145) 73.386 ms !X 74.061 ms !X 74.559 ms !X
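One way to check the "quite similar" impression is to compare, hop by hop, which addresses answered in each trace. The following is only a sketch: it assumes the two outputs above have been saved as text, and the function names `hops` and `differing_hops` are made up for illustration. Hops that only show `* * *` or a masked address such as `(x)` count as "no address seen".

```python
import re

# An IPv4 address in parentheses, as traceroute prints it: "name (1.2.3.4)".
IP_IN_PARENS = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")


def hops(trace_text):
    """Map hop number -> set of IPv4 addresses that answered at that hop."""
    result = {}
    for line in trace_text.splitlines():
        parts = line.split(maxsplit=1)
        # Hop lines start with the hop number; the header line starts with "traceroute".
        if len(parts) < 2 or not parts[0].isdigit():
            continue
        result[int(parts[0])] = set(IP_IN_PARENS.findall(parts[1]))
    return result


def differing_hops(trace_a, trace_b):
    """Return the hop numbers whose sets of responders differ between two traces."""
    a, b = hops(trace_a), hops(trace_b)
    return sorted(n for n in a.keys() | b.keys() if a.get(n, set()) != b.get(n, set()))
```

A hop that differs is not necessarily a fault, since routers along the way may spread probes over several parallel links. The comparison only narrows down where to look.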

Magiceye04
Message 50896 - Posted: 17 Nov 2018 | 9:59:09 UTC - in response to Message 50895.

transfer log:

Sa 17 Nov 2018 10:54:48 CET | GPUGRID | update requested by user
Sa 17 Nov 2018 10:55:04 CET | GPUGRID | Started upload of e29s4_e18s5p0f36-ADRIA_FOLDPUCB_NTL9_NoTica_KCenter_20_crystal_ss_contacts_20_ntl9_3-0-1-RND5283_1_0
Sa 17 Nov 2018 10:55:04 CET | GPUGRID | Started upload of e29s4_e18s5p0f36-ADRIA_FOLDPUCB_NTL9_NoTica_KCenter_20_crystal_ss_contacts_20_ntl9_3-0-1-RND5283_1_1
Sa 17 Nov 2018 10:55:05 CET | GPUGRID | [http] [ID#873] Info: Found bundle for host www.gpugrid.org: 0x55f8b685ff30 [serially]
Sa 17 Nov 2018 10:55:05 CET | GPUGRID | [http] [ID#872] Info: Trying 84.89.134.145...
Sa 17 Nov 2018 10:55:05 CET | GPUGRID | [http] [ID#873] Info: Hostname was found in DNS cache
Sa 17 Nov 2018 10:55:05 CET | GPUGRID | [http] [ID#873] Info: Trying 84.89.134.145...

Sa 17 Nov 2018 10:55:49 CET | | [http] [ID#0] Info: Connection timed out after 120116 milliseconds
Sa 17 Nov 2018 10:55:49 CET | | [http] [ID#0] Info: Closing connection 152
Sa 17 Nov 2018 10:55:49 CET | | [http] HTTP error: Timeout was reached
Sa 17 Nov 2018 10:55:49 CET | | [http] HTTP_OP::init_get(): http://www.gpugrid.net/notices.php?u...d165fb2f5bf6c7
Sa 17 Nov 2018 10:55:49 CET | | [http] [ID#0] Info: Found bundle for host www.gpugrid.net: 0x55f8b670f0b0 [serially]
Sa 17 Nov 2018 10:55:49 CET | | [http] [ID#0] Info: Trying 84.89.134.145...
Sa 17 Nov 2018 10:55:53 CET | GPUGRID | [http] [ID#1] Info: Connection timed out after 120124 milliseconds
Sa 17 Nov 2018 10:55:53 CET | GPUGRID | [http] [ID#1] Info: Closing connection 153
Sa 17 Nov 2018 10:55:53 CET | GPUGRID | [http] HTTP error: Timeout was reached
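To see at a glance how long each connection waited before giving up, the timeout lines can be pulled out of a log like the one above. This is a minimal sketch, assuming the event log has been copied into a string; the pattern is based only on the wording of the lines shown here, and `timeouts` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   "... [http] [ID#1] Info: Connection timed out after 120124 milliseconds"
TIMEOUT_LINE = re.compile(
    r"\[ID#(\d+)\] Info: Connection timed out after (\d+) milliseconds"
)


def timeouts(log_text):
    """Return (connection id, milliseconds waited) for every timeout line found."""
    return [(int(m.group(1)), int(m.group(2))) for m in TIMEOUT_LINE.finditer(log_text)]
```

Applied to the two timeout lines in the log above, it returns `[(0, 120116), (1, 120124)]`, i.e. both connections waited about two minutes.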

AuxRx
Send message
Joined: 3 Jul 18
Posts: 22
Credit: 2,758,801
RAC: 0
Level
Ala
Scientific publications
wat
Message 50898 - Posted: 17 Nov 2018 | 14:39:07 UTC - in response to Message 50887.

I'll hazard a guess that you did not bother to reset the network settings after cloning and at least two PCs are stuck with the same address.

Magiceye04
Message 50902 - Posted: 17 Nov 2018 | 18:11:17 UTC
Last modified: 17 Nov 2018 | 18:13:25 UTC

The IP addresses are different.
The router assigns the address before the OS boots.
And one of the PCs has been off for hours, in the hope that the other one will get a connection at some point.
I reset all projects before the clone went online again. GPUGRID was not active on either PC before; I added it as a new project on both.

Whatever the problem is, it only affects GPUGRID.
Everything else on the internet works well: all other projects get WUs and their websites are reachable.

Magiceye04
Message 50905 - Posted: 17 Nov 2018 | 22:11:46 UTC - in response to Message 50902.
Last modified: 17 Nov 2018 | 22:19:25 UTC

I tried booting the PC from another SSD.
On that system (Lubuntu) the gpugrid.net homepage is also unreachable.
The same happens when I boot Windows 7 from this SSD.

To me this looks as if GPUGRID has blocked the MAC address of my PCs. Very strange.

AuxRx
Message 50910 - Posted: 18 Nov 2018 | 9:04:29 UTC - in response to Message 50902.
Last modified: 18 Nov 2018 | 9:10:00 UTC

Still, 10% packet loss is not normal.

There are ways in which your GPUGRID setup could be different, and let's be honest, GPUGRID servers are slow themselves. Easy answer: either troubleshoot a hardware issue or try a clean install. Make sure to reset your router and physically disconnect the other systems.

You can see whether your host has been blacklisted with BOINC's tools here. Just append your hostid. This would not work if anything other than BOINC, e.g. a firewall, were blocking your host.
http://www.gpugrid.net/host_app_versions.php?hostid=

EDIT: url
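"Append your hostid" can be spelled out as a small sketch. The base URL is the one given in this post; the function name and the host ID 12345 used in the note below are placeholders, not real values.

```python
from urllib.parse import urlencode

# Base URL as quoted in the post above.
BASE = "http://www.gpugrid.net/host_app_versions.php"


def host_app_versions_url(host_id):
    """Build the per-host page URL by appending the hostid query parameter."""
    return f"{BASE}?{urlencode({'hostid': host_id})}"
```

For the placeholder ID, `host_app_versions_url(12345)` gives `http://www.gpugrid.net/host_app_versions.php?hostid=12345`. The real ID is the one the project lists for the computer in question.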

Magiceye04
Message 50912 - Posted: 18 Nov 2018 | 10:23:31 UTC
Last modified: 18 Nov 2018 | 10:26:18 UTC

I don't see any hint of blacklisting, but I don't know what it would look like.

After booting from a live DVD I got a new IP address, and www.gpugrid.net was still unreachable in Firefox.

So I don't think any change to the OS would help me.

My next step will be to boot the SSD in the PC that is still not blocked, in order to upload the WUs. Then I will remove the project.
I can't imagine a hardware bug that affects only one website.

One WU was uploaded overnight, but it still cannot be reported.

AuxRx
Message 50914 - Posted: 18 Nov 2018 | 11:22:52 UTC - in response to Message 50912.

I don't see any hint of blacklisting, but I don't know what it would look like.


Max tasks per day = 1

I.e. the BOINC system filters "rogue" hosts that return too many invalid results, to reduce server load. Routinely they will receive one task to check whether the issue has been resolved.

Which projects are working atm?

Try a different LAN card, port and cable. Check the MAC address (yes, MAC address, maybe it's used for ID on the project side) and make sure it actually is different. Your upload speed is remarkably slow and, again, packet loss is an issue.

I don't trust these various SSDs, frankly, because you might have carried the same issue over to each of them. And some issues (hardware, MAC address) do not change with a new install.

Back on topic: I don't have any issues reaching the servers.

Magiceye04
Message 50915 - Posted: 18 Nov 2018 | 12:18:42 UTC

Max tasks per day is 30...32 on both machines.
But the problem is not getting new WUs, it's sending back the old one.

The MAC addresses are different on both machines (onboard network cards).

Only GPUGRID and WUProp on one PC; WCG in addition on the other one.

Again: all issues appear only when communicating with GPUGRID. All other internet traffic works fine and fast.
And it also happens without any HDD or SSD: the live DVD can't connect either, but only to GPUGRID.

The only explanation I can see is that GPUGRID is blocking the PCs, or that the BIOS of the PCs is blocking GPUGRID (which would be the strangest thing on earth).

How could I analyze the network traffic (without much knowledge in this area)?
The only error message that BOINC gives me is "timeout".
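A low-effort first step, short of capturing traffic, is to test whether a plain TCP connection to the web server can be opened at all. Ping only shows that ICMP gets through, while the BOINC log shows the HTTP connection timing out, so the two can disagree. The sketch below uses only Python's standard `socket` module; the function name, the default port 80 and the 10-second timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host, port=80, timeout=10.0):
    """Try to open a TCP connection and return (ok, detail)."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer within the timeout: packets are probably being dropped somewhere.
        return False, "timed out"
    except OSError as exc:
        # Covers refused connections, DNS failures, unreachable networks, etc.
        return False, f"failed: {exc}"
```

Running `can_connect("www.gpugrid.net", 80)` on both the working and the non-working PC would show whether the failure already happens at the TCP level. A "timed out" result on one machine and "connected" on the other would point to filtering or packet loss somewhere on the path rather than to BOINC itself.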

Magiceye04
Message 50916 - Posted: 18 Nov 2018 | 13:31:56 UTC

Good news.
After booting the SSD in the last working PC, the WUs could be sent home.
At the same time I triggered an update on the other non-working PC, and now that one is getting a connection too.
Both have now uploaded.

AuxRx
Message 50918 - Posted: 18 Nov 2018 | 15:04:32 UTC - in response to Message 50915.

If even your browser fails to reach the servers, I have no idea what a sensible route for troubleshooting would be.

You can enable more granular logging in BOINC (see Options >> Event log options), but that won't help much in this case. Things you could control are local to you anyway, so start by checking your router and filters, I guess?

Sorry, I am at a loss. These issues are difficult to solve over the internet.

Magiceye04
Message 50924 - Posted: 18 Nov 2018 | 23:26:54 UTC

I think GPUGRID was a little confused by my cloning of Ubuntu.
Or it was all a coincidence.
But after upgrading to Ubuntu 18 and putting the SSD into a completely different PC, the problem was solved. Maybe by over-confusing the server. ;)
Now it can no longer be analysed; everything works fine again.
