Message boards : Server and website : Server down

Profile GDF
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Joined: 14 Mar 07
Posts: 1919
Credit: 629,356
RAC: 0
Message 8377 - Posted: 14 Apr 2009 | 8:25:12 UTC

As you noticed, we had a power cut right in the middle of the holiday, which could not be fixed until today.
We are still in the recovery phase.

gdf

TomaszPawel
Joined: 18 Aug 08
Posts: 121
Credit: 59,836,411
RAC: 0
Message 8378 - Posted: 14 Apr 2009 | 8:56:31 UTC - in response to Message 8377.

I am happy that you're back.

:)

Profile Sandro
Joined: 19 Aug 08
Posts: 22
Credit: 3,660,304
RAC: 0
Message 8379 - Posted: 14 Apr 2009 | 9:25:25 UTC - in response to Message 8377.

But you still have something to do with the server ;)


Di 14 Apr 2009 11:10:33 CEST|GPUGRID|[error] Error reported by file upload server: Server is out of disk space

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8395 - Posted: 14 Apr 2009 | 13:07:00 UTC - in response to Message 8377.

A gentle question: is work being made available yet? The server status page appears normal, with an increasing number of WUs apparently available and all server indicators showing "running", yet I get:

"14/04/2009 14:01:56 GPUGRID Message from server: No work sent"

in response to my work request. I appreciate that a lot of recovery work may still be going on after the holiday dramas - I would just hate to find out tomorrow that the surface indicators were right, work was available, and it was a fault at my end.

Regards
Zy

localizer
Joined: 17 Apr 08
Posts: 113
Credit: 1,656,514,857
RAC: 0
Message 8396 - Posted: 14 Apr 2009 | 13:13:40 UTC - in response to Message 8395.

..... same here. When the servers came back up each of my boxes grabbed 2 WUs - since then I get the 'no work sent' message. Perhaps there is more than a network failure to recover from and hence the extra time until work is available.

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8398 - Posted: 14 Apr 2009 | 13:20:46 UTC - in response to Message 8396.

Thanks for that - if you are getting the same, then it's highly likely that recovery work is ongoing, or that the pipe is clogged up with update/work requests from similarly dry machines.

Either way, time to put the kettle on again and contemplate "the meaning of life" :)

Regards
Zy

Profile GDF
Joined: 14 Mar 07
Posts: 1919
Credit: 629,356
RAC: 0
Message 8402 - Posted: 14 Apr 2009 | 14:23:00 UTC - in response to Message 8398.

Still work in progress!
The server is receiving all the workunits, which is saturating the bandwidth.
We then need to send new ones, because some steps have been lost and we need all of them.

gdf

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8408 - Posted: 14 Apr 2009 | 14:57:59 UTC - in response to Message 8402.

Okie Doke - Many Thanks :)

Regards
Zy

Profile GDF
Joined: 14 Mar 07
Posts: 1919
Credit: 629,356
RAC: 0
Message 8417 - Posted: 14 Apr 2009 | 17:07:45 UTC - in response to Message 8408.

Even now, the system has not recovered and hosts are still uploading results.
At least, that is how it looks, since WUs are not being canceled.

In the end, the problem was a network failure: the network interface simply hung.
We are considering moving the server somewhere where we can physically access it if needed.

Also, we are considering buying a new server with more redundancy, like double network, double power supply and so on.

I will inform you when things are back to normal.

GDF.

Jere Hakanen
Joined: 22 Jan 09
Posts: 2
Credit: 107,221,255
RAC: 0
Message 8419 - Posted: 14 Apr 2009 | 17:21:59 UTC

This is what I get now:

stdoutdae.txt wrote:
14-Apr-2009 20:10:22 [GPUGRID] Requesting new tasks
14-Apr-2009 20:10:27 [GPUGRID] Scheduler request completed: got 0 new tasks
14-Apr-2009 20:10:27 [GPUGRID] Message from server: No work sent

14-Apr-2009 20:18:57 [GPUGRID] Requesting new tasks
14-Apr-2009 20:19:02 [GPUGRID] Scheduler request completed: got 0 new tasks
14-Apr-2009 20:19:02 [GPUGRID] Message from server: Server can't parse configuration file
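
For anyone comparing notes across several hosts, here is a minimal sketch of pulling only the server replies out of a log like the one quoted above. It assumes every line has the shape "date time [project] text" as in that excerpt; the function name `server_messages` and the regular expression are illustrative and not part of BOINC.

```python
import re

# Assumed line shape, taken from the excerpt quoted above:
#   14-Apr-2009 20:10:27 [GPUGRID] Message from server: No work sent
LINE_RE = re.compile(
    r"^(?P<when>\S+ \S+) \[(?P<project>[^\]]+)\] "
    r"Message from server: (?P<text>.*)$"
)


def server_messages(lines, project="GPUGRID"):
    """Return (timestamp, text) pairs for server replies to one project."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Skip blank lines, other projects, and non-server lines.
        if match and match.group("project") == project:
            found.append((match.group("when"), match.group("text")))
    return found


# The six log lines quoted above, used as sample input.
sample = [
    "14-Apr-2009 20:10:22 [GPUGRID] Requesting new tasks",
    "14-Apr-2009 20:10:27 [GPUGRID] Scheduler request completed: got 0 new tasks",
    "14-Apr-2009 20:10:27 [GPUGRID] Message from server: No work sent",
    "",
    "14-Apr-2009 20:18:57 [GPUGRID] Requesting new tasks",
    "14-Apr-2009 20:19:02 [GPUGRID] Scheduler request completed: got 0 new tasks",
    "14-Apr-2009 20:19:02 [GPUGRID] Message from server: Server can't parse configuration file",
]

for when, text in server_messages(sample):
    print(when, "|", text)
```

Logs that use a different layout (for example, no brackets around the project name) would need the pattern adjusted.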

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8421 - Posted: 14 Apr 2009 | 17:35:50 UTC - in response to Message 8417.

I've just started getting the messages below - weird. I'm aware it's probably a symptom of the ongoing recovery; posted for info so you know what's happening at this end of life.

14/04/2009 18:32:38 GPUGRID Sending scheduler request: Requested by user.
14/04/2009 18:32:38 GPUGRID Requesting new tasks
14/04/2009 18:32:43 GPUGRID Scheduler request completed: got 0 new tasks
14/04/2009 18:32:43 GPUGRID Message from server: No work sent
14/04/2009 18:32:43 GPUGRID Message from server: CUDA app exists for Full-atom molecular dynamics but no CUDA work requested
14/04/2009 18:32:43 GPUGRID Message from server: Full-atom molecular dynamics on Cell processor is not available for your type of computer.


Regards
Zy

Eric
Joined: 17 Nov 08
Posts: 13
Credit: 15,272,287
RAC: 0
Message 8426 - Posted: 14 Apr 2009 | 18:56:47 UTC - in response to Message 8421.

I am also getting that message. I have also received credits for 2 of the 3 WUs that I had completed. It will probably take a few days for things to return to a sense of normality. I am happy to see the system back up, even if it is not running too well yet. Still, the work on the server is progressing and hopefully we shall see work appearing soon.

Eric

Profile Zydor
Joined: 8 Feb 09
Posts: 252
Credit: 1,309,451
RAC: 0
Message 8431 - Posted: 14 Apr 2009 | 19:21:05 UTC - in response to Message 8421.

Sorted - I needed to suspend SETI CUDA, otherwise it would not download a GPUGRID WU. Two WUs are downloading now and I will be crunching them shortly :)

Regards
Zy

Eric
Joined: 17 Nov 08
Posts: 13
Credit: 15,272,287
RAC: 0
Message 8435 - Posted: 14 Apr 2009 | 20:13:48 UTC - in response to Message 8431.

I have also received 2 WUs. Both are working well.

Eric
