Mac OS X Problems

John Philip

Hi,
I have a problem on a small network with an OS X Server:
4 heavy-duty G4 workstations running OS 9 and OS X, 4 medium-duty G4 workstations running OS 9, and an admin section with 3 Macs running OS 9 (all G3s or G4s).
Cat 5e network: a 100 Mbit 24-port Asanté switch (with the admin and medium-duty machines attached), uplinked to a 7-port Asanté Giga Switch (with the server and the heavy-duty workstations attached).
We have been getting constant server freezes, with no warnings on the workstations or on the server itself. The server seems to keep running even though it has stopped responding to the mouse and keyboard. It freezes from 1 to 6 times a day.
Excerpts from the tail of the server log:
2002-05-16 10:13:54 CEST Started child "/usr/sbin/slpd" as pid 432.
2002-05-16 10:13:54 CEST Started child "/usr/sbin/sambadmind" as pid 433.
2002-05-16 10:13:54 CEST Started child "/usr/sbin/PrintServiceMonitor" as pid 434.
2002-05-16 10:13:54 CEST Started child "/usr/sbin/serveradmind" as pid 435.
2002-05-16 10:13:54 CEST Automatic reboot timer enabled.
2002-05-16 10:13:54 CEST Reaped child process 432 ("/usr/sbin/slpd"); quit with exit status 253.
2002-05-16 10:13:54 CEST Process "/usr/sbin/slpd" respawning too rapidly!

After that, the rest is silence.

Any ideas out there? Any help would be much appreciated.

Kind regards

John Philip
 
It looks like a problem with slpd (the SLP daemon).

Have you checked the client and router compatibility?
<<SLP on MacOS 9.0 will use IP multicast. If your network uses routers that are not capable of IP multicast, you will need to upgrade them or set up tunneling.>>

<<If a service stops working almost immediately after being started (the current default is less than 10 seconds), watchdog marks that service as "faulty" and stops trying to restart it. This is a safety measure to prevent a situation in which a failing process restarts indefinitely.>>


action:
How to monitor a service and what to do if it stops working. The possible action options are:

* off - If the specified service is not running, do not start it. If it is running, stop it.
* boot - Only start the service when watchdog starts. If it stops after that, it is not restarted.
* bootwait - Currently this behaves the same way as boot.
* respawn - Watchdog will restart the application if it stops working.
* now - Watchdog ignores this entry at startup and will actually change the entry in the configuration file to "off". If you change an entry to "now" and force watchdog to reread its configuration file, it will treat that entry as if it were marked "respawn".
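The quoted watchdog rules can be summarized as a small decision table. The sketch below is a toy model of those rules only, not watchdog's actual implementation; the names `Action`, `action_at_startup` and `on_service_exit` are hypothetical, and the 10-second threshold is taken from the quoted text ("less than 10 seconds").

```python
from enum import Enum

# Threshold from the quoted documentation: a service that stops
# "less than 10 seconds" after being started is considered faulty.
FAULTY_THRESHOLD_SECONDS = 10.0


class Action(Enum):
    OFF = "off"
    BOOT = "boot"
    BOOTWAIT = "bootwait"  # documented as currently behaving like "boot"
    RESPAWN = "respawn"
    NOW = "now"


def action_at_startup(action):
    """Return (start_service, action_stored_in_config) when watchdog starts."""
    if action is Action.OFF:
        return False, Action.OFF  # not started (and stopped if running)
    if action is Action.NOW:
        return False, Action.OFF  # ignored at startup; entry rewritten to "off"
    return True, action  # boot, bootwait and respawn are started


def on_service_exit(action, runtime_seconds):
    """Decide what happens when a running service stops.

    Action.NOW here stands for an entry changed to "now" after which watchdog
    was forced to reread its config, so it is treated like "respawn".
    """
    if action in (Action.RESPAWN, Action.NOW):
        if runtime_seconds < FAULTY_THRESHOLD_SECONDS:
            return "mark faulty"  # stop trying to restart it
        return "restart"
    return "leave stopped"  # off, boot and bootwait are never restarted
```

In the first post's log, slpd was started and reaped within the same second. Assuming its entry is set to "respawn" (not confirmed by the log), `on_service_exit(Action.RESPAWN, 0.0)` returns `"mark faulty"` under this model, which is consistent with the "respawning too rapidly!" line.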

Check the etc/slpa.conf file.

Maybe that will give you a better idea of what's wrong.
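To see which daemons exit and which ones watchdog gives up on over a longer stretch of the server log, a small scan can help. This is a minimal sketch that assumes every line follows the exact format of the excerpt in the first post (timestamp, time zone, message); the regular expressions and the `scan_log` name are hypothetical, and no log file location is assumed.

```python
import re

# Matches lines of the form seen in the excerpt, e.g.
# 2002-05-16 10:13:54 CEST Reaped child process 432 ("/usr/sbin/slpd"); quit with exit status 253.
REAPED = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ '
    r'Reaped child process (?P<pid>\d+) \("(?P<path>[^"]+)"\); '
    r'quit with exit status (?P<status>\d+)\.'
)
# e.g. 2002-05-16 10:13:54 CEST Process "/usr/sbin/slpd" respawning too rapidly!
THROTTLED = re.compile(r'Process "(?P<path>[^"]+)" respawning too rapidly!')


def scan_log(lines):
    """Collect child exits and the services reported as respawning too rapidly."""
    exits = []      # (timestamp, pid, path, exit status)
    throttled = []  # paths of services reported as respawning too rapidly
    for line in lines:
        m = REAPED.search(line)
        if m:
            exits.append((m["ts"], int(m["pid"]), m["path"], int(m["status"])))
            continue
        m = THROTTLED.search(line)
        if m:
            throttled.append(m["path"])
    return exits, throttled


SAMPLE = """\
2002-05-16 10:13:54 CEST Started child "/usr/sbin/slpd" as pid 432.
2002-05-16 10:13:54 CEST Reaped child process 432 ("/usr/sbin/slpd"); quit with exit status 253.
2002-05-16 10:13:54 CEST Process "/usr/sbin/slpd" respawning too rapidly!
"""

if __name__ == "__main__":
    exits, throttled = scan_log(SAMPLE.splitlines())
    for ts, pid, path, status in exits:
        print(f"{ts}: {path} (pid {pid}) exited with status {status}")
    print("Respawning too rapidly:", throttled)
```

Run against the excerpt, it reports one exit of /usr/sbin/slpd with status 253 and lists slpd as respawning too rapidly. What status 253 means for slpd is not stated in the log.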


Cheers...
 
After checking System Profiler, I discovered that the log starts like this:
--
Version of ASP that generated this report = 2.7
(note:this string only shows up in pre-final or debug versions)
--
Further down the list, most of what has to do with networking is designated 'Dev' (developer) and the rest 'GM' (possibly Golden Master).

The thing is, this is the third server on the same site and the third OS X Server software pack, so the problem can only stem from Apple's Software Update.

I'll get back with a comment from Apple, if they care to give one.

John Philip
 
Well, well.
First, Apple (verbally) expressed concern about the matter and claimed that their analysis of the logs showed two problems:
1) A networking protocol problem
and
2) A problem with the file system.
--
After careful consideration, Apple has replied (in writing):
Basically the error is a NetInfo failure. That happens when NI hangs,
which is when the filesystem does.
They suggest trying it without both SCSI cards and using an ATA drive...
There have been a lot of complaints about SCSI hanging, and at one point
Adaptec stated the 39160 and 29160 were not compatible with Mac OS X...
Since then they have released newer drivers than the ones we included with
OS X Server. So it may be worth a try to update them...
--
So Apple has no part in the problem!
However, the reply raises a few other questions:
1) The server behaved the same way during a period when the filesystem was on the internal disk.
2) If the Adaptec drivers included in OS X were at one point 'denounced' by Adaptec, why has Apple not released some sort of technote to that effect?
3) '...maybe worth an update...' is about as noncommittal an answer as possible.
4) Why should it take local Apple Support, European Apple Support and finally some big brain in Cupertino so long to figure it out, if the Adaptec problem has been known to all of them for a while?

The filesystem will probably feel better after an upgrade, but there is no definite indication that this will remedy the problem.

PS: OS X Server version 10.1.3 worked a lot better, with only one breakdown a week, as opposed to the versions on either side of it, which have resulted in breakdowns 3-4 times a day (on a good day). That is also a bit worrying. Do the Adaptec drives behave less faultily on 10.1.3? I wonder...

John Philip

Sigh, sigh, and shame on Apple.

John Philip
 
John Philip,

Unfortunately, the Adaptec SCSI drivers have been the source of many problems, because Adaptec was a little late in releasing new drivers for Mac OS X.

But Adaptec has supported the Mac for a long time, and I wonder who is to blame in this case.

Yes, Apple could easily have added a technote to this effect, and more... really.

But anyhow, at least now we know.


Cheers...
 
The essence is that I do not believe Apple's explanation:

First, the server freezes have been more or less constant, Adaptec or no Adaptec.
The only time there have been significantly fewer server freezes was the period when version 10.1.3 was on the machine.
At that point the freezes decreased from 3-4 or more a day to 1-4 times a week.
The Adaptec-controlled filesystem has been taken offline and the working data copied to the internal disk, WITHOUT any significant improvement.

Second, the Cupertino experts first balked at the System Profiler readout, noting that most of the components in 10.1.5 were marked 'Dev.', 'GM' or 'Beta', and only a small portion of the OS was marked 'Finished version'.
In fact, they asked where the software had come from, and we were ready to swear that its only source was Apple's Software Update system. They did not like that at all, and said they would look into it and get back to us (they did not, of course).

Third, the Cupertino experts got back to us, stating that there were two significant problems: 1) a networking problem and 2) a file system problem.
We already knew the networking problem was there, as the logs showed that the first error almost always came from the networking part of the OS, after which a multitude of errors brought down the whole system.

The conclusion seems to be that Apple chooses to point at the filesystem, and thereby at Adaptec, and in doing so avoids any responsibility for its own part of the problem.
Again, saying that it might be worth a try to update the drivers seems to me a rather diplomatic and unspecific way of diagnosing the problem.

Of course the filesystem and the Adaptec card will get their upgrades, but at the moment I am afraid that will not make the problem go away.

I'll get back after having tried this.

JP
 
Originally posted by sao
John Philip,

I understand. Good luck and...cross your fingers.

Let me know.

Cheers...

I feel I owe 'The Forum' an update:
My customer is living with server breakdowns 1-3 times a day.
As he is an optimistic fellow, he has decided to replace his present OS X Server with an Apple Xserve.
It will (probably) arrive here at the end of this week.
We will do as much 'laboratory testing' as possible here and then set it up at his location, probably at the end of this month.
Hopefully the Xserve will behave itself...
Anyway, we will configure and set it up using only Apple equipment and according to Apple's specs, so if one or more problems should arise, there will be only one supplier to discuss them with. Hopefully it won't come to that.

As an addendum to the story: the first G4 set up as an OS X Server, with all the problems I have described so far, was taken 'home' by my company.
We then had our Authorized Apple Repair facility check out the hardware, without finding anything wrong.
We sold it to a 'consultant' who reformatted it completely, put in a new Adaptec board and a fresh set of disks, and updated everything. Lo and behold, he has just reported back that the server mysteriously goes down 5-6 times a day...

Kind regards

John Philip
DK
 