system load wirdness

Here you can post threads specific to the current release of the core (099)

Moderator: POL Developer

Post Reply
User avatar
andenixa
Grandmaster Poster
Posts: 105
Joined: Tue Nov 09, 2010 1:33 am

system load wirdness

Post by andenixa »

Hello everyone,

I wonder if there is a definate way to know the reason of a certain system load.
We usually have a modest system loads, approx. 10-15% evening times. But sometimes it jumps 100% and stays that way until the server is restarted. The wierdness is that there is no gradual growth, it either 10-15% or 100%. The script_profiles doesn't reveal anything unusual (are those even represent the real picture)? I tried killing the scrips with the top instructions percent, still the sysload doesn't respond to that. I wonder if there are some steps can be done to clarify the picture.

Thank a lot in advance.
User avatar
CWO
POL Expert
Posts: 1159
Joined: Sat Feb 04, 2006 5:49 pm
Location: Chicago, IL USA

Re: system load wirdness

Post by CWO »

what you do is clear the profile counters with POLCore().clear_script_profile_counters(), wait 60 seconds, then do script profiles. This will tell you what is doing the most instructions in the last minute.
User avatar
CWO
POL Expert
Posts: 1159
Joined: Sat Feb 04, 2006 5:49 pm
Location: Chicago, IL USA

Re: system load wirdness

Post by CWO »

Here's what I use...

It clears the script profiles, waits 60 seconds, then generates a webpage listing all scripts and their instructions, invocations, instructions per invocation, and percentage in the last 60 seconds from most to least. Do this when your server is taking normal load and compare it to when the server is at 100% load.

.scriptstats

Code: Select all

use uo;
use os;

const WEBADDRESS := "";//Put your webserver address, port, and script address here. Ex: "http://server.myserver.com:80/scriptstats.ecl"

program scriptstats(who, text)
	var waittime := CInt(text)*60;
	if (!waittime)
		waittime := 60;
	endif
	SendSysMessage(who, "Clearing script profile counters.");
	Polcore().clear_script_profile_counters();
	SendSysMessage(who, "Waiting " + CStr(waittime)+ " seconds.");
	Sleep(waittime);
	var profiles := PolCore().script_profiles;
	var orderone := dictionary;
	foreach thing in profiles
		if (!thing.instr)
			continue;
		endif
		var instructions := thing.instr;
		while (orderone[CInt(instructions)])
			instructions := instructions+1;
		endwhile
		orderone[CInt(instructions)] := thing;
		sleepms(5);
	endforeach
	var ordertwo := orderone.keys();
	ordertwo.reverse();
	var thearray := array;
	var counting := 0;
	foreach number in ordertwo
		counting := counting + 1;
		thearray[counting] := orderone[CInt(number)];
		sleepms(5);
	endforeach
	SetGlobalProperty("#Scriptcycles", thearray);
	SendSysMessage(who, "Script logging complete.");
	OpenURL(who, WEBADDRESS);
	Sleep(30);
	EraseGlobalProperty("#Scriptcycles");
endprogram
pol/scripts/www/scriptstats.src

Code: Select all

use uo;
use http;
use os;

program scriptstats()
	WriteHTML("<html><head>");
	WriteHTML("<STYLE>a{font-family: Arial, Helvetica, sans-serif; color: #FFFFFF; text-decoration: none}</STYLE>");
	WriteHTML("<title>CG Stats</title></head><body bgcolor=\"#000000\" text=\"#FFFFFF\">");
	WriteHTML("<H1>Status:</H1>");
	var pol:=polcore();
	var uptime:=pol.uptime;
	var hours:=uptime/60/60;
	var minutes:=uptime/60%60;
	var seconds:=uptime%60;
	WriteHTML("Online: "+CStr(EnumerateOnlineCharacters().size())+"<BR>");
	WriteHTML("Items: "+pol.itemcount+"<BR>");
	WriteHTML("mobiles: "+pol.mobilecount+"<BR>");
	WriteHTML("Sysload: "+pol.sysload+"<BR>");
	WriteHTML("Severity: "+pol.sysload_severity+"<BR>");
	WriteHTML("Events/min: " +pol.events_per_min+ "<BR>");
	WriteHTML("skillchecks/min: " +pol.skill_checks_per_min+ "<BR>");
	WriteHTML("combatops/min: " +pol.combat_operations_per_min+ "<BR>");
	WriteHTML("script errors/min: " + pol.error_creations_per_min+ "<BR>");
	WriteHTML("Tasks on time/min: " +pol.tasks_ontime_per_min+ "<BR>");
	WriteHTML("Tasks late/min: " +pol.tasks_late_per_min+ "<BR>");
	WriteHTML("Tasks late ticks/min: " +pol.tasks_late_ticks_per_min+ "<BR>");
	WriteHTML("Instructions/min: " +pol.instr_per_min+ "<BR>");
	WriteHTML("Uptime: "+hours+":"+minutes+":"+seconds+"<BR>");
	WriteHTML("<HR>");

	var sent:=pol.bytes_sent;
	var recvd:=pol.bytes_received;
	WriteHTML("Bytes Sent: "+sent+"<BR>");
	WriteHTML("Bytes Received: "+recvd+"<BR>");

	WriteHTML("Average kb/s: "+(sent/1024)/uptime+"<BR>");
	WriteHTML("MB per month: "+(sent/1048576)/hours*744+"<BR>");
	WriteHTML("<HR>");

	WriteHTML("Currently Running Scripts");
	var scriptcycles := GetGlobalProperty("#Scriptcycles");
	If (!scriptcycles)
		WriteHTML("Unavailable... This is only available 30 seconds after the command is done in game.");
	else
		WriteHTML("<table border=\"\" style=\"border-collapse: collapse\" bordercolor=\"#111111\" width=\"100%\" id=\"AutoNumber1\">");
		WriteHTML("<tr><td width=\"40%\">Script Path/Name</td><td width=\"20%\">Instructions</td><td width=\"10%\">Invocations</td><td width=\"20%\">Instr/Invoc</td><td width=\"10%\">Percent</td></tr>");
		foreach thing in scriptcycles
			WriteHTML("<tr><td width=\"40%\">"+thing.name+"</td><td width=\"20%\">"+thing.instr+"</td><td width=\"10%\">"+thing.invocations+"</td><td width=\"20%\">"+thing.instr_per_invoc+"</td><td width=\"10%\">"+thing.instr_percent+"</td></tr>");
			sleepms(2);
		endforeach
	endif
	writehtml("</table></BODY></HTML>");
endprogram
User avatar
andenixa
Grandmaster Poster
Posts: 105
Joined: Tue Nov 09, 2010 1:33 am

Re: system load wirdness

Post by andenixa »

Thanks you,
Thats the excellent approach to try.
I suspect it to be the npc, because when I killed all the npc scripts the load went back to 5%.
But the actual script doesn't show in the stat, so I think it might be not the instruction counter, but some other thing like error generation or a high number of events.
Thank you anyway for the great idea.
User avatar
CWO
POL Expert
Posts: 1159
Joined: Sat Feb 04, 2006 5:49 pm
Location: Chicago, IL USA

Re: system load wirdness

Post by CWO »

Did you do the command before or after you killed the NPC? If it was after, the command cleared all of that info before generating the info it gave to you.
User avatar
andenixa
Grandmaster Poster
Posts: 105
Joined: Tue Nov 09, 2010 1:33 am

Re: system load wirdness

Post by andenixa »

CWO wrote:Did you do the command before or after you killed the NPC? If it was after, the command cleared all of that info before generating the info it gave to you.
I did the command before killing npc scripts to gather the data while sysload is high.
RusseL
Forum Regular
Posts: 375
Joined: Fri Feb 20, 2009 8:30 pm

Re: system load wirdness

Post by RusseL »

Online: 3
Items: 57241
mobiles: 3723
Sysload: 23
Severity: 175
Events/min: 44
skillchecks/min: 9
combatops/min: 23
script errors/min: 33132
Tasks on time/min: 73
Tasks late/min: 28
Tasks late ticks/min: 73
Instructions/min: 12089356
Uptime: 1:35:14

scripts/ai/spellkillpcs.ecl 4.82968e+06 1120 4312 39.7421
scripts/ai/killpcs.ecl 2.24826e+06 772 2912 18.5003
scripts/ai/slime.ecl 1.66001e+06 372 4462 13.6597
scripts/ai/doppel.ecl 1.12014e+06 259 4324 9.21734
scripts/ai/firebreather.ecl 670262 167 4013 5.5154
in 2 minutes:
Online: 3
Items: 57241
mobiles: 3723
Sysload: 53
Severity: 11686
Events/min: 69
skillchecks/min: 6
combatops/min: 3
script errors/min: 428203
Tasks on time/min: 62
Tasks late/min: 51
Tasks late ticks/min: 10437
Instructions/min: 34087569
Uptime: 1:37:35

scripts/ai/spellkillpcs.ecl 4.87889e+06 1120 4356 40.1388
scripts/ai/killpcs.ecl 2.28635e+06 772 2961 18.8099
scripts/ai/slime.ecl 1.66831e+06 372 4484 13.7252
scripts/ai/doppel.ecl 1.11398e+06 259 4301 9.16473
scripts/ai/firebreather.ecl 675626 167 4045 5.5584
top percentage of running scripts still the same...but sysload grows
any ideas? why it is so?
xeon
Forum Regular
Posts: 338
Joined: Fri Oct 31, 2008 3:18 am
Location: Italy

Re: system load wirdness

Post by xeon »

Have you looked in log/scripts.log for some runaway scripts?
RusseL
Forum Regular
Posts: 375
Joined: Fri Feb 20, 2009 8:30 pm

Re: system load wirdness

Post by RusseL »

there aren't runaway....
User avatar
CWO
POL Expert
Posts: 1159
Joined: Sat Feb 04, 2006 5:49 pm
Location: Chicago, IL USA

Re: system load wirdness

Post by CWO »

Your instructions per minute almost tripled which is why your sysload also did the same... Were your players fighting a bunch of NPCs at the time? 4000 instructions per NPC per minute seems quite high...
xcpustyle
New User
Posts: 17
Joined: Fri Apr 27, 2012 5:55 pm

Re: system load wirdness

Post by xcpustyle »

CWO wrote:Your instructions per minute almost tripled which is why your sysload also did the same... Were your players fighting a bunch of NPCs at the time? 4000 instructions per NPC per minute seems quite high...
Now, I used this topic to fix a few things over here.

I noticed several different scripts going "runaway".

How can I fix it? Is it that important to do, so?

Bcuz over here (phenom II X4 965@3.8Ghz 4GB RAM) I got sysload=3-6% all the time at iddle.
On my SV (a mediocre dual core 2180@2.0Ghz 2GB RAM) it is standing still in something life sysload=70-80%.

Thank you!
User avatar
CWO
POL Expert
Posts: 1159
Joined: Sat Feb 04, 2006 5:49 pm
Location: Chicago, IL USA

Re: system load wirdness

Post by CWO »

Well, it is important to remedy the sysload if it is constantly at 70-80% because then your shard is vulnerable to lagging. How you fix it depends entirely on what scripts are processing the most instructions. If it's AI that is consistently running a high amount of instructions, maybe you have to look into the sleep mode area of the script (the part of the script that executes when no one is around the NPC). If it happens when people are fighting NPCs, you should look into its fight function. In the case of my shard, I found that when the NPC would go through its instructions to calculate its next move toward its target, there was a bug that made it calculate this thousands of times for each move when only one calculation was enough. It didn't show up as a runaway only because it had a wait_for_event(0) in it (this was actually what I designed .scriptstats for, to find runaway scripts that don't get reported as a runaway). If you have something that is reported as a runaway script every now and then, it's not something to be too concerned about but it's saying that there might be a little too much going on in the script too fast. I have a general performance guide posted here http://forums.polserver.com/viewtopic.p ... 432#p16432. It's not really for AI but may help you if other scripts are also running a bit too much.
Post Reply