pol099 rev 525 keeps crashing

AsYlum
Grandmaster Poster
Posts: 109
Joined: Sun Feb 05, 2006 5:24 am
Location: Poland

pol099 rev 525 keeps crashing

Post by AsYlum »

I'm having a big problem finding the reason why POL keeps crashing.

It's rev 525 compiled on Debian (but the buildtag was set to 'ubuntu') with the -O0 switch, unstripped binary. POL sometimes crashes randomly a few minutes after start, without players online, or when I'm trying to wipe an area of NPCs.

I've tried switching the AI scripts to dummy ones with "return 1;" or an empty "while (me) sleep(2 minutes) end" loop. No luck there. What I've found is that the same data files run on a 32-bit rev 525 build under Windows work just fine. I can wipe the whole world of NPCs with no crash. But under Linux it keeps crashing randomly with both 64-bit and 32-bit binaries. I've tried running POL under Debian and Slackware 13. Debian was installed on VMware with 4 GB of RAM, and the Slackware machine is an old PC with a 3.6 GHz P4 HT and 4 GB of RAM.

I've already tried reverting the scripts to one of my old backups, but no luck either. I've noticed that when I delete npcs.txt and npcequip.txt, POL works fine. I can also leave the NPCs alone and delete storage.txt, and POL will run without crashing. But that's not an option, as I want to keep my world state intact.

Code: Select all

Caught SIGSEGV (Segfault).  Please post the following on http://forums.polserver.com/tracker.php :
=== CUT ===
Build: POL099-2011-05-02 Break Everything Even Rudder (ubuntu - 64bit)
Last Script: pkg/mobiles/oldAI/ai/townguard.ecl PC: 467
Stack Backtrace:
[0x699e78]
[0x70fef0]
[0x515ab0]
[0x515b08]
[0x514f60]
[0x513ecb]
[0x535228]
[0x69d6bc]
[0x69d8b6]
[0x70b6ea]
[0x77d8c9]
=== CUT ===
gdb output

Code: Select all

Reading symbols from /usr/local/Ultima/dmnew/pol...done.
(gdb) l *0x699e78
0x699e78 is in segv_handler(int) (clib/strexcpt.cpp:275).
270             cerr << "Build: " << progverstr << " (" << buildtagstr << ")" << endl;
271             cerr << "Last Script: " << scripts_thread_script << " PC: " << scripts_thread_scriptPC << endl;
272         cerr << "Stack Backtrace:" << endl;
273
274         void* bt[ 200 ];
275         int n = backtrace( bt, 200 );
276             char** strings = backtrace_symbols( bt, n );
277             for (int i = 0; i < n; i++)
278             {
279                     Log( "%s\n", strings[ i ] );
(gdb) l *0x70fef0
No source file for address 0x70fef0.
(gdb) l *0x515ab0
0x515ab0 is in std::tr1::__detail::_Hashtable_iterator_base<std::pair<unsigned int const, ref_ptr<UObject> >, false>::_M_incr_bucket() (/usr/include/c++/4.4/tr1_impl/hashtable_policy.h:271).
266         _M_incr_bucket()
267         {
268           ++_M_cur_bucket;
269
270           // This loop requires the bucket array to have a non-null sentinel.
271           while (!*_M_cur_bucket)
272             ++_M_cur_bucket;
273           _M_cur_node = *_M_cur_bucket;
274         }
275
(gdb) l *0x515b08
0x515b08 is in std::tr1::__detail::_Hashtable_iterator_base<std::pair<unsigned int const, ref_ptr<UObject> >, false>::_M_incr() (/usr/include/c++/4.4/tr1_impl/hashtable_policy.h:252).
247           _M_incr()
248           {
249             _M_cur_node = _M_cur_node->_M_next;
250             if (!_M_cur_node)
251               _M_incr_bucket();
252           }
253
254           void
255           _M_incr_bucket();
256
(gdb) l *0x514f60
0x514f60 is in std::tr1::__detail::_Hashtable_iterator<std::pair<unsigned int const, ref_ptr<UObject> >, false, false>::operator++() (/usr/include/c++/4.4/tr1_impl/hashtable_policy.h:327).
322
323           _Hashtable_iterator&
324           operator++()
325           {
326             this->_M_incr();
327             return *this;
328           }
329
330           _Hashtable_iterator
331           operator++(int)
(gdb) l *0x513ecb
0x513ecb is in ObjectHash::Reap() (pol/objecthash.cpp:161).
156         while (count_this--)
157         {
158             OH_iterator save_iterator = reap_iterator;
159             ++reap_iterator;
160
161             UObject* obj = (*save_iterator).second.get();
162
163             // We want the objecthash to be the holder of the last reference to an
164             // object when it is deleted - hence the ref_counted_count() check.
165             if (obj->orphan() && obj->ref_counted_count() == 1)
(gdb) l *0x535228
0x535228 is in reap_thread() (pol/pol.cpp:1595).
1590                    {
1591                            PolLock lck;
1592                            polclock_checkin();
1593                            objecthash.Reap();
1594
1595                            for_each( dynamic_item_descriptors.begin(), dynamic_item_descriptors.end(), delete_ob<ItemDesc>() );
1596                            dynamic_item_descriptors.clear();
1597                    }
1598
1599                    threadhelp::thread_sleep_ms( 2000 );
(gdb) l *0x69d6bc
0x69d6bc is in threadhelp::run_thread(void (*)()) (clib/threadhelp.cpp:221).
216
217     void run_thread( void (*threadf)(void) )
218     {
219         // thread creator calls inc_child_thread_count before starting thread
220         try {
221             (*threadf)();
222         }
223         catch( std::exception& ex )
224         {
225                     cerr << "Thread exception: " << ex.what() <<endl;
(gdb) l *0x69d8b6
0x69d8b6 is in threadhelp::thread_stub2(void*) (clib/threadhelp.cpp:282).
277             run_thread( entry_noparam );
278
279         #ifdef _WIN32
280         _endthreadex(0);
281         #else
282         pthread_exit(NULL);
283         #endif
284             return 0;
285     }
286
(gdb) l *0x70b6ea
0x70b6ea is in start_thread (pthread_create.c:300).
295     pthread_create.c: Nie ma takiego pliku ani katalogu. (no such file or directory)
        in pthread_create.c
(gdb) l *0x77d8c9
No source file for address 0x77d8c9.
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm

Re: pol099 rev 525 keeps crashing

Post by kevin »

I'm having the exact same issue: a problem with the object reaper thread. My stack trace is pretty much the same. The line numbers in objecthash.cpp are different because I tried some debugging/fixing, to no avail.

Code: Select all

Thread [8] 5488 [core: 0] (Suspended : Signal : SIGSEGV:Segmentation fault)
        std::tr1::__detail::_Hashtable_iterator_base<std::pair<unsigned int const, ref_ptr<UObject> >, false>::_M_incr_bucket() at hashtable_policy.h:271 0x515968
        std::tr1::__detail::_Hashtable_iterator_base<std::pair<unsigned int const, ref_ptr<UObject> >, false>::_M_incr() at hashtable_policy.h:251 0x5159c0
        std::tr1::__detail::_Hashtable_iterator<std::pair<unsigned int const, ref_ptr<UObject> >, false, false>::operator++() at hashtable_policy.h:326 0x514e06
        ObjectHash::Reap() at objecthash.cpp:160 0x513dcd
        reap_thread() at pol.cpp:1,593 0x534a46
        threadhelp::run_thread() at threadhelp.cpp:221 0x69b5d4
        threadhelp::thread_stub2() at threadhelp.cpp:277 0x69b7ce
        start_thread() at pthread_create.c:300 0x708e6d
        clone() at 0x7ff1a9
        0x0
I'll try to escalate this to the other core devs.
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm

Re: pol099 rev 525 keeps crashing

Post by kevin »

Hi AsYlum,

I talked with some C++ devs and they said:
13:45 <rizlah> kevin_: Generally speaking when you iterate you want to keep the iterator valid by assigning it the return value of the relevant .erase call
So I've done that on my shard; it's running now and I will be keeping track of any crashes. If you want to help me test/verify the fix, can you change this in objecthash.cpp?

From:

Code: Select all

objecthash.cpp:168: hash.erase( save_iterator );
To:

Code: Select all

objecthash.cpp:168: reap_iterator = hash.erase(save_iterator);
AsYlum
Grandmaster Poster
Posts: 109
Joined: Sun Feb 05, 2006 5:24 am
Location: Poland

Re: pol099 rev 525 keeps crashing

Post by AsYlum »

Sure, I'm compiling rev 526 + the fix right now, and time will tell :) Thank you.

With the present worldsave data the crash normally occurred after 10-15 minutes; now it crashed after around 34.

Code: Select all

[03/26 22:15:02] Caught SIGSEGV (Segfault).  Please post the following on http://forums.polserver.com/tracker.php :
=== CUT ===
Build: POL099-2011-05-02 Break Everything Even Rudder (ubuntu - 64bit)
Last Script: pkg/mobiles/oldAI/ai/townguard.ecl PC: 598
Stack Backtrace:
[0x64789b]
[0x6c02e0]
[0x4e320a]
[0x4f8bd7]
[0x64dfd7]
[0x64e48a]
[0x6bbada]
[0x72dd19]
=== CUT ===
gdb output

Code: Select all

(gdb) l *0x64789b
0x64789b is in segv_handler(int) (clib/strexcpt.cpp:276).
271             cerr << "Last Script: " << scripts_thread_script << " PC: " << scripts_thread_scriptPC << endl;
272         cerr << "Stack Backtrace:" << endl;
273
274         void* bt[ 200 ];
275         int n = backtrace( bt, 200 );
276             char** strings = backtrace_symbols( bt, n );
277             for (int i = 0; i < n; i++)
278             {
279                     Log( "%s\n", strings[ i ] );
280                     cerr << strings[ i ] << endl;
(gdb) l *0x6c02e0
No source file for address 0x6c02e0.
(gdb) l *0x4e320a
0x4e320a is in ObjectHash::Reap() (/usr/include/c++/4.4/tr1_impl/hashtable_policy.h:271).
266         _M_incr_bucket()
267         {
268           ++_M_cur_bucket;
269
270           // This loop requires the bucket array to have a non-null sentinel.
271           while (!*_M_cur_bucket)
272             ++_M_cur_bucket;
273           _M_cur_node = *_M_cur_bucket;
274         }
275
(gdb) l *0x4f8bd7
0x4f8bd7 is in reap_thread() (/usr/include/c++/4.4/bits/stl_iterator.h:686).
681           typedef typename iterator_traits<_Iterator>::pointer   pointer;
682
683           __normal_iterator() : _M_current(_Iterator()) { }
684
685           explicit
686           __normal_iterator(const _Iterator& __i) : _M_current(__i) { }
687
688           // Allow iterator to const_iterator conversion
689           template<typename _Iter>
690             __normal_iterator(const __normal_iterator<_Iter,
(gdb) l *0x64dfd7
0x64dfd7 is in threadhelp::run_thread(void (*)()) (clib/threadhelp.cpp:144).
139         res = pthread_attr_setdetachstate( &create_detached_attr, PTHREAD_CREATE_DETACHED );
140         passert_always( res == 0 );
141     }
142     void threadsem_lock()
143     {
144         pid_t pid = getpid();
145         int res = pthread_mutex_lock( &threadsem );
146         if (res != 0)
147         {
148             Log( "pthread_mutex_lock: res=%d, pid=%d\n", res, pid );
(gdb) l *0x64e48a
0x64e48a is in threadhelp::thread_stub2(void*) (clib/threadhelp.cpp:277).
272         td = NULL;
273
274         if (entry != NULL)
275             run_thread( entry, arg );
276         else
277             run_thread( entry_noparam );
278
279         #ifdef _WIN32
280         _endthreadex(0);
281         #else
(gdb) l *0x6bbada
0x6bbada is in start_thread (pthread_create.c:300).
295     pthread_create.c: Nie ma takiego pliku ani katalogu.
        in pthread_create.c
(gdb) l *0x72dd19
No source file for address 0x72dd19.
(gdb)
I can try to generate a 32-bit binary tomorrow. For now I'm testing only with 64-bit.
Turley
POL Developer
Posts: 670
Joined: Sun Feb 05, 2006 4:45 am

Re: pol099 rev 525 keeps crashing

Post by Turley »

Kevin, the current code is valid, I'm sorry :)
As you can see, the loop's iterator doesn't get erased directly: it is copied into save_iterator and then stepped forward by one. Your change will not make the iterator safer; if anything it's the opposite, at least for older compilers, because returning the next valid iterator from erase() for map or unordered_map was non-standard. Nowadays it is standard, but I think it wasn't when the code was written. For std::vector, for example, erase() has always returned the next iterator, but for most other containers it returned void.
The loop may look a bit weird, but it's valid; it just isn't today's coding style.
So the loop is not our problem; it's only a symptom that something becomes invalid somewhere.
We could harden this function more, but that's no real fix. I would like to find the place where we create the invalid pointer and fix it there.
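
A minimal sketch of the point above, not the POL code: it assumes C++11's std::unordered_map (the thread's build used the std::tr1 version) and a hypothetical table whose bool value stands in for "orphaned". The first function is the save-then-advance pattern from Reap(); the second is the erase-return form. Both are valid as long as nothing inserts into the table in between.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-in for ObjectHash::hash: serial -> "orphaned" flag.
typedef std::unordered_map<unsigned, bool> Table;

// Pattern used in Reap(): copy the iterator, advance the original, then
// erase through the copy. erase() only invalidates iterators to the erased
// element, so `it` stays valid.
std::size_t reap_save_and_advance(Table& t)
{
    std::size_t erased = 0;
    for (Table::iterator it = t.begin(); it != t.end(); )
    {
        Table::iterator save = it;
        ++it;
        if (save->second)
        {
            t.erase(save);
            ++erased;
        }
    }
    return erased;
}

// Equivalent C++11 form: continue from the iterator returned by erase().
std::size_t reap_erase_return(Table& t)
{
    std::size_t erased = 0;
    for (Table::iterator it = t.begin(); it != t.end(); )
    {
        if (it->second)
        {
            it = t.erase(it);
            ++erased;
        }
        else
            ++it;
    }
    return erased;
}

// Builds a table of n entries where every odd serial is marked orphaned.
Table make_table(unsigned n)
{
    Table t;
    for (unsigned serial = 0; serial < n; ++serial)
        t[serial] = (serial % 2 == 1);
    assert(t.size() == n);
    return t;
}
```

Calling either function on make_table(1000) removes the same 500 entries, which is why swapping one form for the other would not be expected to change the crash.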

Sadly I'm currently really, really busy with coding at work, so don't count on me for the next few weeks. :(

Oh, and it's not a 64-bit-specific problem; at least I would be really, really amazed :p
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm

Re: pol099 rev 525 keeps crashing

Post by kevin »

Sorry AsYlum. Yeah, that didn't seem to fix the error. Buuut I believe I have found the issue now.

I was able to cause a consistent crash using a simple CreateItemAtLocation() loop that would always crash on the ++reap_iterator line after some time.

Looking at the unordered_map<> docs in the TR1 spec (which is used by gcc >= 4.3), located at http://www.open-std.org/jtc1/sc22/wg21/ ... /n1745.pdf , pg 94:
12. The insert members ... may invalidate all iterators to the container. The erase members shall invalidate only
iterators and references to the erased elements.

13. The insert members shall not affect the validity of iterators if (N+n) < z * B, where N is the number of elements in
the container prior to the insert operation, n is the number of elements inserted, B is the container’s bucket count, and z
is the container’s maximum load factor.

In other words: #12 says inserting may invalidate existing iterators and #13 provides the conditions in which existing iterators cannot be invalidated.

The combination of #12 and #13 sorta explains why the error is only now showing up (for me in particular): now that the shard actually has players, there are a lot of entries in ObjectHash::hash, so when ObjectHash::Insert() gets called it can push the table past that threshold, and reap_iterator may become invalidated.
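
The rehash condition can be observed directly. The following is a rough sketch, assuming C++11's std::unordered_map rather than the std::tr1 one used in the build above; count_rehashes and presize are hypothetical helper names, not POL functions. It counts how often the bucket array changes while inserting, since each such change is a point where every outstanding iterator (such as reap_iterator) becomes invalid.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>

// Inserts keys 0..n-1 and counts how often the bucket array is rebuilt.
// Every rebuild (rehash) invalidates all outstanding iterators.
std::size_t count_rehashes(std::unordered_map<unsigned, int>& table, unsigned n)
{
    std::size_t rehashes = 0;
    for (unsigned key = 0; key < n; ++key)
    {
        const std::size_t buckets_before = table.bucket_count();
        table.insert(std::make_pair(key, 0));
        if (table.bucket_count() != buckets_before)
            ++rehashes;
    }
    return rehashes;
}

// Pre-sizes the table so that n elements fit without a rehash.
void presize(std::unordered_map<unsigned, int>& table, std::size_t n)
{
    table.reserve(n);
    assert(table.bucket_count() * table.max_load_factor() >= n);
}
```

On a default-constructed table, count_rehashes reports at least one rehash for a few thousand inserts; after presize with the same n it should report none, which matches the (N+n) < z * B rule quoted above.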

In contrast, the C++ standard's wording for the ordered associative containers (such as map<>) is a little different; the draft is located at http://www.open-std.org/JTC1/SC22/WG21/ ... /n3376.pdf , pg 719:
9. The insert ... member shall not affect the validity of iterators and references to the container,
and the erase members shall invalidate only iterators and references to the erased elements
So I think that instead of unordered_map<>, we should just use the map<> STL container. I've switched this in my code and run the CreateItemAtLocation() loop script, and it no longer crashes. Thoughts, Turley?
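
As a small illustration of the guarantee quoted above (a sketch only; cursor_survives_inserts is a hypothetical helper and the int value is a placeholder for the object reference), an iterator into a std::map can be held across any number of inserts:

```cpp
#include <cassert>
#include <map>
#include <utility>

// Holds an iterator across many inserts, then checks it still refers to
// the same element. std::map never invalidates iterators on insert.
bool cursor_survives_inserts(unsigned extra_inserts)
{
    std::map<unsigned, int> objects;
    objects.insert(std::make_pair(1u, 42));
    std::map<unsigned, int>::iterator cursor = objects.begin();

    for (unsigned serial = 2; serial < 2 + extra_inserts; ++serial)
        objects.insert(std::make_pair(serial, 0));

    assert(objects.size() == extra_inserts + 1);
    return cursor->first == 1u && cursor->second == 42;
}
```

The same test written against unordered_map would be undefined behaviour as soon as one of the inserts triggered a rehash.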

AsYlum, can you try using a map<> instead of unordered_map<> in objecthash.h and see if that solves your issues as well?
Turley
POL Developer
Posts: 670
Joined: Sun Feb 05, 2006 4:45 am

Re: pol099 rev 525 keeps crashing

Post by Turley »

Mmh, OK, that could be an explanation. It was hash_map for years until gcc removed it, and the nearest replacement was unordered_map. Changing it to map would be a huge slowdown. If replacing it with map really helps, I think I'll have to find a new algorithm for unordered_map :p
Can you post the code that crashes POL, so I can test some things when I have time?
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm

Re: pol099 rev 525 keeps crashing

Post by kevin »

Sure. It's pretty simple. Could probably combine the two...

Code: Select all

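// crash.src
use os;

program crash()
	var i;
	// Endless loop: spawn one item-creating script per iteration.
	for (i := 0; 1; i := i+1)
		start_script("crash-createitem",i);
	endfor
	Unload_Scripts(); // never reached; the loop above does not terminate
endprogram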

// crash-createitem.src
use uo;
program createitem(i)
	print("creating item "+i+": "+CreateItemAtLocation(0,0,0, 0xeed, 1));
endprogram
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm

Re: pol099 rev 525 keeps crashing

Post by kevin »

So, I pinpointed the moment where any existing iterators would be invalidated by adding

Code: Select all

	if (hash.size() >= hash.bucket_count() * hash.max_load_factor()) {
		cout<<"HASH ITERATOR INVALIDATED ("<<hash.size()<<" >= B = "<<hash.bucket_count()<<" * z = "<<hash.max_load_factor()<<endl; flush(cout);
		invalidated = true;
	}
to ObjectHash::Insert().

You can see in the following snippets that when the reaper runs after reap_iterator is invalidated, a segfault occurs. (You can ignore the "iterator invalidated" messages during world-save load; they just show that hash.bucket_count() is increasing.)

First, with some real-world shard data with a lot of objects (takes longer to crash since hash.bucket_count() is already high):

Code: Select all

  data/pcs.txt:
HASH ITERATOR INVALIDATED (11 >= B = 11 * z = 1)
New: B = 23, z = 1
HASH ITERATOR INVALIDATED (23 >= B = 23 * z = 1)
New: B = 47, z = 1
HASH ITERATOR INVALIDATED (47 >= B = 47 * z = 1)
New: B = 97, z = 1
HASH ITERATOR INVALIDATED (97 >= B = 97 * z = 1)
New: B = 199, z = 1
 181 elements in 65 ms.
  data/pcequip.txt:
HASH ITERATOR INVALIDATED (199 >= B = 199 * z = 1)
New: B = 409, z = 1
HASH ITERATOR INVALIDATED (409 >= B = 409 * z = 1)
New: B = 823, z = 1
HASH ITERATOR INVALIDATED (823 >= B = 823 * z = 1)
New: B = 1741, z = 1
HASH ITERATOR INVALIDATED (1741 >= B = 1741 * z = 1)
New: B = 3739, z = 1
 1566 elements in 267 ms.
  data/npcs.txt:..HASH ITERATOR INVALIDATED (3739 >= B = 3739 * z = 1)
New: B = 7517, z = 1
HASH ITERATOR INVALIDATED (7517 >= B = 7517 * z = 1)
New: B = 15173, z = 1
 11991 elements in 7197 ms.
data/npcequip.txt: 
 HASH ITERATOR INVALIDATED (15173 >= B = 15173 * z = 1)
 New: B = 30727, z = 1
 1925 elements in 705 ms.
  data/items.txt:.... 4123 elements in 263 ms.
  data/multis.txt: 43 elements in 10 ms.
  data/storage.txt:.......... 10800 elements in 580 ms.
  data/parties.txt: 0 elements in 0 ms.
Done! 9116 milliseconds.
HASH ITERATOR INVALIDATED (30727 >= B = 30727 * z = 1)
New: B = 62233, z = 1
checkpoint: running start scripts
.....

Reaper running
Reaper finished
Reaper running
Reaper finished
scripts/crash.ecl RUNNING.
Reaper running
Reaper finished
[03/29 19:21:51 log/script.log] Runaway script[35831]: scripts/crash.ecl (20000 cycles)
Reaper running
Reaper finished
[03/29 19:21:53 log/script.log] Runaway script[35831]: scripts/crash.ecl (60000 cycles)
Reaper running
Reaper finished
[03/29 19:21:57 log/script.log] Runaway script[35831]: scripts/crash.ecl (100000 cycles)
Reaper running
Reaper finished
[03/29 19:21:59 log/script.log] Runaway script[35831]: scripts/crash.ecl (140000 cycles)
Reaper running
Reaper finished
[03/29 19:22:01 log/script.log] Runaway script[35831]: scripts/crash.ecl (180000 cycles)
Reaper running
Reaper finished
[03/29 19:22:03 log/script.log] Runaway script[35831]: scripts/crash.ecl (200000 cycles)
[03/29 19:22:04 log/script.log] Runaway script[35831]: scripts/crash.ecl (220000 cycles)
Reaper running
Reaper finished
[03/29 19:22:05 log/script.log] Runaway script[35831]: scripts/crash.ecl (240000 cycles)
Reaper running
Reaper finished
[03/29 19:22:10 log/script.log] Runaway script[35831]: scripts/crash.ecl (260000 cycles)
[03/29 19:22:11 log/script.log] Runaway script[35831]: scripts/crash.ecl (280000 cycles)
Reaper running
Reaper finished
[03/29 19:22:12 log/script.log] Runaway script[35831]: scripts/crash.ecl (300000 cycles)
[03/29 19:22:13 log/script.log] Runaway script[35831]: scripts/crash.ecl (320000 cycles)
Reaper running
Reaper finished
[03/29 19:22:14 log/script.log] Runaway script[35831]: scripts/crash.ecl (340000 cycles)
[03/29 19:22:15 log/script.log] Runaway script[35831]: scripts/crash.ecl (360000 cycles)
Reaper running
Reaper finished
[03/29 19:22:17 log/script.log] Runaway script[35831]: scripts/crash.ecl (380000 cycles)
Reaper running
Reaper finished
[03/29 19:22:18 log/script.log] Runaway script[35831]: scripts/crash.ecl (400000 cycles)
HASH ITERATOR INVALIDATED (62233 >= B = 62233 * z = 1)
New: B = 126271, z = 1
Reaper running
Caught SIGSEGV (Segfault).  Please post the following on http://forums.polserver.com/tracker.php :
=== CUT ===
Second, with a pretty empty shard. It crashed without me even running the object-creator crash script, as some startup packages were creating objects:

Code: Select all

checkpoint: reading data
  data/pol.txt: 2 elements in 0 ms.
  data/objects.txt: 0 elements in 0 ms.
  data/pcs.txt:HASH ITERATOR INVALIDATED (11 >= B = 11 * z = 1
New: B = 23, z = 1
HASH ITERATOR INVALIDATED (23 >= B = 23 * z = 1
New: B = 47, z = 1
 39 elements in 16 ms.
  data/parties.txt: 0 elements in 0 ms.
Done! 21 milliseconds.
HASH ITERATOR INVALIDATED (47 >= B = 47 * z = 1
New: B = 97, z = 1
HASH ITERATOR INVALIDATED (97 >= B = 97 * z = 1
New: B = 199, z = 1
HASH ITERATOR INVALIDATED (199 >= B = 199 * z = 1
New: B = 409, z = 1
checkpoint: running start scripts
Running startup script.
.
----------------------------
checkpoint: start threadstatus thread
Reaper runningcheckpoint: start clienttransmit thread

Starting Aux Listener (:ircbot:ircbot, port 42500)
Starting Aux Listener (:sql:connection, port 1702)
Listening for HTTP requests on port 5000
Reaper finished
Starting Aux Listener (:accounts:auxsvc/newAccount, port 5666)
HASH ITERATOR INVALIDATED (409 >= B = 409 * z = 1
New: B = 823, z = 1
Reaper running
Caught SIGSEGV (Segfault).  Please post the following on http://forums.polserver.com/tracker.php :
=== CUT ===
Build: POL099-2011-05-02 Break Everything Even Rudder (debian - 64bit)
Last Script: pkg/utils/control/initializer/cmdbarmenus.ecl PC: 330
Stack Backtrace:
So, if we really must stick with unordered_map, I think the only option we have is to reset ObjectHash::reap_iterator to hash.begin() on iterator invalidation. I added a "reap_iterator = hash.begin();" on rehash and it no longer crashes. However, this may not be optimal (e.g., if a lot of objects are created, the reaper may never get to the end of the hash), although the bucket count does seem to grow nearly exponentially (it roughly doubles with every rehash). You could specify a higher initial bucket count, but you would still reach that rehash point eventually, unless initial bucket count * load factor is large, like 0xFFFFFFF or so, although maximum values are implementation-dependent.
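
The reset-on-rehash idea might look roughly like this. It is a sketch under assumptions, not the actual ObjectHash: ReapTable, cursor_ and the bool "orphaned" value are hypothetical, C++11's std::unordered_map replaces the std::tr1 one, and a changed bucket_count() is used as the rehash signal.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>

// Hypothetical miniature of ObjectHash: values are "orphaned" flags.
class ReapTable
{
public:
    typedef std::unordered_map<unsigned, bool> Map;

    ReapTable() : cursor_(map_.end()) {}
    ReapTable(const ReapTable&) = delete;            // cursor_ points into map_
    ReapTable& operator=(const ReapTable&) = delete;

    void insert(unsigned serial, bool orphaned)
    {
        const std::size_t buckets_before = map_.bucket_count();
        map_.insert(std::make_pair(serial, orphaned));
        // A changed bucket count means a rehash happened, so the stored
        // cursor is no longer usable; restart from the beginning.
        if (map_.bucket_count() != buckets_before)
            cursor_ = map_.begin();
    }

    // Examines up to `budget` entries, erasing orphaned ones; returns the
    // number erased. Resumes where the previous call stopped.
    std::size_t reap(std::size_t budget)
    {
        std::size_t erased = 0;
        while (budget-- > 0 && !map_.empty())
        {
            if (cursor_ == map_.end())
                cursor_ = map_.begin();
            assert(cursor_ != map_.end());
            if (cursor_->second)
            {
                cursor_ = map_.erase(cursor_);
                ++erased;
            }
            else
                ++cursor_;
        }
        return erased;
    }

    std::size_t size() const { return map_.size(); }

private:
    Map map_;
    Map::iterator cursor_;
};
```

The drawback described above is visible in the design: every rehash sends the cursor back to begin(), so under a steady stream of inserts the tail of the table is visited less often than the head.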

... or re-write a new hash class ;p
AsYlum
Grandmaster Poster
Posts: 109
Joined: Sun Feb 05, 2006 5:24 am
Location: Poland

Re: pol099 rev 525 keeps crashing

Post by AsYlum »

I'll check map vs unordered_map soon. Meanwhile I've found this: http://code.google.com/p/sparsehash/ It looks like another hash_map implementation.

Edit: for now POL has been running without players for almost 2 hours without a crash.
Turley
POL Developer
Posts: 670
Joined: Sun Feb 05, 2006 4:45 am

Re: pol099 rev 525 keeps crashing

Post by Turley »

I tried map once and it was a real pain, because after each insert the whole map gets sorted. We do not care about the sorting; we want fast removing and appending, plus good searching. That's why we used hash_map or unordered_map. My current idea is to replace it with a list. We lose the benefit of very fast access (by this I mean search), but appending and removing are very fast.
If you have time, I would suggest you do a small rewrite, replace the container with a list, and test this. I would love to see the result.
AsYlum
Grandmaster Poster
Posts: 109
Joined: Sun Feb 05, 2006 5:24 am
Location: Poland

Re: pol099 rev 525 keeps crashing

Post by AsYlum »

rev 531: changed objecthash to std::list to fix crash with invalid iterator. No real time to test it, but looks ok...
Thanks Turley! I'm going to test this right now :)
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm

Re: pol099 rev 525 keeps crashing

Post by kevin »

Turley,

Why did you choose list over hash_map? It seems that a list is even more inefficient, as you have to do a lot of "let's loop from beginning to end to find something", which is slower than a hash map lookup (as long as your hash map has enough buckets).

Edit: we have had two different people (Agata, AsYlum) say bootup runs at a crawl and doesn't even complete (within any acceptable timeframe). I think the list is definitely a no-go. hash_map might just have to do.
Turley
POL Developer
Posts: 670
Joined: Sun Feb 05, 2006 4:45 am

Re: pol099 rev 525 keeps crashing

Post by Turley »

I knew that it's not very efficient, but on my test setup it worked, so I simply implemented it :) At least the solution didn't crash a shard :p
Maybe we should simply change the reap function to iterate over the complete hash every time, and change the timing to every x minutes instead of every 2 seconds.
AsYlum
Grandmaster Poster
Posts: 109
Joined: Sun Feb 05, 2006 5:24 am
Location: Poland

Re: pol099 rev 525 keeps crashing

Post by AsYlum »

Yup, the list solution is very slow. For now I'm sticking with set+map, and that works: 5 days without a crash, and it's way faster than the list version. So maybe for now we can leave the Linux version with a simple map+set. Then, when you find some free time, we can test new solutions?
RusseL
Forum Regular
Posts: 375
Joined: Fri Feb 20, 2009 8:30 pm

Re: pol099 rev 525 keeps crashing

Post by RusseL »

loading takes 15-20 minutes on rev531 :D :D
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm

Re: pol099 rev 525 keeps crashing

Post by kevin »

Turley wrote: maybe we should simply change the reap function to iterate over the complete hash every time, and change the timing to every x minutes instead of every 2 seconds.
If we do it that way, we will need a mutex to lock out ObjectHash::Insert() while iterating the map; otherwise we'll run into the same problem. I'm not sure that is better than just using hash_map, because iterating over several thousand entries might keep other threads locked out for too long.
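
A rough sketch of that full-sweep variant, under stated assumptions: LockedTable is hypothetical, the bool value stands in for "orphaned", and std::mutex is only a stand-in for whatever lock guards both paths (the reap_thread listing earlier in the thread takes a PolLock before calling objecthash.Reap()). The point is that no iterator is kept between calls, so a rehash caused by a later insert has nothing to invalidate.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <unordered_map>
#include <utility>

// Hypothetical miniature of ObjectHash guarded by a single mutex.
class LockedTable
{
public:
    void insert(unsigned serial, bool orphaned)
    {
        std::lock_guard<std::mutex> guard(lock_);
        map_.insert(std::make_pair(serial, orphaned));
    }

    // Sweeps the whole table in one locked pass. No iterator outlives the
    // lock, so a rehash in a later insert() cannot invalidate one.
    std::size_t reap_all()
    {
        std::lock_guard<std::mutex> guard(lock_);
        const std::size_t before = map_.size();
        std::size_t erased = 0;
        for (Map::iterator it = map_.begin(); it != map_.end(); )
        {
            if (it->second)
            {
                it = map_.erase(it);
                ++erased;
            }
            else
                ++it;
        }
        assert(map_.size() + erased == before);
        return erased;
    }

    std::size_t size() const
    {
        std::lock_guard<std::mutex> guard(lock_);
        return map_.size();
    }

private:
    typedef std::unordered_map<unsigned, bool> Map;
    mutable std::mutex lock_;
    Map map_;
};
```

The cost is the one raised above: the lock is held for the whole sweep, so the sweep length bounds how long inserting threads can be stalled.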
AsYlum
Grandmaster Poster
Posts: 109
Joined: Sun Feb 05, 2006 5:24 am
Location: Poland

Re: pol099 rev 525 keeps crashing

Post by AsYlum »

I think that with sparsehash we can achieve both: add locking to the thread and still get top performance. Or we can probably stick with unordered_map, but with locking and scanning the whole objecthash to avoid iterator invalidation.

http://blog.aggregateknowledge.com/2011 ... parsehash/
http://incise.org/hash-table-benchmarks.html
Turley
POL Developer
Posts: 670
Joined: Sun Feb 05, 2006 4:45 am

Re: pol099 rev 525 keeps crashing

Post by Turley »

No need for a lock, the reap function is already locked.
Turley
POL Developer
Posts: 670
Joined: Sun Feb 05, 2006 4:45 am

Re: pol099 rev 525 keeps crashing

Post by Turley »

Has anyone tested how fast/slow it is with a simple std::map?
Since all other containers will invalidate on insert.
The big downside is the sorting after insert, but since most of the time the inserted key will be at the end, giving the position hint on insert could be a speedup.
Something like
hs.insert(hs.end(), make_pair(obj->serial, UObjectRef(obj)));
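
A self-contained version of the hinted insert, as a sketch only: build_with_end_hint is a hypothetical helper, the int value replaces UObjectRef, and the serials are assumed to arrive in increasing order. Note that std::map does not re-sort on insert; it places each key with an O(log n) tree search, and a correct hint lets it skip most of that search. A wrong hint is harmless, since the element still ends up in the right place.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Appends entries whose keys arrive in increasing order. Passing end() as
// the hint lets std::map place each new largest key without a full search
// from the root.
std::map<unsigned, int> build_with_end_hint(unsigned n)
{
    std::map<unsigned, int> objects;
    for (unsigned serial = 1; serial <= n; ++serial)
        objects.insert(objects.end(), std::make_pair(serial, 0));
    assert(objects.size() == n);
    return objects;
}
```

Whether this helps in POL depends on how often new serials really are larger than every existing one, which the thread does not measure.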
AsYlum
Grandmaster Poster
Posts: 109
Joined: Sun Feb 05, 2006 5:24 am
Location: Poland

Re: pol099 rev 525 keeps crashing

Post by AsYlum »

Well, I switched to an ordinary map+set around April 8th. The server looks stable and so far we haven't experienced any major slowdowns. There were a few smaller ones, but I can't tell if they're related to objecthash and the map container.
RusseL
Forum Regular
Posts: 375
Joined: Fri Feb 20, 2009 8:30 pm

Re: pol099 rev 525 keeps crashing

Post by RusseL »

What's the status with rev353? Is it as stable and fast as before, or are you still working on it (this bug, I mean)?
Agata
Journeyman Poster
Posts: 63
Joined: Sun Oct 30, 2011 6:33 am

Re: pol099 rev 525 keeps crashing

Post by Agata »

353? You mean 535, right? It's stable and fast to me. It loads all of my world data in less than 90 seconds. Much better than before, when I could go take a shower, wash my hair, comb it, and it wouldn't be done loading yet.