Linux server logon problem with 099

Here you can post threads specific to the current release of the core (099)

Moderator: POL Developer

alexsperi
New User
Posts: 5
Joined: Wed Aug 20, 2008 12:31 am

Linux server logon problem with 099

Post by alexsperi »

Hi,

I'm migrating the source code of a 097-based shard to the latest development version (099). To do this I checked out the latest svn server source code, compiled it under Windows, and recompiled all my shard code with this new POL version. After about a day of compile-and-correct I managed to start a local copy of my shard and connect to it! That went pretty smoothly, so I logged into my Linux-based server, checked out the latest svn POL source code and recompiled it. Again the build was successful, as was the recompile of the shard codebase... but this time, when I tried to log in, my UO client showed the shard selection window and then froze for a while before showing a connection error window. I've double-checked servers.cfg and uoconfig.cfg, and both files are correct. I tried with 4.x.x, 6.x.x and 7.x.x clients but the result is the same. Lastly I managed to capture the raw TCP packets (with Wireshark), and it seems that under Linux the server does not acknowledge the SYN messages the client sends after shard selection.
I'll try to outline what I recorded with Wireshark in the following two code boxes, with unnecessary packet data removed.

Code: Select all

Linux-based 099 server

CtoS    0x82 ......
StoC    0xA8 ......
CtoS    0xA0 ......
StoC    0x8C .....
CtoS    [FIN]
CtoS    [SYN]
StoC    [FIN,ACK]
CtoS    [SYN]
CtoS    [SYN]
CtoS    [SYN]
...
CtoS    [SYN]

Code: Select all

Windows-based 099 server

CtoS    0x82 ......
StoC    0xA8 ......
CtoS    0xA0 ......
StoC    0x8C .....
CtoS    [FIN]
StoC    [FIN,ACK]
CtoS    [SYN]
StoC    [SYN,ACK]
...
Where CtoS are packets sent by the client to the server and StoC are packets generated by the server.
Can anyone tell me if this is a known problem (I could have messed something up in the codebase upgrade process ^_^) or help me narrow down which polserver files are involved in the logon process, so I can try to debug this beast :P ?
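
One way to narrow this down from the capture itself: in both traces the last server packet before the client reconnects is 0x8C. Assuming the layout given in community UO packet guides (1 command byte, 4 IPv4 bytes, a 2-byte big-endian port and a 4-byte key, 11 bytes in total), that packet tells the client where to send its next SYN. The sketch below decodes it from the raw bytes copied out of Wireshark. `Redirect`, `parse_redirect` and `describe` are hypothetical names for illustration and are not part of POL, and the layout is an assumption to check against your own capture.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper type, not part of POL.
// Assumed 0x8C layout: cmd(1) ip(4) port(2, big-endian) key(4) = 11 bytes.
struct Redirect
{
    std::uint8_t ip[4];  // address bytes in the order they appear on the wire
    std::uint16_t port;  // port the client is told to reconnect to
    std::uint32_t key;   // key the client echoes back on the new connection
};

// Returns true and fills 'out' if 'data' starts with a complete 0x8C packet.
bool parse_redirect(const std::uint8_t* data, std::size_t len, Redirect& out)
{
    assert(data != nullptr);
    const std::size_t kRedirectLen = 11;
    if (len < kRedirectLen || data[0] != 0x8C)
        return false;

    for (int i = 0; i < 4; ++i)
        out.ip[i] = data[1 + i];

    out.port = static_cast<std::uint16_t>((data[5] << 8) | data[6]);
    out.key = (static_cast<std::uint32_t>(data[7]) << 24) |
              (static_cast<std::uint32_t>(data[8]) << 16) |
              (static_cast<std::uint32_t>(data[9]) << 8) |
              static_cast<std::uint32_t>(data[10]);
    return true;
}

// Formats the redirect target as "a.b.c.d:port" for comparison with the capture.
std::string describe(const Redirect& r)
{
    return std::to_string(r.ip[0]) + "." + std::to_string(r.ip[1]) + "." +
           std::to_string(r.ip[2]) + "." + std::to_string(r.ip[3]) + ":" +
           std::to_string(r.port);
}
```

If the decoded address and port differ from the destination of the unanswered SYN packets, or from what servers.cfg is meant to advertise, that would point at the server list configuration or name resolution rather than at the listener.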
alexsperi
New User
Posts: 5
Joined: Wed Aug 20, 2008 12:31 am

Re: Linux server logon problem with 099

Post by alexsperi »

Ok, some updates to my journey to madness :cheesy:

I've recompiled the pol distro in debug mode and upon connection the server segfaulted with the following message:

Code: Select all

=== CUT ===
Build: POL099-2011-05-02 Break Everything Even Rudder (ubuntu)
Last Script: pkg/systems/accounts/hook/onLogin.ecl PC: 22
Stack Backtrace:
[0x826cdd4]
[0xb7776400]
/lib/i386-linux-gnu/libc.so.6(get_nprocs+0x112) [0xa8fb2d62]
/lib/i386-linux-gnu/libc.so.6(+0x77009) [0xa8f41009]
/lib/i386-linux-gnu/libc.so.6(__libc_malloc+0x151) [0xa8f41e21]
/lib/i386-linux-gnu/libc.so.6(+0x63c78) [0xa8f2dc78]
/lib/i386-linux-gnu/libc.so.6(fopen+0x2b) [0xa8f2dd4b]
/lib/i386-linux-gnu/libnss_files.so.2(+0x4368) [0xa9073368]
/lib/i386-linux-gnu/libnss_files.so.2(_nss_files_gethostbyname_r+0x63) [0xa9073b83]
[0x83497c7]
[0x834955b]
[0x81036f2]
[0x812bc82]
[0x813d028]
[0x813e65e]
[0x81eb1d9]
[0x8183dfa]
[0x826fe2f]
[0x826ff47]
[0x82d0bfb]
[0x8346c1e]
=== CUT ===
In my previous post I forgot to mention that the Linux system is 32-bit Ubuntu 12.04 and my gcc version is 4.6.3. Yesterday I recompiled the server in debug mode on my home laptop (64-bit Ubuntu 12.04) and saw it segfault in libpthread. Could it be something related to the gcc or glibc version?
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm
Contact:

Re: Linux server logon problem with 099

Post by kevin »

If you compiled with debugging symbols, run it in gdb and wait for the crash again, then get a backtrace. Post that here, and we can look at the src lines where you are having issues.

You are running the latest svn rev?
alexsperi
New User
Posts: 5
Joined: Wed Aug 20, 2008 12:31 am

Re: Linux server logon problem with 099

Post by alexsperi »

Yes, I'm running the latest svn. Here is the backtrace from gdb:

Code: Select all

Client connected from X.X.X.X (1 connections)

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xa20f4b40 (LWP 7477)]
0xa983bd62 in get_nprocs () from /lib/i386-linux-gnu/libc.so.6
(gdb) backtrace
#0  0xa983bd62 in get_nprocs () from /lib/i386-linux-gnu/libc.so.6
#1  0xa97ca009 in ?? () from /lib/i386-linux-gnu/libc.so.6
#2  0xa97cae21 in malloc () from /lib/i386-linux-gnu/libc.so.6
#3  0xa97b6c78 in ?? () from /lib/i386-linux-gnu/libc.so.6
#4  0xa97b6d4b in fopen () from /lib/i386-linux-gnu/libc.so.6
#5  0xa98fc368 in ?? () from /lib/i386-linux-gnu/libnss_files.so.2
#6  0xa98fcb83 in _nss_files_gethostbyname_r () from /lib/i386-linux-gnu/libnss_files.so.2
#7  0x083497c7 in gethostbyname_r ()
#8  0x0834955b in gethostbyname ()
#9  0x081036f2 in loginserver_login (client=0xa8300560, msg=0xa8301ff0) at pol/login.cpp:176
#10 0x0812bc82 in ExportedPacketHookHandler (client=<incomplete type>, data=0xa8300589) at pol/network/packethooks.cpp:139
#11 0x0813d028 in process_data (client=<incomplete type>) at pol/pol.cpp:822
#12 0x0813e65e in client_io_thread (client=<incomplete type>) at pol/pol.cpp:1142
#13 0x081eb1d9 in UoClientThread::run (this=0xa8300468) at pol/uolisten.cpp:80
#14 0x08183dfa in _thread_stub2 (arg=0xa8300468) at clib/socketsvc.cpp:63
#15 0x0826fe2f in threadhelp::run_thread (threadf=0x8183dca <_thread_stub2(void*)>, arg=0xa8300468)
    at clib/threadhelp.cpp:236
#16 0x0826ff47 in threadhelp::thread_stub2 (v_td=0xa83006c0) at clib/threadhelp.cpp:275
#17 0x082d0bfb in start_thread (arg=0xa20f4b40) at pthread_create.c:308
#18 0x08346c1e in clone ()
Edit:

This is the code involved in the segfault. It seems that something goes wrong in the gethostbyname function... Any ideas?

Code: Select all

    for( idx = 0; idx < servers.size(); idx++ )
    {
        ServerDescription* server = servers[idx];

        if (!server->hostname.empty())
        {
            struct hostent* he = gethostbyname( server->hostname.c_str() ); // FIXME: here is a potential server lockup
            if (he != NULL && he->h_addr_list[0])
            {
                char* addr = he->h_addr_list[0];
                server->ip[0] = addr[3];
                server->ip[1] = addr[2];
                server->ip[2] = addr[1];
                server->ip[3] = addr[0];
            }
            else
            {
                Log( "gethostbyname(\"%s\") failed for server %s\n",
                      server->hostname.c_str(),
                      server->name.c_str() );
                continue;
            }
        }
Edit2:

I used a statically linked debug binary... but it seems that gethostbyname crashes with this type of linking...

http://lists.gnu.org/archive/html/bug-g ... 00083.html (look at the year of the bug :O )

So I solved this problem by compiling a dynamically linked debug version, and I'm back to the old problem... stuck on connection as in the first post. :/
kevin
POL Developer
Posts: 53
Joined: Wed Sep 29, 2010 3:47 pm
Contact:

Re: Linux server logon problem with 099

Post by kevin »

Hrm...

So, I had the same problem using a statically linked application. It errors if your server's Host is an actual hostname and not an IP address (at least, for me). However, my dynamically linked version works fine.

There is no reason for the gethostbyname() to fail if running on the same machine you're compiling on. I'm asking around in some POSIX-related IRC channels for any insight.

If you *really* must run your static build, temporarily change the hostname of the server to a hard-coded IP until I can figure out a workaround.

Edit:
[ 6:28 pm] - <seldon> Is there a reason you're linking libc statically?
[ 6:29 pm] - <seldon> I suspect the problem is that gethostbyname returns a pointer to static data, and that that static data is not the same that the operating system writes the results of the lookup to because you made the linker dance to a weird tune.
[ 6:39 pm] - <kevin> well, i don't understand why it would crash if you're running it on the same machine that it's built on...?
[ 6:42 pm] - <seldon> If you want to know for sure, use a debugger. As I said, I suspect you have the memory area gethostbyname is working with twice and get it mixed up, and using the broken pointers then leads to unhappiness, but don't expect me to give you a reliable diagnosis from afar.
[ 6:44 pm] - <seldon> I find that if your linker tells you what you're doing is dangerous, it's best to stop doing it.
In short, just use a dynamically linked executable.
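
The hard-coded IP workaround above can be sketched as follows: try to parse the configured host as a numeric IPv4 address with inet_pton(), which is plain string parsing and involves no name lookup, and only fall back to a real lookup for actual hostnames. The fallback here uses getaddrinfo(), which returns a caller-owned result instead of a pointer to static data. This is only a sketch and not the POL code: `resolve_ipv4` and `store_reversed` are hypothetical names, the reversed byte order simply mirrors the loop quoted earlier in the thread, and in a statically linked glibc binary the getaddrinfo() fallback draws the same linker warning as gethostbyname(), so it is not a fix for static builds.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: stores the address in the same reversed byte order
// as the loop quoted from login.cpp (ip[0] = addr[3], ..., ip[3] = addr[0]).
static void store_reversed(const in_addr& addr, unsigned char ip[4])
{
    unsigned char raw[4];
    std::memcpy(raw, &addr.s_addr, sizeof raw);  // s_addr is in network byte order
    ip[0] = raw[3];
    ip[1] = raw[2];
    ip[2] = raw[1];
    ip[3] = raw[0];
}

// Hypothetical replacement for the gethostbyname() call. Returns false if
// 'host' is neither a numeric IPv4 address nor a resolvable name.
bool resolve_ipv4(const std::string& host, unsigned char ip[4])
{
    assert(ip != nullptr);

    // Fast path: a numeric address needs no resolver at all.
    in_addr addr;
    if (inet_pton(AF_INET, host.c_str(), &addr) == 1)
    {
        store_reversed(addr, ip);
        return true;
    }

    // Slow path: a real name lookup, restricted to IPv4 results.
    // Like gethostbyname(), this call can block while the lookup runs.
    addrinfo hints;
    std::memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    addrinfo* result = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &result) != 0 || result == nullptr)
        return false;

    const sockaddr_in* sin = reinterpret_cast<const sockaddr_in*>(result->ai_addr);
    store_reversed(sin->sin_addr, ip);
    freeaddrinfo(result);
    return true;
}
```

With a numeric Host entry such as 10.1.2.3 only the fast path runs, which matches the observation above that the static build misbehaves only when the Host is an actual hostname.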
alexsperi
New User
Posts: 5
Joined: Wed Aug 20, 2008 12:31 am

Re: Linux server logon problem with 099

Post by alexsperi »

Ok, static linking isn't a must-have, so I switched to dynamic linking and the segfault is gone.

Having said that, I'm still having the connection issue. I don't have any packet hook installed on connection, so this behaviour is still puzzling me.