My World in 8 hours: August 2007

Missing Left Mirror.

I fetch my wife to school (she is a teacher). Early this morning, what greeted me was a broken-off left side mirror.

Some fat brute must have been rushing off damn fast to break off the side mirror. There were no dents or scratches of the kind a car would leave if it had bumped the mirror off. Furthermore, the mirror has some give: it can flex forwards or backwards.

Oh well ... $100 down the drain because of someone else's mistake.

Nothing compared to Ms Ho's executive decision. That one cost $2 billion in paper losses. We still dunno the real amount.

"I don't think it is a bug.", he said.

So Mr Sun Tech Support Guy came back from vacation and finally found time for my query. He called and asked for plenty of information which I had already provided in my email. The most important question he asked was "Can you recreate it?"

An engaging debate began. It revolved around whether the person reporting the bug is responsible for proving that the problem happens, or whether the tech support people should be responsible for proving that what I reported was due to the "Stupid user" syndrome. This is similar to how justice is implemented in the US and Singapore: you are innocent unless proven guilty by the prosecution, versus you are guilty unless the defence proves that you are innocent.

So after the debate, yes, I proved that the scenario is reproducible.
The steps are as follows.

Prerequisites:
Sun Application Server
Oracle DB (I suspect it would happen with other DBs too.)

Steps in Sequence:
1) Limit the number of concurrent sessions for a particular DB user to, e.g., 150.
2) Configure the min and max pool sizes to be the same large number, like 300.
3) Restart the server to have a clean start.
4) Check that the server starts up fine, then access the application that uses the connection pool. This kick-starts the initialization of the connection pool.
5) You will notice that the connection pool initializes until it reaches 150 connections (as defined by the DB concurrent session limit).
6) Using the server monitoring feature, you can observe that the number of connections created is 150.
7) The DB should report 150 sessions at this time too. The strange thing is that, although there are connections in the pool, none of them can be used by the application. An application requesting a connection gets an SQL exception saying that the number of concurrent sessions for the DB user has been exceeded.
8) Then ask the DBA to raise the concurrent session limit to 500. This simulates recovery from a situation in which the server encountered a problem while initializing the connection pool.
9) Try accessing the application again. This time the application returns the correct, successful results.
10) However, using the server monitoring feature, you will see that the number of connections has exceeded the maximum you defined (300 in this example). The total number of connections created and in the pool is 450.

Conclusion:
The server failed to completely initialize the pool to the required size, and the application server is not smart enough to detect that there are available connections which the web application can still use. When the problem was resolved, the pool was reinitialized. The server failed to detect the current state of the connection pool and re-ran the initialization process in full. It then ended up with x + (steady-state connections), where x is the number of connections previously initialized. Eat these! This is a bug. BUG BUG BUG I tell you.
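
The sequence above can be mimicked with a toy model. This is only a sketch of what seems to be happening, not Sun's actual code: `ToyDatabase`, `ToyPool` and both initializer methods are made-up names, and the "buggy" initializer simply assumes that every initialization attempt tries to open the full steady pool size again without counting what is already in the pool.

```python
class SessionLimitExceeded(Exception):
    """Raised when the DB user has no concurrent sessions left."""


class ToyDatabase:
    """Hypothetical stand-in for a DB with a per-user session limit."""

    def __init__(self, session_limit):
        self.session_limit = session_limit
        self.sessions = 0

    def connect(self):
        if self.sessions >= self.session_limit:
            raise SessionLimitExceeded("concurrent session limit exceeded")
        self.sessions += 1
        return object()  # placeholder for a real connection


class ToyPool:
    """Hypothetical pool; NOT the application server's implementation."""

    def __init__(self, db, steady_size):
        self.db = db
        self.steady_size = steady_size
        self.connections = []

    def initialize_buggy(self):
        # Assumed bug: always opens steady_size NEW connections and
        # ignores the ones left over from a failed earlier attempt.
        for _ in range(self.steady_size):
            self.connections.append(self.db.connect())

    def initialize_fixed(self):
        # Expected behaviour: only top the pool up to the steady size.
        while len(self.connections) < self.steady_size:
            self.connections.append(self.db.connect())


def reproduce(initializer_name):
    """Return (pool size after failed start, pool size after retry)."""
    db = ToyDatabase(session_limit=150)
    pool = ToyPool(db, steady_size=300)
    initialize = getattr(pool, initializer_name)
    try:
        initialize()              # steps 3 to 7: stalls at the DB limit
    except SessionLimitExceeded:
        pass
    after_failed_start = len(pool.connections)
    db.session_limit = 500        # step 8: DBA raises the limit
    initialize()                  # step 9: next request re-triggers init
    return after_failed_start, len(pool.connections)


print(reproduce("initialize_buggy"))
print(reproduce("initialize_fixed"))
```

Under these assumptions the buggy initializer ends up with 150 + 300 connections, matching what the monitoring showed, while the top-up version stops at the configured 300.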

There are, however, a few question marks. Is the problem caused by having a large number of connections? Is it caused by having the same min and max values? If I need to find this out myself, they might as well give me the source code and I fix it for them. Damn it, earn your pay, Sun Support Staff. Where is your customer service, Sun Microsystems!

It is a bug!!!

So to find out if it is a BUG, I asked the official people. SUN. The origin of Java. The Write Once, Run Anywhere language. So I raised a support request with Sun Support. Although it is not a critical issue, it is still an issue. It may even be a bug.

I emailed Sun Support my problem description, the server logs, the monitoring data and the domain.xml early this week. After an exchange of emails to get more information, I got my first explanation from Sun.

I was utterly disappointed with the quality. And this is why.
1) They sent me an email with the explanation in plain text. The text was so badly formatted that it was unreadable. Sentences were chopped off at inappropriate locations. Words were broken such that they did not make sense. (It is not a bad case of justification gone wrong, it is horrible formatting.) The points did not flow (as in point 1, 2, 3, 5 ... what happened to 4?). The paragraphs had no indentation and there were references to non-existent information (e.g. P15).

I tried to adjust the page size but to no avail.

2) So I replied that I could not understand. So he "here by send the document in word format". Those were his exact words. I expected better formatting. But alas ... the same thing. What he had done was cut and paste the text into the Word doc. Atrocious formatting, incomprehensible English and all. No effort to clean up the formatting at all. There were even a few cases of stray symbols, which is evidence that the article was cut and pasted from somewhere where the character encoding did not match.

Is it a case of OpenOffice converting to MS Word? I doubt so. The plain text in the first place already says it all.

3) Lastly, with a colleague, I attempted to decipher his text. We found out that it did not answer the question. The text was a rip-off from somewhere, informing us that Sun Java Application Server's connection pool has an issue if there is a firewall between the app server and the DB. The text also tries to explain how the connection pooling mechanism works.

Conclusion:
After a discussion with another colleague, we decided that it is a bug. Under no condition should the application server create a pool that is larger than the maximum pool size specified in the domain.xml.
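
That rule is easy to state as a check. The sketch below is illustrative only: the XML fragment is a trimmed, made-up pool entry (the `jdbc-connection-pool` element name and the pool name `sample-pool` are assumptions, the attribute names are the ones quoted in Sun's reply further down), and the observed count of 450 is the figure from the monitoring screen.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical fragment in the shape of a domain.xml pool entry.
# The pool name and the sizes are illustrative, not a real configuration.
SAMPLE_POOL_XML = """
<jdbc-connection-pool name="sample-pool"
    steady-pool-size="300" max-pool-size="300"
    pool-resize-quantity="2" idle-timeout-in-seconds="300"/>
"""


def pool_limits(xml_text):
    """Return (steady-pool-size, max-pool-size) from a pool element."""
    element = ET.fromstring(xml_text.strip())
    return (int(element.get("steady-pool-size")),
            int(element.get("max-pool-size")))


def check_pool(xml_text, observed_connections):
    """Compare an observed connection count with the configured maximum."""
    _, maximum = pool_limits(xml_text)
    if observed_connections > maximum:
        return (f"VIOLATION: {observed_connections} connections, "
                f"max-pool-size is {maximum}")
    return f"OK: {observed_connections} connections, max-pool-size is {maximum}"


print(check_pool(SAMPLE_POOL_XML, 450))
```

With the numbers from this incident the check reports a violation, which is exactly the complaint: 450 connections against a configured maximum of 300.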

The person who answered was from XXXXX, XXXXXXX.

Although I still support Java and Sun Microsystems, this incident is really shaking my confidence. This is why:


AS 8.x JDBC Connection Pool
===========================
Let take the following configurable Connection pool properties from AS8.x
e="  
oracle.jdbc.pool.OracleDataSource" fail-all-connections="false" idle-timeout-in-
seconds="300"  
is-connection-validation-required="true" is-isolation-level-guaranteed="false" m
ax-pool-size="32" max-waittime-  
in-millis="60000" name="oracle" pool-resize-quantity="2" res-type="javax.sql.Dat
aSource" steadypool-  
size="8" validation-table-name="dual">




In this pool setting in domain.xml, you will notice some of the pool attributes
1. is-connection-validation-required, connection-validation-method, validation-t
able-name  
2. fail-all-connections
3. idle-timeout-in-seconds
5. max-pool-size, max-wait-time-in-millis, pool-resize-quantity, steady-pool-siz
e  
Connection pool behaviour before P15
1. If . is-connection-validation-required is true,
a. And if connection-validation-method can be table. In the case, before t
he JDBC connection  
is returned to the application (when asked by calls like DataSource.getConnectio
n()), the  
connection is checked that is it is valid by select count(*) from table. N
ot that the table is  
configured by validation-table-name. This is the recommended validation fo
r Oracle and the  
table name is normally DUAL
b. If connection-validation-method is auto-commit, then connection is test
ed by sequences of  
calls to Connection.setAutoCommit(), getAutoCommit() and isAutoCommit(). It has 
been  
reported that for JDBC drivers like Oracle, this method of testing is not reliab
le (hence table  
validation is recommended for Oracle).
c. If the connection-validation-method is metadata, the database Connectio
n  
DatabaseMetaData query is used to test connection is valid. If one notice that t
his is used by  
HADB JDBC driver.
2. Next, the fail-all-connections is a flag to indicate that if the while 
taking a connection from the pool  
and it is detected or encountered an exception, then all the connections in the 
pool will be failed (if this  
attribute is true).
Normally, fail-all-connections should be false since all the other connections i
n the pool might still be  
valid and instead it would be more drastic to fail all of them due to that conne
ction.  
3. Next, the idle-timeout-in-seconds is an attribute of how long the connection 
can be in the pool if it is  
idle. Note that implicitly, there is a background thread that is scheduled every
idle-timeout-inseconds 
to do operation on pool cleanup. We will discuss this next.
4. The pool size is governed by steady-pool-size and max-pool-size.
a. Initially when the system is started the pool will be empty. On the first req
uest to ask for a  
database connection, in P13, a new connection will be created up to the steady p
ool size.  
b. When a new connection is created and all the other other connections in the s
teady pool size is  
already taken up (ie there is no free connections in the pool), then ONE connect
ion will be  
created.
c. So lets take an example,
Steady pool size is 0, and the max-pool-size is 32 and the pool-resize-quantity 
is 2, when you  
ask for a new connection, and return the connection, the pool should now contain
 1  
CONNECTION (for idle-timeout-in-seconds).
So after for idle-timeout-in-seconds, when the JDBC cleanup thread runs, this co
nnection will  
be cleared and the pool size will be 0 (since the steady pool is denoted 0)
d. Now as for the max-pool-size, it is obviously binds the number of connections
 this pool will  
create.

5. The role of the JDBC cleanup thread in P13 is
a. Periodically wakes up idle-timeout-in-seconds
b. For any connections in the pool that exceed the steady pool size, and for the
se connections if  
they are idle > idle-timeout-in-seconds, destroy them (so that they poo
l goes down to steady  
pool size). Note that the number of idle connections that is clear is BOUNDED to
 be only  
pool-resize-quantity
c. Note that the cleanup thread also tries to maintain the pool to be steady-poo
l-size.  
d. However, do note that connections that is inside the steady-pool remains in t
he pool. The  
implication is that these connections can be prone to firewall timeout. The fact
 is that if these  
connections in the steady pool are not touched, the firewall may decide to timeo
ut the TCP  
connections. If that happens, bad behaviour will happen when these connections a
re validated.  


Connection pool behaviour at P15/16/17
=======================================
Now all the attributes in the above are still there. However there is some subtl
e behaviour.  
In P15, an RFE was implemented so that connections in the steady pool those are 
idle more than idletimeout-  
seconds will be destroyed. So the following is what the cleanup thread does
1 The role of the JDBC cleanup thread in P15 is
a. Periodically wakes up idle-timeout-in-seconds
b. For all connections in the pool that are idle idle-timeout-in-seconds
, destroy them (so that  
they pool goes down to steady pool size). Note that the number of idle connectio
ns that is clear is  
BOUNDED to be only pool-resize-quantity. Note that ALL idle connections AR
E destroyed.  
c. Note that the cleanup thread also tries to maintain the pool to be steady-poo
l-size. This implies too  
that if the above connections is destroyed and less than steady-pool-size then a
 number of  
connections is created (up to the steady pool size)
Due to this behaviour, the connections in the pool should be firewall friendly a
s long as the â€œidle-timeout-insecondsâ€  
is well below the firewall timeout (Typically that means for a firewall friendly
 timeout the value of  
idle-timeout-in-seconds should be firewall timeout.
2 Next, there is also some subtle behaviour to the way â€œpool-resize-quantity
â€ means. In P15, this parameter  
applies not only to resize the pool downward to the steady pool size but applies
 too in terms of growing  
the number of connections in the pool.
Take the same example, Steady pool size is 0, and the max-pool-size is 32 and th
e pool-resize-quantity is  
2, when you ask for a new connection, and return the connection, the pool will h
ave 2 CONNECTIONS  
(since the implementation will grow them at pool-resize-quantity everytime.).
Now, due to the new cleanup thread and the pool resize behaviour, you probably n
otice that having a low  
â€œidle-timeout-secondsâ€ may cause an extremely many database connection creat
ion calls (since idle  
connections will be destroy
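
As far as the reply above can be deciphered, the difference between the older patches and P15 comes down to what the cleanup thread does on each pass. The sketch below is one reading of that text and nothing more: the function names and the `Conn` class are hypothetical, and where the P15 description contradicts itself (bounded by pool-resize-quantity versus ALL idle connections destroyed) it takes the "all idle connections" reading.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    idle_seconds: int  # how long this connection has sat unused


def cleanup_before_p15(pool, steady_size, idle_timeout, resize_quantity):
    """Pre-P15 reading: only connections beyond the steady size may go,
    and at most resize_quantity are destroyed per pass."""
    excess = max(0, len(pool) - steady_size)
    budget = min(excess, resize_quantity)
    survivors = []
    # Visit the most idle connections first.
    for conn in sorted(pool, key=lambda c: c.idle_seconds, reverse=True):
        if budget > 0 and conn.idle_seconds > idle_timeout:
            budget -= 1  # destroyed
            continue
        survivors.append(conn)
    return survivors


def cleanup_p15(pool, steady_size, idle_timeout):
    """P15 reading: every connection idle past the timeout is destroyed,
    then the pool is topped up to the steady size with fresh ones."""
    survivors = [c for c in pool if c.idle_seconds <= idle_timeout]
    created = 0
    while len(survivors) < steady_size:
        survivors.append(Conn(idle_seconds=0))
        created += 1
    return survivors, created


# Steady pool of 8 connections, all idle for 400s, timeout 300s.
stale_pool = [Conn(400) for _ in range(8)]
print(len(cleanup_before_p15(stale_pool, 8, 300, 2)))
print(cleanup_p15(stale_pool, 8, 300)[1])
```

In this reading the older behaviour leaves all 8 stale connections sitting in the steady pool (the ones a firewall could silently time out), while P15 replaces all 8 with fresh connections on every pass. None of this touches the question of why the pool grew past max-pool-size.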

Is it a bug?

Sometimes an application server can really be a puzzle that is so hard to solve. I recently encountered this problem when starting up the application server. The server starts up and initializes the JDBC connection pool. However, it did not manage to fully initialize the pool. It is supposed to initialize the pool with 300 connections, but it hit a snag at 191. The DB did not allow any more, as the DB user had reached the allowable session limit.

A few minutes later, sessions were released and the application server continued to initialize the pool. It started initializing the pool from scratch and added another 300 connections. It now had a total of 491 connections. So is this a bug?

23 Aug: Just to add on, the server did not complain that it had more than the configured number of connections. The DBA also registered the higher number of connections.