OVERHEAD: Java Application Scalability

Scalability of applications has become a lot more complex than buying a bigger Windows server or z/OS mainframe. It is not just a system's ability to handle user requests well as their numbers grow; it is mostly about managing system load, handling priorities and remaining responsive under high load, while scaling as linearly as possible. A key problem of large-scale systems is data integrity: as the number of transactions with write operations grows, so does the danger of data locks slowing down the system. The maximum number of concurrent users is often seen as a measure of scalability, but that is a dangerous shortcut. The number and kind of transactions required within a certain time frame is the only true measure of scalability: response times for a given transaction mix should remain below a defined limit at the expected peak transaction load.

For Java applications it is difficult and expensive to achieve predictable, scalable performance. Simply switching from a single-server environment to multiple servers can cause overall application throughput to collapse. Adding more data can cause the database to need progressively more resources. Java applications that respond instantly in test can be twenty times slower once loaded with concurrent users.

The only (very complex) way to truly scale Java applications is to use three server tiers: the first is an HTTP-driven GUI, the second a Java server that de/serializes the tables of the third, database, tier into Java objects to perform the application logic. These three tiers are often referred to as clustering, which is incorrect. Clustering is not about fragmenting the application vertically but about parallelizing each server tier horizontally. Clustering can therefore only be implemented independently for each of the tiers and requires very different functionality in each. In the case of the Web tier, no server-to-server communication is necessary, whereas at the other extreme the database tier must manage transactional consistency across its cluster. This means that the cost of scalability is dramatically different for each of those tiers, and so are management and tuning.

The HTTP tier performs SSL encryption and decryption and handles unique client connections by means of HTTP headers carrying 'sticky load balancing' information. HTTP requests are load-balanced across stateless servers and routed to a designated J2EE server for processing. The application has to support sticky load balancing from the Web tier to be able to cluster both the HTTP and the Java tier. Such Java applications exhibit intra-tier communication complexity that directly impairs scalable performance. Clustering of the database server has to be a function of the database product used, but it is impossible to scale the Java tier without implementing some form of local data caching. Common forms are coherent clustered caching, fully replicated caching and partitioned caching.
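
To make the routing concrete, here is a minimal Java sketch of how a balancer can pin requests to a server by hashing a session identifier. The class, the cookie-derived session id and the backend list are purely illustrative assumptions, not part of any J2EE or load-balancer API.

    import java.util.List;

    // Minimal sketch of sticky load balancing: each HTTP session is pinned to
    // one stateless Java server by hashing its session identifier.
    // Backend addresses and the session id source are illustrative assumptions.
    public class StickyRouter {
        private final List<String> backends;   // e.g. ["app1:8080", "app2:8080"]

        public StickyRouter(List<String> backends) {
            this.backends = backends;
        }

        // Requests of one client always hit the same server; requests without
        // a session can go anywhere (the stateless case).
        public String route(String sessionId) {
            if (sessionId == null) {
                return backends.get((int) (Math.random() * backends.size()));
            }
            int idx = Math.floorMod(sessionId.hashCode(), backends.size());
            return backends.get(idx);
        }
    }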

Clustered caching keeps data in the Java application tier so that data access requests can be satisfied from the cache without loading the database. The cache must not grow too large and has to provide data that is as current as necessary; not all data needs to be fully up to date. The cache expires data by LRU (least recently used) and elapsed-time algorithms. In clustered systems, servers notify each other about data modifications and either delete or refresh the outdated cache page. Coherent clustered caching has to synchronize cache pages, just like the application logic, to guarantee a thread-safe implementation. A kind of dual-phase commit on page changes is required in a distributed cache to maintain transactional data consistency.
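
As an illustration of the expiry side, a minimal Java sketch of a cache with an LRU bound, an elapsed-time limit and an invalidation hook for the modification notices from other servers; the capacity and TTL values are assumptions for the example.

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal sketch of the expiry side of a clustered cache: LRU eviction plus
    // an elapsed-time limit, with an invalidate() hook for modification notices
    // from other cluster members. Capacity and TTL are illustrative assumptions.
    public class ExpiringLruCache<K, V> {
        private static final int MAX_ENTRIES = 10_000;
        private static final long TTL_MILLIS = 60_000;

        private record Entry<T>(T value, long loadedAt) {}

        // access-ordered LinkedHashMap gives LRU eviction of the eldest entry
        private final Map<K, Entry<V>> map = new LinkedHashMap<K, Entry<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Entry<V>> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

        public synchronized void put(K key, V value) {
            map.put(key, new Entry<>(value, System.currentTimeMillis()));
        }

        public synchronized V get(K key) {
            Entry<V> e = map.get(key);
            if (e == null) return null;
            if (System.currentTimeMillis() - e.loadedAt() > TTL_MILLIS) {
                map.remove(key);   // expired by elapsed time; caller reloads from the DB
                return null;
            }
            return e.value();
        }

        // called when another cluster member notifies us of a data modification
        public synchronized void invalidate(K key) {
            map.remove(key);
        }
    }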

A fully replicated cache does not have the problems of clustered caching; it shares modifications with the other members of the cluster in a 'push' model. While data access has no measurable latency, a transaction should only be closed once the data from the cache has been serialized to the database tier. This is, however, mostly done as lazy writes and thus risky. The other problem is data locking: in principle two servers can write to the same data at the same time and therefore cause conflicts at transaction close.
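
A minimal sketch of the push model, assuming an illustrative peer interface rather than any specific product API: writes are applied locally and broadcast to every member, while the database write itself is deferred.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a fully replicated cache: every write is pushed to all
    // cluster members, reads are always local. The Peer transport is an assumed
    // interface for illustration, not a specific product API.
    public class ReplicatedCache<K, V> {

        public interface Peer<K, V> {
            void applyRemotePut(K key, V value);   // receive a pushed modification
        }

        private final Map<K, V> local = new ConcurrentHashMap<>();
        private final List<Peer<K, V>> peers;

        public ReplicatedCache(List<Peer<K, V>> peers) {
            this.peers = peers;
        }

        public V get(K key) {
            return local.get(key);                 // no remote call: no measurable latency
        }

        public void put(K key, V value) {
            local.put(key, value);
            // push model: share the modification with every other member; the
            // database write itself is typically deferred (lazy write), which is
            // where the risk described above comes from
            for (Peer<K, V> p : peers) {
                p.applyRemotePut(key, value);
            }
        }
    }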

The solution is the partitioned cache, where each cluster server owns one unique fragment of the complete data set. Data can be cached on the other servers, but writes have to be synchronized with the owning server. Because communication is any-to-any, partitioned caches scale linearly with the number of requests and the data volume as you add servers. Replicating data from the owner and de-serializing tables into objects adds substantial latency, so the application servers typically 'object-cache the database cache' to avoid the additional de-serialization overhead for repeated object requests. The cache can then load many objects from a single cache page in one I/O request rather than reading multiple table entries.
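
A minimal sketch of the ownership rule, using a simple hash-modulo scheme as a stand-in for real partitioning; node addressing and the transport calls are only illustrative stubs.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal sketch of a partitioned cache: every key has exactly one owning
    // node; reads may be served from a local replica, writes are forwarded to
    // the owner. Node addressing and transport are illustrative assumptions.
    public class PartitionedCache<K, V> {

        private final List<String> nodes;                    // all cluster members
        private final String self;                           // this node's id
        private final Map<K, V> ownedData = new ConcurrentHashMap<>();
        private final Map<K, V> nearCache = new ConcurrentHashMap<>();  // local replicas

        public PartitionedCache(List<String> nodes, String self) {
            this.nodes = nodes;
            this.self = self;
        }

        // Each key maps to exactly one owner; adding nodes spreads the data and
        // the write load, which is why this model scales close to linearly.
        String ownerOf(K key) {
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        }

        public V get(K key) {
            V v = nearCache.get(key);
            if (v != null) return v;                         // avoid repeated deserialization
            v = ownerOf(key).equals(self)
                    ? ownedData.get(key)
                    : fetchFromOwner(ownerOf(key), key);     // one network hop to the owner
            if (v != null) nearCache.put(key, v);
            return v;
        }

        public void put(K key, V value) {
            if (ownerOf(key).equals(self)) {
                ownedData.put(key, value);
            } else {
                sendToOwner(ownerOf(key), key, value);       // writes synchronize on the owner
            }
            nearCache.put(key, value);
        }

        // transport stubs: stand-ins for real cluster communication
        private V fetchFromOwner(String node, K key) { return null; }
        private void sendToOwner(String node, K key, V value) { }
    }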

However, a substantially growing number of applications no longer read their data from a database. Process-oriented applications in particular (and that should be all of them) require data synchronization with application silos via backend interfaces such as SOA or MQSeries. More often than not such interfaces require stateful conversations. Compared to database access, Web-services-based communication is at least an order of magnitude worse in performance, response time and throughput. Much of that has to do with the problems of XML-defined data structures, the weak, ambiguous and non-canonical XSD-based data definitions, and the resulting parsing and validation. To reduce the number of transfers, SOA requests often pass a lot of incidental data because it might be needed, to avoid multiple data accesses. That is obviously similar to creating a kind of SOA cache, but without any of the above-mentioned mechanisms for data concurrency; data provided by Web services is in principle always out of date. Web services transactions are so slow that they cannot be used for applications that involve user interaction, but only for straight-through processes without user intervention. Implementing clustered caching for SOA Web services creates substantial problems, mostly because there is no defined mechanism for push updates and lazy database writes. All of it has to be implemented manually by the Java programmers and is mostly hard-coded and application specific.
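
To make the parsing and validation cost concrete, a small sketch using the standard JAXP validation API; every request to such a service pays for steps like these before any business logic runs. The schema and payload file names are placeholders.

    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;
    import java.io.File;

    // Minimal sketch of the per-request XML cost: the payload has to be parsed
    // and validated against its XSD on every call. "order.xsd" and "payload.xml"
    // are placeholder names for this example.
    public class SoapPayloadValidation {
        public static void main(String[] args) throws Exception {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new File("order.xsd"));
            Validator validator = schema.newValidator();

            long start = System.nanoTime();
            validator.validate(new StreamSource(new File("payload.xml")));  // parse + validate
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("parse + validate took " + micros + " microseconds");
        }
    }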

Measuring and predicting hardware and software performance therefore becomes a very complex task. During my years at IBM, one of the key elements of designing a well-working mainframe environment was the balanced system approach: the right mix of CPU power, RAM and disk I/O was needed to get the most bang for the buck. Today, the network bandwidth needed for the various intricacies of each of the tiers becomes the key element, while hardware is looked at as a cheap commodity. The immense OVERHEAD involved in juggling Web/HTTP clusters, Java tier clusters and expensive high-availability clusters for the database tier requires huge amounts of commodity hardware that is mostly not balanced to achieve scalable applications. Most systems simply provide enough of everything to work. The idea of Cloud computing is about providing virtualized resources to such tiered systems, adding even more overhead for the virtualization. Hmm, that does not sound very professional or well thought out to me …

CONCLUSION: Could the substantial overhead of sticky load balancing, transaction-safe Java caching and database clustering, plus the substantial overhead for parsing and validating the XML data for SOA, not be avoided? All of the above are the reasons why I chose a different route for the Papyrus Platform. Rather than accepting the immense complexity of multi-layered caches – with multiple conversions from tables to cache pages to objects and back – I decided to collapse the horizontal structures and work with a purely (vertical) object model from definition to storage. That enables the use of the meta-data repository for application life-cycle management and substantially simplifies systems management and tuning.

The proxy replication mechanism of the object-relational database of the Papyrus Platform uses a partitioned caching concept: there is always a unique data owner, and each server node uses a replicated copy that is either pull- or push-updated as per proxy definition. The objects do not have to be de/serialized, as they are cached, replicated and stored in binary form as is. The same mechanism works transparently for all objects that are populated via Web services or other backend interfaces. The drawback is that during testing or at low load it does not seem as fast as a Java application, but in contrast it does scale linearly without needing additional programming or software as you add more commodity hardware. Papyrus uses the same concepts across all servers because no tiers are necessary, and it also does not need virtualization.

I am the founder and Chief Technology Officer of Papyrus Software, a medium-size software company offering solutions in communications and process management around the globe. I am also the owner and CEO of MJP Racing, a motorsports company focused on Rallycross or RX, a form of circuit racing on mixed surfaces that has been around for 40 years. I hold 8 national and international championship titles in RX. My team participates in the World Championship alongside Petter Solberg, Sebastian Loeb and Ken Block.

7 comments on “OVERHEAD: Java Application Scalability”
  1. […] to for example three tiered Web/Java/SQL/SOA application clusters. A more detailed discussion of Java Application Scalability I have posted on my ‘Real World’ […]

  2. Hi Max,
    I am very interested in your thoughts on caching architecture since John Rymer and I plan to evaluate several of the products available for distributed caching such as IBM ExtremeScale, Oracle Coherence, Terracotta, GigaSpaces, Memcached and others. I don’t know what you mean by “I decided to collapse the horizontal structures and work with a purely (vertical) object model”. It seems that you are saying that you are storing the objects in their native format to avoid de/serialization. But is there more to it than that? How is your approach different from any of the products I mentioned above? By the way, your description of the issues with scaling Java apps is very informative. Thank you!

    Mike

  3. Hello Mike, thanks for your question and interest. For the benefit of other readers I will answer here. More by email if necessary.

    Papyrus is very different from these other products, as it uses a special stream format for its objects that is stored as-is in the local and distributed object storage and cache. It is that stream format that is executed by the transaction engine: a kind of higher-level byte code that is created from an object definition in the repository and includes the OO-relationship definitions and thus the DB structure. There are no additional table definitions necessary. If a user defines a new object, Papyrus automatically creates the tables and all distributed caching functionality.

    The reason to use partitioned caching has to do with unlimited scalability and ease of use and maintenance. Papyrus is peer-to-peer, or n-tier if you want, because if you instantiate the GUI-EYE objects on one server group, the process tasks on another, and access the content stored on the archive nodes, then the system by itself scales without limit as different groups of cluster servers. The only tricky element is to avoid business rules that broadly access and trigger object changes across all servers. That would be similar to wildcard SQL searches across all indices of an archive.

    While the issues and solutions are the same, I would not put Papyrus in the same group as the DB products you mention. Those are used to solve the problems that the limited application technology and infrastructure of Java application servers and relational DBs create. The Papyrus Platform does not have those problems from the outset!

    But then it is not Java and thus not ‘standard’.

    Go figure!

    Thanks, Max.

  4. Alexander Mintchev says:

    Hello Max!

    First of all, thanks for the very interesting paper!

    I have the following questions/comments:

    1.) I guess Papyrus must be using a (kind of) object-oriented, not relational, database, so that an object “can be stored as it is” (and so as to avoid the de-/serialization concerns and object/relational mapping). But then why are you mentioning tables? Tables suggest a relational database model. In a relational database, you can only store a stream format in a DB field of type BINARY; but then how would you search for an attribute of a given object in the DB if it is stored in a BINARY field?
    2.) In the concept of partitioned caching, is the unique node data owner somehow assigned to the object when the latter is persisted? Or is it only used while caching?

    Thanks a lot,
    Alexander

    • Alexander, thanks for the feedback. I am sorry if I caused some ambiguity, but I only mention tables because every database – also an OO one – consists of (mostly b-tree keyed) tables at its core. We use Oracle Berkeley DB for that.

      Yes, we do use a special stream format for the stored objects. Normally you cannot search for an object attribute in such storage; you can only pick an object based on a relationship pointer from another object. To be able to search such objects somewhat efficiently, certain attributes can be defined as key attributes, and the system automatically creates the necessary b-tree tables for fast searches. Therefore we call our storage OO-relational, as it uses relationship pointers between objects as well as attribute search tables.
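
      A rough illustration of the idea in plain Java (this is not Papyrus code): objects are stored as opaque streams, and only the attributes declared as key attributes get a sorted, b-tree-like index that can be searched; the attribute name and class layout are assumptions for the example.

        import java.util.HashMap;
        import java.util.HashSet;
        import java.util.Map;
        import java.util.NavigableMap;
        import java.util.Set;
        import java.util.TreeMap;

        // Illustration only, not Papyrus code: objects live as opaque streams in
        // the primary store; an attribute declared as a key attribute gets its own
        // sorted index so it can be searched without deserializing every object.
        public class KeyAttributeStore {
            private final Map<String, byte[]> objectsById = new HashMap<>();
            // the TreeMap stands in for the automatically created b-tree search table
            private final NavigableMap<String, Set<String>> customerNameIndex = new TreeMap<>();

            public void store(String objectId, byte[] objectStream, String customerName) {
                objectsById.put(objectId, objectStream);
                customerNameIndex.computeIfAbsent(customerName, k -> new HashSet<>()).add(objectId);
            }

            // fast lookup via the key-attribute index instead of scanning all objects
            public Set<String> findByCustomerName(String customerName) {
                return customerNameIndex.getOrDefault(customerName, Set.of());
            }
        }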

      The node that creates a persistent object is also its owner. To assign another one you simply have to move the object. Moving objects is complex, as all relationship pointers have to be realigned in a transaction-safe mode. Our storage was not defined this way just to cache more efficiently, but mostly to allow truly distributed applications. Each object ID contains as its first identifier the ID of the owning node. The second identifier is the ID of the parent node that created the object. The reason is that the object requires a pointer to its class definition, as we do not duplicate those in each OO stream. Additionally this enables class versioning, as a stored object can point to its class object, which in turn has a pointer to its latest version. Independently stored class definitions also enable inheritance.

      Partitioned caching comes as a bonus of such a solution, as storage nodes can be created at will and totally transparently. Our query language PQL can automatically spread a query across all servers that own objects of a certain type. The main problem with all that is that there are no people with such skills left within corporations to make proper use of this power.

      I hope that answers your questions, Max

  5. alexandermintchev says:

    Hello Max,

    Very interesting paper. If I understood you right, you use an object-relational database to avoid de-/serializing objects from/into DB tables, and a partitioned caching concept where each object is in the custody of a given server node, so that only this node is authorized to perform write (modify, delete) operations on the object. Communication between server nodes in the cluster is peer-to-peer, which means that all servers are “equal”, i.e. there are no “master” and “slave” nodes, is that right? As a consequence, there is no need for three tiers in your system, or at least you do not need a separate persistence layer, as objects are persisted as they are, and simultaneous access to a single object from different processes (nodes) is, by the design of the partitioned caching, not possible. Correct?

    I agree with you that the complexity of multi-tiered Java systems is at least one cause of their slow adoption. This is, for example, confirmed by the fact that whereas SAP ABAP-based systems enjoy wide implementation and adoption, the SAP NetWeaver Java application server is still weakly accepted. Large, custom SAP J2EE applications are very rare.

    Similarly to your Papyrus system, ABAP-based applications are much simpler than the multi-tiered J2EE applications and, although they use relational databases, they at least do not have a separate DB layer, and often they do not have a separate presentation application layer either.

    I am interested in your Papyrus system design, and I am curious about what happens when a given server node fails. Is the custodian node information part of the object attributes? Also, given you have an object-relational database, you cannot easily integrate your data with other relational DB vendors, am I right? What do customers say in this case? Aren’t they worried about the fact that they have to support different DB systems in their landscape simultaneously?

    Thank you,
    Best wishes: Alexander

    • Alexander, the proxy replication mechanism takes care of the caching for distributed reads. For write operations, the object owner node manages the locking and the dual-phase-commit transactions. There is also an optimistic merge function and a merge conflict resolution mechanism to avoid long locks, but that needs additional application functionality to resolve conflicts. There are no master/slave nodes except for hot standby. There is an optional node hierarchy, because the system has to be secure, but that is solely for system administration and for finding the object definitions in the repository.

      Your observations on the database part are correct. Customers do not, however, need to worry about the DB separately, because it is transparent. They only need to worry about node operation, and that might include DB recovery actions. There is no maintenance as with an SQL DB. Papyrus is a homogeneous system that works identically and transparently across Windows, Unix and Mac with binary compatibility of the storage objectspace. You can simply copy it from one machine to the other and restart the node kernel program.

      As I mentioned, we currently have hot standby and runtime database backup, but we are working on a full cluster mode where a number of nodes back each other up and recover automatically as well. It won’t even need RAID disks anymore. Our database has a fairly high I/O write overhead because of the persistent OO state information, but as we are moving fast towards solid-state disks, that will not be an issue with write speeds of 100 MB/s and more.

      Non-technical issues for a system like Papyrus:
      The main problem for us is that there are no skilled people left in the businesses who could use the power of Papyrus, and they have no management left who understands the business. They all want to buy standard applications with the processes built in. That in the end might kill large enterprises, as they can no longer compete efficiently. Whether market domination by pure financial power will last remains to be seen.
