The most important aspect affecting the performance of a system is its architecture. Systems often fail to perform as expected because of wrong architectural decisions. Liferay is a leading open source platform for developing high-performing portals. In this chapter, we will focus on the architecture of Liferay-Portal-based solutions. We will learn about various aspects that should be considered while defining the architecture of a Liferay-based solution. By the end of this chapter, we will have learned about:
The Liferay Portal reference architecture
The Deployment sizing approach
Documents and Media Library architecture options
Database architecture options
Architectural options for handling static resources
Caching architecture options
Search engine architecture options
Defining the architecture of a system from scratch requires an enormous amount of effort in researching, investigating, and making the right architectural decisions. We can reduce this effort by referring to a reference architecture for similar kinds of solutions, which also lets us incorporate a set of architectural best practices. In this section, we will talk about the reference architecture of a Liferay-Portal-based solution. This reference architecture can be used as a base for any Liferay-Portal-based portal solution. Of course, necessary changes have to be made to the reference architecture depending upon specific requirements. The rest of the chapter will help Liferay architects make the right architectural decisions for such changes.
Here is the reference architecture diagram of a Liferay-Portal-based solution:

As shown in the previous diagram, users of the portal will access the Portal using tablets, mobile devices, or through PC browsers. Liferay Portal 6.1 supports various devices, and we won't need any special component to render content for mobile devices. Liferay Portal can even detect specific devices and respond with device-specific content. Liferay also supports creating responsive web design using its UI framework called AlloyUI.
As shown in the reference architecture, every request will pass through the Firewall, which filters out insecure requests. All valid user requests will be passed to the Hardware Load Balancer. The hardware load balancer is a hardware appliance that distributes load between multiple web servers. It can also deal with the failure of web servers: in case of a failure of any web server, the hardware load balancer diverts traffic to the working web servers. There are a number of hardware load balancers available on the market. Some of the popular hardware load balancer vendors include F5, Cisco, Radware, CoyotePoint, and Barracuda.
The Web tier includes a series of Apache Web Servers. As shown in the reference architecture diagram, each Web Server is connected with each Application Server. The Web Server acts as a software load balancer for the Application Servers. Web servers can also serve static resources. The Apache Web Server connects with the Liferay Portal application server using the mod_jk, mod_proxy, or mod_proxy_ajp connectors. These are popular connectors available with the Apache Web Server.
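As an illustration, a minimal mod_proxy_ajp load-balancing configuration for the Apache Web Server might look like the following sketch. The balancer name, host names, ports, and route names are assumptions and have to match the actual Liferay/Tomcat nodes (the route values must correspond to the jvmRoute configured on each Tomcat instance); it also assumes mod_proxy, mod_proxy_ajp, and mod_proxy_balancer are loaded:

# Sketch of an httpd configuration balancing AJP traffic across two Liferay nodes
<Proxy balancer://liferaycluster>
    # Each BalancerMember points to the AJP connector of one Liferay/Tomcat node
    BalancerMember ajp://liferay-node1:8009 route=node1
    BalancerMember ajp://liferay-node2:8009 route=node2
    # Sticky sessions keep a user on the node that created the session
    ProxySet stickysession=JSESSIONID
</Proxy>

ProxyPass / balancer://liferaycluster/
ProxyPassReverse / balancer://liferaycluster/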
The Application tier includes one or more Liferay Portal application servers. Liferay Portal can be deployed on many different application servers; the reference architecture recommends using the most popular one, the Apache Tomcat server. Application servers are connected with the web servers using the AJP or HTTP protocol. As shown in the diagram, there is a communication link between the Application Servers: each Application Server is connected with the others to replicate session information, caches, and/or search indexes. Each Application Server is also connected to dedicated Database Servers and Active Directory Servers.
The Liferay Portal server connects to the Database Repository tier. For production systems, it is advisable to set up multiple database instances with replication. Such a setup ensures high availability of the Database Servers. Liferay Portal works with the majority of open source and proprietary databases. In our reference architecture, we will use MySQL, which is one of the most popular open source databases.
Liferay Portal comes with an embedded Apache Lucene search engine, which stores search indexes on a filesystem. As shown in the reference architecture diagram, each Application Server has its own search index repository in the Search Repository tier. The search engine repositories can be synchronized by the Liferay Portal server using the Cluster Link feature.
Liferay Portal comes with a media repository, which includes a document library, image gallery, and so on. Liferay Portal provides different options to store the media repository content. By default, Liferay stores the media repository content on a filesystem. It can be configured to store the media repository content in a database, a Java Content Repository (JCR), a CMIS-based repository, or Amazon S3. As shown in the reference architecture diagram, we have used a centralized filesystem to store the media repository content. To avoid issues related to concurrent access on a centralized filesystem, it is recommended to use a Storage Area Network (SAN) as the centralized filesystem to store the Media Library content.
Liferay comes with its own user repository, which it maintains in the database. But for production systems, it is recommended to integrate the user repository with an identity management system. The reference architecture uses the Active Directory server. Liferay Portal connects with the Active Directory Server using the LDAP protocol.
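LDAP integration is usually configured through the Control Panel, but the connection can also be pre-configured in portal-ext.properties. The following is a minimal sketch; the host, base DN, and bind credentials are placeholders for an actual Active Directory setup and will differ in a real environment:

ldap.auth.enabled=true
ldap.auth.required=false
ldap.import.enabled=true
ldap.base.provider.url=ldap://<Active Directory host>:389
ldap.base.dn=<Base DN, for example dc=example,dc=com>
ldap.security.principal=<Bind user DN>
ldap.security.credentials=<Bind user password>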
In the previous section, we learned about various tiers of the reference architecture. Let's understand how the reference architecture addresses architectural concerns.
As shown in the architecture diagram, horizontal scaling is used for both the Web tier and the Application tier. Most of the components in the architecture are decoupled, so if the user base grows, we can scale up by adding extra nodes. We can establish the linear scalability of the solution through a performance benchmarking exercise. This enables us to increase the capacity of the system by adding a known number of Liferay application servers, web servers, or database servers.
The reference architecture divides the load of the system across multiple tiers. Static resource requests can be served by the Web tier directly. The Web tier is load balanced using the Hardware Load Balancer, so the load on each web server is also controlled. Similarly, all application requests are served by the clustered Application Server tier. The Application Server connects with the Database tier, which is again clustered to ensure the load is distributed. The reference architecture thus ensures that the architecture of the solution is robust enough to deliver high performance.
The reference architecture ensures that the most important tiers of the solution are clustered and load balanced so that the system is highly available and fault tolerant. As shown in the diagram, the Web tier, Application tier, and Database tier are clustered, which means that if any node in these tiers goes down, the system will still respond to user requests.
The reference architecture places the Firewall in front of the Hardware Load Balancer, which ensures that security threats are filtered out. Depending upon the security needs, it is advisable to set up a firewall between each tier as well; for example, the Web tier can access the Application tier, but the opposite can be prevented. Depending upon the project needs, the architecture also supports configuring SSL-based access.
In the previous section, we learned about the Liferay Portal reference architecture. The reference architecture is generic in nature and can be used as a reference to define an architecture that is more specific to a project. One of the important activities in defining a specific architecture is sizing. We need to determine the number of Liferay Portal application servers and web servers required to meet performance expectations. At the beginning of the project, when the system is yet to be developed, it is impossible to size the architecture with 100 percent accuracy. Hence, the idea is to size the architecture based on previous benchmarks, and then review the sizing during the load testing phase when the system is ready. Liferay Inc. publishes a performance benchmark for every major Liferay Portal release. It is a best practice to use this benchmark as a reference to size the deployment architecture of the solution. In this section, we will learn how to size the deployment architecture of a Liferay-Portal-based solution based on Liferay's performance benchmark whitepaper.
Note
This section refers to the Liferay Portal 6.1 performance whitepaper published by Liferay Inc. This whitepaper can be accessed through the following URL:
The first step of the sizing activity is to capture some basic non-functional requirements. The following table lists the questions to ask; the answers to these questions will act as parameters for the sizing calculations.
| No. | The requirement question | Mandatory? | Details |
|---|---|---|---|
| 1 | How many concurrent users will log in at the same time? | Yes | Login is the most resource-consuming use case in Liferay Portal. It is very important to know the answer to this question. |
| 2 | What is the number of concurrent users accessing the Message Board functionality, including login? | No | The Liferay performance benchmark report publishes the result of this scenario. If the project requirement matches the scenario, we can use this to size the deployment architecture more accurately. |
| 3 | What is the number of concurrent users accessing the Blogging functionality, including login? | No | If such a scenario is applicable to our requirement, we can derive a more accurate deployment architecture. |
| 4 | What is the number of concurrent users accessing the document management functionality, including login? | No | If such a scenario exists in the project requirement, we can use this parameter to size the deployment architecture more accurately. |
Once we have the answers to these questions, the next step is to compare them with the performance benchmark results from the whitepaper and derive the exact number of application servers we will need. The whitepaper establishes linear scalability based on various tests. Based on the report, we can establish the exact number of application servers needed to handle a specific number of concurrent users. Before we jump into the calculation, let us summarize the performance benchmark report.
In the performance benchmark test, Liferay Inc. used the following hardware configurations:
| Server type | Configuration |
|---|---|
| Apache Web Server | 1 x Intel Core 2 Duo E6405 2.13 GHz CPU, 2 MB L2 cache (2 cores in total), 4 GB memory, 1 x 146 GB 7.2k RPM IDE |
| Liferay Portal Application Server | 2 x Intel Core 2 Quad X5677 3.46 GHz CPU, 12 MB L2 cache (8 cores and 16 threads), 16 GB memory, 2 x 146 GB 10k RPM SCSI |
| Database Server | 2 x Intel Core 2 Quad X5677 3.46 GHz CPU, 12 MB L2 cache (8 cores and 16 threads), 16 GB memory, 4 x 146 GB 15k RPM SCSI |
In the performance benchmark test, Liferay Inc. concluded the following:
| No. | Scenario | Result summary |
|---|---|---|
| 1 | Isolated logins: During this test, a number of concurrent users tried to log in at the same time. Based on this scenario, the breaking point of the Liferay Portal application server was identified. No customizations were considered; the Liferay login scenario with the out-of-the-box home page was tested. | According to the results, one Liferay Portal application server was able to handle 27,000 concurrent logins at the same time. Beyond that point, increasing the number of concurrent login requests causes the application to become loaded and the response time increases. |
| 2 | Login with Legacy Simulator: In this scenario, a two-second delay was included in one of the home page portlets. As we build our application on top of Liferay Portal, there is normally some additional processing time after login for custom home page portlets; the two-second delay simulates this. This is the realistic scenario for estimating the number of concurrent logins a server can handle. | The results showed that the performance of the system degrades after 6,300 concurrent login requests. That means one application server should handle at most 6,300 concurrent login requests. If the expected number of concurrent login requests is more than 6,300 but less than 12,600, one more application server should be added to the cluster. |
| 3 | Message Board: In this scenario, a number of concurrent users log in and perform various transactions on the Message Board portlet. | One application server was stable up to 5,800 concurrent requests; after that, the system performance started to degrade. So in this scenario, one application server was able to handle 5,800 concurrent requests smoothly. |
| 4 | Blogging: In this scenario, a number of concurrent users performed blogging transactions, such as viewing the blog list, viewing a blog entry, posting a new blog, and so on. | The results showed that one application server was able to handle 6,000 concurrent requests smoothly. |
| 5 | Document management: In this scenario, a number of concurrent users accessed document management functionalities. | The results showed that the system was able to handle 5,400 concurrent requests smoothly with one application server. |
We learned about the reference hardware and benchmark results. Now, let's size the deployment architecture for a sample project.
The example Portal solution should be able to handle 15,000 concurrent requests. This is the only requirement that we received from the customer, and we need to size our initial deployment architecture based on that.
Login is the most resource-consuming operation in a Liferay-based portal. The login use case covers authentication as well as rendering of the home page, which is displayed after authentication. We have not received any use case-specific performance needs, so for sizing we can refer to the benchmark results of the Login with Legacy Simulator scenario. According to the results of this benchmark test, one Liferay Portal application server can handle 6,300 concurrent login requests. So to handle 15,000 concurrent login requests, we will need three Liferay Portal application servers. Generally, the load on the web servers is less than 50 percent of the load on the application servers, so we can derive the number of web servers as half the number of application servers. In our case, we will need two web servers (three application servers divided by two, rounded up). For the database server, as per our reference architecture, it is recommended to have a master-slave database setup. This calculation is valid only for hardware configurations similar to those used in the benchmark performance test. Hence, we need to use the same hardware configuration for the application servers, web servers, and database servers.
Documents and Media Library is one of the most important functionalities of Liferay Portal. It allows users to manage documents, images, videos, and other types of files. This functionality is designed in such a way that metadata is stored in the database, while the actual files are stored in pluggable repository stores. Liferay Portal ships with various built-in repository stores. In this section, we will learn about these repository stores and the best practices associated with them.
The File System store and the Advanced File System store are similar, with some exceptions. Both store files on the filesystem. The Advanced File System store additionally distributes files across a nested folder structure to overcome filesystem limitations on the number of files in a single directory. The File System store is the default repository store used by Liferay Portal. Compared to the other repository stores, both of these stores give better performance.
Liferay does not handle file locking when we use either of these two stores. Hence, in production environments, they must be used with a Storage Area Network (SAN) that provides file locking capabilities. Most SAN providers support file locking, but this has to be verified before using them.
To get the best performance results, it is recommended to use the Advanced File System store with SAN. In our reference architecture, we have used the same approach for the Media Library repository. Liferay can be configured to use the Advanced File System store by adding the following properties in portal-ext.properties:
dl.store.impl=com.liferay.portlet.documentlibrary.store.AdvancedFileSystemStore
dl.store.file.system.root=<Location of the SAN directory>
The Database store simply stores files in the Liferay database. Concurrent access to files is automatically managed because the files are stored in the database. From a performance point of view, this store gives poorer results than the File System and Advanced File System stores. Also, if the Portal is expected to make heavy use of the Media Library functionality, this repository store will affect the overall performance of the Portal, as the load on the database will increase because of file management. It is not recommended to use this store unless the use of the Media Library is limited. Liferay Portal can be configured to use the Database store by adding the following property in portal-ext.properties:
dl.store.impl=com.liferay.portlet.documentlibrary.store.DBStore
Java Content Repository (JCR) is the result of the standardization of content repositories used across content management systems. It follows the JSR-170 standard specification. Liferay Portal provides a JCR store, which can be configured for the Media Library. The JCR store internally uses Apache Jackrabbit, which is an implementation of JSR-170. By default, Apache Jackrabbit also stores files on a filesystem, but it can be configured to store Media Library files in a database. If we plan to use JCR in a production environment, it must be configured to store files in the database, because storing them on a shared filesystem can lead to file locking issues. The JCR store is a good option for production environments where it is not possible to use the Advanced File System store with SAN. To configure Liferay to use the JCR store, we need to add the following property to portal-ext.properties:
dl.store.impl=com.liferay.portlet.documentlibrary.store.JCRStore
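The location of the embedded Jackrabbit repository and its configuration file can also be controlled from portal-ext.properties. The following is a sketch of the related properties with their usual defaults; note that switching Jackrabbit to database persistence is done by editing the persistence manager section of its repository.xml file, not through these properties:

# Root directory for the embedded Jackrabbit repository
jcr.jackrabbit.repository.root=${liferay.home}/data/jackrabbit
# Jackrabbit configuration file; database persistence is configured inside this file
jcr.jackrabbit.config.file.path=${jcr.jackrabbit.repository.root}/repository.xml
jcr.jackrabbit.repository.home=${jcr.jackrabbit.repository.root}/home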
Content Management Interoperability Services (CMIS) is an open standard that defines services for controlling document management repositories. It was created to standardize content management services across multiple platforms, and it is now used by most content management systems to make them interoperable. It exposes web services and RESTful services that any application can access. Liferay provides the CMIS store, which can connect to any CMIS-compatible content repository. The metadata of the Media Library content will be stored in Liferay, and the actual files will be stored in the external CMIS-compatible repository. This repository store can be used when we need to integrate Liferay Portal with external repositories. For example, Alfresco is one of the leading open source content management systems. If we have a requirement to integrate the Alfresco content repository with Liferay, we can use the CMIS store, which will internally connect with Alfresco using CMIS services. To configure Liferay with a CMIS repository, we need to add the following properties to portal-ext.properties:
dl.store.impl=com.liferay.portlet.documentlibrary.store.CMISStore
dl.store.cmis.credentials.username=<User name to be used for CMIS authentication>
dl.store.cmis.credentials.password=<Password to be used for CMIS authentication>
dl.store.cmis.repository.url=<URL of the CMIS repository>
dl.store.cmis.system.root.dir=Liferay Home
Nowadays, companies are moving their infrastructure to the cloud. The cloud provides great benefits in procuring and managing hardware infrastructure, and it allows us to increase or decrease infrastructure capacity quickly. One of the most popular cloud providers is Amazon AWS. Amazon offers a cloud-based storage service called Amazon Simple Storage Service (Amazon S3). The Liferay Media Library can be configured to store Media Library files on Amazon S3. This is a good option when the production environment is deployed on the Amazon cloud infrastructure. To configure Liferay to use Amazon S3 for the Media Library store, we need to add the following properties to portal-ext.properties:
dl.store.impl=com.liferay.portlet.documentlibrary.store.S3Store
dl.store.s3.access.key=<Amazon S3 access key ID>
dl.store.s3.secret.key=<Amazon S3 encrypted secret access key>
dl.store.s3.bucket.name=<Amazon S3 bucket name>
Liferay Portal stores its data in a database. It is possible to store custom portlet data in a separate database, but for the core features of Liferay Portal, we need to connect Liferay to a database. In our reference architecture, we suggested using a MySQL cluster for this purpose. In this section, we will talk about various deployment strategies for the database server.
For transaction-centric applications, it is a good idea to separate the read and write databases. In this setup, all write transactions are executed on the write database and all read transactions are executed on the read-only database. Using a database replication mechanism, data from the write database is replicated to the read database. This lets us optimize the write database for heavy write transactions and the read database for heavy read transactions. Liferay Portal supports configuring read and write databases through portal-ext.properties. Here are the high-level steps to configure the read/write databases through portal-ext.properties:
In portal-ext.properties, append the following value to the end of the existing spring.configs property. This configuration change loads the following Spring configuration file during startup, which in turn enables the rest of the read/write database properties:
spring.configs=<Existing config files>,META-INF/dynamic-data-source-spring.xml
Add the following properties to portal-ext.properties to configure the read database:
jdbc.read.driverClassName=<Read database driver class name>
jdbc.read.url=<Read database JDBC URL>
jdbc.read.username=<Read database user name>
jdbc.read.password=<Read database password>
Add the following properties to portal-ext.properties to configure the write database:
jdbc.write.driverClassName=<Write database driver class name>
jdbc.write.url=<Write database JDBC URL>
jdbc.write.username=<Write database user name>
jdbc.write.password=<Write database password>
Database sharding is an architectural solution that separates the data of the same tables into multiple database instances. Liferay supports this feature. Liferay Portal can host multiple portals within the same portal server using portal instances (companies). By default, Liferay Portal stores the data of all instances in the same database. If we host multiple portals using portal instances, the same tables will contain data from multiple instances, and those tables will grow rapidly. At some point, this will affect performance, because for any request the system internally needs to scan the data of all instances. We can configure multiple database shards (separate databases) and specify how shards should be chosen. Depending on the shard selection algorithm, each portal instance will be mapped to a specific shard database. With this architectural approach, data from multiple instances is distributed across multiple databases. By default, Liferay supports configuring three shards, but we can add more shards by changing configuration files, as shown in the sketch after the following steps. We can enable database sharding by changing portal-ext.properties. Here are the high-level steps to configure database sharding:
Append the following property in portal-ext.properties to enable database sharding:
spring.configs=<Existing config files>,META-INF/shard-data-source-spring.xml
Configure the database shards by adding the following properties in portal-ext.properties:
#Shard 1
jdbc.default.driverClassName=<Database driver class name for shard 1>
jdbc.default.url=<Database JDBC URL for shard 1>
jdbc.default.username=<Database user name for shard 1>
jdbc.default.password=<Database password for shard 1>
#Shard 2
jdbc.one.driverClassName=<Database driver class name for shard 2>
jdbc.one.url=<Database JDBC URL for shard 2>
jdbc.one.username=<Database user name for shard 2>
jdbc.one.password=<Database password for shard 2>
#Shard 3
jdbc.two.driverClassName=<Database driver class name for shard 3>
jdbc.two.url=<Database JDBC URL for shard 3>
jdbc.two.username=<Database user name for shard 3>
jdbc.two.password=<Database password for shard 3>
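The prefixes default, one, and two correspond to the three shard names that ship pre-configured with Liferay. As a sketch, the list of available shards is controlled by the following property; adding a name to this list also requires defining a matching data source in the shard Spring configuration file, so the exact steps should be checked against the Liferay documentation for the version in use:

# Default shard names shipped with Liferay Portal; extend this list to add more shards
shard.available.names=default,one,two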
By default, shards are assigned to each portal instance using the round robin algorithm. Liferay also supports a manual selection algorithm, which allows selecting a specific shard for each portal instance through the Control Panel. To enable the manual shard selection algorithm, we need to add the following property in portal-ext.properties:
shard.selector=com.liferay.portal.dao.shard.ManualShardSelector
In any dynamic web application, the majority of web requests are for static resources, such as JavaScript, CSS, images, or videos. The same rule applies to Liferay-Portal-based solutions. Hence, from an architectural point of view, it is very important how we serve these static resources. In a basic Liferay Portal setup, static resources are served by the Liferay Portal application server. In this section, we will learn about other options for serving static resources.
A Content Delivery Network (CDN) is a large network of servers deployed across the world to serve static resources. The same static resources are stored on multiple servers across the world, and when they are requested, they are retrieved from a server near the user's location. This reduces response times drastically. Liferay Portal supports integration with CDNs. In Liferay Portal, the majority of static resources are part of themes. Liferay provides a way to rewrite the URLs of static resources within themes to the URL of the same resource on the CDN. By using this feature, we can also reduce the load on the Liferay Portal application server by reducing the number of requests it handles. To configure Liferay with a CDN, we need to perform the following steps:
Upload all the static resources from the theme to the CDN. Most CDN providers provide a UI for this; refer to the CDN provider's documentation.
Add the following properties to the portal-ext.properties file:
cdn.host.http=<CDN host name used to serve static resources for HTTP requests>
cdn.host.https=<CDN host name used to serve static resources for HTTPS requests>
This solution is highly recommended when the intended users are spread across the globe.
If we serve static resources directly from the web server, we can reduce the number of requests coming to the Liferay Portal application server. Static resources can also be served faster by the web server than by the application server. All portal requests pass through the web server, so it is easy to filter static resource requests and serve them directly from there. To implement this option, we do not need to change any configuration in the Liferay Portal application. We need to copy all static resources from all the Liferay plugins to the web server's public directory and change the web server configuration so that all static resource requests are served directly from that directory. With this approach, we need to ensure that we copy the static resources to the web server every time we deploy a new version. This option can be used along with a CDN to serve the static resources of portlets.
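As an illustration, the following Apache configuration sketch serves copied theme resources directly from the web server's filesystem while everything else is forwarded to the Liferay cluster. The alias path, directory, and balancer name are assumptions that have to match where the resources were copied and how the proxy was set up (the exclusion line must appear before the generic ProxyPass); the expiry directives assume mod_expires is loaded:

# Serve copied theme and portlet resources directly from the web server
Alias /static-resources /var/www/liferay-static
<Directory /var/www/liferay-static>
    Order allow,deny
    Allow from all
    # Let browsers cache static resources for a week (requires mod_expires)
    ExpiresActive On
    ExpiresDefault "access plus 7 days"
</Directory>

# Exclude the static path so it is not proxied to the Liferay application servers
ProxyPass /static-resources !
ProxyPass / balancer://liferaycluster/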
Caching is a very important aspect for any system to achieve high performance. Liferay Portal provides integration with different caching frameworks. Liferay Portal, by default, caches entity records, content, and so on. In this section, we will learn about various caching options available with Liferay Portal.
Ehcache is a very powerful distributed caching framework. Liferay Portal comes with Ehcache integration by default. The default configuration caches objects locally on each instance. This means that in a clustered environment, each node has its own cache, so the cache must be replicated across all the nodes. There are different options available to replicate the cache across multiple nodes. Here are the options available to replicate Ehcache across the cluster.
The Ehcache framework supports cache replication using RMI. In this scenario, nodes discover each other at startup using IP multicast and then connect to each other using RMI. All cache updates are replicated to the other nodes over RMI. This is a kind of point-to-point connection between all the nodes in the cluster. The following diagram explains how each node connects with the others to replicate the cache:

As shown in the preceding diagram, we have four Liferay Portal nodes in the cluster. Each node is connected with every other node, so in total around twelve RMI links are created to replicate the cache. This option uses a thread-per-cache replication algorithm and hence creates a large number of threads for replicating the cache over the cluster. Because of this, this option adds a lot of overhead and affects the overall performance of the system.
This option is available in the enterprise edition of Liferay Portal. In this approach, Liferay Portal creates a limited number of dispatcher threads that are responsible for replicating the cache across the cluster. Because all replication requests pass through a single place before they are distributed over the network, unnecessary requests can be removed. For example, if the same cached object is changed by multiple nodes, instead of sending two invalidation requests to all the nodes, only one request is sent. This reduces network traffic. The following architectural diagram explains this feature in detail:

As shown in the preceding diagram, all four Liferay Portal nodes are connected to each other using Cluster Link. Internally, this feature uses UDP multicast to establish the connection between cluster nodes. A small group of threads is created to distribute cache update events to all the connected nodes. It is recommended to use this option for Ehcache replication.
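A minimal sketch of the portal-ext.properties entries involved is shown below. The first property turns on the Cluster Link channel; the second switches Ehcache replication from RMI to Cluster Link and is available in Liferay Portal EE, so it should be verified against the edition and version in use:

# Enable the Cluster Link channel (uses UDP multicast by default)
cluster.link.enabled=true
# Replicate Ehcache over Cluster Link instead of RMI (Liferay Portal EE feature)
ehcache.cluster.link.replication.enabled=true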
In the previous section, we talked about Liferay's Ehcache integration. In order to use Ehcache in a distributed environment, we need to replicate the cache across the cluster. Another approach is to use a centralized caching server: all nodes connect to the centralized cache server to store and retrieve cached objects, so we do not need to worry about cache replication. Terracotta is one of the leading products that provides this solution, and Liferay Portal supports integration with it. If a portal is expected to have a large number of cached objects and frequent cache changes, it is recommended to go with this approach. Terracotta also provides solutions for storing web sessions and Quartz jobs; by using Terracotta, we can even avoid session replication and replication of Quartz job data. The following diagram explains how Terracotta fits into the Liferay Portal architecture:

As shown in the preceding diagram, when we use Terracotta, we do not need any communication between individual Liferay Portal application nodes. Each node communicates directly with Terracotta to store and retrieve cached objects, sessions, and Quartz data. It is recommended to use this architectural approach if the portal is going to have a huge number of cached objects. This approach gives the best performance by eliminating replication overhead.
So far we have talked about caching objects in the Application tier. But in many situations, it is even possible to cache whole web pages and deliver them directly from the cache. This option can be used for content that does not change frequently. This approach can drastically reduce the load on the web server, application server, and database server, and also improve the overall response time. Such caching tools are also called web application accelerators. Varnish is one of the popular open source web application accelerators.
The following architectural diagram explains where Varnish can fit in our reference architecture:

As shown in the preceding diagram, the Varnish server runs in front of the web servers. The Hardware Load Balancer sends all requests to the Varnish server. Based on its configuration, the Varnish server decides whether a request should be served from the cache or sent on to the web server. It also provides a way to clear the cache. Depending upon the hardware configuration of the web server, it is also possible to run the Varnish server on the web server itself. This architectural option suits portals that serve mostly static content, such as news portals and product catalogue portals.
Search is an essential feature in every portal application, and Liferay Portal provides search functionality out of the box. Liferay Portal includes a search framework that can be integrated with external search engines. In this section, we will look at the various search integration options available with Liferay Portal.
By default, Liferay Portal uses the embedded Apache Lucene search engine. Apache Lucene is the leading open source search engine available on the market. By default, Liferay Portal's search API connects with the local embedded Lucene search engine, which stores search indexes on the local filesystem. When we use Lucene in a clustered environment, we need to make sure the indexes are replicated across the cluster. There are different approaches to make the same search indexes available to all Liferay Portal nodes.
One option is to configure Lucene to store indexes in a centralized network location so that all the Liferay Portal nodes refer to the same version of the indexes. Liferay provides a way to configure the index location. This approach is recommended only if we have SAN installed and the SAN provider handles file locking issues. Because indexes are accessed and changed very often, if the SAN cannot handle file locking, we will end up with problems in the search functionality. This option gives the best performance. To configure the location of the index directory, we need to add the following property in portal-ext.properties:
lucene.dir=<SAN Lucene index location>
We have already learned about the Cluster Link feature of Liferay Portal, which replicates Ehcache. Cluster Link can also replicate Lucene indexes across the Liferay Portal nodes. Cluster Link connects all the Liferay Portal nodes using UDP multicast. When Cluster Link is enabled, the Liferay search engine API raises an event on Cluster Link to replicate specific index changes across the cluster, and the Cluster Link dispatcher threads distribute the index changes to the other nodes. This is a very powerful feature that does not require specialized hardware, but it adds overhead on the network and the Liferay Portal servers. This option is recommended if we cannot go with centralized index storage on SAN.
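A minimal sketch of the portal-ext.properties entries for this option is shown below; both properties should be set on every node in the cluster and verified against the Liferay version in use:

# Enable Cluster Link so that index write events are broadcast to the other nodes
cluster.link.enabled=true
# Replicate Lucene index writes across the cluster
lucene.replicate.write=true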
Apache Solr is one of the most powerful open source search engines. It is based on the Apache Lucene search engine; in simple words, it wraps the Lucene search engine and provides access to the Lucene APIs through web services. Unlike embedded Lucene, Solr runs as a separate web application. Liferay provides integration with Apache Solr as well. To integrate Apache Solr with Liferay, we need to install the Solr web plugin, and we can configure the URL of the Solr server by modifying the configuration of that plugin. It is recommended to use Solr with Liferay Portal when the Portal is expected to write a large amount of data to the search indexes; in such situations, embedded Lucene adds a lot of overhead due to index replication over the cluster. As Apache Solr runs as a separate web application, it also makes the Portal architecture more scalable. The following diagram explains the basic Liferay-Solr integration:

As shown in the preceding diagram, Apache Solr is installed on a separate server. The Apache Solr server internally stores indexes on the filesystem. All Liferay Portal servers are connected with the Apache Solr server. Every search request and index write request will be sent to the Apache Solr server.
In the preceding architecture, we are using a single Solr server for both read and write operations. Internally, the Solr server performs concurrent read and write operations on the same index storage. If the Portal application is expected to perform heavy write and search operations on the Solr server, the architecture explained earlier will not give good performance. In such situations, it is recommended to use a master-slave Solr setup, in which one master and one or more slave Solr servers are configured to work together. The master server handles all write operations and the slave servers handle all read and search operations. Here is the diagram explaining the master-slave Solr setup:

As shown in the preceding diagram, we have one Solr master server and one Solr slave server. The Solr master server is configured to automatically replicate indexes to the slave server. Each Liferay Portal application server is connected to both the master and slave servers. The Liferay Solr web plugin provides a way to configure separate Solr servers for read and write operations. To scale the search functionality further, we can also configure a separate slave server for each Liferay Portal node, which reduces the load on each slave server by limiting the search requests it receives.
We have covered most of the important architectural aspects that should be considered while designing a Liferay-based portal. We learned about the reference architecture of Liferay-Portal-based solutions and the deployment sizing approach. We also learned about various architectural options for managing the Documents and Media Library, the database, and static content delivery. We also talked about the caching options available to boost performance. In the last section, we learned about the various architectural options available for the search functionality.
Now let's get ready to learn about load balancing and clustering in detail.