
How-To Tutorials - Data


Using Redis in a hostile environment (Advanced)

Packt
27 Dec 2013
12 min read
(For more resources related to this topic, see here.)

How to do it...

Anyone who can read the files that Redis uses to persist your dataset has a full copy of all your data. Worse, anyone who can write to those files can, with a minimal amount of effort and some patience, change the data that your Redis server contains. Both of these things are probably not what you want, and thankfully they aren't particularly difficult to prevent. All you have to do is prevent anyone but the user running your Redis server from accessing the data directory your Redis instance is using. The simplest way to achieve this is by changing the owner of the directory to the user who runs your Redis server, and then disallowing all privileges to everyone else, like this:

1. Determine the user under whom you are running your Redis instance. You can typically find this out by running ps caux | grep redis-server. The name in the first column is the user under which Redis is running.
2. Determine the directory in which Redis is storing its files. If you don't already know this, you can ask Redis by running CONFIG GET dir from within redis-cli.
3. Ensure that the user running your Redis instance owns its data directory:

    chown <redisuser> /path/to/redis/datadir

4. Restrict permissions on the data directory so that only the owner can access it at all:

    chmod 0700 /path/to/redis/datadir

It is important that you protect the Redis data directory, and not individual data files, because Redis regularly rewrites those data files, and the permissions you choose won't necessarily be preserved on the next rewrite. It is also good practice to restrict access to your redis.conf file, because in some cases it can contain sensitive data. This is simply achieved:

    chmod 0600 /path/to/redis.conf

If you run your Redis-using applications on a server which is shared with other people, your Redis instance is at pretty serious risk of abuse. The most common way of connecting to Redis is via TCP, which can only limit access based on the address connecting to it. On a shared server, that address is shared amongst everyone using it, so anyone else on the same server as you can connect to your Redis. Not cool!

If, however, the programs that need to access your Redis server are on the same machine as the Redis server, there is another, more secure, method of connection called Unix sockets. A Unix socket looks like a file on disk, and you can control its permissions just like a file, but Redis can listen on it (and clients can connect to it) in a very similar way to a TCP socket. Enabling Redis to listen on a Unix socket is fairly straightforward:

1. Set the port parameter to 0 in your Redis configuration file. This will tell Redis to not listen on a TCP socket. This is very important to prevent miscreants from still being able to connect to your Redis server while you're happily using a Unix socket.
2. Set the unixsocket parameter in your Redis configuration file to a fully-qualified filename where you want the socket to exist. If your Redis server runs as the same user as your client programs (which is common in shared-hosting situations), I recommend making the name of the file redis.sock, in the same directory as your Redis dataset. So, if you keep your Redis data in /home/joe/redis, set unixsocket to /home/joe/redis/redis.sock.
3. Set the unixsocketperm parameter in your Redis configuration file to 600, or a more relaxed permission set if you know what you're doing. Again, this assumes that your Redis server and Redis-using programs are running as the same user. If they're not, you'll probably need a dedicated group, and things get a lot more complicated, beyond the scope of what can be covered in this guide.
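Taken together, the relevant lines of the configuration file for a Unix-socket-only setup look something like the following minimal sketch (the /home/joe/redis path is just the example used above; adjust it to your own data directory):

    # Disable the TCP listener entirely
    port 0
    # Listen on a Unix socket stored next to the dataset instead
    unixsocket /home/joe/redis/redis.sock
    # Only the socket's owner may connect through it
    unixsocketperm 600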
Once you've changed those configuration parameters and restarted Redis, you should find that the file you specified for unixsocket has magically appeared, and you can no longer connect to Redis using TCP. All that remains now is to configure your Redis-using programs to connect using the Unix socket, which is something you should find out how to do in the manual for your particular Redis client library or application.

Configuring Redis to use Unix sockets is all well and good when it's practical, but what if you need to connect to Redis over a network? In that case, you'll need to let Redis listen on a TCP socket, but you should at least limit the computers that can connect to it with a suitable firewall configuration. While the properly paranoid systems administrator runs their systems with a default-deny firewalling policy, not everyone shares this philosophy. However, given that by default anyone who can connect to your Redis server can do anything they want with it, you should definitely configure a firewall on your Redis servers to limit incoming TCP connections to those coming from machines that have a legitimate need to talk to your Redis server. While it won't protect you from all attacks, it will cut down significantly on the attack surface, which is an important part of a defense-in-depth security strategy.

Unfortunately, it is hard to give precise commands to configure a firewall ruleset, because there are so many firewall management tools in common use on systems today. In the interest of addressing the greatest common factor, though, I'll provide a set of Linux iptables rules, which should be translatable to whatever means of managing your firewall you use (whether it be an iptables wrapper of some sort on Linux, or a pf-based system on a BSD). In all of the following commands, replace the word <port> with the TCP port that your Redis server listens on. Also, note that these commands will temporarily stop all traffic to your Redis instance, so you'll want to avoid doing this on a live server. Setting up your firewall in an init script is the best course of action.

1. Insert a rule that will drop all traffic to your Redis server port by default:

    iptables -I INPUT -p tcp --dport <port> -j DROP

2. For each IP address you want to allow to connect, run these two commands to let the traffic in:

    iptables -I INPUT -p tcp --dport <port> -s <clientIP> -j ACCEPT
    iptables -I OUTPUT -p tcp --sport <port> -d <clientIP> -j ACCEPT
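If you have more than a couple of client machines, it can be convenient to wrap those rules in a small script. The following is only a sketch built from the rules above; the port number and client addresses are placeholders that you would replace with your own values:

    #!/bin/sh
    # Placeholder values: adjust these to your environment
    REDIS_PORT=6379
    CLIENTS="192.0.2.10 192.0.2.11"

    # Drop everything aimed at the Redis port by default
    iptables -I INPUT -p tcp --dport "$REDIS_PORT" -j DROP

    # Then punch holes for each legitimate client
    # (-I inserts at the top, so these rules end up before the DROP rule)
    for ip in $CLIENTS; do
        iptables -I INPUT -p tcp --dport "$REDIS_PORT" -s "$ip" -j ACCEPT
        iptables -I OUTPUT -p tcp --sport "$REDIS_PORT" -d "$ip" -j ACCEPT
    done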
A firewall is great, but sometimes you can't trust everyone with access to a machine that needs to talk to your Redis instance. In that case, you can use authentication to provide a limited amount of protection against miscreants:

1. Select a very strong password. Redis is not hardened against repeated password guessing, so you want to make this very long and very random. If you make the password too short, an attacker can just write a program that tries every possible password very quickly and guess your password that way. Not cool! Thankfully, since humans should rarely be typing this password, it can be a complete jumble, and very long. I like the command pwgen -sy 32 1 for all my "generating very strong password" needs.
2. Configure all your clients to authenticate against the server, by sending the following command when they first connect to the server:

    AUTH <password>

3. Edit your Redis configuration file to include a line like this:

    requirepass "\\:d!&!:Y<p'TXBI0\"ys96rfH]lxaA7|E"

   If your selected password contains any double quotes, you'll need to escape them with a backslash (so " becomes \"), as I've done in the preceding example. You'll also need to double any actual backslashes (so \ becomes \\), again as I've done in the password of the preceding example.
4. Let the configuration changes take effect by restarting Redis. The authentication password cannot be changed at runtime.
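To sanity-check the setup, you can try talking to the server before and after authenticating. The exact prompt and error text vary between Redis versions, so treat this transcript as an illustrative sketch rather than literal output, and substitute your own password:

    $ redis-cli
    redis 127.0.0.1:6379> GET somekey
    (error) ERR operation not permitted
    redis 127.0.0.1:6379> AUTH your-very-long-random-password
    OK
    redis 127.0.0.1:6379> GET somekey
    (nil)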
If you don't need certain commands, or want to limit the use of certain commands to a subset of clients, you can use the rename-command configuration parameter. Like firewalling, restricting or disabling commands reduces your attack surface, but is not a panacea.

The simplest solution to the risk of a dangerous command is to disable it. For example, if you want to stop anyone from accidentally (or deliberately) nuking all the data in your Redis server with a single command, you might decide to disable the FLUSHDB and FLUSHALL commands, by putting the following in your Redis config file:

    rename-command FLUSHDB ""
    rename-command FLUSHALL ""

This doesn't stop someone from enumerating all the keys in your dataset with KEYS * and then deleting them all one by one, but it does raise the bar somewhat. If you never wanted to delete keys (but, say, only let them expire) you could disable the DEL command; although all that would probably do is encourage the wily cracker to enumerate all your keys and run PEXPIRE <key> 1 over them. Arms races are a terrible thing...

While disabling commands entirely is great when it can be done, you sometimes need a particular command but you'd prefer not to give access to it to absolutely everyone; this applies to commands that can cause serious problems if misused, such as CONFIG. For those cases, you can rename the command to something hard to guess, as shown in the following line:

    rename-command CONFIG somegiantstringnobodywouldguess

It's important not to make the new name of the command something easy to guess. As with the AUTH command, which we discussed previously, someone who wanted to do bad things could easily write a program to repeatedly guess what you've renamed your commands to.

For any environment in which you can't trust the network (which these days is pretty much everywhere, thanks to the NSA and the Cloud), it is important to consider the possibility of someone watching all your data as it goes over the wire. There's little point configuring authentication, or renaming commands, if an attacker can watch all your data and commands flow back and forth. The least-worst option we have for generically securing network traffic from eavesdropping is still the Secure Sockets Layer (SSL). Redis doesn't support SSL natively; however, through the magic of the stunnel program, we can create a secure tunnel between Redis clients and servers. The setup we will build will look like the following diagram:

In order to set this up, you'll need to do the following:

1. In your redis.conf, ensure that Redis is only listening on 127.0.0.1, by setting the bind parameter:

    bind 127.0.0.1

2. Create a private key and certificate, which stunnel will use to secure the network communications. First, create a private key and a certificate request, by running:

    openssl req -out /etc/ssl/redis.csr -keyout /etc/ssl/redis.key -nodes -newkey rsa:2048

   This will ask you all sorts of questions, which you can answer with whatever you like. Then create the self-signed certificate itself, by running:

    openssl x509 -req -days 3650 -signkey /etc/ssl/redis.key -in /etc/ssl/redis.csr -out /etc/ssl/redis.crt

   Finally, stunnel expects to find the private key and the certificate in the same file, so we'll concatenate the two together into one file:

    cat /etc/ssl/redis.key /etc/ssl/redis.crt >/etc/ssl/redis.pem

3. Now that we've got our SSL keys, we can start stunnel on the server side, configuring it to listen for SSL connections and forward them to our local Redis server:

    stunnel -d 46379 -r 6379 -p /etc/ssl/redis.pem

   If your local Redis instance isn't listening on port 6379, or if you'd like to change the public port that stunnel listens on, you can, of course, adjust the preceding command line to suit. Also, don't forget to open up your firewall for the port you're listening on! Once you run the preceding command, you should be returned to a command line pretty quickly, because stunnel runs in the background. If you examine your listening ports with netstat -ltn, you should find that port 46379 is listening. If that's the case, you're done configuring the server.

On the client(s), the process is somewhat simpler, because you don't have to create a whole new key pair. However, you do need the certificate from the server, because you want to be able to verify that you're connecting to the right SSL-enabled service. There's little point in using SSL if an attacker can just set up a fake SSL service and trick you into connecting to it. To set up the client, do the following:

1. Copy /etc/ssl/redis.crt from the server to the same location on the client.
2. Start stunnel on the client, as shown in the following command, replacing 192.0.2.42 with the IP address of your Redis server:

    stunnel -c -v 3 -A /etc/ssl/redis.crt -d 127.0.0.1:56379 -r 192.0.2.42:46379

3. Verify that stunnel is listening correctly by running netstat -ltn, and look for something listening on port 56379.
4. Reconfigure your client to connect to 127.0.0.1:56379, rather than directly to the remote Redis server.

Summary

This article contains an assortment of quick enhancements that you can deploy to your systems to protect them from various threats, which are frequently encountered on the Internet today.

Resources for Article:

Further resources on this subject:
- Implementing persistence in Redis (Intermediate) [Article]
- Python Text Processing with NLTK: Storing Frequency Distributions in Redis [Article]
- Coding for the Real-time Web [Article]


Implementing the Naïve Bayes classifier in Mahout

Packt
26 Dec 2013
21 min read
(For more resources related to this topic, see here.)

Bayes was a Presbyterian minister whose work on conditional probability was published posthumously, in 1763. The interesting fact is that we had to wait almost a century, until the arrival of Boolean calculus, before Bayes' work came to light in the scientific community. The corpus of Bayes' study was conditional probability. Without entering too much into mathematical theory, we define conditional probability as the probability of an event that depends on the outcome of another event.

In this article, we are dealing with a particular type of algorithm, a classifier algorithm. Given a dataset, that is, a set of observations of many variables, a classifier is able to assign a new observation to a particular category. So, for example, consider the following table:

Outlook    Temperature  Temperature  Humidity   Humidity   Windy   Play
           (Numeric)    (Nominal)    (Numeric)  (Nominal)
Overcast   83           Hot          86         High       FALSE   Yes
Overcast   64           Cool         65         Normal     TRUE    Yes
Overcast   72           Mild         90         High       TRUE    Yes
Overcast   81           Hot          75         Normal     FALSE   Yes
Rainy      70           Mild         96         High       FALSE   Yes
Rainy      68           Cool         80         Normal     FALSE   Yes
Rainy      65           Cool         70         Normal     TRUE    No
Rainy      75           Mild         80         Normal     FALSE   Yes
Rainy      71           Mild         91         High       TRUE    No
Sunny      85           Hot          85         High       FALSE   No
Sunny      80           Hot          90         High       TRUE    No
Sunny      72           Mild         95         High       FALSE   No
Sunny      69           Cool         70         Normal     FALSE   Yes
Sunny      75           Mild         70         Normal     TRUE    Yes

The table is composed of a set of 14 observations over 7 different attributes: outlook, temperature (numeric), temperature (nominal), humidity (numeric), and so on. The classifier takes some of the observations to train the algorithm and some to test it, so that it can make a decision for a new observation that is not contained in the original dataset. There are many types of classifiers that can do this kind of job. The classifier algorithms are part of the supervised learning data-mining tasks that use training data to infer an outcome.

The Naïve Bayes classifier relies on the assumption that each attribute of an observation contributes independently to the probability that the observation belongs to a particular category. Other types of classifiers present in Mahout are logistic regression, random forests, and boosting. Refer to the page https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms for more information. This page is updated with the algorithm type, actual integration in Mahout, and other useful information.

Moving out of this context, we could describe the Naïve Bayes algorithm as a classification algorithm that uses conditional probability to transform an initial set of weights into a weight matrix, whose entries (row by column) detail the probability that one weight is associated with the other. In this article's recipes, we will use the same algorithm provided by the Mahout example source code, which uses the Naïve Bayes classifier to find the relation between the words of a set of documents. Our recipe can be easily extended to any kind of document or set of documents. We will only use the command line so that, once the environment is set up, it will be easy for you to reproduce our recipe.

Our dataset is divided into two parts: the training set and the testing set. The training set is used to instruct the algorithm on the relation it needs to find. The testing set is used to test the algorithm using some unrelated input. Let us now get a first-hand taste of how to use the Naïve Bayes classifier.
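One formula is worth keeping in mind throughout the recipe. Conditional probabilities are tied together by Bayes' theorem, which is the rule the classifier applies under the hood (a standard result, written here only for reference):

    P(A|B) = P(B|A) * P(A) / P(B)

Here P(A|B) is the probability of event A given that event B has occurred. The "naïve" part of Naïve Bayes is the independence assumption on the attributes described above, which lets the classifier simply multiply the per-attribute conditional probabilities together.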
Using the Mahout text classifier to demonstrate the basic use case

The Mahout binaries contain ready-to-use scripts for using and understanding the classical Mahout dataset. We will use this dataset for testing or coding. Basically, the code is nothing more than following the Mahout ready-to-use script with the correct parameters and the path settings done. This recipe will describe how to transform the raw text files into the weight vectors that are needed by the Naïve Bayes algorithm to create the model. The steps involved are the following:

- Converting the raw text files into a sequence file
- Creating vector files from the sequence files
- Creating our working vectors

Getting ready

The first step is to download the dataset. The dataset is freely available at the following link: http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz. For classification purposes, other datasets can be found at the following URL: http://sci2s.ugr.es/keel/category.php?cat=clas#sub2. The dataset contains posts from 20 newsgroups dumped into text files for the purpose of machine learning. Anyway, we could also have used other documents for testing purposes, and we will suggest how to do this later in the recipe.

Before proceeding, in the command line, we need to set up the working folder where we decompress the original archive, so that we have shorter commands when we need to insert the full path of the folder. In our case, the working folder is /mnt/new; so, our working folder's command-line variable will be set using the following command:

    export WORK_DIR=/mnt/new/

You can create a new folder and change the WORK_DIR bash variable accordingly. Do not forget that to have these examples running, you need to run the various commands with a user that has the HADOOP_HOME and MAHOUT_HOME variables in its path.

To download the dataset, we only need to open up a terminal console and give the following command:

    wget http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz

Once your working dataset is downloaded, decompress it using the following command:

    tar -xvzf 20news-bydate.tar.gz

You should see the folder structure as shown in the following screenshot:

The second step is to sequence the input files to transform them into Hadoop sequence files. To do this, you need to transform the two folders into a single one. This is only a pedagogical passage; if you have multiple files containing the input texts, you could parse them separately by invoking the command multiple times. Using the console, we can group them together as a whole by giving the following commands in sequence:

    rm -rf ${WORK_DIR}/20news-all
    mkdir ${WORK_DIR}/20news-all
    cp -R ${WORK_DIR}/20news-bydate*/*/* ${WORK_DIR}/20news-all

Now, we should have our input folder, which is the 20news-all folder, ready to be used. The following screenshot shows a bunch of files, all in the same folder. By looking at one single file, we should see the underlying structure that we will transform. The structure is as follows:

    From: xxx
    Subject: yyyyy
    Organization: zzzz
    X-Newsreader: rusnews v1.02
    Lines: 50

    jaeger@xxx (xxx) writes:
    >In article xxx writes:
    >>zzzz "How BCCI adapted the Koran rules of banking". The
    >>Times. August 13, 1991.
    >
    > So, let's see. If some guy writes a piece with a title that implies
    > something is the case then it must be so, is that it?

We obviously removed the e-mail addresses, but you can open this file to see its content.
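Before moving on, here is the whole "Getting ready" sequence gathered into a single sketch, in case you want to repeat the preparation later with a different corpus (same paths as above; the cd step is added here because the recipe assumes you are working inside the working folder):

    #!/bin/sh
    export WORK_DIR=/mnt/new

    # Work inside the working folder, then fetch and unpack the corpus
    cd ${WORK_DIR}
    wget http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
    tar -xvzf 20news-bydate.tar.gz

    # Merge the train/test folders into a single input folder
    rm -rf ${WORK_DIR}/20news-all
    mkdir ${WORK_DIR}/20news-all
    cp -R ${WORK_DIR}/20news-bydate*/*/* ${WORK_DIR}/20news-all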
For each of the 20 newsgroups present in the dataset, we have a number of files, each of them containing a single post to that newsgroup, without categorization. Following our initial tasks, we now need to transform all these files into Hadoop sequence files. To do this, you need to just type the following command:

    ./mahout seqdirectory -i ${WORK_DIR}/20news-all -o ${WORK_DIR}/20news-seq

This command takes every file contained in the 20news-all folder and transforms it into a sequence file. As you can see, the number of resulting sequence files is not one to one with the number of input files. In our case, the sequence file generated from the original 15417 text files is just one chunk-0 file. It is also possible to declare the number of output files and the mappers involved in this data transformation. We invite the reader to test the different parameters and their uses by invoking the following command:

    ./mahout seqdirectory --help

The following table describes the various options that can be used with the seqdirectory command:

Parameter                                    Description
--input (-i) input                           This gives the path to the job input directory.
--output (-o) output                         The directory pathname for the output.
--overwrite (-ow)                            If present, overwrite the output directory before running the job.
--method (-xm) method                        The execution method to use: sequential or mapreduce. The default is mapreduce.
--chunkSize (-chunk) chunkSize               The chunk size in megabytes. The default is 64 MB.
--fileFilterClass (-filter) fileFilterClass  The name of the class to use for file parsing. The default is org.apache.mahout.text.PrefixAdditionFilter.
--keyPrefix (-prefix) keyPrefix              The prefix to be prepended to the key of the sequence file.
--charset (-c) charset                       The name of the character encoding of the input files. The default is UTF-8.
--help (-h)                                  Prints the help menu to the command console.
--tempDir tempDir                            If specified, tells Mahout to use this as a temporary folder.
--startPhase startPhase                      Defines the first phase that needs to be run.
--endPhase endPhase                          Defines the last phase that needs to be run.

To examine the outcome, you can use the Hadoop command-line option fs. So, for example, if you would like to see what is in the chunk-0 file, you could type in the following command:

    hadoop fs -text $WORK_DIR/20news-seq/chunk-0 | more

In our case, the result is as follows:

    /67399
    From: xxx
    Subject: Re: Imake-TeX: looking for beta testers
    Organization: CS Department, Dortmund University, Germany
    Lines: 59
    Distribution: world
    NNTP-Posting-Host: tommy.informatik.uni-dortmund.de

    In article <xxxxx>, yyy writes:
    |> As I announced at the X Technical Conference in January, I would like
    |> to
    |> make Imake-TeX, the Imake support for using the TeX typesetting system,
    |> publically available. Currently Imake-TeX is in beta test here at the
    |> computer science department of Dortmund University, and I am looking
    |> for
    |> some more beta testers, preferably with different TeX and Imake
    |> installations.

The Hadoop command is pretty simple, and the syntax is as follows:

    hadoop fs -text <input file>

In the preceding syntax, <input file> is the sequence file whose content you want to see. Our sequence files have now been created but, so far, there has been no analysis of the words and the text itself. The Naïve Bayes algorithm does not work directly with the words and the raw text, but with the weight vectors associated with the original documents.
So now, we need to transform the raw text into vectors of weights and frequencies. To do this, we type in the following command:

    ./mahout seq2sparse -i ${WORK_DIR}/20news-seq -o ${WORK_DIR}/20news-vectors -lnorm -nv -wt tfidf

The command parameters are described briefly as follows:

- The -lnorm parameter instructs the vectorizer to use the L_2 norm as a distance
- The -nv parameter is an optional parameter that outputs the vectors as namedVector
- The -wt parameter instructs which weight function needs to be used

We end the data-preparation process with this step. Now, we have the weight vector files created and ready to be used by the Naïve Bayes algorithm. We will come back to this last step in a little more detail later, since it is where the algorithm can be tuned for better performance of the Naïve Bayes classifier.

How to do it…

Now that we have generated the weight vectors, we need to give them to the training algorithm. But if we train the classifier against the whole set of data, we will not be able to test the accuracy of the classifier. To avoid this, you need to divide the vector files into two sets, using the so-called 80-20 split. This is a good data-mining approach because, if you have any algorithm that should be instructed on a dataset, you should divide the whole bunch of data into two sets: one for training and one for testing your algorithm. A good dividing percentage has been shown to be 80 percent and 20 percent, meaning that the training data should be 80 percent of the total while the testing data should be the remaining 20 percent. To split the data, we use the following command:

    ./mahout split -i ${WORK_DIR}/20news-vectors/tfidf-vectors --trainingOutput ${WORK_DIR}/20news-train-vectors --testOutput ${WORK_DIR}/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential

As a result of this command, we will have two new folders containing the training and testing vectors. Now, it is time to train our Naïve Bayes algorithm on the training set of vectors, and the command that is used is pretty easy:

    ./mahout trainnb -i ${WORK_DIR}/20news-train-vectors -el -o ${WORK_DIR}/model -li ${WORK_DIR}/labelindex -ow

Once finished, we have our training model ready to be tested against the remaining 20 percent of the initial input vectors. The final console command is as follows:

    ./mahout testnb -i ${WORK_DIR}/20news-test-vectors -m ${WORK_DIR}/model -l ${WORK_DIR}/labelindex -ow -o ${WORK_DIR}/20news-testing

The following screenshot shows the output of the preceding command:

How it works...

We have given certain commands and we have seen the outcome, but you've done this without an understanding of why we did it and, above all, why we chose certain parameters. The whole sequence could be meaningless, even for an experienced coder. Let us now go a little deeper into each step of our algorithm. Apart from downloading the data, we can divide our Naïve Bayes algorithm into three main steps:

- Data preparation
- Data training
- Data testing

In general, these are the three procedures that should be followed when mining data. The data preparation step involves all the operations that are needed to create the dataset in the format that is required by the data mining procedure. In this case, we know that the original format was a bunch of files containing text, and we transformed them into a sequence file format. The main purpose of this is to have a format that can be handled by the MapReduce algorithm. This phase is a general one, as in most cases the input format is not ready to be used as it is.
Sometimes, we also need to merge some data if it is divided across different sources. Sometimes, we also need to use Sqoop for extracting data from different datasources. Data training is the crucial part; from the original dataset, we extract the information that is relevant to our data mining task, and we use some of it to train our model. In our case, we are trying to classify whether a document can be placed in a certain category based on the frequency of some terms in it. This will lead to a classifier that, given another document, can state whether this document falls under a previously found category. The output is a function that is able to determine this association. Next, we need to evaluate this function, because it is possible that a classification that looks good in the learning phase is not so good when using a different document. This three-phased approach is essential in all classification tasks. The main difference lies in the type of classifier to be used in the training and testing phases. In this case, we use Naïve Bayes, but other classifiers can be used as well. In the Mahout framework, the available classifiers are Naïve Bayes, Decision Forest, and Logistic Regression.

As we have seen, the data preparation consists basically of creating two series of files that will be used for training and testing purposes. The step to transform the raw text files into the Hadoop sequence format is pretty easy, so we won't spend too long on it. But the next step is the most important one during data preparation. Let us recall it:

    mahout seq2sparse -i ${WORK_DIR}/20news-seq -o ${WORK_DIR}/20news-vectors -lnorm -nv -wt tfidf

This computational step basically grabs the whole text from the chunk-0 sequence file and starts parsing it to extract information from the words contained in it. The input parameters tell the utility to work in the following ways:

- The -i parameter is used to declare the input folder where all the sequence files are stored
- The -o parameter is used to create the output folder where the vectors containing the weights are stored
- The -nv parameter tells Mahout that the output format should be in the namedVector format
- The -wt parameter tells which frequency function to use for evaluating the weight of every term for a category
- The -lnorm parameter selects the function used to normalize the weights, using the L_2 distance
- The -ow parameter overwrites the previously generated output results
- The -m parameter gives the minimum log-likelihood ratio

The whole purpose of this computation step is to transform the sequence files that contain the documents' raw text into sequence files containing vectors that count the frequency of the terms. Obviously, there are different functions that count the frequency of a term within the whole set of documents. So, in Mahout, the possible values for the wt parameter are tf and tfidf. The tf value is the simpler one and counts the frequency of the term. This means that the frequency of the term Wi inside the set of documents is the ratio between the total occurrences of the word and the total number of words. The second one also weights every term frequency using a logarithmic function like this one:

    Wi = TFi * log(N / DFi)

In the preceding formula, Wi is the TF-IDF weight of the word indexed by i, TFi is its term frequency, N is the total number of documents, and DFi is the frequency of the word i across all the documents (its document frequency).
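As a quick illustrative calculation (the numbers here are made up, not taken from the 20newsgroups run, and the exact weighting variant and logarithm base used by Mahout may differ): suppose a word has a term frequency TFi = 0.02, the corpus contains N = 1000 documents, and the word appears in DFi = 10 of them. Then:

    Wi = 0.02 * log10(1000 / 10) = 0.02 * 2 = 0.04

A word that appears in almost every document (DFi close to N) would get a weight close to zero, which is exactly the point of the IDF factor: very common words carry little information about the category.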
In this preprocessing phase, notice that we index the whole corpus of documents, so we are sure that even if we divide or split it in the next phase, the documents are not affected. We compute a word frequency regardless of whether the word ends up in the training or the testing set. The reader should grasp the fact that changing this parameter can affect the final weight vectors; so, based on the same text, we could have very different outcomes. The lnorm value basically means that, while a raw weight can be any number from 0 up to some positive value, the weights are normalized so that 1 is the maximum possible weight for a word inside the frequency range.

The following screenshot shows the contents of the output folder:

Various folders are created for storing the word counts, frequencies, and so on. Basically, this is because the Naïve Bayes classifier works by removing all periods and punctuation marks from the text; then, from every text, it extracts the categories and the words. The final vector file can be seen in the tfidf-vectors folder, and for dumping vector files to normal text ones, you can use the vectordump command as follows:

    mahout vectordump -i ${WORK_DIR}/20news-vectors/tfidf-vectors/part-r-00000 -o ${WORK_DIR}/20news-vectors/tfidf-vectors/part-r-00000dump

The dictionary and word files are sequence files containing the association between each unique key and word, as created by the MapReduce algorithm. Using the command:

    hadoop fs -text $WORK_DIR/20news-vectors/dictionary.file-0 | more

one can see, for example:

    adrenal_gland 12912
    adrenaline 12913
    adrenaline.com 12914

The splitting of the dataset into training and testing sets is done by using the split command-line option of Mahout. The interesting parameter in this case is randomSelectionPct, which equals 40. It uses a random selection to decide which point belongs to the training dataset and which to the testing dataset.

Now comes the interesting part. We are ready to train using the Naïve Bayes algorithm. The output of this algorithm is the model folder, which contains the model in the form of a binary file. This file represents the Naïve Bayes model that holds the weight matrix, the feature and label sums, and the weight normalizer vectors generated so far. Now that we have the model, we test it on the test set. The outcome is directly shown on the command line in the form of a confusion matrix. The following screenshot shows the format in which we can see our result.

Finally, we test our classifier on the test vectors generated by the split instruction. The output in this case is a confusion matrix. Its format is as shown in the following screenshot:

We are now going to provide details on how this matrix should be interpreted. As you can see, we have the total classified instances, which tell us how many sentences have been analyzed. Above this, we have the correctly/incorrectly classified instances. In our case, this means that on a test set of weighted vectors, we have nearly 90 percent correctly classified sentences against an error of 9 percent. But if we go through the matrix row by row, we can see at the end that we have the different newsgroups; so, a is equal to alt.atheism and b is equal to comp.graphics. So, a first look at the detailed confusion matrix tells us that we did best in classification against the rec.sport.hockey newsgroup, with a value of 418, which is the highest we have.
If we take a look at the corresponding row, we understand that of these 418 classified sentences, we have 403/412; so, roughly 97 percent of all of the sentences were found in the rec.sport.hockey newsgroup. But if we take a look at the comp.os.ms-windows.misc newsgroup, we can see that the overall performance is lower. The sentences are not so centered around the same newsgroup; we find and classify the sentences about ms-windows in other newsgroups, and so we do not have a good classification. This is reasonable, as sports terms like "hockey" are really limited to the hockey world, while sentences about Microsoft could be found both in Microsoft-specific newsgroups and in other newsgroups.

We encourage you to give another run to the testing phase, this time on the training set, to see the output of the confusion matrix, by giving the following command:

    ./bin/mahout testnb -i ${WORK_DIR}/20news-train-vectors -m ${WORK_DIR}/model -l ${WORK_DIR}/labelindex -ow -o ${WORK_DIR}/20news-testing

As you can see, the input folder is the same as for the training phase, and in this case, we have the following confusion matrix:

In this case, we are using the same set for both the training and the testing phase. The first consequence is that we have a rise in the correctly classified sentences of the order of 10 percent, which is even more significant if you remember that this set of weighted vectors is about four times larger than the one used in the testing phase. But probably the most important thing is that the best classification has now moved from the hockey newsgroup to the sci.electronics newsgroup.

There's more

We used exactly the same procedure as the Mahout examples contained in the binaries folder that we downloaded. But you should now be aware that, to start the whole process over, you only need to change the input files in the initial folder. So, for the willing reader, we suggest downloading another raw text corpus and performing all the steps on this other type of file, to see the changes compared to the initial input text. We would also suggest that non-native English readers look at the differences obtained by changing the initial input set to one not written in English. Since the whole text is transformed using only weight vectors, the outcome does not depend on the difference between languages, but only on the probability of finding certain word couples.

As a final step, using the same input texts, you could try to change the way the algorithm normalizes and counts the words to create the sparse weight vectors. This can easily be done by changing, for example, the -wt tfidf parameter in the Mahout seq2sparse command line. So, for example, an alternative run of Mahout seq2sparse could be the following one:

    mahout seq2sparse -i ${WORK_DIR}/20news-seq -o ${WORK_DIR}/20news-vectors -lnorm -nv -wt tf

Finally, we did not only choose the Naïve Bayes classifier to classify the words of text documents; because the algorithm works on vectors of weights, it would, for example, be easy to plug in your own vector weights.
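For reference, the full sequence of commands used in this recipe can be collected into a single sketch (same paths and parameters as above; it assumes that the mahout and hadoop binaries are reachable and that HADOOP_HOME and MAHOUT_HOME are set, as noted in the Getting ready section):

    #!/bin/sh
    export WORK_DIR=/mnt/new

    # Data preparation: raw text -> sequence files -> tf-idf weight vectors
    mahout seqdirectory -i ${WORK_DIR}/20news-all -o ${WORK_DIR}/20news-seq
    mahout seq2sparse -i ${WORK_DIR}/20news-seq -o ${WORK_DIR}/20news-vectors -lnorm -nv -wt tfidf

    # Split the vectors into training and testing sets
    mahout split -i ${WORK_DIR}/20news-vectors/tfidf-vectors \
      --trainingOutput ${WORK_DIR}/20news-train-vectors \
      --testOutput ${WORK_DIR}/20news-test-vectors \
      --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential

    # Train the Naive Bayes model and test it on the held-out vectors
    mahout trainnb -i ${WORK_DIR}/20news-train-vectors -el -o ${WORK_DIR}/model -li ${WORK_DIR}/labelindex -ow
    mahout testnb -i ${WORK_DIR}/20news-test-vectors -m ${WORK_DIR}/model -l ${WORK_DIR}/labelindex -ow -o ${WORK_DIR}/20news-testing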


Using Faceted Search, from Searching to Finding

Packt
24 Dec 2013
11 min read
(For more resources related to this topic, see here.)

Looking at Solr's standard query parameters

The basic engine of Solr is Lucene, so Solr accepts a query syntax based on the Lucene one; even if there are some minor differences, they should not affect our experiments, as they involve more advanced behavior. You can find an explanation of the Solr query syntax on the wiki at: http://wiki.apache.org/solr/SolrQuerySyntax. Let's see some examples of queries using the basic parameters. Before starting our tests, we need to configure a new core again, in the usual way.

Sending Solr's query parameters over HTTP

It is important to take care of the fact that our queries to Solr are sent over the HTTP protocol (unless we are using Solr in embedded mode, as we will see later). With cURL we can handle the HTTP encoding of parameters, for example:

    >> curl -X POST 'http://localhost:8983/solr/paintings/select?start=3&rows=2&fq=painting&wt=json&indent=true' --data-urlencode 'q=leonardo da vinci&fl=artist title'

This command can be used instead of the following command:

    >> curl -X GET "http://localhost:8983/solr/paintings/select?q=leonardo%20da%20vinci&fq=painting&start=3&rows=2&fl=artist%20title&wt=json&indent=true"

Please note how, by using the --data-urlencode parameter in the first example, we can write parameter values that include characters which need to be encoded over HTTP.

Testing HTTP parameters on browsers

On modern browsers such as Firefox or Chrome, you can look at the parameters directly in the provided console. For example, using Chrome you can open the console (with F12). In the previous image you can see, under the Query String Parameters section on the right, that the parameters are shown in a list, and we can easily switch between the encoded and the more readable un-encoded version of the values. If you don't like using Chrome or Firefox and want a similar tool, you can try Firebug Lite (http://getfirebug.com/firebuglite). This is a JavaScript library conceived to port the Firebug plugin functionality ideally to every browser, by adding the library to your HTML page during the test process.

Choosing a format for the output

When sending a query to Solr directly (by the browser or cURL), we can ask for results in multiple formats, including for example JSON:

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=*:*&wt=json&indent=true'
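Other response writers can be selected in exactly the same way, just by changing the wt value. A quick sketch (the paintings core is the same one assumed throughout these examples; check which response writers are enabled in your solrconfig.xml):

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=*:*&wt=xml&indent=true'
    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=*:*&wt=csv'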
Time for action – searching all documents with pagination

When performing a query, we need to remember we are potentially asking for a huge number of documents. Let's observe how to manage partial results using pagination:

1. For example, think about the q=*:* query, seen in previous examples, which was used for asking for all the documents, without a specific criterion. In a case like this, in order to avoid problems with resources, Solr will actually send us only the first ones, as defined by a parameter in the configuration. The default number of returned results will be 10, so we need to be able to ask for a second group of results, a third, and so on, as long as there are more. This is what is generally called pagination of results, similar to scenarios involving SQL.
2. Execute the command:

    >> curl -X GET "http://localhost:8983/solr/paintings/select?q=*:*&start=0&rows=0&wt=json&indent=true"

   We should obtain a result similar to this (the number of documents numFound and the time spent processing the query, QTime, could vary, depending on your data and your system):

3. In the previous image, we see the same results in two different ways: on the right side you'll recognize the output from cURL, and on the left side you see the results directly in the browser window. In the second example, we had the JSONView plugin installed in the browser, which gives a very helpful visualization of JSON, with indentation and colors. If you want, you can install it for Chrome from: https://chrome.google.com/webstore/detail/jsonview/chklaanhfefbnpoihckbnefhakgolnmc. For Firefox, the plugin can be installed from: https://addons.mozilla.org/it/firefox/addon/jsonview/. Note how, even though we have found 12484 documents, we are currently seeing none of them in the results!

What just happened?

In this very simple example, we already used two very useful parameters: start and rows, which we should always think of as a couple, even if we may be using only one of them explicitly. We could change the default values for these parameters in the solrconfig.xml file, but this is generally not needed:

- The start value defines the index of the first document returned in the response, from the ones matching our search criteria, counting from 0. The default value is 0.
- The rows parameter is used to define how many documents we want in the results. The default value is 10.

So if, for example, we only want the second and third documents from the results, we can obtain them with the query:

    >> curl -X GET "http://localhost:8983/solr/paintings/select?q=*:*&start=1&rows=2&wt=json&indent=true"

In order to obtain the second document in the results, we need to remember that the enumeration starts from 0 (so the second will be at 1), while to see the next group of documents (if present), we will send a new query with values such as start=10, rows=10, and so on. We are still using the wt and indent parameters only to have the results formatted in a clear way. The start/rows parameters play roles in this context quite similar to the OFFSET/LIMIT clauses in SQL. This process of segmenting the output to be able to read it in groups or pages of results is usually called pagination, and it is generally handled by some programming code. You should know this mechanism, so you can play with your tests even on a small segment of data without loss of generality. I strongly suggest you always add these two parameters explicitly in your examples.
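Putting the explanation above into practice, paging through the whole result set is just a matter of advancing start while keeping rows fixed. A sketch against the same paintings core (only the start value changes between requests):

    >> curl -X GET "http://localhost:8983/solr/paintings/select?q=*:*&start=0&rows=10&wt=json&indent=true"
    >> curl -X GET "http://localhost:8983/solr/paintings/select?q=*:*&start=10&rows=10&wt=json&indent=true"
    >> curl -X GET "http://localhost:8983/solr/paintings/select?q=*:*&start=20&rows=10&wt=json&indent=true"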
Time for action – projecting fields with fl

Another important parameter to consider is fl, which can be used for field projection, obtaining only certain fields in the results:

1. Suppose that we are interested in obtaining the titles and artist references for all the documents:

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=artist:*&wt=json&indent=true&omitHeader=true&fl=title,artist'

   We will obtain an output similar to the one shown in the following image. Note that the results will be indented as requested, and will not contain any header, to be more readable. Moreover, the parameter list does not need to be written in a specific order.
2. The previous query could also be rewritten as:

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=artist:*&wt=json&indent=true&omitHeader=true&fl=title&fl=artist'

   Here we ask for the field projections one by one, if needed (for example, when using an HTML and JavaScript widget to compose the query following the user's choices).

What just happened?

The fl parameter stands for fields list. By using this parameter, we can define a comma-separated list of field names that explicitly defines which fields are projected in the results. We can also use a space to separate fields, but in this case we should use the URL encoding for the space, writing fl=title+artist or fl=title%20artist. If you are familiar with relational databases and SQL, you should think of the fl parameter as similar to the SELECT clause in SQL statements, used to project the selected fields in the results. In a similar way, writing fl=author:artist,title corresponds to the usage of aliases, for example SELECT artist AS author, title.

Let's see the full list of parameters in detail:

- The q=artist:* parameter is used in this case in place of a more generic q=*:*, to select only the documents which have a value for the field artist. The special character * is used again for indicating all the values.
- The wt=json and indent=true parameters are used to ask for an indented JSON format.
- The omitHeader=true parameter is used to omit the header from the response.
- The fl=title,artist parameter represents the list of the fields to be projected in the results.

Note how the fields are projected in the results without using the order asked for in fl, as this has no particular meaning for JSON output. This order will be used for the CSV response writer that we will see later, however, where changing the column order could be mandatory. In addition to the existing fields, which can be added by using the * special character, we could also ask for the projection of the implicit score field. A composition of these two options can be seen in the following query:

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=artist:*&wt=json&indent=true&omitHeader=true&fl=*,score'

This will return every field for every document, including the score field explicitly, which is sometimes called a pseudo-field, to distinguish it from the fields defined by a schema.
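Since aliases were mentioned above, here is what the aliased form looks like as an actual request. This is only a sketch against the same paintings core, renaming artist to author in the projected results:

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=artist:*&wt=json&indent=true&omitHeader=true&fl=author:artist,title'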
Time for action – selecting documents with filter query

Sometimes it's useful to be able to narrow the collection of documents on which we are currently performing our search. It is useful to add some kind of explicit, fixed condition on the logical side for navigating the data, and it will also have a good impact on performance. This is shown in the following example, which shows how the default search is restricted by the introduction of an fq=annunciation condition.

What just happened?

The first result in this simple example shows that we obtain results similar to what we could have obtained by a simple q=annunciation search. Filter queries can be cached (as well as facets, which we will see later), improving performance by reducing the overhead of performing the same query many times, and of accessing the same group of documents in a large dataset many times. In this case the analogy with SQL seems less convincing, but q=dali and fq=abstract:painting can be seen as corresponding to WHERE conditions in SQL. The fq parameter then acts as a fixed condition. In our scenario, we could, for example, define specific endpoints with a pre-defined filter query by author, to create specific channels. In this case, instead of passing the parameters every time, we could set them in solrconfig.xml.

Time for action – searching for similar terms with fuzzy search

Even if wildcard queries are very flexible, sometimes they simply cannot give us good results. There could be some weird typo in the term, and we still want to obtain good results wherever possible, under certain confidence conditions:

1. If I want to write painting and I actually search for plainthing, for example:

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=abstract:plainthing~0.5&wt=json'

2. Suppose we have a person using a different language, who searched for leonardo by misspelling the name:

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=abstract:lionardo~0.5&wt=json'

In both cases the examples use misspelled words to be more recognizable, but the same syntax can be used to intercept existing similar words.

What just happened?

Both the preceding examples work as expected. The first gives us documents containing the term painting, the second gives us documents containing leonardo instead. Note that the syntax plainthing~0.5 represents a query that matches with a certain confidence, so, for example, we will also obtain occurrences of documents with the term paintings, which is good; but in a more general case we could receive weird results. In order to properly set up the confidence value there are not many options, apart from doing tests. Using fuzzy search is a simple way to obtain suggested results for alternate forms of a search query, just like when we trust some search engine's similar suggestions in the "did you mean" approaches.
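Putting a couple of these pieces together, a query that narrows the collection with fq and still tolerates a misspelling in q might look like the following sketch (same paintings core as above; the 0.5 confidence value is just the one used in the earlier examples):

    >> curl -X GET 'http://localhost:8983/solr/paintings/select?q=abstract:lionardo~0.5&fq=abstract:painting&fl=title,artist&wt=json&indent=true'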


SAP HANA Architecture

Packt
20 Dec 2013
12 min read
(For more resources related to this topic, see here.)

Understanding the SAP HANA architecture

Architecture is the key to SAP HANA being a game-changing, innovative technology. SAP HANA has been designed so well, architecture-wise, that it makes a lot of difference when compared to the other traditional databases available today. This section explains the various components of SAP HANA and their functionalities.

Getting ready

Enterprise application requirements have become more demanding: complex reports with heavy computation on huge volumes of transaction data, and also business data in other formats (both structured and semi-structured). Data is being written or updated, and also read from the database, in parallel. Thus, the integration of both transactional and analytical data into a single database is required, and this is where SAP HANA has evolved. Columnar storage exploits modern hardware and technology (multiple CPU cores, large main memory, and caches) to achieve the requirements of enterprise applications. Apart from this, it should also support procedural logic where certain tasks cannot be completed with simple SQL.

How it works…

The SAP HANA database consists of several services (servers). The index server is the most important of all the servers. The other servers are the name server, preprocessor server, statistics server, and XS Engine:

- Index server: This server holds the actual data and the engines for processing the data. When SQL or MDX is fired against the SAP HANA system in the case of authenticated sessions and transactions, the index server takes care of these commands and processes them.
- Name server: This server holds complete information about the system landscape. The name server is responsible for the topology of the SAP HANA system. In a distributed system, SAP HANA instances will be running on multiple hosts. In this kind of setup, the name server knows where the components are running and how data is spread across different servers.
- Preprocessor server: This server comes into the picture during text data analysis. The index server utilizes the capabilities of the preprocessor server in text data analysis and searching. This helps extract the information on which the text search capabilities are based.
- Statistics server: This server helps in collecting the data for the system monitor and helps you know the health of the SAP HANA system. The statistics server is responsible for collecting the data related to status, resource allocation/consumption, and performance of the SAP HANA system. Monitoring clients and getting the status of various alert monitors use the data collected by the statistics server. This server also provides a history of measurement data for further analysis.
- XS Engine: The XS Engine allows external applications and application developers to access the SAP HANA system via the XS Engine clients; for example, a web browser accesses SAP HANA apps built by application developers via HTTP. Application developers build applications by using the XS Engine, and the users access the app via HTTP by using a web browser. The persistent model in the SAP HANA database is converted into a consumption model for clients to access it via HTTP. This allows an organization to host system services that are a part of the SAP HANA database (for example, the Search service, a built-in web server that provides access to static content in the repository).

The following diagram shows the architecture of SAP HANA:

There's more...
Let us continue learning about the different components:

- SAP Host Agent: According to the new approach from SAP, the SAP Host Agent should be installed on all machines that are related to the SAP landscape. It is used by the Adaptive Computing Controller (ACC) to manage the system, and by the Software Update Manager (SUM) for automatic updates.
- LM-structure: The LM-structure for SAP HANA contains the information about the current installation details. This information will be used by SUM during automatic updates.
- SAP Solution Manager diagnostic agent: This agent provides all the data to SAP Solution Manager (SAP SOLMAN) to monitor the SAP HANA system. After SAP SOLMAN is integrated with the SAP HANA system, this agent provides information about the database at a glance, which includes the database state and general information about the system, such as alerts, CPU, or memory and disk usage.
- SAP HANA Studio repository: This helps the end users to update SAP HANA Studio to higher versions. The SAP HANA Studio repository is the code that does this process.
- Software Update Manager for SAP HANA: This helps in automatic updates of SAP HANA from the SAP Marketplace and in patching the SAP Host Agent. It also allows distribution of the Studio repository to the end users.

See also

- http://help.sap.com/hana/SAP_HANA_Installation_Guide_en.pdf
- SAP Notes 1793303 and 1514967

Explaining IMCE and its components

We have seen the architecture of SAP HANA and its components. In this section, we will learn about the IMCE (in-memory computing engine), its components, and their functionalities.

Getting ready

The SAP in-memory computing engine (formerly the Business Analytic Engine (BAE)) is the core engine for SAP's next-generation, high-performance, in-memory solutions, as it leverages technologies such as in-memory computing, columnar databases, massively parallel processing (MPP), and data compression to allow organizations to instantly explore and analyze large volumes of transactional and analytical data from across the enterprise in real time.

How it works...

In-memory computing allows the processing of massive quantities of real-time data in the main memory of the server, providing immediate results from analyses and transactions. The SAP in-memory computing database delivers the following capabilities:

- In-memory computing functionality with native support for row and columnar datastores, providing full ACID (atomicity, consistency, isolation, and durability) transactional capabilities
- Integrated lifecycle management capabilities and data integration capabilities to access SAP and non-SAP data sources
- SAP IMCE Studio, which includes tools for data modeling, data and life cycle management, and data security

The SAP IMCE that resides at the heart of SAP HANA is an integrated database and calculation layer that allows the processing of massive quantities of real-time data in the main memory to provide immediate results from analysis and transactions. Like any standard database, the SAP IMCE not only supports industry standards such as SQL and MDX, but also incorporates a high-performance calculation engine that embeds procedural language support directly into the database kernel. This approach is designed to remove the need to read data from the database, process it, and then write data back to the database; that is, it processes the data near the database and returns the results. The IMCE is an in-memory, column-oriented database technology. It is a powerful calculation engine at the heart of SAP HANA. As data resides in Random Access Memory (RAM), highly accelerated performance can be achieved compared to systems that read data from disks. The heart lies within the IMCE, which allows us to create and perform calculations on data. SAP IMCE Studio includes tools for data modeling activities, data and life cycle management, and also tools that are related to data security. The following diagram shows the components of the IMCE alone:

There's more…

The SAP HANA database has the following two engines:

- Column-based store: This engine stores huge amounts of relational data in column-optimized tables, which are aggregated and used in analytical operations.
- Row-based store: This engine stores relational data in rows, similar to the storage mechanism of traditional database systems. The row store is more optimized for write operations and has a lower compression rate. Also, the query performance is lower when compared to the column-based store.

The engine that is used to store data can be selected on a per-table basis at the time of creating a table, as sketched in the example below. Tables in the row-based store are loaded at startup time. In the case of column-based stores, tables can be either loaded at startup or on demand, that is, during normal operation of the SAP HANA database. Both engines share a common persistence layer, which provides data persistency that is consistent across both engines.
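Since the storage engine is chosen per table at creation time, the choice shows up directly in the table definition. A minimal SQL sketch (the table names and columns here are made up for illustration; check the SQL reference of your SAP HANA revision for the complete syntax):

    -- Analytical data, aggregated by column: use the column store
    CREATE COLUMN TABLE sales_facts (id INTEGER PRIMARY KEY, region NVARCHAR(20), amount DECIMAL(15,2));

    -- Write-heavy transactional data: use the row store
    CREATE ROW TABLE session_log (id INTEGER PRIMARY KEY, created_at TIMESTAMP);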
Like a traditional database, we have page management and logging in SAP HANA. The changes made to the in-memory database pages are persisted through savepoints. These savepoints are written to data volumes on the persistent storage, for which the storage medium is hard drives. All transactions committed in the SAP HANA database are stored/saved/referenced by the logger of the persistency layer in a log entry written to the log volumes on the persistent storage. To get high I/O performance and low latency, the log volumes use flash technology storage.

The relational engines can be accessed through a variety of interfaces. The SAP HANA database supports SQL (JDBC/ODBC), MDX (ODBO), and BICS (SQLDBC). The calculation engine performs all the calculations in the database. No data moves into the application layer until the calculations are completed. It also contains the business functions library that is called by applications to perform calculations based on the business rules and logic. The SAP HANA-specific SQLScript language is an extension of SQL that can be used to push down data-intensive application logic into the SAP HANA database for specific requirements.

Session management

This component creates and manages sessions and connections for the database clients. When a session is created, a set of parameters is maintained. These parameters are things like the auto-commit settings or the current transaction isolation level. After establishing a session, database clients communicate with the SAP HANA database using SQL statements. The SAP HANA database treats all the statements as transactions while processing them. Each new session created will be assigned to a new transaction.

Transaction manager

The transaction manager is the component that coordinates database transactions, takes care of controlling transaction isolation, and keeps track of running and closed transactions. The transaction manager informs the involved storage engines about running or closed transactions, so that they can execute the necessary actions when a transaction is committed or rolled back.
The transaction manager cooperates with the persistence layer to achieve atomic and durable transactions. The client requests are analyzed and executed by a set of components summarized as request processing and execution control. The client requests are analyzed by a request parser, and then it is dispatched to the responsible component. The transaction control statements are forwarded to the transaction manager. The data definition statements are sent to the metadata manager. The object invocations are dispatched to the object store. The data manipulation statements are sent to the optimizer, which creates an optimized execution plan that is given to the execution layer. The SAP HANA database also has built-in support for domain-specific models (such as for financial planning domain) and it offers scripting capabilities that allow application-specific calculations to run inside the database. It has its own scripting language named SQLScript that is designed to enable optimizations and parallelization. This SQLScript is based on free functions that operate on tables by using SQL queries for set processing. The SAP HANA database also contains a component called the planning engine that allows financial planning applications to execute basic planning operations in the database layer. For example, while applying filters/transformations, a new version of a dataset will be created as a copy of an existing one. An example of planning operation is disaggregation operation in which based on a distribution function; target values from higher to lower aggregation levels are distributed. Metadata manager Metadata manager helps to access metadata. SAP HANA database's metadata consists of a variety of objects, such as definitions of tables, views and indexes, SQLScript function definitions, and object store metadata. All these types of metadata are stored in one common catalog for all the SAP HANA database stores. Metadata is stored in tables in the row store. The SAP HANA features such as transaction support and multi-version concurrency control (MVCC) are also used for metadata management. Central metadata is shared across the servers in the case of a distributed database systems. The background mechanism of metadata storage and sharing is hidden from the components that use the metadata manager. As row-based tables and columnar tables can be combined in one SQL statement, both the row and column engines must be able to consume the intermediate results. The main difference between the two engines is the way they process data: the row store operators process data in a row-at-a-time fashion, whereas column store operations (such as scan and aggregate) require the entire column to be available in contiguous memory locations. To exchange intermediate results created by each other, the row store provides results to the column store. The result materializes as complete rows in the memory, while the column store can expose results using the iterators interface needed by the row store. Persistence layer The persistence layer is responsible for durability and atomicity of transactions. The persistent layer ensures that the database is restored to the most recent committed state after a restart, and makes sure that transactions are either completely executed or completely rolled back. To achieve this in an efficient way, the persistence layer uses a combination of write-ahead logs, shadow paging, and savepoints. Moreover, the persistence layer also offers interfaces for writing and reading data. 
It also contains SAP HANA's logger that manages the transaction log.

Authorization manager

The authorization manager is invoked by other SAP HANA database components to check whether users have the required privileges to execute the requested operations. Privileges can be granted to other users or roles. A privilege grants the right to perform a specified operation (such as create, update, select, or execute) on a specified object such as a table, view, or SQLScript function. Analytic privileges represent filters or hierarchy drill-down limitations for analytic queries; for example, SAP HANA supports analytic privileges that grant access only to values with a certain combination of dimension attributes. Users are authenticated either by the SAP HANA database itself (logging in with a username and password), or authentication can be delegated to third-party external authentication providers such as an LDAP directory.

See also SAP HANA in-memory analytics and in-memory computing available at http://scn.sap.com/people/vitaliy.rudnytskiy/blog/2011/03/22/time-to-update-your-sap-hana-vocabulary

Summary This article explained the SAP HANA architecture and the IMCE components in brief. Resources for Article: Further resources on this subject: SAP HANA integration with Microsoft Excel [Article] Data Migration Scenarios in SAP Business ONE Application- part 2 [Article] Data Migration Scenarios in SAP Business ONE Application- part 1 [Article]

Learning Option Pricing

Packt
20 Dec 2013
19 min read
(For more resources related to this topic, see here.)

Introduction to options

Options come in two variants, puts and calls. The call option gives the owner of the option the right, but not the obligation, to buy the underlying asset at the strike price. The put gives the holder of the contract the right, but not the obligation, to sell the underlying asset. The Black-Scholes formula describes the European option, which can only be exercised on the maturity date, in contrast to, for example, American options. The buyer of the option pays a premium for this, to cover the risk taken by the counterparty. Options have become very popular and are traded on the major exchanges throughout the world, covering most asset classes. The theory behind options can become complex pretty quickly. In this article we'll look at the basics of options and how to explore them using code written in F#.

Looking into contract specifications

Options come in a wide number of variations; some of them are covered briefly below. The contract specifications for options will also depend on their type. Generally, there are some properties that are more or less common to all of them. The general specifications are as follows: Side, Quantity, Strike price, Expiration date, Settlement terms. The contract specifications, or known variables, are used when we valuate options.

European options

European options are the basic form of options from which the other variants derive; American options and exotic options are some examples. We'll stick to European options in this article.

American options

American options are options that may be exercised on any trading day on or before expiry.

Exotic options

Exotic options form a broad category of options that may include complex financial structures and may be combinations of other instruments as well.

Learning about Wiener processes

Wiener processes are closely related to stochastic differential equations and volatility. A Wiener process, or geometric Brownian motion, is defined as follows: the formula describes the change in the stock price, or underlying, with a drift, μ, a volatility, σ, and the Wiener process, Wt. This process is used to model the prices in Black-Scholes. We'll simulate market data using a Brownian motion, or Wiener process, implemented in F# as a sequence. Sequences can be infinite and only the values used are evaluated, which suits our needs. We'll implement a generator function to generate the Wiener process as a sequence, as follows:

// A normally distributed random generator
let normd = new Normal(0.0, 1.0)

let T = 1.0
let N = 500.0
let dt:float = T / N

/// Sequences represent infinite number of elements
// s -> scaling factor
let W s =
    let rec loop x = seq { yield x; yield! loop (x + sqrt(dt)*normd.Sample()*s) }
    loop s;;

Here we use the random function in normd.Sample(). Let's explain the parameters and the theory behind Brownian motion before looking at the implementation. The parameter T is the time used to create a discrete time increment dt. Notice that dt assumes there are N = 500 items in the sequence; this is of course not always the case, but it will do fine here. Next, we use recursion to create the sequence, where we add an increment to the previous value (x + ...), where x corresponds to the previous value in the sequence (xt-1). We can easily generate an arbitrary length of the sequence:

> Seq.take 50 (W 55.00);;
val it : seq<float> = seq [55.0; 56.72907873; 56.96071054; 58.72850048; ...]

Here we create a sequence of length 50.
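Before plotting, it can be worth running a quick sanity check on the generated path. The following lines are only a sketch, assuming the W generator and dt defined above are in scope; the increments of the path should have a mean close to zero and a standard deviation close to sqrt(dt) times the scaling factor:

let path = Seq.take 500 (W 55.00) |> Seq.toArray
let increments = [| for i in 1 .. path.Length - 1 -> path.[i] - path.[i-1] |]
let mean = Array.average increments
let sd = sqrt (Array.averageBy (fun x -> (x - mean) ** 2.0) increments)
// For s = 55.0 and dt = 1.0/500.0 the expected standard deviation is roughly 2.46
printfn "increment mean = %f, std = %f (expected std ~ %f)" mean sd (sqrt(dt) * 55.0)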
Let's plot the sequence to get a better understanding of the process.

A Wiener process generated from the sequence generator above.

Next we'll look at the code to generate the graph in the figure above.

open System
open System.Net
open System.Windows.Forms
open System.Windows.Forms.DataVisualization.Charting
open Microsoft.FSharp.Control.WebExtensions
open MathNet.Numerics.Distributions

// A normally distributed random generator
let normd = new Normal(0.0, 1.0)

// Create chart and form
let chart = new Chart(Dock = DockStyle.Fill)
let area = new ChartArea("Main")
chart.ChartAreas.Add(area)
let mainForm = new Form(Visible = true, TopMost = true, Width = 700, Height = 500)
do mainForm.Text <- "Wiener process in F#"
mainForm.Controls.Add(chart)

// Create series for stock price
let wienerProcess = new Series("process")
do wienerProcess.ChartType <- SeriesChartType.Line
do wienerProcess.BorderWidth <- 2
do wienerProcess.Color <- Drawing.Color.Red
chart.Series.Add(wienerProcess)

let random = new System.Random()
let rnd() = random.NextDouble()
let T = 1.0
let N = 500.0
let dt:float = T / N

/// Sequences represent infinite number of elements
let W s =
    let rec loop x = seq { yield x; yield! loop (x + sqrt(dt)*normd.Sample()*s) }
    loop s;;

do (Seq.take 100 (W 55.00)) |> Seq.iter (wienerProcess.Points.Add >> ignore)

Most of the code will be familiar to you at this stage, but the interesting part is the last line, where we simply feed a chosen number of elements from the sequence into Seq.iter, which plots the values: elegant and efficient.

Learning the Black-Scholes formula

The Black-Scholes formula was developed by Fischer Black and Myron Scholes in the 1970s. The Black-Scholes model is based on a stochastic partial differential equation, and the resulting formula estimates the price of an option. The main idea behind the formula is the delta-neutral portfolio: Black and Scholes constructed a theoretical delta-neutral portfolio to reduce the uncertainty involved. This was a necessary step to be able to arrive at the analytical formula, which we'll cover in this section. Below are the assumptions made by Black-Scholes:

No arbitrage
Possible to borrow money at a constant risk-free interest rate (throughout the holding of the option)
Possible to buy, sell, and short fractional amounts of the underlying asset
No transaction costs
Price of the underlying follows a Brownian motion with constant drift and volatility
No dividends paid by the underlying security

The simplest of the two variants is the one for call options. First the stock price is scaled using the cumulative distribution function with d1 as a parameter. Then the stock price is reduced by the discounted strike price scaled by the cumulative distribution function of d2. In other words, it's the difference between the stock price and the strike, using probability scaling of each and discounting the strike price. The formula for the put is a little more involved, but follows the same principles. The Black-Scholes formula is often separated into parts, where d1 and d2 are the probability factors describing the probability of the stock price being related to the strike price.
The parameters used in the formula above can be summarized as follows: N – The cumulative distribution function T - Time to maturity, expressed in years S – The stock price, or other underlying K – The strike price r – The risk free interest rate σ – The volatility of the underlying Implementing Black-Scholes in F# Now that we've looked at the basics behind the Black-Scholes formula, and the parameters involved, we can implement it ourselves. The cumulative distribution function is implemented here to avoid dependencies and to illustrate that it's quite simple to implement it yourself too. Below is the Black-Scholes implemented in F#. It takes six arguments; the first is a call-put-flag that determines if it's a call or put option. The constants a1 to a5 are the Taylor series coefficients used in the approximation for the numerical implementation. let pow x n = exp(n * log(x)) type PutCallFlag = Put | Call /// Cumulative distribution function let cnd x = let a1 = 0.31938153 let a2 = -0.356563782 let a3 = 1.781477937 let a4 = -1.821255978 let a5 = 1.330274429 let pi = 3.141592654 let l = abs(x) let k = 1.0 / (1.0 + 0.2316419 * l) let w = (1.0-1.0/sqrt(2.0*pi)*exp(-l*l/2.0)*(a1*k+a2*k*k+a3*(pow k 3.0)+a4*(pow k 4.0)+a5*(pow k 5.0))) if x < 0.0 then 1.0 - w else w /// Black-Scholes // call_put_flag: Put | Call // s: stock price // x: strike price of option // t: time to expiration in years // r: risk free interest rate // v: volatility let black_scholes call_put_flag s x t r v = let d1=(log(s / x) + (r+v*v*0.5)*t)/(v*sqrt(t)) let d2=d1-v*sqrt(t) //let res = ref 0.0 match call_put_flag with | Put -> x*exp(-r*t)*cnd(-d2)-s*cnd(-d1) | Call -> s*cnd(d1)-x*exp(-r*t)*cnd(d2) Let's use the black_scholes function using some various numbers for call and put options. Suppose we want to know the price of an option, where the underlying is a stock traded at $58.60 with an annual volatility of 30%. The risk free interest rate is, let's say, 1%. Then we can use our formula, we defined previously to get the theoretical price according the Black-Scholes formula of a call option with 6 month to maturity (0.5 years): > black_scholes Call 58.60 60.0 0.5 0.01 0.3;; val it : float = 4.465202269 And the value for the put option, just by changing the flag to the function: > black_scholes Put 58.60 60.0 0.5 0.01 0.3;; val it : float = 5.565951021 Sometimes it's more convenient to express the time to maturity in number of days, instead of years. Let's introduce a helper function for that purpose. /// Convert the nr of days to years let days_to_years d = (float d) / 365.25 Note the number 365.25 which includes the factor for leap years. This is not necessary in our examples, but used for correctness. We can now use this function instead, when we know the time in days. > days_to_years 30;; val it : float = 0.08213552361 Let's use the same example above, but now with 20 days to maturity. > black_scholes Call 58.60 60.0 (days_to_years 20) 0.01 0.3;; val it : float = 1.065115482 > black_scholes Put 58.60 60.0 (days_to_years 20) 0.01 0.3;; val it : float = 2.432270266 Using Black-Scholes together with Charts Sometimes it's useful to be able to plot the price of an option until expiration. We can use our previously defined functions and vary the time left and plot the values coming out. In this example we'll make a program that outputs the graph seen below. 
Chart showing prices for call and put option as function of time /// Plot price of option as function of time left to maturity #r "System.Windows.Forms.DataVisualization.dll" open System open System.Net open System.Windows.Forms open System.Windows.Forms.DataVisualization.Charting open Microsoft.FSharp.Control.WebExtensions /// Create chart and form let chart = new Chart(Dock = DockStyle.Fill) let area = new ChartArea("Main") chart.ChartAreas.Add(area) chart.Legends.Add(new Legend()) let mainForm = new Form(Visible = true, TopMost = true, Width = 700, Height = 500) do mainForm.Text <- "Option price as a function of time" mainForm.Controls.Add(chart) /// Create series for call option price let optionPriceCall = new Series("Call option price") do optionPriceCall.ChartType <- SeriesChartType.Line do optionPriceCall.BorderWidth <- 2 do optionPriceCall.Color <- Drawing.Color.Red chart.Series.Add(optionPriceCall) /// Create series for put option price let optionPricePut = new Series("Put option price") do optionPricePut.ChartType <- SeriesChartType.Line do optionPricePut.BorderWidth <- 2 do optionPricePut.Color <- Drawing.Color.Blue chart.Series.Add(optionPricePut) /// Calculate and plot call option prices let opc = [for x in [(days_to_years 20)..(-(days_to_years 1))..0.0]do yield black_scholes Call 58.60 60.0 x 0.01 0.3] do opc |> Seq.iter (optionPriceCall.Points.Add >> ignore) /// Calculate and plot put option prices let opp = [for x in [(days_to_years 20)..(-(days_to_years 1))..0.0]do yield black_scholes Put 58.60 60.0 x 0.01 0.3] do opp |> Seq.iter (optionPricePut.Points.Add >> ignore) The code is just a modified version of the code seen in the previous article, with the options parts added. We have two series in this chart, one for call options and one for put options. We also add a legend for each of the series. The last part is the calculation of the prices and the actual plotting. List comprehensions are used for compact code, and the Black-Scholes formula is called for everyday until expiration, where the days are counted down by one day at each step. It's up to you as a reader to modify the code to plot various aspects of the option, such as the option price as a function of an increase in the underlying stock price etc. Introducing the greeks The greeks are partial derivatives of the Black-Scholes formula, with respect to a particular parameter such as time, rate, volatility or stock price. The greeks can be divided into two or more categories, with respect to the order of the derivatives. Below we'll look at the first and second order greeks. First order greeks In this section we'll present the first order greeks using the table below. Name Symbol Description Delta Δ Rate of change of option value with respect to change in the price of the underlying asset. Vega ν Rate of change of option value with respect to change in the volatility of the underlying asset. Referred to as the volatility sensitivity. Theta Θ Rate of change of option value with respect to time. The sensitivity with respect to time will decay as time elapses, phenomenon referred to as the "time decay." Rho ρ Rate of change of option value with respect to the interest rate. Second order greeks In this section we'll present the second order greeks using the table below. Name Symbol Description Gamma Γ Rate of change of delta with respect to change in the price of the underlying asset. Veta - Rate of change in Vega with respect to time. Vera - Rate of change in Rho with respect to volatility. 
Some of the second order greeks are omitted for clarity; we'll not cover these in this book.

Implementing the greeks in F#

Let's implement the greeks: Delta, Gamma, Vega, Theta, and Rho. First we look at the formulas for each greek; in some cases they vary for calls and puts respectively. We need the derivative of the cumulative distribution function, which is in fact the density of the normal distribution with zero mean and a standard deviation of one:

/// Normal distribution
open MathNet.Numerics.Distributions
let normd = new Normal(0.0, 1.0)

Delta

Delta is the rate of change of the option price with respect to change in the price of the underlying asset.

/// Black-Scholes Delta
// call_put_flag: Put | Call
// s: stock price
// x: strike price of option
// t: time to expiration in years
// r: risk free interest rate
// v: volatility
let black_scholes_delta call_put_flag s x t r v =
    let d1=(log(s / x) + (r+v*v*0.5)*t)/(v*sqrt(t))
    match call_put_flag with
    | Put -> cnd(d1) - 1.0
    | Call -> cnd(d1)

Gamma

Gamma is the rate of change of delta with respect to change in the price of the underlying asset. This is the second derivative of the option value with respect to the price of the underlying asset. It measures the acceleration of the price of the option with respect to the underlying price.

/// Black-Scholes Gamma
// s: stock price
// x: strike price of option
// t: time to expiration in years
// r: risk free interest rate
// v: volatility
let black_scholes_gamma s x t r v =
    let d1=(log(s / x) + (r+v*v*0.5)*t)/(v*sqrt(t))
    normd.Density(d1) / (s*v*sqrt(t))

Vega

Vega is the rate of change of the option value with respect to change in the volatility of the underlying asset. It is referred to as the volatility sensitivity.

/// Black-Scholes Vega
// s: stock price
// x: strike price of option
// t: time to expiration in years
// r: risk free interest rate
// v: volatility
let black_scholes_vega s x t r v =
    let d1=(log(s / x) + (r+v*v*0.5)*t)/(v*sqrt(t))
    s*normd.Density(d1)*sqrt(t)

Theta

Theta is the rate of change of the option value with respect to time. The sensitivity with respect to time will decay as time elapses, a phenomenon referred to as "time decay."

/// Black-Scholes Theta
// call_put_flag: Put | Call
// s: stock price
// x: strike price of option
// t: time to expiration in years
// r: risk free interest rate
// v: volatility
let black_scholes_theta call_put_flag s x t r v =
    let d1=(log(s / x) + (r+v*v*0.5)*t)/(v*sqrt(t))
    let d2=d1-v*sqrt(t)
    match call_put_flag with
    | Put -> -(s*normd.Density(d1)*v)/(2.0*sqrt(t))+r*x*exp(-r*t)*cnd(-d2)
    | Call -> -(s*normd.Density(d1)*v)/(2.0*sqrt(t))-r*x*exp(-r*t)*cnd(d2)

Rho

Rho is the rate of change of the option value with respect to the interest rate.

/// Black-Scholes Rho
// call_put_flag: Put | Call
// s: stock price
// x: strike price of option
// t: time to expiration in years
// r: risk free interest rate
// v: volatility
let black_scholes_rho call_put_flag s x t r v =
    let d1=(log(s / x) + (r+v*v*0.5)*t)/(v*sqrt(t))
    let d2=d1-v*sqrt(t)
    match call_put_flag with
    | Put -> -x*t*exp(-r*t)*cnd(-d2)
    | Call -> x*t*exp(-r*t)*cnd(d2)

Investigating the sensitivity of the greeks

Now that we have all the greeks implemented, we'll investigate the sensitivity of some of them and see how they vary when the underlying stock price changes. The figure below is a surface plot with four of the greeks where time and underlying price are changing; it is generated in MATLAB and will not be generated in F#.
We’ll use a 2D version of the graph to study the greeks below. Surface plot of Delta, Gamma, Theta and Rho of a call option. In this section we'll start by plotting the value of Delta for a call option where we vary the price of the underlying. This will result in the following 2D plot: A plot of call option delta versus price of underlying The result in the plot seen in figure above will be generated by the code presented next. We'll reuse most of the code from the example where we looked at the option prices for calls and puts. A slightly modified version is presented here, where the price of the underlying varies from $10.0 to $70.0. /// Plot delta of call option as function of underlying price #r "System.Windows.Forms.DataVisualization.dll" open System open System.Net open System.Windows.Forms open System.Windows.Forms.DataVisualization.Charting open Microsoft.FSharp.Control.WebExtensions /// Create chart and form let chart = new Chart(Dock = DockStyle.Fill) let area = new ChartArea("Main") chart.ChartAreas.Add(area) chart.Legends.Add(new Legend()) let mainForm = new Form(Visible = true, TopMost = true, Width = 700, Height = 500) do mainForm.Text <- "Option delta as a function of underlying price" mainForm.Controls.Add(chart) /// Create series for call option delta let optionDeltaCall = new Series("Call option delta") do optionDeltaCall.ChartType <- SeriesChartType.Line do optionDeltaCall.BorderWidth <- 2 do optionDeltaCall.Color <- Drawing.Color.Red chart.Series.Add(optionDeltaCall) /// Calculate and plot call delta let opc = [for x in [10.0..1.0..70.0] do yield black_scholes_delta Call x 60.0 0.5 0.01 0.3] do opc |> Seq.iter (optionDeltaCall.Points.Add >> ignore) We can extend the code to plot all four greeks, as in the figure with the surface plots, but here in 2D. The result will be a graph like seen in the figure below. Graph showing the for Greeks for a call option with respect to price change (x-axis). Code listing for visualizing the four greeks Below is the code listing for the entire program used to create the graph above. 
#r "System.Windows.Forms.DataVisualization.dll" open System open System.Net open System.Windows.Forms open System.Windows.Forms.DataVisualization.Charting open Microsoft.FSharp.Control.WebExtensions /// Create chart and form let chart = new Chart(Dock = DockStyle.Fill) let area = new ChartArea("Main") chart.ChartAreas.Add(area) chart.Legends.Add(new Legend()) let mainForm = new Form(Visible = true, TopMost = true, Width = 700, Height = 500) do mainForm.Text <- "Option delta as a function of underlying price" mainForm.Controls.Add(chart) We’ll create one series for each greek: /// Create series for call option delta let optionDeltaCall = new Series("Call option delta") do optionDeltaCall.ChartType <- SeriesChartType.Line do optionDeltaCall.BorderWidth <- 2 do optionDeltaCall.Color <- Drawing.Color.Red chart.Series.Add(optionDeltaCall) /// Create series for call option gamma let optionGammaCall = new Series("Call option gamma") do optionGammaCall.ChartType <- SeriesChartType.Line do optionGammaCall.BorderWidth <- 2 do optionGammaCall.Color <- Drawing.Color.Blue chart.Series.Add(optionGammaCall) /// Create series for call option theta let optionThetaCall = new Series("Call option theta") do optionThetaCall.ChartType <- SeriesChartType.Line do optionThetaCall.BorderWidth <- 2 do optionThetaCall.Color <- Drawing.Color.Green chart.Series.Add(optionThetaCall) /// Create series for call option vega let optionVegaCall = new Series("Call option vega") do optionVegaCall.ChartType <- SeriesChartType.Line do optionVegaCall.BorderWidth <- 2 do optionVegaCall.Color <- Drawing.Color.Purple chart.Series.Add(optionVegaCall) Next, we’ll calculate the values to plot for each greek: /// Calculate and plot call delta let opd = [for x in [10.0..1.0..70.0] do yield black_scholes_delta Call x 60.0 0.5 0.01 0.3] do opd |> Seq.iter (optionDeltaCall.Points.Add >> ignore) /// Calculate and plot call gamma let opg = [for x in [10.0..1.0..70.0] do yield black_scholes_gamma x 60.0 0.5 0.01 0.3] do opg |> Seq.iter (optionGammaCall.Points.Add >> ignore) /// Calculate and plot call theta let opt = [for x in [10.0..1.0..70.0] do yield black_scholes_theta Call x 60.0 0.5 0.01 0.3] do opt |> Seq.iter (optionThetaCall.Points.Add >> ignore) /// Calculate and plot call vega let opv = [for x in [10.0..1.0..70.0] do yield black_scholes_vega x 60.0 0.1 0.01 0.3] do opv |> Seq.iter (optionVegaCall.Points.Add >> ignore) Summary In this article, we looked into using F# for investigating different aspects of volatility. Volatility is an interesting dimension of finance where you quickly dive into complex theories and models. Here it's very much helpful to have a powerful tool such as F# and F# Interactive. We've just scratched the surface of options and volatility in this article. There is a lot more to cover, but that's outside the scope of this book. Most of the content here will be used in the trading system. resources for article: further resources on this subject: Working with Windows Phone Controls [article] Simplifying Parallelism Complexity in C# [article] Watching Multiple Threads in C# [article]

Key components and inner working of Impala

Packt
20 Dec 2013
7 min read
(For more resources related to this topic, see here.) Impala Core Components: Here we will discuss following three important components: Impala Daemon Impala Statestore Impala Metadata and Metastore Putting together above components with Hadoop and application or command line interface, we can conceptualize them as below: Impala Execution Architecture: Essentially Impala daemons receives queries from variety of sources and distribute query load to other Impala daemons running on other nodes and while doing so interact with Statestore for node specific update and access Metastore, either stored in centralized database or in local cache. Now to complete the Impala execution we will discuss how Impala interacts with other components i.e. Hive, HDFS and HBase.  Impala working with Apache Hive: We have already discussed earlier about Impala Metastore using the centralized database as Metastore and Hive also uses the same MySQL or PostgreSQL database for same kind of data. Impala provides same SQL like queries interface use in Apache Hive. Because both Impala and Hive share same database as Metastore, Impala can access Hive specific tables definitions if Hive table definition use the same file format, compression codecs and Impala-supported data types in their column values. Apache Hive provides various kinds of file type processing support to Impala. When using other then text file format i.e. RCFile, Avro, SequenceFile the data must be loaded through Hive first and then Impala can query the data from these file formats. Impala can perform read operation on more types of data using SELECT statement than it can perform write operation using INSERT statement. The ANALYZE TABLE statement in Hive generates useful table and column statistics and Impala use these valuable statistics to optimize the queries. Impala working with HDFS: Impala table data is actually regular data files stored in HDFS (Hadoop Distributed File System) and Impala uses HDFS as its primary data storage medium.  As soon as a data file or a collection of files is available in specific folder of new table, Impala reads all of the files regardless of their name and new data is included in files with the name controlled by Impala. HDFS provides data redundancy through replication factor and Impala relies on such redundancy to access data on other datanodes in case it is not available on a specific datanode. We have already learnt earlier that Impala also maintains the information about physical location of the blocks about data files in HDFS,which helps data access in case of node failure. Impala working with HBase: HBase is a distributed, scalable, big data storage system, provides random, real-time read and write access to data stored on HDFS. HBase is a database storage system, sits on top of HDFS however like other traditional database storage system, HBase does not provide built-in SQL support however 3party applications can provide such functionality. To use HBase, first user defines tables in Impala and then maps them to the equivalent HBase tables. Once table relationship is established, users can submit queries into HBase table through Impala. Not only that join operations can be formed including HBase and Impala tables. Impala Security: Impala is designed & developed on run on top of Hadoop. So you must understand the Hadoop security model as well as the security provided in OS where Hadoop is running. 
If Hadoop is running on Linux then as Linux administrator and Hadoop administrator user can harden and tighten the security, which definitely can be taken in account with the security provided by Impala. Impala 1.1 or later uses Sentry Open Source Project to provide detailed authorization framework for Hadoop. Impala 1.1.1 supports auditing capabilities in cluster by creating auditing data, which can be collected from all nodes and then processing for further analysis and insight. Data Visualization using Impala: Visualizing data is as important as processing the data. Human brain perceives pictures fast then reading data in tables and because of it data visualization provides super fast understanding to large amount of data in split seconds. Reports, charts, interactive dashboards and any form of info-graphics are all part of data visualization and provide deeper understanding of results. To connect with 3rd party applications, Cloudera provides ODBC and JDBC connectors. These connectors are installed on machines where 3rd party applications are running and by configuring correct Impala server and port details on those connectors, 3rd party applications connect with Impala and submit those queries and then take results back to application. The result then displayed on 3rd party application, where it is rendered on graphics device for visualization or displayed in table format or processed further depending on application requirement. In this section we will cover few notable 3rd party applications, which can take advantage of Impala super fast query processing and than display amazing graphical results. Tableau and Impala: Tableau Software supporting Impala by providing access to tables on Impala using Impala ODBC connector provided by Tableau. Tableau is one of the most prominent data visualization software technologies in recent days and used by thousands of enterprises daily to get intelligence out of their data. Tableau software is available on Windows OS and an ODBC connector is provided by Cloudera to make this connection a reality. You can visit the link below to download Impala connector for Tableau: http://go.cloudera.com/tableau_connector_download Once Impala connector is installed on a machine where Tableau software is running, and configured correctly, Tableau software is ready to work with Impala. In this image below Tableau is connected to Impala server at port 21000, and then selected a table located at Impala: Once table is selected, particular fields are select and data is displayed in graphical format in various mind-blowing visualizations. The screenshot below displays one example of showing such visualization:   Microsoft Excel and Impala Microsoft Excel is one of the widely adopted data processing application used by business professional worldwide. You can connect Microsoft Excel with Impala using another ODBC connector provided by Simba Technology. Microstrategy and Impala Microstrategy is another big player in data analysis and visualization software and uses ODBC drive to connect with Impala to render amazing looking visualizations. The connectivity model between Microstrategy software and Cloudera Impala is shown as below:    Zoomdata and Impala: Zoomdata is considered to new generation of data user interface by addressing streams of data instead of sets of data. Zoomdata processing engine performs continuous mathematical operations across data streams in real-time to create visualization on multitude of devices. 
The visualization updates itself as the new data arrives and re-computed by Zoomdata. As shown in in the image below, you can see Zoomdata application uses Impala as a source of data, which is configured underneath to use of one the available connectors to connect with Impala: Once connection are made user can see amazing data visualization as shown below: Real-time Query with Impala on Hadoop: Impala is marketed as a product, which can do “Real-time queries on Hadoop” by its developer Cloudera. Impala is open source implementation based on above-mentioned Google Dremel technology, available free for anyone of use. Impala is available as package product, free to use or can be compiled from its source, which can run queries in memory to make them real-time and in some cases depending on type of data, if Parquet file format is used as input data source, it can expedite the query processing to multifold speed.  Real-time query subscription with Impala: Cloudera provides Real-time Query (RTQ) Subscription as an add-on to Cloudera Enterprise subscription. You can still use Impala as free open source product however taking RTQ subscription makes you take advantage of Cloudera paid service to extend its usability and resilience. By accepting RTQ subscription you cannot only have access to Cloudera Technical support, but also you can work with Impala development team to provide ample feedback to shape up the product design and implementation. Summary Thus concludes the discussion on the key components of Impala and their inner working. Resources for Article: Further resources on this subject: Securing the Hadoop Ecosystem [Article] Cloudera Hadoop and HP Vertica [Article] Hadoop and HDInsight in a Heartbeat [Article]

SQL Server Analysis Services – Administering and Monitoring Analysis Services

Packt
20 Dec 2013
19 min read
(For more resources related to this topic, see here.) If your environment has only one or a handful of SSAS instances, they can be managed by the same database administrators managing SQL Server and other database platforms. In large enterprises, there could be hundreds of SSAS instances managed by dedicated SSAS administrators. Regardless of the environment, you should become familiar with the configuration options as well as troubleshooting methodologies. In large enterprises, you might also be required to automate these tasks using the Analysis Management Objects (AMO) code. Analysis Services is a great tool for building business intelligence solutions. However, much like any other software, it does have its fair share of challenges and limitations. Most frequently encountered enterprise business intelligence system goals include quick provision of relevant data to the business users and assuring excellent query performance. If your cubes serve a large, global community of users, you will quickly learn that SSAS is optimized to run a single query as fast as possible. Once users send a multitude of heavy queries in parallel, you can expect to see memory, CPU, and disk-related performance counters to quickly rise, with a corresponding increase in query execution duration which, in turn, worsens user experience. Although you could build aggregations to improve query performance, doing so will lengthen cube processing time, and thereby, delay the delivery of essential data to decision makers. It might also be tempting to consider using ROLAP storage mode in lieu of MOLAP so that processing times are shorter, but MOLAP queries usually outperform ROLAP due to heavy compression rates. Hence, figuring out the right storage mode and appropriate level of aggregations is a great balancing act. If you cannot afford using ROLAP, and query performance is paramount to successful cube implementation, you should consider scaling your solution. You have two options for scaling, given as follows: Scaling up: This option means purchasing servers with more memory, more CPU cores, and faster disk drives Scaling out: This option means purchasing several servers of approximately the same capacity and distributing the querying workload across multiple servers using a load balancing tool SSAS lends itself best to the second option—scaling out. Later in this article you will learn how to separate processing and querying activities and how to ensure that all servers in the querying pool have the same data. SSAS instance configuration options All Analysis Services configuration options are available in the msmdsrv.ini file found in the config folder under the SSAS installation directory. Instance administrators can also modify some, but not all configuration properties, using SQL Server Management Studio (SSMS). SSAS has a multitude of properties that are undocumented—this normally means that such properties haven't undergone thorough testing, even by the software's developers. Hence, if you don't know exactly what the configuration setting does, it's best to leave the setting at default value. Even if you want to test various properties on a sandbox server, make a copy of the configuration file prior to applying any changes. How to do it... To modify the SSAS instance settings using the configuration file, perform the following steps: Navigate to the config folder within your Analysis Services installation directory. By default, this will be C:\Program Files\Microsoft SQL Server\MSAS11.instance_name\OLAP\Config. 
Open the msmdsrv.ini file using Notepad or another text editor of your choice. The file is in the XML format, so every property is enclosed in opening and closing tags. Search for the property of interest, modify its value as desired, and save the changes. For example, in order to change the upper limit of the processing worker threads, you would look for the <ThreadPool><Process><MaxThreads> tag sequence and set the values as shown in the following excerpt from the configuration file: <Process>       <MinThreads>0</MinThreads>       <MaxThreads>250</MaxThreads>      <PriorityRatio>2</PriorityRatio>       <Concurrency>2</Concurrency>       <StackSizeKB>0</StackSizeKB>       <GroupAffinity/>     </Process> To change the configuration using SSMS, perform the following steps: Connect to the SSAS instance using the instance administrator account and choose Properties. If your account does not have sufficient permissions, you will get an error that only administrators can edit server properties. Change the desired properties by altering the Value column on the General page of the resulting dialog, as shown in the following screenshot: Advanced properties are hidden by default. You must check the Show Advanced (All) Properties box to see advanced properties. You will not see all the properties in SSMS even after checking this box. The only way to edit some properties is by editing msmdsrv.ini as previously discussed. Make a note of the Reset Default button in the bottom-right corner. This button comes in handy if you've forgotten what the configuration values were before you changed them and want to revert to the default settings. The default values are shown in the dialog box, which can provide guidance as to which properties have been altered. Some configuration settings require restarting the SSAS instance prior to being executed. If this is the case, the Restart column will have a value of Yes. Once you're happy with your changes, click on OK and restart the instance if necessary. You can restart SSAS using the Services.msc applet from the command line using the NET STOP / NET START commands, or directly in SSMS by choosing the Restart option after right-clicking on the instance. How it works... Discussing every SSAS property would make this article extremely lengthy; doing so is well beyond the scope of the book. Instead, in this section, I will summarize the most frequently used properties. Often, synchronization has to copy large partition datafiles and aggregation files. If the timeout value is exceeded, synchronization fails. Increase the value of the <Network><Listener><ServerSendTimeout> and <Network><Listener><ServerReceiveTimeout> properties to allow a longer time span for copying each file. By default, SSAS can use a lazy thread to rebuild missing indexes and aggregations after you process partition data. If the <OLAP><LazyProcessing><Enabled> property is set to 0, the lazy thread is not used for building missing indexes—you must use an explicit processing command instead. The <OLAP><LazyProcessing><MaxCPUUsage> property throttles the maximum CPU that could be used by the lazy thread. If efficient data delivery is your topmost priority, you can exploit the ProcessData option instead of ProcessFull. To build aggregations after the data is loaded, you must set the partition's ProcessingMode property to LazyAggregations. The SSAS formula engine is single threaded, so queries that perform heavy calculations will only use one CPU core, even on a multiCPU computer. 
The storage engine is multithreaded; hence, queries that read many partitions will require many CPU cycles. If you expect storage engine heavy queries, you should lower the CPU usage threshold for LazyAggregations. By default, Analysis Services records subcubes requested for every 10th query in the query log table. If you'd like to design aggregations based on query logs, you should change the <Log><QueryLog><QueryLogSampling> property value to 1 so that the SSAS logs subcube requests for every query. SSAS can use its own memory manager or the Windows memory manager. If your SSAS instance consistently becomes unresponsive, you could try using the Windows memory manager. Set <Memory><MemoryHeapType> to 2 and <Memory><HeapTypeForObjects> to 0. The Analysis Services memory manager values are 1 for both the properties. You must restart the SSAS service for the changes to these properties to take effect. The <Memory><PreAllocate> property specifies the percentage of total memory to be reserved at SSAS startup. SSAS normally allocates memory dynamically as it is required by queries and processing jobs. In some cases, you can achieve performance improvement by allocating a portion of the memory when the SSAS service starts. Setting this value will increase the time required to start the service. The memory will not be released back to the operating system until you stop the SSAS service. You must restart the SSAS service for changes to this property to take effect. The <Log><FlightRecorder><FileSizeMB>and <Log><FlightRecorder><LogDurationSec> properties control the size and age of the FlightRecorder trace file before it is recycled. You can supply your own trace definition file to include the trace events and columns you wish to monitor using the <Log><FlightRecorder><TraceDefinitionFile> property. If FlightRecorder collects useful trace events, it can be an invaluable troubleshooting tool. By default, the file is only allowed to grow to 10 MB or 60 minutes. Long processing jobs can take up much more space, and their duration could be much longer than 60 minutes. Hence, you should adjust the settings as necessary for your monitoring needs. You should also adjust the trace events and columns to be captured by FlightRecorder. You should consider adjusting the duration to cover three days (in case the issue you are researching happens over a weekend). The <Memory><LowMemoryLimit> property controls the point—amount of memory used by SSAS—at which the cleaner thread becomes actively engaged in reclaiming memory from existing jobs. Each SSAS command (query, processing, backup, synchronization, and so on) is associated with jobs that run on threads and use system resources. We can lower the value of this setting to run more jobs in parallel (though the performance of each job could suffer). Two properties control the maximum amount of memory that a SSAS instance could use. Once memory usage reaches the value specified by <Memory><TotalMemoryLimit>, the cleaner thread becomes particularly aggressive at reclaiming memory. The <Memory><HardMemoryLimit> property specifies the absolute memory limit—SSAS will not use memory above this limit. These properties are useful if you have SSAS and other applications installed on the same server computer. You should reserve some memory for other applications and the operating system as well. When HardMemoryLimit is reached, SSAS will disconnect the active sessions, advising that the operation was cancelled due to memory pressure. 
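As a rough illustration only (the numeric values below are placeholders, not recommendations), these three limits sit under the <Memory> element of msmdsrv.ini, in the same nested-tag form as the thread pool settings shown earlier:

<Memory>
  <LowMemoryLimit>65</LowMemoryLimit>
  <TotalMemoryLimit>80</TotalMemoryLimit>
  <HardMemoryLimit>90</HardMemoryLimit>
</Memory>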
All memory settings are expressed in percentages if the values are less than or equal to 100. Values above 100 are interpreted as kilobytes. All memory configuration changes require restart of the SSAS service to take effect. In the prior releases of Analysis Services, you could only specify the minimum and maximum number of threads used for queries and processing jobs. With SSAS 2012, you can also specify the limits for the input/output job threads using the <ThreadPool><IOProcess> property. The <Process><IndexBuildThreshold> property governs the minimum number of rows within a partition for which SSAS will build indexes. The default value is 4096. SSAS decides which partitions it needs to scan for each query based on the partition index files. If the partition does not have indexes, it will be scanned for all the queries. Normally, SSAS can read small partitions without greatly affecting query performance. But if you have many small partitions, you should lower the threshold to ensure each partition has indexes. The <Process><BufferRecordLimit> and <Process><BufferMemoryLimit> properties specify the number of records for each memory buffer and the maximum percentage of memory that can be used by a memory buffer. Lower the value of these properties to process more partitions in parallel. You should monitor processing using the SQL Profiler to see if some partitions included in the processing batch are being processed while the others are in waiting. The <ExternalConnectionTimeout> and <ExternalCommandTimeout> properties control how long an SSAS command should wait for connecting to a relational database or how long SSAS should wait to execute the relational query before reporting timeout. Depending on the relational source, it might take longer than 60 seconds (that is, the default value) to connect. If you encounter processing errors without being able to connect to the relational source, you should increase the ExternalConnectionTimeout value. It could also take a long time to execute a query; by default, the processing query will timeout after one hour. Adjust the value as needed to prevent processing failures. The contents of the <AllowedBrowsingFolders> property define the drives and directories that are visible when creating databases, collecting backups, and so on. You can specify multiple items separated using the pipe (|) character. The <ForceCommitTimeout> property defines how long a processing job's commit operation should wait prior to cancelling any queries/jobs which may interfere with processing or synchronization. A long running query can block synchronization or processing from committing its transaction. You can adjust the value of this property from its default value of 30 seconds to ensure that processing and queries don't step on each other. The <Port> property specifies the port number for the SSAS instance. You can use the hostname followed by a colon (:) and a port number for connecting to the SSAS instance in lieu of the instance name. Be careful not to supply the port number used by another application; if you do so, the SSAS service won't start. The <ServerTimeout> property specifies the number of milliseconds after which a query will timeout. The default value is 1 hour, which could be too long for analytical queries. If the query runs for an hour, using up system resources, it could render the instance unusable by any other connection. You can also define a query timeout value in the client application's connection strings. 
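As an illustration, a client-side query timeout can be embedded in the connection string. The exact property name can vary by provider and version, so treat the following as a sketch rather than a definitive syntax; the server and database names are placeholders and Timeout is expressed in seconds:

Provider=MSOLAP;Data Source=MyServer\MyInstance;Initial Catalog=MyDatabase;Timeout=300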
Client setting overrides the server-level property. There's more... There are many other properties you can set to alter SSAS instance behavior. For additional information on configuration properties, please refer to product documentation at http://technet.microsoft.com/en-us/library/ms174556.aspx. Creating and dropping databases Only SSAS instance administrators are permitted to create, drop, restore, detach, attach, and synchronize databases. This recipe teaches administrators how to create and drop databases. Getting ready Launch SSMS and connect to your Analysis Services instance as an administrator. If you're not certain that you have administrative properties to the instance, right-click on the SSAS instance and choose Properties. If you can view the instance's properties, you are an administrator; otherwise, you will get an error indicating that only instance administrators can view and alter properties. How to do it... To create a database, perform the following steps: Right-click on the Databases folder and choose New Database. Doing so launches the New Database dialog shown in the following screenshot. Specify a descriptive name for the database, for example, Analysis_Services_Administration. Note that the database name can contain spaces. Each object has a name as well as an identifier. The identifier value is set to the object's original name and cannot be changed without dropping and recreating the database; hence, it is important to come up with a descriptive name from the very beginning. You cannot create more than one database with the same name on any SSAS instance. Specify the storage location for the database. By default, the database will be stored under the \OLAP\DATA folder of your SSAS installation directory. The only compelling reason to change the default is if your data drive is running out of disk space and cannot support the new database's storage requirements. Specify the impersonation setting for the database. You could also specify the impersonation property for each data source. Alternatively, each data source can inherit the DataSourceImpersonationInfo property from the database-level setting. You have four choices as follows: Specific user name (must be a domain user) and password: This is the most secure option but requires updating the password if the user changes the password Analysis Services service account Credentials of the current user: This option is specifically for data mining Default: This option is the same as using the service account option Specify an optional description for the database. As with majority of other SSMS dialogs, you can script the XMLA command you are about to execute by clicking on the Script button. To drop an existing database, perform the following steps: Expand the Databases folder on the SSAS instance, right-click on the database, and choose Delete. The Delete objects dialog allows you to ignore errors; however, it is not applicable to databases. You can script the XMLA command if you wish to review it first. An alternative way of scripting the DELETE command is to right-click on the database and navigate to Script database as | Delete To | New query window. Monitoring SSAS instance using Activity Viewer Unlike other database systems, Analysis Services has no system databases. However, administrators still need to check the activity on the server, ensure that cubes are available and can be queried, and there is no blocking. 
You can exploit a tool named Analysis Services Activity Viewer 2008 to monitor SSAS Versions 2008 and later, including SSAS 2012. This tool is owned and maintained by the SSAS community and can be downloaded from www.codeplex.com. Activity Viewer allows viewing active and dormant sessions, current XMLA and MDX queries, locks, as well as CPU and I/O usage by each connection. Additionally, you can define rules to raise alerts when a particular condition is met. How to do it... To monitor an SSAS instance using Activity Viewer, perform the following steps: Launch the application by double-clicking on ActivityViewer.exe. Click on the Add New Connection button on the Overview tab. Specify the hostname and instance name or the hostname and port number for the SSAS instance and then click on OK. For each SSAS instance you connect to, Activity Viewer adds a new tab. Click on the tab for your SSAS instance. Here, you will see several pages as shown in the following screenshot: Alerts: This page shows any sessions that met the condition found in the Rules page. Users: This page displays one row for each user as well as the number of sessions, total memory, CPU, and I/O usage. Active Sessions: This page displays each session that is actively running an MDX, Data Mining Extensions (DMX), or XMLA query. This page allows you to cancel a specific session by clicking on the Cancel Session button. Current Queries: This page displays the actual command's text, number of kilobytes read and written by the command, and the amount of  CPU time used by the command. This page allows you to cancel a specific query by clicking on the Cancel Query button. Dormant Sessions: This page displays sessions that have a connection to the SSAS instance but are not currently running any queries. You can also disconnect a dormant session by clicking on the Cancel Session button. CPU: This page allows you to review the CPU time used by the session as well as the last command executed on the session. I/O: This page displays the number of reads and writes as well as the kilobytes read and written by each session. Objects: This page shows the CPU time and number of reads affecting each dimension and partition. This page also shows the full path to the object's parent; this is useful if you have the same naming convention for partitions in multiple measure groups. Not only do you see the partition name, but also the full path to the partition's measure group. This page also shows the number of aggregation hits for each partition. If you find that a partition is frequently queried and requires many reads, you should consider building aggregations for it. Locks: This page displays the locks currently in place, whether already granted or waiting. Be sure to check the Lock Status column—the value of 0 indicates that the lock request is currently blocked. Rules: This page allows defining conditions that will result in an alert. For example, if the session is idle for over 30 minutes or if an MDX query takes over 30 minutes, you should get alerted. How it works... Activity Viewer monitors Analysis Services using Dynamic Management Views (DMV). In fact, capturing queries executed by Activity Viewer using SQL Server Profiler is a good way of familiarizing yourself with SSAS DMV's. 
For example, the Current Queries page checks the $system.DISCOVER_COMMANDS DMV for any actively executing commands by running the following query: SELECT SESSION_SPID,COMMAND_CPU_TIME_MS,COMMAND_ELAPSED_TIME_MS,   COMMAND_READ_KB,COMMAND_WRITE_KB, COMMAND_TEXT FROM $system.DISCOVER_COMMANDS WHERE COMMAND_ELAPSED_TIME_MS > 0 ORDER BY COMMAND_CPU_TIME_MS DESC The Active Sessions page checks the $system.DISCOVER_SESSIONS DMV with the session status set to 1 using the following query: SELECT SESSION_SPID,SESSION_USER_NAME, SESSION_START_TIME,   SESSION_ELAPSED_TIME_MS,SESSION_CPU_TIME_MS, SESSION_ID FROM $SYSTEM.DISCOVER_SESSIONS WHERE SESSION_STATUS = 1 ORDER BY SESSION_USER_NAME DESC The Dormant sessions page runs a very similar query to that of the Active Sessions page, except it checks for sessions with SESSION_STATUS=0—sessions that are currently not running any queries. The result set is also limited to top 10 sessions based on idle time measured in milliseconds. The Locks page examines all the columns of the $system.DISCOVER_LOCKS DMV to find all requested locks as well as lock creation time, lock type, and lock status. As you have already learned, the lock status of 0 indicates that the request is blocked, whereas the lock status of 1 means that the request has been granted. Analysis Services blocking can be caused by conflicting operations that attempt to query and modify objects. For example, a long running query can block a processing or synchronization job from completion because processing will change the data values. Similarly, a command altering the database structure will block queries. The database administrator or instance administrator can explicitly issue the LOCK XMLA command as well as the BEGIN TRANSACTION command. Other operations request locks implicitly. The following table documents most frequently encountered Analysis Services lock types: Lock type identifier Description Acquired for 2 Read lock Processing to read metadata. 4 Write lock Processing to write data after it is read from relational sources. 8 Commit shared During the processing, restore or synchronization commands. 16 Commit exclusive Committing the processing, restore, or synchronization transaction when existing files are replaced by new files.  
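If you want to explore these DMVs yourself, you can run similar queries from an MDX query window in SSMS. The following sketch approximates the dormant-session check described above; it assumes that the SESSION_IDLE_TIME_MS and SESSION_LAST_COMMAND columns are available in $SYSTEM.DISCOVER_SESSIONS and that your instance's DMV query syntax accepts TOP, so treat it as a starting point rather than the exact query issued by Activity Viewer:

SELECT TOP 10 SESSION_SPID, SESSION_USER_NAME, SESSION_LAST_COMMAND,
  SESSION_IDLE_TIME_MS
FROM $SYSTEM.DISCOVER_SESSIONS
WHERE SESSION_STATUS = 0
ORDER BY SESSION_IDLE_TIME_MS DESC

Running a query like this against an idle development instance is a quick way to confirm which connections are holding sessions open without doing any work.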

Sharing Your BI Reports and Dashboards

Packt
19 Dec 2013
4 min read
(For more resources related to this topic, see here.)

The final objective of the information in BI reports and dashboards is to detect cause-and-effect business behavior and trends, and to trigger actions in response. These actions are supported by visual information, via scorecards and dashboards. This process requires interaction with several people. MicroStrategy includes the functionality to share our reports, scorecards, and dashboards, regardless of where those people are located.

Reaching your audience

MicroStrategy offers the option to share our reports via different channels that leverage the latest social technologies already present in the marketplace; that is, MicroStrategy integrates with Twitter and Facebook. Sharing in this way avoids any related costs and maintains the design premise of the do-it-yourself approach, without any help from specialized IT personnel.

Main menu

The main menu of MicroStrategy shows a column named Status. When we click on that column, as shown in the following screenshot, the Share option appears:

The Share button

The other option is the Share button within our reports, that is, the view that we want to share. Select the Share button located at the bottom of the screen, as shown in the following screenshot:

The share options are the same, regardless of where you activate the option; the various alternate menus are shown in the following screenshot:

E-mail sharing

When selecting the e-mail option from the Scorecards-Dashboards model, the system will ask you for the e-mail program that you want to use in order to send an e-mail; in our case, we select Outlook. MicroStrategy automatically prepares an e-mail with a link to share. You can modify the text and select the recipients of the e-mail, as shown in the following screenshot:

The recipients of the e-mail click on the URL included in the e-mail and can then analyze the report in a read-only mode with only the Filters panel enabled. The following screenshot shows how the user will review the report; the user is not allowed to make any modifications. This option does not require a MicroStrategy platform user account. When a user clicks on the link, he is able to edit the filters and perform his analyses, as well as switch to any available layout, in our case, scorecards and dashboards. As a result, any visualization object can be maximized and minimized for better analysis, as shown in the following screenshot:

In this option, the report can be visualized in fullscreen mode by clicking on the fullscreen button located at the top-right corner of the screen. In this sharing mode, the user is able to download the information in Excel and PDF formats for each visualization object. For instance, suppose you need all the data included in the grid for the stores in region 1 opened in the year 2012. Perform the following steps:

In the browser, open the URL that is generated when you select the e-mail share option.
Select the ScoreCard tab.
In the Open Year filter, type 2012 and in the Region filter, type 1.
Now, maximize the grid.
Two icons will appear in the top-left corner of the screen: one for exporting the data to Excel and the other for exporting it to PDF for each visualization object, as shown in the following screenshot: Please keep in mind that these two export options only apply to a specific visualization object; it is not possible to export the complete report from this functionality that is offered to the consumer. Summary In this article, we learned how to share our scorecards and dashboards via several channels, such as e-mails, social networks (Twitter and Facebook), and blogs or corporate intranet sites. Resources for Article: Further resources on this subject: Participating in a business process (Intermediate) [Article] Self-service Business Intelligence, Creating Value from Data [Article] Exploring Financial Reporting and Analysis [Article]

Downloading and Setting Up ElasticSearch

Packt
19 Dec 2013
8 min read
(For more resources related to this topic, see here.)

Downloading and installing ElasticSearch

ElasticSearch has an active community and the release cycles are very fast. Because ElasticSearch depends on many common Java libraries (Lucene, Guice, and Jackson are the most famous ones), the ElasticSearch community tries to keep them updated and fixes bugs that are discovered in them and in the ElasticSearch core. If possible, the best practice is to use the latest available release (usually the most stable one).

Getting ready

A supported ElasticSearch operating system (Linux/Mac OS X/Windows) with Java JVM 1.6 or above installed is required. A web browser is required to download the ElasticSearch binary release.

How to do it...

For downloading and installing an ElasticSearch server, we will perform the following steps:

Download ElasticSearch from the Web. The latest version is always downloadable from http://www.elasticsearch.org/download/. There are versions available for different operating systems:
elasticsearch-{version-number}.zip: This is for both Linux/Mac OS X and Windows operating systems
elasticsearch-{version-number}.tar.gz: This is for Linux/Mac
elasticsearch-{version-number}.deb: This is for Debian-based Linux distributions (this also covers the Ubuntu family)
These packages contain everything needed to start ElasticSearch. At the time of writing this book, the latest and most stable version of ElasticSearch was 0.90.7. To check whether this is still the latest available, please visit http://www.elasticsearch.org/download/.
Extract the binary content. After downloading the correct release for your platform, the installation consists of expanding the archive into a working directory. Choose a working directory whose path is free of character-set problems and is not too long, to prevent problems when ElasticSearch creates its directories to store the index data. For the Windows platform, a good directory could be c:\es; on Unix and Mac OS X, /opt/es.
To run ElasticSearch, you need a Java Virtual Machine 1.6 or above installed. For better performance, I suggest you use the Sun/Oracle 1.7 version.
We start ElasticSearch to check if everything is working. To start your ElasticSearch server, just go into the install directory and type:
# bin/elasticsearch -f (for Linux and Mac OS X)
or
# bin\elasticsearch.bat -f (for Windows)
Now your server should start, as shown in the following screenshot:

How it works...

The ElasticSearch package contains three directories:

bin: This contains the scripts to start and manage ElasticSearch. The most important ones are:
elasticsearch(.bat): This is the main script to start ElasticSearch
plugin(.bat): This is a script to manage plugins
config: This contains the ElasticSearch configs. The most important ones are:
elasticsearch.yml: This is the main config file for ElasticSearch
logging.yml: This is the logging config file
lib: This contains all the libraries required to run ElasticSearch

There's more...

During ElasticSearch startup a lot of events happen:

A node name is chosen automatically (that is, Akenaten in the example) if not provided in elasticsearch.yml.
A node name hash is generated for this node (that is, whqVp_4zQGCgMvJ1CXhcWQ).
If there are plugins (internal or sites), they are loaded. In the previous example there are no plugins.
Automatically if not configured, ElasticSearch binds on all addresses available two ports: 9300 internal, intra node communication, used for discovering other nodes 9200 HTTP REST API port After starting, if indices are available, they are checked and put in online mode to be used. There are more events which are fired during ElasticSearch startup. We'll see them in detail in other recipes. Networking setupM Correctly setting up a networking is very important for your node and cluster. As there are a lot of different install scenarios and networking issues in this recipe we will cover two kinds of networking setups: Standard installation with autodiscovery working configuration Forced IP configuration; used if it is not possible to use autodiscovery Getting ready You need a working ElasticSearch installation and to know your current networking configuration (that is, IP). How to do it... For configuring networking, we will perform the steps as follows: Open the ElasticSearch configuration file with your favorite text editor. Using the standard ElasticSearch configuration file (config/elasticsearch. yml), your node is configured to bind on all your machine interfaces and does autodiscovery broadcasting events, that means it sends "signals" to every machine in the current LAN and waits for a response. If a node responds to it, they can join in a cluster. If another node is available in the same LAN, they join in the cluster. Only nodes with the same ElasticSearch version and same cluster name (cluster.name option in elasticsearch.yml) can join each other. To customize the network preferences, you need to change some parameters in the elasticsearch.yml file, such as: cluster.name: elasticsearch node.name: "My wonderful server" network.host: 192.168.0.1 discovery.zen.ping.unicast.hosts: ["192.168.0.2","192.168.0.3[9300- 9400]"] This configuration sets the cluster name to elasticsearch, the node name, the network address, and it tries to bind the node to the address given in the discovery section. We can check the configuration during node loading. We can now start the server and check if the network is configured: [INFO ][node ] [Aparo] version[0.90.3], pid[16792], build[5c38d60/2013-08-06T13:18:31Z] [INFO ][node ] [Aparo] initializing ... [INFO ][plugins ] [Aparo] loaded [transport-thrift, rivertwitter, mapper-attachments, lang-python, jdbc-river, langjavascript], sites [bigdesk, head] [INFO ][node ] [Aparo] initialized [INFO ][node ] [Aparo] starting ... [INFO ][transport ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.5:9300]} [INFO ][cluster.service] [Aparo] new_master [Angela Cairn] [yJcbdaPTSgS7ATQszgpSow][inet[/192.168.1.5:9300]], reason: zendisco- join (elected_as_master) [INFO ][discovery ] [Aparo] elasticsearch/ yJcbdaPTSgS7ATQszgpSow [INFO ][http ] [Aparo] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.5:9200]} [INFO ][node ] [Aparo] started In this case, we have: The transport bounds to 0:0:0:0:0:0:0:0:9300 and 192.168.1.5:9300 The REST HTTP interface bounds to 0:0:0:0:0:0:0:0:9200 and 192.168.1.5:9200 How it works... It works as follows: cluster.name: This sets up the name of the cluster (only nodes with the same name can join). node.name: If this is not defined, it is automatically generated by ElasticSearch. It allows defining a name for the node. If you have a lot of nodes on different machines, it is useful to set this name meaningful to easily locate it. 
Using a valid name is easier to remember than a generated name, such as whqVp_4zQGCgMvJ1CXhcWQ network.host: This defines the IP of your machine to be used in binding the node. If your server is on different LANs or you want to limit the bind on only a LAN, you must set this value with your server IP. discovery.zen.ping.unicast.hosts: This allows you to define a list of hosts (with ports or port range) to be used to discover other nodes to join the cluster. This setting allows using the node in LAN where broadcasting is not allowed or autodiscovery is not working (that is, packet filtering routers). The referred port is the transport one, usually 9300. The addresses of the hosts list can be a mix of: host name, that is, myhost1 IP address, that is, 192.168.1.2 IP address or host name with the port, that is, myhost1:9300 and 192.168.1.2:9300 IP address or host name with a range of ports, that is, myhost1:[9300-9400], 192.168.1.2:[9300-9400] Setting up a node ElasticSearch allows you to customize several parameters in an installation. In this recipe, we'll see the most used ones to define where to store our data and to improve general performances. Getting ready You need a working ElasticSearch installation. How to do it... The steps required for setting up a simple node are as follows: Open the config/elasticsearch.yml file with an editor of your choice. Set up the directories that store your server data: path.conf: /opt/data/es/conf path.data: /opt/data/es/data1,/opt2/data/data2 path.work: /opt/data/work path.logs: /opt/data/logs path.plugins: /opt/data/plugins Set up parameters to control the standard index creation. These parameters are: index.number_of_shards: 5 index.number_of_replicas: 1 How it works... The path.conf file defines the directory that contains your configuration: mainly elasticsearch.yml and logging.yml. The default location is $ES_HOME/config with ES_HOME your install directory. It's useful to set up the config directory outside your application directory so you don't need to copy configuration files every time you update the version or change the ElasticSearch installation directory. The path.data file is the most important one: it allows defining one or more directories where you store index data. When you define more than one directory, they are managed similarly to a RAID 0 configuration (the total space is the sum of all the data directory entry points), favoring locations with the most free space. The path.work file is a location where ElasticSearch puts temporary files. The path.log file is where log files are put. The control how to log is managed in logging.yml. The path.plugins file allows overriding the plugins path (default $ES_HOME/plugins). It's useful to put "system wide" plugins. The main parameters used to control the index and shard is index.number_of_shards, that controls the standard number of shards for a new created index, and index.number_ of_replicas that controls the initial number of replicas. There's more... There are a lot of other parameters that can be used to customize your ElasticSearch installation and new ones are added with new releases. The most important ones are described in this recipe and in the next one.
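A quick way to confirm that these settings have taken effect is to query the node over its REST interface. The following commands are a minimal sketch: they assume the node is listening on the default HTTP port 9200 on localhost, and test-index is just a hypothetical index name used for illustration; adjust the host, port, and index name to your setup:

# check that the node is up and see its basic information
curl -XGET 'http://localhost:9200/?pretty=true'

# create an empty index using the default index settings
curl -XPUT 'http://localhost:9200/test-index/'

# inspect the number of shards and replicas applied to the new index
curl -XGET 'http://localhost:9200/test-index/_settings?pretty=true'

If the index.number_of_shards and index.number_of_replicas values you configured show up in the last response, the node is reading your elasticsearch.yml as expected.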

Applied Modeling

Packt
19 Dec 2013
15 min read
(For more resources related to this topic, see here.) This article examines the foundations of tabular modeling (at least from the modelers point of view). In order to provide a familiar model which can be used later, this article will progressively build the model. Each recipe is intended to demonstrate a particular technique; however, they need to be followed in order so that the final model is completed. Grouping by binning and sorting with ranks Often, we want to provide descriptive information in our data, based on values that are derived from downstream activities. For example, this could arise if we wish to include a field in the Customer table that shows the value of the previous sales for that customer or a bin grouping that defines the customer into banded sales. This value can then be used to rank each customer according to their relative importance in relation to other customers. This type of value adding activity is usually troublesome and time intensive in a traditional data mart design as the customer data can be fully updated only once the sales data has been loaded. While this process may seem relatively straightforward, it is a recursive problem as the customer data must be loaded before the sales data (since customers must exist before the sales can be assigned to them), but the full view of the customer is reliant on loading the sales data. In a standard dimensional (star schema) modeling approach, including this type of information for dimensions requires a three-step process: The dimension (reseller customer data) is updated for known fields and attributes. This load excludes information that is derived (such as sales). Then, the sales data (referred to as fact data) is loaded in data warehouse. This ensures that the data mart is in a current state and all the sales transaction data is up-to-date. Information relating to any new and changed stores can be loaded correctly. The dimension data which relies on other fact data is updated based on the current state of the data mart. Since the tabular model is less confined by the traditional star schema requirements of the fact and dimension tables (in fact, the tabular model does not explicitly identify facts or dimensions), the inclusion and processing of these descriptive attributes can be built directly into the model. The calculation of a simple measure such as a historic sales value may be included in OLAP modeling through calculated columns in the data source view. However, this is restrictive and limited to simple calculations (such as total sales or n period sales). Other manipulations (such as ranking and binning) are a lot more flexible in tabular modeling (as we will see). This recipe examines how to manipulate a dimensional table in order to provide a richer end user experience. Specifically, we will do the following: Introduce a calculated field to calculate the historical sales for the customer Determine the rank of the customer based on that field Create a discretization bin for the customer based on their sales Create an ordered hierarchy based on their discretization bins Getting ready Continuing with the scenario that was discussed in the Introduction section of the article, the purpose of this article is to identify each reseller's (customer's) historic sales and then rank them accordingly. We then discretize the Resellers table (customer) based on this. 
This problem is further complicated by the consideration that a sale occurs in the country of origin (the sales data in the Reseller Sales table will appear in any currency). In order to provide a concise recipe, we break the entire process into two distinct steps: Conversion of sales (which manages the ranking of Resellers based on a unified sales value) Classification of sales (which manages the manipulation of sales values based on discretized bins to format those bins) How to do it… Firstly, we need to provide a common measure to compare the sales value of Resellers. Convert sales to a uniform currency using the following steps: Open a new workbook and launch the PowerPivot Window. Import the text files Reseller Sales.txt, Currency Conversion.txt, and Resellers.txt. The source data folder for this article includes the base schema.ini file that correctly transforms all data upon import. When importing the data, you should be prompted that the schema.ini file exists and will override the import settings. If this does not occur, ensure that the schema.ini file exists in the same directory as your source data. The prompt should look like the following screenshot: Although it is not mandatory, it is recommended that connection managers are labeled according to a standard. In this model, I have used the convention type_table_name where type refers to the connection type (.txt) and table_name refers to the name of the table. Connections can be edited using the Existing Connections button in the Design tab. Create a relationship between the Customer ID field in the Resellers table and the Customer ID field in the Reseller Sales table. Add a new field (also called a calculated column) in the Resellers Sales table to show the gross value of the sale amount in USD. Add usd_gross_sales and hide it from client tools using the following code: = [Quantity Ordered] *[Price Per Unit] *LOOKUPVALUE ( 'Currency Conversion'[AVG Rate] ,'Currency Conversion'[Date] ,[Order dt] ,'Currency Conversion'[Currency ID] ,[Currency ID] ) Add a new measure in the Resellers Sales table to show sales (in USD). Add USD Gross Sales as: USD Gross Sales := SUM ( [usd_gross_sales] ) Add a new calculated column to the Resellers table to show the USD Sales Total value. The formula for the field should be: = 'Reseller Sales' [USD Gross Sales] Add a Sales Rank field in the Resellers table to show the order for each resellers USD Sales Total. The formula for Sales Rank is: =RANKX(Resellers, [USD Sales Total]) Hide the USD Sales Total and Sales Rank fields from client tools. Now that all the entries in the Resellers table show their sales value in a uniform currency, we can determine a grouping for the Reseller table. In this case, we are going to group them into 100,000 dollar bands. Add a new field to show each value in the USD Sales Total column of Resellers rounded down to the nearest 100,000 dollars. Hide it from client tools. Now, add Round Down Amount as: =ROUNDDOWN([USD Sales Total],-5) Add a new field to show the rank of Round Down Amount in descending order and hide it from client tools. Add Round Down Order as: =RANKX(Resellers,[Round Down Amount],,FALSE(), DENSE) Add a new field in the Resellers table to show the 100,000 dollars group that the reseller belongs to. Since we know what the lower bound of the sales bin is, we can also infer that the upper bin is the rounded up 100,000 dollars sales group. 
Add Sales Group as follows: =IF([Round Down Amount]=0 || ISBLANK([Round Down Amount]) , "Sales Under 100K" , FORMAT([Round Down Amount], "$#,K") & " - " & FORMAT(ROUNDUP([USD Sales Total],-5),"$#,K") ) Set the Sort By Column of the Sales Group field to the Round Down Order column. Note that the Round Down Order column should display in a descending order (that is, entries in the Resellers table with high sales values should appear first). Create a hierarchy on the Resellers table which shows the Sales Group field and the Customer Name column as levels. Title the hierarchy as Customer By Sales Group. Add a new measure titled Number of Resellers to the Resellers table: Number of Resellers:=COUNTROWS(Resellers) Create a pivot table that shows the Customer By Sales Group hierarchy on the rows and Number of Resellers as values. If you created the usd_gross_sales field in the Reseller Sales table, it can also be added as an implicit measure to verify values. Expand the first bin of Sales Group. The pivot should look like the following screenshot: How it works… This recipe has included various steps, which add descriptive information to the Resellers table. This includes obtaining data from a separate table (the USD sales) and then manipulating that field within the Resellers table. In order to provide a clearer definition of how this process works, we will break the explanation into several subsections. This includes the sales data retrieval, the use of rank functions, and finally, the discretization of sales. The next section deals with the process of converting sales data to a single currency. The starting point for this recipe is to determine a common currency sales value for each reseller (or customer). While the inclusion of the calculated column USD Sales Total in the Resellers table should be relatively straightforward, it is complicated by the consideration that the sales data is stored in multiple currencies. Therefore, the first step needs to include a currency conversion to determine the USD sales value for each line. This is simply the local value multiplied by the daily exchange rate. The LOOKUPVALUE function is used to return the row-by-row exchange rate. Now that we have the usd_gross_sales value for each sales line, we define a measure that calculates its sum in whatever filter context it is applied in. Including it in the Reseller Sales table makes sense (since, it relates to sales data), but what is interesting is how the filter context is applied when it is used as a field in the Resellers table. Here, the row filter context that exists in the Resellers table (after all, each row refers to a reseller) applies a restriction to the sales data. This shows the sales value for each reseller. For this recipe to work correctly, it is not necessary to include the calculated field usd_gross_sales in Reseller Sales. We simply need to define a calculation, which shows the gross sales value in USD and then use the row filter context in the Resellers table to restrict sales to the reseller in question (that is, the reseller in the row). It is obvious that the exchange rate should be applied on a daily basis because the value can change every day. We could use an X function in the USD Gross Sales measure to achieve exactly the same outcome. 
Our formula will be: SUMX ( 'Reseller Sales' , 'Reseller Sales'[Quantity Ordered] * 'Reseller Sales'[Price Per Unit] * LOOKUPVALUE ( 'Currency Conversion'[AVG Rate] , 'Currency Conversion'[Date] ,'Reseller Sales'[Order dt] ,'Currency Conversion'[Currency ID] ,'Reseller Sales'[Currency ID] ) ) Furthermore, if we wanted to, we could completely eliminate the USD Gross Sales measure from the model. To do this, we could wrap the entire formula (the previous definition on USD Gross Sales) into the CALCULATE statement in the Resellers table's field definition of USD Gross Sales. This forces the calculation to occur at the current row context. Why have we included the additional fields and measures in Reseller Sales? This is a modeling choice. It makes the model easier to understand because it is more modular. This would otherwise require two calculations (one into a default currency and the second into a selected currency) and the field usd_gross_sales is used in that calculation. Now that sales are converted to a uniform currency, we can determine the importance by rank. RANKX is used to rank the rows in the Resellers table based on the USD Gross Sales field. The simplest implementation of RANKX is demonstrated within the Sales Rank field. Here, the function simply returns a rank based on the value according to the supplied measure (which is of course USD Gross Sales). However, the RANKX function provides a lot of versatility and follows the syntax: RANKX(<table> , <expression>[, <value>[, <order>[, <ties>]]] ) After the initial implementation of RANKX in its simplest form, the arguments of particular interest are the <order> and <ties> arguments. These can be used to specify the sort order (whether the rank is to be applied from highest to lowest or lowest to highest) and the function behavior when duplicate values are encountered. This may be best demonstrated with an example. To do this, we will examine the operation of rank in relation to Round Down Amount. When a simple RANKX function is applied, the function sorts the columns in an ascending order and returns the position of a row based on the sorted order of the value and the number of prior rows within the table. This includes rows attributable to duplicate values. This is shown in the following screenshot where the Simple column is defined as RANKX(Resellers,[Round Down Amount]). Note, the data is sorted by Round Down Amount and the first four tied values have a RANKX value of 1. This is the behavior we expect since all rows have the same value. For the next value (700000), RANKX returns 5 because this is the fifth element in the sequence. When the DENSE argument is specified, the value returned after a tie is the next sequential number in the list. In the preceding screenshot, this is shown through the DENSE column. The formula for the field DENSE is: RANKX(Resellers, [Round Down Amount],,,DENSE)) Finally, we can specify the sort order that is used by the function (the default is ascending) with the help of <order> argument of the function. If we wish to sort (and rank) from lowest to highest, we could use the formula as shown in the INVERSE DENSE column. The INVERSE DENSE column uses the following calculation: RANKX(Resellers, [Round Down Amount],,TRUE,DENSE) After having specified the Sales Group field sort by column as Round Down Order, we may ask why we did not also sort the Customer Name column by their respective values in the Sales Rank column? 
Trying to define a sort by column in this way would cause an error as it is not a one-to-one relationship between these two fields. That is, each customer does not have a unique value for the sales rank. Let's have a look at this in more detail. If we filter the Resellers table to show the blank USD Sales Total rows (click on the drop-down arrow in the USD Sales Total column and check the BLANKS checkbox), we see that the values of the Sales Rank column for all the rows is the same. In the following screenshot, we can see the value 636 repeated for all the rows: Allowing the client tool visibility to the USD Sales Total and Sales Rank fields will not provide an intuitive browsing attribute for most client tools. For this reason, it is not recommended to expose these attributes to users. Hiding them will still allow the data to be queried directly Discretizing sales By discretizing the Resellers table, we firstly make a decision to group each reseller into bands of 100,000 intervals. Since, we have already calculated the USD Gross Sales value for each customer, our problem is reduced by determining which bin each customer belongs to. This is very easily achieved as we can derive the lower and upper bound for the Resellers table. That is, the lower bound will be a rounded down amount of their sales and the upper bound will be the rounded up value (that is rounded nearest to the 100,000 interval). Finally, we must ensure that the ordering of the bins is correct so that the bins appear from the highest value resellers to the lowest. For convenience, these steps are broken down through the creation of additional columns but they need not be—we could incorporate the steps into a single formula (mind you, it would be hard to read). Additionally, we have provided a unique name for the first bin by testing for 0 sales. This may not be required. The rounding is done with the ROUNDDOWN and ROUNDUP functions. These functions simply return the number moved by the number of digits offset. The following is the syntax for ROUNDDOWN: ROUNDDOWN(<number>, <num_digits>) Since we are interested only in the INTEGER values (that is, values to the left of the decimal place), we must specify <num_digits> as -5. The display value of the bin is controlled through the FORMAT function, which returns the text equivalent of a value according to the provided format string. The syntax for FORMAT is: FORMAT(<value>, <format_string>) There's more... In presenting a USD Gross Sales value for the Resellers table, we may not be interested in all the historic data. A typical variation on this theme is to determine the current worth by showing the recent history (or summing recent sales). This requirement can be easily implemented into the preceding method by swapping USD Gross Sales with recent sales. To determine this amount, we need to filter the data used in the SUM function. For example, to determine the last 30 days' sales for a reseller, we will use the following code: SUMX( FILTER('Reseller Sales' , 'Reseller Sales'[Order dt]> (MAX('Reseller Sales'[Order dt])-30) ) , USD SALES EXPRESSION )
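To make the preceding variation concrete, the following is a minimal sketch of how the recent-sales amount could be defined as a measure in the Reseller Sales table. It assumes the usd_gross_sales calculated column created earlier in this recipe is used as the row-level expression, and USD Recent Sales is just an illustrative name:

USD Recent Sales :=
SUMX (
    FILTER (
        'Reseller Sales',
        'Reseller Sales'[Order dt] > ( MAX ( 'Reseller Sales'[Order dt] ) - 30 )
    ),
    'Reseller Sales'[usd_gross_sales]
)

Referencing this measure from a calculated column in the Resellers table, in the same way that USD Sales Total references USD Gross Sales, restricts the sum to the current reseller and gives each row its sales for the last 30 days of order dates.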

Setting Goals

Packt
18 Dec 2013
8 min read
(for more resources related to this topic, see here.) You can use the following code to track the event: [Flurry logEvent:@"EVENT_NAME"]; The logEvent: method logs your event every time it's triggered during the application session. This method helps you to track how often that event is triggered. You can track up to 300 different event IDs. However, the length of each event ID should be less than 255 characters. After the event is triggered, you can track that event from your Flurry dashboard. As is explained in the following screenshot, your events will be listed in the Events section. After clicking on Event Summary, you can see a list of the events you have created along with the statistics of the generated data as shown in the following screenshot: You can fetch the detailed data by clicking on the event name (for example, USER_VIEWED). This section will provide you with a chart-based analysis of the data as shown in the following screenshot: The Events Per Session chart will provide you with details about how frequently a particular event is triggered in a session. Other than this, you are provided with the following data charts as well: Unique Users Performing Event: This chart will explain the frequency of unique users triggering the event. Total Events: This chart holds the event-generation frequency over a time period. You can access the frequency of the event being triggered over any particular time slot. Average Events Per Session: This charts holds the average frequency of the events that happen per session. There is another variation of this method, as shown in the following code, which allows you to track the events along with the specific user data provided: [Flurry logEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary]; This version of the logEvent: method counts the frequency of the event and records dynamic parameters in the form of dictionaries. External parameters should be in the NSDictionary format, whereas both the key and the value should be in the NSString object format. Let's say you want to track how frequently your comment section is used and see the comments, then you can use this method to track such events along with the parameters. You can track up to 300 different events with an event ID length less than 255 characters. You can provide a maximum of 10 event parameters per event. The following example illustrates the use of the logEvent: method along with optional parameters in the dictionary format: NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:@"your dynamic parameter value", @"your dynamic parameter name",nil]; [Flurry logEvent:@"EVENT_NAME" withParameters:dictionary]; In case you want Flurry to log all your application sections/screens automatically, then you should pass navigationController or as a parameter to count all your pages automatically using one of the following code: [Flurry logAllPageViews:navigationController]; [Flurry logAllPageViews:tabBarController]; The Flurry SDK will create a delegate on your navigationControlleror tabBarController object, whichever is provided to detect the page's navigation. Each navigation detected will be tracked by the Flurry SDK automatically as a page view. You only need to pass each object to the Flurry SDK once. However you can pass multiple instances of different navigation and tab bar controllers. There can be some cases where you can have a view controller that is not associated with any navigation or tab bar controller. 
Then you can use the following code: [Flurry logPageView]; The preceding code will track the event independently of navigation and tab bar controllers. For each user interaction you can manually log events. Tracking time spent Flurry allows you to track events based on the duration factor as well. You can use the [Flurry logEvent: timed:] method to log your event in time as shown in the following code: [Flurry logEvent:@"EVENT_NAME" timed:YES]; In case you want to pass additional parameters along with the event name, you can use the following type of the logEvent: method to start a timed event for event Parameters as shown in the following code: [Flurry logEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionarytimed:YES]; The aforementioned method can help you to track your timed event along with the dynamic data provided in the dictionary format. You can end all your timed events before the application exits. This can even be accomplished by updating the event with event Parameters. If you want to end your events without updating the parameters, you can pass nil as the parameters. If you do not end your events, they will automatically end when the application exits as shown in the following code: [Flurry endTimedEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary]; Let's take the following example in which you want to log an event whenever a user comments on any article in your application: NSDictionary *commentParams = [NSDictionary dictionaryWithObjectsAndKeys: @"User_Comment", @"Comment", // Capture comment info @"Registered", @"User_Status", // Capture user status nil]; [Flurry logEvent:@"User_Comment" withParameters:commentParams timed:YES]; // In a function that captures when a user post the comment [Flurry endTimedEvent:@"Article_Read" withParameters:nil]; //You can pass in additional //params or update existing ones here as well The aforementioned piece of code will help you to log a timed event every time a user comments on a picture in your application. While tracking the event, you are also tracking the comment and the user registered by specifying them in the dictionary. Tracking errors Flurry provides you with a method to track errors as well. You can use the following methods to track errors on Flurry: [Flurry logError:@"ERROR_NAME" message:@"ERROR_MESSAGE" exception:e]; You can track exceptions and errors that occurred in the application by providing the name of the error (ERROR_NAME) along with the messages, such as ERROR_MESSAGE, with an exception object. Flurry reports the first ten errors in each session. You can fetch all the application exceptions and specifically uncaught exceptions on Flurry. You can use the logError:message:exception: class method to catch all the uncaught exceptions. These exceptions will be logged in Flurry in the Error section, which is accessible on the Flurry dashboard: // Uncaught Exception Handler - sent through Flurry. void uncaughtExceptionHandler(NSException *exception) { [Flurry logError:@"Uncaught" message:@"Crash" exception:exception]; } - (void)applicationDidFinishLaunching:(UIApplication *)application { NSSetUncaughtExceptionHandler(&uncaughtExceptionHandler); [Flurry startSession:@"YOUR_API_KEY"]; // .... } Flurry also helps you to catch all the uncaught exceptions generated by the application. All the exceptions will be caught by using the NSSetUncaughtExceptionHandler method in which you can pass a method that will catch all the exceptions raised during the application session. 
All the errors reported can also be tracked using the logError:message:error: method. You can pass the error name, message, and error object to log an NSError on Flurry, as shown in the following code:

- (void) webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error {
    [Flurry logError:@"WebView No Load" message:[error localizedDescription] error:error];
}

Tracking versions

When you develop applications for mobile devices, it's obvious that you will evolve your application at every stage, pushing the latest updates for the application, which creates a new version of the application on the application store. To track the application based on these versions, you need to set up Flurry to track your application versions as well. This can be done using the following code:

[Flurry setAppVersion:App_Version_Number];

By using the aforementioned method, you can track your application based on its version. For example, if you have released an application and unfortunately it has a critical bug, then you can track your application based on the current version and the errors that are tracked by Flurry from the application. You can access the data generated from Flurry's dashboards by navigating to Flurry Classic. This will, by default, load a time-based graph of the application sessions for all versions. However, you can access the user session graph for a single version by selecting a version from the drop-down list as shown in the following screenshot:

This is how the drop-down list will appear. Select a version and click on Update as shown in the following screenshot:

The previous action will generate a version-based graph of user sessions, showing the number of times users have opened the app in the given time frame, as shown in the following screenshot:

Along with that, Flurry also provides user retention graphs to gauge the number of users and the usage of the application over a period of time.

Summary

In this article we explored the ways to track the application on Flurry and to gather meaningful data on Flurry by setting goals to track the application. We then learned how to track the time spent by users in the application, along with user data tracking and retention graphs to gauge the number of users.

Resources for Article:

Further resources on this subject: Data Analytics [article] Learning Data Analytics with R and Hadoop [article] Limits of Game Data Analysis [article]

Getting Started with Apache Nutch

Packt
18 Dec 2013
13 min read
(For more resources related to this topic, see here.)

Introduction to Apache Nutch

Apache Nutch is a very robust and scalable tool for web crawling, and it can be integrated with scripting languages such as Python. You can use it whenever your application holds a huge amount of data and you want to crawl it. Apache Nutch is open source web crawler software used for crawling websites. If you understand Apache Nutch well, you can build your own search engine, much like Google, which lets you improve how your application's pages rank in search results and customize searching according to your needs. It is extensible and scalable, and it facilitates parsing, indexing, building your own search engine, customizing search according to your needs, scalability, robustness, and a ScoringFilter for custom implementations. ScoringFilter is a Java class used when creating an Apache Nutch plugin; it is used for manipulating scoring variables. We can run Apache Nutch on a single machine as well as in a distributed environment such as Apache Hadoop. It is written in Java. With Apache Nutch, we can find broken links, keep a copy of all visited pages for searching over (for example, to build indexes), and discover web page hyperlinks in an automated manner. Apache Nutch integrates easily with Apache Solr, so we can index all the web pages crawled by Apache Nutch into Apache Solr and then use Apache Solr to search the pages indexed by Apache Nutch. Apache Solr is a search platform built on top of Apache Lucene, and it can be used for searching any type of data, for example, web pages.

Crawling your first website

Crawling is driven by the Apache Nutch crawling tool and certain related tools for building and maintaining several data structures. These include the web database, the index, and a set of segments. Once Apache Nutch has indexed the web pages to Apache Solr, you can search for the required web page(s) in Apache Solr.

Apache Solr installation

Apache Solr is a very powerful search mechanism that provides full-text search, dynamic clustering, database integration, rich document handling, and much more. Apache Solr will be used for indexing the URLs crawled by Apache Nutch, and one can then search the pages crawled by Apache Nutch in Apache Solr.

Crawling your website using the crawl script

Apache Nutch 2.2.1 comes with a crawl script that performs crawling by executing one single script. In earlier versions, we had to manually perform each step, such as generating, fetching, and parsing data, in order to crawl.

Crawling the web, the CrawlDb, and URL filters

When a user invokes the crawling command in Apache Nutch 1.x, Apache Nutch generates a crawldb, which is simply a directory that contains details about the crawl. In Apache Nutch 2.x, the crawldb is not present; instead, Apache Nutch keeps all the crawling data directly in the database.

InjectorJob

The injector adds the necessary URLs to the crawldb. The crawldb is the directory created by Apache Nutch for storing data related to crawling. You need to provide URLs to the InjectorJob, either by downloading URL lists from the Internet or by writing your own file that contains URLs. Let's say you have created one directory called urls, which contains all the URLs that need to be injected into the crawldb.
The following command is used to perform the InjectorJob:

#bin/nutch inject crawl/crawldb urls

Here, urls is the directory that contains all the URLs that need to be injected into the crawldb, and crawl/crawldb is the directory in which the injected URLs will be placed. After performing this job, you will have a number of unfetched URLs inside your database, that is, the crawldb.

GeneratorJob

Once we are done with the InjectorJob, it's time to fetch the injected URLs from the crawldb. Before fetching, you need to perform the GeneratorJob. The following command is used for the GeneratorJob:

#bin/nutch generate crawl/crawldb crawl/segments

crawl/crawldb is the directory from which URLs are generated, and crawl/segments is the directory used by the GeneratorJob to hold the information required for crawling.

FetcherJob

The job of the fetcher is to fetch the URLs that were generated by the GeneratorJob; it uses the input provided by the GeneratorJob. The following command is used for the FetcherJob:

#bin/nutch fetch -all

Here I have provided the input parameter -all, which means this job will fetch all the URLs generated by the GeneratorJob. You can use different input parameters according to your needs.

ParserJob

After the FetcherJob, the ParserJob parses the URLs that were fetched by the FetcherJob. The following command is used for the ParserJob:

# bin/nutch parse -all

I have used the input parameter -all, which will parse all the URLs fetched by the FetcherJob. You can use a different input parameter according to your needs.

DbUpdaterJob

Once the ParserJob has completed, we need to update the database with the results of the FetcherJob. This will update the respective databases with the last fetched URLs. The following command is used for the DbUpdaterJob:

# bin/nutch updatedb crawl/crawldb -all

After performing this job, the database will contain both updated entries for all the initial pages and new entries that correspond to the newly discovered pages linked from the initial set.

Invertlinks

Before indexing, we need to invert all the links so that we can index the incoming anchor text with the pages. The following command is used for Invertlinks (a sketch of the Solr indexing step that typically follows is shown after the Hadoop overview below):

# bin/nutch invertlinks crawl/linkdb -dir crawl/segments

Apache Hadoop

Apache Hadoop is designed to run your application across a cluster of servers in which one machine is the master and the rest are slaves, giving you a huge data warehouse. The master machines direct the slave machines, which do the actual data processing. This is why Apache Hadoop is used for processing huge amounts of data: the work is divided among the slave machines, which is what gives Apache Hadoop the highest throughput for any processing. As your data grows, you simply add more slave machines. That is how Apache Hadoop works.

Integration of Apache Nutch with Apache Hadoop

Apache Nutch can be easily integrated with Apache Hadoop, making the process much faster than running Apache Nutch on a single machine. After integrating Apache Nutch with Apache Hadoop, we can perform crawling in an Apache Hadoop cluster environment, so the process will be much faster and we will get the highest throughput.

Apache Hadoop Setup with a Cluster

This setup does not require purchasing a lot of hardware to run Apache Nutch and Apache Hadoop; it is designed to make maximum use of the hardware you have.
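As mentioned under Invertlinks above, once the links have been inverted, the crawled segments can be pushed to Apache Solr for searching. The exact command depends on the Apache Nutch version you are running; the following is only a sketch that assumes the Nutch 1.x-style directory layout used in the commands above and a Solr instance running locally at http://localhost:8983/solr/:

# bin/nutch solrindex http://localhost:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*

After this job completes, the crawled pages can be queried from the Solr admin interface.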
Formatting the HDFS filesystem using the NameNode

HDFS stands for Hadoop Distributed File System; it is the storage layer used by Apache Hadoop, so it holds all the data related to Apache Hadoop. It has two components, the NameNode and the DataNodes: the NameNode manages the filesystem metadata, while the DataNodes store the actual data. It is highly configurable and well suited to many installations; only for very large clusters does the configuration need to be tuned. The first step in getting Apache Hadoop started is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your cluster (which will include only your local machine if you have followed along).

Setting up the deployment architecture of Apache Nutch

We have to set up Apache Nutch on each of the machines we are using. In this case, we are using a six-machine cluster, so we have to set up Apache Nutch on each machine. With a small number of machines in the cluster, we can set up each machine manually. But when there are many machines, let's say 100 machines in our cluster environment, we cannot set up each machine by hand. For that we require a deployment tool such as Chef, or at least distributed ssh. You can refer to http://www.opscode.com/chef/ to get familiar with Chef, and to http://www.ibm.com/developerworks/aix/library/au-satdistadmin/ to get familiar with distributed ssh. I will only demonstrate running Apache Hadoop on Ubuntu for a single-node cluster; if you want to run Apache Hadoop on Ubuntu for a multi-node cluster, I have already provided a reference link above that you can follow to configure it.

Once we are done with the deployment of Apache Nutch to a single machine, we run the start-all.sh script, which starts the services on the master node and the data nodes. That means the script begins the Hadoop daemons on the master node, logs into all the slave nodes using the ssh command as explained above, and begins the daemons on the slave nodes. The start-all.sh script expects Apache Nutch to be put in the same location on each machine, and it also expects Apache Hadoop to store its data at the same file path on each machine. The start-all.sh script, which starts the daemons on the master and slave nodes, uses password-less login over ssh.

Introduction to Apache Nutch configuration with Eclipse

Apache Nutch can be easily configured with Eclipse, after which we can perform crawling from Eclipse rather than from the command line; all the crawling operations we run from the command line can be run from Eclipse instead. Instructions are provided here for setting up a development environment for Apache Nutch with the Eclipse IDE. They are meant to be a comprehensive starting resource for configuring, building, crawling, and debugging Apache Nutch in this context.

The following are the prerequisites for Apache Nutch integration with Eclipse:

Get the latest version of Eclipse from http://www.eclipse.org/downloads/packages/release/juno/r
All the required plugins that follow are available from the Eclipse Marketplace. If the Marketplace client is not installed, you can download it as described at http://marketplace.eclipse.org/marketplace-client-intro
Once you have configured Eclipse, download Subclipse from http://subclipse.tigris.org/. If you face a problem with the 1.8.x release, try 1.6.x; this may resolve compatibility issues.
Download the IvyDE plugin for Eclipse from http://ant.apache.org/ivy/ivyde/download.cgi
Download the m2e plugin for Eclipse from http://marketplace.eclipse.org/content/maven-integration-eclipse

Introduction to Apache Accumulo

Accumulo is used as a datastore, in the same way as we use databases such as MySQL or Oracle. The key point about Apache Accumulo is that it runs on an Apache Hadoop cluster environment, which is a very useful feature. The Accumulo sorted, distributed key/value store is a robust, scalable, high-performance data storage and retrieval system. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Thrift, and ZooKeeper. Apache Accumulo features some novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.

Introduction to Apache Gora

The Apache Gora open source framework provides an in-memory data model and persistence for big data. Apache Gora supports persisting to column stores, key/value stores, document stores, and RDBMSs, and analyzing the data with extensive Apache Hadoop MapReduce support.

Supported datastores

Apache Gora currently supports the following datastores:

Apache Accumulo
Apache Cassandra
Apache HBase
Amazon DynamoDB

Use of Apache Gora

Although there are many excellent ORM frameworks for relational databases, data modeling in NoSQL data stores differs profoundly from that of their relational cousins. Data-model-agnostic frameworks such as JDO are not sufficient for use cases where one needs to use the full power of the data models in column stores. Gora fills this gap by giving the user an easy-to-use in-memory data model plus persistence for big data, providing data-store-specific mappings and built-in Apache Hadoop support.

Integration of Apache Nutch with Apache Accumulo

In this section, we are going to cover the process of integrating Apache Nutch with Apache Accumulo. Apache Accumulo is used for storing huge amounts of data and is built on top of Apache Hadoop, ZooKeeper, and Thrift. A potential use for integrating Apache Nutch with Apache Accumulo is when our application has huge amounts of data to process and we want to run it in a cluster environment; in that case we can use Apache Accumulo for data storage. As Apache Accumulo only runs with Apache Hadoop, it is mostly used in cluster-based environments. First we will start with the configuration of Apache Gora with Apache Nutch, then we will set up Apache Hadoop and ZooKeeper, then we will install and configure Apache Accumulo, then we will test Apache Accumulo, and at the end we will see crawling with Apache Nutch on Apache Accumulo.

Setting up Apache Hadoop and Apache ZooKeeper for Apache Nutch

Apache ZooKeeper is a centralized service for maintaining configuration information and providing distributed synchronization, naming, and group services. All these services are used by distributed applications in one way or another; because ZooKeeper provides them, you do not have to write them from scratch. You can use these services to implement consensus, management, group membership, leader election, and presence protocols, and you can also build on them for your own requirements.
Integration of Apache Nutch with Apache Accumulo

In this section we are going to cover the process of integrating Apache Nutch with Apache Accumulo. Apache Accumulo is used for storing huge amounts of data and is built on top of Apache Hadoop, Zookeeper, and Thrift. A typical reason for integrating Apache Nutch with Apache Accumulo is that our application has huge amounts of data to process and we want to run it in a cluster environment; Accumulo can then serve as the data store. Since Apache Accumulo only runs with Apache Hadoop, it is mostly used in cluster-based environments. We will first start with the configuration of Apache Gora with Apache Nutch, then set up Apache Hadoop and Zookeeper, then install and configure Apache Accumulo, then test Apache Accumulo, and at the end we will look at crawling with Apache Nutch on Apache Accumulo.

Setup Apache Hadoop and Apache Zookeeper for Apache Nutch

Apache Zookeeper is a centralized service for maintaining configuration information and for providing distributed synchronization, naming, and group services. Distributed applications use all of these services in one way or another, and because Zookeeper provides them you do not have to write them from scratch. You can use these services to implement consensus, management, group membership, leader election, and presence protocols, and you can also build on them for your own requirements.

Apache Accumulo is built on top of Apache Hadoop and Zookeeper, so we must configure Apache Accumulo with both Apache Hadoop and Apache Zookeeper. You can refer to http://www.covert.io/post/18414889381/accumulo-nutch-and-gora for any queries related to this setup.

Integration of Apache Nutch with MySQL

In this section we are going to integrate Apache Nutch with MySQL, so that the webpages crawled by Apache Nutch are stored in MySQL; you can then go into MySQL, check your crawled webpages, and perform any operations you need on them. We will start with an introduction to MySQL, then cover why MySQL is worth integrating with Apache Nutch, then look at the configuration of MySQL with Apache Nutch, and at the end we will do crawling with Apache Nutch on MySQL. So let's start with the introduction to MySQL.

Summary

We covered the following:

Downloading Apache Hadoop and Apache Nutch
Performing crawling on an Apache Hadoop cluster in Apache Nutch
Apache Nutch configuration with Eclipse
Installation steps for building Apache Nutch with Eclipse
Crawling in Eclipse
Configuration of Apache Gora with Apache Nutch
Installation and configuration of Apache Accumulo
Crawling with Apache Nutch on Apache Accumulo
The need for integrating MySQL with Apache Nutch

Resources for Article: Further resources on this subject: Getting Started with the Alfresco Records Management Module [Article] Making Big Data Work for Hadoop and Solr [Article] Apache Solr PHP Integration [Article]

Managing IBM Cognos BI Server Components

Packt
12 Dec 2013
6 min read
(for more resources related to this topic, see here.) Cognos BI architecture The IBM Cognos 10.2 BI architecture is separated into the following three tiers: Web server (gateways) Applications (dispatcher and Content Manager) Data (reporting/querying the database, content store, metric store) Web server (gateways) The user starts a web session with Cognos to connect to the IBM Cognos Connection's web-based interface/application using the web browser (Internet Explorer and Mozilla Firefox are the currently supported browsers). This web request is sent to the web server where the Cognos gateway resides. The gateway is a server-software program that works as an intermediate party between the web server and other servers, such as an application server. The following diagram shows the basic view of the three tiers of the Cognos BI architecture: The Cognos gateway is the starting point from where a request is received and transferred to the BI Server. On receiving a request from the web server, the Cognos gateway applies encryption to the information received, adds necessary environment variables and authentication namespace, and transfers the information to the application server (or dispatcher). Similarly, when the data has been processed and the presentation is ready, it is rendered towards the user's browser via the gateway and web server. The following diagram shows the Tier 1 layer in detail: The gateways must be configured to communicate with the application component (dispatcher) in a distributed environment. To make a failover cluster, more than one BI Server may be configured. The following types of web gateways are supported: CGI: This is also the default gateway. This is a basic gateway. ISAPI: This is for the Windows environment. It is the best for Windows IIS (Internet Information Services). Servlet: This gateway is the best for application servers that are supporting servlets. Apache_mod: This gateway type may be used for the Apache server. The following diagram shows an environment in which the web server is load balanced by two server machines: To improve performance, gateways (if more than one) must be installed and configured on separate machines. The application tier (Cognos BI Server) The application tier comprises one or multiple BI Servers. A server's job is to run user requests, for example, queries, reports, and analysis that are received from a gateway. The GUI environment (IBM Cognos Connection) that appears after logging in is also rendered and presented by Cognos BI Server. Another such example is the Metric Studio interface. The BI Server must include the dispatcher and Content Manager (the Content Manager component may be separated from the dispatcher). The following diagram shows BI Server's Tier 2 in detail: Dispatcher The dispatcher has static handlers to many services. Each request that is received is routed to the corresponding service for further processing. The dispatcher is also responsible for starting all the Cognos services at startup. These services include the system service, report service, report data service, presentation service, Metric Studio service, log service, job service, event management service, Content Manager service, batch report service, delivery service, and many others. When there are multiple dispatchers in a multitier architecture, a dispatcher may also send and route requests to another dispatcher. The URIs for all dispatchers must be known to the Cognos gateway(s). 
All dispatchers are registered in Content Manager (CM), making it possible for all dispatchers to know each other; a dispatcher grid is formed in this way. To improve system performance, multiple dispatchers must be installed, but on separate computers, and the Content Manager component must also be on a separate server. The following diagram shows how multiple dispatcher servers can be added.

Services for the BI Server (dispatcher)

Each dispatcher has a set of services, which are listed alphabetically in the following table. When the Cognos service is started from Cognos Configuration, all services are started one by one. The dispatcher services and their short descriptions are as follows:

Agent service: Runs the agent.
Annotation service: Adds comments to reports.
Batch report service: Handles background report requests.
Content Manager cache service: Handles the cache for frequent queries to enhance the performance of Content Manager.
Content Manager service: Performs DML in the content store database. Cognos deployment is another task for this service.
Delivery service: For sending e-mails.
Event management service: Manages event objects (creation, scheduling, and so on).
Graphics service: Renders graphics for other services such as the report service.
Human task service: Manages human tasks.
Index data service: For basic full-text functions for storage and retrieval of terms and indexed summary documents.
Index search service: For search and drill-through functions, including lists of aliases and examples.
Index update service: For write, update, delete, and administration-related functions.
Job service: Runs jobs in coordination with the monitor service.
Log service: For extensive logging of the Cognos environment (file, database, remote log server, event viewer, and system log).
Metadata service: For displaying data lineage information (data source, calculation expressions) for the Cognos studios and viewer.
Metric Studio service: Provides a user interface to Metric Studio for monitoring and manipulating system KPIs.
Migration service: Used for migration from old versions to new versions, especially Series 7.
Monitor service: Works as a timer service; it manages the monitoring and running of tasks that were scheduled or marked as background tasks, and helps in failover and recovery for running tasks.
Presentation service: Prepares and displays the presentation layer by converting the XML data to HTML or any other format view. IBM Cognos Connection is also prepared by this service.
Query service: For managing dynamic query requests.
Report data service: Prepares data for other applications, for example mobile, Microsoft Office, and so on.
Report service: Manages report requests. The output is displayed in IBM Cognos Connection.
System service: Defines the BI-Bus API compliant service. It gives more data about the BI configuration parameters.

Summary

This article covered the IBM Cognos BI architecture. Now you must be familiar with the single-tier and multitier architectures and a variety of features and options that Cognos provides.

Resources for Article: Further resources on this subject: IBM Cognos Insight [Article] Integrating IBM Cognos TM1 with IBM Cognos 8 BI [Article] IBM Cognos 10 BI dashboarding components [Article]

Apache Solr PHP Integration

Packt
25 Nov 2013
7 min read
(For more resources related to this topic, see here.)

We will be looking at installation on both Windows and Linux environments. We will be using the Solarium library for communication between Solr and PHP. This article will give a brief overview of the Solarium library and showcase some of the concepts and configuration options on the Solr end for implementing certain features.

Calling Solr using PHP code

A ping query is used in Solr to check the status of the Solr server. The Solr URL for executing the ping query is http://localhost:8080/solr/collection1/admin/ping/?wt=json.

Response of Solr ping query in browser

We can use Curl to get the ping response from Solr via PHP code; sample code for executing the previous ping query is given below.

$curl = curl_init("http://localhost:8080/solr/collection1/admin/ping/?wt=json");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($curl);
$data = json_decode($output, true);
echo "Ping Status : ".$data["status"].PHP_EOL;

Curl can be used to execute almost any query on Solr, but it is preferable to use a library which does the work for us. In our case we will be using Solarium. To execute the same query on Solr using the Solarium library, the code is as follows.

include_once("vendor/autoload.php");
$config = array("endpoint" => array("localhost" => array("host"=>"127.0.0.1", "port"=>"8080", "path"=>"/solr", "core"=>"collection1",) ) );

We have included the Solarium library in our code and defined the connection parameters for our Solr server. Next we need to create a Solarium client with the previous Solr configuration and call the createPing() function to create the ping query.

$client = new Solarium\Client($config);
$ping = $client->createPing();

Finally, execute the ping query and get the result.

$result = $client->ping($ping);
$result->getStatus();

The output should be similar to the one shown below.

Output of ping query using PHP

Adding documents to Solr index

To create a Solr index, we need to add documents to the Solr index using the command line, the Solr web interface, or our PHP program. But before we create a Solr index, we need to define the structure or the schema of the Solr index. A schema consists of fields and field types. It defines how each field will be treated and handled during indexing or during search. Let us see a small piece of code for adding documents to the Solr index using PHP and the Solarium library. Create a Solarium client, create an instance of the update query, create the document in PHP, and finally add fields to the document.

$client = new Solarium\Client($config);
$updateQuery = $client->createUpdate();
$doc1 = $updateQuery->createDocument();
$doc1->id = 112233445;
$doc1->cat = 'book';
$doc1->name = 'A Feast For Crows';
$doc1->price = 8.99;
$doc1->inStock = 'true';
$doc1->author = 'George R.R. Martin';
$doc1->series_t = '"A Song of Ice and Fire"';

The id field has been marked as unique in our schema, so we will have to use a different value of the id field for each document that we add to Solr. Add the documents to the update query followed by a commit command, and finally execute the query.

$updateQuery->addDocuments(array($doc1));
$updateQuery->addCommit();
$result = $client->update($updateQuery);

Let us execute the code.

php insertSolr.php

After executing the code, a search for martin will give these documents in the result.
http://localhost:8080/solr/collection1/select/?q=martin

Document added to Solr index

Executing search on Solr Index

Documents added to the Solr index can be searched using the following piece of PHP code.

$selectConfig = array(
    'query' => 'cat:book AND author:Martin',
    'start' => 3,
    'rows' => 3,
    'fields' => array('id','name','price','author'),
    'sort' => array('price' => 'asc')
);
$query = $client->createSelect($selectConfig);
$resultSet = $client->select($query);

The above code creates a simple Solr query that searches for book in the cat field and Martin in the author field. The results are sorted in ascending order of price, and the fields returned are id, name of the book, price, and author of the book. Pagination has been implemented as 3 results per page, so this query returns results for the 2nd page, starting from the 3rd result. In addition to this simple select query, Solr also supports some advanced query modes known as dismax and edismax. With the help of these query modes, we can boost certain fields to give them more importance in our query. We can also use function queries to do dynamic boosting based on values in fields. If no sorting is provided, the Solr results are sorted by the score of the documents, which is calculated based on the terms in the query and the matching terms in the documents in the index. The score is calculated for each document in the result set using two main factors: term frequency, known as tf, and inverse document frequency, known as idf. In addition to these, Solr provides a way of narrowing down the results using filter queries. Also, facets can be created based on fields in the index and can be used by end users to narrow down the results.

Highlighting search results using PHP and Solr

Solr can be used to highlight the fields returned in a search result based on the query. Here is sample code for highlighting the results for the search keyword harry.

$query->setQuery('harry');
$query->setFields(array('id','name','author','series_t','score','last_modified'));

Get the highlighting component from the query, set the fields to be highlighted, and also set the HTML tags to be used for highlighting.

$hl = $query->getHighlighting();
$hl->setFields('name,series_t');
$hl->setSimplePrefix('<strong>')->setSimplePostfix('</strong>');

Once the query is run and the result set is received, we will need to retrieve the highlighted results from the result set; a minimal sketch of doing this with Solarium follows below. Here is the output for the highlighting code.

Highlighted search results
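The listing that reads the highlighted fragments back out of the result set is not reproduced above, so the following is a minimal sketch of how this is typically done with Solarium's highlighting component; the loop and the echo formatting are illustrative only, and the field names follow the example above.

$resultSet = $client->select($query);
// Get the highlighting component results from the result set.
$highlighting = $resultSet->getHighlighting();
foreach ($resultSet as $document) {
    // Fetch the highlighted fragments for this document by its unique key.
    $highlightedDoc = $highlighting->getResult($document->id);
    if ($highlightedDoc) {
        foreach ($highlightedDoc as $field => $fragments) {
            echo $field.': '.implode(' (...) ', $fragments).PHP_EOL;
        }
    }
}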
In addition to highlighting, Solr can be used to create a spelling suggester and a spell checker. The spelling suggester can be used to suggest input queries to the end user as the user keeps typing, while the spell checker can be used to offer spelling corrections to the user, similar to a 'did you mean' prompt. Solr can also be used to find documents that are similar to a given document based on the words in certain fields; this functionality of Solr is known as more like this and is exposed via Solarium by the MoreLikeThis component. Solr also provides grouping of the results based on a particular query or a certain field.

Scaling Solr

Solr can be scaled to handle a large number of search requests by using a master-slave architecture. Also, if the index is huge, it can be sharded across multiple Solr instances and we can run a distributed search to get results for our query from all the sharded instances. Solarium provides a load balancing plugin which can be used to load balance queries across a master-slave architecture.

Summary

Solr provides an extensive list of features for implementing search. These features can be easily accessed in PHP using the Solarium library to build a full-featured search application which can be used to power search on any website.

Resources for Article: Further resources on this subject: Apache Solr Configuration [Article] Getting Started with Apache Solr [Article] Making Big Data Work for Hadoop and Solr [Article]

Portlet

Packt
22 Nov 2013
14 min read
(For more resources related to this topic, see here.) The Spring MVC portlet The Spring MVC portlet follows the Model-View-Controller design pattern. The model refers to objects that imply business rules. Usually, each object has a corresponding table in the database. The view refers to JSP files that will be rendered into the HTML markup. The controller is a Java class that distributes user requests to different JSP files. A Spring MVC portlet usually has the following folder structure: In the previous screenshot, there are two Spring MVC portlets: leek-portlet and lettuce-portlet. You can see that the controller classes are clearly named as LeekController.java and LettuceController.java. The JSP files for the leek portlet are view/leek/leek.jsp, view/leek/edit.jsp, and view/leek/help.jsp. The definition of the leek portlet in the portlet.xml file is as follows: <portlet> <portlet-name>leek</portlet-name> <display-name>Leek</display-name> <portlet-class>org.springframework.web.portlet.DispatcherPortlet</portlet-class> <init-param> <name>contextConfigLocation</name> <value>/WEB-INF/context/leek-portlet.xml</value> </init-param> <supports> <mime-type>text/html</mime-type> <portlet-mode>view</portlet-mode> <portlet-mode>edit</portlet-mode> <portlet-mode>help</portlet-mode> </supports> ... <supported-publishing-event> <qname >x:ipc.share</qname> </supported-publishing-event> </portlet> You can see from the previous code that the portlet class for a Spring MVC portlet is the org.springframework.web.portlet.DispatcherPortlet.java class. When a Spring MVC portlet is called, this class runs. It also calls the WEB-INF/context/leek/portlet.xml file and initializes the singletons defined in that file when the leek portlet is deployed. The leek portlet supports the view, edit, and help mode. It can also fire a portlet event with ipc.share as its qualified name. Yo can use the method to import the leek and lettuce portlets (whose source code can be downloaded from the Packt site) to your Liferay IDE. Then, carry out the following steps: Deploy the leek-portlet package and wait until the leek portlet and lettuce portlet are registered by the Liferay Portal. Log in as the portal administrator and add the two Spring MVC portlets onto a portal page. Your portal page should look similar to the following screenshot: The default view of the leek portlet comes from the view/leek/leek.jsp file whose logic is defined through the following method in the com.uibook.leek.portlet.LeekController.java class: @RequestMapping public String render(Model model, SessionStatus status, RenderRequest req) { return "leek"; } This method calls the view/leek/leek.jsp file. In the default view of the leek portlet, when you click on the radio button for Snow water from last winter and then on the Get Water button, the following form will be submitted: <form action="http://localhost:8080/web/uibook/home?p_auth=wwMoBV4C&p_p_id=leek_WAR_leekportlet&p_p_lifecycle=1&p_p_state=normal&p_p_mode=view&p_p_col_id=column-2&p_p_col_pos=1&p_p_col_count=2&_leek_WAR_leekportlet_action=sprayWater" id="_leek_WAR_leekportlet_leekFm" method="post" name="_leek_WAR_leekportlet_leekFm"> This form will fire an action URL because p_p_lifecycle is equal to 1. 
As the action name is sprayWater in the URL, the DispatcherPortlet.java class (as specified in the portlet.xml file) calls the following method: @ActionMapping(params="action=sprayWater") public void sprayWater(ActionRequest request, ActionResponse response, SessionStatus sessionStatus) { String waterType = request.getParameter("waterSupply"); if(waterType != null){ request.setAttribute("theWaterIs", waterType); sessionStatus.setComplete(); } } This method simply gets the value for the waterSupply parameter as specified in the following code, which comes from the view/leek/leek.jsp file: <input type="radio" name="<portlet:namespace />waterSupply" value="snow water from last winter">Snow water from last winter The value is snow water from last winter, which is set as a request attribute. As the previous sprayWater(…) method does not specify a request parameter for a JSP file to be rendered, the logic goes to the default view of the leek portlet. So, the view/leek/leek.jsp file will be rendered. Here, as you can see, the two-phase logic is retained in the Spring MVC portlet, as has been explained in the Understanding a simple JSR-286 portlet section of this article. Now the theWaterIs request attribute has a value, which is snow water from last winter. So, the following code in the leek.jsp file runs and displays the Please enjoy some snow water from last winter. message, as shown in the previous screenshot: <c:if test="${not empty theWaterIs}"> <p>Please enjoy some ${theWaterIs}.</p> </c:if> In the previous screenshot, the Passing you a gift... link is rendered with the following code in the leek.jsp file: <a href="<portlet:actionURL name="shareGarden"></portlet:actionURL>">Passing you a gift ...</a> When this link is clicked, an action URL named shareGarden is fired. So, the DispatcherPortlet.java class will call the following method: @ActionMapping("shareGarden") public void pitchBallAction(SessionStatus status, ActionResponse response) { String elementType = null; Random random = new Random(System.currentTimeMillis()); int elementIndex = random.nextInt(3) + 1; switch(elementIndex) { case 1 : elementType = "sunshine"; break; ... } QName qname = new QName("http://uibook.com/events","ipc.share"); response.setEvent(qname, elementType); status.setComplete(); } This method gets a value for elementType (the type of water in our case) and sends out this elementType value to another portlet based on the ipc.share qualified name. The lettuce portlet has been defined in the portlet.xml file as follows to receive such a portlet event: <portlet> <portlet-name>lettuce</portlet-name> ... <supported-processing-event> <qname >x:ipc.share</qname> </supported-processing-event> </portlet> When the ipc.share portlet event is sent, the portal page refreshes. Because the lettuce portlet is on the same page as the leek portlet, the portlet event is received by the following method in the com.uibook.lettuce.portlet.LettuceController.java class: @EventMapping(value ="{http://uibook.com/events}ipc.share") public void receiveEvent(EventRequest request, EventResponse response, ModelMap map) { Event event = request.getEvent(); String element = (String)event.getValue(); map.put("element", element); response.setRenderParameter("element", element); } This receiveEvent(…) method receives the ipc.share portlet event, gets the value in the event (which can be sunshine, rain drops, wind, or space), and puts it in the ModelMap object with element as the key. 
Now, the following code in the view/lettuce/lettuce.jsp file runs:

<c:choose>
<c:when test="${empty element}">
<p> Please share the garden with me! </p>
</c:when>
<c:otherwise>
<p>Thank you for the ${element}!</p>
</c:otherwise>
</c:choose>

As the element parameter now has a value, a message similar to Thank you for the wind will show in the lettuce portlet. The wind is a gift from the leek to the lettuce portlet. In the default view of the leek portlet, there is a Some shade, please! button. This button is implemented with the following code in the view/leek/leek.jsp file:

<button type="button" onclick="<portlet:namespace />loadContentThruAjax();">Some shade, please!</button>

When this button is clicked, a _leek_WAR_leekportlet_loadContentThruAjax() JavaScript function will run:

function <portlet:namespace />loadContentThruAjax() {
...
document.getElementById("<portlet:namespace />content").innerHTML=xmlhttp.responseText;
...
xmlhttp.open('GET','<portlet:resourceURL escapeXml="false" id="provideShade"/>',true);
xmlhttp.send();
}

This loadContentThruAjax() function is an Ajax call. It fires a resource URL whose ID is provideShade. It maps to the following method in the com.uibook.leek.portlet.LeekController.java class:

@ResourceMapping(value = "provideShade")
public void provideShade(ResourceRequest resourceRequest, ResourceResponse resourceResponse) throws PortletException, IOException {
    resourceResponse.setContentType("text/html");
    PrintWriter out = resourceResponse.getWriter();
    StringBuilder strB = new StringBuilder();
    strB.append("The banana tree will sway its leaf to cover you from the sun.");
    out.println(strB.toString());
    out.close();
}

This method simply sends the message The banana tree will sway its leaf to cover you from the sun back to the browser. The previous loadContentThruAjax() method receives this message, inserts it in the <div id="_leek_WAR_leekportlet_content"></div> element, and shows it.

About the Vaadin portlet

Vaadin is an open source web application development framework. It consists of a server-side API and a client-side API. Each API has a set of UI components and widgets. Vaadin has themes for controlling the appearance of a web page. Using Vaadin, you can write a web application purely in Java. A Vaadin application is like a servlet; however, unlike servlet code, Vaadin has a large set of UI components, controls, and widgets. For example, corresponding to the <table> HTML element, the Vaadin API has a com.vaadin.ui.Table.java class. The following is a comparison between a servlet table implementation and a Vaadin table implementation:

Servlet code:
PrintWriter out = response.getWriter();
out.println("<table>\n" +
   "<tr>\n" +
   "<td>row 2, cell 1</td>\n" +
   "<td>row 2, cell 2</td>" +
   "</tr>\n" +
   "</table>");

Vaadin code:
sample = new Table();
sample.setSizeFull();
sample.setSelectable(true);
...
sample.setColumnHeaders(new String[] { "Country", "Code" });

Basically, if there is a label element in HTML, there is a corresponding Label.java class in Vaadin. In the sample Vaadin code, you will find the use of the com.vaadin.ui.Button.java and com.vaadin.ui.TextField.java classes; a small sketch of these two components follows below. Vaadin supports portlet development based on JSR-286.
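The Button and TextField classes mentioned above can be combined in just a few lines. The following is a small sketch in the Vaadin 6 style API used by this article; the component names and the notification text are made up for illustration, and in a real portlet the click handler would typically add the value to the Table instead.

// Uses com.vaadin.ui.TextField and com.vaadin.ui.Button (Vaadin 6 API).
final TextField countryField = new TextField("Country");
final Button addButton = new Button("Add");
addButton.addListener(new Button.ClickListener() {
    public void buttonClick(Button.ClickEvent event) {
        // Show what the user typed in the current window.
        addButton.getWindow().showNotification("You typed: " + countryField.getValue());
    }
});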
Vaadin support in Liferay Portal Starting with Version 6.0, the Liferay Portal was bundled with the Vaadin Java API, themes, and a widget set described as follows: ${APP_SERVER_PORTAL_DIR}/html/VAADIN/themes/ ${APP_SERVER_PORTAL_DIR}/html/VAADIN/widgetsets/ ${APP_SERVER_PORTAL_DIR}/WEB-INF/lib/vaadin.jar A Vaadin control panel for the Liferay Portal is also available for download. It can be used to rebuild the widget set when you install new add-ons in the Liferay Portal. In the ${LPORTAL_SRC_DIR}/portal-impl/src/portal.properties file, we have the following Vaadin-related setting: vaadin.resources.path=/html vaadin.theme=liferay vaadin.widgetset=com.vaadin.portal.gwt.PortalDefaultWidgetSet In this section, we will discuss two Vaadin portlets. These two Vaadin portlets are run and tested in Liferay Portal 6.1.20 because, at the time of writing, the support for Vaadin is not available in the new Liferay Portal 6.2. It is expected that when the Generally Available (GA) version of Liferay Portal 6.2 is available, the support for Vaadin portlets in the new Liferay Portal 6.2 will be ready. Vaadin portlet for CRUD operations CRUD stands for create, read, update, and delete. We will use a peanut portlet to illustrate the organization of a Vaadin portlet. In this portlet, a user can create, read, update, and delete data. This portlet is adapted from a SimpleAddressBook portlet from a Vaadin demo. Its structure is as shown in the following screenshot: You can see that it does not have JSP files. The view, model, and controller are all incorporated in the PeanutApplication.java class. Its portlet.xml file has the following content: <portlet-class>com.vaadin.terminal.gwt.server.ApplicationPortlet2</portlet-class> <init-param> <name>application</name> <value>peanut.PeanutApplication</value> </init-param> This means that when the Liferay Portal calls the peanut portlet, the com.vaadin.terminal.gwt.server.ApplicationPortlet2.java class will run. This ApplicationPortlet2.java class will in turn call the peanut.PeanutApplication.java class, which will retrieve data from the database and generate the HTML markup. The default UI of the peanut portlet is as follows: This default UI is implemented with the following code: HorizontalSplitPanel splitPanel = new HorizontalSplitPanel(); setMainWindow(new Window("Address Book", splitPanel)); VerticalLayout left = new VerticalLayout(); left.setSizeFull(); left.addComponent(contactList); contactList.setSizeFull(); left.setExpandRatio(contactList, 1); splitPanel.addComponent(left); splitPanel.addComponent(contactEditor); splitPanel.setHeight("450"); contactEditor.setCaption("Contact details editor"); contactEditor.setSizeFull(); contactEditor.getLayout().setMargin(true); contactEditor.setImmediate(true); bottomLeftCorner.setWidth("100%"); left.addComponent(bottomLeftCorner); The previous code comes from the initLayout() method of the PeanutApplication.java class. This method is run when the portal page is first loaded. The new Window("Address Book", splitPanel) statement instantiates a window area, which is the whole portlet UI. This window is set as the main window of the portlet; every portlet has a main window. The splitPanel attribute splits the main window into two equal parts vertically; it is like the 2 Columns (50/50) page layout of Liferay. 
The splitPanel.addComponent(left) statement adds the contact information table to the left pane of the main window, while the splitPanel.addComponent(contactEditor) statement adds the contact details of the editor to the right pane of the main window. The left variable is a com.vaadin.ui.VerticalLayout.java object. In the left.addComponent(bottomLeftCorner) statement, the left object adds a bottomLeftCorner object to itself. The bottomLeftCorner object is a com.vaadin.ui.HorizontalLayout.java object. It takes the space across the left vertical layout under the contact information table. This bottomLeftCorner horizontal layout will house the contact-add button and the contact-remove button. The following screenshot gives you an idea of how the screen will look: When the + icon is clicked, a button click event will be fired which runs the following code: Object id = ((IndexedContainer) contactList.getContainerDataSource()).addItemAt(0); contactList.getItem(id).getItemProperty("First Name").setValue("John"); contactList.getItem(id).getItemProperty("Last Name").setValue("Doe"); This code adds an entry in the contactList object (contact information table) initializing the contact's first name to John and the last name to Doe. At the same time, the ValueChangeListener property of the contactList object is triggered and runs the following code: contactList.addListener(new Property.ValueChangeListener() { public void valueChange(ValueChangeEvent event) { Object id = contactList.getValue(); contactEditor.setItemDataSource(id == null ? null : contactList .getItem(id)); contactRemovalButton.setVisible(id != null); } }); This code populates the contactEditor variable, a com.vaadin.ui.Form.Form.java object, with John Doe's contact information and displays the Contact details editor section in the right pane of the main window. After that, an end user can enter John Doe's other contact details. The end user can also update John Doe's first and last names. If you have noticed, the last statement of the previous code snippet mentions contactRemovalButton. At this time, the John Doe entry in the contact information table is highlighted. If the end user clicks on the contact removal button, this information will be removed from both the contact information table and the contact details editor. Actually, the end user can highlight any entry in the contact information table and edit or delete it. You may have seen that during the whole process of creating, reading, updating, and deleting the contact, the portal page URL did not change and the portal page did not refresh. All the operations were performed through Ajax calls to the application server. This means that only a few database accesses happened during the whole process. This improves the site performance and reduces load on the application server. It also implies that if you develop Vaadin portlets in the Liferay Portal, you do not have to know the friendly URL configuration skill on a Liferay Portal project. In the peanut portlet, a developer cannot retrieve the logged-in user in the code, which is a weak point. In the following section, a potato portlet is implemented in such a way that a developer can retrieve the Liferay Portal information, including the logged-in user information. Summary In this article, we learned about portlets and their development. We learned ways todevelop simple JSR 286 portlets, SpringMVC portlets, and Vaadin portlets. We also learned to implement the view, edit, and help modes of a portlet. 
Resources for Article: Further resources on this subject: Setting up and Configuring a Liferay Portal [Article] Liferay, its Installation and setup [Article] Building your First Liferay Site [Article]