How-To Tutorials - Data


A quick start – OpenCV fundamentals

Packt
12 Jun 2013
8 min read
The OpenCV library has a modular structure, and the following diagram depicts the different modules available in it. A brief description of all the modules is as follows:

Core: A compact module defining basic data structures, including the dense multidimensional array Mat, and the basic functions used by all other modules.
Imgproc: An image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on.
Video: A video analysis module that includes motion estimation, background subtraction, and object tracking algorithms.
Calib3d: Basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction.
Features2d: Salient feature detectors, descriptors, and descriptor matchers.
Objdetect: Detection of objects and instances of predefined classes; for example, faces, eyes, mugs, people, and cars.
Highgui: An easy-to-use interface to video capturing, image and video codecs, as well as simple UI capabilities.
Gpu: GPU-accelerated algorithms from different OpenCV modules.

Task 1 – image basics

When we try to recreate the physical world around us in digital form, via a camera for example, the computer sees the image only as an array of numbers. A digital image is nothing but a collection of pixels (picture elements), which OpenCV stores in matrices for further manipulation. In these matrices, each element holds the information for one pixel of the image, and the pixel value decides how bright that pixel is or what color it should be. Based on this, we can classify images as:

Greyscale
Color/RGB

Greyscale
Here the pixel value can range from 0 to 255, so we can see the various shades of gray, as shown in the following diagram. Here, 0 represents black and 255 represents white. A special case of grayscale is the binary, or black and white, image, in which every pixel is either black or white, as shown in the following diagram.

Color/RGB
Red, Blue, and Green are the primary colors, and by mixing them in different proportions we can obtain new colors. A pixel in a color image has three separate channels, one each for Red, Blue, and Green, and the value ranges from 0 to 255 for each channel, as shown in the following diagram.
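If you want to poke at these pixel values yourself, the same ideas can be checked quickly from OpenCV's Python bindings. The following is only an illustrative sketch; it assumes the cv2 and NumPy packages are installed and that lena.jpg is in the working directory (note that OpenCV loads color pixels in Blue-Green-Red order):

import cv2

color = cv2.imread("lena.jpg")                       # color image: one matrix of shape (rows, cols, 3)
gray = cv2.imread("lena.jpg", cv2.IMREAD_GRAYSCALE)  # greyscale image: one matrix of shape (rows, cols)

print(color.shape, gray.shape)
print(gray[0, 0])    # a single intensity in the 0-255 range
print(color[0, 0])   # three channel values (Blue, Green, Red), each 0-255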
Task 2 – reading and displaying an image

We are now going to write a very simple and basic program using the OpenCV library to read and display an image. This will help you understand the basics.

Code
A simple program to read and display an image is as follows:

// opencv header files
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/core/core.hpp"

// namespaces declaration
using namespace cv;
using namespace std;

// create a variable to store the image
Mat image;

int main( int argc, char** argv )
{
    // open the image and store it in the 'image' variable
    // Replace the path with where you have downloaded the image
    image = imread("<path to image>/lena.jpg");

    // create a window to display the image
    namedWindow( "Display window", CV_WINDOW_AUTOSIZE );

    // display the image in the window created
    imshow( "Display window", image );

    // wait for a keystroke
    waitKey(0);

    return 0;
}

Code explanation
Now let us understand how the code works. Short comments have also been included in the code itself to increase readability.

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/core/core.hpp"

The preceding two header files will be a part of almost every program we write using the OpenCV library. As explained earlier, the highgui header is used for window creation and management, while the core header is used to access the Mat data structure in OpenCV.

using namespace cv;
using namespace std;

The preceding two lines declare the required namespaces for this code so that we don't have to use the :: (scope resolution) operator every time we access a function.

Mat image;

This creates a variable image of the datatype Mat, which is frequently used in OpenCV to store images.

image = imread("<path to image>/lena.jpg");

This command opens the image lena.jpg and stores it in the image variable. Replace <path to image> with the location of that picture on your PC.

namedWindow( "Display window", CV_WINDOW_AUTOSIZE );

We now need a window to display our image, which is what this function creates. It takes two parameters; the first is the name of the window, in our case Display window. The second parameter is optional: CV_WINDOW_AUTOSIZE sizes the window to match the image so that the image is not cropped.

imshow( "Display window", image );

Finally, we display our image in the window we just created. This function also takes two parameters: the name of the window in which the image has to be displayed (Display window) and the variable containing the image we want to display (image).

waitKey(0);

Last but not least, it is advised that you use this function in most of the code you write using the OpenCV library. Without it, the image would be displayed for a fraction of a second and the program would terminate immediately, too fast for you to see the image. What this function does is wait for a keystroke from the user and thereby delay the termination of the program. Its argument is a delay in milliseconds; passing 0 makes it wait indefinitely for a key press.

Output
The image can be displayed as follows:
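One habit worth building around imread and waitKey is checking that the image was actually loaded, since imread fails silently when the path is wrong. A minimal sketch using the Python bindings (the path is a placeholder, and cv2 is assumed to be installed):

import cv2

image = cv2.imread("<path to image>/lena.jpg")
if image is None:                # in Python, imread returns None when the file cannot be read
    raise SystemExit("Could not read the image; check the path")

cv2.imshow("Display window", image)
cv2.waitKey(0)                   # 0 waits indefinitely; a positive value waits that many milliseconds
cv2.destroyAllWindows()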
Task 3 – resizing and saving an image

We are now going to write a very simple and basic program using the OpenCV library to resize and save an image.

Code
The following code helps you to resize a given image:

// opencv header files
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/core/core.hpp"

// namespaces declaration
using namespace std;
using namespace cv;

int main(int argc, char** argv)
{
    // create variables to store the images
    Mat org, resized, saved;

    // open the image and store it in the 'org' variable
    // Replace the path with where you have downloaded the image
    org = imread("<path to image>/lena.png");

    // create a window to display the image
    namedWindow("Original Image", CV_WINDOW_AUTOSIZE);

    // display the image
    imshow("Original Image", org);

    // resize the image
    resize(org, resized, Size(), 0.5, 0.5, INTER_LINEAR);
    namedWindow("Resized Image", CV_WINDOW_AUTOSIZE);
    imshow("Resized Image", resized);

    // save the image
    // Replace <path> with your desired location
    imwrite("<path>/saved.png", resized);

    // read the saved image back and display it
    namedWindow("Image saved", CV_WINDOW_AUTOSIZE);
    saved = imread("<path>/saved.png");
    imshow("Image saved", saved);

    // wait for a keystroke
    waitKey(0);

    return 0;
}

Code explanation
Only the new functions and concepts will be explained in this case.

#include "opencv2/imgproc/imgproc.hpp"

Imgproc is another useful header that gives us access to the various transformations, color conversions, filters, histograms, and so on.

Mat org, resized, saved;

We have created three variables: org and resized store the original and resized images respectively, while saved holds the saved image read back from disk.

resize(org, resized, Size(), 0.5, 0.5, INTER_LINEAR);

We have used this function to resize the image. It takes six parameters. The first is the variable containing the source image to be modified, and the second is the variable that stores the resized image. The third parameter is the output image size; we have not specified it here but have instead passed Size(), so it is calculated automatically from the fourth and fifth parameters, which are the scale factors along the horizontal and vertical axes respectively. The sixth parameter selects the interpolation method; we have used bilinear interpolation, which is the default method.

imwrite("<path>/saved.png", resized);

Finally, this function saves an image to a particular location on your PC. It takes two parameters: the location where you want to store the image and the variable in which the image is stored (here, resized). This function is very useful when you want to perform multiple operations on an image and save the result for future reference. Replace <path> with your desired location.

Output
Resizing can be demonstrated through the following output:

Summary
This section showed you how to perform a few of the basic tasks in OpenCV, as well as how to write your first OpenCV program.


Extending ElasticSearch with Scripting

Packt
06 Feb 2015
21 min read
In article by Alberto Paro, the author of ElasticSearch Cookbook Second Edition, we will cover about the following recipes: (For more resources related to this topic, see here.) Installing additional script plugins Managing scripts Sorting data using scripts Computing return fields with scripting Filtering a search via scripting Introduction ElasticSearch has a powerful way of extending its capabilities with custom scripts, which can be written in several programming languages. The most common ones are Groovy, MVEL, JavaScript, and Python. In this article, we will see how it's possible to create custom scoring algorithms, special processed return fields, custom sorting, and complex update operations on records. The scripting concept of ElasticSearch can be seen as an advanced stored procedures system in the NoSQL world; so, for an advanced usage of ElasticSearch, it is very important to master it. Installing additional script plugins ElasticSearch provides native scripting (a Java code compiled in JAR) and Groovy, but a lot of interesting languages are also available, such as JavaScript and Python. In older ElasticSearch releases, prior to version 1.4, the official scripting language was MVEL, but due to the fact that it was not well-maintained by MVEL developers, in addition to the impossibility to sandbox it and prevent security issues, MVEL was replaced with Groovy. Groovy scripting is now provided by default in ElasticSearch. The other scripting languages can be installed as plugins. Getting ready You will need a working ElasticSearch cluster. How to do it... In order to install JavaScript language support for ElasticSearch (1.3.x), perform the following steps: From the command line, simply enter the following command: bin/plugin --install elasticsearch/elasticsearch-lang-javascript/2.3.0 This will print the following result: -> Installing elasticsearch/elasticsearch-lang-javascript/2.3.0... Trying http://download.elasticsearch.org/elasticsearch/elasticsearch-lang-javascript/ elasticsearch-lang-javascript-2.3.0.zip... Downloading ....DONE Installed lang-javascript If the installation is successful, the output will end with Installed; otherwise, an error is returned. To install Python language support for ElasticSearch, just enter the following command: bin/plugin -install elasticsearch/elasticsearch-lang-python/2.3.0 The version number depends on the ElasticSearch version. Take a look at the plugin's web page to choose the correct version. How it works... Language plugins allow you to extend the number of supported languages to be used in scripting. During the ElasticSearch startup, an internal ElasticSearch service called PluginService loads all the installed language plugins. In order to install or upgrade a plugin, you need to restart the node. The ElasticSearch community provides common scripting languages (a list of the supported scripting languages is available on the ElasticSearch site plugin page at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-plugins.html), and others are available in GitHub repositories (a simple search on GitHub allows you to find them). The following are the most commonly used languages for scripting: Groovy (http://groovy.codehaus.org/): This language is embedded in ElasticSearch by default. It is a simple language that provides scripting functionalities. This is one of the fastest available language extensions. 
Groovy is a dynamic, object-oriented programming language with features similar to those of Python, Ruby, Perl, and Smalltalk. It also provides support to write a functional code. JavaScript (https://github.com/elasticsearch/elasticsearch-lang-javascript): This is available as an external plugin. The JavaScript implementation is based on Java Rhino (https://developer.mozilla.org/en-US/docs/Rhino) and is really fast. Python (https://github.com/elasticsearch/elasticsearch-lang-python): This is available as an external plugin, based on Jython (http://jython.org). It allows Python to be used as a script engine. Considering several benchmark results, it's slower than other languages. There's more... Groovy is preferred if the script is not too complex; otherwise, a native plugin provides a better environment to implement complex logic and data management. The performance of every language is different; the fastest one is the native Java. In the case of dynamic scripting languages, Groovy is faster, as compared to JavaScript and Python. In order to access document properties in Groovy scripts, the same approach will work as in other scripting languages: doc.score: This stores the document's score. doc['field_name'].value: This extracts the value of the field_name field from the document. If the value is an array or if you want to extract the value as an array, you can use doc['field_name'].values. doc['field_name'].empty: This returns true if the field_name field has no value in the document. doc['field_name'].multivalue: This returns true if the field_name field contains multiple values. If the field contains a geopoint value, additional methods are available, as follows: doc['field_name'].lat: This returns the latitude of a geopoint. If you need the value as an array, you can use the doc['field_name'].lats method. doc['field_name'].lon: This returns the longitude of a geopoint. If you need the value as an array, you can use the doc['field_name'].lons method. doc['field_name'].distance(lat,lon): This returns the plane distance, in miles, from a latitude/longitude point. If you need to calculate the distance in kilometers, you should use the doc['field_name'].distanceInKm(lat,lon) method. doc['field_name'].arcDistance(lat,lon): This returns the arc distance, in miles, from a latitude/longitude point. If you need to calculate the distance in kilometers, you should use the doc['field_name'].arcDistanceInKm(lat,lon) method. doc['field_name'].geohashDistance(geohash): This returns the distance, in miles, from a geohash value. If you need to calculate the same distance in kilometers, you should use doc['field_name'] and the geohashDistanceInKm(lat,lon) method. By using these helper methods, it is possible to create advanced scripts in order to boost a document by a distance that can be very handy in developing geolocalized centered applications. Managing scripts Depending on your scripting usage, there are several ways to customize ElasticSearch to use your script extensions. In this recipe, we will see how to provide scripts to ElasticSearch via files, indexes, or inline. Getting ready You will need a working ElasticSearch cluster populated with the populate script (chapter_06/populate_aggregations.sh), available at https://github.com/aparo/ elasticsearch-cookbook-second-edition. How to do it... 
To manage scripting, perform the following steps:

1. Dynamic scripting is disabled by default for security reasons; we need to activate it in order to use dynamic scripting languages such as JavaScript or Python. To do this, turn off the disable flag (script.disable_dynamic: false) in the ElasticSearch configuration file (config/elasticsearch.yml) and restart the cluster. To increase security, ElasticSearch does not allow you to specify scripts for non-sandboxed languages. Scripts can be placed in the scripts directory inside the configuration directory.

2. To provide a script in a file, we'll put a my_script.groovy script in the config/scripts location with the following content:

doc["price"].value * factor

3. If dynamic scripting is enabled (as done in the first step), ElasticSearch allows you to store scripts in a special index, .scripts. To put my_script in the index, execute the following command in the command terminal:

curl -XPOST localhost:9200/_scripts/groovy/my_script -d '{
  "script" : "doc[\"price\"].value * factor"
}'

4. The script can then be used by simply referencing it in the script_id field; use the following command:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script" : {
      "script_id" : "my_script",
      "lang" : "groovy",
      "type" : "number",
      "ignore_unmapped" : true,
      "params" : {
        "factor" : 1.1
      },
      "order" : "asc"
    }
  }
}'

How it works...
ElasticSearch allows you to load your scripts in different ways; each of these methods has its pros and cons. The most secure way to load or import scripts is to provide them as files in the config/scripts directory. This directory is continuously scanned for new files (by default, every 60 seconds). The scripting language is automatically detected by the file extension, and the script name depends on the filename. If the file is put in subdirectories, the directory path becomes part of the script name; for example, for config/scripts/mysub1/mysub2/my_script.groovy, the script name will be mysub1_mysub2_my_script. If the script is provided via the filesystem, it can be referenced in the code via the "script": "script_name" parameter.

Scripts can also be stored in the special .scripts index. These are the REST endpoints:

To retrieve a script: GET http://<server>/_scripts/<language>/<id>
To store a script: PUT http://<server>/_scripts/<language>/<id>
To delete a script: DELETE http://<server>/_scripts/<language>/<id>

An indexed script can be referenced in the code via the "script_id": "id_of_the_script" parameter.

The recipes that follow will use inline scripting because it's easier to use during the development and testing phases. Generally, a good practice is to develop using inline dynamic scripting in a request, because it's faster to prototype. Once the script is ready and no further changes are needed, it can be stored in the index, since it is simpler to call and manage. In production, a best practice is to disable dynamic scripting and store the scripts on disk (generally, dumping the indexed scripts to disk).

See also
The scripting page on the ElasticSearch website at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html

Sorting data using script
ElasticSearch provides scripting support for the sorting functionality.
In real world applications, there is often a need to modify the default sort by the match score using an algorithm that depends on the context and some external variables. Some common scenarios are given as follows: Sorting places near a point Sorting by most-read articles Sorting items by custom user logic Sorting items by revenue Getting ready You will need a working ElasticSearch cluster and an index populated with the script, which is available at https://github.com/aparo/ elasticsearch-cookbook-second-edition. How to do it... In order to sort using scripting, perform the following steps: If you want to order your documents by the price field multiplied by a factor parameter (that is, sales tax), the search will be as shown in the following code: curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{ "query": {    "match_all": {} }, "sort": {    "_script" : {      "script" : "doc["price"].value * factor",      "lang" : "groovy",      "type" : "number",      "ignore_unmapped" : true,    "params" : {        "factor" : 1.1      },            "order" : "asc"        }    } }' In this case, we have used a match_all query and a sort script. If everything is correct, the result returned by ElasticSearch should be as shown in the following code: { "took" : 7, "timed_out" : false, "_shards" : {    "total" : 5,    "successful" : 5,    "failed" : 0 }, "hits" : {    "total" : 1000,    "max_score" : null,    "hits" : [ {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "161",      "_score" : null, "_source" : … truncated …,      "sort" : [ 0.0278578661440021 ]    }, {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "634",      "_score" : null, "_source" : … truncated …,     "sort" : [ 0.08131364254827411 ]    }, {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "465",      "_score" : null, "_source" : … truncated …,      "sort" : [ 0.1094966959069832 ]    } ] } } How it works... The sort scripting allows you to define several parameters, as follows: order (default "asc") ("asc" or "desc"): This determines whether the order must be ascending or descending. script: This contains the code to be executed. type: This defines the type to convert the value. params (optional, a JSON object): This defines the parameters that need to be passed. lang (by default, groovy): This defines the scripting language to be used. ignore_unmapped (optional): This ignores unmapped fields in a sort. This flag allows you to avoid errors due to missing fields in shards. Extending the sort with scripting allows the use of a broader approach to score your hits. ElasticSearch scripting permits the use of every code that you want. You can create custom complex algorithms to score your documents. There's more... 
Groovy provides a lot of built-in functions (mainly taken from Java's Math class) that can be used in scripts, as shown in the following table:

time(): The current time in milliseconds
sin(a): Returns the trigonometric sine of an angle
cos(a): Returns the trigonometric cosine of an angle
tan(a): Returns the trigonometric tangent of an angle
asin(a): Returns the arc sine of a value
acos(a): Returns the arc cosine of a value
atan(a): Returns the arc tangent of a value
toRadians(angdeg): Converts an angle measured in degrees to an approximately equivalent angle measured in radians
toDegrees(angrad): Converts an angle measured in radians to an approximately equivalent angle measured in degrees
exp(a): Returns Euler's number raised to the power of a value
log(a): Returns the natural logarithm (base e) of a value
log10(a): Returns the base 10 logarithm of a value
sqrt(a): Returns the correctly rounded positive square root of a value
cbrt(a): Returns the cube root of a double value
IEEEremainder(f1, f2): Computes the remainder operation on two arguments, as prescribed by the IEEE 754 standard
ceil(a): Returns the smallest (closest to negative infinity) value that is greater than or equal to the argument and is equal to a mathematical integer
floor(a): Returns the largest (closest to positive infinity) value that is less than or equal to the argument and is equal to a mathematical integer
rint(a): Returns the value that is closest in value to the argument and is equal to a mathematical integer
atan2(y, x): Returns the angle theta from the conversion of rectangular coordinates (x, y) to polar coordinates (r, theta)
pow(a, b): Returns the value of the first argument raised to the power of the second argument
round(a): Returns the closest integer to the argument
random(): Returns a random double value
abs(a): Returns the absolute value of a value
max(a, b): Returns the greater of the two values
min(a, b): Returns the smaller of the two values
ulp(d): Returns the size of the unit in the last place of the argument
signum(d): Returns the signum function of the argument
sinh(x): Returns the hyperbolic sine of a value
cosh(x): Returns the hyperbolic cosine of a value
tanh(x): Returns the hyperbolic tangent of a value
hypot(x, y): Returns sqrt(x^2+y^2) without an intermediate overflow or underflow

If you want to retrieve records in a random order, you can use a script with a random method, as shown in the following code:

curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script" : {
      "script" : "Math.random()",
      "lang" : "groovy",
      "type" : "number",
      "params" : {}
    }
  }
}'

In this example, for every hit, the new sort value is computed by executing the Math.random() scripting function.

See also
The official ElasticSearch documentation at http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html

Computing return fields with scripting
ElasticSearch allows you to define complex expressions that can be used to return a new calculated field value. These special fields are called script_fields, and they can be expressed with a script in every available ElasticSearch scripting language.
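If you drive ElasticSearch from Python rather than curl, a script_fields request like the one walked through in this recipe can also be sent with the elasticsearch-py client. The following is only an illustrative sketch; it assumes a client version matching this 1.x-era cluster and reuses the test-index data and field names from the examples below:

from elasticsearch import Elasticsearch

es = Elasticsearch(["http://127.0.0.1:9200"])
body = {
    "size": 3,
    "query": {"match_all": {}},
    "script_fields": {
        "my_calc_field2": {
            "script": "doc['price'].value * discount",
            "params": {"discount": 0.8}
        }
    }
}
result = es.search(index="test-index", doc_type="test-type", body=body)
for hit in result["hits"]["hits"]:
    print(hit["_id"], hit["fields"])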
Getting ready You will need a working ElasticSearch cluster and an index populated with the script (chapter_06/populate_aggregations.sh), which is available at https://github.com/aparo/ elasticsearch-cookbook-second-edition. How to do it... In order to compute return fields with scripting, perform the following steps: Return the following script fields: "my_calc_field": This concatenates the text of the "name" and "description" fields "my_calc_field2": This multiplies the "price" value by the "discount" parameter From the command line, execute the following code: curl -XGET 'http://127.0.0.1:9200/test-index/test-type/ _search?&pretty=true&size=3' -d '{ "query": {    "match_all": {} }, "script_fields" : {    "my_calc_field" : {      "script" : "doc["name"].value + " -- " + doc["description"].value"    },    "my_calc_field2" : {      "script" : "doc["price"].value * discount",      "params" : {       "discount" : 0.8      }    } } }' If everything works all right, this is how the result returned by ElasticSearch should be: { "took" : 4, "timed_out" : false, "_shards" : {    "total" : 5,    "successful" : 5,    "failed" : 0 }, "hits" : {    "total" : 1000,    "max_score" : 1.0,    "hits" : [ {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "4",      "_score" : 1.0,      "fields" : {        "my_calc_field" : "entropic -- accusantium",        "my_calc_field2" : 5.480038242170081      }    }, {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "9",      "_score" : 1.0,      "fields" : {        "my_calc_field" : "frankie -- accusantium",        "my_calc_field2" : 34.79852410178313      }    }, {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "11",      "_score" : 1.0,      "fields" : {        "my_calc_field" : "johansson -- accusamus",        "my_calc_field2" : 11.824173084636591      }    } ] } } How it works... The scripting fields are similar to executing an SQL function on a field during a select operation. In ElasticSearch, after a search phase is executed and the hits to be returned are calculated, if some fields (standard or script) are defined, they are calculated and returned. The script field, which can be defined with all the supported languages, is processed by passing a value to the source of the document and, if some other parameters are defined in the script (in the discount factor example), they are passed to the script function. The script function is a code snippet; it can contain everything that the language allows you to write, but it must be evaluated to a value (or a list of values). See also The Installing additional script plugins recipe in this article to install additional languages for scripting The Sorting using script recipe to have a reference of the extra built-in functions in Groovy scripts Filtering a search via scripting ElasticSearch scripting allows you to extend the traditional filter with custom scripts. Using scripting to create a custom filter is a convenient way to write scripting rules that are not provided by Lucene or ElasticSearch, and to implement business logic that is not available in the query DSL. Getting ready You will need a working ElasticSearch cluster and an index populated with the (chapter_06/populate_aggregations.sh) script, which is available at https://github.com/aparo/ elasticsearch-cookbook-second-edition. How to do it... 
In order to filter a search using a script, perform the following steps: Write a search with a filter that filters out a document with the value of age less than the parameter value: curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{ "query": {    "filtered": {      "filter": {        "script": {          "script": "doc["age"].value > param1",          "params" : {            "param1" : 80          }        }      },      "query": {        "match_all": {}      }    } } }' In this example, all the documents in which the value of age is greater than param1 are qualified to be returned. If everything works correctly, the result returned by ElasticSearch should be as shown here: { "took" : 30, "timed_out" : false, "_shards" : {    "total" : 5,    "successful" : 5,    "failed" : 0 }, "hits" : {    "total" : 237,    "max_score" : 1.0,    "hits" : [ {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "9",      "_score" : 1.0, "_source" :{ … "age": 83, … }    }, {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "23",      "_score" : 1.0, "_source" : { … "age": 87, … }    }, {      "_index" : "test-index",      "_type" : "test-type",      "_id" : "47",      "_score" : 1.0, "_source" : {…. "age": 98, …}    } ] } } How it works... The script filter is a language script that returns a Boolean value (true/false). For every hit, the script is evaluated, and if it returns true, the hit passes the filter. This type of scripting can only be used as Lucene filters, not as queries, because it doesn't affect the search (the exceptions are constant_score and custom_filters_score). These are the scripting fields: script: This contains the code to be executed params: These are optional parameters to be passed to the script lang (defaults to groovy): This defines the language of the script The script code can be any code in your preferred and supported scripting language that returns a Boolean value. There's more... Other languages are used in the same way as Groovy. For the current example, I have chosen a standard comparison that works in several languages. To execute the same script using the JavaScript language, use the following code: curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{ "query": {    "filtered": {      "filter": {        "script": {          "script": "doc["age"].value > param1",          "lang":"javascript",          "params" : {            "param1" : 80          }        }      },      "query": {        "match_all": {}      }    } } }' For Python, use the following code: curl -XGET 'http://127.0.0.1:9200/test-index/test-type/_search?&pretty=true&size=3' -d '{ "query": {    "filtered": {      "filter": {        "script": {          "script": "doc["age"].value > param1",          "lang":"python",          "params" : {            "param1" : 80          }        }      },      "query": {        "match_all": {}      }    } } }' See also The Installing additional script plugins recipe in this article to install additional languages for scripting The Sorting data using script recipe in this article to get a reference of the extra built-in functions in Groovy scripts Summary In this article you have learnt the ways you can use scripting to extend the ElasticSearch functional capabilities using different programming languages. 


How bad is the gender diversity crisis in AI research? Study analysing 1.5 million arXiv papers says it’s “serious”

Fatema Patrawala
18 Jul 2019
9 min read
Yesterday the team at Nesta organization, an innovation firm based out of UK published a research on gender diversity in the AI research workforce. The authors of this research are Juan Mateos Garcis, the Director, Konstantinos Stathoulopoulos, the Principal Researcher and Hannah Owen, the Programme Coordinator at Nesta. https://twitter.com/JMateosGarcia/status/1151517641103872006 They have prepared an analysis purely based on 1.5 million arxiv papers. The team claims that it is the first ever study of gender diversity in AI which is not on any convenience sampling or proprietary database. The team posted on its official blog post, “We conducted a large-scale analysis of gender diversity in AI research using publications from arXiv, a repository with more than 1.5 million preprints widely used by the AI community. We aim to expand the evidence base on gender diversity in AI research and create a baseline with which to interrogate the impact of current and future policies and interventions.  To achieve this, we enriched the ArXiv data with geographical, discipline and gender information in order to study the evolution of gender diversity in various disciplines, countries and institutions as well as examine the semantic differences between AI papers with and without female co-authors.” With this research the team also aims to bring prominent female figures they have identified under the spotlight. Key findings from the research Serious gender diversity crisis in AI research The team found a severe gender diversity gap in AI research with only 13.83% of authors being women. Moreover, in relative terms, the proportion of AI papers co-authored by at least one woman has not improved since the 1990s. Juan Mateos thinks this kind of crisis is a waste of talent and it increases the risk of discriminatory AI systems. https://twitter.com/JMateosGarcia/status/1151517642236276736 Location and research domain are significant drivers of gender diversity Women in the Netherlands, Norway and Denmark are more likely to publish AI papers while those in Japan and Singapore are less likely. In the UK, 26.62% of the AI papers have at least one female co-author, placing the country at the 22nd spot worldwide. The US follows the UK in terms of having at least one female co-authors at 25% and for the unique female author US leads one position above UK. Source: Nesta research report Regarding the research domains, women working in Physics and Education, Computer Ethics and other societal issues and Biology are more likely to publish their work on AI in comparison to those working in Computer Science or Mathematics. Source: Nesta research report Significant gender diversity gap in universities, big tech companies and other research institutions Apart from the University of Washington, every other academic institution and organisation in the dataset has less than 25% female AI researchers. Regarding some of the big tech, only 11.3% of Google’s employees who have published their AI research on arXiv are women, while the proportion is similar for Microsoft (11.95%) and is slightly better for IBM (15.66%). Important semantic differences between AI paper with and without a female co-author When examining the publications in the Machine Learning and Societal topics in the UK in 2012 and 2015, papers involving at least one female co-author tend to be more semantically similar to each other than with those without any female authors. 
Moreover, papers with at least one female co-author tend to be more applied and socially aware, with terms such as fairness, human mobility, mental, health, gender and personality being among the most salient ones. Juan Mateos noted that this is an area which deserves further research. https://twitter.com/JMateosGarcia/status/1151517647361781760   The top 15 women with the most AI publications on arXiv identified Aarti Singh, Associate Professor at the Machine learning department of Carnegie Mellon University Cordelia Schmid, is a part of Google AI team and holds a permanent research position at Inria Grenoble Rhone-Alpes Cynthia Rudin, an associate professor of computer science, electrical and computer engineering, statistical science and mathematics at Duke University Devi Parikh, an Assistant Professor in the School of Interactive Computing at Georgia Tech Karen Livescu, an Associate Professor at Toyota Technical Institute at Chicago Kate Saenko,  an Associate Professor at the Department of Computer at Boston University Kristina Lerman, a Project Leader at the Information Sciences Institute at the University of Southern California Marilyn A. Walker, a Professor at the Department of Computer Science at the University of California Mihaela van der Schaar, is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Turing Fellow at The Alan Turing Institute in London Petia Radeva, a professor at the Department of Mathematics and Computer Science, Faculty of Mathematics and Computer Science at the Universitat de Barcelona Regina Barzilay is a professor at the Massachusetts Institute of Technology and a member of the MIT Computer Science and Artificial Intelligence Laboratory Svetha Venkatesh, an ARC Australian Laureate Fellow, Alfred Deakin Professor and Director of the Centre for Pattern Recognition and Data Analytics (PRaDA) at Deakin University Xiaodan Liang, an Associate Professor at the School of Intelligent Systems Engineering, Sun Yat-sen University Yonina C. Elda, a Professor of Electrical Engineering, Weizmann Faculty of Mathematics and Computer Science at the University of Israel Zeynep Akata, an Assistant Professor with the University of Amsterdam in the Netherlands There are 5 other women researchers who were not identified in the study. Interviews bites from few women contributors and institutions The research team also interviewed few researchers and institutions identified in their work and they think a system wide reform is needed. When the team discussed the findings with the most cited female researcher Mihaela Van Der Schaar, she did feel that her presence in the field has only started to be recognised, having begun her career in 2003, ‘I think that part of the reason for this is because I am a woman, and the experience of (the few) other women in AI in the same period has been similar.’ she says. Professor Van Der Schaar also described herself and many of her female colleagues as ‘faceless’, she suggested that the work of celebrating leading women in the field could have a positive impact on the representation of women, as well as the disparity in the recognition that these women receive. This suggests that work is needed across the pipeline, not just with early-stage invention in education, but support for those women in the field. 
She also highlighted the importance of open discussion about the challenges women face in the AI sector and that workplace changes such as flexible hours are needed to enable researchers to participate in a fast-paced sector without sacrificing their family life. The team further discussed the findings with the University of Washington’s Eve Riskin, Associate Dean of Diversity and Access in the College of Engineering. Riskin described that much of her female faculty experienced a ‘toxic environment’ and pervasive imposter syndrome. She also emphasized the fact that more research is needed in terms of the career trajectories of the male and female researchers including the recruitment and retention. Some recent examples of exceptional women in AI research and their contribution While these women talk about the diversity gaps in this field recently we have seen works from female researchers like Katie Bouman which gained significant attention. Katie is a post-doctoral fellow at MIT whose algorithm led to an image of a supermassive black hole. But then all the attention became a catalyst for a sexist backlash on social media and YouTube. It set off “what can only be described as a sexist scavenger hunt,” as The Verge described it, in which an apparently small group of vociferous men questioned Bouman’s role in the project. “People began going over her work to see how much she’d really contributed to the project that skyrocketed her to unasked-for fame.” Another incredible example in the field of AI research and ethics is of Meredith Whittaker, an ex-Googler, now a program manager, activist, and co-founder of the AI Now Institute at New York University. Meredith is committed to the AI Now Institute, her AI ethics work, and to organize an accountable tech industry. On Tuesday,  Meredith left Google after facing retaliation from company for organizing last year’s protest of Google Walkout for Real Change demanding the company for structural changes to ensure a safe and conducive work environment for everyone.. Other observations from the research and next steps The research also highlights the fact that women are as capable as men in contributing to technical topics while they tend to contribute more than men to publications with a societal or ethical output. Some of the leading AI researchers in the field shared their opinion on this: Petia Radeva, Professor at the Department of Mathematics and Computer Science at the University of Barcelona, was positive that the increasingly broad domains of application for AI and the potential impact of this technology will attract more women into the sector. Similarly, Van Der Schaar suggests that “publicising the interdisciplinary scope of possibilities and career paths that studying AI can lead to will help to inspire a more diverse group of people to pursue it. In parallel, the industry will benefit from a pipeline of people who are motivated by combining a variety of ideas and applying them across domains.” The research team in future will explore the temporal co-authorship network of AI papers to examine how different the career trajectory of male and female researchers might be. They will survey AI researchers on arXiv and investigate the drivers of the diversity gap in more detail through their innovation mapping methods. They also plan to extend this analysis to identify the representation of other underrepresented groups. 


Visualizing my Social Graph with d3.js

Packt
24 Oct 2013
7 min read
Social Networks Analysis

Social Networks Analysis (SNA) is not new; sociologists have been using it for a long time to study human relationships (sociometry), to find communities, and to simulate how information or a disease spreads through a population. With the rise of social networking sites such as Facebook, Twitter, and LinkedIn, acquiring large amounts of social network data has become much easier. We can use SNA to get insight into customer behavior or unknown communities. It is important to say that this is not a trivial task: we will come across sparse data and a lot of noise (meaningless data), and we need to understand how to distinguish false correlation from causation. A good start is to get to know our graph through visualization and statistical analysis.

Social networking sites give us the opportunity to ask questions that are otherwise too hard to approach, because polling enough people is time-consuming and expensive. In this article, we will obtain our social network's graph from the Facebook (FB) website in order to visualize the relationships between our friends. Finally, we will create an interactive visualization of our graph using D3.js.

Getting ready

The easiest method to get our friends list is by using a third-party application. Netvizz is a Facebook app developed by Bernhard Rieder, which allows exporting social graph data to gdf and tab formats. Netvizz may export information about our friends such as gender, age, locale, posts, and likes. In order to get our social graph from Netvizz, we need to access the link below and give it access to our Facebook profile:

https://apps.facebook.com/netvizz/

As shown in the following screenshot, we will create a gdf file from our personal friend network by clicking on the link named here in Step 2. Then we will download the GDF (Graph Modeling Language) file. Netvizz will give us the number of nodes and edges (links); finally, we will click on the gdf file link, as we can see in the following screenshot.

The output file myFacebookNet.gdf will look like this:

nodedef>name VARCHAR,label VARCHAR,gender VARCHAR,locale VARCHAR,agerank INT
23917067,Jorge,male,en_US,106
23931909,Haruna,female,en_US,105
35702006,Joseph,male,en_US,104
503839109,Damian,male,en_US,103
532735006,Isaac,male,es_LA,102
. . .
edgedef>node1 VARCHAR,node2 VARCHAR
23917067,35702006
23917067,629395837
23917067,747343482
23917067,755605075
23917067,1186286815
. . .

In the following screenshot we may see the visualization of the graph (106 nodes and 279 links). The nodes represent my friends and the links represent how my friends are connected to each other.

Transforming GDF to JSON

In order to work with the graph on the web with d3.js, we need to transform our gdf file into json format. First, we need to import the numpy and json libraries:

import numpy as np
import json

The numpy function genfromtxt will obtain only the ID and name from the nodes.csv file, using the usecols attribute, in the 'object' format:

nodes = np.genfromtxt("nodes.csv",dtype='object',delimiter=',',skip_header=1,usecols=(0,1))

Then, the numpy function genfromtxt will obtain the links, with the source node and target node, from the links.csv file, using the usecols attribute, in the 'object' format:
links = np.genfromtxt("links.csv",dtype='object',delimiter=',',skip_header=1,usecols=(0,1))

The JSON format used in the D3.js force layout graph implemented in this article requires transforming each ID (for example, 100001448673085) into its numerical position in the list of nodes. We therefore look for each appearance of an ID in the links and replace it with its position in the list of nodes:

for n in range(len(nodes)):
    for ls in range(len(links)):
        if nodes[n][0] == links[ls][0]:
            links[ls][0] = n
        if nodes[n][0] == links[ls][1]:
            links[ls][1] = n

Now, we need to create a dictionary, data, to store the JSON file:

data = {}

Next, we need to create a list of nodes with the names of the friends in the format "nodes": [{"name": "X"},{"name": "Y"}, . . .] and add it to the data dictionary:

lst = []
for x in nodes:
    d = {}
    d["name"] = str(x[1]).replace("b'","").replace("'","")
    lst.append(d)
data["nodes"] = lst

Now, we need to create a list of links with the source and target in the format "links": [{"source": 0, "target": 2},{"source": 1, "target": 2}, . . .] and add it to the data dictionary:

lnks = []
for ls in links:
    d = {}
    d["source"] = ls[0]
    d["target"] = ls[1]
    lnks.append(d)
data["links"] = lnks

Finally, we need to create the file newJson.json and write the data dictionary to it with the dumps function of the json library:

with open("newJson.json","w") as f:
    f.write(json.dumps(data))

The file newJson.json will look as follows:

{"nodes": [{"name": "Jorge"},
           {"name": "Haruna"},
           {"name": "Joseph"},
           {"name": "Damian"},
           {"name": "Isaac"},
           . . .],
 "links": [{"source": 0, "target": 2},
           {"source": 0, "target": 12},
           {"source": 0, "target": 20},
           {"source": 0, "target": 23},
           {"source": 0, "target": 31},
           . . .]}

Graph visualization with D3.js

D3.js provides us with the d3.layout.force() function, which uses a force-directed layout algorithm and helps us to visualize our graph. First, we need to define the CSS styles for the nodes, links, and node labels:

<style>
.link {
  fill: none;
  stroke: #666;
  stroke-width: 1.5px;
}
.node circle {
  fill: steelblue;
  stroke: #fff;
  stroke-width: 1.5px;
}
.node text {
  pointer-events: none;
  font: 10px sans-serif;
}
</style>

Then, we need to reference the d3.js library:

<script src="http://d3js.org/d3.v3.min.js"></script>

Then, we need to define the width and height parameters for the svg container and include it in the body tag:

var width = 1100,
    height = 800;

var svg = d3.select("body").append("svg")
    .attr("width", width)
    .attr("height", height);

Now, we define the properties of the force layout, such as gravity, distance, and size:

var force = d3.layout.force()
    .gravity(.05)
    .distance(150)
    .charge(-100)
    .size([width, height]);

Then, we need to acquire the data of the graph in JSON format and configure the nodes and links:

d3.json("newJson.json", function(error, json) {
  force
      .nodes(json.nodes)
      .links(json.links)
      .start();

For a complete reference on the d3.js force layout implementation, visit https://github.com/mbostock/d3/wiki/Force-Layout.

Then, we define the links as lines from the json data, and the nodes as groups that can be dragged:

  var link = svg.selectAll(".link")
      .data(json.links)
      .enter().append("line")
      .attr("class", "link");

  var node = svg.selectAll(".node")
      .data(json.nodes)
      .enter().append("g")
      .attr("class", "node")
      .call(force.drag);

Now, we define the nodes as circles of size 6 and include the labels of each node:
  node.append("circle")
      .attr("r", 6);

  node.append("text")
      .attr("dx", 12)
      .attr("dy", ".35em")
      .text(function(d) { return d.name });

Finally, with the tick function, we run the force layout simulation step by step:

  force.on("tick", function() {
    link.attr("x1", function(d) { return d.source.x; })
        .attr("y1", function(d) { return d.source.y; })
        .attr("x2", function(d) { return d.target.x; })
        .attr("y2", function(d) { return d.target.y; });

    node.attr("transform", function(d) {
      return "translate(" + d.x + "," + d.y + ")";
    });
  });
});
</script>

In the image below we can see the result of the visualization. In order to run the visualization, we just need to open a command terminal and run the following Python command (or use any other web server):

>> python -m http.server 8000

Then we just need to open a web browser and enter the address http://localhost:8000/ForceGraph.html. In the HTML page we can see our Facebook graph with a gravity effect, and we can interactively drag and drop the nodes.

All the code and datasets for this article can be found in the author's GitHub repository at https://github.com/hmcuesta/PDA_Book/tree/master/Chapter10

Summary
In this article we developed our own social graph visualization tool with D3.js, transforming the data obtained from Netvizz in GDF format into JSON.


Preprocessing the Data

Packt
16 Aug 2016
5 min read
In this article, Sampath Kumar Kanthala, the author of the book Practical Data Analysis, discusses how to obtain, clean, normalize, and transform raw data into a standard format such as CSV or JSON using OpenRefine. In this article we will cover:

Data scrubbing
Statistical methods
Text parsing
Data transformation

Data scrubbing

Scrubbing data, also called data cleansing, is the process of correcting or removing data in a dataset that is incorrect, inaccurate, incomplete, improperly formatted, or duplicated. The result of the data analysis process depends not only on the algorithms but also on the quality of the data. That is why the step that follows obtaining the data is data scrubbing. In order to avoid dirty data, our dataset should possess the following characteristics:

Correctness
Completeness
Accuracy
Consistency
Uniformity

Dirty data can be detected by applying some simple statistical data validation, and also by parsing texts or deleting duplicate values. Missing or sparse data can lead you to highly misleading results.

Statistical methods

In this method we need some context about the problem (knowledge domain) to find values that are unexpected and thus erroneous; even if the data type matches but the values are out of range, this can be resolved by setting the values to an average or mean value. Statistical validations can be used to handle missing values, which can be replaced by one or more probable values using interpolation, or by reducing the dataset using decimation. The main tools are listed here (a short Python sketch of mean- and median-based replacement follows this list):

Mean: The value calculated by summing up all values and then dividing by the number of values.
Median: The median is defined as the value where 50% of the values in a range will be below it and 50% of the values above it.
Range constraints: Numbers or dates should fall within a certain range; that is, they have minimum and/or maximum possible values.
Clustering: Usually, when we obtain data directly from the user, some values include ambiguity or refer to the same value with a typo. For example, "Buchanan Deluxe 750ml 12x01" and "Buchanan Deluxe 750ml   12x01.", which differ only by a ".", or "Microsoft" or "MS" instead of "Microsoft Corporation", which all refer to the same company, and all values are valid. In these cases, grouping can help us to get accurate data and eliminate duplicates, enabling a faster identification of unique values.
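As a concrete illustration of the mean and median ideas above, the following minimal Python sketch (standard library only; the readings are made-up numbers) fills in missing values with the mean or the more outlier-robust median of the observed values:

from statistics import mean, median

readings = [10.5, 11.0, None, 10.8, None, 55.0]   # None marks missing values; 55.0 is an outlier
observed = [r for r in readings if r is not None]

fill_mean = mean(observed)      # pulled upward by the 55.0 outlier
fill_median = median(observed)  # more robust to the outlier

cleaned = [r if r is not None else fill_median for r in readings]
print(fill_mean, fill_median, cleaned)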
Text parsing

We perform parsing to help us validate whether a string of data is well formatted and to avoid syntax errors. Text fields such as dates, e-mails, phone numbers, and IP addresses are usually validated with regular expression patterns (Regex is a common abbreviation for "regular expression"). In Python we will use the re module to implement regular expressions, with which we can perform text searches and pattern validations. First, we need to import the re module:

import re

In the following examples, we will implement three of the most common validations (e-mail, IP address, and date format).

E-mail validation:

myString = 'From: readers@packt.com (readers email)'
result = re.search('([\w.-]+)@([\w.-]+)', myString)
if result:
    print (result.group(0))
    print (result.group(1))
    print (result.group(2))

Output:

>>> readers@packt.com
>>> readers
>>> packt.com

The function search() scans through a string, searching for any location where the Regex matches. The function group() returns the string matched by the Regex. The pattern \w matches any alphanumeric character and is equivalent to the class [a-zA-Z0-9_].

IP address validation:

isIP = re.compile('\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
myString = " Your IP is: 192.168.1.254 "
result = re.findall(isIP, myString)
print(result)

Output:

>>> ['192.168.1.254']

The function findall() finds all the substrings where the Regex matches and returns them as a list. The pattern \d matches any decimal digit and is equivalent to the class [0-9].

Date format:

myString = "01/04/2001"
isDate = re.match('[0-1][0-9]/[0-3][0-9]/[1-2][0-9]{3}', myString)
if isDate:
    print("valid")
else:
    print("invalid")

Output:

>>> valid

The function match() checks whether the Regex matches at the beginning of the string. The pattern uses the class [0-9] in order to parse the date format. For more information about regular expressions, see http://docs.python.org/3.4/howto/regex.html#regex-howto

Data transformation

Data transformation is usually related to databases and data warehouses, where values from a source format are extracted, transformed, and loaded into a destination format. Extract, Transform, and Load (ETL) obtains data from various data sources, performs some transformation functions depending on our data model, and loads the resulting data into the destination, as sketched in the example below. Data extraction allows us to obtain data from multiple data sources, such as relational databases, data streaming, text files (JSON, CSV, XML), and NoSQL databases. Data transformation allows us to cleanse, convert, aggregate, merge, replace, validate, format, and split data. Data loading allows us to load data into a destination format, such as relational databases, text files (JSON, CSV, XML), and NoSQL databases. In statistics, data transformation refers to the application of a mathematical function to the dataset or time series points.
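To make the extract-transform-load flow described above concrete, here is a minimal, illustrative sketch in Python; the file names and column names are placeholders, not part of the original article:

import csv
import json

# extract: read rows from a CSV source
with open("source.csv", newline="") as src:
    rows = list(csv.DictReader(src))

# transform: trim text fields and convert a numeric column
for row in rows:
    row["name"] = row["name"].strip()
    row["amount"] = float(row["amount"])

# load: write the cleaned rows to a JSON destination
with open("destination.json", "w") as dst:
    json.dump(rows, dst, indent=2)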

Kunal Chaudhari
04 Jan 2018
6 min read

Creating reports using SQL Server 2016 Reporting Services

This article is an excerpt from a book authored by Dinesh Priyankara and Robert C. Cain, titled SQL Server 2016 Reporting Services Cookbook. This book will help you create cross-browser and cross-platform reports using SQL Server 2016 Reporting Services.

In today's tutorial, we explore the steps to create reports with multi-axis charts in SQL Server 2016. Often you will want to have multiple items plotted on a chart. In this article, we will plot two values over time, in this case the Total Sales Amount (Excluding Tax) and the Total Tax Amount. As you might expect, though, the tax amounts are going to be a small percentage of the sales amounts. By default, this would create a chart with a huge gap in the middle and a Y axis that is quite large and difficult to pinpoint values on. To prevent this, Reporting Services allows us to place a second Y axis on our charts. In this article, we'll explore both adding a second line to our chart and having it plotted on a second Y axis.

Getting ready

First, we'll create a new Reporting Services project to contain it. Name this new project Chapter03. Within the new project, create a Shared Data Source that will connect to the WideWorldImportersDW database. Name the new data source after the database, WideWorldImportersDW.

Next, we'll need data. Our data will come from the sales table, and we will want to sum our totals by year, so we can plot the years across the X axis. For the Y axis, we'll use the totals of two fields: TotalExcludingTax and TaxAmount. Here is the query by which we will accomplish this:

SELECT YEAR([Invoice Date Key]) AS InvoiceYear
      ,SUM([Total Excluding Tax]) AS TotalExcludingTax
      ,SUM([Tax Amount]) AS TotalTaxAmount
FROM [Fact].[Sale]
GROUP BY YEAR([Invoice Date Key])

How to do it…

Right-click on the Reports branch in the Solution Explorer. Go to Add | New Item… from the pop-up menu. On the Add New Item dialog, select Report from the choice of templates in the middle (do not select Report Wizard). At the bottom, name the report Report 03-01 Multi Axis Charts.rdl and click on Add.

Go to the Report Data tool pane. Right-click on Data Sources and then click Add Data Source… from the menu. In the Name: area, enter WideWorldImportersDW. Change the data source option to Use shared data source reference. In the dropdown, select WideWorldImportersDW. Click on OK to close the Data Source Properties window.

Right-click on the Datasets branch and select Add Dataset…. Name the dataset SalesTotalsOverTime. Select the Use a dataset embedded in my report option. Select WideWorldImportersDW in the Data source dropdown. Paste in the query from the Getting ready section of this article. When your window resembles that of the preceding figure, click on OK.

Next, go to the Toolbox pane. Drag and drop a Chart tool onto the report. Select the leftmost Line chart from the Select Chart Type window, and click on OK. Resize the chart to a larger size. (For this demo, the exact size is not important. For your production reports, you can resize as needed using the Properties window, as seen previously.)

Click inside the main chart area to make the Chart Data dialog appear to the right of the chart. Click on the + (plus button) to the right of Values. Select TotalExcludingTax. Click on the plus button again, and now pick TotalTaxAmount. Click on the + (plus button) beside Category Groups, and pick InvoiceYear. Click on Preview. You will note the large gap between the two graphed lines.
In addition, the values for the Total Tax Amount are almost impossible to guess, as shown in the following figure.

Return to the designer, and again click in the chart area to make the Chart Data dialog appear. In the Chart Data dialog, click on the dropdown beside TotalTaxAmount and select Series Properties…. Click on the Axes and Chart Area page, and for Vertical axis, select Secondary. Click on OK to close the Series Properties window. Right-click on the numbers now appearing on the right in the vertical axis area, and select Secondary Vertical Axis Properties in the menu. In the Axis Options, uncheck Always include zero. Click on the Number page. Under Category, select Currency. Change the Decimal places to 0, and place a check in Use 1000 separator. Click on OK to close this window. Now move to the vertical axis on the left-hand side of the chart, right-click, and pick Vertical Axis Properties. Uncheck Always include zero. On the Number page, pick Currency, set Decimal places to 0, and check Use 1000 separator. Click on OK to close. Click on the Preview tab to see the results.

You can now see a chart with a second axis. The monetary amounts are much easier to read. Further, the plotted lines have a similar rise and fall, indicating that the taxes collected matched the sales totals in terms of trending (a short script at the end of this article shows one way to check this directly against the dataset query).

SSRS is capable of plotting multiple lines on a chart. Here we've placed just two fields, but you can add as many as you need. Do realize, though, that the more lines included, the harder the chart can become to read. All that is needed is to put the additional fields into the Values area of the Chart Data window. When these values are of a similar scale, for example sales broken up by state, this works fine. There are times, though, when the scale between plotted values is so great that it distorts the entire chart, leaving one value in a slender line at the top and another at the bottom, with a huge gap in the middle. To fix this, SSRS allows a second Y axis to be included. This will create a scale for the field (or fields) assigned to that axis in the Series Properties window.

To summarize, creating reports with multiple axes is much simpler with SQL Server 2016 Reporting Services. If you liked our post, check out the book SQL Server 2016 Reporting Services Cookbook to learn more about different types of reporting and Power BI integrations.
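Before leaving this recipe, here is a rough way to check the dataset query and the sales-to-tax ratio outside Report Designer. This is my own sketch, not part of the cookbook; it assumes the pyodbc package, a local SQL Server instance, and Windows authentication.

import pyodbc

conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=localhost;DATABASE=WideWorldImportersDW;Trusted_Connection=yes;'
)

query = """
SELECT YEAR([Invoice Date Key]) AS InvoiceYear,
       SUM([Total Excluding Tax]) AS TotalExcludingTax,
       SUM([Tax Amount]) AS TotalTaxAmount
FROM [Fact].[Sale]
GROUP BY YEAR([Invoice Date Key])
ORDER BY InvoiceYear
"""

for year, total_ex_tax, total_tax in conn.cursor().execute(query):
    # The tax line should be a small, fairly steady fraction of sales, which
    # is exactly why the chart needs a secondary axis.
    print(year, total_ex_tax, total_tax, round(float(total_tax) / float(total_ex_tax), 3))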
Packt
01 Dec 2010
9 min read

Animating Graphic Objects using Python

Python 2.6 Graphics Cookbook Over 100 great recipes for creating and animating graphics using Python Create captivating graphics with ease and bring them to life using Python Apply effects to your graphics using powerful Python methods Develop vector as well as raster graphics and combine them to create wonders in the animation world Create interactive GUIs to make your creation of graphics simpler Part of Packt's Cookbook series: Each recipe is a carefully organized sequence of instructions to accomplish the task of creation and animation of graphics as efficiently as possible        Precise collisions using floating point numbers Here the simulation flaws caused by the coarseness of integer arithmetic are eliminated by using floating point numbers for all ball position calculations. How to do it... All position, velocity, and gravity variables are made floating point by writing them with explicit decimal points. The result is shown in the following screenshot, showing the bouncing balls with trajectory tracing. from Tkinter import * root = Tk() root.title("Collisions with Floating point") cw = 350 # canvas width ch = 200 # canvas height GRAVITY = 1.5 chart_1 = Canvas(root, width=cw, height=ch, background="black") chart_1.grid(row=0, column=0) cycle_period = 80 # Time between new positions of the ball # (milliseconds). time_scaling = 0.2 # This governs the size of the differential steps # when calculating changes in position. # The parameters determining the dimensions of the ball and it's # position. ball_1 = {'posn_x':25.0, # x position of box containing the # ball (bottom). 'posn_y':180.0, # x position of box containing the # ball (left edge). 'velocity_x':30.0, # amount of x-movement each cycle of # the 'for' loop. 'velocity_y':100.0, # amount of y-movement each cycle of # the 'for' loop. 'ball_width':20.0, # size of ball - width (x-dimension). 'ball_height':20.0, # size of ball - height (y-dimension). 'color':"dark orange", # color of the ball 'coef_restitution':0.90} # proportion of elastic energy # recovered each bounce ball_2 = {'posn_x':cw - 25.0, 'posn_y':300.0, 'velocity_x':-50.0, 'velocity_y':150.0, 'ball_width':30.0, 'ball_height':30.0, 'color':"yellow3", 'coef_restitution':0.90} def detectWallCollision(ball): # Collision detection with the walls of the container if ball['posn_x'] > cw - ball['ball_width']: # Collision # with right-hand wall. ball['velocity_x'] = -ball['velocity_x'] * ball['coef_ restitution'] # reverse direction. ball['posn_x'] = cw - ball['ball_width'] if ball['posn_x'] < 1: # Collision with left-hand wall. ball['velocity_x'] = -ball['velocity_x'] * ball['coef_ restitution'] ball['posn_x'] = 2 # anti-stick to the wall if ball['posn_y'] < ball['ball_height'] : # Collision # with ceiling. ball['velocity_y'] = -ball['velocity_y'] * ball['coef_ restitution'] ball['posn_y'] = ball['ball_height'] if ball['posn_y'] > ch - ball['ball_height']: # Floor # collision. ball['velocity_y'] = - ball['velocity_y'] * ball['coef_ restitution'] ball['posn_y'] = ch - ball['ball_height'] def diffEquation(ball): # An approximate set of differential equations of motion # for the balls ball['posn_x'] += ball['velocity_x'] * time_scaling ball['velocity_y'] = ball['velocity_y'] + GRAVITY # a crude # equation incorporating gravity. 
ball['posn_y'] += ball['velocity_y'] * time_scaling chart_1.create_oval( ball['posn_x'], ball['posn_y'], ball['posn_x'] + ball['ball_width'], ball ['posn_y'] + ball['ball_height'], fill= ball['color']) detectWallCollision(ball) # Has the ball collided with # any container wall? for i in range(1,2000): # end the program after 1000 position shifts. diffEquation(ball_1) diffEquation(ball_2) chart_1.update() # This refreshes the drawing on the canvas. chart_1.after(cycle_period) # This makes execution pause for 200 # milliseconds. chart_1.delete(ALL) # This erases everything on the root.mainloop() How it works... Use of precision arithmetic has allowed us to notice simulation behavior that was previously hidden by the sins of integer-only calculations. This is the UNIQUE VALUE OF GRAPHIC SIMULATION AS A DEBUGGING TOOL. If you can represent your ideas in a visual way rather than as lists of numbers you will easily pick up subtle quirks in your code. The human brain is designed to function best in graphical images. It is a direct consequence of being a hunter. A graphic debugging tool... There is another very handy trick in the software debugger's arsenal and that is the visual trace. A trace is some kind of visual trail that shows the history of dynamic behavior. All of this is revealed in the next example. Trajectory tracing and ball-to-ball collisions Now we introduce one of the more difficult behaviors in our simulation of ever increasing complexity – the mid-air collision. The hardest thing when you are debugging a program is to try to hold in your short term memory some recently observed behavior and compare it meaningfully with present behavior. This kind of memory is an imperfect recorder. The way to overcome this is to create a graphic form of memory – some sort of picture that shows accurately what has been happening in the past. In the same way that military cannon aimers use glowing tracer projectiles to adjust their aim, a graphic programmer can use trajectory traces to examine the history of execution. How to do it... In our new code there is a new function called detect_ball_collision (ball_1, ball_2) whose job is to anticipate imminent collisions between the two balls no matter where they are. The collisions will come from any direction and therefore we need to be able to test all possible collision scenarios and examine the behavior of each one and see if it does not work as planned. This can be too difficult unless we create tools to test the outcome. In this recipe, the tool for testing outcomes is a graphic trajectory trace. It is a line that trails behind the path of the ball and shows exactly where it went right since the beginning of the simulation. The result is shown in the following screenshot, showing the bouncing with ball-to-ball collision rebounds. # kinetic_gravity_balls_1.py # >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> from Tkinter import * import math root = Tk() root.title("Balls bounce off each other") cw = 300 # canvas width ch = 200 # canvas height GRAVITY = 1.5 chart_1 = Canvas(root, width=cw, height=ch, background="white") chart_1.grid(row=0, column=0) cycle_period = 80 # Time between new positions of the ball # (milliseconds). time_scaling = 0.2 # The size of the differential steps # The parameters determining the dimensions of the ball and its # position. 
ball_1 = {'posn_x':25.0, 'posn_y':25.0, 'velocity_x':65.0, 'velocity_y':50.0, 'ball_width':20.0, 'ball_height':20.0, 'color':"SlateBlue1", 'coef_restitution':0.90} ball_2 = {'posn_x':180.0, 'posn_y':ch- 25.0, 'velocity_x':-50.0, 'velocity_y':-70.0, 'ball_width':30.0, 'ball_height':30.0, 'color':"maroon1", 'coef_restitution':0.90} def detect_wall_collision(ball): # detect ball-to-wall collision if ball['posn_x'] > cw - ball['ball_width']: # Right-hand wall. ball['velocity_x'] = -ball['velocity_x'] * ball['coef_ restitution'] ball['posn_x'] = cw - ball['ball_width'] if ball['posn_x'] < 1: # Left-hand wall. ball['velocity_x'] = -ball['velocity_x'] * ball['coef_ restitution'] ball['posn_x'] = 2 if ball['posn_y'] < ball['ball_height'] : # Ceiling. ball['velocity_y'] = -ball['velocity_y'] * ball['coef_ restitution'] ball['posn_y'] = ball['ball_height'] if ball['posn_y'] > ch - ball['ball_height'] : # Floor ball['velocity_y'] = - ball['velocity_y'] * ball['coef_ restitution'] ball['posn_y'] = ch - ball['ball_height'] def detect_ball_collision(ball_1, ball_2): #detect ball-to-ball collision # firstly: is there a close approach in the horizontal direction if math.fabs(ball_1['posn_x'] - ball_2['posn_x']) < 25: # secondly: is there also a close approach in the vertical # direction. if math.fabs(ball_1['posn_y'] - ball_2['posn_y']) < 25: ball_1['velocity_x'] = -ball_1['velocity_x'] # reverse # direction. ball_1['velocity_y'] = -ball_1['velocity_y'] ball_2['velocity_x'] = -ball_2['velocity_x'] ball_2['velocity_y'] = -ball_2['velocity_y'] # to avoid internal rebounding inside balls ball_1['posn_x'] += ball_1['velocity_x'] * time_scaling ball_1['posn_y'] += ball_1['velocity_y'] * time_scaling ball_2['posn_x'] += ball_2['velocity_x'] * time_scaling ball_2['posn_y'] += ball_2['velocity_y'] * time_scaling def diff_equation(ball): x_old = ball['posn_x'] y_old = ball['posn_y'] ball['posn_x'] += ball['velocity_x'] * time_scaling ball['velocity_y'] = ball['velocity_y'] + GRAVITY ball['posn_y'] += ball['velocity_y'] * time_scaling chart_1.create_oval( ball['posn_x'], ball['posn_y'], ball['posn_x'] + ball['ball_width'], ball['posn_y'] + ball['ball_height'], fill= ball['color'], tags="ball_tag") chart_1.create_line( x_old, y_old, ball['posn_x'], ball ['posn_y'], fill= ball['color']) detect_wall_collision(ball) # Has the ball # collided with any container wall? for i in range(1,5000): diff_equation(ball_1) diff_equation(ball_2) detect_ball_collision(ball_1, ball_2) chart_1.update() chart_1.after(cycle_period) chart_1.delete("ball_tag") # Erase the balls but # leave the trajectories root.mainloop() How it works... Mid-air ball against ball collisions are done in two steps. In the first step, we test whether the two balls are close to each other inside a vertical strip defined by if math.fabs(ball_1['posn_x'] - ball_2['posn_x']) < 25. In plain English, this asks "Is the horizontal distance between the balls less than 25 pixels?" If the answer is yes, then the region of examination is narrowed down to a small vertical distance less than 25 pixels by the statement if math.fabs(ball_1['posn_y'] - ball_2['posn_y']) < 25. So every time the loop is executed, we sweep the entire canvas to see if the two balls are both inside an area where their bottom-left corners are closer than 25 pixels to each other. If they are that close then we simply cause a rebound off each other by reversing their direction of travel in both the horizontal and vertical directions. There's more... 
Simply reversing the direction is not the mathematically correct way to model colliding balls; certainly, billiard balls do not behave that way. The law of physics that governs colliding spheres demands that momentum be conserved.

Why do we sometimes get Tkinter TclError exceptions?

If we click the close window button (the X in the top right) while Python is paused, then when Python revives and calls on Tcl (Tkinter) to draw something on the canvas, we will get an error message. What probably happens is that the application has already shut down, but Tcl has unfinished business. If we allow the program to run to completion before trying to shut the window, then termination is orderly.
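As noted above, the physically correct fix is to conserve momentum. The following standalone sketch shows a one-dimensional elastic collision for balls of different masses; wiring it into detect_ball_collision() is left as an exercise and is my suggestion rather than the author's code.

# One-dimensional elastic collision: conserves momentum and kinetic energy
# for two balls with masses m1, m2 and incoming velocities u1, u2.
def elastic_collision_1d(m1, u1, m2, u2):
    v1 = ((m1 - m2) * u1 + 2.0 * m2 * u2) / (m1 + m2)
    v2 = ((m2 - m1) * u2 + 2.0 * m1 * u1) / (m1 + m2)
    return v1, v2

# Example: a light ball moving at 30 hits a heavy, stationary ball head on.
v1, v2 = elastic_collision_1d(1.0, 30.0, 3.0, 0.0)
print(v1, v2)                            # -15.0 15.0
print(1.0 * 30.0, 1.0 * v1 + 3.0 * v2)   # momentum before and after: 30.0 30.0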

Packt
25 Oct 2011
5 min read

Oracle BI Publisher 11g: Working with Multiple Data Sources

(For more resources on Oracle 11g, see here.) The Data Model Editor's interface deals with all the components and functionalities needed for the data model to achieve the structure you need. However, the main component is the Data Set. In order to create a data model structure in BIP, you can choose from a variety of data set types, such as: SQL Query MDX Query Oracle BI Analysis View Object Web Service LDAP Query XML file Microsoft Excel file Oracle BI Discoverer HTTP Taking advantage of this variety requires multiple Data Sources of different types to be defined in the BIP. In this article, we will see: How data sources are configured How the data is retrieved from different data sets How data set type characteristics and the links between elements influence the data model structure Administration Let's first see, how you can verify or configure your data sources. You must choose the Administration link found in the upper-right corner of any of the BIP interface pages, as shown in the following screenshot:     The connection to your database can be choosen from the following connection types: Java Database Connectivity (JDBC) Java Naming and Directory Interface (JNDI) Lightweight Directory Access Protocol (LDAP) Online Analytical Processing (OLAP) Available Data Sources To get to your data source, BIP offers two possibilities: YOu can use a connection. In order to use a connection, these are the available connection types: JDBC JNDI LDAP OLAP You can also use a file. In the following sections, the Data Source types&mdashJDBC, JNDI, OLAP Connections, and File&mdashwill be explained in detail. JDBC Connection Let's take the first example. To configure a Data Source to use JDBC, from the Administration page, choose JDBC Connection from the Data Sources types list, as shown in the following screenshot:     You can see the requested parameters for configuring a JDBC connection in the following screenshot: Data Source Name: Enter a name of your choice. Driver Type: Choose a type from the list. The relating parameters are: Database Driver Class: A driver, matching your database type. Connection String: Information containing the computer name on which your database server is running, for example, port, database name, and so on. Username: Enter a database username. Password: Provide the database user's password. The Use System User option allows you to use the operating system's credentials as your credentials. For example, in this case, your MS SQL Database Server uses Windows authentication as the only authentication method. When you have a system administrator in-charge of these configurations, all you have to do is to find which are the available Data Sources and eventually you can check if the connection works. Click on the Test Connection button at the bottom of the page to test the connection:     JNDI Connection JNDI Connection pool is in fact another way to access your JDBC Data Sources. Using a connection pool increases efficiency by maintaining a cache of physical connections that can be reused, allowing multiple clients to share a small number of physical connections. In order to configure a Data Source to use JNDI, from the Administration page, choose JNDI Connection from the Data Sources types list. 
The following screen will appear:     As you can see in the preceding screenshot, on the Add Data Source page you must enter the following parameters: Data Source Name: Enter a name of your choice JNDI Name: This is the JNDI location for the pool set up in your application server, for example, jdbc/BIP10gSource The users having roles included in Allowed Roles list only will be able to create reports using this Data Source. OLAP Connection Use the OLAP Connection to connect to OLAP databases. BI Publisher supports the following OLAP types: Oracle Hyperion Essbase Microsoft SQL Server 2000 Analysis Services Microsoft SQL Server 2005 Analysis Services SAP BW In order to configure a connection to an OLAP database, from the Administration page, choose OLAP Connection from the Data Sources types list. The following screen will appear:     On the Add Data Source page, the following parameters must be entered: Data Source Name: Enter a name of your choice OLAP Type: Choose a type from the list Connection String: Depending on the supported OLAP databases, the connection string format is as follows: Oracle Hyperion Essbase Format: [server name] Microsoft SQL Server 2000 Analysis Services Format: Data Source=[server];Provider=msolap;Initial Catalog=[catalog] Microsoft SQL Server 2005 Analysis Services Format: Data Source=[server];Provider=msolap.3;Initial Catalog=[catalog] SAP BW Format: ASHOST=[server] SYSNR=[system number] CLIENT=[client] LANG=[language] Username and Password: Used for OLAP database authentication File Another example of a data source type is File. In order to gain access to XML or Excel files, you need a File Data Source. In order to set up this kind of Data Source, only one step is required&mdashenter the path to the Directory in which your files reside. You can see in the following screenshot that demo files Data Source points to the default BIP files directory. The file needs to be accessible from the BI Server (not on your local machine):    
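If you want to check the same JDBC details outside BI Publisher, a few lines of Python can mimic the Test Connection button. This is only a sketch and an assumption on my part; it relies on the jaydebeapi package, a copy of the Microsoft JDBC driver jar, and made-up server and login names.

import jaydebeapi

conn = jaydebeapi.connect(
    'com.microsoft.sqlserver.jdbc.SQLServerDriver',          # driver class
    'jdbc:sqlserver://dbserver:1433;databaseName=Sales',     # connection string
    ['bip_user', 'bip_password'],                            # credentials
    'mssql-jdbc.jar',                                        # driver jar on disk
)

cursor = conn.cursor()
cursor.execute('SELECT 1')   # the simplest possible "test connection" query
print('Connection OK:', cursor.fetchone())
conn.close()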

Packt
27 Jun 2011
7 min read

Pentaho Data Integration 4: working with complex data flows

Joining two or more streams based on given conditions There are occasions where you will need to join two datasets. If you are working with databases, you could use SQL statements to perform this task, but for other kinds of input (XML, text, Excel), you will need another solution. Kettle provides the Merge Join step to join data coming from any kind of source. Let's assume that you are building a house and want to track and manage the costs of building it. Before starting, you prepared an Excel file with the estimated costs for the different parts of your house. Now, you are given a weekly file with the progress and the real costs. So, you want to compare both to see the progress. Getting ready To run this recipe, you will need two Excel files, one for the budget and another with the real costs. The budget.xls has the estimated starting date, estimated end date, and cost for the planned tasks. The costs.xls has the real starting date, end date, and cost for tasks that have already started. You can download the sample files from here. How to do it... Carry out the following steps: Create a new transformation. Drop two Excel input steps into the canvas. Use one step for reading the budget information (budget.xls file) and the other for reading the costs information (costs.xls file). Under the Fields tab of these steps, click on the Get fields from header row... button in order to populate the grid automatically. Apply the format dd/MM/yyyy to the fields of type Date and $0.00 to the fields with costs. Add a Merge Join step from the Join category, and create a hop from each Excel input step toward this step. The following diagram depicts what you have so far: Configure the Merge Join step, as shown in the following screenshot: If you do a preview on this step, you will obtain the result of the two Excel files merged. In order to have the columns more organized, add a Select values step from the Transform category. In this new step, select the fields in this order: task, starting date (est.), starting date, end date (est.), end date, cost (est.), cost. Doing a preview on the last step, you will obtain the merged data with the columns of both Excel files interspersed, as shown in the following screenshot: How it works... In the example, you saw how to use the Merge Join step to join data coming from two Excel files. You can use this step to join any other kind of input. In the Merge Join step, you set the name of the incoming steps, and the fields to use as the keys for joining them. In the recipe, you joined the streams by just a single field: the task field. The rows are expected to be sorted in an ascending manner on the specified key fields. There's more... In the example, you set the Join Type to LEFT OUTER JOIN. Let's see explanations of the possible join options: Interspersing new rows between existent rows In most Kettle datasets, all rows share a common meaning; they represent the same kind of entity, for example: In a dataset with sold items, each row has data about one item In a dataset with the mean temperature for a range of days in five different regions, each row has the mean temperature for a different day in one of those regions In a dataset with a list of people ordered by age range (0-10, 11-20, 20-40, and so on), each row has data about one person Sometimes, there is a need of interspersing new rows between your current rows. 
Taking the previous examples, imagine the following situations: In the sold items dataset, every 10 items, you have to insert a row with the running quantity of items and running sold price from the first line until that line. In the temperature's dataset, you have to order the data by region and the last row for each region has to have the average temperature for that region. In the people's dataset, for each age range, you have to insert a header row just before the rows of people in that range. In general, the rows you need to intersperse can have fixed data, subtotals of the numbers in previous rows, header to the rows coming next, and so on. What they have in common is that they have a different structure or meaning compared to the rows in your dataset. Interspersing these rows is not a complicated task, but is a tricky one. In this recipe, you will learn how to do it. Suppose that you have to create a list of products by category. For each category, you have to insert a header row with the category description and the number of products inside that category. The final result should be as follows: Getting ready This recipe uses an outdoor database with the structure shown in Appendix, Data Structures (Download here). As source, you can use a database like this or any other source, for example a text file with the same structure. How to do it... Carry out the following steps: Create a transformation, drag into the canvas a Table Input step, select the connection to the outdoor database, or create it if it doesn't exist. Then enter the following statement: SELECT category , desc_product FROM products p ,categories c WHERE p.id_category = c.id_category ORDER by category Do a preview of this step. You already have the product list! Now, you have to create and intersperse the header rows. In order to create the headers, do the following: From the Statistics category, add a Group by step and fill in the grids, as shown in the following screenshot: From the Scripting category, add a User Defined Java Expression step, and use it to add two fields: The first will be a String named desc_product, with value ("Category: " + category).toUpperCase(). The second will be an Integer field named order with value 1. Use a Select values step to reorder the fields as category, desc_product, qty_product, and order. Do a preview on this step; you should see the following result: Those are the headers. The next step is mixing all the rows in the proper order. Drag an Add constants step into the canvas and a Sort rows step. Link them to the other steps as shown: Use the Add constants to add two Integer fields: qty_prod and order. As Value, leave the first field empty, and type 2 for the second field. Use the Sort rows step for sorting by category, order, and desc_product. Select the last step and do a preview. You should see the rows exactly as shown in the introduction. How it works... When you have to intersperse rows between existing rows, there are just four main tasks to do, as follows: Create a secondary stream that will be used for creating new rows. In this case, the rows with the headers of the categories. In each stream, add a field that will help you intersperse rows in the proper order. In this case, the key field was named order. Before joining the two streams, add, remove, and reorder the fields in each stream to make sure that the output fields in each stream have the same metadata. Join the streams and sort by the fields that you consider appropriate, including the field created earlier. 
In this case, you sorted by category, inside each category by the field named order and finally by the products description. Note that in this case, you created a single secondary stream. You could create more if needed, for example, if you need a header and footer for each category.
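Outside Kettle, the same LEFT OUTER JOIN can be sketched in a few lines of pandas, which is a handy way to double-check what the Merge Join step should return. This is an illustration only, assuming the budget.xls and costs.xls files from the first recipe and the pandas library with Excel support installed.

import pandas as pd

# Both inputs are sorted on the key field, as the Merge Join step expects.
budget = pd.read_excel('budget.xls').sort_values('task')
costs = pd.read_excel('costs.xls').sort_values('task')

# LEFT OUTER JOIN on the task field; overlapping column names from the
# budget side get an ' (est.)' suffix, mirroring the recipe's layout.
merged = budget.merge(costs, on='task', how='left', suffixes=(' (est.)', ''))
print(merged.head())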

Fatema Patrawala
22 Dec 2017
8 min read

How to stream and store tweets in Apache Kafka

[box type="note" align="" class="" width=""]This article is an excerpt from a book authored by Ankit Jain titled Mastering Apache Storm. This book explores various real-time processing functionalities offered by Apache Storm such as parallelism, data partitioning, and more.[/box] Today, we are going to cover how to stream tweets from Twitter using the twitter streaming API. We are also going to explore how we can store fetched tweets in Kafka for later processing through Storm. Setting up a single node Kafka cluster Following are the steps to set up a single node Kafka cluster:   Download the Kafka 0.9.x binary distribution named kafka_2.10-0.9.0.1.tar.gz from http://apache.claz.org/kafka/0.9.0. or 1/kafka_2.10-0.9.0.1.tgz. Extract the archive to wherever you want to install Kafka with the following command: tar -xvzf kafka_2.10-0.9.0.1.tgz cd kafka_2.10-0.9.0.1   Change the following properties in the $KAFKA_HOME/config/server.properties file: log.dirs=/var/kafka- logszookeeper.connect=zoo1:2181,zoo2:2181,zoo3:2181 Here, zoo1, zoo2, and zoo3 represent the hostnames of the ZooKeeper nodes. The following are the definitions of the important properties in the server.properties file: broker.id: This is a unique integer ID for each of the brokers in a Kafka cluster. port: This is the port number for a Kafka broker. Its default value is 9092. If you want to run multiple brokers on a single machine, give a unique port to each broker. host.name: The hostname to which the broker should bind and advertise itself. log.dirs: The name of this property is a bit unfortunate as it represents not the log directory for Kafka, but the directory where Kafka stores the actual data sent to it. This can take a single directory or a comma-separated list of directories to store data. Kafka throughput can be increased by attaching multiple physical disks to the broker node and specifying multiple data directories, each lying on a different disk. It is not much use specifying multiple directories on the same physical disk, as all the I/O will still be happening on the same disk. num.partitions: This represents the default number of partitions for newly created topics. This property can be overridden when creating new topics. A greater number of partitions results in greater parallelism at the cost of a larger number of files. log.retention.hours: Kafka does not delete messages immediately after consumers consume them. It retains them for the number of hours defined by this property so that in the event of any issues the consumers can replay the messages from Kafka. The default value is 168 hours, which is 1 week. zookeeper.connect: This is the comma-separated list of ZooKeeper nodes in hostname:port form.    Start the Kafka server by running the following command: > ./bin/kafka-server-start.sh config/server.properties [2017-04-23 17:44:36,667] INFO New leader is 0 (kafka.server.ZookeeperLeaderElector$LeaderChangeListener) [2017-04-23 17:44:36,668] INFO Kafka version : 0.9.0.1 (org.apache.kafka.common.utils.AppInfoParser) [2017-04-23 17:44:36,668] INFO Kafka commitId : a7a17cdec9eaa6c5 (org.apache.kafka.common.utils.AppInfoParser) [2017-04-23 17:44:36,670] INFO [Kafka Server 0], started (kafka.server.KafkaServer) If you get something similar to the preceding three lines on your console, then your Kafka broker is up-and-running and we can proceed to test it. Now we will verify that the Kafka broker is set up correctly by sending and receiving some test messages. 
First, let's create a verification topic for testing by executing the following command:

> bin/kafka-topics.sh --zookeeper zoo1:2181 --replication-factor 1 --partitions 1 --topic verification-topic --create
Created topic "verification-topic".

Now let's verify whether the topic creation was successful by listing all the topics:

> bin/kafka-topics.sh --zookeeper zoo1:2181 --list
verification-topic

The topic is created; let's produce some sample messages for the Kafka cluster. Kafka comes with a command-line producer that we can use to produce messages:

> bin/kafka-console-producer.sh --broker-list localhost:9092 --topic verification-topic

Write the following messages on your console:

Message 1
Test Message 2
Message 3

Let's consume these messages by starting a new console consumer in a new console window:

> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic verification-topic --from-beginning
Message 1
Test Message 2
Message 3

Now, if we enter any message on the producer console, it will automatically be consumed by this consumer and displayed on the command line.

Collecting Tweets

We are assuming you already have a Twitter account, and that the consumer key and access token are generated for your application. You can refer to https://bdthemes.com/support/knowledge-base/generate-api-key-consumer-token-access-key-twitter-oauth/ to generate a consumer key and access token. Take the following steps:

1. Create a new Maven project with groupId com.stormadvance and artifactId kafka_producer_twitter.

2. Add the following dependencies to the pom.xml file. We are adding the Kafka and Twitter streaming Maven dependencies to pom.xml to support the Kafka producer and the streaming of tweets from Twitter:

<dependencies>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.9.0.1</version>
    <exclusions>
      <exclusion>
        <groupId>com.sun.jdmk</groupId>
        <artifactId>jmxtools</artifactId>
      </exclusion>
      <exclusion>
        <groupId>com.sun.jmx</groupId>
        <artifactId>jmxri</artifactId>
      </exclusion>
    </exclusions>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-slf4j-impl</artifactId>
    <version>2.0-beta9</version>
  </dependency>
  <dependency>
    <groupId>org.apache.logging.log4j</groupId>
    <artifactId>log4j-1.2-api</artifactId>
    <version>2.0-beta9</version>
  </dependency>
  <!-- https://mvnrepository.com/artifact/org.twitter4j/twitter4j-stream -->
  <dependency>
    <groupId>org.twitter4j</groupId>
    <artifactId>twitter4j-stream</artifactId>
    <version>4.0.6</version>
  </dependency>
</dependencies>

3. Now, we need to create a class, TwitterData, that contains the code to consume/stream data from Twitter and publish it to the Kafka cluster. We are assuming you already have a running Kafka cluster and a topic, twitterData, created in the Kafka cluster; for information on installing the Kafka cluster and creating a Kafka topic, refer to the previous section. The class contains an instance of the twitter4j.conf.ConfigurationBuilder class; we need to set the access token and consumer keys in the configuration, as shown in the source code.

4. The twitter4j.StatusListener class returns the continuous stream of tweets inside the onStatus() method. We are using the Kafka producer code inside the onStatus() method to publish the tweets to Kafka. The following is the source code for the TwitterData class:
public class TwitterData {

    /** The actual Twitter stream. It's set up to collect raw JSON data */
    private TwitterStream twitterStream;

    static String consumerKeyStr = "r1wFskT3q";
    static String consumerSecretStr = "fBbmp71HKbqalpizIwwwkBpKC";
    static String accessTokenStr = "298FPfE16frABXMcRIn7aUSSnNneMEPrUuZ";
    static String accessTokenSecretStr = "1LMNZZIfrAimpD004QilV1pH3PYTvM";

    public void start() {
        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setOAuthConsumerKey(consumerKeyStr);
        cb.setOAuthConsumerSecret(consumerSecretStr);
        cb.setOAuthAccessToken(accessTokenStr);
        cb.setOAuthAccessTokenSecret(accessTokenSecretStr);
        cb.setJSONStoreEnabled(true);
        cb.setIncludeEntitiesEnabled(true);

        // instance of TwitterStreamFactory
        twitterStream = new TwitterStreamFactory(cb.build()).getInstance();

        final Producer<String, String> producer = new KafkaProducer<String, String>(getProducerConfig());

        // topicDetails: create the topic if it does not already exist
        new CreateTopic("127.0.0.1:2181").createTopic("twitterData", 2, 1);

        /** Twitter listener **/
        StatusListener listener = new StatusListener() {
            public void onStatus(Status status) {
                ProducerRecord<String, String> data = new ProducerRecord<String, String>(
                        "twitterData", DataObjectFactory.getRawJSON(status));
                // send the data to kafka
                producer.send(data);
            }

            public void onException(Exception arg0) {
                System.out.println(arg0);
            }

            public void onDeletionNotice(StatusDeletionNotice arg0) {
            }

            public void onScrubGeo(long arg0, long arg1) {
            }

            public void onStallWarning(StallWarning arg0) {
            }

            public void onTrackLimitationNotice(int arg0) {
            }
        };

        /** Bind the listener **/
        twitterStream.addListener(listener);

        /** GOGOGO **/
        twitterStream.sample();
    }

    private Properties getProducerConfig() {
        Properties props = new Properties();

        // List of kafka brokers. Complete list of brokers is not required as
        // the producer will auto discover the rest of the brokers.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("batch.size", 1);

        // Serializer used for sending data to kafka. Since we are sending
        // strings, we are using StringSerializer.
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        props.put("producer.type", "sync");
        return props;
    }

    public static void main(String[] args) throws InterruptedException {
        new TwitterData().start();
    }
}

Use valid Kafka properties before executing the TwitterData class. After executing the preceding class, the user will have a real-time stream of Twitter tweets in Kafka. In the next section, we are going to cover how we can use Storm to calculate the sentiments of the collected tweets.

To summarize, we covered how to install a single-node Apache Kafka cluster and how to collect tweets from Twitter and store them in a Kafka cluster. If you enjoyed this post, check out the book Mastering Apache Storm to learn more about the different types of real-time processing techniques used to create distributed applications.
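If you prefer to check the twitterData topic from Python rather than the console scripts, the short sketch below uses the kafka-python package. It is not part of the book's Java code; the broker address and topic name simply follow the ones used earlier in this article.

from kafka import KafkaProducer, KafkaConsumer

# Publish one test message to the topic the TwitterData class writes to.
producer = KafkaProducer(bootstrap_servers='localhost:9092')
producer.send('twitterData', b'{"text": "hello from kafka-python"}')
producer.flush()

# Read everything currently in the topic and print it.
consumer = KafkaConsumer(
    'twitterData',
    bootstrap_servers='localhost:9092',
    auto_offset_reset='earliest',
    consumer_timeout_ms=5000,   # stop iterating when no new messages arrive
)
for message in consumer:
    print(message.value.decode('utf-8'))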
Packt
30 Oct 2014
10 min read

Theming with Highcharts

Besides the charting capabilities offered by Highcharts, theming is yet another strong feature of Highcharts. With its extensive theming API, charts can be customized completely to match the branding of a website or an app. Almost all of the chart elements are customizable through this API. In this article by Bilal Shahid, author of Highcharts Essentials, we will do the following things: (For more resources related to this topic, see here.) Use different fill types and fonts Create a global theme for our charts Use jQuery easing for animations Using Google Fonts with Highcharts Google provides an easy way to include hundreds of high quality web fonts to web pages. These fonts work in all major browsers and are served by Google CDN for lightning fast delivery. These fonts can also be used with Highcharts to further polish the appearance of our charts. This section assumes that you know the basics of using Google Web Fonts. If you are not familiar with them, visit https://developers.google.com/fonts/docs/getting_started. We will style the following example with Google Fonts. We will use the Merriweather family from Google Fonts and link to its style sheet from our web page inside the <head> tag: <link href='http://fonts.googleapis.com/css?family=Merriweather:400italic,700italic' rel='stylesheet' type='text/css'> Having included the style sheet, we can actually use the font family in our code for the labels in yAxis: yAxis: [{ ... labels: {    style: {      fontFamily: 'Merriweather, sans-serif',      fontWeight: 400,      fontStyle: 'italic',      fontSize: '14px',      color: '#ffffff'    } } }, { ... labels: {    style: {      fontFamily: 'Merriweather, sans-serif',      fontWeight: 700,      fontStyle: 'italic',      fontSize: '21px',      color: '#ffffff'    },    ... } }] For the outer axis, we used a font size of 21px with font weight of 700. For the inner axis, we lowered the font size to 14px and used font weight of 400 to compensate for the smaller font size. The following is the modified speedometer: In the next section, we will continue with the same example to include jQuery UI easing in chart animations. Using jQuery UI easing for series animation Animations occurring at the point of initialization of charts can be disabled or customized. The customization requires modifying two properties: animation.duration and animation.easing. The duration property accepts the number of milliseconds for the duration of the animation. The easing property can have various values depending on the framework currently being used. For a standalone jQuery framework, the values can be either linear or swing. Using the jQuery UI framework adds a couple of more options for the easing property to choose from. In order to follow this example, you must include the jQuery UI framework to the page. You can also grab the standalone easing plugin from http://gsgd.co.uk/sandbox/jquery/easing/ and include it inside your <head> tag. We can now modify the series to have a modified animation: plotOptions: { ... series: {    animation: {      duration: 1000,      easing: 'easeOutBounce'    } } } The preceding code will modify the animation property for all the series in the chart to have duration set to 1000 milliseconds and easing to easeOutBounce. Each series can have its own different animation by defining the animation property separately for each series as follows: series: [{ ... animation: {    duration: 500,    easing: 'easeOutBounce' } }, { ... 
animation: {    duration: 1500,    easing: 'easeOutBounce' } }, { ... animation: {      duration: 2500,    easing: 'easeOutBounce' } }] Different animation properties for different series can pair nicely with column and bar charts to produce visually appealing effects. Creating a global theme for our charts A Highcharts theme is a collection of predefined styles that are applied before a chart is instantiated. A theme will be applied to all the charts on the page after the point of its inclusion, given that the styling options have not been modified within the chart instantiation. This provides us with an easy way to apply custom branding to charts without the need to define styles over and over again. In the following example, we will create a basic global theme for our charts. This way, we will get familiar with the fundamentals of Highcharts theming and some API methods. We will define our theme inside a separate JavaScript file to make the code reusable and keep things clean. Our theme will be contained in an options object that will, in turn, contain styling for different Highcharts components. Consider the following code placed in a file named custom-theme.js. This is a basic implementation of a Highcharts custom theme that includes colors and basic font styles along with some other modifications for axes: Highcharts.customTheme = {      colors: ['#1BA6A6', '#12734F', '#F2E85C', '#F27329', '#D95D30', '#2C3949', '#3E7C9B', '#9578BE'],      chart: {        backgroundColor: {            radialGradient: {cx: 0, cy: 1, r: 1},            stops: [                [0, '#ffffff'],                [1, '#f2f2ff']            ]        },        style: {            fontFamily: 'arial, sans-serif',            color: '#333'        }    },    title: {        style: {            color: '#222',            fontSize: '21px',            fontWeight: 'bold'        }    },    subtitle: {        style: {            fontSize: '16px',            fontWeight: 'bold'        }    },    xAxis: {        lineWidth: 1,        lineColor: '#cccccc',        tickWidth: 1,        tickColor: '#cccccc',        labels: {            style: {                fontSize: '12px'            }        }    },    yAxis: {        gridLineWidth: 1,        gridLineColor: '#d9d9d9',        labels: {           style: {                fontSize: '12px'            }        }    },    legend: {        itemStyle: {            color: '#666',            fontSize: '9px'        },        itemHoverStyle:{            color: '#222'        }      } }; Highcharts.setOptions( Highcharts.customTheme ); We start off by modifying the Highcharts object to include an object literal named customTheme that contains styles for our charts. Inside customTheme, the first option we defined is for series colors. We passed an array containing eight colors to be applied to series. In the next part, we defined a radial gradient as a background for our charts and also defined the default font family and text color. The next two object literals contain basic font styles for the title and subtitle components. Then comes the styles for the x and y axes. For the xAxis, we define lineColor and tickColor to be #cccccc with the lineWidth value of 1. The xAxis component also contains the font style for its labels. The y axis gridlines appear parallel to the x axis that we have modified to have the width and color at 1 and #d9d9d9 respectively. Inside the legend component, we defined styles for the normal and mouse hover states. 
These two states are stated by itemStyle and itemHoverStyle respectively. In normal state, the legend will have a color of #666 and font size of 9px. When hovered over, the color will change to #222. In the final part, we set our theme as the default Highcharts theme by using an API method Highcharts.setOptions(), which takes a settings object to be applied to Highcharts; in our case, it is customTheme. The styles that have not been defined in our custom theme will remain the same as the default theme. This allows us to partially customize a predefined theme by introducing another theme containing different styles. In order to make this theme work, include the file custom-theme.js after the highcharts.js file: <script src="js/highcharts.js"></script> <script src="js/custom-theme.js"></script> The output of our custom theme is as follows: We can also tell our theme to include a web font from Google without having the need to include the style sheet manually in the header, as we did in a previous section. For that purpose, Highcharts provides a utility method named Highcharts.createElement(). We can use it as follows by placing the code inside the custom-theme.js file: Highcharts.createElement( 'link', {    href: 'http://fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,700italic,400,300,700',    rel: 'stylesheet',    type: 'text/css' }, null, document.getElementsByTagName( 'head' )[0], null ); The first argument is the name of the tag to be created. The second argument takes an object as tag attributes. The third argument is for CSS styles to be applied to this element. Since, there is no need for CSS styles on a link element, we passed null as its value. The final two arguments are for the parent node and padding, respectively. We can now change the default font family for our charts to 'Open Sans': chart: {    ...    style: {        fontFamily: "'Open Sans', sans-serif",        ...    } } The specified Google web font will now be loaded every time a chart with our custom theme is initialized, hence eliminating the need to manually insert the required font style sheet inside the <head> tag. This screenshot shows a chart with 'Open Sans' Google web font. Summary In this article, you learned about incorporating Google fonts and jQuery UI easing into our chart for enhanced styling. Resources for Article: Further resources on this subject: Integrating with other Frameworks [Article] Highcharts [Article] More Line Charts, Area Charts, and Scatter Plots [Article]

Packt
21 Jan 2011
14 min read

Microsoft SQL Server 2008 High Availability: Understanding Domains, Users, and Security

Microsoft SQL Server 2008 High Availability Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL server applications by mastering the concepts of database mirroring,log shipping,clustering, and replication  Install various SQL Server High Availability options in a step-by-step manner  A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators  Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability  Tips to enhance performance with SQL Server High Availability  External references for further study  Windows domains and domain users In the early era of Windows, operating system user were created standalone until Windows NT operating system hit the market. Windows NT, that is, Windows New Technology introduced some great feature to the world—including domains. A domain is a group of computers that run on Windows operating systems. Amongst them is a computer that holds all the information related to user authentication and user database and is called the domain controller (server), whereas every user who is part of this user database on the domain controller is called a domain user. Domain users have access to any resource across the domain and its subdomains with the privilege they have, unlike the standalone user who has access to the resources available to a specific system. With the release of Windows Server 2000, Microsoft released Active Directory (AD), which is now widely used with Windows operating system networks to store, authenticate, and control users who are part of the domain. A Windows domain uses various modes to authenticate users—encrypted passwords, various handshake methods such as PKI, Kerberos, EAP, SSL certificates, NAP, LDAP, and IP Sec policy—and makes it robust authentication. One can choose the authentication method that suits business needs and based on the environment. Let's now see various authentication methods in detail. Public Key Infrastructure (PKI): This is the most common method used to transmit data over insecure channels such as the Internet using digital certificates. It has generally two parts—the public and private keys. These keys are generated by a Certificate Authority, such as, Thawte. Public keys are stored in a directory where they are accessible by all parties. The public key is used by the message sender to send encrypted messages, which then can be decrypted using the private key. Kerberos: This is an authentication method used in client server architecture to authorize the client to use service(s) on a server in a network. In this method, when a client sends a request to use a service to a server, a request goes to the authentication server, which will generate a session key and a random value based on the username. This session key and a random value are then passed to the server, which grants or rejects the request. These sessions are for certain time period, which means for that particular amount of time the client can use the service without having to re-authenticate itself. Extensible Authentication Protocol (EAP): This is an authentication protocol generally used in wireless and point-to-point connections. SSL Certificates: A Secure Socket Layer certificate (SSL) is a digital certificate that is used to identify a website or server that provides a service to clients and sends the data in an encrypted form. 
SSL certificates are typically used by websites such as GMAIL. When we type a URL and press Enter, the web browser sends a request to the web server to identify itself. The web server then sends a copy of its SSL certificate, which is checked by the browser. If the browser trusts the certificate (this is generally done on the basis of the CA and Registration Authority and directory verification), it will send a message back to the server and in reply the web server sends an acknowledgement to the browser to start an encrypted session. Network Access Protection (NAP): This is a new platform introduced by Microsoft with the release of Windows Server 2008. It will provide access to the client, based on the identity of the client, the group it belongs to, and the level compliance it has with the policy defined by the Network Administrators. If the client doesn't have a required compliance level, NAP has mechanisms to bring the client to the compliance level dynamically and allow it to access the network. Lightweight Directory Access Protocol (LDAP): This is a protocol that runs over TCP/IP directly. It is a set of objects, that is, organizational units, printers, groups, and so on. When the client sends a request for a service, it queries the LDAP server to search for availability of information, and based on that information and level of access, it will provide access to the client. IP Security (IPSEC): IP Security is a set of protocols that provides security at the network layer. IP Sec provides two choices: Authentication Header: Here it encapsulates the authentication of the sender in a header of the network packet. Encapsulating Security Payload: Here it supports encryption of both the header and data. Now that we know basic information on Windows domains, domain users, and various authentication methods used with Windows servers, I will walk you through some of the basic and preliminary stuff about SQL Server security! Understanding SQL Server Security Security!! Now-a-days we store various kinds of information into databases and we just want to be sure that they are secured. Security is the most important word to the IT administrator and vital for everybody who has stored their information in a database as he/she needs to make sure that not a single piece of data should be made available to someone who shouldn't have access. Because all the information stored in the databases is vital, everyone wants to prevent unauthorized access to highly confidential data and here is how security implementation in SQL Server comes into the picture. With the release of SQL Server 2000, Microsoft (MS) has introduced some great security features such as authentication roles (fixed server roles and fixed database roles), application roles, various permissions levels, forcing protocol encryption, and so on, which are widely used by administrators to tighten SQL Server security. Basically, SQL Server security has two paradigms: one is SQL Server's own set of security measures and other is to integrate them with the domain. SQL Server has two methods of authentication. Windows authentication In Windows authentication mode, we can integrate domain accounts to authenticate users, and based on the group they are members of and level of access they have, DBAs can provide them access on the particular SQL server box. 
Whenever a user tries to access SQL Server, his/her account is validated by the domain controller first; then, based on the permissions it has, the domain controller allows or rejects the request, and no separate login ID and password are required to authenticate. Once the user is authenticated, SQL Server allows access to the user based on the permissions the user has. These permissions come in the form of roles, including fixed server roles, fixed database roles, and application roles.
Fixed Server Roles: These are security principals that have server-wide scope. Basically, fixed server roles are meant to manage permissions at the server level. We can add SQL logins, domain accounts, and domain groups to these roles. There are different roles that we can assign to a login, domain account, or group; the following table lists them.
Fixed DB Roles: These are the roles that are assigned to a particular login for a particular database; their scope is limited to the database they have permissions on. There are various fixed database roles, including the ones shown in the following table:
Application Role: The application role is a database principal that is widely used to assign user permissions for an application. For example, in a home-grown ERP, some users require only to view the data; we can create a role, add the db_datareader permission to it, and then add all those users who require read-only permission.
Mixed Authentication: In Mixed authentication mode, logins can be authenticated by the Windows domain controller or by SQL Server itself. DBAs can create logins with passwords in SQL Server. With the release of SQL Server 2005, MS introduced password policies for SQL Server logins. Mixed mode authentication is used when a legacy application has to be run and it is not on the domain network. In my opinion, Windows authentication is good because we can use the various handshake methods such as PKI, Kerberos, EAP, SSL, NAP, LDAP, or IPSec to tighten the security.
SQL Server 2005 brought enhancements in its security mechanisms. The most important features amongst them are the password policy, native encryption, the separation of users and schemas, and no longer needing to grant system administrator (SA) rights to run Profiler. These are good things because SA is the super user, and with the power this account has, a user can do anything on the SQL box, including the following:
The user can grant ALTER TRACE permission to users who need to run Profiler
The user can create logins and users
The user can grant or revoke permissions of other users
A schema is an object container that is owned by a user and is transferable to other users. In earlier versions, objects were owned by users, so if a user left the company we could not delete his/her account from the SQL box just because there were objects he/she had created. We first had to change the owner of those objects and only then could we delete the account. With a schema, on the other hand, we can drop the user account straight away because the objects are owned by the schema.
SQL Server 2008 gives you even more control over the configuration of the security mechanisms. It allows you to configure metadata access, execution context, and the auditing of events using DDL triggers, the most powerful feature for auditing any DDL event.
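The concepts above can be tied together with a short T-SQL sketch. This is a hedged illustration only; the domain account, login, user, database, and schema names (CORP\jsmith, app_login, app_user, SalesDB, Sales) are hypothetical, and the role assignments should be adapted to your own environment:
-- Windows authentication: create a login for a (hypothetical) domain account
CREATE LOGIN [CORP\jsmith] FROM WINDOWS;
-- Add the login to a fixed server role (SQL Server 2008 era syntax)
EXEC sp_addsrvrolemember @loginame = N'CORP\jsmith', @rolename = N'dbcreator';
-- Mixed authentication: a SQL Server login subject to the Windows password policy
CREATE LOGIN app_login WITH PASSWORD = N'Str0ng!Passw0rd', CHECK_POLICY = ON;
USE SalesDB;
CREATE USER app_user FOR LOGIN app_login;
-- Fixed database role: read-only access
EXEC sp_addrolemember N'db_datareader', N'app_user';
-- Application role for an application that connects with its own credentials
CREATE APPLICATION ROLE report_app WITH PASSWORD = N'An0ther!Passw0rd';
-- Grant ALTER TRACE (a server-level permission) so Profiler can run without SA rights
USE master;
GRANT ALTER TRACE TO [CORP\jsmith];
-- Schema ownership is transferable, so the original owner can later be dropped
USE SalesDB;
ALTER AUTHORIZATION ON SCHEMA::Sales TO dbo;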
If you wish to know more about what we have seen so far, you can go through the following links:
http://www.microsoft.com/sqlserver/2008/en/us/Security.aspx
http://www.microsoft.com/sqlserver/2005/en/us/security-features.aspx
http://technet.microsoft.com/en-us/library/cc966507.aspx
In this section, we understood the basics of domains, domain users, and SQL Server security. We also learned why security is given so much emphasis these days. In the next section, we will understand the basics of clustering and its components.
What is clustering?
Before starting with SQL Server clustering, let's have a look at clustering in general and at Windows clusters. The word cluster itself is self-descriptive: a bunch or group. When two or more computers are connected to each other by means of a network and share some common resources to provide redundancy or performance improvement, they are known as a cluster of computers. Clustering is usually deployed when a business-critical application is running that needs to be available 24x7, or in other words needs High Availability. These clusters are known as failover clusters, because the primary goal of setting up the cluster is to make business-critical services or processes available 24x7. The Enterprise and Datacenter editions of Microsoft Windows Server support failover clustering. This is achieved by having two identical nodes connected to each other by means of a private network or commonly used resources. In the case of failure of any common resource or service, the first node (active) passes ownership to the other node (passive).
SQL Server clustering is built on top of Windows clustering, which means that before we go about installing SQL Server clustering, we should have Windows clustering installed. Before we start, let's understand the commonly used shared resources of a cluster server. Clusters with 2, 4, 8, 12, or 32 nodes can be built with Windows Server 2008 R2. Clusters are categorized in the following manner:
High-Availability Clusters: This type of cluster is known as a failover cluster. High-availability clusters are implemented when the purpose is to provide highly available services. For implementing a failover or high-availability cluster, one may have up to 16 nodes in a Microsoft cluster. Clustering in Windows operating systems was first introduced with the release of Windows NT 4.0 Enterprise Edition and has been enhanced gradually since. Even though we can have non-identical hardware, we should use identical hardware, because if the node the cluster fails over to has a lower configuration, we might face degraded performance.
Load Balancing: This is the second form of cluster that can be configured. This type of cluster is configured by linking multiple computers with each other and making use of every resource they need for operation. From the user's point of view, these linked servers/nodes appear as separate machines; however, collectively and virtually they act as a single system, the main goal being to balance work by sharing CPU, disk, and every other possible resource among the linked nodes, which is why it is known as a load-balancing cluster. SQL Server doesn't support this form of clustering.
Compute Clusters: When computers are linked together with the purpose of using them for simulations, such as aircraft simulation, they are known as a compute cluster. A well-known example is Beowulf clusters.
Grid Computing: This is one kind of clustering solution, but it is more often used when the nodes are at dispersed locations.
This kind of cluster is also called a supercomputer or HPC. The main applications are scientific research, academic, mathematical, or weather-forecasting workloads where lots of CPUs and disks are required; SETI@home is a well-known example.
If we talk about SQL Server clusters, there are some cool new features added in the latest release, SQL Server 2008, although with the limitation that these features are available only if SQL Server 2008 is used with Windows Server 2008. So, let's have a glance at these features:
Service SID: Service SIDs were introduced with Windows Vista and Windows Server 2008. They enable us to bind permissions directly to Windows services. In the earlier version, SQL Server 2005, we needed a SQL Server service account that was a member of a domain group so that it would have all the required permissions. This is not the case with SQL Server 2008, and we may choose Service SIDs to bypass the need to provision domain groups.
Support for 16 nodes: We may add up to 16 nodes to our SQL Server 2008 cluster with the SQL Server 2008 Enterprise 64-bit edition.
New cluster validation: As part of the installation steps, a new cluster validation step is performed. Prior to adding a node to an existing cluster, this validation step checks whether or not the cluster environment is compatible.
Mount Drives: Drive letters are no longer essential. If we have a cluster configured that has a limited number of drive letters available, we may mount a drive and use it in our cluster environment, provided it is associated with a base drive letter.
Geographically dispersed cluster nodes: This is the super-cool feature introduced with SQL Server 2008, which uses a VLAN to establish connectivity with the other cluster nodes. It breaks the limitation of having all clustered nodes at a single location.
IPv6 Support: SQL Server 2008 natively supports IPv6, which increases the network IP address size to 128 bits from 32 bits.
DHCP Support: Windows Server 2008 clustering introduced the use of DHCP-assigned IP addresses by the cluster services, and this is supported by SQL Server 2008 clusters. It is still recommended to use static IP addresses, because if some of our applications depend on fixed IP addresses, or if the renewal of an IP address from the DHCP server fails, the IP address resources would fail.
iSCSI Support: Windows Server 2008 supports iSCSI as a storage connection; in earlier versions, only SAN and fibre channel were supported.
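Once a clustered instance is up, a quick way to confirm that it is running on a cluster and to see which nodes are available is to query the instance itself. The following is a small, hedged sketch using standard catalog functions and DMVs; it only reports status and changes nothing:
-- Is this instance clustered, and which physical node is it running on right now?
SELECT SERVERPROPERTY('IsClustered')                 AS IsClustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentNode;
-- List the nodes that are part of the failover cluster
SELECT NodeName
FROM sys.dm_os_cluster_nodes;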


PL/SQL: Using Collections

Packt
17 May 2012
18 min read
Collections: an overview
A collection is a homogeneous, single-dimensional structure that constitutes an ordered set of elements of a similar type. Being a homogeneous structure, all of its elements are of the same data type. The structure consists of cells identified by a subscript; the elements reside in these cells, so the index serves as their location information. The subscript or cell index thus identifies an element and is used to access it. The structure of a collection type, SPORT, is shown in the following diagram. Note the subscripts and the elements in them. A new element, GOLF, enters at the last empty location and is represented as SPORT[6]:
A collection element can be of any valid SQL data type or a user-defined type. An element of a SQL primitive data type is a scalar value, while an element of a user-defined type is an object type instance. A collection can be used within a PL/SQL program by declaring a PL/SQL variable of the collection type. The local PL/SQL variable can hold instances of its collection type. Besides this, a database column in a table can also be of a schema collection type.
Collections in Oracle are strictly one dimensional. They cannot be realized on two-dimensional coordinates. However, multidimensional arrays can be realized when the collection has an object type or collection type attribute. A collection can be bounded or unbounded. Bounded collections can accommodate a limited number of elements, while unbounded collections have no upper limit for subscripts.
Collections provide an efficient way to organize data in an array or set format while making use of object-oriented features. An instance of a nested table or varray collection type is accessed as an object, while the data is still stored in database columns. Collections can be used for data caching in programs and can boost the performance of SQL operations. On dedicated server connections, a session always uses the User Global Area (UGA), a component of the PGA, for collection operations. In shared server mode, the collection operations are still carried out in the UGA, but the UGA is now part of the System Global Area (SGA), and thus indirectly in the SGA. This is because in shared server connections multiple server processes can serve a session, so the UGA must be allocated out of the SGA.
Categorization
Collections are of two types: persistent and non-persistent. A collection is persistent if it stores the collection structure and elements physically in the database. Conversely, a non-persistent collection is alive for a program only, that is, at most for a session.
Apart from the preceding categories, a collection can be realized in three formats, namely associative array, nested table, or varray. This categorization is purely based on their objective and behavioral properties in a PL/SQL program. The following diagram combines the abstract and physical classification of collections:
We will take a quick tour of these collection types now and discuss them in detail in the coming sections:
Associative array (index-by table): This is the simplest form of non-persistent, unbounded collection. As a non-persistent collection, it cannot be stored in the database; it is available within a PL/SQL block only. The collection structure and data of an associative array cannot be retained once the program has completed. Initially, in the days of Oracle 7, it was known as a PL/SQL table. Later, the Oracle 8 release renamed it the index-by table, as an index is used to identify an element.
Nested table: This is a persistent form of unbounded collection which can be created in the database as well as in a PL/SQL block.
Varray (variable-size array): This is a persistent but bounded form of collection which can be created in the database as well as in PL/SQL. Like a nested table, a varray is a unidimensional homogeneous collection. The collection size and the storage scheme are the factors that differentiate varrays from nested tables. Unlike a nested table, a varray can accommodate only a defined (fixed) number of elements.
Selecting an appropriate collection type
Here are a few guidelines for deciding on the appropriate usage of collection types in programs:
Use of associative arrays is required when:
You have to temporarily cache program data in an array format for lookup purposes.
You need string subscripts for the collection elements. Note that negative subscripts are supported, too.
You have to map hash tables from the client to the database.
Use of nested tables is preferred when:
You have to store data as sets in the database. Database columns of a nested table type can be declared to hold the data persistently.
You have to perform major array operations, such as insertion and deletion, on a large volume of data.
Use of varrays is preferred when:
You have to store a calculated or predefined volume of data in the database. A varray offers limited and defined storage of rows in a collection.
The order of the elements has to be preserved.
Associative arrays
Associative arrays are analogous to conventional arrays or lists and can be defined within a PL/SQL program only. Neither the array structure nor the data can be stored in the database. An associative array can hold elements of a similar type in a key-value structure without any upper bound on the array. Each cell of the array is distinguished by its subscript, index, or cell number. The index can be a number or a string.
Associative arrays were first introduced in the Oracle 7 release as PL/SQL tables, to signify their usage within the scope of a PL/SQL block. The Oracle 8 release renamed the PL/SQL table to index-by table, due to its structure of index-value pairs. The Oracle 10g release recognized the behavior of index-by tables as that of arrays and renamed them associative arrays, due to the association of an index with the array. The following diagram explains the physical lookup structure of an associative array:
Associative arrays follow this syntax for declaration in a PL/SQL declare block:
TYPE [COLL NAME] IS TABLE OF [ELEMENT DATA TYPE] NOT NULL INDEX BY [INDEX DATA TYPE]
In the preceding syntax, the index type signifies the data type of the array subscript. RAW, NUMBER, LONG RAW, ROWID, and CHAR are unsupported index data types. The suitable index types are BINARY_INTEGER, PLS_INTEGER, POSITIVE, NATURAL, SIGNTYPE, or VARCHAR2.
The element's data type can be one of the following: PL/SQL scalar data type: NUMBER (along with its subtypes), VARCHAR2 (and its subtypes), DATE, BLOB, CLOB, or BOOLEAN Inferred data: The data type inherited from a table column, cursor expression or predefined package variable User-defined type: A user defined object type or collection type For illustration, the following are the valid conditions of the associative array in a PL/SQL block: /*Array of CLOB data*/ TYPE clob_t IS TABLE OF CLOB INDEX BY PLS_INTEGER; /*Array of employee ids indexed by the employee names*/ TYPE empno_t IS TABLE OF employees.empno%TYPE NOT NULL INDEX BY employees.ename%type; The following PL/SQL program declares an associative array type in a PL/ SQL block. Note that the subscript of the array is of a string type and it stores the number of days in a quarter. This code demonstrates the declaration of an array and assignment of the element in each cell and printing them. Note that the program uses the FIRST and NEXT collection methods to display the array elements. The collection methods would be covered in detail in the PL/SQL collection methods section: /*Enable the SERVEROUTPUT on to display the output*/ SET SERVEROUTPUT ON /*Start the PL/SQL block*/ DECLARE /*Declare a collection type associative array and its variable*/ TYPE string_asc_arr_t IS TABLE OF NUMBER INDEX BY VARCHAR2(10); l_str string_asc_arr_t; l_idx VARCHAR2(50); BEGIN /*Assign the total count of days in each quarter against each cell*/ l_str ('JAN-MAR') := 90; l_str ('APR-JUN') := 91; l_str ('JUL-SEP') := 92; l_str ('OCT-DEC') := 93; l_idx := l_str.FIRST; WHILE (l_idx IS NOT NULL) LOOP DBMS_OUTPUT.PUT_LINE('Value at index '||l_idx||' is '||l_str(l_ idx)); l_idx := l_str.NEXT(l_idx); END LOOP; END; / Value at index APR-JUN is 91 Value at index JAN-MAR is 90 Value at index JUL-SEP is 92 Value at index OCT-DEC is 93 PL/SQL procedure successfully completed. In the preceding block, note the string indexed array. A string indexed array considerably improves the performance by using indexed organization of array values. In the last block, we noticed the explicit assignment of data. In the following program, we will try to populate the array automatically in the program. The following PL/SQL block declares an associative array to hold the ASCII values of number 1 to 100: /*Enable the SERVEROUTPUT on to display the output*/ SET SERVEROUTPUT ON /*Start the PL/SQL Block*/ DECLARE /*Declare an array of string indexed by numeric subscripts*/ TYPE ASCII_VALUE_T IS TABLE OF VARCHAR2(12) INDEX BY PLS_INTEGER; L_GET_ASCII ASCII_VALUE_T; BEGIN /*Insert the values through a FOR loop*/ FOR I IN 1..100 LOOP L_GET_ASCII(I) := ASCII(I); END LOOP; /*Display the values randomly*/ DBMS_OUTPUT.PUT_LINE(L_GET_ASCII(5)); DBMS_OUTPUT.PUT_LINE(L_GET_ASCII(15)); DBMS_OUTPUT.PUT_LINE(L_GET_ASCII(75)); END; / 53 49 55 PL/SQL procedure successfully completed. The salient features of associative arrays are as follows: An associative array can exist as a sparse or empty collection Being a non-persistent collection, it cannot participate in DML transactions It can be passed as arguments to other local subprograms within the same block Sorting of an associative array depends on the NLS_SORT parameter An associative array declared in package specification behaves as a session-persistent array Nested tables Nested tables are a persistent form of collections which can be created in the database as well as PL/SQL. 
It is an unbounded collection where the index or subscript is implicitly maintained by the Oracle server during data retrieval. Oracle automatically marks the minimum subscript as 1 and relatively handles others. As there is no upper limit defined for a nested table, its size can grow dynamically. Though not an index-value pair structure, a nested table can be accessed like an array in a PL/SQL block. A nested table is initially a dense collection but it might become sparse due to delete operations on the collection cells. Dense collection is the one which is tightly populated. That means, there exists no empty cells between the lower and upper indexes of the collection. Sparse collections can have empty cells between the first and the last cell of the collection. A dense collection may get sparse by performing the "delete" operations. When a nested table is declared in a PL/SQL program, they behave as a one-dimensional array without any index type or upper limit specification. A nested table defined in a database exists as a valid schema object type. It can be either used in a PL/SQL block to declare a PL/SQL variable for temporarily holding program data or a database column of particular nested table type can be included in a table, which can persistently store the data in the database. A nested table type column in a table resembles a table within a table, but Oracle draws an out- of-line storage table to hold the nested table data. This scenario is illustrated in the following diagram: Whenever a database column of nested table type is created in a table (referred to as parent table), Oracle creates a storage table with the same storage options as that of the parent table. The storage table created by Oracle in the same segment carries the name as specified in the NESTED TABLE STORE AS clause during creation of the parent table. Whenever a row is created in the parent table, the following actions are performed by the Oracle server: A unique identifier is generated to distinguish the nested table instances of different parent rows, for the parent row The instance of the nested table is created in the storage table alongside the unique identifier of the parent row The Oracle server takes care of these nested table operations. For the programmer or user, the whole process is hidden and appears as a normal "insert" operation. A nested table definition in PL/SQL follows the following syntax: DECLARE TYPE type_name IS TABLE OF element_type [NOT NULL]; In the preceding syntax, element_type is a primitive data type or a user-defined type, but not as a REF CURSOR type. In a database, a nested table can be defined using the following syntax: CREATE [OR REPLACE] TYPE type_name IS TABLE OF [element_type] [NOT NULL]; / In the preceding syntax, [element_type] can be a SQL supported scalar data type, a database object type, or a REF object type. Unsupported element types are BOOLEAN, LONG, LONG-RAW, NATURAL, NATURALN, POSITIVE, POSITIVEN, REF CURSOR, SIGNTYPE, STRING, PLS_INTEGER, SIMPLE_INTEGER, BINARY_INTEGER and all other non-SQL supported data types. If the size of the element type of a database collection type has to be increased, follow this syntax: ALTER TYPE [type name] MODIFY ELEMENT TYPE [modified element type] [CASCADE | INVALIDATE]; The keywords, CASCADE or INVALIDATE, decide whether the collection modification has to invalidate the dependents or the changes that have to be cascaded across the dependents. 
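As a minimal sketch of the preceding ALTER TYPE syntax, the following widens the element of a hypothetical VARCHAR2 nested table type and cascades the change to its dependents; the type name name_nest_t is illustrative only and is not used elsewhere in this article:
/*Hypothetical collection type with a narrow element*/
CREATE OR REPLACE TYPE name_nest_t AS TABLE OF VARCHAR2(20);
/
/*Widen the element type; CASCADE propagates the change to dependent objects*/
ALTER TYPE name_nest_t MODIFY ELEMENT TYPE VARCHAR2(100) CASCADE;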
The nested table from the database can be dropped using the DROP command, as shown in the following syntax (note that the FORCE keyword drops the type irrespective of its dependents): DROP TYPE [collection name] [FORCE] Nested table collection type as the database object We will go through the following illustration to understand the behavior of a nested table, when created as a database collection type: /*Create the nested table in the database*/ SQL> CREATE TYPE NUM_NEST_T AS TABLE OF NUMBER; / Type created. The nested table type, NUM_NEST_T, is now created in the database. Its metadata information can be queried from the USER_TYPES and USER_COLL_TYPES dictionary views: SELECT type_name, typecode, type_oid FROM USER_TYPES WHERE type_name = 'NUM_NEST_T'; TYPE_NAME TYPECODE TYPE_OID --------------- --------------- -------------------------------- NUM_NEST_T COLLECTION 96DE421E47114638A9F5617CE735731A Note that the TYPECODE value shows the type of the object in the database and differentiates collection types from user-defined object types: SELECT type_name, coll_type, elem_type_name FROM user_coll_types WHERE type_name = 'NUM_NEST_T'; TYPE_NAME COLL_TYPE ELEM_TYPE_NAME --------------- ---------- -------------------- NUM_NEST_T TABLE NUMBER Once the collection type has been successfully created in the database, it can be used to specify the type for a database column in a table. The CREATE TABLE statement in the following code snippet declares a column of the NUM_NEST_T nested table type in the parent table, TAB_USE_NT_COL. The NESTED TABLE [Column] STORE AS [Storage table] clause specifies the storage table for the nested table type column. A separate table for the nested table column, NUM, ensures its out-of-line storage. SQL> CREATE TABLE TAB_USE_NT_COL (ID NUMBER, NUM NUM_NEST_T) NESTED TABLE NUM STORE AS NESTED_NUM_ID; Table created. A nested table collection type in PL/SQL n PL/SQL, a nested table can be declared and defined in the declaration section of the block as a local collection type. As a nested table follows object orientation, the PL/SQL variable of the nested table type has to be necessarily initialized. The Oracle server raises the exception ORA-06531: Reference to uninitialized collection if an uninitialized nested table type variable is encountered during block execution. As the nested table collection type has been declared within the PL/SQL block, its scope, visibility, and life is the execution of the PL/SQL block only. The following PL/SQL block declares a nested table. Observe the scope and visibility of the collection variable. Note that the COUNT method has been used to display the array elements. /*Enable the SERVEROUTPUT to display the results*/ SET SERVEROUTPUT ON /*Start the PL/SQL block*/ DECLARE /*Declare a local nested table collection type*/ TYPE LOC_NUM_NEST_T IS TABLE OF NUMBER; L_LOCAL_NT LOC_NUM_NEST_T := LOC_NUM_NEST_T (10,20,30); BEGIN /*Use FOR loop to parse the array and print the elements*/ FOR I IN 1..L_LOCAL_NT.COUNT LOOP DBMS_OUTPUT.PUT_LINE('Printing '||i||' element: '||L_LOCAL_ NT(I)); END LOOP; END; / Printing 1 element: 10 Printing 2 element: 20 Printing 3 element: 30 PL/SQL procedure successfully completed. Additional features of a nested table In the earlier sections, we saw the operational methodology of a nested table. We will now focus on the nested table's metadata. Furthermore, we will demonstrate a peculiar behavior of the nested table for the "delete" operations. 
Oracle's USER_NESTED_TABLES and USER_NESTED_TABLE_COLS data dictionary views maintain the relationship information of the parent and the nested tables. These dictionary views are populated only when a database of a nested table collection type is included in a table. The USER_NESTED_TABLES static view maintains the information about the mapping of a nested table collection type with its parent table. The structure of the dictionary view is as follows: SQL> desc USER_NESTED_TABLES Name Null? Type ----------------------- -------- --------------- TABLE_NAME VARCHAR2(30) TABLE_TYPE_OWNER VARCHAR2(30) TABLE_TYPE_NAME VARCHAR2(30) PARENT_TABLE_NAME VARCHAR2(30) PARENT_TABLE_COLUMN VARCHAR2(4000) STORAGE_SPEC VARCHAR2(30) RETURN_TYPE VARCHAR2(20) ELEMENT_SUBSTITUTABLE VARCHAR2(25) Let us query the nested table relationship properties for the TAB_USE_NT_COL table from the preceding view: SELECT parent_table_column, table_name, return_type, storage_spec FROM user_nested_tables WHERE parent_table_name='TAB_USE_NT_COL' / PARENT_TAB TABLE_NAME RETURN_TYPE STORAGE_SPEC ---------------------------------------------------------------------- NUM NESTED_NUM_ID VALUE DEFAULT In the preceding view query, RETURN_TYPE specifies the return type of the collection. It can be VALUE (in this case) or LOCATOR. Another column, STORAGE_SPEC, signifies the storage scheme used for the storage of a nested table which can be either USER_SPECIFIED or DEFAULT (in this case). The USER_NESTED_TABLE_COLS view maintains the information about the collection attributes contained in the nested tables: SQL> desc USER_NESTED_TABLE_COLS Name Null? Type ----------------------- -------- --------------- TABLE_NAME NOT NULL VARCHAR2(30) COLUMN_NAME NOT NULL VARCHAR2(30) DATA_TYPE VARCHAR2(106) DATA_TYPE_MOD VARCHAR2(3) DATA_TYPE_OWNER VARCHAR2(30) DATA_LENGTH NOT NULL NUMBER DATA_PRECISION NUMBER DATA_SCALE NUMBER NULLABLE VARCHAR2(1) COLUMN_ID NUMBER DEFAULT_LENGTH NUMBER DATA_DEFAULT LONG NUM_DISTINCT NUMBER LOW_VALUE RAW(32) HIGH_VALUE RAW(32) DENSITY NUMBER NUM_NULLS NUMBER NUM_BUCKETS NUMBER LAST_ANALYZED DATE SAMPLE_SIZE NUMBER CHARACTER_SET_NAME VARCHAR2(44) CHAR_COL_DECL_LENGTH NUMBER GLOBAL_STATS VARCHAR2(3) USER_STATS VARCHAR2(3) AVG_COL_LEN NUMBER CHAR_LENGTH NUMBER CHAR_USED VARCHAR2(1) V80_FMT_IMAGE VARCHAR2(3) DATA_UPGRADED VARCHAR2(3) HIDDEN_COLUMN VARCHAR2(3) VIRTUAL_COLUMN VARCHAR2(3) SEGMENT_COLUMN_ID NUMBER INTERNAL_COLUMN_ID NOT NULL NUMBER HISTOGRAM VARCHAR2(15) QUALIFIED_COL_NAME VARCHAR2(4000) We will now query the nested storage table in the preceding dictionary view to list all its attributes: SELECT COLUMN_NAME, DATA_TYPE, DATA_LENGTH, HIDDEN_COLUMN FROM user_nested_table_cols where table_name='NESTED_NUM_ID' / COLUMN_NAME DATA_TYP DATA_LENGTH HID ------------------------------ ---------- ----------- ----- NESTED_TABLE_ID RAW 16 YES COLUMN_VALUE NUMBER 22 NO We observe that though the nested table had only number elements, there is two- columned information in the view. The COLUMN_VALUE attribute is the default pseudo column of the nested table as there are no "named" attributes in the collection structure. The other attribute, NESTED_TABLE_ID, is a hidden unique 16-byte system generated raw hash code which latently stores the parent row identifier alongside the nested table instance to distinguish the parent row association. If an element is deleted from the nested table, it is rendered as parse. 
This implies that once an index is deleted from the collection structure, the collection doesn't restructure itself by shifting the cells in a forward direction. Let us check out the sparse behavior in the following example. The following PL/SQL block declares a local nested table and initializes it with a constructor. We will delete the first element and print it again. The system raises the NO_DATA_FOUND exception when we query the element at the index 1 in the collection:   /*Enable the SERVEROUTPUT to display the block messages*/ SQL> SET SERVEROUTPUT ON /*Start the PL/SQL block*/ SQL> DECLARE /*Declare the local nested table collection*/ TYPE coll_method_demo_t IS TABLE OF NUMBER; /*Declare a collection variable and initialize it*/ L_ARRAY coll_method_demo_t := coll_method_demo_t (10,20,30,40,50); BEGIN /*Display element at index 1*/ DBMS_OUTPUT.PUT_LINE('Element at index 1 before deletion:'||l_ array(1)); /*Delete the 1st element from the collection*/ L_ARRAY.DELETE(1); /*Display element at index 1*/ DBMS_OUTPUT.PUT_LINE('Element at index 1 after deletion:'||l_ array(1)); END; / Element at index 1 before deletion:10 DECLARE * ERROR at line 1: ORA-01403: no data found ORA-06512: at line 15
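A common way to avoid the exception shown above is to test a cell with the EXISTS collection method before referencing it. The following is a small sketch along the lines of the previous block, reusing the same collection type declaration:
/*Enable the SERVEROUTPUT to display the block messages*/
SET SERVEROUTPUT ON
DECLARE
  /*Declare the local nested table collection and initialize it*/
  TYPE coll_method_demo_t IS TABLE OF NUMBER;
  l_array coll_method_demo_t := coll_method_demo_t(10, 20, 30, 40, 50);
BEGIN
  /*Delete the first element, leaving the collection sparse*/
  l_array.DELETE(1);
  /*Guard the access with EXISTS instead of reading the cell blindly*/
  IF l_array.EXISTS(1) THEN
    DBMS_OUTPUT.PUT_LINE('Element at index 1: ' || l_array(1));
  ELSE
    DBMS_OUTPUT.PUT_LINE('Element at index 1 has been deleted');
  END IF;
END;
/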

Pricing the Double-no-touch option

Packt
10 Mar 2015
19 min read
In this article by Balázs Márkus, coauthor of the book Mastering R for Quantitative Finance, you will learn about pricing and life of Double-no-touch (DNT) option. (For more resources related to this topic, see here.) A Double-no-touch (DNT) option is a binary option that pays a fixed amount of cash at expiry. Unfortunately, the fExoticOptions package does not contain a formula for this option at present. We will show two different ways to price DNTs that incorporate two different pricing approaches. In this section, we will call the function dnt1, and for the second approach, we will use dnt2 as the name for the function. Hui (1996) showed how a one-touch double barrier binary option can be priced. In his terminology, "one-touch" means that a single trade is enough to trigger the knock-out event, and "double barrier" binary means that there are two barriers and this is a binary option. We call this DNT as it is commonly used on the FX markets. This is a good example for the fact that many popular exotic options are running under more than one name. In Haug (2007a), the Hui-formula is already translated into the generalized framework. S, r, b, s, and T have the same meaning. K means the payout (dollar amount) while L and U are the lower and upper barriers. Where Implementing the Hui (1996) function to R starts with a big question mark: what should we do with an infinite sum? How high a number should we substitute as infinity? Interestingly, for practical purposes, small number like 5 or 10 could often play the role of infinity rather well. Hui (1996) states that convergence is fast most of the time. We are a bit skeptical about this since a will be used as an exponent. If b is negative and sigma is small enough, the (S/L)a part in the formula could turn out to be a problem. First, we will try with normal parameters and see how quick the convergence is: dnt1 <- function(S, K, U, L, sigma, T, r, b, N = 20, ploterror = FALSE){    if ( L > S | S > U) return(0)    Z <- log(U/L)    alpha <- -1/2*(2*b/sigma^2 - 1)    beta <- -1/4*(2*b/sigma^2 - 1)^2 - 2*r/sigma^2    v <- rep(0, N)    for (i in 1:N)        v[i] <- 2*pi*i*K/(Z^2) * (((S/L)^alpha - (-1)^i*(S/U)^alpha ) /            (alpha^2+(i*pi/Z)^2)) * sin(i*pi/Z*log(S/L)) *              exp(-1/2 * ((i*pi/Z)^2-beta) * sigma^2*T)    if (ploterror) barplot(v, main = "Formula Error");    sum(v) } print(dnt1(100, 10, 120, 80, 0.1, 0.25, 0.05, 0.03, 20, TRUE)) The following screenshot shows the result of the preceding code: The Formula Error chart shows that after the seventh step, additional steps were not influencing the result. This means that for practical purposes, the infinite sum can be quickly estimated by calculating only the first seven steps. This looks like a very quick convergence indeed. However, this could be pure luck or coincidence. What about decreasing the volatility down to 3 percent? We have to set N as 50 to see the convergence: print(dnt1(100, 10, 120, 80, 0.03, 0.25, 0.05, 0.03, 50, TRUE)) The preceding command gives the following output: Not so impressive? 50 steps are still not that bad. What about decreasing the volatility even lower? At 1 percent, the formula with these parameters simply blows up. First, this looks catastrophic; however, the price of a DNT was already 98.75 percent of the payout when we used 3 percent volatility. 
Logic says that the DNT price should be a monotone-decreasing function of volatility, so we already know that the price of the DNT should be worth at least 98.75 percent if volatility is below 3 percent. Another issue is that if we choose an extreme high U or extreme low L, calculation errors emerge. However, similar to the problem with volatility, common sense helps here too; the price of a DNT should increase if we make U higher or L lower. There is still another trick. Since all the problem comes from the a parameter, we can try setting b as 0, which will make a equal to 0.5. If we also set r to 0, the price of a DNT converges into 100 percent as the volatility drops. Anyway, whenever we substitute an infinite sum by a finite sum, it is always good to know when it will work and when it will not. We made a new code that takes into consideration that convergence is not always quick. The trick is that the function calculates the next step as long as the last step made any significant change. This is still not good for all the parameters as there is no cure for very low volatility, except that we accept the fact that if implied volatilities are below 1 percent, than this is an extreme market situation in which case DNT options should not be priced by this formula: dnt1 <- function(S, K, U, L, sigma, Time, r, b) { if ( L > S | S > U) return(0) Z <- log(U/L) alpha <- -1/2*(2*b/sigma^2 - 1) beta <- -1/4*(2*b/sigma^2 - 1)^2 - 2*r/sigma^2 p <- 0 i <- a <- 1 while (abs(a) > 0.0001){    a <- 2*pi*i*K/(Z^2) * (((S/L)^alpha - (-1)^i*(S/U)^alpha ) /      (alpha^2 + (i *pi / Z)^2) ) * sin(i * pi / Z * log(S/L)) *        exp(-1/2*((i*pi/Z)^2-beta) * sigma^2 * Time)    p <- p + a    i <- i + 1 } p } Now that we have a nice formula, it is possible to draw some DNT-related charts to get more familiar with this option. Later, we will use a particular AUDUSD DNT option with the following parameters: L equal to 0.9200, U equal to 0.9600, K (payout) equal to USD 1 million, T equal to 0.25 years, volatility equal to 6 percent, r_AUD equal to 2.75 percent, r_USD equal to 0.25 percent, and b equal to -2.5 percent. We will calculate and plot all the possible values of this DNT from 0.9200 to 0.9600; each step will be one pip (0.0001), so we will use 2,000 steps. The following code plots a graph of price of underlying: x <- seq(0.92, 0.96, length = 2000) y <- z <- rep(0, 2000) for (i in 1:2000){    y[i] <- dnt1(x[i], 1e6, 0.96, 0.92, 0.06, 0.25, 0.0025, -0.0250)    z[i] <- dnt1(x[i], 1e6, 0.96, 0.92, 0.065, 0.25, 0.0025, -0.0250) } matplot(x, cbind(y,z), type = "l", lwd = 2, lty = 1,    main = "Price of a DNT with volatility 6% and 6.5% ", cex.main = 0.8, xlab = "Price of underlying" ) The following output is the result of the preceding code: It can be clearly seen that even a small change in volatility can have a huge impact on the price of a DNT. Looking at this chart is an intuitive way to find that vega must be negative. Interestingly enough even just taking a quick look at this chart can convince us that the absolute value of vega is decreasing if we are getting closer to the barriers. Most end users think that the biggest risk is when the spot is getting close to the trigger. This is because end users really think about binary options in a binary way. As long as the DNT is alive, they focus on the positive outcome. However, for a dynamic hedger, the risk of a DNT is not that interesting when the value of the DNT is already small. 
It is also very interesting that since the T-Bill price is independent of the volatility and since the DNT + DOT = T-Bill equation holds, an increasing volatility will decrease the price of the DNT by the exact same amount just like it will increase the price of the DOT. It is not surprising that the vega of the DOT should be the exact mirror of the DNT. We can use the GetGreeks function to estimate vega, gamma, delta, and theta. For gamma we can use the GetGreeks function in the following way: GetGreeks <- function(FUN, arg, epsilon,...) {    all_args1 <- all_args2 <- list(...)    all_args1[[arg]] <- as.numeric(all_args1[[arg]] + epsilon)    all_args2[[arg]] <- as.numeric(all_args2[[arg]] - epsilon)    (do.call(FUN, all_args1) -        do.call(FUN, all_args2)) / (2 * epsilon) } Gamma <- function(FUN, epsilon, S, ...) {    arg1 <- list(S, ...)    arg2 <- list(S + 2 * epsilon, ...)    arg3 <- list(S - 2 * epsilon, ...)    y1 <- (do.call(FUN, arg2) - do.call(FUN, arg1)) / (2 * epsilon)    y2 <- (do.call(FUN, arg1) - do.call(FUN, arg3)) / (2 * epsilon)  (y1 - y2) / (2 * epsilon) } x = seq(0.9202, 0.9598, length = 200) delta <- vega <- theta <- gamma <- rep(0, 200)   for(i in 1:200){ delta[i] <- GetGreeks(FUN = dnt1, arg = 1, epsilon = 0.0001,    x[i], 1000000, 0.96, 0.92, 0.06, 0.5, 0.02, -0.02) vega[i] <-   GetGreeks(FUN = dnt1, arg = 5, epsilon = 0.0005,    x[i], 1000000, 0.96, 0.92, 0.06, 0.5, 0.0025, -0.025) theta[i] <- - GetGreeks(FUN = dnt1, arg = 6, epsilon = 1/365,    x[i], 1000000, 0.96, 0.92, 0.06, 0.5, 0.0025, -0.025) gamma[i] <- Gamma(FUN = dnt1, epsilon = 0.0001, S = x[i], K =    1e6, U = 0.96, L = 0.92, sigma = 0.06, Time = 0.5, r = 0.02, b = -0.02) }   windows() plot(x, vega, type = "l", xlab = "S",ylab = "", main = "Vega") The following chart is the result of the preceding code: After having a look at the value chart, the delta of a DNT is also very close to intuitions; if we are coming close to the higher barrier, our delta gets negative, and if we are coming closer to the lower barrier, the delta gets positive as follows: windows() plot(x, delta, type = "l", xlab = "S",ylab = "", main = "Delta") This is really a non-convex situation; if we would like to do a dynamic delta hedge, we will lose money for sure. If the spot price goes up, the delta of the DNT decreases, so we should buy some AUDUSD as a hedge. However, if the spot price goes down, we should sell some AUDUSD. Imagine a scenario where AUDUSD goes up 20 pips in the morning and then goes down 20 pips in the afternoon. For a dynamic hedger, this means buying some AUDUSD after the price moved up and selling this very same amount after the price comes down. The changing of the delta can be described by the gamma as follows: windows() plot(x, gamma, type = "l", xlab = "S",ylab = "", main = "Gamma") Negative gamma means that if the spot goes up, our delta is decreasing, but if the spot goes down, our delta is increasing. This doesn't sound great. For this inconvenient non-convex situation, there is some compensation, that is, the value of theta is positive. If nothing happens, but one day passes, the DNT will automatically worth more. Here, we use theta as minus 1 times the partial derivative, since if (T-t) is the time left, we check how the value changes as t increases by one day: windows() plot(x, theta, type = "l", xlab = "S",ylab = "", main = "Theta") The more negative the gamma, the more positive our theta. This is how time compensates for the potential losses generated by the negative gamma. 
Risk-neutral pricing also implicates that negative gamma should be compensated by a positive theta. This is the main message of the Black-Scholes framework for vanilla options, but this is also true for exotics. See Taleb (1997) and Wilmott (2006). We already introduced the Black-Scholes surface before; now, we can go into more detail. This surface is also a nice interpretation of how theta and delta work. It shows the price of an option for different spot prices and times to maturity, so the slope of this surface is the theta for one direction and delta for the other. The code for this is as follows: BS_surf <- function(S, Time, FUN, ...) { n <- length(S) k <- length(Time) m <- matrix(0, n, k) for (i in 1:n) {    for (j in 1:k) {      l <- list(S = S[i], Time = Time[j], ...)      m[i,j] <- do.call(FUN, l)      } } persp3D(z = m, xlab = "underlying", ylab = "Time",    zlab = "option price", phi = 30, theta = 30, bty = "b2") } BS_surf(seq(0.92,0.96,length = 200), seq(1/365, 1/48, length = 200), dnt1, K = 1000000, U = 0.96, L = 0.92, r = 0.0025, b = -0.0250,    sigma = 0.2) The preceding code gives the following output: We can see what was already suspected; DNT likes when time is passing and the spot is moving to the middle of the (L,U) interval. Another way to price the Double-no-touch option Static replication is always the most elegant way of pricing. The no-arbitrage argument will let us say that if, at some time in the future, two portfolios have the same value for sure, then their price should be equal any time before this. We will show how double-knock-out (DKO) options could be used to build a DNT. We will need to use a trick; the strike price could be the same as one of the barriers. For a DKO call, the strike price should be lower than the upper barrier because if the strike price is not lower than the upper barrier, the DKO call would be knocked out before it could become in-the-money, so in this case, the option would be worthless as nobody can ever exercise it in-the-money. However, we can choose the strike price to be equal to the lower barrier. For a put, the strike price should be higher than the lower barrier, so why not make it equal to the upper barrier. This way, the DKO call and DKO put option will have a very convenient feature; if they are still alive, they will both expiry in-the-money. Now, we are almost done. We just have to add the DKO prices, and we will get a DNT that has a payout of (U-L) dollars. Since DNT prices are linear in the payout, we only have to multiply the result by K*(U-L): dnt2 <- function(S, K, U, L, sigma, T, r, b) {      a <- DoubleBarrierOption("co", S, L, L, U, T, r, b, sigma, 0,        0,title = NULL, description = NULL)    z <- a@price    b <- DoubleBarrierOption("po", S, U, L, U, T, r, b, sigma, 0,        0,title = NULL, description = NULL)    y <- b@price    (z + y) / (U - L) * K } Now, we have two functions for which we can compare the results: dnt1(0.9266, 1000000, 0.9600, 0.9200, 0.06, 0.25, 0.0025, -0.025) [1] 48564.59   dnt2(0.9266, 1000000, 0.9600, 0.9200, 0.06, 0.25, 0.0025, -0.025) [1] 48564.45 For a DNT with a USD 1 million contingent payout and an initial market value of over 48,000 dollars, it is very nice to see that the difference in the prices is only 14 cents. Technically, however, having a second pricing function is not a big help since low volatility is also an issue for dnt2. We will use dnt1 for the rest of the article. 
The life of a Double-no-touch option – a simulation How has the DNT price been evolving during the second quarter of 2014? We have the open-high-low-close type time series with five minute frequency for AUDUSD, so we know all the extreme prices: d <- read.table("audusd.csv", colClasses = c("character", rep("numeric",5)), sep = ";", header = TRUE) underlying <- as.vector(t(d[, 2:5])) t <- rep( d[,6], each = 4) n <- length(t) option_price <- rep(0, n)   for (i in 1:n) { option_price[i] <- dnt1(S = underlying[i], K = 1000000,    U = 0.9600, L = 0.9200, sigma = 0.06, T = t[i]/(60*24*365),      r = 0.0025, b = -0.0250) } a <- min(option_price) b <- max(option_price) option_price_transformed = (option_price - a) * 0.03 / (b - a) + 0.92   par(mar = c(6, 3, 3, 5)) matplot(cbind(underlying,option_price_transformed), type = "l",    lty = 1, col = c("grey", "red"),    main = "Price of underlying and DNT",    xaxt = "n", yaxt = "n", ylim = c(0.91,0.97),    ylab = "", xlab = "Remaining time") abline(h = c(0.92, 0.96), col = "green") axis(side = 2, at = pretty(option_price_transformed),    col.axis = "grey", col = "grey") axis(side = 4, at = pretty(option_price_transformed),    labels = round(seq(a/1000,1000,length = 7)), las = 2,    col = "red", col.axis = "red") axis(side = 1, at = seq(1,n, length=6),    labels = round(t[round(seq(1,n, length=6))]/60/24)) The following is the output for the preceding code: The price of a DNT is shown in red on the right axis (divided by 1000), and the actual AUDUSD price is shown in grey on the left axis. The green lines are the barriers of 0.9200 and 0.9600. The chart shows that in 2014 Q2, the AUDUSD currency pair was traded inside the (0.9200; 0.9600) interval; thus, the payout of the DNT would have been USD 1 million. This DNT looks like a very good investment; however, reality is just one trajectory out of an a priori almost infinite set. It could have happened differently. For example, on May 02, 2014, there were still 59 days left until expiry, and AUDUSD was traded at 0.9203, just three pips away from the lower barrier. At this point, the price of this DNT was only USD 5,302 dollars which is shown in the following code: dnt1(0.9203, 1000000, 0.9600, 0.9200, 0.06, 59/365, 0.0025, -0.025) [1] 5302.213 Compare this USD 5,302 to the initial USD 48,564 option price! In the following simulation, we will show some different trajectories. All of them start from the same 0.9266 AUDUSD spot price as it was on the dawn of April 01, and we will see how many of them stayed inside the (0.9200; 0.9600) interval. 
To make it simple, we will simulate geometric Brown motions by using the same 6 percent volatility as we used to price the DNT: library(matrixStats) DNT_sim <- function(S0 = 0.9266, mu = 0, sigma = 0.06, U = 0.96, L = 0.92, N = 5) {    dt <- 5 / (365 * 24 * 60)    t <- seq(0, 0.25, by = dt)    Time <- length(t)      W <- matrix(rnorm((Time - 1) * N), Time - 1, N)    W <- apply(W, 2, cumsum)    W <- sqrt(dt) * rbind(rep(0, N), W)    S <- S0 * exp((mu - sigma^2 / 2) * t + sigma * W )    option_price <- matrix(0, Time, N)      for (i in 1:N)        for (j in 1:Time)          option_price[j,i] <- dnt1(S[j,i], K = 1000000, U, L, sigma,              0.25-t[j], r = 0.0025,                b = -0.0250)*(min(S[1:j,i]) > L & max(S[1:j,i]) < U)      survivals <- sum(option_price[Time,] > 0)    dev.new(width = 19, height = 10)      par(mfrow = c(1,2))    matplot(t,S, type = "l", main = "Underlying price",        xlab = paste("Survived", survivals, "from", N), ylab = "")    abline( h = c(U,L), col = "blue")    matplot(t, option_price, type = "l", main = "DNT price",        xlab = "", ylab = "")} set.seed(214) system.time(DNT_sim()) The following is the output for the preceding code: Here, the only surviving trajectory is the red one; in all other cases, the DNT hits either the higher or the lower barrier. The line set.seed(214) grants that this simulation will look the same anytime we run this. One out of five is still not that bad; it would suggest that for an end user or gambler who does no dynamic hedging, this option has an approximate value of 20 percent of the payout (especially since the interest rates are low, the time value of money is not important). However, five trajectories are still too few to jump to such conclusions. We should check the DNT survivorship ratio for a much higher number of trajectories. The ratio of the surviving trajectories could be a good estimator of the a priori real-world survivorship probability of this DNT; thus, the end user value of it. Before increasing N rapidly, we should keep in mind how much time this simulation took. For my computer, it took 50.75 seconds for N = 5, and 153.11 seconds for N = 15. The following is the output for N = 15: Now, 3 out of 15 survived, so the estimated survivorship ratio is still 3/15, which is equal to 20 percent. Looks like this is a very nice product; the price is around 5 percent of the payout, while 20 percent is the estimated survivorship ratio. Just out of curiosity, run the simulation for N equal to 200. This should take about 30 minutes. The following is the output for N = 200: The results are shocking; now, only 12 out of 200 survive, and the ratio is only 6 percent! So to get a better picture, we should run the simulation for a larger N. The movie Whatever Works by Woody Allen (starring Larry David) is 92 minutes long; in simulation time, that is N = 541. For this N = 541, there are only 38 surviving trajectories, resulting in a survivorship ratio of 7 percent. What is the real expected survivorship ratio? Is it 20 percent, 6 percent, or 7 percent? We simply don't know at this point. Mathematicians warn us that the law of large numbers requires large numbers, where large is much more than 541, so it would be advisable to run this simulation for as large an N as time allows. Of course, getting a better computer also helps to do more N during the same time. Anyway, from this point of view, Hui's (1996) relatively fast converging DNT pricing formula gets some respect. 
Summary We started this article by introducing exotic options. In a brief theoretical summary, we explained how exotics are linked together. There are many types of exotics. We showed one possible way of classification that is consistent with the fExoticOptions package. We showed how the Black-Scholes surface (a 3D chart that contains the price of a derivative dependent on time and the underlying price) can be constructed for any pricing function. Resources for Article: Further resources on this subject: What is Quantitative Finance? [article] Learning Option Pricing [article] Derivatives Pricing [article]


Overview of SQL Server Reporting Services 2012 Architecture, Features, and Tools

Packt
08 Aug 2013
15 min read
(For more resources related to this topic, see here.) Structural design of SQL servers and SharePoint environment Depending on the business and the resources available, the various servers may be located in distributed locations and the Web applications may also be run from Web servers in a farm and the same can be true for SharePoint servers. In this article, by the word architecture we mean the way by which the preceding elements are put together to work on a single computer. However, it is important to know that this is just one topology (an arrangement of constituent elements) and in general it can be lot more complicated spanning networks and reaching across boundaries. The Report Server is the centerpiece of the Reporting Services installation. This installation can be deployed in two modes, namely, Native mode or SharePoint Integrated mode. Each mode has a separate engine and an extensible architecture. It consists of a collection of special-purpose extensions that handle authentication, data processing, rendering, and delivery operations. Once deployed in one mode it cannot be changed to the other. It is possible to have two servers each installed in a different mode. We have installed all the necessary elements to explore the RS 2012 features, including Power View and Data Alerts. The next diagram briefly shows the structural design of the environment used in working with the article: Primarily, SQL Server 2012 Enterprise Edition is used, for both Native mode as well as SharePoint Integrated mode. As we see in the previous diagram, Report Server Native mode is on a named instance HI (in some places another named instance Kailua is also used). This server has the Reporting Services databases ReportServer$HI and ReportServer$HITempDB. The associated Report Server handles Jobs, Security, and Shared Schedules. The Native mode architecture described in the next section is taken from the Microsoft documentation. The tools (SSDT, Report Builder, Report Server Configuration, and so on) connect to the Report Server. The associated SQL Server Agent takes care of the jobs such as subscriptions related to Native mode. The SharePoint Server 2010 is a required element with which the Reporting Services add-in helps to create a Reporting Services Service. With the creation of the RS Service in SharePoint, three SQL Server 2012 databases (shown alongside in the diagram) are created in an instance with its Reporting Services installed in SharePoint Integrated mode. The SQL Server 2012 instance NJ is installed in this fashion. These databases are repositories for report content including those related to Power Views and Data Alerts. The data sources(extension .rsds) used in creating Power View reports (extension.rdlx) are stored in the ReportingService_b67933dba1f14282bdf434479cbc8f8f database and the alerting related information is stored in the ReportingService_b67933dba1f14282bdf434479cbc8f8f_Alerting database. Not shown is an Express database that is used by the SharePoint Server for its content, administration, and so on. RS_ADD-IN allows you to create the service. You will use the Power Shell tool to create and manage the service. In order to create Power View reports, the new feature in SSRS 2012, you start off creating a data source in SharePoint library. Because of the RS Service, you can enable Reporting Services features such as Report Builder; and associate BISM file extensions to support connecting to tabular models created in SSDT deployed to Analysis Services Server. 
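To verify that the databases described above exist on the instance hosting each installation, a simple catalog query can be run against that instance. This is only an informal check, not part of the product's tooling, and the name patterns reflect the default database names used in this article's setup:
-- List the Reporting Services databases (Native mode and SharePoint mode) on this instance
SELECT name, create_date
FROM sys.databases
WHERE name LIKE N'ReportServer%'
   OR name LIKE N'ReportingService[_]%';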
When Reporting Services is installed in SharePoint Integrated mode, SharePoint Web Parts will be available to users, allowing them to connect to RS Native mode servers and work with the reports on those servers from within a SharePoint site.
Native mode
The following schematic, taken from Microsoft documentation (http://msdn.microsoft.com/en-us/library/ms157231.aspx), shows the major components of a Native mode installation:
The image clearly shows the several processors that are called into play before a report is displayed. The following are the elements of this processing:
Processing extensions (data, rendering, report processing, and authentication)
Designing tools (Report Builder, Report Designer)
Display devices (browsers)
Windows components that do the scheduling and delivery through extensions (the Report Server databases, a SQL Server 2012 database, which stores everything connected with reports)
For the Reporting Services 2012 instance enabled in Native mode for this article, the following image shows the ReportServer databases and the Reporting Services server. A similar server, HI, was also installed after a malware attack. The Report Server is implemented as a Microsoft Windows service called the Report Server Service.
SharePoint Integrated mode
In SharePoint mode, a Report Server must run within a SharePoint Server (even in a standalone implementation). Report Server processing, rendering, and management are all carried out by the SharePoint application server running the Reporting Services SharePoint shared service. For this to happen, SharePoint Integrated mode has to be chosen at SQL Server installation time. Access to reports and related operations is in this case through a SharePoint frontend. The following elements are required for SharePoint mode:
SharePoint Foundation 2010 or SharePoint Server 2010
An appropriate version of the Reporting Services add-in for SharePoint products
A SharePoint application server with a Reporting Services shared service instance and at least one Reporting Services service application
The following diagram, taken from Microsoft documentation, illustrates the various parts of a SharePoint Integrated environment of Reporting Services. Note that the alerting Web service and Power View need SharePoint integration. The numbered items and their descriptions shown next are also from the same Microsoft document. Follow the link at the beginning of this section.
Item 1: Web servers or Web Frontends (WFE). The Reporting Services add-in must be installed on each Web server from which you want to utilize Web application features such as viewing reports or a Reporting Services management page for tasks such as managing data sources and subscriptions.
Item 2: The add-in installs URL and SOAP endpoints for clients to communicate with the application servers through the Reporting Services proxy.
Item 3: Application servers running the shared service. Scale-out of report processing is managed as part of the SharePoint farm by adding the service to additional application servers.
Item 4: You can create more than one Reporting Services service application, with different configurations including permissions, e-mail, proxy, and subscriptions.
Item 5: Reports, data sources, and other items are stored in SharePoint content databases.
Item 6: Reporting Services service applications create three databases for the Report Server, temp, and data alerting features.
Configuration settings that apply to all SSRS service applications are stored in the RSReportServer.config file.

When you install Reporting Services in SharePoint Integrated mode, several features that you are used to in Native mode will not be available. Some of them are summarized here from the MSDN site:

URL access will work, but you have to use the SharePoint URL and not the Native mode URL. The Native mode folder hierarchy will not work.
Custom security extensions can be used, but you need to use the special-purpose security extension meant for SharePoint integration.
You cannot use the Reporting Services Configuration Manager (of the Native mode installation). You should use SharePoint Central Administration, as shown in this section (for Reporting Services 2008 and 2008 R2).
Report Manager is not the frontend; in this case you should use the SharePoint application pages.
You cannot use Linked Reports, My Reports, and My Subscriptions in SharePoint mode.
In SharePoint Integrated mode you can work with Data Alerts, which is not possible in a Native mode installation.
Power View is another feature available with SharePoint that is not available in Native mode. To access Power View, the browser needs Silverlight installed.
While reports with the RDL extension are supported in both modes, reports with the RDLX extension are supported only in SharePoint mode.
SharePoint user token credentials, AAM zones for Internet-facing deployments, SharePoint backup and recovery, and ULS log support are available only in SharePoint mode.

For the purposes of discussion and exercises in this article, a standalone server deployment is used, as shown in the next diagram. Remember that various other deployment topologies using more than one computer are possible; for a detailed description, please follow the link http://msdn.microsoft.com/en-us/library/bb510781(v=sql.105).aspx. The standalone deployment is the simplest, in that all the components are installed on a single computer, representative of the installation used for this article. The following diagram, taken from the preceding link, illustrates the elements of the standalone deployment:

Reporting Services configuration

For both modes of installation, information for the Reporting Services components is stored in configuration files and the registry. During setup, the configuration files are copied to the following locations:

Native mode: C:\Program Files\Microsoft SQL Server\MSRS11.MSSQLSERVER
SharePoint Integrated mode: C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\WebServices\Reporting

Follow the link http://msdn.microsoft.com/en-us/library/ms155866.aspx for details.

Native mode

The Report Server Windows service is an orchestrated set of applications that run in a single process, using a single account with access to a single Report Server database, with the set of configuration files listed here:

RSReportServer.config: Stores configuration settings for feature areas of the Report Server service: Report Manager, the Report Server Web Service, and background processing. Location: <Installation directory>\Reporting Services\ReportServer
RSSrvPolicy.config: Stores the code access security policies for the server extensions. Location: <Installation directory>\Reporting Services\ReportServer
RSMgrPolicy.config: Stores the code access security policies for Report Manager.
Location: <Installation directory>\Reporting Services\ReportManager
Web.config for the Report Server Web Service: Includes only those settings that are required for ASP.NET. Location: <Installation directory>\Reporting Services\ReportServer
Web.config for Report Manager: Includes only those settings that are required for ASP.NET. Location: <Installation directory>\Reporting Services\ReportManager
ReportingServicesService.exe.config: Stores configuration settings that specify the trace levels and logging options for the Report Server service. Location: <Installation directory>\Reporting Services\ReportServer\Bin
Registry settings: Store configuration state and other settings used to uninstall Reporting Services. If you are troubleshooting an installation or configuration problem, you can view these settings to get information about how the Report Server is configured. Do not modify these settings directly, as this can invalidate your installation. Location: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceID>\Setup and HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Services\ReportServer
RSReportDesigner.config: Stores configuration settings for Report Designer. For more information, follow the link http://msdn.microsoft.com/en-us/library/ms160346.aspx. Location: <drive>:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies
RSPreviewPolicy.config: Stores the code access security policies for the server extensions used during report preview. Location: C:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies

First is the RSReportServer configuration file, which can be found in the installation directory under Reporting Services. The entries in this file control the feature areas of the Report Server service shown in the previous image, namely the Report Server Web Service, Report Manager, and background processing. The RSReportServer configuration file has several sections with which you can modify the following features:

General configuration settings
URL reservations
Authentication
Service
UI
Extensions
MapTileServerConfiguration (the Microsoft Bing Maps SOAP Services that provide a tile background for map report items in a report)
Default configuration file for a Native mode Report Server
Default configuration file for a SharePoint mode Report Server

The three areas previously mentioned (the Report Server Web Service, the Report Server service, and Report Manager) all run in separate application domains, and you can turn off elements that you do not need so as to improve security by reducing the surface area for attacks. Some functionality, such as memory management and process health, works across all three components. For example, for the reporting server Kailua used in this article, the service name is ReportServer$KAILUA. This service has no other dependencies. In fact, you can access the help file for this service when you look at Windows Services in the Control Panel, as shown; in three of the tabbed pages of this window you can access contextual help.

SharePoint Integrated mode

The following table, taken from the Microsoft documentation, describes the configuration files used by a SharePoint mode Report Server. Configuration settings are stored in the SharePoint service application databases.

RSReportServer.config: Stores configuration settings for feature areas of the Report Server service: Report Manager, the Report Server Web Service, and background processing.
Location: <Installation directory>\Reporting Services\ReportServer
RSSrvPolicy.config: Stores the code access security policies for the server extensions. Location: <Installation directory>\Reporting Services\ReportServer
Web.config for the Report Server Web Service: Includes only those settings that are required for ASP.NET. Location: <Installation directory>\Reporting Services\ReportServer
Registry settings: Store configuration state and other settings used to uninstall Reporting Services, and also store information about each Reporting Services service application. Do not modify these settings directly, as this can invalidate your installation. Location: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\<InstanceID>\Setup (for example, instance ID MSSQL11.MSSQLSERVER) and HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\Reporting Services\Service Applications
RSReportDesigner.config: Stores configuration settings for Report Designer. Location: <drive>:\Program Files\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies

Hands-on exercise 3.1 – modifying the configuration file in Native mode

We can make changes to the rsreportserver.config file when changes are required or some tuning has to be done; for example, you may need to accommodate a different e-mail setting, change authentication, and so on. This is an XML file that can be edited in Notepad.exe (you can also use an XML editor or Visual Studio). You need to start Notepad with administrator privileges.

Turn on/off the Report Server Web Service

In this exercise, we will modify the configuration file to turn the Report Server Web Service on or off. Perform the following steps:

1. Start Notepad using Run as Administrator.
2. Open the file (you may use Start | Search for rsreportserver.config), which is located at C:\Program Files\Microsoft SQL Server\MSRS11.KAILUA\Reporting Services\ReportServer\rsreportserver.config.
3. In Edit | Find, type in IsWebServiceEnabled. The element takes one of two values, True or False. If you want to turn the Web service off, change True to False. The default is True. Here is a section of the file reproduced:

<Service>
  <IsSchedulingService>True</IsSchedulingService>
  <IsNotificationService>True</IsNotificationService>
  <IsEventService>True</IsEventService>
  <PollingInterval>10</PollingInterval>
  <WindowsServiceUseFileShareStorage>False</WindowsServiceUseFileShareStorage>
  <MemorySafetyMargin>80</MemorySafetyMargin>
  <MemoryThreshold>90</MemoryThreshold>
  <RecycleTime>720</RecycleTime>
  <MaxAppDomainUnloadTime>30</MaxAppDomainUnloadTime>
  <MaxQueueThreads>0</MaxQueueThreads>
  <UrlRoot>
  </UrlRoot>
  <UnattendedExecutionAccount>
    <UserName></UserName>
    <Password></Password>
    <Domain></Domain>
  </UnattendedExecutionAccount>
  <PolicyLevel>rssrvpolicy.config</PolicyLevel>
  <IsWebServiceEnabled>True</IsWebServiceEnabled>
  <IsReportManagerEnabled>True</IsReportManagerEnabled>
  <FileShareStorageLocation>
    <Path>
    </Path>
  </FileShareStorageLocation>
</Service>

4. Save the file to apply the changes.

Turn on/off the scheduled events and delivery

This changes report processing and delivery. Make the change in the rsreportserver.config file, in the following part of the <Service/> section:

<IsSchedulingService>True</IsSchedulingService>
<IsNotificationService>True</IsNotificationService>
<IsEventService>True</IsEventService>

The default value for all three is True. You can change it to False and save the file to apply the changes. This could also be carried out by modifying a facet in SQL Server Management Studio (SSMS), but presently this is not available.

Turn on/off the Report Manager

Report Manager can be turned off or on by making changes to the configuration file.
Make a change to the following element in the <Service/> section:

<IsReportManagerEnabled>True</IsReportManagerEnabled>

Again, this change can be made using the Reporting Services server's facet. To change it this way, make sure you launch SQL Server Management Studio as Administrator. The use of SSMS via facets is described in the following section.

Hands-on exercise 3.2 – turn the Reporting Service on/off in SSMS

The following are the steps to turn the Reporting Service on or off in SSMS:

1. Connect to the Reporting Services instance KAILUA in SQL Server Management Studio as the Administrator.
2. Choose HODENTEKWIN7\KAILUA under Reporting Services and click on OK.
3. Right-click on HODENTEKWIN7\KAILUA (Report Server 11.0.22180 – HodentekWin7\mysorian).
4. Click on Facets to open the following properties page.
5. Click on the handle, set it to True or False, and click on OK. The default is True.

It should be possible to turn Windows Integrated security on or off by using SQL Server Management Studio; however, the Reporting Services server properties are disabled.
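If you prefer to script the change made in Hands-on exercise 3.1 instead of editing the file in Notepad, the same setting can be flipped with a few lines of PowerShell. This is only a sketch, assuming the KAILUA instance path and the ReportServer$KAILUA service name used in this article; run it from an elevated prompt and keep the backup it makes in case you need to roll back:

# Toggle IsWebServiceEnabled in rsreportserver.config for the KAILUA instance.
# The path and service name are the ones used in this article; adjust for your instance.
$configPath = 'C:\Program Files\Microsoft SQL Server\MSRS11.KAILUA\Reporting Services\ReportServer\rsreportserver.config'

# Keep a backup before touching the file.
Copy-Item $configPath "$configPath.bak"

# Load the XML, locate the element, flip the value, and save the file.
[xml]$config = Get-Content $configPath
$node = $config.SelectSingleNode('//IsWebServiceEnabled')
$node.InnerText = 'False'   # or 'True' to re-enable the Web service
$config.Save($configPath)

# Restart the Report Server service so the change takes effect.
Restart-Service 'ReportServer$KAILUA'

The same approach works for IsReportManagerEnabled, IsSchedulingService, and the other elements of the <Service> section shown earlier.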