Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials - Data

1210 Articles
article-image-application-logging
Packt
11 Aug 2016
8 min read
Save for later

Application Logging

Packt
11 Aug 2016
8 min read
In this article by Travis Marlette, author of Splunk Best Practices, will cover the following topics: (For more resources related to this topic, see here.) Log messengers Logging formats Within the working world of technology, there are hundreds of thousands of different applications, all (usually) logging in different formats. As Splunk experts, our job is make all those logs speak human, which is often the impossible task. With third-party applications that provide support, sometimes log formatting is out of our control. Take for instance, Cisco or Juniper, or any other leading application manufacturer. We won't be discussing these kinds of logs in this article, but instead the logs that we do have some control over. The logs I am referencing belong to proprietary in-house (also known as "home grown") applications that are often part of the middle-ware, and usually they control some of the most mission-critical services an organization can provide. Proprietary applications can be written in any language, however logging is usually left up to the developers for troubleshooting and up until now the process of manually scraping log files to troubleshoot quality assurance issues and system outages has been very specific. I mean that usually, the developer(s) are the only people that truly understand what those log messages mean. That being said, oftentimes developers write their logs in a way that they can understand them, because ultimately it will be them doing the troubleshooting / code fixing when something breaks severely. As an IT community, we haven't really started taking a look at the way we log things, but instead we have tried to limit the confusion to developers, and then have them help other SME's that provide operational support to understand what is actually happening. This method is successful, however it is slow, and the true value of any SME is reducing any system’s MTTR, and increasing uptime. With any system, the more transactions processed means the larger the scale of a system, which means that, after about 20 machines, troubleshooting begins to get more complex and time consuming with a manual process. This is where something like Splunk can be extremely valuable, however Splunk is only as good as the information that is coming into it. I will say this phrase for the people who haven't heard it yet; "garbage in… garbage out". There are some ways to turn proprietary logging into a powerful tool, and I have personally seen the value of these kinds of logs, after formatting them for Splunk, they turn into a huge asset in an organization’s software life cycle. I'm not here to tell you this is easy, but I am here to give you some good practices about how to format proprietary logs. To do that I'll start by helping you appreciate a very silent and critical piece of the application stack. To developers, a logging mechanism is a very important part of the stack, and the log itself is mission critical. What we haven't spent much time thinking about before log analyzers, is how to make log events/messages/exceptions more machine friendly so that we can socialize the information in a system like Splunk, and start to bridge the knowledge gap between development and operations. The nicer we format the logs, the faster Splunk can reveal the information about our systems, saving everyone time and from headaches. Loggers Here I'm giving some very high level information on loggers. 
My intention is not to recommend logging tools, but simply to raise awareness of their existence for those that are not in development, and allow for independent research into what they do. With the right developer, and the right Splunker, the logger turns into something immensely valuable to an organization. There is an array of different loggers in the IT universe, and I'm only going to touch on a couple of them here. Keep in mind that I'm only referencing these due to the ease of development I've seen from personal experience, and experiences do vary. I'm only going to touch on three loggers and then move on to formatting, as there are tons of logging mechanisms and the preference truly depends on the developer. Anatomy of a log I'm going to be taking some very broad strokes with the following explanations in order to familiarize you, the Splunker, with the logger. If you would like to learn more information, please either seek out a developer to help you understand the logic better or acquire some education how to develop and log in independent study. There are some pretty basic components to logging that we need to understand to learn which type of data we are looking at. I'll start with the four most common ones: Log events: This is the entirety of the message we see within a log, often starting with a timestamp. The event itself contains all other aspects of application behavior such as fields, exceptions, messages, and so on… think of this as the "container" if you will, for information. Messages: These are often made by the developer of the application and provide some human insight into what's actually happening within an application. The most common messages we see are things like unauthorized login attempt <user> or Connection Timed out to <ip address>. Message Fields: These are the pieces of information that give us the who, where, and when types of information for the application’s actions. They are handed to the logger by the application itself as it either attempts or completes an activity. For instance, in the log event below, the highlighted pieces are what would be fields, and often those that people look for when troubleshooting: "2/19/2011 6:17:46 AM Using 'xplog70.dll' version '2009.100.1600' to execute extended store procedure 'xp_common_1' operation failed to connect to 'DB_XCUTE_STOR'" Exceptions: These are the uncommon, but very important pieces of the log. They are usually only written when something went wrong, and offer developer insight into the root cause at the application layer. They are usually only printed when an error occurs, and used for debugging. These exceptions can print a huge amount of information into the log depending on the developer and the framework. The format itself is not easy and in some cases not even possible for a developer to manage. Log4* This is an open source logger that is often used in middleware applications. Pantheios This is a logger popularly used for Linux, and popular for its performance and multi-threaded handling of logging. Commonly, Pantheios is used for C/C++ applications, but it works with a multitude of frameworks. Logging – logging facility for Python This is a logger specifically for Python, and since Python is becoming more and more popular, this is a very common package used to log Python scripts and applications. Each one of these loggers has their own way of logging, and the value is determined by the application developer. 
If there is no standardized logging, then one can imagine the confusion this can bring to troubleshooting. Example of a structured log This is an example of a Java exception in a structured log format: When Java prints this exception, it will come in this format and a developer doesn't control what that format is. They can control some aspects about what is included within an exception, though the arrangement of the characters and how it's written is done by the Java framework itself. I mention this last part in order to help operational people understand where the control of a developer sometimes ends. My own personal experience has taught me that attempting to change a format that is handled within the framework itself is an attempt at futility. Pick your battles right? As a Splunker, you can save yourself a headache on this kind of thing. Summary While I say that, I will add an addendum by saying that Splunk, mixed with a Splunk expert and the right development resources, can also make the data I just mentioned extremely valuable. It will likely not happen as fast as they make it out to be at a presentation, and it will take more resources than you may have thought, however at the end of your Splunk journey, you will be happy. This article was to help you understand the importance of logs formatting, and how logs are written. We often don't think about our logs proactively, and I encourage you to do so. Resources for Article: Further resources on this subject: Logging and Monitoring [Article] Logging and Reports [Article] Using Events, Interceptors, and Logging Services [Article]
Read more
  • 0
  • 0
  • 1841

article-image-model-design-accelerator
Packt
30 Jul 2013
6 min read
Save for later

Model Design Accelerator

Packt
30 Jul 2013
6 min read
(For more resources related to this topic, see here.) By the end of this article you will be able to use Model Design Accelerator to design a new Framework model. To introduce Model Design Accelerator, we will use a fairly simple schema based on a rental star schema, derived from the MySQL Sakila sample database.This database can be downloaded from http://dev.mysql.com/doc/sakila/en/. It is just one example of a number of possible dimensional models based on this sample database. The Model Design Accelerator user interface The user interface of Model Design Accelerator is very simple, consisting of only two panels: Explorer Tree: This contains details of the database tables and views from the data source. Model Accelerator: This contains a single fact table surrounded by four dimension tables, and is the main work area for the model being designed. By clicking on the labels (Explorer Tree and Model Accelerator) at the top of the window, it is possible to hide either of these panels, but having both these panels always visible is beneficial. Starting Model Design Accelerator Model Design Accelerator is started from the Framework Manager initial screen: Select Create a new project using Model Design Accelerator…. This will start the new project creation wizard, which is exactly the same as if you were starting any new project. Select the data source to import the database tables into the new model. After importing the database tables, the project creation wizard will display the Model Design Accelerator Introduction screen: After reading the instructions, click on the Close button to continue. This will then show the Model Design Accelerator workspace. Adding tables to your workspace The first step in creating your model with Model Design Accelerator is to add the dimension and fact tables to your model: From the Explorer panel,drag-and-drop dim_date ,dim_film ,dim_ customer, and dim_store to the four New Query Subject boxes in the Model Accelerator panel. After adding your queries, right-click on the boxes to rename the queries to Rental Date Dim,Film Dim ,Customer Dim, and Store Dim respectively. If not all query columns are required, it is also possible to expand the dimension tables and drag-and-drop individual columns to the query boxes. In the Explorer Tree panel,expand the fact_rental table by clicking on the (+) sign besides the name, and from the expanded tree drag-and-drop count_returns,count_rentals, and rental_duration columns to the Fact Query Subject box. Rename the Fact Query Subject to Rental Fact. Additional dimension queries can be added to the model by clicking on the top-left icon in the Model Accelerator panel, and then by dragging and dropping the required query onto the workplace window. Since we have a start_date and an end_date for the rental period, add a second copy of the date_dim table, by clicking on the icon and dragging the table from the Explorer view into the workspace. Also rename this query as Return Date Dim: Adding joins to your workspace After we have added our database table columns to the workspace, we now need to add the relationship joins between the dimension and fact tables. To do this: Double-click on the Rental Date Dim table, and this will expand the date_ dim and the fact_rental tables in the workspace window: Click on the Enter relationship creation mode link. 
Select the date_key column in the dim_date table, and the rental_date_ key column in the fact_rental table as follows: Click on the Create relationship icon: Click on OK to create this join. Close the Query Subject Diagram by clicking on the (X) symbol in the top-right corner. Repeat this procedure for each of the other four tables. The final model will look like the following screenshot: Generating Framework Manager model Once we have completed our model in Model Design Accelerator, we need to create a Framework Manager model: Click on the Generate Model button. Click on Yes to generate your model.The Framework Manager model will be generated and will open as follows:   When you generate your model, all of the Model Advisor tests are automatically applied to the resulting model. You should review any issues that have been identified in the Verify Results tab, and decide whether you need to fix them. When you generate the model only those query items required will be used to create the Framework Manager model. The Physical View tab will contain only those tables required by your star schema model. The Business View tab will contain model query subjects containing only the columns used in your star schema model. The Presentation View tab will only contain shortcuts to the query subjects that exist in the Business View tab. After generating your model, you can use Framework Manager to improve the model by adding calculations, filters, dimensions, measures, and so on. Each time you generate a Framework Manager model from your Model Design Accelerator model, a new namespace is created in the current Framework Manager model and any improvements you want to use will also need to be applied to these new namespaces. From Framework Manager you can return to Model Design Accelerator at any time to continue making changes to your star schema. To return to the Model Design Accelerator from within Framework Manager: From the Tools menu, select Run Model Design Accelerator. You may choose to continue with the same model or create a new model. To make your star schema model available to the report authors, you must first create a package and then publish the package to your Cognos Reporting Server. Summary In this article, we have looked at Model Design Accelerator. This is a tool that allows a novice modeler, or even an experienced modeler, to create a new Framework Manger model quickly and easily. Resources for Article: Further resources on this subject: Integrating IBM Cognos TM1 with IBM Cognos 8 BI [Article] How to Set Up IBM Lotus Domino Server [Article] IBM Cognos 10 BI dashboarding components [Article]
Read more
  • 0
  • 0
  • 1830

article-image-drag
Packt
21 Oct 2013
4 min read
Save for later

Drag

Packt
21 Oct 2013
4 min read
(For more resources related to this topic, see here.) I can't think of a better dragging demonstration than animating with the parallax illusion. The illusion works by having several keyframes rendered in vertical slices and dragging a screen over them to create an animated thingamabob. Drawing the lines by hand would be tedious, so we're using an image Marco Kuiper created in Photoshop. I asked on Twitter and he said we can use the image, if we check out his other work at marcofolio.net. You can also get the image in the examples repository at https://raw.github.com/Swizec/d3.js-book-examples/master/ch4/parallax_base.png. We need somewhere to put the parallax: var width = 1200, height = 450, svg = d3.select('#graph') .append('svg') .attr({width: width, height: height}); We'll use SVG's native support for embedding bitmaps to insert parallax_base.png into the page: svg.append('image') .attr({'xlink:href': 'parallax_base.png', width: width, height: height}); The image element's magic stems from its xlink:href attribute. It understands links and even lets us embed images to create self-contained SVGs. To use that, you would prepend an image MIME type to a base64 encoded representation of the image. For instance, the following line is the smallest embedded version of a spacer GIF. Don't worry if you don't know what a spacer GIF is; they were useful up to about 2005. data:image/gif;base64,R0lGODlhAQABAID/ AMDAwAAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw== Anyway, now that we have the animation base, we need a screen that can be dragged. It's going to be a bunch of carefully calibrated vertical lines: var screen_width = 900, lines = d3.range(screen_width/6), x = d3.scale.ordinal().domain(lines).rangeBands([0, screen_width]); We'll base the screen off an array of numbers (lines). Since line thickness and density are very important, we divide screen_width by 6—five pixels for a line and one for spacing. Make sure the value of screen_width is a multiple of 6; otherwise anti-aliasing ruins the effect. The x scale will help us place the lines evenly: svg.append('g') .selectAll('line') .data(lines) .enter() .append('line') .style('shape-rendering', 'crispEdges') .attr({stroke: 'black', 'stroke-width': x.rangeBand()-1, x1: function (d) { return x(d); }, y1: 0, x2: function (d) { return x(d); }, y2: height}); There's nothing particularly interesting here, just stuff you already know. The code goes through the array and draws a new vertical line for each entry. We made absolutely certain there won't be any anti-aliasing by setting shape-rendering to crispEdges. Time to define and activate a dragging behavior for our group of lines: var drag = d3.behavior.drag().origin(Object).on('drag', function () {d3.select(this).attr('transform', 'translate('+d3.event.x+', 0)').datum({x: d3.event.x, y: 0});}); We created the behavior with d3.behavior.drag(), defined a .origin() accessor, and specified what happens on drag. The behavior automatically translates touch and mouse events to the higher-level drag event. How cool is that! We need to give the behavior an origin so it knows how to calculate positions relatively; otherwise, the current position is always set to the mouse cursor and objects jump around. It's terrible. Object is the identity function for elements and assumes a datum with x and y coordinates. The heavy lifting happens inside the drag listener. We get the screen's new position from d3.event.x, move the screen there, and update the attached .datum() method. 
All that's left to do is to call drag and make sure to set the attached datum to the current position: svg.select('g') .datum({x: 0, y: 0}) .call(drag); The item looks solid now! Try dragging the screen at different speeds. The parallax effect doesn't work very well on a retina display because the base image gets resized and our screen loses calibration. Summary In this article, we looked into the drag behavioud of d3. All this can be done by with just click events, but I heartily recommend d3's behaviors module. It makes complex behaviors is that they automatically create relevant event listeners and let you work at a higher level of abstraction. Resources for Article: Further resources on this subject: Visualizing Productions Ahead of Time with Celtx [Article] Custom Data Readers in Ext JS [Article] The Login Page using Ext JS [Article]
Read more
  • 0
  • 0
  • 1830

article-image-installing-coherence-35-and-accessing-data-grid-part-2
Packt
31 Mar 2010
10 min read
Save for later

Installing Coherence 3.5 and Accessing the Data Grid: Part 2

Packt
31 Mar 2010
10 min read
Using the Coherence API One of the great things about Coherence is that it has a very simple and intuitive API that hides most of the complexity that is happening behind the scenes to distribute your objects. If you know how to use a standard Map interface in Java, you already know how to perform basic tasks with Coherence. In this section, we will first cover the basics by looking at some of the foundational interfaces and classes in Coherence. We will then proceed to do something more interesting by implementing a simple tool that allows us to load data into Coherence from CSV files, which will become very useful during testing. The basics: NamedCache and CacheFactory As I have briefly mentioned earlier, Coherence revolves around the concept of named caches. Each named cache can be configured differently, and it will typically be used to store objects of a particular type. For example, if you need to store employees, trade orders, portfolio positions, or shopping carts in the grid, each of those types will likely map to a separate named cache. The first thing you need to do in your code when working with Coherence is to obtain a reference to a named cache you want to work with. In order to do this, you need to use the CacheFactory class, which exposes the getCache method as one of its public members. For example, if you wanted to get a reference to the countries cache that we created and used in the console example, you would do the following: NamedCache countries = CacheFactory.getCache("countries"); Once you have a reference to a named cache, you can use it to put data into that cache or to retrieve data from it. Doing so is as simple as doing gets and puts on a standard Java Map: countries.put("SRB", "Serbia");String countryName = (String) countries.get("SRB"); As a matter of fact, NamedCache is an interface that extends Java's Map interface, so you will be immediately familiar not only with get and put methods, but also with other methods from the Map interface, such as clear, remove, putAll, size, and so on. The nicest thing about the Coherence API is that it works in exactly the same way, regardless of the cache topology you use. For now let's just say that you can configure Coherence to replicate or partition your data across the grid. The difference between the two is that in the former case all of your data exists on each node in the grid, while in the latter only 1/n of the data exists on each individual node, where n is the number of nodes in the grid. Regardless of how your data is stored physically within the grid, the NamedCache interface provides a standard API that allows you to access it. This makes it very simple to change cache topology during development if you realize that a different topology would be a better fit, without having to modify a single line in your code. In addition to the Map interface, NamedCache extends a number of lower-level Coherence interfaces. The following table provides a quick overview of these interfaces and the functionality they provide: The "Hello World" example In this section we will implement a complete example that achieves programmatically what we have done earlier using Coherence console—we'll put a few countries in the cache, list cache contents, remove items, and so on. To make things more interesting, instead of using country names as cache values, we will use proper objects this time. 
That means that we need a class to represent a country, so let's start there: public class Country implements Serializable, Comparable {private String code;private String name;private String capital;private String currencySymbol;private String currencyName;public Country() {}public Country(String code, String name, String capital,String currencySymbol, String currencyName) {this.code = code;this.name = name;this.capital = capital;this.currencySymbol = currencySymbol;this.currencyName = currencyName;}public String getCode() {return code;}public void setCode(String code) {this.code = code;}public String getName() {return name;}public void setName(String name) {this.name = name;}public String getCapital() {return capital;}public void setCapital(String capital) {this.capital = capital;}public String getCurrencySymbol() {return currencySymbol;}public void setCurrencySymbol(String currencySymbol) {this.currencySymbol = currencySymbol;}public String getCurrencyName() {return currencyName;}public void setCurrencyName(String currencyName) {this.currencyName = currencyName;}public String toString() {return "Country(" +"Code = " + code + ", " +"Name = " + name + ", " +"Capital = " + capital + ", " +"CurrencySymbol = " + currencySymbol + ", " +"CurrencyName = " + currencyName + ")";}public int compareTo(Object o) {Country other = (Country) o;return name.compareTo(other.name);}} There are several things to note about the Country class, which also apply to other classes that you want to store in Coherence: Because the objects needs to be moved across the network, classes that are stored within the data grid need to be serializable. In this case we have opted for the simplest solution and made the class implement the java.io.Serializable interface. This is not optimal, both from performance and memory utilization perspective, and Coherence provides several more suitable approaches to serialization. We have implemented the toString method that prints out an object's state in a friendly format. While this is not a Coherence requirement, implementing toString properly for both keys and values that you put into the cache will help a lot when debugging, so you should get into a habit of implementing it for your own classes. Finally, we have also implemented the Comparable interface. This is also not a requirement, but it will come in handy in a moment to allow us to print out a list of countries sorted by name. 
Now that we have the class that represents the values we want to cache, it is time to write an example that uses it: import com.tangosol.net.NamedCache;import com.tangosol.net.CacheFactory;import ch02.Country;import java.util.Set;import java.util.Map;public class CoherenceHelloWorld {public static void main(String[] args) {NamedCache countries = CacheFactory.getCache("countries");// first, we need to put some countries into the cachecountries.put("USA", new Country("USA", "United States","Washington", "USD", "Dollar"));countries.put("GBR", new Country("GBR", "United Kingdom","London", "GBP", "Pound"));countries.put("RUS", new Country("RUS", "Russia", "Moscow","RUB", "Ruble"));countries.put("CHN", new Country("CHN", "China", "Beijing","CNY", "Yuan"));countries.put("JPN", new Country("JPN", "Japan", "Tokyo","JPY", "Yen"));countries.put("DEU", new Country("DEU", "Germany", "Berlin","EUR", "Euro"));countries.put("FRA", new Country("FRA", "France", "Paris","EUR", "Euro"));countries.put("ITA", new Country("ITA", "Italy", "Rome","EUR", "Euro"));countries.put("SRB", new Country("SRB", "Serbia", "Belgrade","RSD", "Dinar"));assert countries.containsKey("JPN"): "Japan is not in the cache";// get and print a single countrySystem.out.println("get(SRB) = " + countries.get("SRB"));// remove Italy from the cacheint size = countries.size();System.out.println("remove(ITA) = " + countries.remove("ITA"));assert countries.size() == size - 1: "Italy was not removed";// list all cache entriesSet<Map.Entry> entries = countries.entrySet(null, null);for (Map.Entry entry : entries) {System.out.println(entry.getKey() + " = " + entry.getValue());}}} Let's go through this code section by section. At the very top, you can see import statements for NamedCache and CacheFactory, which are the only Coherence classes we need for this simple example. We have also imported our Country class, as well as Java's standard Map and Set interfaces. The first thing we need to do within the main method is to obtain a reference to the countries cache using the CacheFactory.getCache method. Once we have the cache reference, we can add some countries to it using the same old Map.put method you are familiar with. We then proceed to get a single object from the cache using the Map.get method , and to remove one using Map.remove. Notice that the NamedCache implementation fully complies with the Map.remove contract and returns the removed object. Finally, we list all the countries by iterating over the set returned by the entrySet method. Notice that Coherence cache entries implement the standard Map.Entry interface. Overall, if it wasn't for a few minor differences, it would be impossible to tell whether the preceding code uses Coherence or any of the standard Map implementations. The first telltale sign is the call to the CacheFactory.getCache at the very beginning, and the second one is the call to entrySet method with two null arguments. We have already discussed the former, but where did the latter come from? The answer is that Coherence QueryMap interface extends Java Map by adding methods that allow you to filter and sort the entry set. The first argument in our example is an instance of Coherence Filter interface. In this case, we want all the entries, so we simply pass null as a filter. The second argument, however, is more interesting in this particular example. It represents the java.util.Comparator that should be used to sort the results. 
If the values stored in the cache implement the Comparable interface, you can pass null instead of the actual Comparator instance as this argument, in which case the results will be sorted using their natural ordering (as defined by Comparable.compareTo implementation). That means that when you run the previous example, you should see the following output: get(SRB) = Country(Code = SRB, Name = Serbia, Capital = Belgrade,CurrencySymbol = RSD, CurrencyName = Dinar)remove(ITA) = Country(Code = ITA, Name = Italy, Capital = Rome,CurrencySymbol = EUR, CurrencyName = Euro)CHN = Country(Code = CHN, Name = China, Capital = Beijing, CurrencySymbol= CNY, CurrencyName = Yuan)FRA = Country(Code = FRA, Name = France, Capital = Paris, CurrencySymbol= EUR, CurrencyName = Euro)DEU = Country(Code = DEU, Name = Germany, Capital = Berlin,CurrencySymbol = EUR, CurrencyName = Euro)JPN = Country(Code = JPN, Name = Japan, Capital = Tokyo, CurrencySymbol =JPY, CurrencyName = Yen)RUS = Country(Code = RUS, Name = Russia, Capital = Moscow, CurrencySymbol= RUB, CurrencyName = Ruble)SRB = Country(Code = SRB, Name = Serbia, Capital = Belgrade,CurrencySymbol = RSD, CurrencyName = Dinar)GBR = Country(Code = GBR, Name = United Kingdom, Capital = London,CurrencySymbol = GBP, CurrencyName = Pound)USA = Country(Code = USA, Name = United States, Capital = Washington,CurrencySymbol = USD, CurrencyName = Dollar) As you can see, the countries in the list are sorted by name, as defined by our Country.compareTo implementation. Feel free to experiment by passing a custom Comparator as the second argument to the entrySet method, or by removing both arguments, and see how that affects result ordering. If you are feeling really adventurous and can't wait to learn about Coherence queries, take a sneak peek by changing the line that returns the entry set to: Set<Map.Entry> entries = countries.entrySet(new LikeFilter("getName", "United%"), null); As a final note, you might have also noticed that I used Java assertions in the previous example to check that the reality matches my expectations (well, more to demonstrate a few other methods in the API, but that's beyond the point). Make sure that you specify the -ea JVM argument when running the example if you want the assertions to be enabled, or use the run-helloworld target in the included Ant build file, which configures everything properly for you. That concludes the implementation of our first Coherence application. One thing you might notice is that the CoherenceHelloWorld application will run just fine even if you don't have any Coherence nodes started, and you might be wondering how that is possible. The truth is that there is one Coherence node—the CoherenceHelloWorld application. As soon as the CacheFactory.getCache method gets invoked, Coherence services will start within the application's JVM and it will either join the existing cluster or create a new one, if there are no other nodes on the network. If you don't believe me, look at the log messages printed by the application and you will see that this is indeed the case. Now that you know the basics, let's move on and build something slightly more exciting, and much more useful.
Read more
  • 0
  • 0
  • 1809

article-image-iot-and-decision-science
Packt
13 Oct 2016
10 min read
Save for later

IoT and Decision Science

Packt
13 Oct 2016
10 min read
In this article by Jojo Moolayil, author of the book Smarter Decisions - The Intersection of Internet of Things and Decision Science, you will learn that the Internet of Things (IoT) and Decision Science have been among the hottest topics in the industry for a while now. You would have heard about IoT and wanted to learn more about it, but unfortunately you would have come across multiple names and definitions over the Internet with hazy differences between them. Also, Decision Science has grown from a nascent domain to become one of the fastest and most widespread horizontal in the industry in the recent years. With the ever-increasing volume, variety, and veracity of data, decision science has become more and more valuable for the industry. Using data to uncover latent patterns and insights to solve business problems has made it easier for businesses to take actions with better impact and accuracy. (For more resources related to this topic, see here.) Data is the new oil for the industry, and with the boom of IoT, we are in a world where more and more devices are getting connected to the Internet with sensors capturing more and more vital granular dimensions details that had never been touched earlier. The IoT is a game changer with a plethora of devices connected to each other; the industry is eagerly attempting to untap the huge potential that it can deliver. The true value and impact of IoT is delivered with the help of Decision Science. IoT has inherently generated an ocean of data where you can swim to gather insights and take smarter decisions with the intersection of Decision Science and IoT. In this book, you will learn about IoT and Decision Science in detail by solving real-life IoT business problems using a structured approach. In this article, we will begin by understanding the fundamental basics of IoT and Decision Science problem solving. You will learn the following concepts: Understanding IoT and demystifying Machine to Machine (M2M), IoT, Internet of Everything (IoE), and Industrial IoT (IIoT) Digging deeper into the logical stack of IoT Studying the problem life cycle Exploring the problem landscape The art of problem solving The problem solving framework It is highly recommended that you explore this article in depth. It focuses on the basics and concepts required to build problems and use cases. Understanding the IoT To get started with the IoT, lets first try to understand it using the easiest constructs. Internet and Things; we have two simple words here that help us understand the entire concept. So what is the Internet? It is basically a network of computing devices. Similarly, what is a Thing? It could be any real-life entity featuring Internet connectivity. So now, what do we decipher from IoT? It is a network of connected Things that can transmit and receive data from other things once connected to the network. This is how we describe the Internet of Things in a nutshell. Now, let's take a glance at the definition. IoT can be defined as the ever-growing network of Things (entities) that feature Internet connectivity and the communication that occurs between them and other Internet-enabled devices and systems. The Things in IoT are enabled with sensors that capture vital information from the device during its operations, and the device features Internet connectivity that helps it transfer and communicate to other devices and the network. 
Today, when we discuss about IoT, there are so many other similar terms that come into the picture, such as Industrial Internet, M2M, IoE, and a few more, and we find it difficult to understand the differences between them. Before we begin delineating the differences between these hazy terms and understand how IoT evolved in the industry, lets first take a simple real-life scenario to understand how exactly IoT looks like. IoT in a real-life scenario Let's take a simple example to understand how IoT works. Consider a scenario where you are a father in a family with a working mother and 10-year old son studying in school. You and your wife work in different offices. Your house is equipped with quite a few smart devices, say, a smart microwave, smart refrigerator, and smart TV. You are currently in office and you get notified on your smartphone that your son, Josh, has reached home from school. (He used his personal smart key to open the door.) You then use your smartphone to turn on the microwave at home to heat the sandwiches kept in it. Your son gets notified on the smart home controller that you have hot sandwiches ready for him. He quickly finishes them and starts preparing for a math test at school and you resume your work. After a while, you get notified again that your wife has also reached home (She also uses a similar smart key.) and you suddenly realize that you need to reach home to help your son with his math test. You again use your smartphone and change the air conditioner settings for three people and set the refrigerator to defrost using the app. In another 15 minutes, you are home and the air conditioning temperature is well set for three people. You then grab a can of juice from the refrigerator and discuss some math problems with your son on the couch. Intuitive, isnt it? How did it his happen and how did you access and control everything right from your phone? Well, this is how IoT works! Devices can talk to each other and also take actions based on the signals received: The IoT scenario Lets take a closer look at the same scenario. You are sitting in office and you could access the air conditioner, microwave, refrigerator, and home controller through your smartphone. Yes, the devices feature Internet connectivity and once connected to the network, they can send and receive data from other devices and take actions based on signals. A simple protocol helps these devices understand and send data and signals to a plethora of heterogeneous devices connected to the network. We will get into the details of the protocol and how these devices talk to each other soon. However, before that, we will get into some details of how this technology started and why we have so many different names today for IoT. Demystifying M2M, IoT, IIoT, and IoE So now that we have a general understanding about what is IoT, lets try to understand how it all started. A few questions that we will try to understand are: Is IoT very new in the market?, When did this start?, How did this start?, Whats the difference between M2M, IoT, IoE, and all those different names?, and so on. If we try to understand the fundamentals of IoT, that is, machines or devices connected to each other in a network, which isn't something really new and radically challenging, then what is this buzz all about? The buzz about machines talking to each other started long before most of us thought of it, and back then it was called Machine to Machine Data. 
In early 1950, a lot of machinery deployed for aerospace and military operations required automated communication and remote access for service and maintenance. Telemetry was where it all started. It is a process in which a highly automated communication was established from which data is collected by making measurements at remote or inaccessible geographical areas and then sent to a receiver through a cellular or wired network where it was monitored for further actions. To understand this better, lets take an example of a manned space shuttle sent for space exploration. A huge number of sensors are installed in such a space shuttle to monitor the physical condition of astronauts, the environment, and also the condition of the space shuttle. The data collected through these sensors is then sent back to the substation located on Earth, where a team would use this data to analyze and take further actions. During the same time, industrial revolution peaked and a huge number of machines were deployed in various industries. Some of these industries where failures could be catastrophic also saw the rise in machine-to-machine communication and remote monitoring: Telemetry Thus, machine-to-machine data a.k.a. M2M was born and mainly through telemetry. Unfortunately, it didnt scale to the extent that it was supposed to and this was largely because of the time it was developed in. Back then, cellular connectivity was not widespread and affordable, and installing sensors and developing the infrastructure to gather data from them was a very expensive deal. Therefore, only a small chunk of business and military use cases leveraged this. As time passed, a lot of changes happened. The Internet was born and flourished exponentially. The number of devices that got connected to the Internet was colossal. Computing power, storage capacities, and communication and technology infrastructure scaled massively. Additionally, the need to connect devices to other devices evolved, and the cost of setting up infrastructure for this became very affordable and agile. Thus came the IoT. The major difference between M2M and IoT initially was that the latter used the Internet (IPV4/6) as the medium whereas the former used cellular or wired connection for communication. However, this was mainly because of the time they evolved in. Today, heavy engineering industries have machinery deployed that communicate over the IPV4/6 network and is called Industrial IoT or sometimes M2M. The difference between the two is bare minimum and there are enough cases where both are used interchangeably. Therefore, even though M2M was actually the ancestor of IoT, today both are pretty much the same. M2M or IIoT are nowadays aggressively used to market IoT disruptions in the industrial sector. IoE or Internet of Everything was a term that surfaced on the media and Internet very recently. The term was coined by Cisco with a very intuitive definition. It emphasizes Humans as one dimension in the ecosystem. It is a more organized way of defining IoT. The IoE has logically broken down the IoT ecosystem into smaller components and simplified the ecosystem in an innovative way that was very much essential. IoE divides its ecosystem into four logical units as follows: People Processes Data Devices Built on the foundation of IoT, IoE is defined as The networked connection of People, Data, Processes, and Things. 
Overall, all these different terms in the IoT fraternity have more similarities than differences and, at the core, they are the same, that is, devices connecting to each other over a network. The names are then stylized to give a more intrinsic connotation of the business they refer to, such as Industrial IoT and Machine to Machine for (B2B) heavy engineering, manufacturing and energy verticals, Consumer IoT for the B2C industries, and so on. Summary In this article we learnt how to start with the IoT. It is basically a network of computing devices. Similarly, what is a Thing? It could be any real-life entity featuring Internet connectivity. So now, what do we decipher from IoT? It is a network of connected Things that can transmit and receive data from other things once connected to the network. This is how we describe the Internet of Things in a nutshell. Resources for Article: Further resources on this subject: Machine Learning Tasks [article] Welcome to Machine Learning Using the .NET Framework [article] Why Big Data in the Financial Sector? [article]
Read more
  • 0
  • 0
  • 1789

article-image-review-sql-server-features-developers
Packt
13 Feb 2017
43 min read
Save for later

Review of SQL Server Features for Developers

Packt
13 Feb 2017
43 min read
In this article by Dejan Sarka, Miloš Radivojević, and William Durkin, the authors of the book, SQL Server 2016 Developer's Guide explains that before dwelling into the new features in SQL Server 2016, let's make a quick recapitulation of the SQL Server features for developers available already in the previous versions of SQL Server. Recapitulating the most important features with help you remember what you already have in your development toolbox and also understanding the need and the benefits of the new or improved features in SQL Server 2016. The recapitulation starts with the mighty T-SQL SELECT statement. Besides the basic clauses, advanced techniques like window functions, common table expressions, and APPLY operator are explained. Then you will pass quickly through creating and altering database objects, including tables and programmable objects, like triggers, views, user-defined functions, and stored procedures. You will also review the data modification language statements. Of course, errors might appear, so you have to know how to handle them. In addition, data integrity rules might require that two or more statements are executed as an atomic, indivisible block. You can achieve this with help of transactions. The last section of this article deals with parts of SQL Server Database Engine marketed with a common name "Beyond Relational". This is nothing beyond the Relational Model, the "beyond relational" is really just a marketing term. Nevertheless, you will review the following: How SQL Server supports spatial data How you can enhance the T-SQL language with Common Language Runtime (CLR) elements written is some .NET language like Visual C# How SQL Server supports XML data The code in this article uses the WideWorldImportersDW demo database. In order to test the code, this database must be present in your SQL Server instance you are using for testing, and you must also have SQL Server Management Studio (SSMS) as the client tool. This article will cover the following points: Core Transact-SQL SELECT statement elements Advanced SELECT techniques Error handling Using transactions Spatial data XML support in SQL Server (For more resources related to this topic, see here.) The Mighty Transact-SQL SELECT You probably already know that the most important SQL statement is the mighty SELECT statement you use to retrieve data from your databases. Every database developer knows the basic clauses and their usage: SELECT to define the columns returned, or a projection of all table columns FROM to list the tables used in the query and how they are associated, or joined WHERE to filter the data to return only the rows that satisfy the condition in the predicate GROUP BY to define the groups over which the data is aggregated HAVING to filter the data after the grouping with conditions that refer to aggregations ORDER BY to sort the rows returned to the client application Besides these basic clauses, SELECT offers a variety of advanced possibilities as well. These advanced techniques are unfortunately less exploited by developers, although they are really powerful and efficient. Therefore, I urge you to review them and potentially use them in your applications. 
The advanced query techniques presented here include: Queries inside queries, or shortly subqueries Window functions TOP and OFFSET...FETCH expressions APPLY operator Common tables expressions, or CTEs Core Transact-SQL SELECT Statement Elements Let us start with the most simple concept of SQL which every Tom, Dick, and Harry is aware of! The simplest query to retrieve the data you can write includes the SELECT and the FROM clauses. In the select clause, you can use the star character, literally SELECT *, to denote that you need all columns from a table in the result set. The following code switches to the WideWorldImportersDW database context and selects all data from the Dimension.Customer table. USE WideWorldImportersDW; SELECT * FROM Dimension.Customer; The code returns 403 rows, all customers with all columns. Using SELECT * is not recommended in production. Such queries can return an unexpected result when the table structure changes, and is also not suitable for good optimization. Better than using SELECT * is to explicitly list only the columns you need. This means you are returning only a projection on the table. The following example selects only four columns from the table. SELECT [Customer Key], [WWI Customer ID], [Customer], [Buying Group] FROM Dimension.Customer; Below is the shortened result, limited to the first three rows only. Customer Key WWI Customer ID Customer Buying Group ------------ --------------- ----------------------------- ------------- 0 0 Unknown N/A 1 1 Tailspin Toys (Head Office) Tailspin Toys 2 2 Tailspin Toys (Sylvanite, MT) Tailspin Toys You can see that the column names in the WideWorldImportersDW database include spaces. Names that include spaces are called delimited identifiers. In order to make SQL Server properly understand them as column names, you must enclose delimited identifiers in square parentheses. However, if you prefer to have names without spaces, or is you use computed expressions in the column list, you can add column aliases. The following query returns completely the same data as the previous one, just with columns renamed by aliases to avoid delimited names. SELECT [Customer Key] AS CustomerKey, [WWI Customer ID] AS CustomerId, [Customer], [Buying Group] AS BuyingGroup FROM Dimension.Customer; You might have noticed in the result set returned from the last two queries that there is also a row in the table for an unknown customer. You can filter this row with the WHERE clause. SELECT [Customer Key] AS CustomerKey, [WWI Customer ID] AS CustomerId, [Customer], [Buying Group] AS BuyingGroup FROM Dimension.Customer WHERE [Customer Key] <> 0; In a relational database, you typically have data spread in multiple tables. Each table represents a set of entities of the same kind, like customers in the examples you have seen so far. In order to get result sets meaningful for the business your database supports, you most of the time need to retrieve data from multiple tables in the same query. You need to join two or more tables based on some conditions. The most frequent kind of a join is the inner join. Rows returned are those for which the condition in the join predicate for the two tables joined evaluates to true. Note that in a relational database, you have three-valued logic, because there is always a possibility that a piece of data is unknown. You mark the unknown with the NULL keyword. A predicate can thus evaluate to true, false or NULL. For an inner join, the order of the tables involved in the join is not important. 
In the following example, you can see the Fact.Sale table joined with an inner join to the Dimension.Customer table. SELECT c.[Customer Key] AS CustomerKey, c.[WWI Customer ID] AS CustomerId, c.[Customer], c.[Buying Group] AS BuyingGroup, f.Quantity, f.[Total Excluding Tax] AS Amount, f.Profit FROM Fact.Sale AS f INNER JOIN Dimension.Customer AS c ON f.[Customer Key] = c.[Customer Key]; In the query, you can see that table aliases are used. If a column's name is unique across all tables in the query, then you can use it without table name. If not, you need to use table name in front of the column, to avoid ambiguous column names, in the format table.column. In the previous query, the [Customer Key] column appears in both tables. Therefore, you need to precede this column name with the table name of its origin to avoid ambiguity. You can shorten the two-part column names by using table aliases. You specify table aliases in the FROM clause. Once you specify table aliases, you must always use the aliases; you can't refer to the original table names in that query anymore. Please note that a column name might be unique in the query at the moment when you write the query. However, later somebody could add a column with the same name in another table involved in the query. If the column name is not preceded by an alias or by the table name, you would get an error when executing the query because of the ambiguous column name. In order to make the code more stable and more readable, you should always use table aliases for each column in the query. The previous query returns 228,265 rows. It is always recommendable to know at least approximately the number of rows your query should return. This number is the first control of the correctness of the result set, or said differently, whether the query is written logically correct. The query returns the unknown customer and the orders associated for this customer, of more precisely said associated to this placeholder for an unknown customer. Of course, you can use the WHERE clause to filter the rows in a query that joins multiple tables, like you use it for a single table query. The following query filters the unknown customer rows. SELECT c.[Customer Key] AS CustomerKey, c.[WWI Customer ID] AS CustomerId, c.[Customer], c.[Buying Group] AS BuyingGroup, f.Quantity, f.[Total Excluding Tax] AS Amount, f.Profit FROM Fact.Sale AS f INNER JOIN Dimension.Customer AS c ON f.[Customer Key] = c.[Customer Key] WHERE c.[Customer Key] <> 0; The query returns 143,968 rows. You can see that a lot of sales is associated with the unknown customer. Of course, the Fact.Sale table cannot be joined to the Dimension.Customer table. The following query joins it to the Dimension.Date table. Again, the join performed is an inner join. SELECT d.Date, f.[Total Excluding Tax], f.[Delivery Date Key] FROM Fact.Sale AS f INNER JOIN Dimension.Date AS d ON f.[Delivery Date Key] = d.Date; The query returns 227,981 rows. The query that joined the Fact.Sale table to the Dimension.Customer table returned 228,265 rows. It looks like not all Fact.Sale table rows have a known delivery date, not all rows can match the Dimension.Date table rows. You can use an outer join to check this. With an outer join, you preserve the rows from one or both tables, even if they don't have a match in the other table. The result set returned includes all of the matched rows like you get from an inner join plus the preserved rows. 
Within an outer join, the order of the tables involved in the join might be important. If you use LEFT OUTER JOIN, then the rows from the left table are preserved. If you use RIGHT OUTER JOIN, then the rows from the right table are preserved. Of course, in both cases, the order of the tables involved in the join is important. With a FULL OUTER JOIN, you preserve the rows from both tables, and the order of the tables is not important. The following query preserves the rows from the Fact.Sale table, which is on the left side of the join to the Dimension.Date table. In addition, the query sorts the result set by the invoice date descending using the ORDER BY clause. SELECT d.Date, f.[Total Excluding Tax], f.[Delivery Date Key], f.[Invoice Date Key] FROM Fact.Sale AS f LEFT OUTER JOIN Dimension.Date AS d ON f.[Delivery Date Key] = d.Date ORDER BY f.[Invoice Date Key] DESC; The query returns 228,265 rows. Here is the partial result of the query. Date Total Excluding Tax Delivery Date Key Invoice Date Key ---------- -------------------- ----------------- ---------------- NULL 180.00 NULL 2016-05-31 NULL 120.00 NULL 2016-05-31 NULL 160.00 NULL 2016-05-31 … … … … 2016-05-31 2565.00 2016-05-31 2016-05-30 2016-05-31 88.80 2016-05-31 2016-05-30 2016-05-31 50.00 2016-05-31 2016-05-30 For the last invoice date (2016-05-31), the delivery date is NULL. The NULL in the Date column form the Dimension.Date table is there because the data from this table is unknown for the rows with an unknown delivery date in the Fact.Sale table. Joining more than two tables is not tricky if all of the joins are inner joins. The order of joins is not important. However, you might want to execute an outer join after all of the inner joins. If you don't control the join order with the outer joins, it might happen that a subsequent inner join filters out the preserved rows if an outer join. You can control the join order with parenthesis. The following query joins the Fact.Sale table with an inner join to the Dimension.Customer, Dimension.City, Dimension.[Stock Item], and Dimension.Employee tables, and with an left outer join to the Dimension.Date table. SELECT cu.[Customer Key] AS CustomerKey, cu.Customer, ci.[City Key] AS CityKey, ci.City, ci.[State Province] AS StateProvince, ci.[Sales Territory] AS SalesTeritory, d.Date, d.[Calendar Month Label] AS CalendarMonth, d.[Calendar Year] AS CalendarYear, s.[Stock Item Key] AS StockItemKey, s.[Stock Item] AS Product, s.Color, e.[Employee Key] AS EmployeeKey, e.Employee, f.Quantity, f.[Total Excluding Tax] AS TotalAmount, f.Profit FROM (Fact.Sale AS f INNER JOIN Dimension.Customer AS cu ON f.[Customer Key] = cu.[Customer Key] INNER JOIN Dimension.City AS ci ON f.[City Key] = ci.[City Key] INNER JOIN Dimension.[Stock Item] AS s ON f.[Stock Item Key] = s.[Stock Item Key] INNER JOIN Dimension.Employee AS e ON f.[Salesperson Key] = e.[Employee Key]) LEFT OUTER JOIN Dimension.Date AS d ON f.[Delivery Date Key] = d.Date; The query returns 228,265 rows. Note that with the usage of the parenthesis the order of joins is defined in the following way: Perform all inner joins, with an arbitrary order among them Execute the left outer join after all of the inner joins So far, I have tacitly assumed that the Fact.Sale table has 228,265 rows, and that the previous query needed only one outer join of the Fact.Sale table with the Dimension.Date to return all of the rows. It would be good to check this number in advance. 
You can check the number of rows by aggregating them using the COUNT(*) aggregate function. The following query introduces that function.

SELECT COUNT(*) AS SalesCount
FROM Fact.Sale;

Now you can be sure that the Fact.Sale table has exactly 228,265 rows. Many times you need to aggregate data in groups. This is the point where the GROUP BY clause comes in handy. The following query aggregates the sales data for each customer.

SELECT c.Customer,
 SUM(f.Quantity) AS TotalQuantity,
 SUM(f.[Total Excluding Tax]) AS TotalAmount,
 COUNT(*) AS InvoiceLinesCount
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.[Customer Key] <> 0
GROUP BY c.Customer;

The query returns 402 rows, one for each known customer. In the SELECT clause, you can have only the columns used for grouping, or aggregated columns. You need to get a scalar, a single aggregated value, for each row for each column not included in the GROUP BY list.
Sometimes you need to filter aggregated data. For example, you might need to find only frequent customers, defined as customers with more than 400 rows in the Fact.Sale table. You can filter the result set on the aggregated data by using the HAVING clause, like the following query shows.

SELECT c.Customer,
 SUM(f.Quantity) AS TotalQuantity,
 SUM(f.[Total Excluding Tax]) AS TotalAmount,
 COUNT(*) AS InvoiceLinesCount
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.[Customer Key] <> 0
GROUP BY c.Customer
HAVING COUNT(*) > 400;

The query returns 45 rows for the 45 most frequent known customers. Note that you can't use column aliases from the SELECT clause in any other clause introduced in the previous query. The SELECT clause logically executes after all other clauses of the query, and the aliases are not known yet. However, the ORDER BY clause executes after the SELECT clause, and therefore the column aliases are already known and you can refer to them. The following query shows all of the basic SELECT statement clauses used together to aggregate the sales data over the known customers, filter the data to include the frequent customers only, and sort the result set descending by the number of rows of each customer in the Fact.Sale table.

SELECT c.Customer,
 SUM(f.Quantity) AS TotalQuantity,
 SUM(f.[Total Excluding Tax]) AS TotalAmount,
 COUNT(*) AS InvoiceLinesCount
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.[Customer Key] <> 0
GROUP BY c.Customer
HAVING COUNT(*) > 400
ORDER BY InvoiceLinesCount DESC;

The query returns 45 rows. Below is the shortened result set.

Customer                              TotalQuantity TotalAmount  InvoiceLinesCount
------------------------------------- ------------- ------------ -----------------
Tailspin Toys (Vidrine, LA)           18899         340163.80    455
Tailspin Toys (North Crows Nest, IN)  17684         313999.50    443
Tailspin Toys (Tolna, ND)             16240         294759.10    443

Advanced SELECT Techniques
Aggregating data over the complete input rowset or aggregating in groups produces aggregated rows only – either one row for the whole input rowset or one row per group. Sometimes you need to return aggregates together with the detail data. One way to achieve this is by using subqueries, queries inside queries. The following query shows an example of using two subqueries in a single query. In the SELECT clause, there is a subquery that calculates the sum of quantity for each customer. It returns a scalar value. The subquery refers to the customer key from the outer query.
The subquery can't execute without the outer query. This is a correlated subquery. There is another subquery in the FROM clause that calculates the overall quantity for all customers. This query returns a table, although it is a table with a single row and a single column. This query is a self-contained subquery, independent of the outer query. A subquery in the FROM clause is also called a derived table.
Another type of join is used to add the overall total to each detail row. A cross join is a Cartesian product of two input rowsets: each row from one side is associated with every single row from the other side. No join condition is needed. A cross join can produce an unwanted huge result set. For example, if you cross join just 1,000 rows from the left side of the join with 1,000 rows from the right side, you get 1,000,000 rows in the output. Therefore, typically you want to avoid a cross join in production. However, in the example in the following query, the 143,968 rows from the left side are cross joined to a single row from the subquery, therefore producing only 143,968 rows. Effectively, this means that the overall total column is added to each detail row.

SELECT c.Customer, f.Quantity,
 (SELECT SUM(f1.Quantity)
  FROM Fact.Sale AS f1
  WHERE f1.[Customer Key] = c.[Customer Key]) AS TotalCustomerQuantity,
 f2.TotalQuantity
FROM (Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key])
 CROSS JOIN
  (SELECT SUM(f2.Quantity)
   FROM Fact.Sale AS f2
   WHERE f2.[Customer Key] <> 0) AS f2(TotalQuantity)
WHERE c.[Customer Key] <> 0
ORDER BY c.Customer, f.Quantity DESC;

Here is an abbreviated output of the query.

Customer                     Quantity    TotalCustomerQuantity Total Quantity
---------------------------- ----------- --------------------- -------------
Tailspin Toys (Absecon, NJ)  360         12415                 5667611
Tailspin Toys (Absecon, NJ)  324         12415                 5667611
Tailspin Toys (Absecon, NJ)  288         12415                 5667611

In the previous example, the correlated subquery in the SELECT clause has to logically execute once per row of the outer query. The query was partially optimized by moving the self-contained subquery for the overall total to the FROM clause, where it logically executes only once. Although SQL Server can many times optimize correlated subqueries and convert them to joins, there is also a much better and more efficient way to achieve the same result as the previous query returned. You can do this by using window functions.
The following query uses the window aggregate function SUM to calculate the total over each customer and the overall total. The OVER clause defines the partitions, or the windows, of the calculation. The first calculation is partitioned over each customer, meaning that the total quantity per customer is reset for each new customer. The second calculation uses an OVER clause without specifying partitions, meaning that the calculation is done over the whole input rowset. This query produces exactly the same result as the previous one.

SELECT c.Customer, f.Quantity,
 SUM(f.Quantity) OVER(PARTITION BY c.Customer) AS TotalCustomerQuantity,
 SUM(f.Quantity) OVER() AS TotalQuantity
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.[Customer Key] <> 0
ORDER BY c.Customer, f.Quantity DESC;

You can use many other functions for window calculations. For example, you can use the ranking functions, like ROW_NUMBER(), to calculate some rank in the window or in the overall rowset. However, rank can be defined only over some order of the calculation.
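Before moving on to ranking, note that the two window aggregates just calculated can be combined in further expressions in the same SELECT list. The following sketch, for example, adds the percentage each detail row contributes to its customer's total quantity; the table, column, and alias names are carried over from the query above, and multiplying by 100.0 simply forces decimal arithmetic so the division is not truncated to an integer.

SELECT c.Customer, f.Quantity,
 SUM(f.Quantity) OVER(PARTITION BY c.Customer) AS TotalCustomerQuantity,
 100.0 * f.Quantity
  / SUM(f.Quantity) OVER(PARTITION BY c.Customer) AS PctOfCustomerQuantity
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.[Customer Key] <> 0
ORDER BY c.Customer, f.Quantity DESC;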
You can specify the order of the calculation in the ORDER BY sub-clause inside the OVER clause. Please note that this ORDER BY clause defines only the logical order of the calculation, and not the order of the rows returned. A stand-alone, outer ORDER BY at the end of the query defines the order of the result.
The following query calculates a sequential number, the row number of each row in the output, for each detail row of the input rowset. The row number is calculated once in partitions for each customer and once over the whole input rowset. The logical order of calculation is over quantity descending, meaning that row number 1 gets the largest quantity, either the largest for each customer or the largest in the whole input rowset.

SELECT c.Customer, f.Quantity,
 ROW_NUMBER() OVER(PARTITION BY c.Customer
  ORDER BY f.Quantity DESC) AS CustomerOrderPosition,
 ROW_NUMBER() OVER(ORDER BY f.Quantity DESC) AS TotalOrderPosition
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.[Customer Key] <> 0
ORDER BY c.Customer, f.Quantity DESC;

The query produces the following result, abbreviated to a couple of rows only again.

Customer                      Quantity    CustomerOrderPosition TotalOrderPosition
----------------------------- ----------- --------------------- --------------------
Tailspin Toys (Absecon, NJ)   360         1                     129
Tailspin Toys (Absecon, NJ)   324         2                     162
Tailspin Toys (Absecon, NJ)   288         3                     374
…                             …           …                     …
Tailspin Toys (Aceitunas, PR) 288         1                     392
Tailspin Toys (Aceitunas, PR) 250         4                     1331
Tailspin Toys (Aceitunas, PR) 250         3                     1315
Tailspin Toys (Aceitunas, PR) 250         2                     1313
Tailspin Toys (Aceitunas, PR) 240         5                     1478

Note the position, or the row number, for the second customer. The order does not look completely correct – it is 1, 4, 3, 2, 5, and not 1, 2, 3, 4, 5, like you might expect. This is due to the repeating value for the second largest quantity, the quantity 250. The quantity is not unique, and thus the order is not deterministic. The order of the result is defined over the quantity, and not over the row number. You can't know in advance which row will get which row number when the order of the calculation is not defined on unique values. Please also note that you might get a different order when you execute the same query on your SQL Server instance.
Window functions are useful for some advanced calculations, like running totals and moving averages, as well. However, the calculation of these values can't be performed over the complete partition. You can additionally frame the calculation to a subset of rows of each partition only. The following query calculates the running total of the quantity per customer (the column alias Q_RT in the query), ordered by the sale key and framed differently for each row. The frame is defined from the first row in the partition to the current row. Therefore, the running total is calculated over one row for the first row, over two rows for the second row, and so on. Additionally, the query calculates the moving average of the quantity (the column alias Q_MA in the query) for the last three rows.
SELECT c.Customer,
 f.[Sale Key] AS SaleKey,
 f.Quantity,
 SUM(f.Quantity)
  OVER(PARTITION BY c.Customer
       ORDER BY [Sale Key]
       ROWS BETWEEN UNBOUNDED PRECEDING
        AND CURRENT ROW) AS Q_RT,
 AVG(f.Quantity)
  OVER(PARTITION BY c.Customer
       ORDER BY [Sale Key]
       ROWS BETWEEN 2 PRECEDING
        AND CURRENT ROW) AS Q_MA
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.[Customer Key] <> 0
ORDER BY c.Customer, f.[Sale Key];

The query returns the following (abbreviated) result.

Customer                     SaleKey  Quantity    Q_RT        Q_MA
---------------------------- -------- ----------- ----------- -----------
Tailspin Toys (Absecon, NJ)  2869     216         216         216
Tailspin Toys (Absecon, NJ)  2870     2           218         109
Tailspin Toys (Absecon, NJ)  2871     2           220         73

Let's find the top three orders by quantity for the Tailspin Toys (Aceitunas, PR) customer! You can do this by using the OFFSET…FETCH clause after the ORDER BY clause, like the following query shows.

SELECT c.Customer,
 f.[Sale Key] AS SaleKey,
 f.Quantity
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.Customer = N'Tailspin Toys (Aceitunas, PR)'
ORDER BY f.Quantity DESC
OFFSET 0 ROWS FETCH NEXT 3 ROWS ONLY;

This is the complete result of the query.

Customer                       SaleKey  Quantity
------------------------------ -------- -----------
Tailspin Toys (Aceitunas, PR)  36964    288
Tailspin Toys (Aceitunas, PR)  126253   250
Tailspin Toys (Aceitunas, PR)  79272    250

But wait… Didn't the second largest quantity, the value 250, repeat three times? Which two rows were selected in the output? Again, because the calculation is done over a non-unique column, the result is somewhat nondeterministic. SQL Server offers another possibility, the TOP clause. You can specify TOP n WITH TIES, meaning you can get all of the rows with ties on the last value in the output. However, this way you don't know the number of the rows in the output in advance. The following query shows this approach.

SELECT TOP 3 WITH TIES
 c.Customer,
 f.[Sale Key] AS SaleKey,
 f.Quantity
FROM Fact.Sale AS f
 INNER JOIN Dimension.Customer AS c
  ON f.[Customer Key] = c.[Customer Key]
WHERE c.Customer = N'Tailspin Toys (Aceitunas, PR)'
ORDER BY f.Quantity DESC;

This is the complete result of the previous query – this time it is four rows.

Customer                       SaleKey  Quantity
------------------------------ -------- -----------
Tailspin Toys (Aceitunas, PR)  36964    288
Tailspin Toys (Aceitunas, PR)  223106   250
Tailspin Toys (Aceitunas, PR)  126253   250
Tailspin Toys (Aceitunas, PR)  79272    250

The next task is to get the top three orders by quantity for each customer. You need to perform the calculation for each customer. The APPLY Transact-SQL operator comes in handy here. You use it in the FROM clause. You apply, or execute, a table expression defined on the right side of the operator once for each row of the input rowset from the left side of the operator. There are two flavors of this operator. The CROSS APPLY version filters out the rows from the left rowset if the tabular expression on the right side does not return any row. The OUTER APPLY version preserves the row from the left side even if the tabular expression on the right side does not return any row, similar to what the LEFT OUTER JOIN does. Of course, columns for the preserved rows do not have known values from the right-side tabular expression. The following query uses the CROSS APPLY operator to calculate the top three orders by quantity for each customer that actually does have some orders.
SELECT c.Customer,
 t3.SaleKey, t3.Quantity
FROM Dimension.Customer AS c
 CROSS APPLY
  (SELECT TOP(3)
    f.[Sale Key] AS SaleKey,
    f.Quantity
   FROM Fact.Sale AS f
   WHERE f.[Customer Key] = c.[Customer Key]
   ORDER BY f.Quantity DESC) AS t3
WHERE c.[Customer Key] <> 0
ORDER BY c.Customer, t3.Quantity DESC;

Below is the result of this query, shortened to the first nine rows.

Customer                           SaleKey  Quantity
---------------------------------- -------- -----------
Tailspin Toys (Absecon, NJ)        5620     360
Tailspin Toys (Absecon, NJ)        114397   324
Tailspin Toys (Absecon, NJ)        82868    288
Tailspin Toys (Aceitunas, PR)      36964    288
Tailspin Toys (Aceitunas, PR)      126253   250
Tailspin Toys (Aceitunas, PR)      79272    250
Tailspin Toys (Airport Drive, MO)  43184    250
Tailspin Toys (Airport Drive, MO)  70842    240
Tailspin Toys (Airport Drive, MO)  630      225

For the final task in this section, assume that you need to calculate some statistics over the totals of customers' orders. You need to calculate the average total amount for all customers, the standard deviation of this total amount, and the average count of orders per customer. This means you need to calculate the totals over customers in advance, and then use the aggregate functions AVG() and STDEV() on these aggregates. You could do the aggregations over customers in advance in a derived table. However, there is another way to achieve this. You can define the derived table in advance, in the WITH clause of the SELECT statement. Such a subquery is called a common table expression, or a CTE.
CTEs are more readable than derived tables, and might also be more efficient. You could use the result of the same CTE multiple times in the outer query. If you use derived tables, then you need to define them multiple times if you want to use them multiple times in the outer query. The following query shows the usage of a CTE to calculate the average total amount for all customers, the standard deviation of this total amount, and the average count of orders per customer.

WITH CustomerSalesCTE AS
(
 SELECT c.Customer,
  SUM(f.[Total Excluding Tax]) AS TotalAmount,
  COUNT(*) AS InvoiceLinesCount
 FROM Fact.Sale AS f
  INNER JOIN Dimension.Customer AS c
   ON f.[Customer Key] = c.[Customer Key]
 WHERE c.[Customer Key] <> 0
 GROUP BY c.Customer
)
SELECT ROUND(AVG(TotalAmount), 6) AS AvgAmountPerCustomer,
 ROUND(STDEV(TotalAmount), 6) AS StDevAmountPerCustomer,
 AVG(InvoiceLinesCount) AS AvgCountPerCustomer
FROM CustomerSalesCTE;

It returns the following result.

AvgAmountPerCustomer  StDevAmountPerCustomer AvgCountPerCustomer
--------------------- ---------------------- -------------------
270479.217661         38586.082621           358

Transactions and Error Handling
In a real-world application, errors always appear. There can be syntax or even logical errors in the code, the database design might be incorrect, and there might even be a bug in the database management system you are using. Even if everything works correctly, you might get an error because the users insert wrong data. With Transact-SQL error handling, you can catch such user errors and decide what to do about them. Typically, you want to log the errors, inform the users about the errors, and sometimes even correct them in the error handling code.
Error handling for user errors works on the statement level. If you send SQL Server a batch of two or more statements and the error is in the last statement, the previous statements execute successfully. This might not be what you desire.
Many times you need to execute a batch of statements as a unit, and fail all of the statements if one of the statements fails. You can achieve this by using transactions. You will learn in this section about: Error handling Transaction management Error Handling You can see there is a need for error handling by producing an error. The following code tries to insert an order and a detail row for this order. EXEC dbo.InsertSimpleOrder @OrderId = 6, @OrderDate = '20160706', @Customer = N'CustE'; EXEC dbo.InsertSimpleOrderDetail @OrderId = 6, @ProductId = 2, @Quantity = 0; In SQL Server Management Studio, you can see that an error happened. You should get a message that the error 547 occurred, that The INSERT statement conflicted with the CHECK constraint. If you remember, in order details only rows where the value for the quantity is not equal to zero are allowed. The error occurred in the second statement, in the call of the procedure that inserts an order detail. The procedure that inserted an order executed without an error. Therefore, an order with id equal to six must be in the dbo. SimpleOrders table. The following code tries to insert order six again. EXEC dbo.InsertSimpleOrder @OrderId = 6, @OrderDate = '20160706', @Customer = N'CustE'; Of course, another error occurred. This time it should be error 2627, a violation of the PRIMARY KEY constraint. The values of the OrderId column must be unique. Let's check the state of the data after these successful and unsuccessful inserts. SELECT o.OrderId, o.OrderDate, o.Customer, od.ProductId, od.Quantity FROM dbo.SimpleOrderDetails AS od RIGHT OUTER JOIN dbo.SimpleOrders AS o ON od.OrderId = o.OrderId WHERE o.OrderId > 5 ORDER BY o.OrderId, od.ProductId; The previous query checks only orders and their associated details where the order id value is greater than five. The query returns the following result set. OrderId OrderDate Customer ProductId Quantity ----------- ---------- -------- ----------- ----------- 6 2016-07-06 CustE NULL NULL You can see that only the first insert of the order with the id 6 succeeded. The second insert of an order with the same id and the insert of the detail row for the order six did not succeed. You start handling errors by enclosing the statements in the batch you are executing in the BEGIN TRY … END TRY block. You can catch the errors in the BEGIN CATCH … END CATCH block. The BEGIN CATCH statement must be immediately after the END TRY statement. The control of the execution is passed from the try part to the catch part immediately after the first error occurs. In the catch part, you can decide how to handle the errors. If you want to log the data about the error or inform an end user about the details of the error, the following functions might be very handy: ERROR_NUMBER() – this function returns the number of the error. ERROR_SEVERITY() - it returns the severity level. The severity of the error indicates the type of problem encountered. Severity levels 11 to 16 can be corrected by the user. ERROR_STATE() – this function returns the error state number. Error state gives more details about a specific error. You might want to use this number together with the error number to search Microsoft knowledge base for the specific details of the error you encountered. ERROR_PROCEDURE() – it returns the name of the stored procedure or trigger where the error occurred, or NULL if the error did not occur within a stored procedure or trigger. ERROR_LINE() – it returns the line number at which the error occurred. 
This might be the line number in a routine if the error occurred within a stored procedure or trigger, or the line number in the batch. ERROR_MESSAGE() – this function returns the text of the error message. The following code uses the try…catch block to handle possible errors in the batch of the statements, and returns the information of the error using the above mentioned functions. Note that the error happens in the first statement of the batch. BEGIN TRY EXEC dbo.InsertSimpleOrder @OrderId = 6, @OrderDate = '20160706', @Customer = N'CustF'; EXEC dbo.InsertSimpleOrderDetail @OrderId = 6, @ProductId = 2, @Quantity = 5; END TRY BEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage, ERROR_LINE() as ErrorLine; END CATCH There was a violation of the PRIMARY KEY constraint again, because the code tried to insert an order with id six again. The second statement would succeed if you would execute in its own batch, without error handling. However, because of the error handling, the control was passed to the catch block immediately after the error in the first statement, and the second statement never executed. You can check the data with the following query. SELECT o.OrderId, o.OrderDate, o.Customer, od.ProductId, od.Quantity FROM dbo.SimpleOrderDetails AS od RIGHT OUTER JOIN dbo.SimpleOrders AS o ON od.OrderId = o.OrderId WHERE o.OrderId > 5 ORDER BY o.OrderId, od.ProductId; The result set should be the same as the results set of the last check of the orders with id greater than five – a single order without details. The following code produces an error in the second statement. BEGIN TRY EXEC dbo.InsertSimpleOrder @OrderId = 7, @OrderDate = '20160706', @Customer = N'CustF'; EXEC dbo.InsertSimpleOrderDetail @OrderId = 7, @ProductId = 2, @Quantity = 0; END TRY BEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage, ERROR_LINE() as ErrorLine; END CATCH You can see that the insert of the order detail violates the CHECK constraint for the quantity. If you check the data with the same query as last two times again, you would see that there are orders with id six and seven in the data, both without order details. Using Transactions Your business logic might request that the insert of the first statement fails when the second statement fails. You might need to repeal the changes of the first statement on the failure of the second statement. You can define that a batch of statements executes as a unit by using transactions. The following code shows how to use transactions. Again, the second statement in the batch in the try block is the one that produces an error. BEGIN TRY BEGIN TRANSACTION EXEC dbo.InsertSimpleOrder @OrderId = 8, @OrderDate = '20160706', @Customer = N'CustG'; EXEC dbo.InsertSimpleOrderDetail @OrderId = 8, @ProductId = 2, @Quantity = 0; COMMIT TRANSACTION END TRY BEGIN CATCH SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage, ERROR_LINE() as ErrorLine; IF XACT_STATE() <> 0 ROLLBACK TRANSACTION; END CATCH You can check the data again. SELECT o.OrderId, o.OrderDate, o.Customer, od.ProductId, od.Quantity FROM dbo.SimpleOrderDetails AS od RIGHT OUTER JOIN dbo.SimpleOrders AS o ON od.OrderId = o.OrderId WHERE o.OrderId > 5 ORDER BY o.OrderId, od.ProductId; Here is the result of the check: OrderId OrderDate Customer ProductId Quantity ----------- ---------- -------- ----------- ----------- 6 2016-07-06 CustE NULL NULL 7 2016-07-06 CustF NULL NULL You can see that the order with id 8 does not exist in your data. 
Because the insert of the detail row for this order failed, the insert of the order was rolled back as well. Note that in the catch block, the XACT_STATE() function was used to check whether the transaction still exists. If the transaction was rolled back automatically by SQL Server, then the ROLLBACK TRANSACTION would produce a new error.
The following code drops the objects (in the correct order, due to object constraints) created for the explanation of the DDL and DML statements, programmatic objects, error handling, and transactions.

DROP FUNCTION dbo.Top2OrderDetails;
DROP VIEW dbo.OrdersWithoutDetails;
DROP PROCEDURE dbo.InsertSimpleOrderDetail;
DROP PROCEDURE dbo.InsertSimpleOrder;
DROP TABLE dbo.SimpleOrderDetails;
DROP TABLE dbo.SimpleOrders;

Beyond Relational
The "beyond relational" is actually only a marketing term. The relational model, used in the relational database management system, is in no way limited to specific data types or to specific languages only. However, with the term beyond relational, we typically mean specialized and complex data types that might include spatial and temporal data, XML or JSON data, and extending the capabilities of the Transact-SQL language with CLR languages like Visual C#, or statistical languages like R. SQL Server in versions before 2016 already supports some of the features mentioned. Here is a quick review of this support, which includes: spatial data, CLR support, and XML data.
Defining Locations and Shapes with Spatial Data
In modern applications, many times you want to show your data on a map, using the physical location. You might also want to show the shape of the objects that your data describes. You can use spatial data for tasks like these. You can represent the objects with points, lines, or polygons. From the simple shapes, you can create complex geometrical objects or geographical objects, for example, cities and roads. Spatial data appears in many contemporary databases. Acquiring spatial data has become quite simple with the Global Positioning System (GPS) and other technologies. In addition, many software packages and database management systems help you work with spatial data. SQL Server supports two spatial data types, both implemented as .NET common language runtime (CLR) data types, from version 2008:
The geometry type represents data in a Euclidean (flat) coordinate system.
The geography type represents data in a round-earth coordinate system.
We need two different spatial data types because of some important differences between them. These differences include units of measurement and orientation.
In the planar, or flat-earth, system, you define the units of measurement. The length of a distance and the surface of an area are given in the same unit of measurement as you use for the coordinates of your coordinate system. You, as the database developer, know what the coordinates mean and what the unit of measure is. In geometry, the distance between the points described with the coordinates (1, 3) and (4, 7) is 5 units, regardless of the units used. You, as the database developer who created the database where you are storing this data, know the context. You know what these 5 units mean: whether this is 5 kilometers or 5 inches.
When talking about locations on earth, coordinates are given in degrees of latitude and longitude. This is the round-earth, or ellipsoidal, system. Lengths and areas are usually measured in the metric system, in meters and square meters. However, the metric system is not used for spatial data everywhere in the world.
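To make the planar example above concrete, here is a small sketch, independent of the Wide World Importers data, that builds the two geometry points with SRID 0 and measures the distance between them; it should return 5, in whatever unit you decide the coordinates represent. The geography type, discussed next, behaves differently, because its unit of measure comes from the spatial reference system.

-- Planar (geometry) distance between (1, 3) and (4, 7); the unit is up to you
DECLARE @p1 AS GEOMETRY = geometry::Point(1, 3, 0);
DECLARE @p2 AS GEOMETRY = geometry::Point(4, 7, 0);
SELECT @p1.STDistance(@p2) AS PlanarDistance;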
The spatial reference identifier (SRID) of the geography instance defines the unit of measure. Therefore, whenever measuring some distance or area in the ellipsoidal system, you should also always quote the SRID used, which defines the units.
In the planar system, the ring orientation of a polygon is not an important factor. For example, a polygon described by the points ((0, 0), (10, 0), (0, 5), (0, 0)) is the same as a polygon described by ((0, 0), (5, 0), (0, 10), (0, 0)). You can always rotate the coordinates appropriately to get the same feeling of the orientation. However, in geography, the orientation is needed to completely describe a polygon. Just think of the equator, which divides the earth into two hemispheres. Is your spatial data describing the northern or the southern hemisphere?
The Wide World Importers data warehouse includes the city location in the Dimension.City table. The following query retrieves it for cities in the main part of the USA.

SELECT City,
 [Sales Territory] AS SalesTerritory,
 Location AS LocationBinary,
 Location.ToString() AS LocationLongLat
FROM Dimension.City
WHERE [City Key] <> 0
 AND [Sales Territory] NOT IN
  (N'External', N'Far West');

Here is the partial result of the query.

City         SalesTerritory  LocationBinary       LocationLongLat
------------ --------------- -------------------- -------------------------------
Carrollton   Mideast         0xE6100000010C70...  POINT (-78.651695 42.1083969)
Carrollton   Southeast       0xE6100000010C88...  POINT (-76.5605078 36.9468152)
Carrollton   Great Lakes     0xE6100000010CDB...  POINT (-90.4070632 39.3022693)

You can see that the location is actually stored as a binary string. When you use the ToString() method of the location, you get the default string representation of the geographical point, which is the degrees of longitude and latitude.
If, in SSMS, you send the results of the previous query to a grid, you also get an additional representation of the spatial data in the results pane. Click the Spatial results tab, and you can see the points represented in the longitude-latitude coordinate system, as you can see in the following figure.
Figure 2-1: Spatial results showing customers' locations
If you executed the query, you might have noticed that the spatial data representation control in SSMS has some limitations. It can show only 5,000 objects. The result displays only the first 5,000 locations. Nevertheless, as you can see from the previous figure, this is enough to realize that these points form a contour of the main part of the USA. Therefore, the points represent the customers' locations for customers from the USA.
The following query gives you the details, like location and population, for Denver, Colorado.

SELECT [City Key] AS CityKey, City,
 [State Province] AS State,
 [Latest Recorded Population] AS Population,
 Location.ToString() AS LocationLongLat
FROM Dimension.City
WHERE [City Key] = 114129
 AND [Valid To] = '9999-12-31 23:59:59.9999999';

Spatial data types have many useful methods. For example, the STDistance() method returns the shortest distance between two geography instances. This is a close approximation of the geodesic distance, defined as the shortest route between two points on the Earth's surface. The following code calculates this distance between Denver, Colorado, and Seattle, Washington.
DECLARE @g AS GEOGRAPHY; DECLARE @h AS GEOGRAPHY; DECLARE @unit AS NVARCHAR(50); SET @g = (SELECT Location FROM Dimension.City WHERE [City Key] = 114129); SET @h = (SELECT Location FROM Dimension.City WHERE [City Key] = 108657); SET @unit = (SELECT unit_of_measure FROM sys.spatial_reference_systems WHERE spatial_reference_id = @g.STSrid); SELECT FORMAT(@g.STDistance(@h), 'N', 'en-us') AS Distance, @unit AS Unit; The result of the previous batch is below. Distance Unit ------------- ------ 1,643,936.69 metre Note that the code uses the sys.spatial_reference_system catalog view to get the unit of measure for the distance of the SRID used to store the geographical instances of data. The unit is meter. You can see that the distance between Denver, Colorado, and Seattle, Washington, is more than 1,600 kilometers. The following query finds the major cities within a circle of 1,000 km around Denver, Colorado. Major cities are defined as the cities with population larger than 200,000. DECLARE @g AS GEOGRAPHY; SET @g = (SELECT Location FROM Dimension.City WHERE [City Key] = 114129); SELECT DISTINCT City, [State Province] AS State, FORMAT([Latest Recorded Population], '000,000') AS Population, FORMAT(@g.STDistance(Location), '000,000.00') AS Distance FROM Dimension.City WHERE Location.STIntersects(@g.STBuffer(1000000)) = 1 AND [Latest Recorded Population] > 200000 AND [City Key] <> 114129 AND [Valid To] = '9999-12-31 23:59:59.9999999' ORDER BY Distance; Here is the result abbreviated to the twelve closest cities to Denver, Colorado. City State Population Distance ----------------- ----------- ----------- ----------- Aurora Colorado 325,078 013,141.64 Colorado Springs Colorado 416,427 101,487.28 Albuquerque New Mexico 545,852 537,221.38 Wichita Kansas 382,368 702,553.01 Lincoln Nebraska 258,379 716,934.90 Lubbock Texas 229,573 738,625.38 Omaha Nebraska 408,958 784,842.10 Oklahoma City Oklahoma 579,999 809,747.65 Tulsa Oklahoma 391,906 882,203.51 El Paso Texas 649,121 895,789.96 Kansas City Missouri 459,787 898,397.45 Scottsdale Arizona 217,385 926,980.71 There are many more useful methods and properties implemented in the two spatial data types. In addition, you can improve the performance of spatial queries with help of specialized spatial indexes. Please refer to the MSDN article "Spatial Data (SQL Server)" at https://msdn.microsoft.com/en-us/library/bb933790.aspx for more details on the spatial data types, their methods, and spatial indexes. XML Support in SQL Server SQL Server in version 2005 also started to feature extended support for XML data inside the database engine, although some basic support was already included in version 2000. The support starts by generating XML data from tabular results. You can use the FOR XML clause of the SELECT statement for this task. The following query generates an XML document from the regular tabular result set by using the FOR XML clause with AUTO option, to generate element-centric XML instance, with namespace and inline schema included. SELECT c.[Customer Key] AS CustomerKey, c.[WWI Customer ID] AS CustomerId, c.[Customer], c.[Buying Group] AS BuyingGroup, f.Quantity, f.[Total Excluding Tax] AS Amount, f.Profit FROM Dimension.Customer AS c INNER JOIN Fact.Sale AS f ON c.[Customer Key] = f.[Customer Key] WHERE c.[Customer Key] IN (127, 128) FOR XML AUTO, ELEMENTS, ROOT('CustomersOrders'), XMLSCHEMA('CustomersOrdersSchema'); GO Here is the partial result of this query. 
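The spatial indexes mentioned above have their own DDL. The following sketch shows the general shape of such an index on the Location column of the Dimension.City table; the index name is made up for the example, the table needs a clustered primary key for a spatial index to be created, and whether the index actually pays off depends on the spatial predicates you run, so treat this as an illustration rather than a recommendation.

-- Hypothetical spatial index on the geography column (the name is illustrative only)
CREATE SPATIAL INDEX SIX_City_Location
ON Dimension.City (Location)
USING GEOGRAPHY_AUTO_GRID;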
The first part of the result is the inline schema.

<CustomersOrders>
 <xsd:schema targetNamespace="CustomersOrdersSchema" …
  <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" …
  <xsd:element name="c">
   <xsd:complexType>
    <xsd:sequence>
     <xsd:element name="CustomerKey" type="sqltypes:int" />
     <xsd:element name="CustomerId" type="sqltypes:int" />
     <xsd:element name="Customer">
      <xsd:simpleType>
       <xsd:restriction base="sqltypes:nvarchar" …
        <xsd:maxLength value="100" />
       </xsd:restriction>
      </xsd:simpleType>
     </xsd:element>
     …
    </xsd:sequence>
   </xsd:complexType>
  </xsd:element>
 </xsd:schema>
 <c >
  <CustomerKey>127</CustomerKey>
  <CustomerId>127</CustomerId>
  <Customer>Tailspin Toys (Point Roberts, WA)</Customer>
  <BuyingGroup>Tailspin Toys</BuyingGroup>
  <f>
   <Quantity>3</Quantity>
   <Amount>48.00</Amount>
   <Profit>31.50</Profit>
  </f>
  <f>
   <Quantity>9</Quantity>
   <Amount>2160.00</Amount>
   <Profit>1363.50</Profit>
  </f>
 </c>
 <c >
  <CustomerKey>128</CustomerKey>
  <CustomerId>128</CustomerId>
  <Customer>Tailspin Toys (East Portal, CO)</Customer>
  <BuyingGroup>Tailspin Toys</BuyingGroup>
  <f>
   <Quantity>84</Quantity>
   <Amount>420.00</Amount>
   <Profit>294.00</Profit>
  </f>
 </c>
 …
</CustomersOrders>

You can also do the opposite process: convert XML to tables. Converting XML to relational tables is known as shredding XML. You can do this by using the nodes() method of the XML data type or with the OPENXML() rowset function.
Inside SQL Server, you can also query the XML data from Transact-SQL to find specific elements, attributes, or XML fragments. XQuery is a standard language for browsing XML instances and returning XML, and is supported inside XML data type methods.
You can store XML instances inside a SQL Server database in a column of the XML data type. An XML data type includes five methods that accept XQuery as a parameter. The methods support querying (the query() method), retrieving atomic values (the value() method), existence checks (the exist() method), modifying sections within the XML data (the modify() method) as opposed to overwriting the whole thing, and shredding XML data into multiple rows in a result set (the nodes() method).
The following code creates a variable of the XML data type to store an XML instance in it. Then it uses the query() method to return XML fragments from the XML instance. This method accepts an XQuery query as a parameter. The XQuery query uses FLWOR expressions to define and shape the XML returned.

DECLARE @x AS XML;
SET @x = N'
<CustomersOrders>
 <Customer custid="1">
  <!-- Comment 111 -->
  <companyname>CustA</companyname>
  <Order orderid="1">
   <orderdate>2016-07-01T00:00:00</orderdate>
  </Order>
  <Order orderid="9">
   <orderdate>2016-07-03T00:00:00</orderdate>
  </Order>
  <Order orderid="12">
   <orderdate>2016-07-12T00:00:00</orderdate>
  </Order>
 </Customer>
 <Customer custid="2">
  <!-- Comment 222 -->
  <companyname>CustB</companyname>
  <Order orderid="3">
   <orderdate>2016-07-01T00:00:00</orderdate>
  </Order>
  <Order orderid="10">
   <orderdate>2016-07-05T00:00:00</orderdate>
  </Order>
 </Customer>
</CustomersOrders>';
SELECT @x.query('for $i in CustomersOrders/Customer/Order
 let $j := $i/orderdate
 where $i/@orderid < 10900
 order by ($j)[1]
 return
 <Order-orderid-element>
  <orderid>{data($i/@orderid)}</orderid>
  {$j}
 </Order-orderid-element>')
 AS [Filtered, sorted and reformatted orders with let clause];

Here is the result of the previous query.
<Order-orderid-element> <orderid>1</orderid> <orderdate>2016-07-01T00:00:00</orderdate> </Order-orderid-element> <Order-orderid-element> <orderid>3</orderid> <orderdate>2016-07-01T00:00:00</orderdate> </Order-orderid-element> <Order-orderid-element> <orderid>9</orderid> <orderdate>2016-07-03T00:00:00</orderdate> </Order-orderid-element> <Order-orderid-element> <orderid>10</orderid> <orderdate>2016-07-05T00:00:00</orderdate> </Order-orderid-element> <Order-orderid-element> <orderid>12</orderid> <orderdate>2016-07-12T00:00:00</orderdate> </Order-orderid-element> Summary In this article, you got a review of the SQL Server features for developers that exists already in the previous versions. You can see that this support goes well beyond basic SQL statements, and also beyond pure Transact-SQL. Resources for Article: Further resources on this subject: Configuring a MySQL linked server on SQL Server 2008 [article] Basic Website using Node.js and MySQL database [article] Exception Handling in MySQL for Python [article]
Logistic regression

Packt
27 Nov 2014
9 min read
This article is written by Breck Baldwin and Krishna Dayanidhi, the authors of Natural Language Processing with Java and LingPipe Cookbook. In this article, we will cover logistic regression.
(For more resources related to this topic, see here.)
Logistic regression is probably responsible for the majority of industrial classifiers, with the possible exception of naïve Bayes classifiers. It almost certainly is one of the best performing classifiers available, albeit at the cost of slow training and considerable complexity in configuration and tuning. Logistic regression is also known as maximum entropy, neural network classification with a single neuron, and others. The classifiers so far have been based on the underlying characters or tokens, but logistic regression uses unrestricted feature extraction, which allows for arbitrary observations of the situation to be encoded in the classifier. This article closely follows a more complete tutorial at http://alias-i.com/lingpipe/demos/tutorial/logistic-regression/read-me.html.
How logistic regression works
All that logistic regression does is take a vector of feature weights over the data, apply a vector of coefficients, and do some simple math, which results in a probability for each class encountered in training. The complicated bit is in determining what the coefficients should be.
The following are some of the features produced by our training example for 21 tweets annotated for English e and non-English n. There are relatively few features because feature weights are being pushed to 0.0 by our prior, and once a weight is 0.0, then the feature is removed. Note that one category, n, is set to 0.0 for all the features; this reflects the fact that only n-1 of the n categories need their own coefficients. It is a property of the logistic regression process, which fixes one category's features to 0.0 and adjusts all other categories' features with respect to that:

FEATURE     e      n
I        :  0.37   0.0
!        :  0.30   0.0
Disney   :  0.15   0.0
"        :  0.08   0.0
to       :  0.07   0.0
anymore  :  0.06   0.0
isn      :  0.06   0.0
'        :  0.06   0.0
t        :  0.04   0.0
for      :  0.03   0.0
que      : -0.01   0.0
moi      : -0.01   0.0
_        : -0.02   0.0
,        : -0.08   0.0
pra      : -0.09   0.0
?        : -0.09   0.0

Take the string, I luv Disney, which will only have two non-zero features: I=0.37 and Disney=0.15 for e and zeros for n. Since there is no feature that matches luv, it is ignored. The probability that the tweet is English breaks down to:

vectorMultiply(e,[I,Disney]) = exp(.37*1 + .15*1) = 1.68
vectorMultiply(n,[I,Disney]) = exp(0*1 + 0*1) = 1

We will rescale to a probability by summing the outcomes and dividing each outcome by the sum:

p(e|[I,Disney]) = 1.68/(1.68 + 1) = 0.62
p(n|[I,Disney]) = 1/(1.68 + 1) = 0.38

This is how the math works on running a logistic regression model. Training is another issue entirely.
Getting ready
This example assumes the same framework that we have been using all along to get training data from .csv files, train the classifier, and run it from the command line. Setting up to train the classifier is a bit complex because of the number of parameters and objects used in training. The main() method starts with what should be familiar classes and methods: public static void main(String[] args) throws IOException {String trainingFile = args.length > 0 ?
args[0]: "data/disney_e_n.csv";List<String[]> training= Util.readAnnotatedCsvRemoveHeader(new File(trainingFile));int numFolds = 0;XValidatingObjectCorpus<Classified<CharSequence>> corpus= Util.loadXValCorpus(training,numFolds);TokenizerFactory tokenizerFactory= IndoEuropeanTokenizerFactory.INSTANCE; Note that we are using XValidatingObjectCorpus when a simpler implementation such as ListCorpus will do. We will not take advantage of any of its cross-validation features, because the numFolds param as 0 will have training visit the entire corpus. We are trying to keep the number of novel classes to a minimum, and we tend to always use this implementation in real-world gigs anyway. Now, we will start to build the configuration for our classifier. The FeatureExtractor<E> interface provides a mapping from data to features; this will be used to train and run the classifier. In this case, we are using a TokenFeatureExtractor() method, which creates features based on the tokens found by the tokenizer supplied during construction. This is similar to what naïve Bayes reasons over: FeatureExtractor<CharSequence> featureExtractor= new TokenFeatureExtractor(tokenizerFactory); The minFeatureCount item is usually set to a number higher than 1, but with small training sets, this is needed to get any performance. The thought behind filtering feature counts is that logistic regression tends to overfit low-count features that, just by chance, exist in one category of training data. As training data grows, the minFeatureCount value is adjusted usually by paying attention to cross-validation performance: int minFeatureCount = 1; The addInterceptFeature Boolean controls whether a category feature exists that models the prevalence of the category in training. The default name of the intercept feature is *&^INTERCEPT%$^&**, and you will see it in the weight vector output if it is being used. By convention, the intercept feature is set to 1.0 for all inputs. The idea is that if a category is just very common or very rare, there should be a feature that captures just this fact, independent of other features that might not be as cleanly distributed. This models the category probability in naïve Bayes in some way, but the logistic regression algorithm will decide how useful it is as it does with all other features: boolean addInterceptFeature = true;boolean noninformativeIntercept = true; These Booleans control what happens to the intercept feature if it is used. Priors, in the following code, are typically not applied to the intercept feature; this is the result if this parameter is true. Set the Boolean to false, and the prior will be applied to the intercept. Next is the RegressionPrior instance, which controls how the model is fit. What you need to know is that priors help prevent logistic regression from overfitting the data by pushing coefficients towards 0. There is a non-informative prior that does not do this with the consequence that if there is a feature that applies to just one category it will be scaled to infinity, because the model keeps fitting better as the coefficient is increased in the numeric estimation. Priors, in this context, function as a way to not be over confident in observations about the world. Another dimension in the RegressionPrior instance is the expected variance of the features. Low variance will push coefficients to zero more aggressively. The prior returned by the static laplace() method tends to work well for NLP problems. 
There is a lot going on, but it can be managed without a deep theoretical understanding. double priorVariance = 2;RegressionPrior prior= RegressionPrior.laplace(priorVariance,noninformativeIntercept); Next, we will control how the algorithm searches for an answer. AnnealingSchedule annealingSchedule= AnnealingSchedule.exponential(0.00025,0.999);double minImprovement = 0.000000001;int minEpochs = 100;int maxEpochs = 2000; AnnealingSchedule is best understood by consulting the Javadoc, but what it does is change how much the coefficients are allowed to vary when fitting the model. The minImprovement parameter sets the amount the model fit has to improve to not terminate the search, because the algorithm has converged. The minEpochs parameter sets a minimal number of iterations, and maxEpochs sets an upper limit if the search does not converge as determined by minImprovement. Next is some code that allows for basic reporting/logging. LogLevel.INFO will report a great deal of information about the progress of the classifier as it tries to converge: PrintWriter progressWriter = new PrintWriter(System.out,true);progressWriter.println("Reading data.");Reporter reporter = Reporters.writer(progressWriter);reporter.setLevel(LogLevel.INFO); Here ends the Getting ready section of one of our most complex classes—next, we will train and run the classifier. How to do it... It has been a bit of work setting up to train and run this class. We will just go through the steps to get it up and running: Note that there is a more complex 14-argument train method as well the one that extends configurability. This is the 10-argument version: LogisticRegressionClassifier<CharSequence> classifier= LogisticRegressionClassifier.<CharSequence>train(corpus,featureExtractor,minFeatureCount,addInterceptFeature,prior,annealingSchedule,minImprovement,minEpochs,maxEpochs,reporter); The train() method, depending on the LogLevel constant, will produce from nothing with LogLevel.NONE to the prodigious output with LogLevel.ALL. While we are not going to use it, we show how to serialize the trained model to disk: AbstractExternalizable.compileTo(classifier, new File("models/myModel.LogisticRegression")); Once trained, we will apply the standard classification loop with: Util.consoleInputPrintClassification(classifier); Run the preceding code in the IDE of your choice or use the command-line command: java -cp lingpipe-cookbook.1.0.jar:lib/lingpipe-4.1.0.jar:lib/opencsv-2.4.jar com.lingpipe.cookbook.chapter3.TrainAndRunLogReg The result is a big dump of information about the training: Reading data.:00 Feature Extractor class=class com.aliasi.tokenizer.TokenFeatureExtractor:00 min feature count=1:00 Extracting Training Data:00 Cold start:00 Regression callback handler=null:00 Logistic Regression Estimation:00 Monitoring convergence=true:00 Number of dimensions=233:00 Number of Outcomes=2:00 Number of Parameters=233:00 Number of Training Instances=21:00 Prior=LaplaceRegressionPrior(Variance=2.0,noninformativeIntercept=true):00 Annealing Schedule=Exponential(initialLearningRate=2.5E-4,base=0.999):00 Minimum Epochs=100:00 Maximum Epochs=2000:00 Minimum Improvement Per Period=1.0E-9:00 Has Informative Prior=true:00 epoch= 0 lr=0.000250000 ll= -20.9648 lp=-232.0139 llp= -252.9787 llp*= -252.9787:00 epoch= 1 lr=0.000249750 ll= -20.9406 lp=-232.0195 llp= -252.9602 llp*= -252.9602 The epoch reporting goes on until either the number of epochs is met or the search converges. 
In the following case, the number of epochs was met: :00 epoch= 1998 lr=0.000033868 ll= -15.4568 lp= -233.8125 llp= -249.2693 llp*= -249.2693 :00 epoch= 1999 lr=0.000033834 ll= -15.4565 lp= -233.8127 llp= -249.2692 llp*= -249.2692 Now, we can play with the classifier a bit: Type a string to be classified. Empty string to quit. I luv Disney Rank Category Score P(Category|Input) 0=e 0.626898085027528 0.626898085027528 1=n 0.373101914972472 0.373101914972472 This should look familiar; it is exactly the same result as the worked example at the start. That's it! You have trained up and used the world's most relevant industrial classifier. However, there's a lot more to harnessing the power of this beast. Summary In this article, we learned how to do logistic regression. Resources for Article: Further resources on this subject: Installing NumPy, SciPy, matplotlib, and IPython [Article] Introspecting Maya, Python, and PyMEL [Article] Understanding the Python regex engine [Article]

CoreOS – Overview and Installation

Packt
06 Jul 2015
8 min read
In this article by Rimantas Mocevicius, author of the book CoreOS Essentials, we take a look at CoreOS. CoreOS is often described as Linux for massive server deployments, but it can also run easily as a single host on bare metal, on cloud servers, and as a virtual machine on your computer as well. It is designed to run application containers such as docker and rkt, and you will learn about its main features later in this article.
This article is a practical, example-driven guide to help you learn about the essentials of the CoreOS Linux operating system. We assume that you have experience with VirtualBox, Vagrant, Git, Bash shell scripting, and the command line (terminal on UNIX-like computers), and that you have already installed VirtualBox, Vagrant, and git on your Mac OS X or Linux computer. As for a cloud installation, we will use Google Cloud's Compute Engine instances.
By the end of this article, you will hopefully be familiar with setting up CoreOS on your laptop or desktop, and on the cloud. You will learn how to set up a local computer development machine and a cluster on a local computer and in the cloud. Also, we will cover etcd, systemd, fleet, cluster management, deployment setup, and production clusters.
In this article, you will learn how CoreOS works and how to carry out a basic CoreOS installation on your laptop or desktop with the help of VirtualBox and Vagrant. We will basically cover two topics in this article: an overview of CoreOS, and installing the CoreOS virtual machine.
(For more resources related to this topic, see here.)
An overview of CoreOS
CoreOS is a minimal Linux operating system built to run docker and rkt containers (application containers). By default, it is designed to build powerful and easily manageable server clusters. It provides automatic, very reliable, and stable updates to all machines, which takes away a big maintenance headache from sysadmins. And, by running everything in application containers, such a setup allows you to very easily scale servers and applications, replace faulty servers in a fraction of a second, and so on.
How CoreOS works
CoreOS has no package manager, so everything needs to be installed and used via docker containers. Moreover, it is 40 percent more efficient in RAM usage than an average Linux installation.
CoreOS utilizes an active/passive dual-partition scheme to update itself as a single unit, instead of using a package-by-package method. Its root partition is read-only and changes only when an update is applied. If the update is unsuccessful during reboot time, then it rolls back to the previous boot partition. During an update, the new OS version is applied to partition B (the passive partition); after a reboot, that partition becomes the active one to boot from.
The docker and rkt containers run as applications on CoreOS. Containers can provide very good flexibility for application packaging and can start very quickly, in a matter of milliseconds. The CoreOS stack itself is simple: the bottom layer is the Linux OS, the second layer is etcd and fleet together with the docker daemon, and the top layer is the containers running on the server.
By default, CoreOS is designed to work in a clustered form, but it also works very well as a single host. It is very easy to control and run application containers across cluster machines with fleet, and to use the etcd service discovery to connect them.
CoreOS can be deployed easily on all major cloud providers, for example, Google Cloud, Amazon Web Services, Digital Ocean, and so on.
It runs very well on bare-metal servers as well. Moreover, it can be easily installed on a laptop or desktop with Linux, Mac OS X, or Windows via Vagrant, with VirtualBox or VMware virtual machine support. This short overview should throw some light on what CoreOS is about and what it can do. Let's now move on to the real stuff and install CoreOS on to our laptop or desktop machine. Installing the CoreOS virtual machine To use the CoreOS virtual machine, you need to have VirtualBox, Vagrant, and git installed on your computer. In the following examples, we will install CoreOS on our local computer, which will serve as a virtual machine on VirtualBox. Okay, let's get started! Cloning the coreos-vagrant GitHub project Let‘s clone this project and get it running. In your terminal (from now on, we will use just the terminal phrase and use $ to label the terminal prompt), type the following command: $ git clone https://github.com/coreos/coreos-vagrant/ This will clone from the GitHub repository to the coreos-vagrant folder on your computer. Working with cloud-config To start even a single host, we need to provide some config parameters in the cloud-config format via the user data file. In your terminal, type this: $ cd coreos-vagrant$ mv user-data.sample user-data The user data should have content like this (the coreos-vagrant Github repository is constantly changing, so you might see a bit of different content when you clone the repository): #cloud-config coreos: etcd2:    #generate a new token for each unique cluster from “     https://discovery.etcd.io/new    #discovery: https://discovery.etcd.io/<token>    # multi-region and multi-cloud deployments need to use “     $public_ipv4    advertise-client-urls: http://$public_ipv4:2379    initial-advertise-peer-urls: http://$private_ipv4:2380    # listen on both the official ports and the legacy ports    # legacy ports can be omitted if your application doesn‘t “     depend on them    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001    listen-peer-urls: “     http://$private_ipv4:2380,http://$private_ipv4:7001 fleet:    public-ip: $public_ipv4 flannel:    interface: $public_ipv4 units:    - name: etcd2.service      command: start    - name: fleet.service      command: start    - name: docker-tcp.socket      command: start      enable: true      content: |        [Unit]        Description=Docker Socket for the API          [Socket]        ListenStream=2375        Service=docker.service        BindIPv6Only=both        [Install]        WantedBy=sockets.target Replace the text between the etcd2: and fleet: lines to look this: etcd2:    name: core-01    initial-advertise-peer-urls: http://$private_ipv4:2380    listen-peer-urls: “     http://$private_ipv4:2380,http://$private_ipv4:7001    initial-cluster-token: core-01_etcd    initial-cluster: core-01=http://$private_ipv4:2380    initial-cluster-state: new    advertise-client-urls: “     http://$public_ipv4:2379,http://$public_ipv4:4001    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001 fleet: You can also download the latest user-data file from https://github.com/rimusz/coreos-essentials-book/blob/master/Chapter1/user-data. This should be enough to bootstrap a single-host CoreOS VM with etcd, fleet, and docker running there. Startup and SSH It's now time to boot our CoreOS VM and log in to its console using ssh. Let's boot our first CoreOS VM host. 
To do so, using the terminal, type the following command:

$ vagrant up

This will trigger Vagrant to download the latest CoreOS alpha channel image (alpha is the default channel set in the config.rb file, and it can easily be changed to beta or stable) and launch the VM instance. You should see the download and provisioning progress as the output in your terminal.

The CoreOS VM has booted up, so let's open an ssh connection to our new VM using the following command:

$ vagrant ssh

It should show something like this:

CoreOS alpha (some version)
core@core-01 ~ $

Perfect! Let's verify that etcd, fleet, and docker are running there. Here are the commands required:

$ systemctl status etcd2

To check the status of fleet, type this:

$ systemctl status fleet

To check the status of docker, type the following command:

$ docker version

Lovely! Everything looks fine. Thus, we've got our first CoreOS VM up and running in VirtualBox.

Summary

In this article, we saw what CoreOS is and how it is installed. We covered a simple CoreOS installation on a local computer with the help of Vagrant and VirtualBox, and checked whether etcd, fleet, and docker are running there.

Resources for Article:

Further resources on this subject: Core Data iOS: Designing a Data Model and Building Data Objects [article] Clustering [article] Deploying a Play application on CoreOS and Docker [article]

Machine Learning Technique: Supervised Learning

Packt
09 Nov 2016
7 min read
In this article by Andrea Isoni, author of the book Machine Learning for the Web, the most relevant regression and classification techniques are discussed. All of these algorithms share the same background procedure, and usually the name of the algorithm refers to both a classification and a regression method. The linear regression algorithms, Naive Bayes, decision tree, and support vector machine are going to be discussed in the following sections. To understand how to employ the techniques, a classification and a regression problem will be solved using the mentioned methods. Essentially, a labeled training dataset will be used to train the models, which means finding the values of the parameters, as we discussed in the introduction. As usual, the code is available in my GitHub folder at https://github.com/ai2010/machine_learning_for_the_web/tree/master/chapter_3/.

(For more resources related to this topic, see here.)

We will conclude the article with an extra algorithm that may be used for classification, although it is not specifically designed for this purpose (hidden Markov model). We will now begin to explain the general causes of error in the methods when predicting the true labels associated with a dataset.

Model error estimation

We said that the trained model is used to predict the labels of new data, and the quality of the prediction depends on the ability of the model to generalize, that is, to correctly predict cases not present in the training data. This is a well-known problem in the literature, and it is related to two concepts: the bias and the variance of the outputs. The bias is the error due to a wrong assumption in the algorithm. Given a point x(t) with label yt, the model is biased if, when it is trained with different training sets, the predicted label ytpred will always be different from yt. The variance error instead refers to the spread of the wrongly predicted labels for the given point x(t). A classic example to explain the concepts is to consider a circle with the true value at the center (the true label): the closer the predicted labels are to the center, the more unbiased the model and the lower the variance. A model with low variance and low bias errors will have its predicted labels concentrated on the center (the true label). A high bias error occurs when the predictions are far away from the true label, while high variance appears when the predictions are spread over a wide range of values.

We have already seen that labels can be continuous or discrete, corresponding to regression and classification problems respectively. Most of the models are suitable for solving both problems, and we are going to use the words regression and classification to refer to the same model. More formally, given a set of N data points x(i) and corresponding labels y_i, a model with a set of parameters θ_j (to be compared with the true parameter values) will have a mean square error (MSE) given by the average of the squared differences between the predicted and the true labels:

MSE = (1/N) Σ_{i=1..N} (y_i^pred − y_i)²

We will use the MSE as a measure to evaluate the methods discussed in this article. Now we will start describing the generalized linear methods.

Generalized linear models

The generalized linear model is a group of models that try to find the M parameters θ_j that form a linear relationship between the labels y_i and the feature vectors x(i), that is:

y_i = Σ_{j=1..M} θ_j x_j(i) + e_i

Here, the e_i are the errors of the model.
The algorithm for finding the parameters tries to minimize the total error of the model, defined by the cost function J:

J(θ) = (1/2) Σ_{i=1..N} (h_θ(x(i)) − y_i)²

where h_θ(x(i)) = Σ_{j=1..M} θ_j x_j(i) is the model prediction. The minimization of J is achieved using an iterative algorithm called batch gradient descent:

θ_j := θ_j − α ∂J(θ)/∂θ_j = θ_j − α Σ_{i=1..N} (h_θ(x(i)) − y_i) x_j(i), for each j

Here, α is called the learning rate, and it is a trade-off between convergence speed and convergence precision. An alternative algorithm, called stochastic gradient descent, loops over the training examples i = 1, ..., N and updates the parameters using one example at a time:

θ_j := θ_j − α (h_θ(x(i)) − y_i) x_j(i)

The θ_j are updated for each training example i instead of waiting to sum over the entire training set. The last algorithm converges near the minimum of J, typically faster than batch gradient descent, but the final solution may oscillate around the real values of the parameters. The following paragraphs describe the most common models and the corresponding cost function, J.

Linear regression

Linear regression is the simplest algorithm and is based on the model:

h_θ(x(i)) = Σ_{j=1..M} θ_j x_j(i)

The cost function and update rule are the ones given above.

Ridge regression

Ridge regression, also known as Tikhonov regularization, adds a term to the cost function J such that:

J_ridge(θ) = J(θ) + λ Σ_{j=1..M} θ_j²

where λ is the regularization parameter. The additional term penalizes all the parameters θ_j different from 0, so that a certain set of parameters is preferred over all the possible solutions. The final set of θ_j is shrunk around 0, lowering the variance of the parameters but introducing a bias error. Indicating with the superscript l the parameters from plain linear regression, the ridge regression parameters are related to them (in the simplest case of uncorrelated, normalized features) by the following formula:

θ_j^ridge = θ_j^l / (1 + λ)

This clearly shows that the larger the λ value, the more the ridge parameters are shrunk around 0.

Lasso regression

Lasso regression is an algorithm similar to ridge regression, the only difference being that the regularization term is the sum of the absolute values of the parameters:

J_lasso(θ) = J(θ) + λ Σ_{j=1..M} |θ_j|

Logistic regression

Despite the name, this algorithm is used for (binary) classification problems, so we define the labels y_i ∈ {0, 1}. The model is given by the so-called logistic function:

h_θ(x) = 1 / (1 + e^(−Σ_j θ_j x_j))

In this case, the cost function is defined as follows:

J(θ) = −Σ_{i=1..N} [ y_i log(h_θ(x(i))) + (1 − y_i) log(1 − h_θ(x(i))) ]

From this, the update rule is formally the same as for linear regression (but the model definition, h_θ, is different). Note that the prediction for a point p, h_θ(x(p)), is a continuous value between 0 and 1. So usually, to estimate the class label, we have a threshold at 0.5 such that the predicted label is 1 if h_θ(x(p)) ≥ 0.5 and 0 otherwise. The logistic regression algorithm is applicable to multiple label problems using the techniques one versus all or one versus one. Using the first method, a problem with K classes is solved by training K logistic regression models, each one assuming the labels of the considered class j as +1 and all the rest as 0. The second approach consists of training a model for each pair of labels (K(K−1)/2 trained models).

Probabilistic interpretation of generalized linear models

Now that we have seen the generalized linear model, let's find the parameters θ_j that satisfy the relationship:

y_i = Σ_{j=1..M} θ_j x_j(i) + e_i

In the case of linear regression, we can assume the errors e_i are normally distributed with mean 0 and variance σ², so that the probability p(y_i | x(i); θ) is equivalent to:

p(y_i | x(i); θ) = (1 / √(2πσ²)) exp(−(y_i − Σ_j θ_j x_j(i))² / (2σ²))

Therefore, the total likelihood of the system can be expressed as follows:

L(θ) = Π_{i=1..N} p(y_i | x(i); θ)

In the case of the logistic regression algorithm, we are assuming that the logistic function itself is the probability:

P(y_i = 1 | x(i); θ) = h_θ(x(i)), P(y_i = 0 | x(i); θ) = 1 − h_θ(x(i))

Then the likelihood can be expressed by:

L(θ) = Π_{i=1..N} h_θ(x(i))^(y_i) (1 − h_θ(x(i)))^(1 − y_i)

In both cases, it can be shown that maximizing the likelihood is equivalent to minimizing the cost function, so the gradient descent will be the same.
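Before moving on to KNN, it may help to see the models above in code. The article's own examples use Python with sklearn (see the GitHub link at the beginning), so here is a minimal, illustrative sketch; the synthetic data, the alpha values (scikit-learn's name for the regularization parameter λ), and the variable names are assumptions made for this example rather than code taken from the book's repository.

```python
# Minimal sketch of the generalized linear models discussed above, using scikit-learn.
# The synthetic data and parameter values are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.rand(200, 3)                              # 200 points, 3 features
true_theta = np.array([2.0, -1.0, 0.5])
y = X.dot(true_theta) + 0.1 * rng.randn(200)      # continuous labels (regression)

# Regression: fit linear, ridge, and lasso models and compare their training MSE
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.01)):
    model.fit(X, y)
    print(type(model).__name__, mean_squared_error(y, model.predict(X)))

# Classification: binary labels obtained by thresholding, then logistic regression
y_class = (y > y.mean()).astype(int)
clf = LogisticRegression()
clf.fit(X, y_class)
print('predicted probabilities:', clf.predict_proba(X[:3]))  # values between 0 and 1
print('predicted classes:', clf.predict(X[:3]))               # 0.5 threshold applied
```

For the logistic model, predict_proba() returns the continuous values between 0 and 1 discussed above, and predict() applies the 0.5 threshold to turn them into class labels.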
k-nearest neighbours (KNN)

This is a very simple classification (or regression) method in which, given a set of feature vectors x(i) with corresponding labels y_i, a test point x(t) is assigned the label value with the majority of the label occurrences among its K nearest neighbors, found using a distance measure such as the following:

Euclidean: d(x(t), x(i)) = √(Σ_j (x_j(t) − x_j(i))²)
Manhattan: d(x(t), x(i)) = Σ_j |x_j(t) − x_j(i)|
Minkowski: d(x(t), x(i)) = (Σ_j |x_j(t) − x_j(i)|^q)^(1/q) (if q=2, this reduces to the Euclidean distance)

In the case of regression, the value y_t is calculated by replacing the majority of occurrences with the average of the neighbors' labels. The simplest average (or the majority of occurrences) has uniform weights, so each point has the same importance regardless of its actual distance from x(t). However, a weighted average with weights equal to the inverse distance from x(t) may be used.

Summary

In this article, the major classification and regression algorithms, together with the techniques to implement them, were discussed. You should now be able to understand in which situation each method can be used and how to implement it using Python and its libraries (sklearn and pandas).

Resources for Article:

Further resources on this subject: Supervised Machine Learning [article] Unsupervised Learning [article] Specialized Machine Learning Topics [article]

Designing Sample Applications and Creating Reports

Packt
25 Oct 2013
6 min read
(For more resources related to this topic, see here.) Creating a new application We will now start using Microsoft Visual Studio, which we installed in the previous article: From the Start menu, select All Programs, then select the Microsoft Visual Studio 2012 folder, and then select Visual Studio 2012. Since we are running Visual Studio for the first time, we will see the following screenshot. We can select the default environment setting. Also, we can change this setting in our application. We will use C# as the programming language, so we will select Visual C# Development Settings and click on the Start Visual Studio button. Now we will see the Start Page of Microsoft Visual Studio 2012. Select New Project or open it by navigating to File | New | Project. This is shown in the following screenshot: As shown in the following screenshot, we will select Windows as the application template. We will then select Windows Forms Application as the application type. Let us name our application Monitor. Choose where you want to save the application and click on OK. Now we will see the following screenshot. Let's start to design our application interface. Start by right-clicking on the form and navigating to Properties, as shown in the following screenshot: The Properties explorer will open as shown in the following screenshot. Change Text property from Form1 to monitor, change the Name property to MainForm, and then change Size to 322, 563 because we will add many controls to our form. After I finished the design process, I found that this was the suitable form size; you can change form size and all controls sizes as you wish in order to better suit your application interface. Adding controls to the form In this step, we will start designing the interface of our application by adding controls to the Main Form. We will divide the Main Form into three main parts as follows: Employees Products and orders Employees and products Employees In this section, we will see the different ways in which we can search for our employees and display the result in a report. Beginning your design process Let's begin to design the Employees section of our application using the following steps: From the Toolbox explorer, drag-and-drop GroupBox into the Main Form. Change the GroupBox Size property to 286, 181, the Location property to 12,12, and Text property to Employees. Add Label to the Main Form. Change the Text property to ID and the Location property to 12,25. Add Textbox to the Main Form and change the Name property to txtEmpId. Change the Location property to 70, 20. Add GroupBox to the Main Form. Change the Groupbox Size property to 250,103, the Location property to 6, 40, and the Text property to filters. Add Label to the Main Form. Change the Text property to Title and the Location property to 6,21. Add ComboBox to Main Form. Change the Name property to cbEmpTitle and the Location property to 56,17. Add Label to Main Form. Change the Text property to City and the Location property to 6,50. Add ComboBox to Main Form. Change the Name property to cbEmpCity and the Location property to 56,46. Add Label to Main Form. Change the Text property to Country and the Location property to 6,76. Add ComboBox to Main Form. Change the Name property to cbEmpCountry and the Location property to 56,72. Add Button to Main Form. Change the Name property to btnEmpAll, the Location property to 6,152, and the Text property to All. Add Button to Main Form. 
Change the Name property to btnEmpById, the Location property to 99,152, and the Text property to By Id. Add Button to Main Form. Change the Name property to btnEmpByFilters, the Location property to 196,152 and the Text property to By filters. The final design will look like the following screenshot: Products and orders In this part, we will see the relationship between products and orders in order to demonstrate, which products have higher orders. Because we have a lot of product categories and orders from different countries, we filter our report by products category and countries. Beginning your design process After we finished the design of employees section, let's start designing the products and orders section. From Toolbox explorer, drag-and-drop GroupBox to the Main Form. Change the GroupBox Size property to 284,136, the Location property to 12,204 and the Text property to Products – Orders. Add GroupBox to the Main Form. Change the GroupBox Size property to 264,77, the Location property to 6,20, and the Text property to filters. Add Label to the Main Form, change the Text property to Category, and the Location property to 6,25. Add ComboBox to the Main Form. Change the Name Property to cbProductsCategory, and the Location property to 61,20. Add Label to the Main Form. Change the Text property to Country and the Location property to 6,51. Add ComboBox to the Main Form. Change the Name property to cbOrdersCountry and the Location property to 61,47. Add Button to the Main Form. Change the Name property to btnProductsOrdersByFilters, the Location property to 108,103, and the Text property to By filters. The final design looks like the following screenshot: Employees and orders In this part we will see the relationship between employees and orders in order to evaluate the performance of each employee. In this section we will filter data by a period of time. Beginning your design process In this section we will start to design the last section in our application employees and orders. From Toolbox explorer, drag -and-drop GroupBox to the Main Form. Change the Size property of GroupBox to 284,175, the Location property to 14,346, and the Text property to Employees - Orders. Add GroupBox to the Main Form. Change the GroupBox Size property to 264,77, the Location property to 6,25, and the Text property to filters. Add Label to the Main Form. Change the Text property to From and the Location property to 7,29. Add DateTimePicker to the Main Form and change the Name property to dtpFrom. Change the Location property to 41,25, the Format property to Short, and the Value property to 1/1/1996. Add Label to the Main Form. Change the Text property to To and the Location property to 7,55. Add DateTimePicker to the Main Form. Change the Name property to dtpTo, the Location property to 41,51, the Format property to Short, and the Value property to 12/30/1996. Add Button to the Main Form. Change the Name property to btnBarChart, the Location property to 15,117, and the Text property to Bar chart. Add Button to the Main Form and change the Name property to btnPieChart. Change the Location property to 99,117 and the Text property to Pie chart. Add Button to the Main Form. Change the Name Property to btnGaugeChart, the Location property to 186, 117, and the Text property to Gauge chart. Add Button to the Main Form. Change the Name property to btnAllInOne, the Location property to 63,144, and the Text property to All In One. 
The final design looks like the following screenshot: You can change Location, Size, and Text properties as you wish; but don't change the Name property because we will use it in code in the next articles.

Reading the Fine Manual

Packt
09 Mar 2017
34 min read
In this article by Simon Riggs, Gabriele Bartolini, Hannu Krosing, Gianni Ciolli, the authors of the book PostgreSQL Administration Cookbook - Third Edition, you will learn the following recipes: Reading The Fine Manual (RTFM) Planning a new database Changing parameters in your programs Finding the current configuration settings Which parameters are at nondefault settings? Updating the parameter file Setting parameters for particular groups of users The basic server configuration checklist Adding an external module to PostgreSQL Using an installed module Managing installed extensions (For more resources related to this topic, see here.) I get asked many questions about parameter settings in PostgreSQL. Everybody's busy and most people want a 5-minute tour of how things work. That's exactly what a Cookbook does, so we'll do our best. Some people believe that there are some magical parameter settings that will improve their performance, spending hours combing the pages of books to glean insights. Others feel comfortable because they have found some website somewhere that "explains everything", and they "know" they have their database configured OK. For the most part, the settings are easy to understand. Finding the best setting can be difficult, and the optimal setting may change over time in some cases. This article is mostly about knowing how, when, and where to change parameter settings. Reading The Fine Manual (RTFM) RTFM is often used rudely to mean "don't bother me, I'm busy", or it is used as a stronger form of abuse. The strange thing is that asking you to read a manual is most often very good advice. Don't flame the advisors back; take the advice! The most important point to remember is that you should refer to a manual whose release version matches that of the server on which you are operating. The PostgreSQL manual is very well-written and comprehensive in its coverage of specific topics. However, one of its main failings is that the "documents" aren't organized in a way that helps somebody who is trying to learn PostgreSQL. They are organized from the perspective of people checking specific technical points so that they can decide whether their difficulty is a user error or not. It sometimes answers "What?" but seldom "Why?" or "How?" I've helped write sections of the PostgreSQL documents, so I'm not embarrassed to steer you towards reading them. There are, nonetheless, many things to read here that are useful. How to do it… The main documents for each PostgreSQL release are available at http://www.postgresql.org/docs/manuals/. The most frequently accessed parts of the documents are as follows: SQL command reference, as well as client and server tools reference: http://www.postgresql.org/docs/current/interactive/reference.html Configuration: http://www.postgresql.org/docs/current/interactive/runtime-config.html Functions: http://www.postgresql.org/docs/current/interactive/functions.html You can also grab yourself a PDF version of the manual, which can allow easier searching in some cases. Don't print it! The documents are more than 2000 pages of A4-sized sheets. How it works… The PostgreSQL documents are written in SGML, which is similar to, but not the same as, XML. These files are then processed to generate HTML files, PDF, and so on. This ensures that all the formats have exactly the same content. Then, you can choose the format you prefer, and you can even compile it in other formats such as EPUB, INFO, and so on. 
Moreover, the PostgreSQL manual is actually a subset of the PostgreSQL source code, so it evolves together with the software. It is written by the same people who make PostgreSQL. Even more reasons to read it! There's more… More information is also available at http://wiki.postgresql.org. Many distributions offer packages that install static versions of the HTML documentation. For example, on Debian and Ubuntu, the docs for the most recent stable PostgreSQL version are named postgresql-9.6-docs (unsurprisingly). Planning a new database Planning a new database can be a daunting task. It's easy to get overwhelmed by it, so here, we present some planning ideas. It's also easy to charge headlong at the task as well, thinking that whatever you know is all you'll ever need to consider. Getting ready You are ready. Don't wait to be told what to do. If you haven't been told what the requirements are, then write down what you think they are, clearly labeling them as "assumptions" rather than "requirements"—we mustn't confuse the two things. Iterate until you get some agreement, and then build a prototype. How to do it… Write a document that covers the following items: Database design—plan your database design Calculate the initial database sizing Transaction analysis—how will we access the database? Look at the most frequent access paths What are the requirements for response times? Hardware configuration Initial performance thoughts—will all of the data fit into RAM? Choose the operating system and filesystem type How do we partition the disk? Localization plan Decide server encoding, locale, and time zone Access and security plan Identify client systems and specify required drivers Create roles according to a plan for access control Specify pg_hba.conf Maintenance plan—who will keep it working? How? Availability plan—consider the availability requirements checkpoint_timeout Plan your backup mechanism and test it High-availability plan Decide which form of replication you'll need, if any How it works… One of the most important reasons for planning your database ahead of time is that retrofitting some things is difficult. This is especially true of server encoding and locale, which can cause much downtime and exertion if we need to change them later. Security is also much more difficult to set up after the system is live. There's more… Planning always helps. You may know what you're doing, but others may not. Tell everybody what you're going to do before you do it to avoid wasting time. If you're not sure yet, then build a prototype to help you decide. Approach the administration framework as if it were a development task. Make a list of things you don't know yet, and work through them one by one. This is deliberately a very short recipe. Everybody has their own way of doing things, and it's very important not to be too prescriptive about how to do things. If you already have a plan, great! If you don't, think about what you need to do, make a checklist, and then do it. Changing parameters in your programs PostgreSQL allows you to set some parameter settings for each session or transaction. How to do it… You can change the value of a setting during your session, like this: SET work_mem = '16MB'; This value will then be used for every future transaction. 
You can also change it only for the duration of the "current transaction": SET LOCAL work_mem = '16MB'; The setting will last until you issue this command: RESET work_mem; Alternatively, you can issue the following command: RESET ALL; SETand RESET commands are SQL commands that can be issued from any interface. They apply only to PostgreSQL server parameters, but this does not mean that they affect the entire server. In fact, the parameters you can change with SET and RESET apply only to the current session. Also, note that there may be other parameters, such as JDBC driver parameters, that cannot be set in this way. How it works… Suppose you change the value of a setting during your session, for example, by issuing this command: SET work_mem = '16MB'; Then, the following will show up in the pg_settings catalog view: postgres=# SELECT name, setting, reset_val, source FROM pg_settings WHERE source = 'session'; name | setting | reset_val | source ----------+---------+-----------+--------- work_mem | 16384 | 1024 | session Until you issue this command: RESET work_mem; After issuing it, the setting returns to reset_val and the source returns to default: name | setting | reset_val | source ---------+---------+-----------+--------- work_mem | 1024 | 1024 | default There's more… You can change the value of a setting during your transaction as well, like this: SET LOCAL work_mem = '16MB'; Then, this will show up in the pg_settings catalog view: postgres=# SELECT name, setting, reset_val, source FROM pg_settings WHERE source = 'session'; name | setting | reset_val | source ----------+---------+-----------+--------- work_mem | 1024 | 1024 | session Huh? What happened to your parameter setting? The SET LOCAL command takes effect only for the transaction in which it was executed, which was just the SET LOCAL command in our case. We need to execute it inside a transaction block to be able to see the setting take hold, as follows: BEGIN; SET LOCAL work_mem = '16MB'; Here is what shows up in the pg_settings catalog view: postgres=# SELECT name, setting, reset_val, source FROM pg_settings WHERE source = 'session'; name | setting | reset_val | source ----------+---------+-----------+--------- work_mem | 16384 | 1024 | session You should also note that the value of source is session rather than transaction, as you might have been expecting. Finding the current configuration settings At some point, it will occur to you to ask, "What are the current configuration settings?" Most settings can be changed in more than one way, and some ways do not affect all users or all sessions, so it is quite possible to get confused. How to do it… Your first thought is probably to look in postgresql.conf, which is the configuration file, described in detail in the Updating the parameter file recipe. That works, but only as long as there is only one parameter file. If there are two, then maybe you're reading the wrong file! (How do you know?) So, the cautious and accurate way is not to trust a text file, but to trust the server itself. Moreover, you learned in the previous recipe, Changing parameters in your programs, that each parameter has a scope that determines when it can be set. Some parameters can be set through postgresql.conf, but others can be changed afterwards. So, the current value of configuration settings may have been subsequently changed. 
We can use the SHOW command like this: postgres=# SHOW work_mem; Its output is as follows: work_mem ---------- 1MB (1 row) However, remember that it reports the current setting at the time it is run, and that can be changed in many places. Another way of finding the current settings is to access a PostgreSQL catalog view named pg_settings: postgres=# \x Expanded display is on. postgres=# SELECT * FROM pg_settings WHERE name = 'work_mem'; [ RECORD 1 ] -------------------------------------------------------- name | work_mem setting | 1024 unit | kB category | Resource Usage / Memory short_desc | Sets the maximum memory to be used for query workspaces. extra_desc | This much memory can be used by each internal sort operation and hash table before switching to temporary disk files. context | user vartype | integer source | default min_val | 64 max_val | 2147483647 enumvals | boot_val | 1024 reset_val | 1024 sourcefile | sourceline | Thus, you can use the SHOW command to retrieve the value for a setting, or you can access the full details via the catalog table. There's more… The actual location of each configuration file can be asked directly to the PostgreSQL server, as shown in this example: postgres=# SHOW config_file; This returns the following output: config_file ------------------------------------------ /etc/postgresql/9.4/main/postgresql.conf (1 row) The other configuration files can be located by querying similar variables, hba_file and ident_file. How it works… Each parameter setting is cached within each session so that we can get fast access to the parameter settings. This allows us to access the parameter settings with ease. Remember that the values displayed are not necessarily settings for the server as a whole. Many of those parameters will be specific to the current session. That's different from what you experience with many other database software, and is also very useful. Which parameters are at nondefault settings? Often, we need to check which parameters have been changed or whether our changes have correctly taken effect. In the previous two recipes, we have seen that parameters can be changed in several ways, and with different scope. You learned how to inspect the value of one parameter or get the full list of parameters. In this recipe, we will show you how to use SQL capabilities to list only those parameters whose value in the current session differs from the system-wide default value. This list is valuable for several reasons. First, it includes only a few of the 200-plus available parameters, so it is more immediate. Also, it is difficult to remember all our past actions, especially in the middle of a long or complicated session. Version 9.4 introduces the ALTER SYSTEM syntax, which we will describe in the next recipe, Updating the parameter file. From the viewpoint of this recipe, its behavior is quite different from all other setting-related commands; you run it from within your session and it changes the default value, but not the value in your session.
How to do it… We write a SQL query that lists all parameter values, excluding those whose current value is either the default or set from a configuration file: postgres=# SELECT name, source, setting FROM pg_settings WHERE source != 'default' AND source != 'override' ORDER by 2, 1; The output is as follows: name | source | setting ----------------------------+----------------------+----------------- application_name | client | psql client_encoding | client | UTF8 DateStyle | configuration file | ISO, DMY default_text_search_config | configuration file | pg_catalog.english dynamic_shared_memory_type | configuration file | posix lc_messages | configuration file | en_GB.UTF-8 lc_monetary | configuration file | en_GB.UTF-8 lc_numeric | configuration file | en_GB.UTF-8 lc_time | configuration file | en_GB.UTF-8 log_timezone | configuration file | Europe/Rome max_connections | configuration file | 100 port | configuration file | 5460 shared_buffers | configuration file | 16384 TimeZone | configuration file | Europe/Rome max_stack_depth | environment variable | 2048 How it works… You can see from pg_settings which parameters have nondefault values and what the source of the current value is. The SHOW command doesn't tell you whether a parameter is set at a nondefault value. It just tells you the value, which isn't of much help if you're trying to understand what is set and why. If the source is a configuration file, then the sourcefile and sourceline columns are also set. These can be useful in understanding where the configuration came from. There's more… The setting column of pg_settings shows the current value, but you can also look at boot_val and reset_val. The boot_val parameter shows the value assigned when the PostgreSQL database cluster was initialized (initdb), while reset_val shows the value that the parameter will return to if you issue the RESET command. The max_stack_depth parameter is an exception because pg_settings says it is set by the environment variable, though it is actually set by ulimit -s on Linux and Unix systems. The max_stack_depth parameter just needs to be set directly on Windows. The time zone settings are also picked up from the OS environment, so you shouldn't need to set those directly. In older releases, pg_settings showed them as command-line settings. From version 9.1 onwards, they are written to postgresql.conf when the data directory is initialized, so they show up as configuration files. Updating the parameter file The parameter file is the main location for defining parameter values for the PostgreSQL server. All the parameters can be set in the parameter file, which is known as postgresql.conf. There are also two other parameter files: pg_hba.conf and pg_ident.conf. Both of these relate to connections and security. Getting ready First, locate postgresql.conf, as described earlier. How to do it… Some of the parameters take effect only when the server is first started. A typical example might be shared_buffers, which defines the size of the shared memory cache. Many of the parameters can be changed while the server is still running. After changing the required parameters, we issue a reload operation to the server, forcing PostgreSQL to reread the postgresql.conf file (and all other configuration files). There is a number of ways to do that, depending on your distribution and OS. The most common is to issue the following command: pg_ctl reload with the same OS user that runs the PostgreSQL server process. 
This assumes the default data directory; otherwise you have to specify the correct data directory with the -D option. As noted earlier, Debian and Ubuntu have a different multiversion architecture, so you should issue the following command instead: pg_ctlcluster 9.6 main reload On modern distributions you can also use systemd, as follows: sudo systemctl reload postgresql@9.6-main Some other parameters require a restart of the server for changes to take effect, for instance, max_connections, listen_addresses, and so on. The syntax is very similar to a reload operation, as shown here: pg_ctl restart For Debian and Ubuntu, use this command: pg_ctlcluster 9.6 main restart and with systemd: sudo systemctl restart postgresql@9.6-main The postgresql.conf file is a normal text file that can be simply edited. Most of the parameters are listed in the file, so you can just search for them and then insert the desired value in the right place. How it works… If you set the same parameter twice in different parts of the file, the last setting is what applies. This can cause lots of confusion if you add settings to the bottom of the file, so you are advised against doing that. The best practice is to either leave the file as it is and edit the values, or to start with a blank file and include only the values that you wish to change. I personally prefer a file with only the nondefault values. That makes it easier to see what's happening. Whichever method you use, you are strongly advised to keep all the previous versions of your .conf files. You can do this by copying, or you can use a version control system such as Git or SVN. There's more… The postgresql.conf file also supports an include directive. This allows the postgresql.conf file to reference other files, which can then reference other files, and so on. That may help you organize your parameter settings better, if you don't make it too complicated. If you are working with PostgreSQL version 9.4 or later, you can change the values stored in the parameter files directly from your session, with syntax such as the following: ALTER SYSTEM SET shared_buffers = '1GB'; This command will not actually edit postgresql.conf. Instead, it writes the new setting to another file named postgresql.auto.conf. The effect is equivalent, albeit in a safer way. The original configuration is never written, so it cannot be damaged in the event of a crash. If you mess up with too many ALTER SYSTEM commands, you can always delete postgresql.auto.conf manually and reload the configuration, or restart PostgreSQL, depending on what parameters you had changed. Setting parameters for particular groups of users PostgreSQL supports a variety of ways of defining parameter settings for various user groups. This is very convenient, especially to manage user groups that have different requirements. How to do it… For all users in the saas database, use the following commands: ALTER DATABASE saas SET configuration_parameter = value1; For a user named simon connected to any database, use this: ALTER ROLE Simon SET configuration_parameter = value2; Alternatively, you can set a parameter for a user only when connected to a specific database, as follows: ALTER ROLE Simon IN DATABASE saas SET configuration_parameter = value3; The user won't know that these have been executed specifically for them. These are default settings, and in most cases, they can be overridden if the user requires nondefault values. 
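To see one of these defaults take effect from an application, you can connect as the user in question and simply inspect the value. Here is a minimal sketch using Python and the psycopg2 driver; the connection details, role name, and the work_mem parameter are illustrative assumptions for this example, not part of the recipe itself.

```python
# Minimal sketch: verify that a per-role/per-database default (set with
# ALTER ROLE ... IN DATABASE ... SET) is picked up automatically at connect time.
# The DSN, role, and parameter below are illustrative assumptions.
import psycopg2

conn = psycopg2.connect(dbname='saas', user='simon', host='localhost')
cur = conn.cursor()

cur.execute("SHOW work_mem")
print('default for this role/database:', cur.fetchone()[0])

# The user can still override the default for their own session if needed
cur.execute("SET work_mem = '32MB'")
cur.execute("SHOW work_mem")
print('after session-level SET:', cur.fetchone()[0])

conn.close()
```

The point is simply that the value is already in place when the session starts; no SET is required in the application unless it wants something different.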
How it works… You can set parameters for each of the following: Database User (which is named role by PostgreSQL) Database/user combination Each of the parameter defaults is overridden by the one below it. In the preceding three SQL statements: If user hannu connects to the saas database, then value1 will apply If user simon connects to a database other than saas, then value2 will apply If user simon connects to the saas database, then value3 will apply PostgreSQL implements this in exactly the same way as if the user had manually issued the equivalent SET statements immediately after connecting. The basic server configuration checklist PostgreSQL arrives configured for use on a shared system, though many people want to run dedicated database systems. The PostgreSQL project wishes to ensure that PostgreSQL will play nicely with other server software, and will not assume that it has access to the full server resources. If you, as the system administrator, know that there is no other important server software running on this system, then you can crank up the values much higher. Getting ready Before we start, we need to know two sets of information: We need to know the size of the physical RAM that will be dedicated to PostgreSQL We need to know something about the types of applications for which we will use PostgreSQL How to do it… If your database is larger than 32 MB, then you'll probably benefit from increasing shared_buffers. You can increase this to much larger values, but remember that running out of memory induces many problems. For instance, PostgreSQL is able to store information to the disk when the available memory is too small, and it employs sophisticated algorithms to treat each case differently and to place each piece of data either in the disk or in the memory, depending on each use case. On the other hand, overstating the amount of available memory confuses such abilities and results in suboptimal behavior. For instance, if the memory is swapped to disk, then PostgreSQL will inefficiently treat all data as if it were the RAM. Another unfortunate circumstance is when the Linux Out-Of-Memory (OOM) killer terminates one of the various processes spawned by the PostgreSQL server. So, it's better to be conservative. It is good practice to set a low value in your postgresql.conf and increment slowly to ensure that you get the benefits from each change. If you increase shared_buffers and you're running on a non-Windows server, you will almost certainly need to increase the value of the SHMMAX OS parameter (and on some platforms, other parameters as well). On Linux, Mac OS, and FreeBSD, you will need to either edit the /etc/sysctl.conf file or use sysctl -w with the following values: For Linux, use kernel.shmmax=value For Mac OS, use kern.sysv.shmmax=value For FreeBSD, use kern.ipc.shmmax=value There's more… For more information, you can refer to http://www.postgresql.org/docs/9.6/static/kernel-resources.html#SYSVIPC. For example, on Linux, add the following line to /etc/sysctl.conf: kernel.shmmax=value Don't worry about setting effective_cache_size. It is much less important a parameter than you might think. There is no need for too much fuss selecting the value. If you're doing heavy write activity, then you may want to set wal_buffers to a much higher value than the default. 
In fact wal_buffers is set automatically from the value of shared_buffers, following a rule that fits most cases; however, it is always possible to specify an explicit value that overrides the computation for the very few cases where the rule is not good enough. If you're doing heavy write activity and/or large data loads, you may want to set checkpoint_segments higher than the default to avoid wasting I/O in excessively frequent checkpoints. If your database has many large queries, you may wish to set work_mem to a value higher than the default. However, remember that such a limit applies separately to each node in the query plan, so there is a real risk of overallocating memory, with all the problems discussed earlier. Ensure that autovacuum is turned on, unless you have a very good reason to turn it off—most people don't. Leave the settings as they are for now. Don't fuss too much about getting the settings right. You can change most of them later, so you can take an iterative approach to improving things.  And remember, don't touch the fsync parameter. It's keeping you safe. Adding an external module to PostgreSQL Another strength of PostgreSQL is its extensibility. Extensibility was one of the original design goals, going back to the late 1980s. Now, in PostgreSQL 9.6, there are many additional modules that plug into the core PostgreSQL server. There are many kinds of additional module offerings, such as the following: Additional functions Additional data types Additional operators Additional indexes Getting ready First, you'll need to select an appropriate module to install. The walk towards a complete, automated package management system for PostgreSQL is not over yet, so you need to look in more than one place for the available modules, such as the following: Contrib: The PostgreSQL "core" includes many functions. There is also an official section for add-in modules, known as "contrib" modules. They are always available for your database server, but are not automatically enabled in every database because not all users might need them. On PostgreSQL version 9.6, we will have more than 40 such modules. These are documented at http://www.postgresql.org/docs/9.6/static/contrib.html. PGXN: This is the PostgreSQL Extension Network, a central distribution system dedicated to sharing PostgreSQL extensions. The website started in 2010, as a repository dedicated to the sharing of extension files. Separate projects: These are large external projects, such as PostGIS, offering extensive and complex PostgreSQL modules. For more information, take a look at http://www.postgis.org/. How to do it… There are several ways to make additional modules available for your database server, as follows: Using a software installer Installing from PGXN Installing from a manually downloaded package Installing from source code Often, a particular module will be available in more than one way, and users are free to choose their favorite, exactly like PostgreSQL itself, which can be downloaded and installed through many different procedures. Installing modules using a software installer Certain modules are available exactly like any other software packages that you may want to install in your server. All main Linux distributions provide packages for the most popular modules, such as PostGIS, SkyTools, procedural languages other than those distributed with core, and so on. 
In some cases, modules can be added during installation if you're using a standalone installer application, for example, the OneClick installer, or tools such as rpm, apt-get, and YaST on Linux distributions. The same procedure can also be followed after the PostgreSQL installation, when the need for a certain module arrives. We will actually describe this case, which is way more common. For example, let's say that you need to manage a collection of Debian package files, and that one of your tasks is to be able to pick the latest version of one of them. You start by building a database that records all the package files. Clearly, you need to store the version number of each package. However, Debian version numbers are much more complex than what we usually call "numbers". For instance, on my Debian laptop, I currently have version 9.2.18-1.pgdg80+1 of the PostgreSQL client package. Despite being complicated, that string follows a clearly defined specification, which includes many bits of information, including how to compare two versions to establish which of them is older. Since this recipe discussed extending PostgreSQL with custom data types and operators, you might have already guessed that I will now consider a custom data type for Debian version numbers that is capable of tasks such as understanding the Debian version number format, sorting version numbers, choosing the latest version number in a given group, and so on. It turns out that somebody else already did all the work of creating the required PostgreSQL data type, endowed with all the useful accessories: comparison operators, input/output functions, support for indexes, and maximum/minimum aggregates. All of this has been packaged as a PostgreSQL extension, as well as a Debian package (not a big surprise), so it is just a matter of installing the postgresql-9.2-debversion package with a Debian tool such as apt-get, aptitude, or synaptic. On my laptop, that boils down to the command line: apt-get install postgresql-9.2-debversion This will download the required package and unpack all the files in the right locations, making them available to my PostgreSQL server. Installing modules from PGXN The PostgreSQL Extension Network, PGXN for short, is a website (http://pgxn.org) launched in late 2010 with the purpose of providing "a central distribution system for open source PostgreSQL extension libraries". Anybody can register and upload their own module, packaged as an extension archive. The website allows browsing available extensions and their versions, either via a search interface or from a directory of package and user names. The simple way is to use a command-line utility, called pgxnclient. It can be easily installed in most systems; see the PGXN website on how to do so. Its purpose is to interact with PGXN and take care of administrative tasks, such as browsing available extensions, downloading the package, compiling the source code, installing files in the proper place, and removing installed package files. Alternatively, you can download the extension files from the website and place them in the right place by following the installation instructions. PGXN is different from official repositories because it serves another purpose. Official repositories usually contain only seasoned extensions because they accept new software only after a certain amount of evaluation and testing. 
On the other hand, anybody can ask for a PGXN account and upload their own extensions, so there is no filter except requiring that the extension has an open source license and a few files that any extension must have. Installing modules from a manually downloaded package You might have to install a module that is correctly packaged for your system but is not available from the official package archives. For instance, it could be the case that the module has not been accepted in the official repository yet, or you could have repackaged a bespoke version of that module with some custom tweaks, which are so specific that they will never become official. Whatever the case, you will have to follow the installation procedure for standalone packages specific to your system. Here is an example with the Oracle compatibility module, described at http://postgres.cz/wiki/Oracle_functionality_(en): First, we get the package, say for PostgreSQL 8.4 on a 64-bit architecture, from http://pgfoundry.org/frs/download.php/2414/orafce-3.0.1-1.pg84.rhel5.x86_64.rpm. Then, we install the package in the standard way: rpm -ivh orafce-3.0.1-1.pg84.rhel5.x86_64.rpm If all the dependencies are met, we are done. I mentioned dependencies because that's one more potential problem in installing packages that are not officially part of the installed distribution—you can no longer assume that all software version numbers have been tested, all requirements are available, and there are no conflicts. If you get error messages that indicate problems in these areas, you may have to solve them yourself, by manually installing missing packages and/or uninstalling conflicting packages. Installing modules from source code In many cases, useful modules may not have full packaging. In these cases, you may need to install the module manually. This isn't very hard and it's a useful exercise that helps you understand what happens. Each module will have different installation requirements. There are generally two aspects of installing a module. They are as follows: Building the libraries (only for modules that have libraries) Installing the module files in the appropriate locations You need to follow the instructions for the specific module in order to build the libraries, if any are required. Installation will then be straightforward, and usually there will be a suitably prepared configuration file for the make utility so that you just need to type the following command: make install Each file will be copied to the right directory. Remember that you normally need to be a system superuser in order to install files on system directories. Once a library file is in the directory expected by the PostgreSQL server, it will be loaded automatically as soon as requested by a function. Modules such as auto_explain do not provide any additional user-defined function, so they won't be auto-loaded; that needs to be done manually by a superuser with a LOAD statement. How it works… PostgreSQL can dynamically load libraries in the following ways: Using the explicit LOAD command in a session Using the shared_preload_libraries parameter in postgresql.conf at server start At session start, using the local_preload_libraries parameter for a specific user, as set using ALTER ROLE PostgreSQL functions and objects can reference code in these libraries, allowing extensions to be bound tightly to the running server process. 
The tight binding makes this method suitable for use even in very high-performance applications, and there's no significant difference between additionally supplied features and native features. Using an installed module In this recipe, we will explain how to enable an installed module so that it can be used in a particular database. The additional types, functions, and so on will exist only in those databases where we have carried out this step. Although most modules require this procedure, there are actually a couple of notable exceptions. For instance, the auto_explain module mentioned earlier, which is shipped together with PostgreSQL, does not create any function, type or operator. To use it, you must load its object file using the LOAD command. From that moment, all statements longer than a configurable threshold will be logged together with their execution plan. In the rest of this recipe, we will cover all the other modules. They do not require a LOAD statement because PostgreSQL can automatically load the relevant libraries when they are required. As mentioned in the previous recipe, Adding an external module to PostgreSQL, specially packaged modules are called extensions in PostgreSQL. They can be managed with dedicated SQL commands.  Getting ready Suppose you have chosen to install a certain module among those available for your system (see the previous recipe, Adding an external module to PostgreSQL); all you need to know is the extension name.  How to do it…  Each extension has a unique name, so it is just a matter of issuing the following command: CREATE EXTENSION myextname; This will automatically create all the required objects inside the current database. For security reasons, you need to do so as a database superuser. For instance, if you want to install the dblink extension, type this: CREATE EXTENSION dblink; How it works… When you issue a CREATE EXTENSION command, the database server looks for a file named EXTNAME.control in the SHAREDIR/extension directory. That file tells PostgreSQL some properties of the extension, including a description, some installation information, and the default version number of the extension (which is unrelated to the PostgreSQL version number). Then, a creation script is executed in a single transaction, so if it fails, the database is unchanged. The database server also notes in a catalog table the extension name and all the objects that belong to it. Managing installed extensions In the last two recipes, we showed you how to install external modules in PostgreSQL to augment its capabilities. In this recipe, we will show you some more capabilities offered by the extension infrastructure. How to do it… First, we list all available extensions: postgres=# \x on Expanded display is on. postgres=# SELECT * postgres-# FROM pg_available_extensions postgres-# ORDER BY name; -[ RECORD 1 ]-----+-------------------------------------------------- name | adminpack default_version | 1.0 installed_version | comment | administrative functions for PostgreSQL -[ RECORD 2 ]-----+-------------------------------------------------- name | autoinc default_version | 1.0 installed_version | comment | functions for autoincrementing fields (...) 
In particular, if the dblink extension is installed, then we see a record like this: -[ RECORD 10 ]----+-------------------------------------------------- name | dblink default_version | 1.0 installed_version | 1.0 comment | connect to other PostgreSQL databases from within a database Now, we can list all the objects in the dblink extension, as follows: postgres=# \x off Expanded display is off. postgres=# \dx+ dblink Objects in extension "dblink" Object Description --------------------------------------------------------------------- function dblink_build_sql_delete(text,int2vector,integer,text[]) function dblink_build_sql_insert(text,int2vector,integer,text[],text[]) function dblink_build_sql_update(text,int2vector,integer,text[],text[]) function dblink_cancel_query(text) function dblink_close(text) function dblink_close(text,boolean) function dblink_close(text,text) (...) Objects created as parts of extensions are not special in any way, except that you can't drop them individually. This is done to protect you from mistakes: postgres=# DROP FUNCTION dblink_close(text); ERROR: cannot drop function dblink_close(text) because extension dblink requires it HINT: You can drop extension dblink instead. Extensions might have dependencies too. The cube and earthdistance contrib extensions provide a good example, since the latter depends on the former: postgres=# CREATE EXTENSION earthdistance; ERROR: required extension "cube" is not installed postgres=# CREATE EXTENSION cube; CREATE EXTENSION postgres=# CREATE EXTENSION earthdistance; CREATE EXTENSION As you can reasonably expect, dependencies are taken into account when dropping objects, just like for other objects: postgres=# DROP EXTENSION cube; ERROR: cannot drop extension cube because other objects depend on it DETAIL: extension earthdistance depends on extension cube HINT: Use DROP ... CASCADE to drop the dependent objects too. postgres=# DROP EXTENSION cube CASCADE; NOTICE: drop cascades to extension earthdistance DROP EXTENSION How it works… The pg_available_extensions system view shows one row for each extension control file in the SHAREDIR/extension directory (see the Using an installed module recipe). The pg_extension catalog table records only the extensions that have actually been created. The psql command-line utility provides the \dx meta-command to examine e"x"tensions. It supports an optional plus sign (+) to control verbosity and an optional pattern for the extension name to restrict its range. Consider the following command: \dx+ db* This will list all extensions whose name starts with db, together with all their objects. The CREATE EXTENSION command creates all objects belonging to a given extension, and then records the dependency of each object on the extension in pg_depend. That's how PostgreSQL can ensure that you cannot drop one such object without dropping its extension. The extension control file admits an optional line, requires, that names one or more extensions on which the current one depends. The implementation of dependencies is still quite simple. For instance, there is no way to specify a dependency on a specific version number of other extensions, and there is no command that installs one extension and all its prerequisites. As a general PostgreSQL rule, the CASCADE keyword tells the DROP command to delete all the objects that depend on cube, the earthdistance extension in this example. There's more… Another system view, pg_available_extension_versions, shows all the versions available for each extension.
It can be valuable when there are multiple versions of the same extension available at the same time, for example, when making preparations for an extension upgrade.

When a more recent version of an already installed extension becomes available to the database server, for instance because of a distribution upgrade that installs updated package files, the superuser can perform an upgrade by issuing the following command:

ALTER EXTENSION myext UPDATE TO '1.1';

This assumes that the author of the extension taught it how to perform the upgrade.

Starting from version 9.6, the CASCADE option is also accepted by the CREATE EXTENSION syntax, with the meaning of "issue CREATE EXTENSION recursively to cover all dependencies". So, instead of creating extension cube before creating extension earthdistance, you could have just issued the following command:

postgres=# CREATE EXTENSION earthdistance CASCADE;
NOTICE: installing required extension "cube"
CREATE EXTENSION

Remember that CREATE EXTENSION … CASCADE will only work if all the extensions it tries to install have already been placed in the appropriate location.

Summary

In this article, we covered RTFM (Read The Fine Manual), planning a new database, changing parameters in your programs, changing the configuration, updating the parameter file, and using a module that is already installed, as well as managing installed extensions.

Resources for Article:

Further resources on this subject:

PostgreSQL in Action [article]
PostgreSQL Cookbook - High Availability and Replication [article]
Running a PostgreSQL Database Server [article]
Terms and Concepts Related to MDX

Packt
09 Aug 2011
8 min read
MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook More than 80 recipes for enriching your Business Intelligence solutions with high-performance MDX calculations and flexible MDX queries in this book and eBook Parts of an MDX query This section contains the brief explanation of the basic elements of MDX queries: members, sets, tuples, axes, properties, and so on. Regular members Regular dimension members are members sourced from the underlying dimension tables. They are the building blocks of dimensions, fully supported in any type of drill operations, drillthrough, scopes, subqueries, and probably all SSAS front-ends. They can have children and be organized in multilevel user hierarchies. Some of the regular member's properties can be dynamically changed using scopes in MDX script or cell calculations in queries - color, format, font, and so on. Measures are a type of regular members, found on the Measures dimension/hierarchy. The other type of members is calculated members. Calculated members Calculated members are artificial members created in a query, session, or MDX script. They do not exist in the underlying dimension table and as such are not supported in drillthrough and scopes. In subqueries, they are only supported if the connection string includes one of these settings: Subqueries = 1 or Subqueries = 2. See here for examples: http://tinyurl.com/ChrisSubquerySettings They also have a limited set of properties compared to regular members and worse support than regular members in some SSAS front-ends. An often practiced workaround is creating dummy regular members in a dimension table and then using MDX script assignments to provide the calculation for them. They are referred to as "Dummy" because they never occur in the fact table which also explains the need for assignments. Tuples A tuple is a coordinate in the multidimensional cube. That coordinate can be huge and is often such. For example, a cube with 10 dimensions each having 5 attributes is a 51 dimensional object (measures being that extra one). To fully define a coordinate we would have to reference every single attribute in that cube. Fortunately, in order to simplify their usage, tuples are allowed to be written using a part of the full coordinate only. The rest of the coordinate inside the tuple is implicitly evaluated by SSAS engine, either using the current members (for unrelated hierarchies) or through the mechanism known as strong relationships (for related hierarchies). It's worth mentioning that the initial current members are cube's default members. Any subsequent current members are derived from the current context of the query or the calculation. Evaluation of implicit members can sometimes lead to unexpected problems. We can prevent those problems by explicitly specifying all the hierarchies we want to have control over and thereby not letting the implicit evaluation to occur for those hierarchies. Contrary to members and sets, tuples are not an object that can be defined in the WITH part of the query or in MDX script. They are non-persistent. Tuples can be found in sets, during iteration or in calculations. They are often used to set or overwrite the current context, in other words, to jump out of the current context and get the value in another coordinate of the cube. Another important aspect of tuples is their dimensionality. When building a set from tuples, two or more tuples can be combined only if they are built from the same hierarchies, specified in the exact same order. 
That's their dimensionality. You should know that rearranging the order of hierarchies in a tuple doesn't change its value. Therefore, this can be the first step we can do to make the tuples compatible. The other thing is adding the current members of hierarchies present only in the other tuple, to match the other tuple's dimensionality. Named sets A named set is a user-defined collection of members, more precisely, tuples. Named sets are found in queries, sessions, and MDX scripts. Query-based named sets are equivalent to dynamic sets in MDX script. They both react to the context of subquery and slicer. Contrary to them, static sets are constant, independent of any context. Only the sets that have the same dimensionality can be combined together because what we really combine are the tuples they are built from. It is possible to extract one or more hierarchies from the set. It is also possible to expand the set by crossjoining it with hierarchies not present in its tuples. These processes are known as reducing and increasing the dimensionality of a set. Set alias Set aliases can be defined in calculations only, as a part of that calculation and not in the WITH part of the query as a named set. This is done by identifying a part of the calculation that represents a set and giving a name to that expression inside the calculation, using the AS keyword. This way that set can be used in other parts of the calculation or even other calculations of the query or MDX script. Set aliases enable true dynamic evaluation of sets in a query because they can be evaluated for each cell if used inside a calculated measure. The positive effect is that they are cached, calculated only once and used many times in the calculation or query. The downside is that they prevent block-computation mode because the above mentioned evaluation is performed for each cell individually. In short, set aliases can be used in long calculations, where the same set appears multiple times or when that set needs to be truly dynamic. At the same time, they are to be avoided in iterations of any kind. Axis An axis is a part of the query where a set is projected at. A query can have up to 128 axes although most queries have 1 or 2 axes. A query with no axis is also a valid query but almost never used. The important thing to remember is that axes are evaluated independently. SSAS engine knows in which order to calculate them if there is a dependency between them. One way to create such a dependency is to refer to the current member of a hierarchy on the other axis. The other option would be to use the Axis() function. Some SSAS front-ends generate MDX queries that break the axes dependencies established through calculations. The workaround calculations can be very hard if not impossible. Slicer The slicer, also known as the filter axis or the WHERE clause, is a part of the query which sets the context for the evaluation of members and sets on axes and in the WITH part of the query. The slicer, which can be anything from a single tuple up to the multidimensional set, interacts with sets on axes. A single member of a hierarchy in slicer forces the coordinate and reduces the related sets on axes by removing all non-existing combinations. Multiple members of the same hierarchy are not that strong. In their case, individual members in sets on axes overwrite the context of the slicer during their evaluation. Finally, the context established by the slicer can be overwritten in the calculations using tuples. 
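To make that interplay concrete, here is a small illustrative query. It borrows the Adventure Works sample cube and its usual measure and member names purely as placeholders, so treat it as a sketch and adjust the names to your own cube. The regular measure respects the year placed in the slicer, while the calculated member uses a tuple to overwrite that context with a different year:

WITH MEMBER [Measures].[Sales CY 2007] AS
    ( [Measures].[Internet Sales Amount],
      [Date].[Calendar Year].&[2007] )
SELECT
    { [Measures].[Internet Sales Amount],
      [Measures].[Sales CY 2007] } ON COLUMNS,
    [Product].[Category].[Category].MEMBERS ON ROWS
FROM [Adventure Works]
WHERE ( [Date].[Calendar Year].&[2008] )

On every row, the first column shows the 2008 value coming from the slicer, while the second column shows the 2007 value, because the tuple inside the calculation overrides the Calendar Year coordinate established by the slicer.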
Subquery The subquery, also known as the subselect, is a part of the query which executes first and determines the cube space to be used in the query. Unlike slicer, the subquery doesn't set the coordinate for the query. In other words, current members of all hierarchies (related, unrelated, and even the same hierarchy used in the subquery) remain the same. What the subquery does is it applies the VisualTotals operation on members of hierarchies used in the subquery. The VisualTotals operation changes each member's value with the aggregation value of its children, but only those present in the subquery. Because the slicer and the subquery have different behaviors, one should not be used as a replacement for the other. Whenever you need to set the context for the whole query, use the slicer. That will adjust the total for all hierarchies in the cube. If you only need to adjust the total for some hierarchies in the cube and not for the others, subquery is the way to go; specify those hierarchies in the subquery. This is also an option if you need to prevent any attribute interaction between your subquery and the query. The areas where the subquery is particularly good at are grouping of non-granular attributes, advanced set logic, and restricting members on hierarchies. Cell properties Cell properties are properties that can be used to get specific behaviors for cells. For example: colors, font sizes, types and styles, and so on. Unless explicitly asked for, only the Cell_Ordinal, Value, and Formatted_Value properties are returned by an MDX query. Dimension properties Dimension properties are a set of member properties that return extra information about members on axes. Intrinsic member properties are Key, Name, and Value; the others are those defined by the user for a particular hierarchy in the Dimension Designer. In client tools, dimension properties are often shown in the grid next to the attribute they are bound to or in the hint over that attribute.  
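As a closing illustration of the last two sections, the following query explicitly requests dimension properties for the rows and a specific list of cell properties. Once again, the cube and hierarchy names are placeholders taken from the Adventure Works sample rather than from this book, so substitute your own:

SELECT
    [Measures].[Internet Sales Amount] ON COLUMNS,
    [Product].[Category].[Category].MEMBERS
        DIMENSION PROPERTIES MEMBER_KEY, MEMBER_CAPTION ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE, FORMATTED_VALUE, FORE_COLOR

If the CELL PROPERTIES clause is omitted, only the Cell_Ordinal, Value, and Formatted_Value properties come back, as noted earlier.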
Follow the Money

Packt
28 Mar 2013
13 min read
(For more resources related to this topic, see here.) It starts with the Cost Worksheet In PCM, the Cost Worksheet is the common element for all the monetary modules. The Cost Worksheet is a spreadsheet-like module with rows and columns. The following screenshot shows a small part of a typical Cost Worksheet register: The columns are set by PCM but the rows are determined by the organization. The rows are the Cost Codes. This is your cost breakdown structure. This is the lowest level of detail with which money will be tracked in PCM. It can also be called Cost Accounts. All money entered into the documents in any of the monetary modules in PCM will be allocated to a Cost Code on the Cost Worksheet. Even if you do not specifically allocate to a Cost Code, the system will allocate to a system generated Cost Code called NOT COSTED. The NOT COSTED Cost Code is important so no money slips through the cracks. If you forget to assign money to your Cost Codes on the project it will assign the money to this code. When reviewing the Cost Worksheet, a user can review the NOT COSTED Cost Code and see if any money is associated with this code. If there is money associated with NOT COSTED, he can find that document where he has forgotten to allocate all the money to proper Cost Codes. Users cannot edit any numbers directly on the Cost Worksheet; it is a reflection of information entered on various documents on your project. This provides a high level of accountability in that no money can be entered or changed without a document entered someplace within PCM (like can be done with a spreadsheet) The Cost Code itself can be up to 30 characters in length and can be divided into segments to align with the cost breakdown structure, as shown in the following screenshot: The number of Cost Codes and the level of breakdown is typically determined by the accounting or ERP system used by your organization or it can be used as an extension of the ERP system's coding structure. When the Cost Code structure matches, integration between the two systems becomes easier. There are many other factors to consider when thinking about integrating systems but the Cost Code structure is at the core of relating the two systems. Defining the segments within the Cost Codes is done as part of the initial setup and implementation of PCM. This is done in the Cost Code Definitions screen, as shown in the following screenshot: To set up, you must tell PCM what character of the Cost Code the segment starts with and how long the segment is (the number of characters). Once this is done you can also populate a dictionary of titles for each segment. A trick used for having different segment titles for different projects is to create an identical segment dictionary but for different projects. For example, if you have a different list of Disciplines for every project, you can create and define a list of Disciplines for each project with the same starting character and length. Then you can use the proper Cost Code definitions in your layouts and reporting for that project. The following screenshot shows how this can be done: Once the Cost Codes have been defined, the Cost Worksheet will need to be populated for your project. There are various ways to accomplish this. Create a dummy project with the complete list of company Cost Codes you would ever use on a project. When you want to populate the Cost Code list on a new project, use the Copy Cost Codes function from the project tree. 
Import a list of Cost Codes that have been developed in a spreadsheet (Yes, I used the word "spreadsheet". There are times when a spreadsheet comes in handy – managing a multi-million dollar project is not one of them). PCM has an import function from the Cost Worksheet where you can import a comma-separated values (CSV) file of the Cost Codes and titles. Enter the Cost Codes one at a time from the Cost Worksheet. If there are a small number of Cost Codes, this might be the fastest and easiest method. Understanding the columns of the Cost Worksheet will help you understand how powerful and important the Cost Worksheet really is. The columns of the Cost Worksheet in PCM are set by the system. They are broken down into a few categories, as follows: Budget Commitment Custom Actuals Procurement Variances Miscellaneous Each of the categories has a corresponding color to help differentiate them when looking at the Cost Worksheet. Within each of these categories are a number of columns. The Budget, Commitment, and Custom categories have the same columns while the other categories have their own set of columns. These three categories work basically the same. They can be defined in basic terms as follows: Budget: This is the money that your company has available to spend and is going to be received by the project. Examples depend on the perspective of the setup of PCM. In the example of our cavemen Joe and David, David is the person working for Joe. If David was using PCM, the Budget category would be the amount of the agreed price between Joe and David to make the chair or the amount of money that David was going to be paid by Joe to make the chair. Committed: This is the money that has been agreed to be spent on the project, not the money that has been spent. So in our example it would be the amount of money that David has agreed to pay his subcontractors to supply him with goods and services to build the chair for him. Custom: This is a category that is available to be used by the user for another contracting type. It has its own set of columns identical to the Budget and Commitment categories. This can be used for a Funding module where you can track the amount of money funded for the project, which can be much different from the available budget for the project. Money distributed to the Trends module can be posted to many of the columns as determined by the user upon adding the Trend. The Trend document is not referenced in the following explanations. When money is entered in a document, it must be allocated or distributed to one or multiple Cost Codes. As stated before, if you forget or do not allocate the money to a Cost Code, PCM will allocate the money to the NOT COSTED Cost Code. The system knows what column to place the money in but the user must tell PCM the proper row (Cost Code). If the Status Type of a document is set to Closed or Rejected, the money is removed from the Cost Worksheet but the document is still available to be reviewed. This way only documents that are in progress or approved will be placed on the Cost Worksheet. Let's look at each of the columns individually and explain how money is posted. The only documents that affect the Cost Worksheet are as follows: Contracts (three types) Change Orders Proposals Payment Requisitions Invoices Trends Procurement Contracts Let's look at the first three categories first since they are the most complex. Following is a table of the columns associated with these categories. 
Understand that the terminology used here is the standard out of the box terminology of PCM and may not match what has been set up in your organization. The third contract type (Custom) can be turned on or off using the Project Settings. It can be used for a variety of types as it has its own set of columns in the Cost Worksheet. The Custom contract type can be used in the Change Management module; however, it utilizes the Commitment tab, which requires the user to understand exactly what category the change is related. The following tables show various columns on the Cost Worksheet starting with the Cost Code itself. The first table shows all the columns used by each of the three contract categories: Cost Worksheet Columns The columns listed above are affected by the Contracts, Purchase Orders, or any Change Document modules. Let's look at specific definitions of what document type can be posted to which column. The Original Column The Original column is used for money distributed from any of the Contract modules. If a Commitment contract is added under the Contracts – Committed module and the money is distributed to various Cost Codes (rows), the column used is the Original Commitment column in the worksheet. It's the same with the Contracts – Budgeted and Contracts – Custom modules. The Purchase Order module is posted to the Commitments category. Money can also be posted to this column for Budget and Commitment contracts from the Change Management module where a phase has been assigned this column. This is not a typical practice as the Original column should be unchangeable from the values on the original contract. The Approved Column The Approved Revisions column is used for money distributed from the Change Order module. If a Change Order is added under the Change Order module against a commitment contract and the money is distributed to various Cost Codes (rows), and the Change Order has been approved, the money on this document is posted to the Approved Commitment Revisions column in the worksheet. We will discuss what happens prior to approval later. The Revised Column The Revised column is a computed column adding the original money and the approved money. Money cannot be distributed to this column from any document in PCM. The Pending Changes Column The Pending Revisions column can be populated by several document types as follows: Change Orders: Prior to approving the Change Order document, all money associated with the Change Order document created from the Change Orders module from the point of creation will be posted to the Pending Changes column. Change Management: These are documents associated with a change management process where the change phase is associated with the Pending column. This can be from the Proposal module or the Change Order module. Proposals: These are documents created in the Proposals module either through the Change Management module or directly from the module itself. The Estimated Changes Column The Estimated Revisions column is populated from phases in Change Management that have been assigned to distribute money to this column The Adjustment Column The Adjustment column is populated from phases in Change Management that have been assigned to distribute money to this column The Projected Column The Projected column is a computed column of all columns associated with a category. This column is very powerful in understanding the potential cost at completion of this Cost Code. 
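To see how these columns roll up, consider one Cost Code in the Commitment category with purely invented figures; none of these numbers come from the product or the book, and the arithmetic simply follows the column definitions above:

Original Commitment: 100,000
Approved Commitment Revisions: 10,000
Revised Commitment: 100,000 + 10,000 = 110,000
Pending Commitment Revisions: 5,000
Estimated Commitment Revisions: 2,000
Adjustment: 0
Projected Commitment: 110,000 + 5,000 + 2,000 + 0 = 117,000

Revised reflects only what has been formally approved, while Projected also carries everything still moving through the change process, which is why it is the better early indicator of cost at completion.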
Actuals There are two columns that are associated with actual cost documents in PCM. The modules that affect these columns are as follows: Payment Requisitions Invoices These columns are the Actuals Received and Actuals Issued columns. These column names can be confusing and should be considered for change during implementation. This is the way you could look at what money these columns include. Actuals Received: This column holds money where you have received a Payment Requisition or Invoice to be paid by you. This also includes the Custom category. Actuals Issued: This column holds money where you have issued a Payment Requisition or Invoice to be paid to you. As Payment Requisitions or Invoices are being entered and the money distributed to Cost Codes, this money will be placed in one of these two columns depending on the contract relationship associated with these documents. Be aware that money is placed into these columns as soon as it is entered into Payment Requisitions or Invoices regardless of approval or certification. Procurement There are many columns relating to the Procurement module. This book does not go into details of the Procurement module. The column names related to Procurement are as follows: Procurement Estimate Original Estimate Estimate Accuracy Estimated Gross Profit Buyout Purchasing Buyout Variances There are many Variance columns that are computed columns. These columns show the variance (or difference) between other columns on the worksheet, as follows. Original Variance: The Original Budget minus the Original Commitment Approved Variance: The Revised Budget minus the Revised Commitment Pending Variance: The (Revised Budget plus Pending Budget Revisions) minus (Revised Commitment plus Pending Commitment) Projected Variance: The Projected Budget minus the Projected Commitment These columns are very powerful to help analyze relationships between the Budget category and the Commitment category. Miscellaneous There are a few miscellaneous columns as follows that are worth noting so you understand what the numbers mean: Budget Percent: This represents the percentage of the Actuals Issued column of the Revised Budget column for that Cost Code. Commitment Percentage: This represents the percentage of the Actuals Received column of the Revised Commitment column for that Cost Code. Planned to Commit: This is the planned expenditure for the Cost Code. This value can only be populated from the Details tab of the Cost Code. It is also used for an estimators value of the Cost Code. Drilling down to the detail The beauty of the Cost Worksheet is the ability to quickly review what documents have had an effect on which column on the worksheet. Look at the Cost Worksheet as a ten-thousand foot view of the money on your project. There is a lot of information that can be gleaned from this high-level review especially if you are using layouts properly. If you see some numbers that need further review, then drilling down to the detail directly from the Cost Worksheet is quite simple. To drill down to the detail, click on the Cost Code. This will bring up a new page with tabs for the different categories. Click on the tab you wish to review and the grid shows all the documents where some or all the money has been posted to this Cost Code. This page shows all columns affected by the selected category, with the rows representing each document and the corresponding value from that document that affects the selected Cost Code on the Cost Worksheet. 
From this page you can click on the link under the Item column (as shown in the previous screenshot) to open the actual document that the row represents. Summary Understanding the concepts in this article is key to understanding how the money flows within PCM. Take the time to review this information so that other articles of the book on changes, payments, and forecasting make more sense. The ability to have all aspects of the money on your project accessible from one module is extremely powerful and should be one of the modules that you refer to on a regular basis. Resources for Article : Further resources on this subject: Author Podcast - Ronald Rood discusses the birth of Oracle Scheduler [Article] Author Podcast - Bob Griesemer on Oracle Warehouse Builder 11g [Article] Oracle Integration and Consolidation Products [Article]
Security Considerations in Multitenant Environment

Packt
24 May 2016
8 min read
In this article by Zoran Pavlović and Maja Veselica, authors of the book, Oracle Database 12c Security Cookbook, we will be introduced to common privileges and learn how to grant privileges and roles commonly. We'll also study the effects of plugging and unplugging operations on users, roles, and privileges. (For more resources related to this topic, see here.) Granting privileges and roles commonly Common privilege is a privilege that can be exercised across all containers in a container database. Depending only on the way it is granted, a privilege becomes common or local. When you grant privilege commonly (across all containers) it becomes common privilege. Only common users or roles can have common privileges. Only common role can be granted commonly. Getting ready For this recipe, you will need to connect to the root container as an existing common user who is able to grant a specific privilege or existing role (in our case – create session, select any table, c##role1, c##role2) to another existing common user (c##john). If you want to try out examples in the How it works section given ahead, you should open pdb1 and pdb2. You will use: Common users c##maja and c##zoran with dba role granted commonly Common user c##john Common roles c##role1 and c##role2 How to do it... You should connect to the root container as a common user who can grant these privileges and roles (for example, c##maja or system user). SQL> connect c##maja@cdb1 Grant a privilege (for example, create session) to a common user (for example, c##john) commonly c##maja@CDB1> grant create session to c##john container=all; Grant a privilege (for example, select any table) to a common role (for example, c##role1) commonly c##maja@CDB1> grant select any table to c##role1 container=all; Grant a common role (for example, c##role1) to a common role (for example, c##role2) commonly c##maja@CDB1> grant c##role1 to c##role2 container=all; Grant a common role (for example, c##role2) to a common user (for example, c##john) commonly c##maja@CDB1> grant c##role2 to c##john container=all; How it works... Figure 16 You can grant privileges or common roles commonly only to a common user. You need to connect to the root container as a common user who is able to grant a specific privilege or role. In step 2, system privilege, create session, is granted to common user c##john commonly, by adding a container=all clause to the grant statement. This means that user c##john can connect (create session) to root or any pluggable database in this container database (including all pluggable databases that will be plugged-in in the future). N.B. container = all clause is NOT optional, even though you are connected to the root. Unlike during creation of common users and roles (if you omit container=all, user or role will be created in all containers – commonly), If you omit this clause during privilege or role grant, privilege or role will be granted locally and it can be exercised only in root container. SQL> connect c##john/oracle@cdb1 c##john@CDB1> connect c##john/oracle@pdb1 c##john@PDB1> connect c##john/oracle@pdb2 c##john@PDB2> In the step 3, system privilege, select any table, is granted to common role c##role1 commonly. This means that role c##role1 contains select any table privilege in all containers (root and pluggable databases). 
c##zoran@CDB1> select * from role_sys_privs where role='C##ROLE1'; ROLE PRIVILEGE ADM COM ------------- ----------------- --- --- C##ROLE1 SELECT ANY TABLE NO YES c##zoran@CDB1> connect c##zoran/oracle@pdb1 c##zoran@PDB1> select * from role_sys_privs where role='C##ROLE1'; ROLE PRIVILEGE ADM COM -------------- ------------------ --- --- C##ROLE1 SELECT ANY TABLE NO YES c##zoran@PDB1> connect c##zoran/oracle@pdb2 c##zoran@PDB2> select * from role_sys_privs where role='C##ROLE1'; ROLE PRIVILEGE ADM COM -------------- ---------------- --- --- C##ROLE1 SELECT ANY TABLE NO YES In step 4, common role c##role1, is granted to another common role c##role2 commonly. This means that role c##role2 has granted role c##role1 in all containers. c##zoran@CDB1> select * from role_role_privs where role='C##ROLE2'; ROLE GRANTED ROLE ADM COM --------------- --------------- --- --- C##ROLE2 C##ROLE1 NO YES c##zoran@CDB1> connect c##zoran/oracle@pdb1 c##zoran@PDB1> select * from role_role_privs where role='C##ROLE2'; ROLE GRANTED_ROLE ADM COM ------------- ----------------- --- --- C##ROLE2 C##ROLE1 NO YES c##zoran@PDB1> connect c##zoran/oracle@pdb2 c##zoran@PDB2> select * from role_role_privs where role='C##ROLE2'; ROLE GRANTED_ROLE ADM COM ------------- ------------- --- --- C##ROLE2 C##ROLE1 NO YES In step 5, common role c##role2, is granted to common user c##john commonly. This means that user c##john has c##role2 in all containers. Consequently, user c##john can use select any table privilege in all containers in this container database. c##john@CDB1> select count(*) from c##zoran.t1; COUNT(*) ---------- 4 c##john@CDB1> connect c##john/oracle@pdb1 c##john@PDB1> select count(*) from hr.employees; COUNT(*) ---------- 107 c##john@PDB1> connect c##john/oracle@pdb2 c##john@PDB2> select count(*) from sh.sales; COUNT(*) ---------- 918843 Effects of plugging/unplugging operations on users, roles, and privileges Purpose of this recipe is to show what is going to happen to users, roles, and privileges when you unplug a pluggable database from one container database (cdb1) and plug it into some other container database (cdb2). Getting ready To complete this recipe, you will need: Two container databases (cdb1 and cdb2) One pluggable database (pdb1) in container database cdb1 Local user mike in pluggable database pdb1 with local create session privilege Common user c##john with create session common privilege and create synonym local privilege on pluggable database pdb1 How to do it... Connect to the root container of cdb1 as user sys: SQL> connect sys@cdb1 as sysdba Unplug pdb1 by creating XML metadata file: SQL> alter pluggable database pdb1 unplug into '/u02/oradata/pdb1.xml'; Drop pdb1 and keep datafiles: SQL> drop pluggable database pdb1 keep datafiles; Connect to the root container of cdb2 as user sys: SQL> connect sys@cdb2 as sysdba Create (plug) pdb1 to cdb2 by using previously created metadata file: SQL> create pluggable database pdb1 using '/u02/oradata/pdb1.xml' nocopy; How it works... By completing previous steps, you unplugged pdb1 from cdb1 and plugged it into cdb2. After this operation, all local users and roles (in pdb1) are migrated with pdb1 database. If you try to connect to pdb1 as a local user: SQL> connect mike@pdb1 It will succeed. All local privileges are migrated, even if they are granted to common users/roles. 
However, if you try to connect to pdb1 as a previously created common user c##john, you'll get an error SQL> connect c##john@pdb1 ERROR: ORA-28000: the account is locked Warning: You are no longer connected to ORACLE. This happened because after migration, common users are migrated in a pluggable database as locked accounts. You can continue to use objects in these users' schemas, or you can create these users in root container of a new CDB. To do this, we first need to close pdb1: sys@CDB2> alter pluggable database pdb1 close; Pluggable database altered. sys@CDB2> create user c##john identified by oracle container=all; User created. sys@CDB2> alter pluggable database pdb1 open; Pluggable database altered. If we try to connect to pdb1 as user c##john, we will get an error: SQL> conn c##john/oracle@pdb1 ERROR: ORA-01045: user C##JOHN lacks CREATE SESSION privilege; logon denied Warning: You are no longer connected to ORACLE. Even though c##john had create session common privilege in cdb1, he cannot connect to the migrated PDB. This is because common privileges are not migrated! So we need to give create session privilege (either common or local) to user c##john. sys@CDB2> grant create session to c##john container=all; Grant succeeded. Let's try granting a create synonym local privilege to the migrated pdb2: c##john@PDB1> create synonym emp for hr.employees; Synonym created. This proves that local privileges are always migrated. Summary In this article, we learned about common privileges and the methods to grant common privileges and roles to users. We also studied what happens to users, roles, and privileges when you unplug a pluggable database from one container database and plug it into some other container database. Resources for Article: Further resources on this subject: Oracle 12c SQL and PL/SQL New Features[article] Oracle GoldenGate 12c — An Overview[article] Backup and Recovery for Oracle SOA Suite 12C[article]
Creating reusable actions for agent behaviors with Lua

Packt
27 Nov 2014
18 min read
In this article by David Young, author of Learning Game AI Programming with Lua, we will create reusable actions for agent behaviors. (For more resources related to this topic, see here.) Creating userdata So far we've been using global data to store information about our agents. As we're going to create decision structures that require information about our agents, we'll create a local userdata table variable that contains our specific agent data as well as the agent controller in order to manage animation handling: local userData = {    alive, -- terminal flag    agent, -- Sandbox agent    ammo, -- current ammo    controller, -- Agent animation controller    enemy, -- current enemy, can be nil    health, -- current health    maxHealth -- max Health }; Moving forward, we will encapsulate more and more data as a means of isolating our systems from global variables. A userData table is perfect for storing any arbitrary piece of agent data that the agent doesn't already possess and provides a common storage area for data structures to manipulate agent data. So far, the listed data members are some common pieces of information we'll be storing; when we start creating individual behaviors, we'll access and modify this data. Agent actions Ultimately, any decision logic or structure we create for our agents comes down to deciding what action our agent should perform. Actions themselves are isolated structures that will be constructed from three distinct states: Uninitialized Running Terminated The typical lifespan of an action begins in uninitialized state and will then become initialized through a onetime initialization, and then, it is considered to be running. After an action completes the running phase, it moves to a terminated state where cleanup is performed. Once the cleanup of an action has been completed, actions are once again set to uninitialized until they wait to be reactivated. We'll start defining an action by declaring the three different states in which actions can be as well as a type specifier, so our data structures will know that a specific Lua table should be treated as an action. Remember, even though we use Lua in an object-oriented manner, Lua itself merely creates each instance of an object as a primitive table. It is up to the code we write to correctly interpret different tables as different objects. The use of a Type variable that is moving forward will be used to distinguish one class type from another. Action.lua: Action = {};   Action.Status = {    RUNNING = "RUNNING",    TERMINATED = "TERMINATED",    UNINITIALIZED = "UNINITIALIZED" };   Action.Type = "Action"; Adding data members To create an action, we'll pass three functions that the action will use for the initialization, updating, and cleanup. Additional information such as the name of the action and a userData variable, used for passing information to each callback function, is passed in during the construction time. Moving our systems away from global data and into instanced object-oriented patterns requires each instance of an object to store its own data. As our Action class is generic, we use a custom data member, which is userData, to store action-specific information. Whenever a callback function for the action is executed, the same userData table passed in during the construction time will be passed into each function. The update callback will receive an additional deltaTimeInMillis parameter in order to perform any time specific update logic. 
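Before fleshing out the class itself, here is a minimal sketch of how an owning system might push a single action through that lifecycle each frame. The TickAction helper below is not part of the book's code; it is an assumed driver that leans on the Initialize, Update, and CleanUp member functions we are about to write:

-- Hypothetical driver: advances one action through
-- uninitialized -> running -> terminated, then resets it.
local function TickAction(action, deltaTimeInMillis)
    if (action.status_ == Action.Status.UNINITIALIZED) then
        action:Initialize();
    end

    local status = action:Update(deltaTimeInMillis);

    if (status == Action.Status.TERMINATED) then
        action:CleanUp();
    end

    return status;
end

A decision structure higher up would normally decide which action to tick; the point here is only the flow between the uninitialized, running, and terminated states.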
To flush out the Action class' constructor function, we'll store each of the callback functions as well as initialize some common data members: Action.lua: function Action.new(name, initializeFunction, updateFunction,        cleanUpFunction, userData)      local action = {};       -- The Action's data members.    action.cleanUpFunction_ = cleanUpFunction;    action.initializeFunction_ = initializeFunction;    action.updateFunction_ = updateFunction;    action.name_ = name or "";    action.status_ = Action.Status.UNINITIALIZED;    action.type_ = Action.Type;    action.userData_ = userData;           return action; end Initializing an action Initializing an action begins by calling the action's initialize callback and then immediately sets the action into a running state. This transitions the action into a standard update loop that is moving forward: Action.lua: function Action.Initialize(self)    -- Run the initialize function if one is specified.    if (self.status_ == Action.Status.UNINITIALIZED) then        if (self.initializeFunction_) then            self.initializeFunction_(self.userData_);        end    end       -- Set the action to running after initializing.    self.status_ = Action.Status.RUNNING; end Updating an action Once an action has transitioned to a running state, it will receive callbacks to the update function every time the agent itself is updated, until the action decides to terminate. To avoid an infinite loop case, the update function must return a terminated status when a condition is met; otherwise, our agents will never be able to finish the running action. An update function isn't a hard requirement for our actions, as actions terminate immediately by default if no callback function is present. Action.lua: function Action.Update(self, deltaTimeInMillis)    if (self.status_ == Action.Status.TERMINATED) then        -- Immediately return if the Action has already        -- terminated.        return Action.Status.TERMINATED;    elseif (self.status_ == Action.Status.RUNNING) then        if (self.updateFunction_) then            -- Run the update function if one is specified.                      self.status_ = self.updateFunction_(                deltaTimeInMillis, self.userData_);              -- Ensure that a status was returned by the update            -- function.            assert(self.status_);        else            -- If no update function is present move the action            -- into a terminated state.            self.status_ = Action.Status.TERMINATED;        end    end      return self.status_; end Action cleanup Terminating an action is very similar to initializing an action, and it sets the status of the action to uninitialized once the cleanup callback has an opportunity to finish any processing of the action. If a cleanup callback function isn't defined, the action will immediately move to an uninitialized state upon cleanup. During action cleanup, we'll check to make sure the action has fully terminated, and then run a cleanup function if one is specified. 
Action.lua: function Action.CleanUp(self)    if (self.status_ == Action.Status.TERMINATED) then        if (self.cleanUpFunction_) then            self.cleanUpFunction_(self.userData_);        end    end       self.status_ = Action.Status.UNINITIALIZED; end Action member functions Now that we've created the basic, initialize, update, and terminate functionalities, we can update our action constructor with CleanUp, Initialize, and Update member functions: Action.lua: function Action.new(name, initializeFunction, updateFunction,        cleanUpFunction, userData)       ...      -- The Action's accessor functions.    action.CleanUp = Action.CleanUp;    action.Initialize = Action.Initialize;    action.Update = Action.Update;       return action; end Creating actions With a basic action class out of the way, we can start implementing specific action logic that our agents can use. Each action will consist of three callback functions—initialization, update, and cleanup—that we'll use when we instantiate our action instances. The idle action The first action we'll create is the basic and default choice from our agents that are going forward. The idle action wraps the IDLE animation request to our soldier's animation controller. As the animation controller will continue looping our IDLE command until a new command is queued, we'll time our idle action to run for 2 seconds, and then terminate it to allow another action to run: SoldierActions.lua: function SoldierActions_IdleCleanUp(userData)    -- No cleanup is required for idling. end   function SoldierActions_IdleInitialize(userData)    userData.controller:QueueCommand(        userData.agent,        SoldierController.Commands.IDLE);           -- Since idle is a looping animation, cut off the idle    -- Action after 2 seconds.    local sandboxTimeInMillis = Sandbox.GetTimeInMillis(        userData.agent:GetSandbox());    userData.idleEndTime = sandboxTimeInMillis + 2000; end Updating our action requires that we check how much time has passed; if the 2 seconds have gone by, we terminate the action by returning the terminated state; otherwise, we return that the action is still running: SoldierActions.lua: function SoldierActions_IdleUpdate(deltaTimeInMillis, userData)    local sandboxTimeInMillis = Sandbox.GetTimeInMillis(        userData.agent:GetSandbox());    if (sandboxTimeInMillis >= userData.idleEndTime) then        userData.idleEndTime = nil;        return Action.Status.TERMINATED;    end    return Action.Status.RUNNING; end As we'll be using our idle action numerous times, we'll create a wrapper around initializing our action based on our three functions: SoldierLogic.lua: local function IdleAction(userData)    return Action.new(        "idle",        SoldierActions_IdleInitialize,        SoldierActions_IdleUpdate,        SoldierActions_IdleCleanUp,        userData); end The die action Creating a basic death action is very similar to our idle action. In this case, as death in our animation controller is a terminating state, all we need to do is request that the DIE command be immediately executed. From this point, our die action is complete, and it's the responsibility of a higher-level system to stop any additional processing of logic behavior. Typically, our agents will request this state when their health drops to zero. 
In the special case that our agent dies due to falling, the soldier's animation controller will manage the correct animation playback and set the soldier's health to zero: SoldierActions.lua: function SoldierActions_DieCleanUp(userData)    -- No cleanup is required for death. end   function SoldierActions_DieInitialize(userData)    -- Issue a die command and immediately terminate.    userData.controller:ImmediateCommand(        userData.agent,        SoldierController.Commands.DIE);      return Action.Status.TERMINATED; end   function SoldierActions_DieUpdate(deltaTimeInMillis, userData)    return Action.Status.TERMINATED; end Creating a wrapper function to instantiate a death action is identical to our idle action: SoldierLogic.lua: local function DieAction(userData)    return Action.new(        "die",        SoldierActions_DieInitialize,        SoldierActions_DieUpdate,        SoldierActions_DieCleanUp,        userData); end The reload action Reloading is the first action that requires an animation to complete before we can consider the action complete, as the behavior will refill our agent's current ammunition count. As our animation controller is queue-based, the action itself never knows how many commands must be processed before the reload command has finished executing. To account for this during the update loop of our action, we wait till the command queue is empty, as the reload action will be the last command that will be added to the queue. Once the queue is empty, we can terminate the action and allow the cleanup function to award the ammo: SoldierActions.lua: function SoldierActions_ReloadCleanUp(userData)    userData.ammo = userData.maxAmmo; end   function SoldierActions_ReloadInitialize(userData)    userData.controller:QueueCommand(        userData.agent,        SoldierController.Commands.RELOAD);    return Action.Status.RUNNING; end   function SoldierActions_ReloadUpdate(deltaTimeInMillis, userData)    if (userData.controller:QueueLength() > 0) then        return Action.Status.RUNNING;    end       return Action.Status.TERMINATED; end SoldierLogic.lua: local function ReloadAction(userData)    return Action.new(        "reload",        SoldierActions_ReloadInitialize,        SoldierActions_ReloadUpdate,        SoldierActions_ReloadCleanUp,        userData); end The shoot action Shooting is the first action that directly interacts with another agent. In order to apply damage to another agent, we need to modify how the soldier's shots deal with impacts. When the soldier shot bullets out of his rifle, we added a callback function to handle the cleanup of particles; now, we'll add an additional functionality in order to decrement an agent's health if the particle impacts an agent: Soldier.lua: local function ParticleImpact(sandbox, collision)    Sandbox.RemoveObject(sandbox, collision.objectA);       local particleImpact = Core.CreateParticle(        sandbox, "BulletImpact");    Core.SetPosition(particleImpact, collision.pointA);    Core.SetParticleDirection(        particleImpact, collision.normalOnB);      table.insert(        impactParticles,        { particle = particleImpact, ttl = 2.0 } );       if (Agent.IsAgent(collision.objectB)) then        -- Deal 5 damage per shot.        Agent.SetHealth(            collision.objectB,            Agent.GetHealth(collision.objectB) - 5);    end end Creating the shooting action requires more than just queuing up a shoot command to the animation controller. 
As the SHOOT command loops, we'll queue an IDLE command immediately afterward so that the shoot action will terminate after a single bullet is fired. To have a chance at actually shooting an enemy agent, though, we first need to orient our agent to face toward its enemy. During the normal update loop of the action, we will forcefully set the agent to point in the enemy's direction. Forcefully setting the agent's forward direction during an action will allow our soldier to shoot but creates a visual artifact where the agent will pop to the correct forward direction. See whether you can modify the shoot action's update to interpolate to the correct forward direction for better visual results. SoldierActions.lua: function SoldierActions_ShootCleanUp(userData)    -- No cleanup is required for shooting. end   function SoldierActions_ShootInitialize(userData)    userData.controller:QueueCommand(        userData.agent,        SoldierController.Commands.SHOOT);    userData.controller:QueueCommand(        userData.agent,        SoldierController.Commands.IDLE);       return Action.Status.RUNNING; end   function SoldierActions_ShootUpdate(deltaTimeInMillis, userData)    -- Point toward the enemy so the Agent's rifle will shoot    -- correctly.    local forwardToEnemy = userData.enemy:GetPosition() –        userData.agent:GetPosition();    Agent.SetForward(userData.agent, forwardToEnemy);      if (userData.controller:QueueLength() > 0) then        return Action.Status.RUNNING;    end      -- Subtract a single bullet per shot.    userData.ammo = userData.ammo - 1;    return Action.Status.TERMINATED; end SoldierLogic.lua: local function ShootAction(userData)    return Action.new(        "shoot",        SoldierActions_ShootInitialize,        SoldierActions_ShootUpdate,        SoldierActions_ShootCleanUp,        userData); end The random move action Randomly moving is an action that chooses a random point on the navmesh to be moved to. This action is very similar to other actions that move, except that this action doesn't perform the moving itself. Instead, the random move action only chooses a valid point to move to and requires the move action to perform the movement: SoldierActions.lua: function SoldierActions_RandomMoveCleanUp(userData)   end   function SoldierActions_RandomMoveInitialize(userData)    local sandbox = userData.agent:GetSandbox();      local endPoint = Sandbox.RandomPoint(sandbox, "default");    local path = Sandbox.FindPath(        sandbox,        "default",        userData.agent:GetPosition(),        endPoint);       while #path == 0 do        endPoint = Sandbox.RandomPoint(sandbox, "default");        path = Sandbox.FindPath(            sandbox,            "default",            userData.agent:GetPosition(),            endPoint);    end       userData.agent:SetPath(path);    userData.agent:SetTarget(endPoint);    userData.movePosition = endPoint;       return Action.Status.TERMINATED; end   function SoldierActions_RandomMoveUpdate(userData)    return Action.Status.TERMINATED; end SoldierLogic.lua: local function RandomMoveAction(userData)    return Action.new(        "randomMove",        SoldierActions_RandomMoveInitialize,        SoldierActions_RandomMoveUpdate,        SoldierActions_RandomMoveCleanUp,        userData); end The move action Our movement action is similar to an idle action, as the agent's walk animation will loop infinitely. In order for the agent to complete a move action, though, the agent must reach within a certain distance of its target position or timeout. 
In this case, we can use 1.5 meters, as that's close enough to the target position to terminate the move action and half a second to indicate how long the move action can run for: SoldierActions.lua: function SoldierActions_MoveToCleanUp(userData)    userData.moveEndTime = nil; end   function SoldierActions_MoveToInitialize(userData)    userData.controller:QueueCommand(        userData.agent,        SoldierController.Commands.MOVE);       -- Since movement is a looping animation, cut off the move    -- Action after 0.5 seconds.    local sandboxTimeInMillis =        Sandbox.GetTimeInMillis(userData.agent:GetSandbox());    userData.moveEndTime = sandboxTimeInMillis + 500;      return Action.Status.RUNNING; end When applying the move action onto our agents, the indirect soldier controller will manage all animation playback and steer our agent along their path. The agent moving to a random position Setting a time limit for the move action will still allow our agents to move to their final target position, but gives other actions a chance to execute in case the situation has changed. Movement paths can be long, and it is undesirable to not handle situations such as death until the move action has terminated: SoldierActions.lua: function SoldierActions_MoveToUpdate(deltaTimeInMillis, userData)    -- Terminate the action after the allotted 0.5 seconds. The    -- decision structure will simply repath if the Agent needs    -- to move again.    local sandboxTimeInMillis =        Sandbox.GetTimeInMillis(userData.agent:GetSandbox());  if (sandboxTimeInMillis >= userData.moveEndTime) then        userData.moveEndTime = nil;        return Action.Status.TERMINATED;    end      path = userData.agent:GetPath();    if (#path ~= 0) then        offset = Vector.new(0, 0.05, 0);        DebugUtilities_DrawPath(            path, false, offset, DebugUtilities.Orange);        Core.DrawCircle(            path[#path] + offset, 1.5, DebugUtilities.Orange);    end      -- Terminate movement is the Agent is close enough to the    -- target.  if (Vector.Distance(userData.agent:GetPosition(),        userData.agent:GetTarget()) < 1.5) then          Agent.RemovePath(userData.agent);        return Action.Status.TERMINATED;    end      return Action.Status.RUNNING; end SoldierLogic.lua: local function MoveAction(userData)    return Action.new(        "move",        SoldierActions_MoveToInitialize,        SoldierActions_MoveToUpdate,        SoldierActions_MoveToCleanUp,        userData); end Summary In this article, we have taken a look at creating userdata and reuasable actions. Resources for Article: Further resources on this subject: Using Sprites for Animation [Article] Installing Gideros [Article] CryENGINE 3: Breaking Ground with Sandbox [Article]