04 Mar 2015

12 min read

Your first FuelPHP application in 7 easy steps

In this article by Sébastien Drouyer, author of the book FuelPHP Application Development Blueprints, we will see that FuelPHP is an open source PHP framework built on the latest technologies. Its large community regularly creates and improves packages and extensions, and the framework's core is constantly evolving. As a result, FuelPHP is a very complete solution for developing web applications.

In this article, we will also see how easy it is for developers to create their first website using the PHP oil utility.

The target application

Suppose you are a zoo manager and you want to keep track of the monkeys you are looking after. For each monkey, you want to save:

- Its name
- Whether it is still in the zoo
- Its height
- A description field where you can enter custom information

You want a very simple interface with five major features. You want to be able to:

- Create new monkeys
- Edit existing ones
- List all monkeys
- View a detailed file for each monkey
- Delete monkeys

These five features, very common in computer applications, are the basic Create, Read, Update and Delete (CRUD) operations.

Installing the environment

The FuelPHP framework needs the following three components:

- Web server: the most common solution is Apache
- PHP interpreter: version 5.3 or above
- Database: we will use the most popular one, MySQL

The installation and configuration procedures for these components depend on the operating system you use. Below are some directions to get you started in case you are not used to installing a development environment. Please note that these are very generic guidelines; feel free to search the web for more information, as there are countless resources on the topic.

Windows

A complete and very popular solution is to install WAMP. This installs Apache, MySQL and PHP; in other words, everything you need to get started.
WAMP can be downloaded from the following URL: http://www.wampserver.com/en/

Mac

PHP and Apache are generally installed on the latest versions of the OS, so you just have to install MySQL. To do that, we recommend reading the official documentation: http://dev.mysql.com/doc/refman/5.1/en/macosx-installation.html

A very convenient solution for those of you with little system administration experience is to install MAMP, the Mac equivalent of WAMP. It can be downloaded from the following URL: http://www.mamp.info/en/downloads/

Ubuntu

As this is the most popular Linux distribution, we will limit our Linux instructions to Ubuntu. You can install a complete environment by executing the following commands:

    # Apache, MySQL, PHP
    sudo apt-get install lamp-server^
    # phpMyAdmin lets you administer your MySQL databases
    sudo apt-get install phpmyadmin
    # Curl is useful for making web requests
    sudo apt-get install curl libcurl3 libcurl3-dev php5-curl
    # Enable the rewrite module, which FuelPHP needs
    sudo a2enmod rewrite
    # Restart Apache to apply the new configuration
    sudo service apache2 restart

Getting the FuelPHP framework

There are four common ways to download FuelPHP:

- Downloading and unzipping the compressed package, which can be found on the FuelPHP website.
- Executing the FuelPHP quick command-line installer.
- Downloading and installing FuelPHP using Composer.
- Cloning the FuelPHP GitHub repository. This is a little more complicated, but it allows you to select exactly the version (or even the commit) you want to install.

The easiest way is to download and unzip the compressed package located at: http://fuelphp.com/files/download/28

You can get more information about this step in Chapter 1 of FuelPHP Application Development Blueprints, which can be accessed freely.
The download step is also well documented on the installation instructions page of the FuelPHP website: http://fuelphp.com/docs/installation/instructions.html

Installation directory and Apache configuration

Now that you know how to install FuelPHP in a given directory, we will explain where to install it and how to configure Apache.

The simplest way

The simplest way is to install FuelPHP in the root folder of your web server (generally the /var/www directory on Linux systems). If you install Fuel in the DIR directory inside the root folder (/var/www/DIR), you will be able to access your project at the following URL:

http://localhost/DIR/public/

However, be warned that Fuel was not designed to be deployed this way: if you publish your project like this on a production server, it will introduce security issues that you will have to handle. In production, we recommend the second approach, described in the next section, although if you plan to publish your project on a shared host, for instance, you might not have a choice. Complete and up-to-date documentation about this issue can be found on the Fuel installation instructions page: http://fuelphp.com/docs/installation/instructions.html

By setting up a virtual host

Another way is to create a virtual host to access your application. You will need a *nix environment and a little more Apache and system administration skill, but the benefit is that it is more secure and you will be able to choose your working directory. You will need to change two files:

- Your Apache virtual host file(s), in order to link a virtual host to your application
- Your system hosts file, in order to redirect the desired URL to your virtual host

In both cases, the location of these files depends heavily on your operating system and server environment, so you will have to figure it out yourself (if you are using a common configuration, you will have no trouble finding instructions on the web).
In the following example, we will set up your system to call your application when the my.app URL is requested on your local environment.

First, edit the virtual host file(s) and add the following code at the end:

    <VirtualHost *:80>
        ServerName my.app
        DocumentRoot YOUR_APP_PATH/public
        SetEnv FUEL_ENV "development"
        <Directory YOUR_APP_PATH/public>
            DirectoryIndex index.php
            AllowOverride All
            Order allow,deny
            Allow from all
        </Directory>
    </VirtualHost>

The Order and Allow lines use Apache 2.2 access-control syntax; on Apache 2.4, the equivalent directive is Require all granted.

Then, open your system hosts file and add the following line at the end:

    127.0.0.1 my.app

Depending on your environment, you might need to restart Apache after that. You can now access your website at the following URL: http://my.app/

Checking that everything works

Whether or not you used a virtual host, the FuelPHP welcome page should now appear when you access your website.

Congratulations! You have successfully installed the FuelPHP framework. The welcome page shows some recommended directions for continuing your project.

Database configuration

As we will store our monkeys in a MySQL database, it is time to configure FuelPHP to use our local database. If you open fuel/app/config/db.php, all you will see is an empty array, but this configuration file is merged with fuel/app/config/ENV/db.php, ENV being the current Fuel environment, which in this case is development. You should therefore open fuel/app/config/development/db.php:

    <?php
    //...
    return array(
        'default' => array(
            'connection' => array(
                'dsn'      => 'mysql:host=localhost;dbname=fuel_dev',
                'username' => 'root',
                'password' => 'root',
            ),
        ),
    );

You should adapt this array to your local configuration, particularly the database name (currently set to fuel_dev), the username, and the password. You must create your project's database manually.

Scaffolding

Now that the database configuration is set, we can generate a scaffold. We will use the generate feature of the oil utility for that. Open your command-line utility and go to your website's root directory.
To generate a scaffold for a new model, enter the following line:

    php oil generate scaffold/crud MODEL ATTR_1:TYPE_1 ATTR_2:TYPE_2 ...

Where:

- MODEL is the model name
- ATTR_1, ATTR_2… are the names of the model's attributes
- TYPE_1, TYPE_2… are the types of each attribute

In our case, it should be:

    php oil generate scaffold/crud monkey name:string still_here:bool height:float description:text

Here we are telling oil to generate a scaffold for the monkey model with the following attributes:

- name: The name of the monkey. Its type is string and the associated MySQL column type will be VARCHAR(255).
- still_here: Whether or not the monkey is still in the facility. Its type is boolean and the associated MySQL column type will be TINYINT(1).
- height: The height of the monkey. Its type is float and its associated MySQL column type will be FLOAT.
- description: A description of the monkey. Its type is text and its associated MySQL column type will be TEXT.

You can do much more with the oil generate feature, such as generating models, controllers, migrations, tasks, packages and so on. We will see some of these in the FuelPHP Application Development Blueprints book, and we also recommend taking a look at the official documentation: http://fuelphp.com/docs/packages/oil/generate.html

When you press Enter, you will see the following lines appear:

    Creating migration: APPPATH/migrations/001_create_monkeys.php
    Creating model: APPPATH/classes/model/monkey.php
    Creating controller: APPPATH/classes/controller/monkey.php
    Creating view: APPPATH/views/monkey/index.php
    Creating view: APPPATH/views/monkey/view.php
    Creating view: APPPATH/views/monkey/create.php
    Creating view: APPPATH/views/monkey/edit.php
    Creating view: APPPATH/views/monkey/_form.php
    Creating view: APPPATH/views/template.php

Here, APPPATH is the fuel/app directory inside your website directory.
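To make the ATTR:TYPE syntax concrete, the following Python sketch splits the attribute arguments from the command above and looks up the MySQL column types listed for them. It is a hypothetical model, not oil's actual code (oil is written in PHP): the names parse_attributes and OIL_TYPE_TO_MYSQL are made up, and the mapping covers only the four types used in this tutorial.

```python
# Column types listed above for each oil attribute type.
# Assumption: only the four types used in this tutorial are covered.
OIL_TYPE_TO_MYSQL = {
    "string": "VARCHAR(255)",
    "bool": "TINYINT(1)",
    "float": "FLOAT",
    "text": "TEXT",
}


def parse_attributes(args):
    """Split 'name:type' arguments into (name, MySQL column type) pairs."""
    columns = []
    for arg in args:
        name, attr_type = arg.split(":", 1)
        columns.append((name, OIL_TYPE_TO_MYSQL[attr_type]))
    return columns


args = "name:string still_here:bool height:float description:text".split()
for name, column_type in parse_attributes(args):
    print(name + " " + column_type)
```

Each attribute keeps its name and gains a column type, which is the information the generated migration needs to create the monkeys table.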
Oil has generated nine files for us:

- A migration file, containing all the information needed to create the model's associated table
- The model
- A controller
- Five view files and a template file

More explanation about these files and how they interact with each other can be found in Chapter 1 of the FuelPHP Application Development Blueprints book, which is freely available. For those of you who are not yet familiar with MVC and HMVC frameworks, don't worry; the chapter contains an introduction to the most important concepts.

Migrating

One of the generated files was APPPATH/migrations/001_create_monkeys.php. It is a migration file and contains the information required to create our monkey table. Notice that the name is structured as VER_NAME, where VER is the version number and NAME is the name of the migration.

If you execute the following command line:

    php oil refine migrate

all migration files that have not yet been executed will be executed, from the oldest version to the latest (001, 002, 003, and so on). Once all files are executed, oil will display the latest version number.

Once the migration has run, if you take a look at your database, you will observe that not one, but two tables have been created:

- monkeys: As expected, a table has been created to handle your monkeys. Notice that the table name is the plural form of the word we typed when generating the scaffold; this transformation is done internally using the Inflector::pluralize method. The table contains the specified columns (name, still_here, height and description) and the id column, but also created_at and updated_at. These two columns store the time an object was created and last updated, respectively, and are added by default each time you generate a model. It is, however, possible to skip them with the --no-timestamp argument.
- migration: This other table was created automatically. It keeps track of the migrations that have been executed.
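The ordering rule can be modeled with a short Python sketch: take the migration files not yet recorded as executed, sort them by their numeric VER prefix, run them, and record each one. This is a simplified, hypothetical model rather than FuelPHP's implementation; the executed set stands in for the migration table, and 002_add_cages.php is an invented second migration used only to show the ordering.

```python
def version(filename):
    """Return the numeric VER part of a VER_NAME migration file name."""
    return int(filename.split("_", 1)[0])


def pending_migrations(files, executed):
    """Return the files not yet executed, oldest version first."""
    return sorted((f for f in files if f not in executed), key=version)


def migrate(files, executed):
    """Run pending migrations in order, recording each one (simulated)."""
    ran = []
    for filename in pending_migrations(files, executed):
        # A real migration would create or alter tables at this point.
        executed.add(filename)
        ran.append(filename)
    return ran


# Only 001_create_monkeys.php comes from the tutorial; 002 is hypothetical.
files = ["002_add_cages.php", "001_create_monkeys.php"]
executed = set()

print(migrate(files, executed))  # ['001_create_monkeys.php', '002_add_cages.php']
print(migrate(files, executed))  # [] because everything is already recorded
```

Sorting on the integer value rather than the raw string keeps the order correct even if versions were not zero-padded; running the command a second time does nothing because every file is already recorded.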
If you look at the migration table's content, you will see that it already contains one row: the migration you just executed. Notice that the row records not only the name of the migration, but also a type and a name. This is because migration files can be placed in several locations, such as modules or packages.

The oil utility allows you to do much more. Don't hesitate to take a look at the official documentation: http://fuelphp.com/docs/packages/oil/intro.html

Or, again, read Chapter 1 of FuelPHP Application Development Blueprints, which is available for free.

Using your application

Now that we have generated the code and migrated the database, our application is ready to be used. Request the following URL:

- If you created a virtual host: http://my.app/monkey
- Otherwise (don't forget to replace DIR): http://localhost/DIR/public/monkey

As you can see, this webpage is intended to display the list of all monkeys, but since none have been added yet, the list is empty. So let's add a new monkey by clicking on the Add new Monkey button.

A form should appear where you can enter your monkey's information. The form is certainly not perfect (for instance, the Still here field uses a standard input, although a checkbox would be more appropriate), but it is a great start. All we will have to do is refine the code a little.

Once you have added several monkeys, you can take another look at the listing page. Again, this is a great start, though we might want to refine it. Each item on the list has three associated actions: View, Edit, and Delete. Let's first click on View, which displays the monkey's details.

Again, this is a great start, though we will refine this webpage. You can return to the listing by clicking on Back, or edit the monkey by clicking on Edit. Whether it is accessed from the listing page or the view page, Edit displays the same form as when creating a new monkey, except that the form is, of course, prefilled.
Finally, if you click on Delete, a confirmation box will appear to guard against accidental clicks.

Want to learn more?

Don't hesitate to check out Chapter 1 of FuelPHP Application Development Blueprints, which is freely available on Packt Publishing's website. In this chapter, you will find a more thorough introduction to FuelPHP, and we show how to improve this first application. We also recommend exploring the FuelPHP website, which contains a lot of useful information and excellent documentation: http://www.fuelphp.com

There is much more to discover about this wonderful framework.

Summary

In this article, we learned how to install the FuelPHP environment, where to install the framework and how to configure Apache for it, and how to generate, migrate, and use a first CRUD application with oil.

Joe Masilotti

04 Mar 2015

8 min read

Test Driving UITableViews with Cedar

One of the first things a developer does when learning iOS development is to display a list of items to the user. In iOS we use UITableViews to show one-dimensional tables of information. In practice they look like a long list of data and should be used that way. UITableViews get their information from a UITableViewDataSource, which answers two main questions: how many cells there are and what information each cell contains.

This post is a step-by-step guide to test driving UITableViews in iOS. All code samples use the behavior-driven testing framework Cedar. Cedar can be installed as a CocoaPod by adding the following to your Podfile:

    target 'Specs' do
      pod 'Cedar'
    end

Follow this guide for installation and configuration instructions if you are having trouble or want a crash course on the framework.

Unit-Style Approach

One way to test table views is to follow a unit-style approach on the data source. The goal there is to call single public methods and assert that the correct state was altered or that the return value was configured correctly.

The target for unit testing a UITableView is its dataSource, the object conforming to UITableViewDataSource. The tests for this are fairly straightforward, as they call -tableView:cellForRowAtIndexPath: and -tableView:numberOfRowsInSection: directly.

For example, let's say we want our controller to display a table with the current list of iPhones. Our mental assertions are that this table should show a single section with nine items, one for each of the iPhone, iPhone 3G, iPhone 3GS, iPhone 4, iPhone 4s, iPhone 5, iPhone 5s, iPhone 6, and iPhone 6 Plus.

The unit tests will follow a very similar pattern. Since a table defaults to one section, we don't need to write a test asserting the number of sections. We can just test that there are nine cells and assume that, if the first and last cells' text is correct, everything is working.
    describe(@"ViewController", ^{
        __block ViewController *subject;

        beforeEach(^{
            subject = [[ViewController alloc] init];
        });

        describe(@"-tableView:numberOfRowsInSection:", ^{
            it(@"should have nine cells", ^{
                [subject tableView:subject.tableView numberOfRowsInSection:0] should equal(9);
            });
        });

        describe(@"-tableView:cellForRowAtIndexPath:", ^{
            __block UITableViewCell *cell;

            context(@"the first cell", ^{
                beforeEach(^{
                    NSIndexPath *indexPath = [NSIndexPath indexPathForRow:0 inSection:0];
                    cell = [subject tableView:subject.tableView cellForRowAtIndexPath:indexPath];
                });

                it(@"should display 'iPhone'", ^{
                    cell.textLabel.text should equal(@"iPhone");
                });
            });

            context(@"the last cell", ^{
                beforeEach(^{
                    NSIndexPath *indexPath = [NSIndexPath indexPathForRow:8 inSection:0];
                    cell = [subject tableView:subject.tableView cellForRowAtIndexPath:indexPath];
                });

                it(@"should display 'iPhone 6 Plus'", ^{
                    cell.textLabel.text should equal(@"iPhone 6 Plus");
                });
            });
        });
    });

The good part about these tests is that they are easy to follow and straight to the point. When we ask how many items there are, we expect the right number. And when we want to ensure the first cell is set up correctly, we test just that.

Issues

Unfortunately, there are a few problems with this approach. The biggest issue is that we can get these tests to pass without actually displaying anything on the screen. A simple implementation of these two methods in our controller will make everything green, but it gives no guarantee that a table view is on the screen (or that one even exists!). The first step in remedying this is to write a test asserting that the table view is a subview.

Another, albeit minor, issue is that we are breaking encapsulation: we are exposing the fact that our controller conforms to the UITableViewDataSource protocol. Let's see what we can do about these two problems.

Benefits

Don't think that unit-style testing is bad; it just has different uses.
If your app shows several instances of the same kind of list, you will see benefits from this approach, because all you would need to test in each controller is that the right type of data source was configured. You could take this one step further by injecting the array of items to display and unit testing that. Then you have a reusable unit of code that shows a list of data conforming to your app's specifications, which is quite powerful.

Behavior-Driven Approach

Let's take a more behavioral approach to our problem. Our goal is to display the list of iPhones to the user. If we care about what the user sees, what is the closest way of replicating that? How about checking which cells are visible to the user? From Apple's documentation, -visibleCells on UITableView:

    Returns the table cells that are visible in the receiver.

This sounds interesting. Let's restructure our tests to run assertions on the cells that the user sees, not on some made-up world of delegates and data sources.

    describe(@"when the view loads", ^{
        beforeEach(^{
            subject.view should_not be_nil;
            [subject.view layoutIfNeeded];
        });

        it(@"should display the first iPhone, first", ^{
            UITableViewCell *firstCell = subject.tableView.visibleCells.firstObject;
            firstCell.textLabel.text should equal(@"iPhone");
        });

        it(@"should display the iPhone 6 Plus, last", ^{
            UITableViewCell *lastCell = subject.tableView.visibleCells.lastObject;
            lastCell.textLabel.text should equal(@"iPhone 6 Plus");
        });
    });

Note that in the beforeEach we assert that the view should exist. Accessing the view kicks off the controller's view lifecycle methods, namely -loadView and -viewDidLoad. We then tell its view to lay out its subviews if needed. This ensures that anything we add as a subview has its layout constraints configured and applied.

To get this to pass, we have a few things to take care of:
- Create the backing array of iPhones
- Create the table view and add it as a subview
- Become the data source and respond to its calls

The first one is easy, so let's knock that out first.

    @interface ViewController () <UITableViewDataSource>
    @property (nonatomic) UITableView *tableView;
    @property (nonatomic, strong) NSArray *iPhones;
    @end

    @implementation ViewController

    - (instancetype)init {
        if (self = [super init]) {
            self.iPhones = @[
                @"iPhone",
                @"iPhone 3G",
                @"iPhone 3GS",
                @"iPhone 4",
                @"iPhone 4s",
                @"iPhone 5",
                @"iPhone 5s",
                @"iPhone 6",
                @"iPhone 6 Plus"
            ];
        }
        return self;
    }

Note that the tableView property is declared read-write in the class extension. This lets us keep it read-only (or hidden) in the public header while still being able to modify it internally.

Next, let's add the table view and its Auto Layout constraints.

    - (void)viewDidLoad {
        [super viewDidLoad];
        self.tableView = [[UITableView alloc] init];
        [self.view addSubview:self.tableView];
        [self addTableViewConstraints];
    }

    #pragma mark - Private

    - (void)addTableViewConstraints {
        self.tableView.translatesAutoresizingMaskIntoConstraints = NO;
        NSDictionary *views = @{ @"tableView": self.tableView };
        [self.view addConstraints:[NSLayoutConstraint constraintsWithVisualFormat:@"V:|[tableView]|" options:kNilOptions metrics:nil views:views]];
        [self.view addConstraints:[NSLayoutConstraint constraintsWithVisualFormat:@"H:|[tableView]|" options:kNilOptions metrics:nil views:views]];
    }

Since we aren't working with Storyboards or xibs/nibs, we create the table view manually and add it as a subview. We also need to add some simple Auto Layout constraints to make it fill the screen. Check out Apple's Auto Layout by Example guide if you would like a deeper explanation.

Finally, let's get to the meat of the issue and respond to the data source methods.
    #pragma mark - <UITableViewDataSource>

    - (NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection:(NSInteger)section {
        return self.iPhones.count;
    }

    - (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath {
        UITableViewCell *cell = [tableView dequeueReusableCellWithIdentifier:kCellIdentifier forIndexPath:indexPath];
        cell.textLabel.text = self.iPhones[indexPath.row];
        return cell;
    }

We also need to become the table's data source, so do that and register the cell class in -viewDidLoad:

    [self.tableView registerClass:[UITableViewCell class] forCellReuseIdentifier:kCellIdentifier];
    self.tableView.dataSource = self;

Finally, add the constant to the top of the file:

    NSString * const kCellIdentifier = @"CellIdentifier";

What's interesting about this approach is that the tests will not pass until every line is correct. This helps ensure that what happens under test is closer to the real experience of the app. For example, having a table view on the screen and responding to the data source calls, but never assigning the data source, won't get you anywhere. With the unit approach you could have done just that and still seen your tests go green.

Benefits of Behavior Testing

When testing behavior, you put yourself in a world that more closely represents the state of the app when a user is interacting with it. It also enables you to test collaboration between objects without having to single out every simple piece of the architecture. The flip side is that it can be easy to get carried away and start writing full integration tests from controllers. If you stick to testing only one or two layers of abstraction, in this case the table view through its data source, your code and specs remain easy to read and understand.

A side effect of this approach is that it let us hide some implementation details in the production code. This means we are freer to do a green-to-green refactor without having to change our specs.
For example, we could extract the UITableViewDataSource into its own object and know that it works correctly when all of the existing tests still pass. If we then wanted to reuse that collaborator, we could extract the specs and have it stand on its own. Or, if our backing array turned into an NSDictionary and we looked everything up by key, nothing in our tests would have to change.

There are many styles of testing and even more ways to test Objective-C code and the Cocoa Touch framework. Behavior testing is just one approach, and it is the one that has proved the most flexible and easiest to understand for me. What other techniques and methods have you used to ensure code coverage in your own iOS apps?

About the author

Joe Masilotti is a test-driven iOS developer living in Brooklyn, NY. He contributes to open-source testing tools on GitHub and talks about development, cooking, and craft beer on Twitter.

Packt

04 Mar 2015

22 min read

Python functions – Avoid repeating code

In this article by Silas Toms, author of the book ArcPy and ArcGIS – Geospatial Analysis with Python, we will see how programming languages share a concept that has aided programmers for decades: functions. The idea of a function, loosely speaking, is to create blocks of code that perform an action on a piece of data, transforming it as required by the programmer and returning the transformed data to the main body of code.

Functions are used because they solve many different needs within programming. Functions reduce the need to write repetitive code, which in turn reduces the time needed to create a script. They can be used to create ranges of numbers (the range() function), to determine the maximum value of a list (the max() function), or to create a SQL statement to select a set of rows from a feature class. They can even be copied and used in another script, or included as part of a module that can be imported into scripts. Function reuse has the added bonus of making programming more useful and less of a chore. When a scripter starts writing functions, it is a major step towards making programming part of a GIS workflow.

Technical definition of functions

Functions, also called subroutines or procedures in other programming languages, are blocks of code designed either to accept input data and transform it, or to provide data to the main program when called without any input. In theory, a function will only transform data that has been provided to it as a parameter; it should not change any other part of the script that has not been included in the function. To make this possible, the concept of namespaces is invoked. Namespaces make it possible to use a variable name within a function, and allow it to represent a value, while also using the same variable name in another part of the script.
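Here is a minimal Python example of this idea (the names multiplier and triple are chosen only for illustration): the function assigns its own multiplier variable in its local namespace, and the module-level variable of the same name is left untouched.

```python
multiplier = 10  # a module-level (global) variable


def triple(number):
    # This assignment creates a new name in the function's local namespace;
    # it does not modify the global 'multiplier'.
    multiplier = 3
    return number * multiplier


print(triple(4))   # 12
print(multiplier)  # 10
```

The example runs unchanged under Python 2.7, as used by ArcGIS Desktop, and Python 3.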
Namespaces become especially important when importing modules from other programmers; within such a module and its functions, some variables might have the same name as a variable in the main script.

In a high-level programming language such as Python, there is built-in support for functions, including the ability to define function names and the data inputs (also known as parameters). Functions are created using the keyword def plus a function name, along with parentheses that may or may not contain parameters. Parameters can also be defined with default values, so they only need to be passed to the function when they differ from the default. The values that are returned from the function are also easily defined.

A first function

Let's create a function to get a feel for what is possible when writing functions. First, we need to define the function by providing the def keyword and a name, along with the parentheses. The firstFunction() function will return a string when called:

    def firstFunction():
        'a simple function returning a string'
        return "My First Function"

    >>> firstFunction()

The output is as follows:

    'My First Function'

Notice that this function has a documentation string, or docstring ('a simple function returning a string'), that describes what the function does. This string can be retrieved later through the function's __doc__ attribute:

    >>> print firstFunction.__doc__

The output is as follows:

    a simple function returning a string

The function is defined and given a name, and then the parentheses are added, followed by a colon. The following lines must then be indented (a good IDE will add the indentation automatically). The function does not have any parameters, so the parentheses are empty. The function then uses the keyword return to return a value, in this case a string, from the function. Next, the function is called by adding parentheses to the function name.
When it is called, it does what it has been instructed to do: return the string.

Functions with parameters

Now let's create a function that accepts parameters and transforms them as needed. This function will accept a number and multiply it by 3:

    def secondFunction(number):
        'this function multiplies numbers by 3'
        return number * 3

    >>> secondFunction(4)

The output is as follows:

    12

The function has one flaw, however: there is no assurance that the value passed to the function is a number. A string, for example, would simply be repeated three times, and some other data types would raise an exception. We need to add a conditional to the function to make sure only numbers are multiplied:

    def secondFunction(number):
        'this function multiplies numbers by 3'
        if type(number) == type(1) or type(number) == type(1.0):
            return number * 3

    >>> secondFunction(4.0)

The output is as follows:

    12.0

    >>> secondFunction(4)

The output is as follows:

    12

    >>> secondFunction("String")
    >>>

The function now accepts a parameter, checks what type of data it is, and returns a multiple of the parameter if it is an integer or a float. If it is a string or some other data type, as shown in the last example, no value is returned (strictly speaking, the function returns None, which the interactive interpreter does not display).

There is one more adjustment to the simple function that we should discuss: parameter defaults. By including default values in the definition of the function, we avoid having to provide parameters that rarely change. If, for instance, we wanted a multiplier other than 3 in the simple function, we would define it like this:

    def thirdFunction(number, multiplier=3):
        'this function multiplies numbers by a multiplier (3 by default)'
        if type(number) == type(1) or type(number) == type(1.0):
            return number * multiplier

    >>> thirdFunction(4)

The output is as follows:

    12

    >>> thirdFunction(4, 5)

The output is as follows:

    20

The function works when only the number to be multiplied is supplied, as the multiplier has a default value of 3. However, if we need another multiplier, the value can be adjusted by passing another value when calling the function.
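The following sketch gathers these ideas in one place. It is a variant of thirdFunction() under a hypothetical name, multiply(), that uses isinstance(), a common alternative to comparing type() results, and it shows arguments passed by name. It also demonstrates that Python rejects a parameter without a default placed after one with a default; compile() is used only so that the SyntaxError can be caught at run time. The code runs under both Python 2.7 and Python 3.

```python
def multiply(number, multiplier=3):
    'multiply number by multiplier (3 by default); return None for non-numbers'
    # isinstance() also accepts subclasses, so True/False (bool) pass this check,
    # unlike the type() comparison used in thirdFunction()
    if isinstance(number, (int, float)):
        return number * multiplier
    return None


print(multiply(4))                       # 12
print(multiply(4, 5))                    # 20
print(multiply(multiplier=2, number=7))  # 14: arguments passed by name
print(multiply("String"))                # None

# A parameter without a default may not follow one that has a default.
try:
    compile("def bad(multiplier=3, number):\n    pass\n", "<example>", "exec")
except SyntaxError:
    print("SyntaxError: non-default parameter after a default one")
```

Passing arguments by name, as in the third call, lets the caller ignore the parameter order entirely.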
Note that thirdFunction() does not check the type of its multiplier, so the second value doesn't have to be a number. Also, parameters with default values must follow the parameters without defaults (alternatively, all parameters can have default values); when the function is called, arguments can be supplied either in order or by name.

Using functions to replace repetitive code

One of the main uses of functions is to ensure that the same code does not have to be written over and over. The first portion of the script that we could convert into a function is the three ArcPy tool calls. Doing so will allow the script to be applied to any of the stops in the Bus Stop feature class and will make the buffer distance adjustable:

    bufferDist = 400
    buffDistUnit = "Feet"
    lineName = '71 IB'
    busSignage = 'Ferry Plaza'
    sqlStatement = "NAME = '{0}' AND BUS_SIGNAG = '{1}'"

    def selectBufferIntersect(selectIn, selectOut, bufferOut, intersectIn,
                              intersectOut, sqlStatement, bufferDist,
                              buffDistUnit, lineName, busSignage):
        'a function to perform a bus stop analysis'
        arcpy.Select_analysis(selectIn, selectOut,
                              sqlStatement.format(lineName, busSignage))
        # Both the distance and its unit are formatted into the buffer string
        arcpy.Buffer_analysis(selectOut, bufferOut,
                              "{0} {1}".format(bufferDist, buffDistUnit),
                              "FULL", "ROUND", "NONE", "")
        arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut, intersectIn),
                                 intersectOut, "ALL", "", "INPUT")
        return intersectOut

This function demonstrates how the analysis can be adjusted to accept the input and output feature class variables as parameters, along with some new variables. The function adds a variable to replace the SQL statement and variables to select the bus stop, and it also tweaks the buffer distance statement so that both the distance and the unit can be adjusted. The feature class name variables, defined earlier in the script, have all been replaced with local variable names; while the global variable names could have been retained, doing so would reduce the portability of the function.
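Because the whole ArcPy chain depends on these strings being built correctly, it helps to check the formatting on its own. The sketch below needs no ArcPy; build_where_clause() and build_buffer_distance() are hypothetical helper names, while the SQL template and the values are the ones used in the script above.

```python
def build_where_clause(line_name, bus_signage):
    """Fill the SQL template used by the Select step."""
    sql_template = "NAME = '{0}' AND BUS_SIGNAG = '{1}'"
    return sql_template.format(line_name, bus_signage)


def build_buffer_distance(distance, unit):
    """Combine a distance and a unit into a string such as '400 Feet'."""
    return "{0} {1}".format(distance, unit)


print(build_where_clause('71 IB', 'Ferry Plaza'))
# NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza'
print(build_buffer_distance(400, "Feet"))
# 400 Feet

# Two placeholders need two arguments; supplying only one raises IndexError.
try:
    "{0} {1}".format(400)
except IndexError:
    print("IndexError: the unit argument is missing")
```

Testing the strings separately like this catches formatting mistakes before a long-running geoprocessing tool is ever called.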
The next function will accept the result of the selectBufferIntersect() function and search it using a search cursor, passing the results into a dictionary. The dictionary will then be returned from the function for later use:

    def createResultDic(resultFC):
        'search results of analysis and create results dictionary'
        dataDictionary = {}
        with arcpy.da.SearchCursor(resultFC, ["STOPID", "POP10"]) as cursor:
            for row in cursor:
                busStopID = row[0]
                pop10 = row[1]
                if busStopID not in dataDictionary:
                    dataDictionary[busStopID] = [pop10]
                else:
                    dataDictionary[busStopID].append(pop10)
        return dataDictionary

This function only requires one parameter: the feature class returned from the selectBufferIntersect() function. The results dictionary is first created, then populated by the search cursor, with the STOPID attribute used as the key and each census block population (POP10) value appended to the list assigned to that key. The dictionary, now holding the population values grouped by bus stop, is returned from the function for use in the final function, createCSV(). This function accepts the dictionary and the name of the output CSV file as a string:

    def createCSV(dictionary, csvname):
        'a function takes a dictionary and creates a CSV file'
        with open(csvname, 'wb') as csvfile:
            csvwriter = csv.writer(csvfile, delimiter=',')
            for busStopID in dictionary.keys():
                popList = dictionary[busStopID]
                averagePop = sum(popList)/len(popList)
                data = [busStopID, averagePop]
                csvwriter.writerow(data)

The final function creates the CSV file using the csv module. The name of the file, a string, is now a customizable parameter (meaning the output can be any valid file path ending with the .csv extension). The open file object, csvfile, is passed to the csv module's writer() function, and the resulting writer is assigned to the variable csvwriter; the dictionary is then processed, and each bus stop ID and its average population are passed as a list to csvwriter to be written to the CSV file. (Under Python 2, if the POP10 values are integers, sum(popList)/len(popList) performs integer division and drops the fractional part; dividing by float(len(popList)) keeps it.)
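The grouping and averaging logic can be tried without ArcGIS by standing in a plain list of (STOPID, POP10) tuples for the rows a search cursor would yield. The following Python 3 sketch does that; the sample rows are made up purely for illustration, and the helper names createResultDicFromRows and writeAverages are hypothetical. It uses collections.defaultdict, which removes the need for the key check, and writes to an in-memory io.StringIO buffer instead of a file. In Python 3, a real file would be opened with open(csvname, 'w', newline='') rather than the 'wb' mode used by the Python 2 code above.

```python
import csv
import io
from collections import defaultdict


def createResultDicFromRows(rows):
    """Group POP10 values by STOPID, as createResultDic() does with a cursor."""
    dataDictionary = defaultdict(list)
    for busStopID, pop10 in rows:
        dataDictionary[busStopID].append(pop10)
    return dict(dataDictionary)


def writeAverages(dictionary, outFile):
    """Write one 'stop ID, average population' row per bus stop."""
    csvwriter = csv.writer(outFile, delimiter=',')
    for busStopID, popList in dictionary.items():
        averagePop = sum(popList) / len(popList)  # true division in Python 3
        csvwriter.writerow([busStopID, averagePop])


# Made-up rows standing in for arcpy.da.SearchCursor(resultFC, ["STOPID", "POP10"])
rows = [(1001, 120), (1001, 80), (1002, 45), (1001, 100), (1002, 50)]
results = createResultDicFromRows(rows)
csvBuffer = io.StringIO()
writeAverages(results, csvBuffer)
print(results)               # {1001: [120, 80, 100], 1002: [45, 50]}
print(csvBuffer.getvalue())  # rows "1001,100.0" and "1002,47.5"
```

The 47.5 average for stop 1002 is the kind of value that integer division under Python 2 would truncate.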
The writer's writerow() method converts each item in the list into CSV format and writes the row to the file. Once the script has run, open the CSV file with Excel or a text editor such as Notepad to check the results.

To run the functions, we will call them in the script following the function definitions:

    analysisResult = selectBufferIntersect(Bus_Stops, Inbound71,
                                           Inbound71_400ft_buffer, CensusBlocks2010,
                                           Intersect71Census, sqlStatement, bufferDist,
                                           buffDistUnit, lineName, busSignage)
    dictionary = createResultDic(analysisResult)
    createCSV(dictionary, r'C:\Projects\Output\Averages.csv')

Now, the script has been divided into three functions, which replace the code of the first modified script. The modified script looks like this:

    # -*- coding: utf-8 -*-
    # ---------------------------------------------------------------------------
    # 8662_Chapter4Modified1.py
    # Created on: 2014-04-22 21:59:31.00000
    # (generated by ArcGIS/ModelBuilder)
    # Description:
    # Adjusted by Silas Toms
    # 2014 05 05
    # ---------------------------------------------------------------------------

    # Import arcpy module
    import arcpy
    import csv

    # Local variables:
    Bus_Stops = r"C:\Projects\PacktDB.gdb\SanFrancisco\Bus_Stops"
    CensusBlocks2010 = r"C:\Projects\PacktDB.gdb\SanFrancisco\CensusBlocks2010"
    Inbound71 = r"C:\Projects\PacktDB.gdb\Chapter3Results\Inbound71"
    Inbound71_400ft_buffer = r"C:\Projects\PacktDB.gdb\Chapter3Results\Inbound71_400ft_buffer"
    Intersect71Census = r"C:\Projects\PacktDB.gdb\Chapter3Results\Intersect71Census"
    bufferDist = 400
    lineName = '71 IB'
    busSignage = 'Ferry Plaza'

    def selectBufferIntersect(selectIn, selectOut, bufferOut, intersectIn,
                              intersectOut, bufferDist, lineName, busSignage):
        arcpy.Select_analysis(selectIn, selectOut,
                              "NAME = '{0}' AND BUS_SIGNAG = '{1}'".format(lineName, busSignage))
        arcpy.Buffer_analysis(selectOut, bufferOut, "{0} Feet".format(bufferDist),
                              "FULL", "ROUND", "NONE", "")
        arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut, intersectIn),
                                 intersectOut, "ALL", "", "INPUT")
        return intersectOut

    def createResultDic(resultFC):
        dataDictionary = {}
        with arcpy.da.SearchCursor(resultFC, ["STOPID", "POP10"]) as cursor:
            for row in cursor:
                busStopID = row[0]
                pop10 = row[1]
                if busStopID not in dataDictionary:
                    dataDictionary[busStopID] = [pop10]
                else:
                    dataDictionary[busStopID].append(pop10)
        return dataDictionary

    def createCSV(dictionary, csvname):
        with open(csvname, 'wb') as csvfile:
            csvwriter = csv.writer(csvfile, delimiter=',')
            for busStopID in dictionary.keys():
                popList = dictionary[busStopID]
                averagePop = sum(popList)/len(popList)
                data = [busStopID, averagePop]
                csvwriter.writerow(data)

    analysisResult = selectBufferIntersect(Bus_Stops, Inbound71, Inbound71_400ft_buffer,
                                           CensusBlocks2010, Intersect71Census,
                                           bufferDist, lineName, busSignage)
    dictionary = createResultDic(analysisResult)
    createCSV(dictionary, r'C:\Projects\Output\Averages.csv')
    print "Data Analysis Complete"

Further generalization of the functions

While we have created functions from the original script that can be used to extract more data about bus stops in San Francisco, our new functions are still very specific to the dataset and analysis for which they were created. This can be very useful for long and laborious analyses for which creating reusable functions is not necessary. The first use of functions is to get rid of the need to repeat code; the next goal is to make that code reusable. Let's discuss some ways in which we can convert the functions from one-offs into reusable functions or even modules.
First, let's examine the first function:

    def selectBufferIntersect(selectIn, selectOut, bufferOut, intersectIn,
                              intersectOut, bufferDist, lineName, busSignage):
        arcpy.Select_analysis(selectIn, selectOut,
                              "NAME = '{0}' AND BUS_SIGNAG = '{1}'".format(lineName, busSignage))
        arcpy.Buffer_analysis(selectOut, bufferOut, "{0} Feet".format(bufferDist),
                              "FULL", "ROUND", "NONE", "")
        arcpy.Intersect_analysis("{0} #;{1} #".format(bufferOut, intersectIn),
                                 intersectOut, "ALL", "", "INPUT")
        return intersectOut

This function appears to be pretty specific to the bus stop analysis. It's so specific, in fact, that while there are a few ways in which we could tweak it to make it more general (that is, useful in other scripts that might not have the same steps involved), we should not try to turn it into a general-purpose function. Doing so would introduce too many variables into the script in an effort to simplify it, which is counterproductive. Instead, let's focus on ways to generalize the ArcPy tools themselves.

The first step will be to split the three ArcPy tools apart and examine what can be adjusted in each of them. The Select tool should be adjusted to accept a string as the SQL select statement. The SQL statement can then be generated by another function or by parameters accepted at runtime. For instance, if we wanted the script to accept multiple bus stops for each run (for example, the inbound and outbound stops for each line), we could create a function that accepts a list of the desired stops and a SQL template, and returns a SQL statement to plug into the Select tool.
Here is an example of how it would look:

    def formatSQLIN(dataList, sqlTemplate):
        'a function to generate a SQL IN statement'
        sql = sqlTemplate  # for example, "OBJECTID IN "
        step = "("
        for count, data in enumerate(dataList):
            step += str(data)
            # separate the values with commas, except after the last one
            if count != len(dataList) - 1:
                step += ", "
        sql += step + ")"
        return sql

    def formatSQL(dataList, sqlTemplate):
        'a function to generate a SQL statement'
        sql = ''
        for count, data in enumerate(dataList):
            if count != len(dataList) - 1:
                sql += sqlTemplate.format(data) + ' OR '
            else:
                sql += sqlTemplate.format(data)
        return sql

    >>> dataVals = [1, 2, 3, 4]
    >>> sqlOID = "OBJECTID = {0}"
    >>> sql = formatSQL(dataVals, sqlOID)
    >>> print sql

The output is as follows:

    OBJECTID = 1 OR OBJECTID = 2 OR OBJECTID = 3 OR OBJECTID = 4

This new function, formatSQL(), is a very useful function. Let's review what it does by comparing the function to the results following it. The function is defined to accept two parameters: a list of values and a SQL template. The first local variable is the empty string sql, which is built up using string addition. The function inserts each value into the SQL template using string formatting and adds the result to the sql string (note that sql += is equivalent to sql = sql +). Also, an operator (OR) is used to make the SQL statement inclusive of all data rows that match the pattern. The function uses the built-in enumerate() function to count the iterations of the list; once it reaches the last value in the list, the operator is not added to the SQL statement.
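The same statements can also be built with str.join(), which places the separator only between items and so removes the need to track the last index with enumerate(). The following Python 3 sketch is an alternative to formatSQL() and formatSQLIN(), not the chapter's code; the names formatSQLJoin and formatSQLINJoin are hypothetical. Neither helper quotes text values, so the IN version as written suits numeric fields such as OBJECTID.

```python
def formatSQLJoin(dataList, sqlTemplate, operator=" OR "):
    """Format each value into the template and join the pieces with the operator."""
    return operator.join(sqlTemplate.format(data) for data in dataList)


def formatSQLINJoin(dataList, field):
    """Build a 'field IN (v1, v2, ...)' clause from a list of numeric values."""
    return "{0} IN ({1})".format(field, ", ".join(str(data) for data in dataList))


dataVals = [1, 2, 3, 4]
print(formatSQLJoin(dataVals, "OBJECTID = {0}"))
# OBJECTID = 1 OR OBJECTID = 2 OR OBJECTID = 3 OR OBJECTID = 4
print(formatSQLINJoin(dataVals, "OBJECTID"))
# OBJECTID IN (1, 2, 3, 4)
```

An empty list produces an empty string with the join version, which a caller may want to check for before handing the result to the Select tool.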
Note that we could also add one more parameter to the function to make it possible to use an AND operator instead of OR, while still keeping OR as the default:

    def formatSQL2(dataList, sqlTemplate, operator=" OR "):
        'a function to generate a SQL statement'
        sql = ''
        for count, data in enumerate(dataList):
            if count != len(dataList) - 1:
                sql += sqlTemplate.format(data) + operator
            else:
                sql += sqlTemplate.format(data)
        return sql

    >>> sql = formatSQL2(dataVals, sqlOID, " AND ")
    >>> print sql

The output is as follows:

    OBJECTID = 1 AND OBJECTID = 2 AND OBJECTID = 3 AND OBJECTID = 4

While it would make no sense to use an AND operator on ObjectIDs (no single row can have several ObjectIDs), there are other cases where it would make sense; hence OR is left as the default while AND remains available. Either way, this function can now be used to generate our bus stop SQL statement for multiple stops (ignoring, for now, the bus signage field):

    >>> sqlTemplate = "NAME = '{0}'"
    >>> lineNames = ['71 IB', '71 OB']
    >>> sql = formatSQL2(lineNames, sqlTemplate)
    >>> print sql

The output is as follows:

    NAME = '71 IB' OR NAME = '71 OB'

However, we can't ignore the Bus Signage field for the inbound line, as there are two starting points for the line, so we will need to adjust the function to accept multiple values:

    def formatSQLMultiple(dataList, sqlTemplate, operator=" OR "):
        'a function to generate a SQL statement'
        sql = ''
        for count, data in enumerate(dataList):
            if count != len(dataList) - 1:
                sql += sqlTemplate.format(*data) + operator
            else:
                sql += sqlTemplate.format(*data)
        return sql

    >>> sqlTemplate = "(NAME = '{0}' AND BUS_SIGNAG = '{1}')"
    >>> lineNames = [('71 IB', 'Ferry Plaza'), ('71 OB', '48th Avenue')]
    >>> sql = formatSQLMultiple(lineNames, sqlTemplate)
    >>> print sql

The output is as follows:

    (NAME = '71 IB' AND BUS_SIGNAG = 'Ferry Plaza') OR (NAME = '71 OB' AND BUS_SIGNAG = '48th Avenue')

The slight difference in this function, the asterisk before the data variable, allows the values inside the data variable to be correctly formatted
into the SQL template by unpacking the values within the tuple. Notice that the SQL template has been created to segregate each conditional by using parentheses. The function(s) are now ready for reuse, and the SQL statement is now ready for insertion into the Select tool:

    sql = formatSQLMultiple(lineNames, sqlTemplate)
    arcpy.Select_analysis(Bus_Stops, Inbound71, sql)

Next up is the Buffer tool. We have already taken steps towards making it generalized by adding a variable for the distance. In this case, we will only add one more variable to it, a unit variable that will make it possible to change the buffer unit from feet to meters or any other allowed unit. We will leave the other defaults alone. Here is an adjusted version of the Buffer tool:

    bufferDist = 400
    bufferUnit = "Feet"
    arcpy.Buffer_analysis(Inbound71, Inbound71_400ft_buffer,
                          "{0} {1}".format(bufferDist, bufferUnit),
                          "FULL", "ROUND", "NONE", "")

Now, both the buffer distance and the buffer unit are controlled by variables defined earlier in the script, which makes them easy to adjust if it turns out that the distance is not sufficient.

The next step towards adjusting the ArcPy tools is to write a function that will allow any number of feature classes to be intersected together using the Intersect tool. This new function will be similar to the formatSQL functions above, as it will use string formatting and addition to process a list of feature classes into the string format that the Intersect tool accepts.
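Before moving on to the Intersect helper, the tuple-unpacking template and the buffer string from the steps above can be checked without ArcPy. This Python 3 sketch is a str.join() variant of formatSQLMultiple(), not the chapter's code; formatSQLMultipleJoin and bufferString are hypothetical names, and "Meters" is used only as an example unit string.

```python
def formatSQLMultipleJoin(dataList, sqlTemplate, operator=" OR "):
    """Unpack each tuple into the template (*data) and join the clauses."""
    return operator.join(sqlTemplate.format(*data) for data in dataList)


def bufferString(bufferDist, bufferUnit="Feet"):
    """Build the 'distance unit' string passed to the Buffer tool."""
    return "{0} {1}".format(bufferDist, bufferUnit)


sqlTemplate = "(NAME = '{0}' AND BUS_SIGNAG = '{1}')"
lineNames = [('71 IB', 'Ferry Plaza'), ('71 OB', '48th Avenue')]
print(formatSQLMultipleJoin(lineNames, sqlTemplate))
print(bufferString(400))            # 400 Feet
print(bufferString(120, "Meters"))  # 120 Meters
```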
However, as this function will be built to be as general as possible, it must be designed to accept any number of feature classes to be intersected:

    def formatIntersect(features):
        'a function to generate an intersect string'
        formatString = ''
        for count, feature in enumerate(features):
            if count != len(features) - 1:
                formatString += feature + " #;"
            else:
                formatString += feature + " #"
        return formatString

    >>> shpNames = ["example.shp", "example2.shp"]
    >>> iString = formatIntersect(shpNames)
    >>> print iString

The output is as follows:

    example.shp #;example2.shp #

Now that we have written the formatIntersect() function, all that needs to be created is a list of the feature classes to be passed to the function. The string returned by the function can then be passed to the Intersect tool:

    intersected = [Inbound71_400ft_buffer, CensusBlocks2010]
    iString = formatIntersect(intersected)
    # Process: Intersect
    arcpy.Intersect_analysis(iString, Intersect71Census, "ALL", "", "INPUT")

Because we avoided creating a function that only fits this script or analysis, we now have two (or more) useful functions that can be applied in later analyses, and we know how to manipulate the ArcPy tools to accept the data that we want to supply to them.

Summary

In this article, we discussed how to take autogenerated code and generalize it, while adding functions that can be reused in other scripts and that make generating the necessary code components, such as SQL statements, much easier.

Packt

04 Mar 2015

20 min read

Writing Consumers

Packt

04 Mar 2015

18 min read

Prototyping Arduino Projects using Python

In this article by Pratik Desai, the author of Python Programming for Arduino, we will cover the following topics: working with pyFirmata methods; Servomotor – moving the motor to a certain angle; and The Button() widget – interfacing GUI with Arduino and LEDs.

Working with pyFirmata methods

The pyFirmata package provides useful methods to bridge the gap between Python and Arduino's Firmata protocol. Although these methods are described with specific examples, you can use them in various different ways. This section also provides a detailed description of a few additional methods.

Setting up the Arduino board

To set up your Arduino board in a Python program using pyFirmata, you need to follow the steps below carefully. We have split the code required for the setup process into small snippets, one for each step. While writing your code, you will have to carefully pick the snippets that are appropriate for your application. You can always refer to the example Python files containing the complete code. Before we go ahead, let's first make sure that your Arduino board is equipped with the latest version of the StandardFirmata program and is connected to your computer:

Depending upon the Arduino board that is being used, start by importing the appropriate pyFirmata class into the Python code. Currently, the built-in pyFirmata classes only support the Arduino Uno and Arduino Mega boards:

    from pyfirmata import Arduino

In the case of Arduino Mega, use the following line of code:

    from pyfirmata import ArduinoMega

Before we start executing any methods that are associated with handling pins, the Arduino board must be set up properly. To perform this task, we have to first identify the USB port to which the Arduino board is connected and assign this location to a variable in the form of a string object.
For Mac OS X, the port string should look approximately like this:

    port = '/dev/cu.usbmodemfa1331'

For Windows, use the following string structure:

    port = 'COM3'

In the case of the Linux operating system, use the following line of code:

    port = '/dev/ttyACM0'

The port's location might be different according to your computer configuration. You can identify the correct location of your Arduino USB port by using the Arduino IDE.

Once you have imported the Arduino class and assigned the port to a variable, it's time to engage Arduino with pyFirmata and assign this connection to another variable:

    board = Arduino(port)

Similarly, for Arduino Mega, use this:

    board = ArduinoMega(port)

The synchronization between the Arduino board and pyFirmata requires some time. Adding sleep time between the preceding assignment and the next set of instructions can help to avoid any issues related to serial port buffering. The easiest way to add sleep time is to use the sleep(time) function from Python's built-in time module:

    from time import sleep
    sleep(1)

The sleep() function takes the number of seconds as its parameter, and a floating-point number can be used to provide a precise sleep time. For example, for 200 milliseconds, it will be sleep(0.2).

At this point, you have successfully synchronized your Arduino Uno or Arduino Mega board to the computer using pyFirmata. What if you want to use a different variant (other than Arduino Uno or Arduino Mega) of the Arduino board? Any board layout in pyFirmata is defined as a dictionary object. The following is a sample of the dictionary object for the Arduino board:

    arduino = {
        'digital' : tuple(x for x in range(14)),
        'analog' : tuple(x for x in range(6)),
        'pwm' : (3, 5, 6, 9, 10, 11),
        'use_ports' : True,
        'disabled' : (0, 1)  # Rx, Tx, Crystal
    }

For your variant of the Arduino board, you have to first create a custom dictionary object. To create this object, you need to know the hardware layout of your board.
For example, an Arduino Nano board has a layout similar to a regular Arduino board, but it has eight instead of six analog ports. Therefore, the preceding dictionary object can be customized as follows:

    nano = {
        'digital' : tuple(x for x in range(14)),
        'analog' : tuple(x for x in range(8)),
        'pwm' : (3, 5, 6, 9, 10, 11),
        'use_ports' : True,
        'disabled' : (0, 1)  # Rx, Tx, Crystal
    }

As you have already synchronized the Arduino board earlier, modify the layout of the board using the setup_layout(layout) method:

    board.setup_layout(nano)

This command will modify the default layout of the synchronized Arduino board to the Arduino Nano layout or any other variant for which you have customized the dictionary object.

Configuring Arduino pins

Once your Arduino board is synchronized, it is time to configure the digital and analog pins that are going to be used as part of your program. The Arduino board has digital I/O pins and analog input pins that can be used to perform various operations. As we already know, some of these digital pins are also capable of PWM.

The direct method

Now, before we start writing or reading any data to these pins, we have to first assign modes to them. In an Arduino sketch, we use the pinMode function, that is, pinMode(11, INPUT), for this operation. Similarly, in pyFirmata, this assignment is performed by setting the mode property of a pin on the board object, as shown in the following code snippet:

    from pyfirmata import Arduino
    from pyfirmata import INPUT, OUTPUT, PWM

    # Setting up Arduino board
    port = '/dev/cu.usbmodemfa1331'
    board = Arduino(port)

    # Assigning modes to a digital pin and an analog pin
    board.digital[13].mode = OUTPUT
    board.analog[0].mode = INPUT

The pyFirmata library includes constants for the INPUT and OUTPUT modes, which need to be imported before you use them. The preceding example assigns digital pin 13 as an output and analog pin 0 as an input.
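Going back to the board-layout dictionaries shown earlier, the Arduino and Nano layouts differ only in their pin counts, so a small helper can build such dictionaries. The following sketch is hypothetical (makeLayout is not part of pyFirmata) and simply reproduces the key structure of the dictionaries in the text; its result would be passed to board.setup_layout() as before.

```python
def makeLayout(digitalCount, analogCount, pwmPins, disabled=(0, 1)):
    """Build a pyFirmata-style board layout dictionary.

    The keys mirror the 'arduino' and 'nano' dictionaries in the text.
    """
    return {
        'digital': tuple(range(digitalCount)),
        'analog': tuple(range(analogCount)),
        'pwm': tuple(pwmPins),
        'use_ports': True,
        'disabled': tuple(disabled),  # Rx, Tx
    }


nano = makeLayout(14, 8, (3, 5, 6, 9, 10, 11))
print(nano['analog'])  # (0, 1, 2, 3, 4, 5, 6, 7)
# board.setup_layout(nano)  # with a synchronized pyFirmata board
```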
The mode property is set through the variable assigned to the configured Arduino board, using the digital[] and analog[] array indexes. The pyFirmata library also supports additional modes such as PWM and SERVO. The PWM mode is used to get analog results from digital pins, while the SERVO mode helps a digital pin set the angle of the shaft between 0 and 180 degrees. If you are using any of these modes, import the corresponding constants from the pyFirmata library. Once they are imported from the pyFirmata package, the modes for the appropriate pins can be assigned using the following lines of code:

    board.digital[3].mode = PWM
    board.digital[10].mode = SERVO

Assigning pin modes

The direct method of configuring pins is mostly used for single-line execution calls. In a project containing a large amount of code and complex logic, it is convenient to assign a pin with its role to a variable object. With an assignment like this, you can later use the assigned variable throughout the program for various actions, instead of calling the direct method every time you need to use that pin. In pyFirmata, this assignment can be performed using the get_pin(pin_def) method:

    from pyfirmata import Arduino
    port = '/dev/cu.usbmodemfa1311'
    board = Arduino(port)

    # pin mode assignment
    ledPin = board.get_pin('d:13:o')

The get_pin() method lets you assign pin modes using the pin_def string parameter, 'd:13:o'. The three components of pin_def are the pin type, pin number, and pin mode, separated by colons (:). The pin types (analog and digital) are denoted with a and d, respectively. The get_pin() method supports three modes: i for input, o for output, and p for PWM. In the previous code sample, 'd:13:o' specifies digital pin 13 as an output. In another example, if you want to set up analog pin 1 as an input, the parameter string will be 'a:1:i'.

Working with pins

As you have configured your Arduino pins, it's time to start performing actions using them.
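To make the pin_def format described above concrete, the following sketch splits such a string into its three parts and checks them against the pin types and modes listed for get_pin(). It is a hypothetical illustration of the string format only; it is not how pyFirmata parses pin_def internally, and get_pin() performs its own validation.

```python
PIN_TYPES = {'a': 'analog', 'd': 'digital'}
PIN_MODES = {'i': 'input', 'o': 'output', 'p': 'pwm'}


def parsePinDef(pinDef):
    """Split a 'type:number:mode' string such as 'd:13:o' into its parts."""
    parts = pinDef.split(':')
    if len(parts) != 3:
        raise ValueError("Expected 'type:number:mode', got {0!r}".format(pinDef))
    pinType, pinNumber, pinMode = parts
    if pinType not in PIN_TYPES or pinMode not in PIN_MODES:
        raise ValueError("Unknown pin type or mode in {0!r}".format(pinDef))
    return PIN_TYPES[pinType], int(pinNumber), PIN_MODES[pinMode]


print(parsePinDef('d:13:o'))  # ('digital', 13, 'output')
print(parsePinDef('a:1:i'))   # ('analog', 1, 'input')
```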
Two different types of methods are supported while working with pins: reporting methods and I/O operation methods.

Reporting data

When pins are configured in a program as analog input pins, they start sending input values to the serial port. If the program does not use this incoming data, the data starts getting buffered at the serial port and quickly overflows. The pyFirmata library provides the reporting and iterator methods to deal with this phenomenon.

The enable_reporting() method is used to set the input pin to start reporting. This method needs to be called before performing a reading operation on the pin:

    board.analog[3].enable_reporting()

Once the reading operation is complete, the pin can be set to disable reporting:

    board.analog[3].disable_reporting()

In the preceding example, we assumed that you have already set up the Arduino board and configured the mode of analog pin 3 as INPUT.

The pyFirmata library also provides the Iterator() class to read and handle data over the serial port. While working with analog pins, we recommend that you start an iterator thread in the main loop to update the pin value to the latest one. If the iterator is not used, the buffered data might overflow your serial port. This class is defined in the util module of the pyFirmata package and needs to be imported before it is used in the code:

    from pyfirmata import Arduino, util
    from time import sleep

    # Setting up the Arduino board
    port = 'COM3'
    board = Arduino(port)
    sleep(5)

    # Start Iterator to avoid serial overflow
    it = util.Iterator(board)
    it.start()

Manual operations

As we have configured the Arduino pins to suitable modes and set their reporting behavior, we can start monitoring them. pyFirmata provides the write() and read() methods for the configured pins.

The write() method

The write() method is used to write a value to the pin.
If the pin's mode is set to OUTPUT, the value parameter is a Boolean, that is, 0 or 1:

    board.digital[pin].mode = OUTPUT
    board.digital[pin].write(1)

If you have used the alternative method of assigning the pin's mode, you can use the write() method as follows:

    ledPin = board.get_pin('d:13:o')
    ledPin.write(1)

In the case of a PWM signal, the Arduino accepts a value between 0 and 255 that represents the length of the duty cycle between 0 and 100 percent. The pyFirmata library provides a simplified way to deal with PWM values: instead of values between 0 and 255, you can just provide a float value between 0 and 1.0. For example, if you want a 50 percent duty cycle (2.5V analog value), you can specify 0.5 with the write() method. The pyFirmata library will take care of the translation and send the appropriate value, that is, approximately 127, to the Arduino board via the Firmata protocol:

    board.digital[pin].mode = PWM
    board.digital[pin].write(0.5)

Similarly, for the indirect method of assignment, you can use code similar to the following (note that the pin must be PWM-capable, such as pin 11 on the Arduino Uno; pin 13 is not):

    pwmPin = board.get_pin('d:11:p')
    pwmPin.write(0.5)

If you are using the SERVO mode, you need to provide the value in degrees, between 0 and 180. Unfortunately, at the time of writing, the SERVO mode can only be used with the direct method of assignment; support for indirect assignment was expected in a future release:

    board.digital[pin].mode = SERVO
    board.digital[pin].write(90)

The read() method

The read() method returns the value at the specified Arduino pin. When the Iterator() class is being used, the value received using this method is the latest updated value from the serial port. When you read a digital pin, you can get only one of two inputs, HIGH or LOW, which will translate to 1 or 0 in Python:

    board.digital[pin].read()

The analog pins of Arduino linearly translate input voltages between 0 and +5V into values between 0 and 1023. However, in pyFirmata, the values between 0 and +5V are linearly translated into float values between 0 and 1.0.
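Both translations described above are simple linear scalings. The following sketch reproduces the arithmetic in plain Python so it can be tried without a board; the function names are hypothetical, and the exact rounding pyFirmata applies internally may differ slightly, so treat the results as approximate.

```python
def dutyCycleToByte(dutyCycle):
    """Map a PWM duty cycle in [0, 1.0] to the 0-255 range used by Arduino."""
    if not 0 <= dutyCycle <= 1.0:
        raise ValueError("duty cycle must be between 0 and 1.0")
    return int(dutyCycle * 255)  # truncates, so 0.5 -> 127


def voltageToAdc(voltage, reference=5.0):
    """Approximate Arduino ADC reading (0-1023) for a voltage on an analog pin."""
    return int(voltage / reference * 1023)


def adcToFloat(reading):
    """Scale a 0-1023 ADC reading to the 0-1.0 range that read() reports."""
    return reading / 1023.0


reading = voltageToAdc(1.0)
print(dutyCycleToByte(0.5))           # 127
print(reading)                        # 204
print(round(adcToFloat(reading), 2))  # 0.2
```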
For example, if the voltage at the analog pin is 1V, an Arduino program will measure a value of around 204, but you will receive the float value 0.2 when using pyFirmata's read() method in Python.

Servomotor – moving the motor to a certain angle

Servomotors are widely used electronic components in applications such as pan-tilt camera control, robotic arms, mobile robot movements, and so on, where precise movement of the motor shaft is required. This precise control of the motor shaft is possible because of the position-sensing element (typically a potentiometer), which is an integral part of the servomotor assembly. A standard servomotor allows the angle of the shaft to be set between 0 and 180 degrees. pyFirmata provides the SERVO mode, which can be used on every digital pin. This prototyping exercise provides a template and guidelines to interface a servomotor with Python.

Connections

Typically, a servomotor has wires that are color-coded red, black, and yellow, to connect with the power, ground, and signal of the Arduino board, respectively. Connect the power and the ground of the servomotor to the 5V and the ground of the Arduino board. As displayed in the following diagram, connect the yellow signal wire to digital pin 13:

If you want to use any other digital pin, make sure that you change the pin number in the Python program in the next section. Once you have made the appropriate connections, let's move on to the Python program.

The Python code

The Python file containing this code is named servoCustomAngle.py and is located in the code bundle of this book, which can be downloaded from https://www.packtpub.com/books/content/support/19610. Open this file in your Python editor.
Like the other examples, the starting section of the program contains the code to import the libraries and set up the Arduino board:

    from pyfirmata import Arduino, SERVO
    from time import sleep

    # Setting up the Arduino board
    port = 'COM5'
    board = Arduino(port)
    # Need to give some time to pyFirmata and Arduino to synchronize
    sleep(5)

Now that you have Python ready to communicate with the Arduino board, let's configure the digital pin that is going to be used to connect the servomotor to the Arduino board. We will complete this task by setting the mode of pin 13 to SERVO:

    # Set mode of the pin 13 as SERVO
    pin = 13
    board.digital[pin].mode = SERVO

The setServoAngle(pin, angle) custom function takes the pin to which the servomotor is connected and the custom angle as input parameters. This function can be used as a part of various large projects that involve servos:

    # Custom angle to set Servo motor angle
    def setServoAngle(pin, angle):
        board.digital[pin].write(angle)
        sleep(0.015)

In the main logic of this template, we want to incrementally move the motor shaft in one direction until it reaches the maximum achievable angle (180 degrees) and then move it back to the original position at the same incremental speed. In the while loop, we will ask the user to provide input to continue this routine, which will be captured using the raw_input() function. The user can enter the character y to continue this routine or enter any other character to abort the loop:

    # Testing the function by rotating motor in both directions
    while True:
        for i in range(0, 181):
            setServoAngle(pin, i)
        for i in range(180, -1, -1):
            setServoAngle(pin, i)

        # Continue or break the testing process
        i = raw_input("Enter 'y' to continue or Enter to quit: ")
        if i == 'y':
            pass
        else:
            board.exit()
            break

While working with all these prototyping examples, we used the direct communication method, connecting the sensor to Arduino through its digital and analog pins.
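The sweep logic of the servo template can be separated from the hardware so that it can be checked before driving a real servo. The following sketch is hypothetical (sweepAngles is not from the book's code bundle); it yields the angles for one full sweep from 0 up to 180 degrees and back to 0, assuming the step divides the range evenly, and the commented loop shows how it could feed the setServoAngle() function from the template.

```python
def sweepAngles(minAngle=0, maxAngle=180, step=1):
    """Yield angles from minAngle up to maxAngle and back down again.

    Assumes step divides (maxAngle - minAngle) evenly.
    """
    for angle in range(minAngle, maxAngle + 1, step):
        yield angle
    # Start one step below the peak so maxAngle is not sent twice
    for angle in range(maxAngle - step, minAngle - 1, -step):
        yield angle


angles = list(sweepAngles())
print(angles[0], max(angles), angles[-1])  # 0 180 0
print(len(angles))                         # 361

# With the board and setServoAngle() from the servo template:
# for angle in sweepAngles():
#     setServoAngle(pin, angle)
```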
Now, let's get familiar with another widely used communication method between Arduino and sensors. This is called I2C communication.

The Button() widget – interfacing GUI with Arduino and LEDs

Now that you have had your first hands-on experience in creating a Python graphical interface, let's integrate Arduino with it. Python makes it easy to interface various heterogeneous packages with each other, and that is what you are going to do. In the next coding exercise, we will use Tkinter and pyFirmata to make the GUI work with Arduino. In this exercise, we are going to use the Button() widget to control the LEDs interfaced with the Arduino board.

Before we jump to the exercises, let's build the circuit that we will need for all upcoming programs. The following is a Fritzing diagram of the circuit, where we use two LEDs of different colors with current-limiting resistors. Connect these LEDs to digital pins 10 and 11 on your Arduino Uno board, as displayed in the following diagram:

While working with the code provided in this section, you will have to replace the Arduino port that is used to define the board variable according to your operating system. Also, make sure that you provide the correct pin numbers in the code if you are planning to use any pins other than 10 and 11. For some exercises, you will have to use the PWM pins, so make sure that you have the correct pins. You can use the entire code snippet as a Python file and run it. However, this might not be possible in the upcoming exercises due to the length of the program and the complexity involved.

For the Button() widget exercise, open the exampleButton.py file. The code contains three main components: pyFirmata and Arduino configurations, Tkinter widget definitions for a button, and the LED blink function that gets executed when you press the button. As you can see in the following code snippet, we have first imported libraries and initialized the Arduino board using the pyFirmata methods.
For this exercise, we are only going to work with one LED, and we have initialized only the ledPin variable for it:

    import Tkinter
    import pyfirmata
    from time import sleep

    port = '/dev/cu.usbmodemfa1331'
    board = pyfirmata.Arduino(port)
    sleep(5)
    ledPin = board.get_pin('d:11:o')

As we are using the pyFirmata library for all the exercises in this article, make sure that you have uploaded the latest version of the standard Firmata sketch to your Arduino board.

In the second part of the code, we have initialized the root Tkinter widget as top and provided a title string. We have also fixed the size of this window using the minsize() method. In order to get more familiar with the root widget, you can play around with the minimum and maximum size of the window:

    top = Tkinter.Tk()
    top.title("Blink LED using button")
    top.minsize(300, 30)

The Button() widget is a standard Tkinter widget that is mostly used to obtain manual, external input from the user. Like the Label() widget, the Button() widget can be used to display text or images. Unlike the Label() widget, it can be associated with actions or methods when it is pressed. When the button is pressed, Tkinter executes the methods or commands specified by the command option:

    startButton = Tkinter.Button(top, text="Start", command=onStartButtonPress)
    startButton.pack()

In this initialization, the function associated with the button is onStartButtonPress, and the "Start" string is displayed as the label of the button. Similarly, the top object specifies the parent or root widget. Once the button is instantiated, you will need to use the pack() method to make it available in the main window.

The onStartButtonPress() function, which must be defined before the button that refers to it is created, includes the code required to blink the LED and change the state of the button. A button can be in the NORMAL, ACTIVE, or DISABLED state. If it is not specified, the default state of any button is NORMAL.
The ACTIVE and DISABLED states are useful in applications where repeated presses of the button need to be prevented. After turning the LED on using the write(1) method, we add a time delay of 5 seconds using the sleep(5) function before turning it off with the write(0) method:

    def onStartButtonPress():
        startButton.config(state=Tkinter.DISABLED)
        ledPin.write(1)
        # LED stays on for the fixed amount of time specified below
        sleep(5)
        ledPin.write(0)
        startButton.config(state=Tkinter.ACTIVE)

At the end of the program, we call top.mainloop() to start the Tkinter event loop. Until this method is executed, the main window won't appear. To run the code, update the port used to define the board variable and execute the program. The following screenshot, with a button and a title bar, shows the output of the program. Clicking on the Start button will turn on the LED on the Arduino board for the specified time delay. While the LED is on, you will not be able to click on the Start button again. Note that this particular program does not include the code needed to safely disengage the Arduino board; that will be covered in upcoming exercises.

Summary

In this article, we learned about the Python library pyFirmata, which interfaces Arduino with your computer using the Firmata protocol. We built a prototype using pyFirmata and Arduino to control a servomotor, and also developed another one with a GUI, based on the Tkinter library, to control LEDs.

Packt

04 Mar 2015

10 min read

Deployment Scenarios

In this article by Andrea Gazzarini, author of the book Apache Solr Essentials, we look at the various ways in which you can deploy Solr, including the key features, pros, and cons of each scenario. Solr has a wide range of deployment alternatives, from monolithic to distributed indexes and from standalone to clustered instances. We will organize this article by deployment scenario, with a growing level of complexity. This article will cover the following topics:

- Sharding
- Replication: master, slave, and repeaters

Standalone instance

The examples in the book use a standalone instance of Solr, that is, one or more cores managed by a Solr deployment hosted in a standalone servlet container (for example, Jetty, Tomcat, and so on). This kind of deployment is useful for development because, as you have seen, it is very easy to start and debug. It can also be suitable for a production context if you don't have strict non-functional requirements and have a small or medium amount of data. I have used a standalone instance to provide autocomplete services for small and medium intranet systems. In short, the main features of this kind of deployment are simplicity and maintainability; one simple node acts as both an indexer and a searcher. The following diagram depicts a standalone instance with two cores:

Shards

When a monolithic index becomes too large for a single node, or when additions, deletions, or queries take too long to execute, the index can be split into multiple pieces called shards. The previous sentence describes a logical, theoretical evolution path of a Solr index, and the same (in general) holds for all the scenarios we will describe. However, it is strongly recommended that you perform a preliminary analysis of your data and its estimated growth factor in order to choose, from the beginning, the configuration that suits your requirements.
Although it is possible to split an existing index into shards (https://lucene.apache.org/core/4_10_3/misc/org/apache/lucene/index/PKIndexSplitter.html), things definitely become easier if you start directly with a distributed index (if you need one, of course).

The index is partitioned so that each shard contains a disjoint subset of the documents. Solr will query and merge results across those shards. The following diagram illustrates a Solr deployment with 3 nodes; this deployment consists of two cores (C1 and C2) divided into three shards (S1, S2, and S3):

When using shards, only query requests are distributed. This means that it's up to the indexer to add and distribute the data across nodes, and to subsequently forward a change request (that is, delete, replace, and commit) for a given document to the appropriate shard (the shard that owns the document). The Solr Wiki recommends a simple, hash-based algorithm to determine the shard where a given document should be indexed:

    documentId.hashCode() % numServers

In Java, hashCode() can return a negative number, and the % operator then yields a negative result, so Math.floorMod(documentId.hashCode(), numServers) is a safer way to get an index between 0 and numServers - 1. This approach is also useful for knowing in advance where to send delete or update requests for a given document.

On the other side, a searcher client can send a query request to any node, but it has to specify an additional shards parameter that declares the target shards that will be queried. In the following example, assuming that two shards are hosted on two servers listening on ports 8080 and 8081, the same request sent to either node will produce the same result:

    http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2
    http://localhost:8081/solr/c2/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2

When sending a query request, a client can optionally include, in the fl (field list) parameter, a pseudofield associated with the [shard] transformer. In this case, each returned document will carry additional information indicating the owning shard.
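The hash-based routing rule quoted earlier can be sketched in Python to show where a document would land. This is an illustration, not Solr code: java_string_hash and shard_for are hypothetical helpers, java_string_hash mimics Java's String.hashCode(), and the three shard addresses (including the one on port 8082) are made up for the example. The document id 9920 is taken from the sample response in this section.

```python
def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode(): s[0]*31^(n-1) + ... + s[n-1],
    computed over UTF-16 code units with 32-bit signed overflow."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF          # keep only the low 32 bits
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as a signed int


def shard_for(document_id: str, num_servers: int) -> int:
    """Pick a shard index in [0, num_servers) for a document id.

    Python's % never returns a negative value for a positive divisor, so this
    behaves like Java's Math.floorMod(hash, numServers), not Java's raw %.
    """
    return java_string_hash(document_id) % num_servers


# Hypothetical three-shard layout.
shards = ["localhost:8080/solr/c1", "localhost:8081/solr/c2", "localhost:8082/solr/c3"]
print(java_string_hash("9920"))                # → 1754462
print(shards[shard_for("9920", len(shards))])  # → localhost:8082/solr/c3
```

Because the indexer can recompute the same shard for an id at any time, it also knows where to send later delete or update requests for that document. Note that adding a server changes numServers and therefore the target shard of most documents.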
Here is an example of a request that includes the [shard] transformer:

    http://localhost:8080/solr/c1/query?q=*:*&shards=localhost:8080/solr/c1,localhost:8081/solr/c2&fl=*,src_shard:[shard]

Here is the corresponding response (note the pseudofield aliased as src_shard):

    <result name="response" numFound="192" start="0">
      <doc>
        <str name="id">9920</str>
        <str name="brand">Fender</str>
        <str name="model">Jazz Bass</str>
        <arr name="artist">
          <str>Marcus Miller</str>
        </arr>
        <str name="series">Marcus Miller signature</str>
        <str name="src_shard">localhost:8080/solr/shard1</str>
      </doc>
      …
      <doc>
        <str name="id">4392</str>
        <str name="brand">Music Man</str>
        <str name="model">Sting Ray</str>
        <arr name="artist"><str>Tony Levin</str></arr>
        <str name="series">5 strings DeLuxe</str>
        <str name="src_shard">localhost:8081/solr/shard2</str>
      </doc>
    </result>

The following are a few things to keep in mind when using this deployment scenario:

- The schema must have a uniqueKey field. This field must be declared as stored and indexed; in addition, it is supposed to be unique across all shards.
- Inverse Document Frequency (IDF) calculations cannot be distributed. IDF is computed per shard.
- Joins between documents belonging to different shards are not supported.
- If a shard receives both index and query requests, the index may change during a query execution, thus compromising the outgoing results (for example, a matching document that has been deleted).

Master/slaves scenario

In a master/slaves scenario, there are two types of Solr servers: an indexer (the master) and one or more searchers (the slaves). The master is the server that manages the index. It receives update requests and applies those changes. A searcher, on the other hand, is a Solr server that exposes search services to external clients.
The index, in terms of data files, is replicated from the indexer to the searcher over HTTP by means of a built-in RequestHandler that must be configured on both the indexer side and the searcher side (within the solrconfig.xml configuration file). On the indexer (master), a replication configuration looks like this:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">startup</str>
        <str name="replicateAfter">optimize</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

The replication mechanism can be configured to be triggered after one of the following events:

- Commit: A commit has been applied
- Optimize: The index has been optimized
- Startup: The Solr instance has started

In the preceding example, we want the index to be replicated after startup and optimize commands. Using the confFiles parameter, we can also indicate a set of configuration files (schema.xml and stopwords.txt, in the example) that must be replicated together with the index. Remember that changes to those files don't trigger any replication. Only a change in the index, in conjunction with one of the events we defined in the replicateAfter parameter, will mark the index (and the configuration files) as replicable.

On the searcher side, the configuration looks like the following:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://<localhost>:<port>/solrmaster</str>
        <str name="pollInterval">00:00:10</str>
      </lst>
    </requestHandler>

You can see that a searcher periodically polls the master (at the pollInterval, expressed as HH:mm:ss) to check whether a newer version of the index is available. If it is, the searcher starts the replication mechanism by issuing a request to the master, which is completely unaware of the searchers. The replicability status of the index is actually indicated by a version number.
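The slave-side decision described here (poll at a fixed interval, and replicate only when the master's index version differs from the local one) can be modelled in a few lines. This is a hedged sketch, not Solr's ReplicationHandler: IndexState, parse_poll_interval, needs_replication, and poll_once are hypothetical names, a single integer stands in for the index version, and copying that number stands in for transferring index files.

```python
from dataclasses import dataclass


def parse_poll_interval(value: str) -> int:
    """Convert a pollInterval string in HH:mm:ss form to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class IndexState:
    version: int  # hypothetical stand-in for the index version number


def needs_replication(master: IndexState, slave: IndexState) -> bool:
    """A slave replicates only when its index version differs from the master's."""
    return master.version != slave.version


def poll_once(master: IndexState, slave: IndexState) -> bool:
    """One polling round; returns True if the slave pulled a new index."""
    if needs_replication(master, slave):
        slave.version = master.version  # stands in for fetching the index files
        return True
    return False


interval = parse_poll_interval("00:00:10")  # the value used in the config above
master, slave = IndexState(version=42), IndexState(version=41)
print(interval, poll_once(master, slave), poll_once(master, slave))  # → 10 True False
```

A real slave would run something like poll_once() every interval seconds; the master never needs to know which slaves exist, which matches the pull-based design described in the text.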
If the searcher has the same version as the master, the two indexes are identical. If the versions differ, a newer version of the index is available on the master, and replication can start.

Other than separating responsibilities, this deployment configuration allows us to have a so-called diamond architecture, consisting of one indexer and several searchers. When replication is triggered, each searcher receives a whole copy of the index. This allows the following:

- Load balancing of the incoming (query) requests.
- Increased availability of the whole system: in the event of a server crash, the other searchers will continue to serve the incoming requests.

The following diagram illustrates a master/slave deployment scenario with one indexer, three searchers, and two cores:

If the searchers are in several geographically distributed data centers, an additional role called repeater can be configured in each data center in order to rationalize the replication traffic between nodes. A repeater is simply a node that acts as both a master and a slave. It is a slave of the main master, and at the same time, it acts as the master of the searchers within the same data center, as shown in this diagram:

Shards with replication

This scenario combines shards and replication in order to have a scalable system with high throughput and availability. There is one indexer and one or more searchers for each shard, allowing load balancing of (query) shard requests. The following diagram illustrates a scenario with two cores, three shards, one indexer, and (due to space constraints in the diagram) only one searcher for each shard:

The drawback of this approach is undoubtedly the growing overall complexity of the system, which requires more effort in terms of maintainability, manageability, and system administration.
In addition to this, each searcher is an independent node, and there is no central administration console where a system administrator can get a quick overview of system health.

Summary

In this article, we described various ways in which you can deploy Solr. Each deployment scenario has specific features, advantages, and drawbacks that make it ideal for one context and a poor fit for another. The good news is that the different scenarios are not strictly exclusive; they follow an incremental approach. In an ideal context, you would start immediately with the scenario that perfectly fits your needs. However, unless your requirements are clear right from the start, you can begin with a simple configuration and then change it, depending on how your application evolves.

Packt

04 Mar 2015

20 min read

AngularJS Performance

In this article by Chandermani, the author of AngularJS by Example, we focus our discussion on the performance aspect of AngularJS. For most scenarios, we can all agree that AngularJS is insanely fast. For standard-size views, we rarely see any performance bottlenecks. But many views start small and then grow over time. And sometimes the requirements dictate that we build large pages/views with a sizable amount of HTML and data. In such cases, there are things that we need to keep in mind to provide an optimal user experience.

With any framework, a performance discussion requires an understanding of how the framework works internally. When it comes to Angular, we need to understand how Angular detects model changes. What are watches? What is a digest cycle? What roles do scope objects play? Without a conceptual understanding of these subjects, any performance guidance is merely a checklist that we follow without understanding why.

Let's look at some pointers before we begin our discussion on the performance of AngularJS:

- The live binding between the view elements and model data is set up using watches. When a model changes, one or many watches linked to the model are triggered. Angular's view binding infrastructure uses these watches to synchronize the view with the updated model value.
- Model change detection only happens when a digest cycle is triggered. Angular does not track model changes in real time; instead, on every digest cycle, it runs through every watch to compare the previous and new values of the model to detect changes.
- A digest cycle is triggered when $scope.$apply is invoked.
A number of directives and services internally invoke $scope.$apply:

- Directives such as ng-click and ng-mouse* do it on user action
- Services such as $http and $resource do it when a response is received from the server
- $timeout and $interval call $scope.$apply when they lapse

A digest cycle tracks the old value of each watched expression and compares it with the new value to detect whether the model has changed. Simply put, the digest cycle is a workflow used to detect model changes. A digest cycle keeps looping until the model data is stable and no watch is triggered.

Once we have a clear understanding of the digest cycle, watches, and scopes, we can look at some performance guidelines that can help us manage views as they start to grow.

Performance guidelines

When building any Angular app, performance optimization boils down to:

- Minimizing the number of binding expressions, and hence watches
- Making sure that binding expression evaluation is quick
- Optimizing the number of digest cycles that take place

The next few sections provide some useful pointers in this direction. Remember, many of these optimizations may only be necessary if the view is large.

Keeping the page/view small

The sanest advice is to keep the amount of content on a page small. Users cannot interact with or process too much data on a page, so remember that screen real estate is at a premium and only keep necessary details on a page. The less content there is, the fewer binding expressions there are; hence, fewer watches and less processing are required during the digest cycle. Remember, each watch adds to the overall execution time of the digest cycle. The time required for a single watch can be insignificant but, when hundreds or even thousands of them are combined, they start to matter.

Angular's data binding infrastructure is insanely fast and relies on a rudimentary dirty check that compares the old and the new values.
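To make the dirty-checking loop concrete, here is a simplified Python model of it. This is not Angular's implementation (the real $digest walks a scope tree and includes optimizations omitted here); Watch, Scope, and the sample model are hypothetical names used only for illustration. It shows the two properties described above: each watch is re-evaluated and compared with its last value, and the loop repeats until a full pass finds nothing dirty, giving up after 10 passes, which matches AngularJS's default limit.

```python
from typing import Any, Callable, List

_UNSET = object()  # sentinel so that the first evaluation always counts as a change


class Watch:
    def __init__(self, expr: Callable[[], Any], listener: Callable[[Any, Any], None]):
        self.expr = expr          # the watched expression
        self.listener = listener  # called with (new, old) when the value changes
        self.last = _UNSET


class Scope:
    TTL = 10  # AngularJS aborts after 10 dirty passes ("10 $digest() iterations reached")

    def __init__(self):
        self.watches: List[Watch] = []

    def watch(self, expr, listener):
        self.watches.append(Watch(expr, listener))

    def digest(self) -> int:
        """Re-run every watch until a full pass finds no change; return the pass count."""
        passes = 0
        while True:
            passes += 1
            if passes > self.TTL:
                raise RuntimeError("digest did not stabilize after %d passes" % self.TTL)
            dirty = False
            for w in self.watches:
                new = w.expr()
                if new != w.last:  # the dirty check: compare old and new values
                    old = None if w.last is _UNSET else w.last
                    w.last = new
                    w.listener(new, old)  # a listener may change the model again
                    dirty = True
            if not dirty:
                return passes


model = {"first": "Ada", "last": "Lovelace", "full": ""}
rendered = []
scope = Scope()
# The "view" watch is registered first, so it sees the derived value one pass later.
scope.watch(lambda: model["full"], lambda new, old: rendered.append(new))
scope.watch(lambda: (model["first"], model["last"]),
            lambda new, old: model.update(full=" ".join(new)))
print(scope.digest(), rendered)  # → 3 ['', 'Ada Lovelace']
```

Registering the view watch first means it only picks up the derived full name on the second pass, which is why three passes are needed. A listener that keeps changing its own input would never stabilize and would hit the TTL.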
Check out the Stack Overflow (SO) post (http://stackoverflow.com/questions/9682092/databinding-in-angularjs), where Misko Hevery (creator of Angular) talks about how data binding works in Angular.

Data binding also adds to the memory footprint of the application. Each watch has to track the current and previous value of a data-binding expression to compare them and verify whether data has changed.

Keeping a page/view small may not always be possible, and the view may grow. In such a case, we need to make sure that the number of bindings does not grow exponentially (linear growth is OK) with the page size. The next two tips can help minimize the number of bindings in the page and should be seriously considered for large views.

Optimizing watches for read-once data

In any Angular view, there is always content that, once bound, does not change. Any read-only data on the view falls into this category. Once such data is bound to the view, we no longer need watches to track model changes, as we don't expect the model to update. Is it possible to remove the watch after a one-time binding? Before version 1.3, Angular had nothing built in for this, but a community project, bindonce (https://github.com/Pasvaz/bindonce), fills this gap. Angular 1.3 added native support for bind-and-forget: using the syntax {{::title}}, we can achieve one-time binding. If you are on Angular 1.3 or later, use it!

Hiding (ng-show) versus conditional rendering (ng-if/ng-switch) content

You have learned two ways to conditionally render content in Angular. The ng-show/ng-hide directives show/hide the DOM element based on the expression provided, while ng-if/ng-switch create and destroy the DOM based on an expression. For some scenarios, ng-if can be really beneficial, as it can reduce the number of binding expressions/watches for DOM content that is not rendered.
Consider the following example:

    <div ng-if='user.isAdmin'>
      <div ng-include="'admin-panel.html'"></div>
    </div>

The snippet renders an admin panel if the user is an admin. With ng-if, if the user is not an admin, the ng-include template is neither requested nor rendered, saving us all the bindings and watches that are part of the admin-panel.html view.

From the preceding discussion, it may seem that we should get rid of all ng-show/ng-hide directives and use ng-if. Well, not really! It again depends; for small pages, ng-show/ng-hide works just fine. Also, remember that there is a cost to creating and destroying the DOM. If the show/hide expression flips too often, it will cause too many DOM create-and-destroy cycles, which are detrimental to the overall performance of the app.

Expressions being watched should not be slow

Since watches are evaluated so often, the expression being watched should return results fast. The first way we can make sure of this is by using properties instead of functions in binding expressions. Expressions such as these:

    {{user.name}}
    ng-show='user.Authorized'

are always better than these:

    {{getUserName()}}
    ng-show='isUserAuthorized(user)'

Try to minimize function expressions in bindings. If a function expression is required, make sure that the function returns a result quickly. Make sure a function being watched does not:

- Make any remote calls
- Use $timeout/$interval
- Perform sorting/filtering
- Perform DOM manipulation (this can happen inside directive implementation)
- Perform any other time-consuming operation

To reiterate, Angular may evaluate a watched expression several times during a single digest cycle, just to know whether the return value (a model) has changed and the view needs to be synchronized.
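To see why watched functions should be cheap, the following Python sketch counts how often a sorting function bound to the view is called. It is a simplified model, not Angular internals: digest, sorted_users, and the sample data are hypothetical, and Angular's real loop includes optimizations omitted here. In this model, a function-style binding is evaluated on every pass of every digest, while a precomputed, property-style value is sorted only once.

```python
from typing import Callable, Dict, List

users = [{"name": "Cy"}, {"name": "Al"}, {"name": "Bo"}]
calls = {"sorted_users": 0}  # counts how often the watched function runs


def sorted_users() -> List[str]:
    """A watched *function*: it re-sorts on every evaluation, like orderBy in a binding."""
    calls["sorted_users"] += 1
    return sorted(u["name"] for u in users)


def digest(getters: List[Callable[[], object]], last: Dict[int, object], ttl: int = 10) -> int:
    """Minimal dirty-checking loop over `getters`; returns the number of passes."""
    for passes in range(1, ttl + 1):
        dirty = False
        for i, get in enumerate(getters):
            value = get()
            if i not in last or last[i] != value:
                last[i] = value
                dirty = True
        if not dirty:
            return passes
    raise RuntimeError("digest did not stabilize")


# Function-style binding: sorted_users() runs on every pass of every digest.
last: Dict[int, object] = {}
print(digest([sorted_users], last), calls["sorted_users"])  # → 2 2
print(digest([sorted_users], last), calls["sorted_users"])  # → 1 3

# Property-style binding: sort once when the data arrives, then watch the result.
presorted = sorted_users()
before = calls["sorted_users"]
last_prop: Dict[int, object] = {}
digest([lambda: presorted], last_prop)
digest([lambda: presorted], last_prop)
print(calls["sorted_users"] - before)  # → 0
```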
Minimizing the deep model watch

When using $scope.$watch to watch for model changes in controllers, be careful when setting the third $watch function parameter to true. The general syntax of watch looks like this:

    $watch(watchExpression, listener, [objectEquality]);

In the standard scenario, Angular compares objects by reference only. But if objectEquality is true, Angular does a deep comparison between the last value and the new value of the watched expression, and it has to keep a deep copy of the last value to compare against. This can have an adverse memory and performance impact if the object is large.

Handling large datasets with ng-repeat

The ng-repeat directive is undoubtedly the most useful directive Angular has, but it can also cause the most performance-related headaches. This is not because of the directive's design, but because it is the only directive that allows us to generate HTML on the fly. There is always the possibility of generating enormous HTML just by binding ng-repeat to a big model list. Some tips that can help us when working with ng-repeat follow.

Page data and use limitTo: Implement a server-side paging mechanism when the number of items returned is large. Also use the limitTo filter to limit the number of items rendered. Its syntax is as follows:

    <tr ng-repeat="user in users | limitTo:pageSize">…</tr>

Look at modules such as ngInfiniteScroll (http://binarymuse.github.io/ngInfiniteScroll/) that provide an alternate mechanism to render large lists.

Use the track by expression: For performance, the ng-repeat directive tries to avoid unnecessarily creating or deleting HTML nodes when items are added, updated, deleted, or moved in the list. To achieve this, it adds a $$hashKey property to every model item, allowing it to associate the DOM node with the model item. We can override this behavior and provide our own item key using the track by expression, such as:

    <tr ng-repeat="user in users track by user.id">…</tr>

This allows us to use our own mechanism to identify an item.
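The bookkeeping that track by enables can be sketched in Python. The reconcile function below is a hypothetical illustration, not ng-repeat's code: it matches old and new items by a key and reports which rows would be created, destroyed, or reused. Using Python's built-in id (object identity) roughly approximates the default $$hashKey behavior, where freshly fetched objects look new even when they describe the same users; keying by the user's id approximates track by user.id. The sample users are made up.

```python
from typing import Callable, Dict, List


def reconcile(old_items: List[dict], new_items: List[dict],
              key: Callable[[dict], object]) -> Dict[str, list]:
    """Decide which rows to create, destroy, or reuse, matching items by `key`."""
    old_keys = [key(item) for item in old_items]
    new_keys = [key(item) for item in new_items]
    old_set, new_set = set(old_keys), set(new_keys)
    return {
        "create": [k for k in new_keys if k not in old_set],   # new DOM rows
        "destroy": [k for k in old_keys if k not in new_set],  # rows to remove
        "reuse": [k for k in new_keys if k in old_set],        # rows left in place
    }


first_load = [{"id": 1, "name": "Al"}, {"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]
# A refresh from the server returns *new* objects: user 1 is gone, user 4 is new.
refresh = [{"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}, {"id": 4, "name": "Di"}]

by_identity = reconcile(first_load, refresh, key=id)                # ~ default $$hashKey
by_user_id = reconcile(first_load, refresh, key=lambda u: u["id"])  # ~ track by user.id
print(len(by_identity["create"]), len(by_identity["destroy"]))  # → 3 3
print(by_user_id)  # → {'create': [4], 'destroy': [1], 'reuse': [2, 3]}
```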
Using your own track by expression has a distinct advantage over the default hash key approach. Consider an example where you make an initial AJAX call to get users:

    $scope.getUsers().then(function (users) { $scope.users = users; });

Later, we refresh the data from the server and assign it again:

    $scope.users = users;

With user.id as the key, Angular is able to determine which elements were added, deleted, or moved, and therefore which DOM nodes need to be created or destroyed for them. The remaining elements are not touched by ng-repeat (although their internal bindings are still evaluated). This saves a lot of CPU cycles for the browser, as fewer DOM elements are created and destroyed.

Do not bind ng-repeat to a function expression: Using a function's return value for ng-repeat can also be problematic, depending on how the function is implemented. Consider a repeat like this:

    <tr ng-repeat="user in getUsers()">…</tr>

And consider the controller's getUsers function:

    $scope.getUsers = function () {
      var orderBy = $filter('orderBy');
      return orderBy($scope.users, predicate);
    };

Angular is going to evaluate this expression, and hence call this function, every time the digest cycle takes place. A lot of CPU cycles are wasted sorting user data again and again. It is better to use scope properties and presort the data before binding.

Minimize filters in views, use filters in the controller: Filters defined on ng-repeat are also evaluated every time the digest cycle takes place. For large lists, if the same filtering can be implemented in the controller, we can avoid constant filter evaluation. This holds true for any filter function that is used with arrays, including filter and orderBy.

Avoiding mouse-movement tracking events

The ng-mousemove, ng-mouseenter, ng-mouseleave, and ng-mouseover directives can just kill performance.
If an expression is attached to any of these event directives, Angular triggers a digest cycle every time the corresponding event occurs, and for events like mouse move, this can be a lot. We have already seen this behavior when working with 7 Minute Workout, when we tried to show a pause overlay on the exercise image when the mouse hovers over it. Avoid them at all costs. If we just want to trigger some style changes on mouse events, CSS is a better tool.

Avoiding calling $scope.$apply

Angular is smart enough to call $scope.$apply at appropriate times without us explicitly calling it. This can be confirmed from the fact that the only place we have seen and used $scope.$apply is within directives. The ng-click and updateOnBlur directives use $scope.$apply to transition from a DOM event handler execution to an Angular execution context. When wrapping a jQuery plugin, we may need to do a similar transition for events raised by the plugin. Other than this, there is no reason to use $scope.$apply. Remember, every invocation of $apply results in the execution of a complete digest cycle.

The $timeout and $interval services take a Boolean argument, invokeApply. If it is set to false, the callback runs without calling $scope.$apply, so no digest cycle is triggered when the timer lapses. Therefore, if you are going to perform background operations that do not require $scope and the view to be updated, set invokeApply to false.

Always use Angular's wrappers, such as $timeout and $interval, instead of the standard JavaScript setTimeout and setInterval functions, to avoid manually calling $scope.$apply. These wrapper functions internally call $scope.$apply.

Also, understand the difference between $scope.$apply and $scope.$digest. $scope.$apply triggers $rootScope.$digest, which evaluates all application watches, whereas $scope.$digest only performs dirty checks on the current scope and its children.
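The difference in reach can be modelled with a toy scope tree in Python. Scope, digest, and apply here are hypothetical stand-ins for $scope, $digest, and $apply, and the model only counts watch evaluations; it is not how Angular implements them. digest checks the current scope and its descendants, while apply always starts from the root.

```python
from typing import Callable, Dict, Iterator, List, Optional


class Scope:
    """Toy scope: owns watch functions and child scopes (a stand-in for $scope)."""

    def __init__(self, parent: Optional["Scope"] = None):
        self.parent = parent
        self.children: List["Scope"] = []
        self.watchers: List[Callable[[], object]] = []
        self.last: Dict[int, object] = {}  # last value seen per watcher index
        if parent is not None:
            parent.children.append(self)

    def root(self) -> "Scope":
        scope = self
        while scope.parent is not None:
            scope = scope.parent
        return scope

    def subtree(self) -> Iterator["Scope"]:
        """This scope followed by all of its descendants."""
        yield self
        for child in self.children:
            yield from child.subtree()

    def digest(self, ttl: int = 10) -> int:
        """Like $digest: dirty-check this scope and its children; return evaluations."""
        evaluations = 0
        for _ in range(ttl):
            dirty = False
            for scope in self.subtree():
                for i, get in enumerate(scope.watchers):
                    value = get()
                    evaluations += 1
                    if i not in scope.last or scope.last[i] != value:
                        scope.last[i] = value
                        dirty = True
            if not dirty:
                return evaluations
        raise RuntimeError("digest did not stabilize")

    def apply(self) -> int:
        """Like $apply: always runs the digest from the root scope."""
        return self.root().digest()


root = Scope()
child = Scope(parent=root)
grandchild = Scope(parent=child)
root.watchers += [lambda: "page title", lambda: "menu"]
child.watchers.append(lambda: "user list")
grandchild.watchers.append(lambda: "user row")

print(root.apply())    # → 8 (first digest: two passes over all four watches)
print(child.digest())  # → 2 (only the child and grandchild watches are checked)
print(child.apply())   # → 4 (one clean pass over every watch in the tree)
```

Even in this small tree, a $digest-style call on the child checks half as many watches as an $apply-style call; in a real app, the gap grows with the number of watches outside the subtree.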
If we are sure that the model changes are not going to affect anything other than the child scopes, we can use $scope.$digest instead of $scope.$apply.

Lazy-loading, minification, and creating multiple SPAs

I hope you are not assuming that the apps that we have built will continue to use the numerous small script files that we have created to separate modules and module artefacts (controllers, directives, filters, and services). Any modern build system can concatenate and minify these files and replace the original file references with a unified, minified version. Therefore, like any JavaScript library, use minified script files in production.

The problem with the Angular bootstrapping process is that it expects all Angular application scripts to be loaded before the application can bootstrap. We cannot load modules, controllers, or, in fact, any of the other Angular constructs on demand. This means we need to provide every artefact required by our app upfront. For small applications, this is not a problem, as the content is concatenated and minified; also, Angular application code itself is far more compact than the traditional JavaScript of jQuery-based apps. But as the application grows, loading everything upfront may start to hurt. There are at least two possible solutions to this problem; the first one is breaking our application into multiple SPAs.

Breaking applications into multiple SPAs

This advice may seem counterintuitive, as the whole point of SPAs is to get rid of full page loads. By creating multiple SPAs, we break the app into multiple small SPAs, each supporting part of the overall app functionality. When we say app, we mean the combination of the main page (such as index.html) with ng-app and all the scripts/libraries and partial views that the app loads over time.
For example, we can break the Personal Trainer application into a Workout Builder app and a Workout Runner app. Both have their own startup page and scripts. Common scripts, such as the Angular framework scripts and any third-party libraries, can be referenced in both applications. Along similar lines, common controllers, directives, services, and filters can also be referenced in both apps. The way we have designed Personal Trainer makes it easy to achieve our objective; the segregation of what belongs where has already been done.

The advantage of breaking an app into multiple SPAs is that only the scripts relevant to each app are loaded. For a small app, this may be overkill, but for large apps, it can improve app performance. The challenge with this approach is to identify which parts of an application can be created as independent SPAs; this depends entirely on the usage pattern of the application. For example, assume an application has an admin module and an end consumer/user module. Creating two SPAs, one for admin and the other for the end customer, is a great way to keep user-specific features and admin-specific features separate. A standard user may never transition to the admin section/area, whereas an admin user can still work in both areas; but transitioning from the admin area to a user-specific area will require a full page refresh.

If breaking the application into multiple SPAs is not possible, the other option is to lazy load modules.

Lazy-loading modules

Lazy-loading modules, or loading modules on demand, is a viable option for large Angular apps. Unfortunately, Angular itself does not have any built-in support for lazy-loading modules. Furthermore, the additional complexity of lazy loading may be unwarranted, as Angular produces far less code than other JavaScript framework implementations. Also, once we gzip and minify the code, the amount of code transferred over the wire is minimal.
If we still want to try our hand at lazy loading, there are two libraries that can help:

- ocLazyLoad (https://github.com/ocombe/ocLazyLoad): This is a library that uses script.js to load modules on the fly
- angularAMD (http://marcoslin.github.io/angularAMD): This is a library that uses require.js to lazy load modules

With lazy loading in place, we can delay the loading of a controller, directive, filter, or service script until the page that requires it is loaded. The overall concept of lazy loading seems great, but I'm still not sold on the idea. Before we adopt a lazy-load solution, there are things that we need to evaluate.

Loading multiple script files lazily: When scripts are concatenated and minified, we load the complete app at once. Contrast this with lazy loading, where we do not concatenate the scripts but load them on demand. What we gain in lazy-load flexibility, we lose in performance, because we now have to make a number of network requests to load individual files. Given these facts, the ideal approach is to combine lazy loading with concatenation and minification. In this approach, we identify the feature modules that can be concatenated and minified together and served on demand using lazy loading. For example, Personal Trainer scripts can be divided into three categories:

- The common app modules: This consists of any script that has common code used across the app; these can be combined together and loaded upfront.
- The Workout Runner module(s): Scripts that support workout execution can be concatenated and minified together but are loaded only when the Workout Runner pages are loaded.
- The Workout Builder module(s): Along similar lines, scripts that support workout building can be combined together and served only when the Workout Builder pages are loaded.

As we can see, a decent amount of effort is required to refactor the app in a manner that makes module segregation, concatenation, and lazy loading possible.
The effect on unit and integration testing: We also need to evaluate the effect of lazy-loading modules on unit and integration testing, as the way we test is also affected by lazy loading. This implies that, if lazy loading is added as an afterthought, the test setup may require tweaking to make sure existing tests still run.

Given these facts, we should evaluate our options and check whether we really need lazy loading or whether we can manage by breaking a monolithic SPA into multiple smaller SPAs.

Caching remote data wherever appropriate

Caching data is one of the oldest tricks to improve webpage/application performance. Analyze your GET requests and determine what data can be cached. Once such data is identified, it can be cached in a number of locations. Outside the app, data can be cached on:

- Servers: The server can cache repeated GET requests to resources that do not change very often. This whole process is transparent to the client, and the implementation depends on the server stack used.
- Browsers: In this case, the browser caches the response. Browser caching depends on the server sending HTTP cache headers, such as Cache-Control (for example, how long a resource may be cached) and ETag (a validator used to check whether a cached copy is still current). Browsers honor these headers and cache data appropriately for future use.

If server and browser caching is not available, or if we also want to incorporate some caching in the client app, we do have some choices.

Cache data in memory: A simple Angular service can cache the HTTP response in memory. Since an Angular app is an SPA, the data is not lost unless the page refreshes. This is how a service function looks when it caches data:

    var workouts;
    service.getWorkouts = function () {
      if (workouts) return $q.resolve(workouts);
      return $http.get("/workouts").then(function (response) {
        workouts = response.data;
        return workouts;
      });
    };

The implementation caches a list of workouts in the workouts variable for future use.
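The same cache-then-fetch idea, plus the explicit invalidation that the text goes on to discuss, can be sketched in Python. ResponseCache and fake_fetch are hypothetical names; fake_fetch stands in for the HTTP call, and the payload is made up. get() returns the stored value on a hit and calls fetch only on a miss, and remove() plays the role of clearing a cache key.

```python
from typing import Any, Callable, Dict


class ResponseCache:
    """In-memory GET cache keyed by URL, with explicit invalidation.

    `fetch` is a hypothetical stand-in for the HTTP call.
    """

    def __init__(self, fetch: Callable[[str], Any]):
        self._fetch = fetch
        self._store: Dict[str, Any] = {}

    def get(self, url: str) -> Any:
        if url not in self._store:  # cache miss: go to the "server"
            self._store[url] = self._fetch(url)
        return self._store[url]     # cache hit: no network call

    def remove(self, url: str) -> None:
        """Invalidate one entry, in the spirit of cache.remove(url) in the text."""
        self._store.pop(url, None)


server_calls = []


def fake_fetch(url: str) -> list:
    server_calls.append(url)
    return ["7 Minute Workout"]  # hypothetical payload


cache = ResponseCache(fake_fetch)
cache.get("/workouts")
cache.get("/workouts")
print(len(server_calls))  # → 1
cache.remove("/workouts")  # e.g. after the user saves a new workout
cache.get("/workouts")
print(len(server_calls))  # → 2
```

Checking whether the key is present, rather than whether the cached value is truthy, means an empty result is also cached.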
The first request makes an HTTP call to retrieve the data, but subsequent requests just return the cached data wrapped in a promise. Using $q.resolve makes sure that the function always returns a promise. Angular $http cache: Angular's $http service comes with a cache configuration option. When it is set to true, $http caches the response of that particular GET request in a local cache (again, an in-memory cache). Here is how we cache a GET request:
$http.get(url, { cache: true });
Angular keeps this cache for the lifetime of the app, and clearing it is not easy: we need to get hold of the cache dedicated to HTTP responses and clear the cache key manually. The caching strategy of an application is never complete without a cache invalidation strategy. With caching, there is always a possibility that caches get out of sync with the actual data store. We cannot affect the server-side caching behavior from the client; consequently, let's focus on how to perform cache invalidation (clearing) for the two client-side caching mechanisms described earlier. If we use the first approach to cache data, we are responsible for clearing the cache ourselves. In the case of the second approach, the default $http service does not provide a way to clear its cache. We either need to get hold of the underlying $http cache store and clear the cache key manually (as shown here), or implement our own cache that manages cached data and invalidates it based on some criteria:
var cache = $cacheFactory.get('$http');
cache.remove("http://myserver/workouts"); // the key is the full URL
Using Batarang to measure performance
Batarang (a Chrome extension), as we have already seen, is an extremely handy tool for Angular applications. Using Batarang to visualize app usage is like looking at an X-ray of the app.
It allows us to: view the scope data, the scope hierarchy, and how the scopes are linked to HTML elements; evaluate the performance of the application; and check the application dependency graph, which helps us understand how components are linked to each other and to other framework components. If we enable Batarang and then play around with our application, Batarang captures performance metrics for all watched expressions in the app. This data is nicely presented as a graph on the Performance tab inside Batarang. That is pretty sweet! When building an app, use Batarang to identify the most expensive watches and take corrective measures if required. Play around with Batarang and see what other features it has. This brings us to the end of the performance guidelines that we wanted to share in this article. Some of these guidelines are preventive measures that we should take to make sure we get optimal app performance, whereas others help when performance is not up to the mark.
Summary
In this article, we looked at the ever-so-important topic of performance, where you learned ways to optimize the performance of an Angular app.

Packt

04 Mar 2015

19 min read

Native MS Security Tools and Configuration

Packt

04 Mar 2015

38 min read

KnockoutJS Templates

Packt

03 Mar 2015

14 min read

SciPy for Signal Processing

In this article by Sergio J. Rojas G. and Erik A Christensen, authors of the book Learning SciPy for Numerical and Scientific Computing - Second Edition, we will focus on some of the most commonly used routines in the SciPy modules scipy.signal, scipy.ndimage, and scipy.fftpack, which are used for signal processing, multidimensional image processing, and computing Fourier transforms, respectively. We define a signal as data that measures either a time-varying or a spatially varying phenomenon. Sound and electrocardiograms are excellent examples of time-varying quantities, while images embody the quintessential spatially varying case. Moving images, naturally, are treated with the techniques of both types of signals. The field of signal processing deals with four aspects of this kind of data: its acquisition, quality improvement, compression, and feature extraction. SciPy has many routines to handle tasks in any of these four fields effectively. All of these are included in two low-level modules: scipy.signal, the main module, with an emphasis on time-varying data, and scipy.ndimage, for images. Many of the routines in these two modules are based on the Discrete Fourier Transform of the data. SciPy has an extensive package of applications and definitions of these background algorithms, scipy.fftpack, which we will cover first.
Discrete Fourier Transforms
The Discrete Fourier Transform (DFT from now on) transforms any signal from its time/space domain into a related signal in the frequency domain. This allows us not only to analyze the different frequencies of the data, but also to perform faster filtering operations, when used properly. It is possible to turn a signal in the frequency domain back to its time/spatial domain, thanks to the Inverse Fourier Transform. We will not go into detail about the mathematics behind these operators, since we assume some familiarity with this theory.
We will focus on syntax and applications instead. The basic routines in the scipy.fftpack module compute the DFT and its inverse for discrete signals in any dimension: fft and ifft (one dimension), fft2 and ifft2 (two dimensions), and fftn and ifftn (any number of dimensions). All of these routines assume that the data is complex valued. If we know beforehand that a particular dataset is real valued, we can use rfft and irfft instead, which exploit the symmetry of the transform of real data for a faster algorithm. All these routines are designed so that composing them with their inverses yields the identity (up to floating-point rounding). The syntax is similar in all cases; for fft it is as follows:
fft(x[, n, axis, overwrite_x])
The first parameter, x, is always the signal, in any array-like form. Note that fft performs one-dimensional transforms. In particular, this means that if x happens to be two-dimensional, fft will output another two-dimensional array in which each row is the transform of the corresponding row of the original. We can transform along columns instead with the optional axis parameter. The rest of the parameters are also optional: n indicates the length of the transform, and overwrite_x allows the routine to overwrite the original data to save memory and resources. We usually set the integer n when we need to pad the signal with zeros or truncate it. For higher dimensions, n is replaced by shape (a tuple), and axis by axes (another tuple). To better understand the output, it is often useful to shift the zero frequencies to the center of the output arrays with fftshift. The inverse of this operation, ifftshift, is also included in the module.
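To make these definitions concrete, here is a minimal pure-Python sketch of what fft, ifft, and fftshift compute for a one-dimensional signal. It uses the naive O(n^2) DFT rather than the fast algorithm SciPy relies on, and the names dft, idft, and fftshift_1d are hypothetical helpers for illustration, not SciPy APIs. The sign and normalization conventions are assumed to match scipy.fftpack: the forward transform is unscaled and the inverse divides by n.

```python
import cmath


def dft(x):
    """Naive forward DFT: X[k] = sum_m x[m] * exp(-2j*pi*k*m/n)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT: x[m] = (1/n) * sum_k X[k] * exp(+2j*pi*k*m/n)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def fftshift_1d(X):
    """Rotate a 1-D spectrum so the zero-frequency term moves to the center."""
    k = (len(X) + 1) // 2  # number of non-negative frequency bins
    return X[k:] + X[:k]


signal = [1.0, 2.0, 0.0, -1.0, 3.0]
spectrum = dft(signal)
roundtrip = idft(spectrum)

# Composing the transform with its inverse recovers the signal, up to rounding.
print(all(abs(a - b) < 1e-9 for a, b in zip(roundtrip, signal)))  # True

# For real input, X[n-k] is the complex conjugate of X[k]; this redundancy is
# what lets rfft store and compute only about half of the spectrum.
n = len(signal)
print(all(abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9 for k in range(1, n)))  # True
```

The quadratic loop is only meant to expose the formula; for real work, the fast routines described above are the ones to use.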
The following code shows some of these routines in action, when applied to a checkerboard image:
>>> import numpy
>>> from scipy.fftpack import fft, fft2, fftshift
>>> import matplotlib.pyplot as plt
>>> B=numpy.ones((4,4)); W=numpy.zeros((4,4))
>>> signal = numpy.bmat("B,W;W,B")
>>> onedimfft = fft(signal,n=16)
>>> twodimfft = fft2(signal,shape=(16,16))
>>> plt.figure()
>>> plt.gray()
>>> plt.subplot(121,aspect='equal')
>>> plt.pcolormesh(onedimfft.real)
>>> plt.colorbar(orientation='horizontal')
>>> plt.subplot(122,aspect='equal')
>>> plt.pcolormesh(fftshift(twodimfft.real))
>>> plt.colorbar(orientation='horizontal')
>>> plt.show()
Note how the first four rows of the one-dimensional transform are equal (and so are the last four), while the two-dimensional transform (once shifted) presents a peak at the origin and nice symmetries in the frequency domain. In the following screenshot (obtained from the preceding code), the left-hand image shows the fft and the right-hand image shows the fft2 of a 2 x 2 checkerboard signal:
The scipy.fftpack module also offers the Discrete Cosine Transform with its inverse (dct, idct), as well as many differential and pseudo-differential operators defined in terms of these transforms: diff (for derivative/integral), hilbert and ihilbert (for the Hilbert transform), tilbert and itilbert (for the h-Tilbert transform of periodic sequences), and so on.
Signal construction
To aid in the construction of signals with predetermined properties, the scipy.signal module has a nice collection of the most frequently used one-dimensional waveforms in the literature: chirp and sweep_poly (frequency-swept cosine generators), gausspulse (a Gaussian-modulated sinusoid), and sawtooth and square (for the waveforms with those names). They all take as their main parameter a one-dimensional ndarray representing the times at which the signal is to be evaluated. Other parameters control the design of the signal, according to frequency or time constraints.
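As a rough illustration of what sawtooth and square evaluate, the following pure-Python sketch re-implements their documented definitions for a single time value. Both waveforms have period 2π, which is why the plotting code below scales t by 3π (mapping [-1, 1] onto three full periods). These functions are hypothetical stand-ins written for illustration, not the SciPy implementations; they skip the vectorization and input validation that the real routines perform.

```python
import math

TWO_PI = 2 * math.pi


def sawtooth(t, width=1.0):
    """Period-2*pi sawtooth: rises from -1 to 1 over the first `width`
    fraction of each period, then falls back from 1 to -1."""
    # Position within the current period, in [0, 1); the final % 1.0 guards
    # against float rounding producing exactly 1.0 for tiny negative t.
    phase = ((t % TWO_PI) / TWO_PI) % 1.0
    if phase < width:
        return -1.0 + 2.0 * phase / width
    return 1.0 - 2.0 * (phase - width) / (1.0 - width)


def square(t, duty=0.5):
    """Period-2*pi square wave: +1 during the first `duty` fraction of
    each period and -1 for the remainder."""
    return 1.0 if (t % TWO_PI) / TWO_PI < duty else -1.0


# Sample three periods, mirroring t*=3*numpy.pi applied to linspace(-1, 1, ...).
ts = [3 * math.pi * (-1 + 2 * i / 12) for i in range(13)]
print([round(sawtooth(t), 3) for t in ts])
print([square(t) for t in ts])
```

With the default width=1.0 the sawtooth never takes the falling branch, which matches the plain ramp shape shown in the lower-left plot.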
Let's take a look at the following code snippet, which illustrates the use of the one-dimensional waveforms that we just discussed:
>>> import numpy
>>> from scipy.signal import chirp, sawtooth, square, gausspulse
>>> import matplotlib.pyplot as plt
>>> t=numpy.linspace(-1,1,1000)
>>> plt.subplot(221); plt.ylim([-2,2])
>>> plt.plot(t,chirp(t,f0=100,t1=0.5,f1=200)) # plot a chirp
>>> plt.subplot(222); plt.ylim([-2,2])
>>> plt.plot(t,gausspulse(t,fc=10,bw=0.5)) # Gauss pulse
>>> plt.subplot(223); plt.ylim([-2,2])
>>> t*=3*numpy.pi
>>> plt.plot(t,sawtooth(t)) # sawtooth
>>> plt.subplot(224); plt.ylim([-2,2])
>>> plt.plot(t,square(t)) # Square wave
>>> plt.show()
Generated by this code, the following diagram shows the waveforms for chirp (upper-left), gausspulse (upper-right), sawtooth (lower-left), and square (lower-right):
The usual method of creating signals is to import them from a file. This is possible using purely NumPy routines, for example fromfile:
fromfile(file, dtype=float, count=-1, sep='')
The file argument may be either an open file object or a string containing a filename, the count argument determines the number of items to read, and sep indicates the separator between items in a text file (the default empty string means the file is read as binary). For images, we have the versatile routine imread, in either the scipy.ndimage or the scipy.misc module:
imread(fname, flatten=False)
The fname argument is a string containing the location of an image. The routine infers the type of file and reads the data into an array accordingly. If the flatten argument is set to True, the image is converted to grayscale. Note that, in order to work, the Python Imaging Library (PIL) needs to be installed. It is also possible to load .wav files for analysis, with the read and write routines from the wavfile submodule in the scipy.io module.
For instance, given any audio file in this format, say audio.wav, the command rate,data = scipy.io.wavfile.read("audio.wav") assigns an integer value to the rate variable, indicating the sample rate of the file (in samples per second), and a NumPy ndarray to the data variable, containing the sampled values of the audio signal. If we wish to write some one-dimensional ndarray data into an audio file of this kind, with the sample rate given by the rate variable, we may do so by issuing the following command:
>>> scipy.io.wavfile.write("filename.wav",rate,data)
Filters
A filter is an operation on signals that either removes features or extracts some component. SciPy has a very complete set of known filters, as well as the tools to construct new ones. The complete list of filters in SciPy is long, and we encourage the reader to explore the help documents of the scipy.signal and scipy.ndimage modules for the complete picture. In these pages we will introduce, as an exposition, some of the filters most used in audio and image processing. We start by creating a signal worth filtering:
>>> from numpy import sin, cos, pi, linspace
>>> f=lambda t: cos(pi*t) + 0.2*sin(5*pi*t+0.1) + 0.2*sin(30*pi*t) + 0.1*sin(32*pi*t+0.1) + 0.1*sin(47*pi*t+0.8)
>>> t=linspace(0,4,400); signal=f(t)
We first test the classical smoothing filter of Wiener and Kolmogorov, wiener. We present in a plot the original signal (in black) and the corresponding filtered data, with a choice of a Wiener window of size 55 samples (in blue).
Next, we compare the result of applying the median filter, medfilt, with a kernel of the same size as before (in red):
>>> from scipy.signal import wiener, medfilt
>>> import matplotlib.pyplot as plt
>>> plt.plot(t,signal,'k')
>>> plt.plot(t,wiener(signal,mysize=55),'b',linewidth=3)
>>> plt.plot(t,medfilt(signal,kernel_size=55),'r',linewidth=3)
>>> plt.show()
This gives us the following graph comparing the smoothing filters (wiener is the one whose starting point is just below 0.5, and medfilt the one whose starting point is just above 0.5):
Most of the filters in the scipy.signal module can be adapted to work on arrays of any dimension. But in the particular case of images, we prefer to use the implementations in the scipy.ndimage module, since they are coded with these objects in mind. For instance, to perform a median filter on an image for smoothing, we use scipy.ndimage.median_filter. Let's see an example. We will start by loading Lena into an array and corrupting the image with Gaussian noise (zero mean and standard deviation of 16):
>>> from scipy.stats import norm # Gaussian distribution
>>> import matplotlib.pyplot as plt
>>> import scipy.misc
>>> import scipy.ndimage
>>> plt.gray()
>>> lena=scipy.misc.lena().astype(float)
>>> plt.subplot(221)
>>> plt.imshow(lena)
>>> lena+=norm(loc=0,scale=16).rvs(lena.shape)
>>> plt.subplot(222)
>>> plt.imshow(lena)
>>> denoised_lena = scipy.ndimage.median_filter(lena,3)
>>> plt.subplot(224)
>>> plt.imshow(denoised_lena)
The set of filters for images comes in two flavors: statistical and morphological. For example, among the filters of a statistical nature, we have the Sobel algorithm, aimed at detecting edges (singularities along curves). Its syntax is as follows:
sobel(image, axis=-1, output=None, mode='reflect', cval=0.0)
The optional parameter, axis, indicates the dimension along which the computations are performed. By default, this is the last axis (-1).
The mode parameter, which is one of the strings 'reflect', 'constant', 'nearest', 'mirror', or 'wrap', indicates how to handle the border of the image, where there is insufficient data to perform the computations. If the mode is 'constant', we may indicate the value to use at the border with the cval parameter. Let's look at the following code snippet, which illustrates the use of the sobel filter:
>>> from scipy.ndimage.filters import sobel
>>> import numpy
>>> lena=scipy.misc.lena()
>>> sblX=sobel(lena,axis=0); sblY=sobel(lena,axis=1)
>>> sbl=numpy.hypot(sblX,sblY)
>>> plt.subplot(223)
>>> plt.imshow(sbl)
>>> plt.show()
The following screenshot illustrates Lena (upper-left) and noisy Lena (upper-right), with the preceding two filters in action: the edge map with sobel (lower-left) and the median filter (lower-right):
Morphology
We can also create and apply filters to images based on mathematical morphology, for both binary and gray-scale images. The four basic morphological operations are opening (binary_opening), closing (binary_closing), dilation (binary_dilation), and erosion (binary_erosion). Note that the syntax for each of these filters is very simple, since we only need two ingredients: the signal to filter and the structuring element that performs the morphological operation. Let's take a look at the general syntax for these morphological operations:
binary_operation(signal, structuring_element)
We may use combinations of these four basic morphological operations to create more complex filters for removal of holes, hit-or-miss transforms (to find the location of specific patterns in binary images), denoising, edge detection, and many more. The SciPy module also allows for creating some common filters using the preceding syntax.
For instance, to locate the letter e in a text, we could use the following command instead:
>>> binary_hit_or_miss(text, letterE)
For comparative purposes, let's use this command in the following code snippet:
>>> import numpy
>>> import scipy.ndimage
>>> import matplotlib.pyplot as plt
>>> from scipy.ndimage.morphology import binary_hit_or_miss
>>> text = scipy.ndimage.imread('CHAP_05_input_textImage.png')
>>> letterE = text[37:53,275:291]
>>> HitorMiss = binary_hit_or_miss(text, structure1=letterE, origin1=1)
>>> eLocation = numpy.where(HitorMiss==True)
>>> x=eLocation[1]; y=eLocation[0]
>>> plt.imshow(text, cmap=plt.cm.gray, interpolation='nearest')
>>> plt.autoscale(False)
>>> plt.plot(x,y,'wo',markersize=10)
>>> plt.axis('off')
>>> plt.show()
The output for the preceding lines of code is as follows:
For gray-scale images, we may use a structuring element (structuring_element) or a footprint. The syntax is, therefore, a little different:
grey_operation(signal, [structuring_element, footprint, size, ...])
If we want to use a completely flat, rectangular structuring element (all ones), then it is enough to indicate its size as a tuple. For instance, to perform gray-scale dilation with a flat element of size (15,15) on our classical image of Lena, we issue the following command:
>>> grey_dilation(lena, size=(15,15))
The last kind of morphological operations coded in the scipy.ndimage module perform distance and feature transforms. Distance transforms create a map that assigns to each foreground (non-zero) pixel the distance to the nearest background (zero) pixel. Feature transforms provide the index of the closest background element instead. These operations are used to decompose images into different labels. We may even choose different metrics, such as Euclidean distance, chessboard distance, and taxicab distance.
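The following pure-Python sketch spells out what a brute-force distance transform computes for the three metrics just mentioned, on a small binary image given as a list of lists. It follows the scipy.ndimage convention (each non-zero foreground pixel receives its distance to the nearest zero background pixel, and background pixels get 0), but the function name distance_transform_brute and the choice to return the feature transform as (row, col) tuples are hypothetical, made for illustration; SciPy returns NumPy arrays instead. The sketch assumes the image contains at least one background pixel.

```python
import math

# Distance between two pixels, given the row and column offsets (dy, dx).
METRICS = {
    "euclidean": lambda dy, dx: math.hypot(dy, dx),
    "taxicab": lambda dy, dx: abs(dy) + abs(dx),
    "chessboard": lambda dy, dx: max(abs(dy), abs(dx)),
}


def distance_transform_brute(image, metric="euclidean"):
    """Return (distances, nearest) for a 2-D list of 0/1 values.

    distances[r][c] is the distance from pixel (r, c) to the closest zero
    pixel; nearest[r][c] is that zero pixel's (row, col), i.e. the
    feature transform."""
    dist_fn = METRICS[metric]
    background = [(r, c) for r, row in enumerate(image)
                  for c, v in enumerate(row) if v == 0]
    if not background:
        raise ValueError("image needs at least one background (zero) pixel")
    distances, nearest = [], []
    for r, row in enumerate(image):
        drow, nrow = [], []
        for c, v in enumerate(row):
            if v == 0:  # background pixels are at distance 0 from themselves
                drow.append(0.0)
                nrow.append((r, c))
                continue
            # Brute force: compare against every background pixel.
            best = min(background, key=lambda p: dist_fn(p[0] - r, p[1] - c))
            drow.append(float(dist_fn(best[0] - r, best[1] - c)))
            nrow.append(best)
        distances.append(drow)
        nearest.append(nrow)
    return distances, nearest


demo = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
for name in METRICS:
    d, _ = distance_transform_brute(demo, name)
    print(name, d)
```

Comparing every foreground pixel against every background pixel is what makes this approach "brute force"; the chamfer and exact Euclidean routines avoid that quadratic cost.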
The syntax for the brute-force distance transform, distance_transform_bf, is as follows:
distance_transform_bf(signal, metric='euclidean', sampling=None, return_distances=True, return_indices=False, distances=None, indices=None)
We indicate the metric with strings such as 'euclidean', 'taxicab', or 'chessboard'. If we want the feature transform instead, we switch return_distances to False and return_indices to True. Similar routines are available with more sophisticated algorithms: distance_transform_cdt (using chamfering for taxicab and chessboard distances) and, for Euclidean distance, distance_transform_edt. These use a similar syntax.
Summary
In this article, we explored signal processing in any number of dimensions, including the treatment of signals in frequency space by means of their Discrete Fourier Transforms. These correspond to the fftpack, signal, and ndimage modules.

Packt

03 Mar 2015

28 min read

Elasticsearch Administration

Packt

03 Mar 2015

11 min read

MapReduce functions

In this article by John Zablocki, author of the book Couchbase Essentials, you will become acquainted with MapReduce and learn how to use it to create secondary indexes for your documents. At its simplest, MapReduce is a programming pattern used to process, in parallel, large amounts of data that is typically distributed across several nodes. In the NoSQL world, MapReduce implementations may be found on many platforms, from MongoDB to Hadoop, and of course, Couchbase. Even if you're new to the NoSQL landscape, it's quite possible that you've already worked with a form of MapReduce. The inspiration for MapReduce in distributed NoSQL systems was drawn from the functional programming concepts of map and reduce. While purely functional programming languages haven't quite reached mainstream status, languages such as Python, C#, and JavaScript all support map and reduce operations.
Map functions
Consider the following Python snippet:
numbers = [1, 2, 3, 4, 5]
doubled = list(map(lambda n: n * 2, numbers))
#doubled == [2, 4, 6, 8, 10]
These two lines of code demonstrate a very simple use of the map() function. In the first line, the numbers variable is created as a list of integers. The second line applies a function to each item of the list to create a new mapped list (in Python 3, map() returns a lazy iterator, so list() collects the results). In this case, the function supplied to map() is a Python lambda, which is just an inline, unnamed function. The body of the lambda multiplies each number by two. This map() call can be made slightly more complex by doubling only odd numbers, as shown in this code:
numbers = [1, 2, 3, 4, 5]
def double_odd(num):
    if num % 2 == 0:
        return num
    else:
        return num * 2
doubled = list(map(double_odd, numbers))
#doubled == [2, 2, 6, 4, 10]
Map functions are implemented differently in each language or platform that supports them, but all follow the same pattern. An iterable collection of objects is passed to a map function.
Each item of the collection is then iterated over, with the map function being applied to each item. The final result is a new collection in which each of the original items has been transformed by the map.
Reduce functions
Like maps, reduce functions also work by applying a provided function to an iterable data structure. The key difference between the two is that a reduce function produces a single value from the input iterable. Using Python's reduce() function (a built-in in Python 2, imported from functools in Python 3), we can produce a sum of integers, as follows:
from functools import reduce
numbers = [1, 2, 3, 4, 5]
total = reduce(lambda x, y: x + y, numbers)
#total == 15
You probably noticed that, unlike our map operation, the reduce lambda has two parameters (x and y in this case). The argument passed to x will be the accumulated value of all applications of the function so far, and y will receive the next value to be added to the accumulation. Parenthetically, the order of operations can be seen as ((((1 + 2) + 3) + 4) + 5). Alternatively, the steps are shown in the following list:
x = 1, y = 2
x = 3, y = 3
x = 6, y = 4
x = 10, y = 5
x = 15
As this list demonstrates, the value of x is the cumulative sum of the previous x and y values. As such, reduce functions are sometimes termed accumulate or fold functions. Regardless of their name, reduce functions serve the common purpose of combining the pieces of a recursive data structure to produce a single value.
Couchbase MapReduce
Creating an index (or view) in Couchbase requires creating a map function written in JavaScript. When the view is created for the first time, the map function is applied to each document in the bucket containing the view. When you update a view, only new or modified documents are indexed. This behavior is known as incremental MapReduce. You can think of a basic map function in Couchbase as being similar to a SQL CREATE INDEX statement. Effectively, you are defining a column, or a set of columns, to be indexed by the server.
Of course, these are not columns, but rather properties of the documents to be indexed.
Basic mapping
To illustrate the process of creating a view, first imagine that we have a set of JSON documents, as shown here:
var books = [
  { "id": 1, "title": "The Bourne Identity", "author": "Robert Ludlum" },
  { "id": 2, "title": "The Godfather", "author": "Mario Puzo" },
  { "id": 3, "title": "Wiseguy", "author": "Nicholas Pileggi" }
];
Each document contains title and author properties. In Couchbase, to query these documents by either title or author, we'd first need to write a map function. Without considering how map functions are written in Couchbase, we can understand the process with vanilla JavaScript:
books.map(function(book) { return book.author; });
In the preceding snippet, we're making use of the built-in map() function of JavaScript arrays. Similar to the Python snippets we saw earlier, JavaScript's map() function takes a function as a parameter and returns a new array of mapped objects. In this case, we'll have an array with each book's author, as follows:
["Robert Ludlum", "Mario Puzo", "Nicholas Pileggi"]
At this point, we have a mapped collection that will be the basis for our author index. However, we haven't provided a means for the index to refer back to its original document. If we were using a relational database, we'd have effectively created an index on the Author column with no way to get back to the row that contained it. With a slight modification to our map function, we can include the key (the id property) of each document in our index as well:
books.map(function(book) { return [book.author, book.id]; });
In this slightly modified version, we're including the ID with the output of each author. In this way, the index stores each document's key alongside its author:
[["Robert Ludlum", 1], ["Mario Puzo", 2], ["Nicholas Pileggi", 3]]
We'll soon see how this structure more closely resembles the values stored in a Couchbase index.
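To make the idea of an index that points back to its documents concrete, here is a minimal Python sketch (Python being the language of this article's opening snippets). It applies a map function to each book, keeps the emitted (key, id) rows sorted by key, and answers an exact-key lookup with a binary search. The helpers build_index, query, and by_author are hypothetical, and this in-memory model only illustrates the concept; it is not how Couchbase Server stores or queries views.

```python
from bisect import bisect_left, bisect_right

books = [
    {"id": 1, "title": "The Bourne Identity", "author": "Robert Ludlum"},
    {"id": 2, "title": "The Godfather", "author": "Mario Puzo"},
    {"id": 3, "title": "Wiseguy", "author": "Nicholas Pileggi"},
]


def build_index(docs, map_fn):
    """Apply map_fn to every document and keep the emitted
    (key, doc_id) rows sorted by key, like a view index."""
    rows = []
    for doc in docs:
        for key in map_fn(doc):  # a map function may emit zero or more keys
            rows.append((key, doc["id"]))
    rows.sort()
    return rows


def query(index, key):
    """Return the ids of all documents whose emitted key equals `key`."""
    keys = [k for k, _ in index]
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [doc_id for _, doc_id in index[lo:hi]]


def by_author(doc):
    """Hypothetical map function: one row per book, keyed by author."""
    yield doc["author"]


author_index = build_index(books, by_author)
print(author_index)
print(query(author_index, "Mario Puzo"))  # [2]
```

Writing the map function as a generator lets it emit nothing, one row, or several rows per document, much like calling emit() conditionally.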
Basic reducing
Not every Couchbase index requires a reduce component. In fact, we'll see that Couchbase already comes with built-in reduce functions that provide most of the reduce behavior you need. However, before relying only on those functions, it's important to understand why you'd use a reduce function in the first place. Returning to the preceding map example, let's imagine we have a few more documents in our set, as follows:
var books = [
  { "id": 1, "title": "The Bourne Identity", "author": "Robert Ludlum" },
  { "id": 2, "title": "The Bourne Ultimatum", "author": "Robert Ludlum" },
  { "id": 3, "title": "The Godfather", "author": "Mario Puzo" },
  { "id": 4, "title": "The Bourne Supremacy", "author": "Robert Ludlum" },
  { "id": 5, "title": "The Family", "author": "Mario Puzo" },
  { "id": 6, "title": "Wiseguy", "author": "Nicholas Pileggi" }
];
We'll still create our index using the same map function, because it provides a way of accessing a book by its author. Now imagine that we want to know how many books an author has written, or (assuming we had more data) the average number of pages per book for an author. These questions cannot be answered with a map function alone. Each application of the map function knows nothing about the previous applications. In other words, there is no way to compare or accumulate information about one of an author's books with another book by the same author. Fortunately, there is a solution to this problem. As you've probably guessed, it's the use of a reduce function. As a somewhat contrived example, consider this JavaScript:
var mapped = books.map(function (book) { return [book.id, book.author]; });
var counts = mapped.reduce(function (acc, cur) {
  var key = cur[1]; // the author
  acc[key] = (acc[key] || 0) + 1; // increment this author's counter
  return acc;
}, {});
// counts: { "Robert Ludlum": 3, "Mario Puzo": 2, "Nicholas Pileggi": 1 }
This code doesn't quite accurately reflect the way you would count books with Couchbase, but it illustrates the basic idea.
You look for each occurrence of a key (author) and increment a counter when it is found. With Couchbase MapReduce, the mapped structure is supplied to the reduce() function in a more convenient format, so you won't need to keep track of items in a dictionary.
Couchbase views
At this point, you should have a general sense of what MapReduce is, where it came from, and how it will affect the creation of a Couchbase Server view. So without further ado, let's see how to write our first Couchbase view. Couchbase Server offers two sample buckets to choose from; the one we'll use is beer-sample. If you didn't install it, don't worry. You can add it by opening the Couchbase Console and navigating to the Settings tab. Here, you'll find the option to install the bucket, as shown next:
First, you need to understand the document structures with which you're working. The following JSON object is a beer document (abbreviated):
{ "name": "Sundog", "type": "beer", "brewery_id": "new_holland_brewing_company", "description": "Sundog is an amber ale...", "style": "American-Style Amber/Red Ale", "category": "North American Ale" }
As you can see, the beer documents have several properties. We're going to create an index to let us query these documents by name. In SQL, the query would look like this:
SELECT Id FROM Beers WHERE Name = ?
You might be wondering why the SQL example includes only the Id column in its projection. For now, just know that to query a document using a view in Couchbase, the property by which you're querying must be included in an index. To create that index, we'll write a map function. The simplest map function for querying beer documents by name is as follows:
function(doc) { emit(doc.name); }
This map function body has only one line. It calls the built-in Couchbase emit() function, which signals that a value should be indexed. The output of this map function will be an index of names. The beer-sample bucket includes brewery data as well.
These documents look like the following (abbreviated):
{ "name": "Thomas Hooker Brewing", "city": "Bloomfield", "state": "Connecticut", "website": "http://www.hookerbeer.com/", "type": "brewery" }
If we reexamine our map function, we'll see an obvious problem: both the brewery and beer documents have a name property. When this map function is applied to the documents in the bucket, it will create an index containing both brewery and beer documents. The problem is that Couchbase documents exist in a single container, the bucket, with no namespace for a set of related documents. The usual solution is to include a type or docType property on each document, whose value distinguishes one kind of document from another. In the case of the beer-sample database, beer documents have type = "beer" and brewery documents have type = "brewery". Therefore, we can easily modify our map function to create an index only on beer documents:
function(doc) { if (doc.type == "beer") { emit(doc.name); } }
The emit() function actually takes two arguments. The first, as we've seen, is the key to be indexed. The second argument is an optional value, which is used by the reduce function. Imagine that we want to count the number of beers in each category. In SQL, we would write the following query:
SELECT Category, COUNT(*) FROM Beers GROUP BY Category
To achieve the same functionality with Couchbase Server, we'll need to use both map and reduce functions. First, let's write the map. It will create an index on the category property:
function(doc) { if (doc.type == "beer") { emit(doc.category, 1); } }
The only real difference between our category index and our name index is that we're including an argument for the value parameter of the emit() function. What we'll do with those values is simply count them.
This counting will be done in our reduce function:
function(keys, values, rereduce) {
  if (rereduce) {
    // values holds partial counts from earlier reduce calls; add them up
    var total = 0;
    for (var i = 0; i < values.length; i++) { total += values[i]; }
    return total;
  }
  return values.length; // first pass: one emitted 1 per beer
}
On the first pass, the values parameter is given to the reduce function as a list of all values associated with a particular key. In our case, for each beer category, there will be a list of ones (that is, [1, 1, 1, 1, 1, 1]), so its length is the count. Couchbase may also call the reduce function again on its own partial results (with rereduce set to true), which is why those partial counts are summed rather than counted. Couchbase also provides a built-in _count function, which can be used in place of the entire reduce function in the preceding example. Now that we've seen the basic requirements for creating an actual Couchbase view, it's time to add a view to our bucket. The easiest way to do so is to use the Couchbase Console.
Summary
In this article, you learned the purpose of secondary indexes in a key/value store. We dug deep into MapReduce, both in terms of its history in functional languages and as a tool for NoSQL and big data systems.

article-image-performance-considerations

Packt

03 Mar 2015

13 min read

Performance Considerations

Packt

03 Mar 2015

17 min read

Basics of Programming in Julia

In this article by Ivo Balbaert, author of the book Getting Started with Julia Programming, we will explore how Julia interacts with the outside world: reading from standard input and writing to standard output, files, networks, and databases. Julia provides asynchronous networking I/O using the libuv library. We will see how to handle data in Julia, and we will also discover Julia's parallel processing model. In this article, the following topics are covered:

Working with files (including CSV files)
Using DataFrames

Working with files

To work with files, we need the IOStream type. IOStream is a type with the supertype IO and has the following characteristics:

The fields are given by names(IOStream), which returns 4-element Array{Symbol,1}: :handle :ios :name :mark
The types are given by IOStream.types, which returns (Ptr{None}, Array{Uint8,1}, String, Int64)

The file handle is a pointer of the type Ptr, which is a reference to the file object.

Opening and reading a line-oriented file with the name example.dat is very easy:

    # code in Chapter 8\io.jl
    fname = "example.dat"
    f1 = open(fname)

fname is a string that contains the path to the file, with special characters escaped using \ when necessary; for example, in Windows, when the file is in the test folder on the D: drive, this would become "d:\\test\\example.dat". The f1 variable is now an IOStream(<file example.dat>) object.

To read all lines one after the other into an array, use data = readlines(f1), which returns:

    3-element Array{Union(ASCIIString,UTF8String),1}:
     "this is line 1.\r\n"
     "this is line 2.\r\n"
     "this is line 3."

For processing line by line, now only a simple loop is needed:

    for line in data
        println(line) # or process line
    end
    close(f1)

Always close the IOStream object to clean up and save resources. If you want to read the file into one string, use readall.
Use readall only for relatively small files because of the memory consumption; this can also be a potential problem when using readlines.

There is a convenient shorthand with the do syntax for opening a file, applying a function process, and closing it automatically. This goes as follows (file is the IOStream object in this code):

    open(fname) do file
        process(file)
    end

The do block creates an anonymous function and passes it to open as its first argument. Thus, the previous code example is equivalent to open(process, fname). Use the same syntax for processing a file fname line by line without the memory overhead of the previous methods, for example:

    open(fname) do file
        for line in eachline(file)
            print(line) # or process line
        end
    end

Writing a file requires first opening it with a "w" flag, then writing strings to it with write, print, or println, and then closing the file handle, which flushes the IOStream object to the disk:

    fname = "example2.dat"
    f2 = open(fname, "w")
    write(f2, "I write myself to a file\n") # returns 25 (bytes written)
    println(f2, "even with println!")
    close(f2)

Opening a file with the "w" option will clear the file if it exists. To append to an existing file, use "a".

To process all the files in the current folder (or in a folder given as an argument to readdir()), use this for loop:

    for file in readdir()
        # process file
    end

Reading and writing CSV files

A CSV file is a comma-separated file. The data fields in each line are separated by commas "," or another delimiter such as semicolons ";". These files are the de facto standard for exchanging small and medium amounts of tabular data. Such files are structured so that one line contains data about one data object, so we need a way to read and process the file line by line. As an example, we will use the data file Chapter 8\winequality.csv, which contains 1,599 sample measurements with 12 data columns each (such as pH and alcohol), separated by semicolons.
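Reading the file line by line and splitting each line on the delimiter is essentially what readdlm automates. The following JavaScript sketch (not Julia; readDelimited and the four-column excerpt are hypothetical) shows that core loop. Because the header line is kept, the result mixes strings and numbers, much like the untyped Array{Any,2} result described next.

```javascript
// Hypothetical helper (not a Julia or Node API): read delimited text one line
// at a time and split each line into fields, turning numeric fields into
// numbers. The header line is kept, so rows mix strings and numbers.
function readDelimited(text, delimiter = ";") {
  const toValue = (field) => {
    const n = Number(field);
    // Keep non-numeric fields (such as column titles) as strings.
    return field.trim() !== "" && !Number.isNaN(n) ? n : field;
  };
  return text
    .split(/\r?\n/) // one record per line; accepts "\n" and "\r\n"
    .filter((line) => line.trim() !== "") // ignore blank lines
    .map((line) => line.split(delimiter).map(toValue));
}

// Hypothetical four-column excerpt; the real winequality.csv has 12 columns.
const excerpt =
  "fixed acidity;volatile acidity;alcohol;quality\r\n" +
  "7.4;0.7;9.4;5\r\n" +
  "7.8;0.88;9.8;5\r\n";

const rows = readDelimited(excerpt);
console.log(rows.length); // 3 (header plus two data lines)
console.log(rows[1]); // [ 7.4, 0.7, 9.4, 5 ]
```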
In the following screenshot, you can see the top 20 rows:

In general, the readdlm function is used to read in the data from CSV files:

    # code in Chapter 8\csv_files.jl
    fname = "winequality.csv"
    data = readdlm(fname, ';')

The second argument is the delimiter character (here, it is ;). The resulting data is a 1600x12 Array{Any,2}; its element type is Any because no common type could be found for the header strings and the numbers (only some of the 12 columns are shown here):

    "fixed acidity"  "volatile acidity"  "alcohol"  "quality"
    7.4              0.7                 9.4        5.0
    7.8              0.88                9.8        5.0
    7.8              0.76                9.8        5.0
    …

If the data file is comma separated, reading it is even simpler with the following command:

    data2 = readcsv(fname)

The problem with what we have done until now is that the headers (the column titles) were read as part of the data. Fortunately, we can pass the argument header=true to let Julia put the first line in a separate array. The data array then naturally gets the correct datatype, Float64. We can also specify the type explicitly, like this:

    data3 = readdlm(fname, ';', Float64, '\n', header=true)

The third argument here is the type of data, which is a numeric type, String, or Any. The next argument is the line separator character, and the header keyword argument indicates whether or not there is a header line with the field (column) names. If so, data3 is a tuple with the data as the first element and the header as the second, in our case (1599x12 Array{Float64,2}, 1x12 Array{String,2}). (readdlm accepts other optional arguments; see its help text.) In this case, the actual data is given by data3[1] and the header by data3[2].

Let's continue working with the variable data. The data forms a matrix, and we can get the rows and columns of data using the normal array-matrix syntax. For example, the third row is given by row3 = data[3, :], which holds 7.8 0.88 0.0 2.6 0.098 25.0 67.0 0.9968 3.2 0.68 9.8 5.0, representing the measurements for all the characteristics of a certain wine.
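Two details in the preceding paragraphs are easy to trip over: header=true moves the first line out of the data, and Julia ranges such as 70:75 are 1-based and inclusive. The JavaScript sketch below (hypothetical names readWithHeader and juliaSlice; not Julia or a library API) mimics both. Note that once the header is split off, the first wine is data row 1, whereas in the header-included data above it is row 2.

```javascript
// Hypothetical analogue of readdlm(..., header=true): returns [data, header],
// with the first line split off so that data holds only numbers.
function readWithHeader(text, delimiter = ";") {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  const header = lines[0].split(delimiter);
  const data = lines.slice(1).map((line) => line.split(delimiter).map(Number));
  return [data, header];
}

// Julia-style indexing: 1-based, inclusive ranges, so
// juliaSlice(m, [70, 75], [2, 4]) corresponds to m[70:75, 2:4].
function juliaSlice(matrix, [rowFrom, rowTo], [colFrom, colTo]) {
  return matrix
    .slice(rowFrom - 1, rowTo) // JavaScript slices are 0-based, end-exclusive
    .map((row) => row.slice(colFrom - 1, colTo));
}

// Hypothetical four-column excerpt; the real winequality.csv has 12 columns.
const excerpt =
  "fixed acidity;volatile acidity;alcohol;quality\n" +
  "7.4;0.7;9.4;5\n" +
  "7.8;0.88;9.8;5\n";

const [wineData, wineHeader] = readWithHeader(excerpt);
console.log(wineHeader.length); // 4
console.log(juliaSlice(wineData, [1, 2], [2, 3])); // [ [ 0.7, 9.4 ], [ 0.88, 9.8 ] ]
```

The explicit rowFrom - 1 and colFrom - 1 offsets are the whole translation between the two indexing conventions; a range such as 70:75 covers six rows because both ends are included.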
The measurements of a certain characteristic for all wines are given by a data column; for example, col3 = data[:, 3] represents the measurements of citric acid and returns a column vector 1600-element Array{Any,1}: "citric acid" 0.0 0.0 0.04 0.56 0.0 0.0 … 0.08 0.08 0.1 0.13 0.12 0.47.

If we need columns 2-4 (volatile acidity to residual sugar) for all wines, extract the data with x = data[:, 2:4]. If we need these measurements only for the wines on rows 70-75, get them with y = data[70:75, 2:4], which returns a 6 x 3 Array{Any,2} as follows:

    0.32   0.57  2.0
    0.705  0.05  1.9
    …
    0.675  0.26  2.1

To get a matrix with the data from columns 3, 6, and 11, execute the following command:

    z = [data[:,3] data[:,6] data[:,11]]

It would be useful to create a type Wine in the code. For example, if the data is to be passed around functions, it will improve the code quality to encapsulate the data for one wine in a single data type, like this:

    type Wine
        fixed_acidity::Float64
        volatile_acidity::Float64
        citric_acid::Float64
        # other fields
        quality::Float64
    end

Then, we can create objects of this type to work with them, like in any other object-oriented language; for example, wine1 = Wine(data[2, :]...) (row 1 holds the headers), where the elements of the row are splatted with the ... operator into the Wine constructor.

To write to a CSV file, the simplest way is to use the writecsv function for a comma separator, or the writedlm function if you want to specify another separator. For example, to write an array data to a file partial.dat, execute the following command:

    writedlm("partial.dat", data, ';')

If more control is necessary, you can easily combine the more basic functions from the previous section.
For example, the following code snippet writes 10 tuples of three numbers each to a file:

    # code in Chapter 8\tuple_csv.jl
    fname = "savetuple.csv"
    csvfile = open(fname, "w")
    # writing headers:
    write(csvfile, "ColName A, ColName B, ColName C\n")
    for i = 1:10
        tup = tuple(rand(Float64, 3)...)   # three random numbers
        write(csvfile, join(tup, ","), "\n")
    end
    close(csvfile)

Using DataFrames

If you measure n variables (each of a different type) for a single object of observation, you get a table with n columns for each object row. If there are m observations, then we have m rows of data. For example, given student grades as data, you might want to compute the average grade for each socioeconomic group, where grade and socioeconomic group are both columns in the table, and there is one row per student.

The DataFrame is the most natural representation for working with such an (m x n) table of data. DataFrames are similar to pandas DataFrames in Python or data.frame in R. A DataFrame is a more specialized tool than a normal array for working with tabular and statistical data, and it is defined in the DataFrames package, a popular Julia library for statistical work. Install it in your environment by typing Pkg.add("DataFrames") in the REPL. Then, import it into your current workspace with using DataFrames. Do the same for the packages DataArrays and RDatasets (which contains a collection of example datasets mostly used in the R literature).

A common case in statistical data is that data values can be missing (the information is not known). The DataArrays package provides us with the unique value NA, which represents a missing value and has the type NAtype. The result of computations that contain NA values mostly cannot be determined; for example, 42 + NA returns NA. (Julia v0.4 also has a new Nullable{T} type, which allows you to specify the type of a missing value.)
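The missing-value rules are easy to state but worth seeing in action. The JavaScript sketch below is only an analogy under stated assumptions: null stands in for NA, and sumWithNA, dropNA, and replaceNA are hypothetical names, not DataArrays functions. It mirrors the behaviour of sum, dropna, and array(dv, repl) in the DataArray example that follows, including its results of 15 and 13.

```javascript
// Hypothetical model of NA handling, with null standing in for Julia's NA.
const NA = null;

// Like sum(dv) on a DataArray: a single NA makes the whole result NA.
function sumWithNA(values) {
  if (values.some((v) => v === NA)) {
    return NA;
  }
  return values.reduce((total, v) => total + v, 0);
}

// Like dropna(dv): keep only the available values.
function dropNA(values) {
  return values.filter((v) => v !== NA);
}

// Like array(dv, repl): substitute repl for every NA.
function replaceNA(values, repl) {
  return values.map((v) => (v === NA ? repl : v));
}

const dv = [7, 3, NA, 5, 42];
dv[4] = NA; // dv is now [7, 3, NA, 5, NA], as in the DataArray example

console.log(sumWithNA(dv)); // null, i.e. NA
console.log(sumWithNA(dropNA(dv))); // 15
console.log(sumWithNA(replaceNA(dv, -1))); // 13
```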
A DataArray{T} array is a data structure that can be n-dimensional, behaves like a standard Julia array, and can contain values of the type T, but it can also contain the missing (Not Available) values NA and can work efficiently with them. To construct one, use the @data macro:

    # code in Chapter 8\dataarrays.jl
    using DataArrays
    using DataFrames
    dv = @data([7, 3, NA, 5, 42])

This returns 5-element DataArray{Int64,1}: 7 3 NA 5 42. The sum of these numbers is given by sum(dv) and returns NA. One can also assign NA values to the array with dv[5] = NA; then, dv becomes [7, 3, NA, 5, NA]. Converting this data structure to a normal array fails: convert(Array, dv) returns ERROR: NAException.

How can we get rid of these NA values, supposing we can do so safely? We can use the dropna function; for example, sum(dropna(dv)) returns 15. If you know that you can replace them with a value v, use the array function:

    repl = -1
    sum(array(dv, repl)) # returns 13

A DataFrame is a kind of in-memory database, versatile in the ways you can work with the data. It consists of columns with names such as Col1, Col2, Col3, and so on. Each of these columns is a DataArray with its own type, and the data they contain can be referred to by the column names as well, so we have substantially more forms of indexing. Unlike two-dimensional arrays, columns in a DataFrame can be of different types. One column might, for instance, contain the names of students and should therefore be a string. Another column could contain their age and should be an integer.

We construct a DataFrame from the program data as follows:

    # code in Chapter 8\dataframes.jl
    using DataFrames
    # constructing a DataFrame:
    df = DataFrame()
    df[:Col1] = 1:4
    df[:Col2] = [e, pi, sqrt(2), 42]
    df[:Col3] = [true, false, true, false]
    show(df)

Notice that the column headers are used as symbols.
This returns the following 4 x 3 DataFrame object:

    4x3 DataFrame
    | Row | Col1 | Col2    | Col3  |
    |-----|------|---------|-------|
    | 1   | 1    | 2.71828 | true  |
    | 2   | 2    | 3.14159 | false |
    | 3   | 3    | 1.41421 | true  |
    | 4   | 4    | 42.0    | false |

We could also have used the full constructor as follows:

    df = DataFrame(Col1 = 1:4, Col2 = [e, pi, sqrt(2), 42], Col3 = [true, false, true, false])

You can refer to the columns either by an index (the column number) or by a name; both of the following expressions return the same output:

    show(df[2])
    show(df[:Col2])

This gives the following output:

    [2.718281828459045, 3.141592653589793, 1.4142135623730951, 42.0]

To show rows or subsets of rows and columns, use the familiar slice (:) syntax, for example:

To get the first row, execute df[1, :]. This returns a 1x3 DataFrame:

    | Row | Col1 | Col2    | Col3 |
    |-----|------|---------|------|
    | 1   | 1    | 2.71828 | true |

To get the second and third rows, execute df[2:3, :].
To get only the second column from the previous result, execute df[2:3, :Col2]. This returns [3.141592653589793, 1.4142135623730951].
To get the second and third columns from the second and third rows, execute df[2:3, [:Col2, :Col3]], which returns the following output:

    2x2 DataFrame
    | Row | Col2    | Col3  |
    |-----|---------|-------|
    | 1   | 3.14159 | false |
    | 2   | 1.41421 | true  |

The following functions are very useful when working with DataFrames:

The head(df) and tail(df) functions show you the first six and the last six lines of data, respectively.
The names function gives the names of the columns: names(df) returns 3-element Array{Symbol,1}: :Col1 :Col2 :Col3.
The eltypes function gives the data types of the columns: eltypes(df) returns 3-element Array{Type{T<:Top},1}: Int64 Float64 Bool.
The describe function tries to give some useful summary information about the data in the columns, depending on the type; for example, describe(df) gives for column 2 (which is numeric) the minimum, quartiles, median, mean, maximum, and the number and percentage of NAs:

    Col2
    Min      1.4142135623730951
    1st Qu.  2.392264761937558
    Median   2.929937241024419
    Mean     12.318522011105483
    3rd Qu.  12.856194490192344
    Max      42.0
    NAs      0
    NA%      0.0%

To load data from a local CSV file, use the readtable method. The returned object is of type DataFrame:

    # code in Chapter 8\dataframes.jl
    using DataFrames
    fname = "winequality.csv"
    data = readtable(fname, separator = ';')
    typeof(data) # DataFrame
    size(data)   # (1599,12)

Here is a fraction of the output:

The readtable method also supports reading gzipped CSV files.

Writing a DataFrame to a file can be done with the writetable function, which takes the filename and the DataFrame as arguments, for example, writetable("dataframe1.csv", df). By default, writetable will use the delimiter specified by the filename extension and write the column names as headers.

Both readtable and writetable support numerous options for special cases. Refer to the documentation at http://dataframesjl.readthedocs.org/en/latest/ for more information.

To demonstrate some of the power of DataFrames, here are some queries you can do:

Make a vector with only the quality information: data[:quality]
Give the wines with an alcohol percentage equal to 9.5: data[data[:alcohol] .== 9.5, :]. Here, we use the .== operator, which does element-wise comparison. data[:alcohol] .== 9.5 returns an array of Boolean values (true for data points where :alcohol is 9.5, and false otherwise). data[boolean_array, :] selects those rows where boolean_array is true.
Count the number of wines grouped by quality with by(data, :quality, data -> size(data, 1)), which returns the following:

    6x2 DataFrame
    | Row | quality | x1  |
    |-----|---------|-----|
    | 1   | 3       | 10  |
    | 2   | 4       | 53  |
    | 3   | 5       | 681 |
    | 4   | 6       | 638 |
    | 5   | 7       | 199 |
    | 6   | 8       | 18  |

The DataFrames package contains the by function, which takes three arguments:

A DataFrame, here data
A column to split the DataFrame on, here quality
A function or an expression to apply to each subset of the DataFrame, here data -> size(data, 1), which gives us the number of wines for each quality value

Another easy way to get the distribution of quality values is the histogram function hist: hist(data[:quality]) gives the counts over the range of quality, (2.0:1.0:8.0,[10,53,681,638,199,18]). More precisely, this is a tuple with the first element corresponding to the edges of the histogram bins, and the second denoting the number of items in each bin. So there are, for example, 10 wines with quality between 2 and 3, and so on.

To extract the counts as a variable count of type Vector, we can execute _, count = hist(data[:quality]); the _ means that we ignore the first element of the tuple. To obtain the quality classes as a DataArray named class, we execute the following:

    class = sort(unique(data[:quality]))

We can now construct a df_quality DataFrame with the class and count columns as df_quality = DataFrame(qual=class, no=count). This gives the following output:

    6x2 DataFrame
    | Row | qual | no  |
    |-----|------|-----|
    | 1   | 3    | 10  |
    | 2   | 4    | 53  |
    | 3   | 5    | 681 |
    | 4   | 6    | 638 |
    | 5   | 7    | 199 |
    | 6   | 8    | 18  |

To deepen your understanding and learn about the other features of Julia DataFrames (such as joining, reshaping, and sorting), refer to the documentation available at http://dataframesjl.readthedocs.org/en/latest/.

Other file formats

Julia can work with other human-readable file formats through specialized packages:

For JSON, use the JSON package.
The parse method converts JSON strings into dictionaries, and the json method turns any Julia object into a JSON string.
For XML, use the LightXML package.
For YAML, use the YAML package.
For HDF5 (a common format for scientific data), use the HDF5 package.
For working with Windows INI files, use the IniFile package.

Summary

In this article, we discussed the basics of working with files, CSV data, and DataFrames in Julia.

article-image-getting-started-postgresql

Packt

03 Mar 2015

11 min read

Getting Started with PostgreSQL
