Time Travelling with Spring

Packt
03 Mar 2015
18 min read
This article by Sujoy Acharya, the author of the book Mockito for Spring, delves into the details of time travelling with Spring. Spring 4.0 is the Java 8-enabled latest release of the Spring Framework. In this article, we'll discover the major changes in the Spring 4.x release and four important features of the Spring 4 framework. We will cover the following topics in depth:

- @RestController
- AsyncRestTemplate
- Async tasks
- Caching

Discovering the new Spring release

This section deals with the new features and enhancements in Spring Framework 4.0:

- Spring 4 supports Java 8 features such as lambda expressions and java.time, while requiring JDK 6 as the minimum. All deprecated packages and methods have been removed.
- Java Enterprise Edition 6 or 7 is the base of Spring 4, which builds on JPA 2 and Servlet 3.0.
- Bean configuration using the Groovy DSL is supported in Spring Framework 4.0.
- Hibernate 4.3 is supported by Spring 4.
- Custom annotations are supported in Spring 4.
- Autowired lists and arrays can be ordered; the @Order annotation and the Ordered interface are supported.
- The @Lazy annotation can now be used on injection points as well as on @Bean definitions.
- For REST applications, Spring 4 provides a new @RestController annotation. We will discuss this in detail in a later section.
- The AsyncRestTemplate class has been added for asynchronous REST client development.
- Different time zones are supported in Spring 4.0.
- New spring-websocket and spring-messaging modules have been added.
- The SocketUtils class has been added to examine the free TCP and UDP server ports on localhost.
- All the mocks under the org.springframework.mock.web package are now based on the Servlet 3.0 specification.
- Spring supports JCache annotations, and new improvements have been made in caching.
- The @Conditional annotation has been added to conditionally enable or disable an @Configuration class or even individual @Bean methods.
- In the test module, SQL script execution can now be configured declaratively via the new @Sql and @SqlConfig annotations on a per-class or per-method basis.

You can visit the Spring Framework reference at http://docs.spring.io/spring/docs/4.1.2.BUILD-SNAPSHOT/spring-framework-reference/htmlsingle/#spring-whats-new for more details. You can also watch a video at http://zeroturnaround.com/rebellabs/spring-4-on-java-8-geekout-2013-video/ for more details on the changes in Spring 4.

Working with asynchronous tasks

Java's concurrency API provides a feature called Future. Futures let you retrieve the result of an asynchronous operation at a later time. The FutureTask class runs in a separate thread, which allows you to perform non-blocking asynchronous operations. Spring provides an @Async annotation to make this even easier. We'll explore Java's Future feature and then Spring's @Async declarative approach.
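To see what a plain Java future looks like before adding Spring, here is a minimal, self-contained sketch of ours (not from the original article) using an ExecutorService:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PlainFutureDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        // Submit a task; it runs in a separate thread and returns a Future handle
        Future<Boolean> result = executor.submit(new Callable<Boolean>() {
            @Override
            public Boolean call() throws Exception {
                Thread.sleep(2000); // simulate slow work, such as sending an SMS
                return Boolean.TRUE;
            }
        });
        System.out.println("Doing other work while the task runs...");
        // get() blocks until the result is available
        System.out.println("Task result: " + result.get());
        executor.shutdown();
    }
}
```

Spring's @Async annotation gives you the same behavior declaratively, without managing an executor by hand.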
Moving to the Spring approach, create a project, TimeTravellingWithSpring, and add a package, com.packt.async. We'll exercise a bank's use case, where an automated job runs, settles loan accounts, finds all the defaulters who haven't paid their loan EMI for a month, and then sends an SMS to their numbers. The job takes time to process thousands of accounts, so it will be good if we can send the SMSes asynchronously and minimize the burden on the job.

We'll create a service class to represent the job, as shown in the following code snippet:

```java
@Service
public class AccountJob {

    @Autowired
    private SMSTask smsTask;

    public void process() throws InterruptedException, ExecutionException {
        System.out.println("Going to find defaulters... ");
        Future<Boolean> asyncResult = smsTask.send("1", "2", "3");
        System.out.println("Defaulter Job Complete. SMS will be sent to all defaulter");
        Boolean result = asyncResult.get();
        System.out.println("Was SMS sent? " + result);
    }
}
```

The job class autowires an SMSTask class and invokes the send method with phone numbers. The send method is executed asynchronously and a Future is returned. When the job calls get() on the Future, the call blocks until the result is available; if the task fails, get() throws an ExecutionException. We can use the timeout version of the get() method to avoid waiting indefinitely.

Create the SMSTask class in the com.packt.async package with the following details:

```java
@Component
public class SMSTask {

    @Async
    public Future<Boolean> send(String... numbers) {
        System.out.println("Selecting SMS format ");
        try {
            Thread.sleep(2000);
        } catch (InterruptedException e) {
            e.printStackTrace();
            return new AsyncResult<>(false);
        }
        System.out.println("Async SMS send task is Complete!!!");
        return new AsyncResult<>(true);
    }
}
```

Note that the method returns a Future and is annotated with @Async to signify asynchronous processing.

Create a JUnit test to verify the asynchronous processing:

```java
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:com/packt/async/applicationContext.xml")
public class AsyncTaskExecutionTest {

    @Autowired
    ApplicationContext context;

    @Test
    public void jobTest() throws Exception {
        AccountJob job = (AccountJob) context.getBean(AccountJob.class);
        job.process();
    }
}
```

The job bean is retrieved from the application context and its process method is called. When we execute the test, the following output is displayed:

Going to find defaulters...
Defaulter Job Complete. SMS will be sent to all defaulter
Selecting SMS format
Async SMS send task is Complete!!!
Was SMS sent? true

Note that the SMS task messages appear after the job messages: the async task completes after a delay of 2 seconds, because the SMSTask method sleeps for 2 seconds.
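The test references an applicationContext.xml file that the original article does not show. A minimal sketch of what it might contain, assuming the Spring task namespace is used to enable @Async processing, is:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:task="http://www.springframework.org/schema/task"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context.xsd
           http://www.springframework.org/schema/task
           http://www.springframework.org/schema/task/spring-task.xsd">

    <!-- Pick up @Service and @Component beans such as AccountJob and SMSTask -->
    <context:component-scan base-package="com.packt.async"/>

    <!-- Enable @Async method execution -->
    <task:annotation-driven/>
</beans>
```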
Exploring @RestController

JAX-RS provides the functionality for Representational State Transfer (RESTful) web services. REST is well suited for basic, ad hoc integration scenarios. Spring MVC offers controllers to create RESTful web services. In Spring MVC 3.0, we need to explicitly annotate a class with the @Controller annotation in order to specify a controller servlet, and annotate each and every method with @ResponseBody to serve JSON, XML, or a custom media type. With the advent of the Spring 4.0 @RestController stereotype annotation, we can combine @ResponseBody and @Controller. The following example demonstrates the usage of @RestController:

Create a dynamic web project, RESTfulWeb. Modify the web.xml file and add a configuration to intercept requests with a Spring DispatcherServlet:

```xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
             http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
         id="WebApp_ID" version="3.0">
    <display-name>RESTfulWeb</display-name>
    <servlet>
        <servlet-name>dispatcher</servlet-name>
        <servlet-class>
            org.springframework.web.servlet.DispatcherServlet
        </servlet-class>
        <load-on-startup>1</load-on-startup>
    </servlet>
    <servlet-mapping>
        <servlet-name>dispatcher</servlet-name>
        <url-pattern>/</url-pattern>
    </servlet-mapping>
    <context-param>
        <param-name>contextConfigLocation</param-name>
        <param-value>
            /WEB-INF/dispatcher-servlet.xml
        </param-value>
    </context-param>
</web-app>
```

The DispatcherServlet expects a configuration file with the naming convention [servlet-name]-servlet.xml. Create an application context XML file, dispatcher-servlet.xml. We'll use annotations to configure Spring beans, so we need to tell the Spring container to scan the Java package in order to craft the beans. Add the following lines to the application context in order to instruct the container to scan the com.packt.controller package:

```xml
<context:component-scan base-package="com.packt.controller" />
<mvc:annotation-driven />
```

We need a REST controller class to handle the requests and generate a JSON output. Go to the com.packt.controller package and add a SpringService controller class. To configure the class as a REST controller, we annotate it with @RestController. The following code snippet represents the class:

```java
@RestController
@RequestMapping("/hello")
public class SpringService {

    private Set<String> names = new HashSet<String>();

    @RequestMapping(value = "/{name}", method = RequestMethod.GET)
    public String displayMsg(@PathVariable String name) {
        String result = "Welcome " + name;
        names.add(name);
        return result;
    }

    @RequestMapping(value = "/all/", method = RequestMethod.GET)
    public String anotherMsg() {
        StringBuilder result = new StringBuilder("We greeted so far ");
        for (String name : names) {
            result.append(name).append(", ");
        }
        return result.toString();
    }
}
```

We annotated the class with @RequestMapping("/hello"). This means that the SpringService class will cater for requests with the http://{site}/{context}/hello URL pattern; since we are running the app on localhost, the URL is http://localhost:8080/RESTfulWeb/hello.

The displayMsg method is annotated with @RequestMapping(value = "/{name}", method = RequestMethod.GET), so it handles all HTTP GET requests with the URL pattern /hello/{name}. The name can be any string, such as /hello/xyz or /hello/john. The method stores the name in a Set for later use and returns a greeting message, Welcome {name}.

The anotherMsg method is annotated with @RequestMapping(value = "/all/", method = RequestMethod.GET), which means that it accepts all requests with the http://{site}/{context}/hello/all/ URL pattern. This method builds a list of all users who visited the /hello/{name} URL: the displayMsg method stores the names in the Set, and this method iterates over the Set and builds a list of those names.
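Because @RestController implies @ResponseBody, a handler method can also return a plain object and have it serialized to JSON automatically when a JSON converter such as Jackson is on the classpath. As a hypothetical extension of the example above (our addition, not part of the original article):

```java
// A simple POJO; with Jackson on the classpath, Spring serializes it to JSON
public class Greeting {
    private final String name;
    private final String message;

    public Greeting(String name, String message) {
        this.name = name;
        this.message = message;
    }

    public String getName() { return name; }
    public String getMessage() { return message; }
}

// Added inside SpringService: GET /hello/json/{name} would return
// {"name":"john","message":"Welcome john"}
@RequestMapping(value = "/json/{name}", method = RequestMethod.GET)
public Greeting greetAsJson(@PathVariable String name) {
    return new Greeting(name, "Welcome " + name);
}
```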
One point can cause confusion, though: what happens if you enter the /hello/all URL in the browser? When we pass only a string literal after /hello/, the displayMsg method handles it, so you will be greeted with Welcome all. However, if you type /hello/all/ instead (note the trailing slash), the URL no longer matches the /hello/{name} pattern, so the second method handles the request and shows you the list of users who visited the first URL.

Run the application and access the /hello/{name} URL to see the greeting, then access http://localhost:8080/RESTfulWeb/hello/all/ to see the list of visitors. Our RESTful application is now ready for use, but remember that in the real world you need to secure the URLs against unauthorized access; security plays a key role in web service development. You can read the Spring Security reference manual for additional information.

Learning AsyncRestTemplate

We live in a small, wonderful world where everybody is interconnected and impatient! We are interconnected through technology and applications, such as social networks, Internet banking, telephones, and chats. Likewise, our applications are interconnected; an application housed in India may need to query an external service hosted in Philadelphia to get some significant information. We are impatient as we expect everything to be done in seconds; we get frustrated when an HTTP call to a remote service blocks processing until the remote response comes back. We cannot finish everything in milliseconds or nanoseconds, but we can process long-running tasks asynchronously or in a separate thread, allowing the user to work on something else.

To handle RESTful web service calls asynchronously, Spring offers two useful classes: AsyncRestTemplate and ListenableFuture. We can make an async call using the template, get a future back, continue with other processing, and finally ask the future for the result. This section builds an asynchronous RESTful client to query the RESTful web service we developed in the preceding section.

The AsyncRestTemplate class defines an array of overloaded methods to access RESTful web services asynchronously. We'll explore the exchange and execute methods. The following are the steps to explore the template:

Create a package, com.packt.rest.template, and add an AsyncRestTemplateTest JUnit test. Create an exchange() test method and add the following lines:

```java
@Test
public void exchange() {
    AsyncRestTemplate asyncRestTemplate = new AsyncRestTemplate();
    String url = "http://localhost:8080/RESTfulWeb/hello/all/";
    HttpMethod method = HttpMethod.GET;
    Class<String> responseType = String.class;
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.TEXT_PLAIN);
    HttpEntity<String> requestEntity = new HttpEntity<String>("params", headers);

    ListenableFuture<ResponseEntity<String>> future =
        asyncRestTemplate.exchange(url, method, requestEntity, responseType);
    try {
        // waits for the result
        ResponseEntity<String> entity = future.get();
        // prints the body of the response
        System.out.println(entity.getBody());
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
}
```

The exchange() method has six overloaded versions. We used the one that takes a URL, an HttpMethod such as GET or POST, an HttpEntity to set the headers, and finally a response type class. We called the exchange method, which in turn called the execute method and returned a ListenableFuture. The ListenableFuture is the handle to our output; we invoked get() on it to retrieve the RESTful service call response. The ResponseEntity has the getBody, getClass, getHeaders, and getStatusCode methods for extracting the web service call response; here, we invoked the http://localhost:8080/RESTfulWeb/hello/all/ URL and printed the response body.
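Note that calling future.get() still blocks the calling thread. To stay fully non-blocking, ListenableFuture also lets you register a callback that fires when the response arrives. A minimal sketch of ours, assuming the same url, method, requestEntity, and responseType as above:

```java
ListenableFuture<ResponseEntity<String>> future =
    asyncRestTemplate.exchange(url, method, requestEntity, responseType);

// The callback is invoked asynchronously; the calling thread never blocks
future.addCallback(new ListenableFutureCallback<ResponseEntity<String>>() {
    @Override
    public void onSuccess(ResponseEntity<String> entity) {
        System.out.println("Got response: " + entity.getBody());
    }

    @Override
    public void onFailure(Throwable t) {
        System.err.println("Call failed: " + t.getMessage());
    }
});
System.out.println("Request sent, doing other work...");
```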
Now, create an execute test method and add the following lines:

```java
@Test
public void execute() {
    AsyncRestTemplate asyncTemp = new AsyncRestTemplate();
    String url = "http://localhost:8080/RESTfulWeb/hello/reader";
    HttpMethod method = HttpMethod.GET;
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.TEXT_PLAIN);

    AsyncRequestCallback requestCallback = new AsyncRequestCallback() {
        @Override
        public void doWithRequest(AsyncClientHttpRequest request) throws IOException {
            System.out.println(request.getURI());
        }
    };

    ResponseExtractor<String> responseExtractor = new ResponseExtractor<String>() {
        @Override
        public String extractData(ClientHttpResponse response) throws IOException {
            return response.getStatusText();
        }
    };

    Map<String, String> urlVariable = new HashMap<String, String>();

    ListenableFuture<String> future = asyncTemp.execute(url, method,
        requestCallback, responseExtractor, urlVariable);
    try {
        // wait for the result
        String result = future.get();
        System.out.println("Status = " + result);
    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (ExecutionException e) {
        e.printStackTrace();
    }
}
```

The execute method has several variants. We invoke the one that takes a URL; an HttpMethod such as GET or POST; an AsyncRequestCallback, which is invoked from the execute method just before the request is executed asynchronously; a ResponseExtractor to extract parts of the response, such as the body, status code, or headers; and a map of URL variables for parameterized URLs. Our ResponseExtractor extracts the status code, so when we ask the future for the result, it returns the response status, which is OK (200). In the AsyncRequestCallback, we printed the request URI; hence, the output first displays the request URI and then the response status.

Caching objects

Scalability is a major concern in web application development. Generally, most web traffic is focused on some special set of information, so only those records are queried very often. If we can cache these records, the performance and scalability of the system will increase immensely. The Spring Framework provides support for adding caching to an existing Spring application. In this section, we'll work with EhCache, the most widely used caching solution. Download the latest EhCache JAR from the Maven repository; the URL to download version 2.7.2 is http://mvnrepository.com/artifact/net.sf.ehcache/ehcache/2.7.2.

Spring provides two annotations for caching: @Cacheable and @CacheEvict. These annotations allow methods to trigger cache population or cache eviction, respectively. The @Cacheable annotation identifies a cacheable method, which means that for an annotated method the result is stored in the cache; on subsequent invocations with the same arguments, the value in the cache is returned without actually executing the method. The cache abstraction also allows eviction, for removing stale or unused data from the cache. The @CacheEvict annotation demarcates methods that perform cache eviction, that is, methods that act as triggers for removing data from the cache.
The following are the steps to build a cacheable application with EhCache:

Create a serializable Employee POJO class in the com.packt.cache package to store the employee ID and name. The following is the class definition (the getter bodies are omitted in the original; getEmpId() is assumed below, since the service code calls it):

```java
public class Employee implements Serializable {

    private static final long serialVersionUID = 1L;
    private final String firstName, lastName, empId;

    public Employee(String empId, String fName, String lName) {
        this.firstName = fName;
        this.lastName = lName;
        this.empId = empId;
    }

    // getter methods such as getEmpId(), getFirstName(), and getLastName()
}
```

Spring caching supports two storages: the ConcurrentMap and ehcache libraries. To configure caching, we need to configure a manager in the application context. The org.springframework.cache.ehcache.EhCacheCacheManager class manages ehcache. Then, we need to define a cache with a configLocation attribute, which points to the configuration resource; the ehcache-specific configuration is read from the resource ehcache.xml:

```xml
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:p="http://www.springframework.org/schema/p"
       xmlns:cache="http://www.springframework.org/schema/cache"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="
           http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-4.1.xsd
           http://www.springframework.org/schema/cache
           http://www.springframework.org/schema/cache/spring-cache-4.1.xsd
           http://www.springframework.org/schema/context
           http://www.springframework.org/schema/context/spring-context-4.1.xsd">

    <context:component-scan base-package="com.packt.cache" />
    <cache:annotation-driven/>

    <bean id="cacheManager"
          class="org.springframework.cache.ehcache.EhCacheCacheManager"
          p:cacheManager-ref="ehcache"/>
    <bean id="ehcache"
          class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean"
          p:configLocation="classpath:com/packt/cache/ehcache.xml"/>
</beans>
```

The <cache:annotation-driven/> tag informs the Spring container that caching and eviction are performed in annotated methods. We defined a cacheManager bean and then an ehcache bean whose configLocation points to an ehcache.xml file. We'll create that file next.

Create an XML file, ehcache.xml, under the com.packt.cache package and add the following cache configuration data:

```xml
<ehcache>
    <diskStore path="java.io.tmpdir"/>
    <cache name="employee"
           maxElementsInMemory="100"
           eternal="false"
           timeToIdleSeconds="120"
           timeToLiveSeconds="120"
           overflowToDisk="true"
           maxElementsOnDisk="10000000"
           diskPersistent="false"
           diskExpiryThreadIntervalSeconds="120"
           memoryStoreEvictionPolicy="LRU"/>
</ehcache>
```

This XML configures several things. The cache is stored in memory, but memory has a limit, so we define maxElementsInMemory. EhCache needs to spill data to disk when the number of elements in memory reaches that threshold, so we provide a diskStore for this purpose. The eviction policy is set to LRU, but the most important setting is the cache name: the name employee will be used to access this cache configuration.

Now, create a service to store the Employee objects in a map.
The following is the service:

```java
@Service
public class EmployeeService {

    private final Map<String, Employee> employees =
        new ConcurrentHashMap<String, Employee>();

    @PostConstruct
    public void init() {
        saveEmployee(new Employee("101", "John", "Doe"));
        saveEmployee(new Employee("102", "Jack", "Russell"));
    }

    @Cacheable("employee")
    public Employee getEmployee(final String employeeId) {
        System.out.println(String.format("Loading a employee with id of : %s", employeeId));
        return employees.get(employeeId);
    }

    @CacheEvict(value = "employee", key = "#emp.empId")
    public void saveEmployee(final Employee emp) {
        System.out.println(String.format("Saving a emp with id of : %s", emp.getEmpId()));
        employees.put(emp.getEmpId(), emp);
    }
}
```

The getEmployee method is a cacheable method that uses the employee cache. When getEmployee is invoked more than once with the same employee ID, the object is returned from the cache instead of the original method being invoked. The saveEmployee method is a @CacheEvict method: saving an employee removes any stale entry for that ID from the cache.

Now, we'll examine the caching. We'll call the getEmployee method twice for each ID; the first call will populate the cache and the subsequent call will be answered by the cache. Create a JUnit test, CacheConfiguration, and add the following lines:

```java
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = "classpath:com/packt/cache/applicationContext.xml")
public class CacheConfiguration {

    @Autowired
    ApplicationContext context;

    @Test
    public void jobTest() throws Exception {
        EmployeeService employeeService =
            (EmployeeService) context.getBean(EmployeeService.class);

        long time = System.currentTimeMillis();
        employeeService.getEmployee("101");
        System.out.println("time taken =" + (System.currentTimeMillis() - time));

        time = System.currentTimeMillis();
        employeeService.getEmployee("101");
        System.out.println("time taken to read from cache =" + (System.currentTimeMillis() - time));

        time = System.currentTimeMillis();
        employeeService.getEmployee("102");
        System.out.println("time taken =" + (System.currentTimeMillis() - time));

        time = System.currentTimeMillis();
        employeeService.getEmployee("102");
        System.out.println("time taken to read from cache =" + (System.currentTimeMillis() - time));

        employeeService.saveEmployee(new Employee("1000", "Sujoy", "Acharya"));

        time = System.currentTimeMillis();
        employeeService.getEmployee("1000");
        System.out.println("time taken =" + (System.currentTimeMillis() - time));

        time = System.currentTimeMillis();
        employeeService.getEmployee("1000");
        System.out.println("time taken to read from cache =" + (System.currentTimeMillis() - time));
    }
}
```

Note that the getEmployee method is invoked twice for each employee, and we record the method execution time in milliseconds. You will find from the output that every second call is answered by the cache: the first call prints Loading a employee with id of : 101, while the second call doesn't print that message but only the time taken. You will also find that the time taken for the cached reads is zero, or at least less than the direct method invocation time.

Summary

This article started with discovering the features of the new major Spring release 4.0, such as Java 8 support. Then, we picked four Spring 4 topics and explored them one by one. The @Async section showcased the execution of long-running methods asynchronously and provided an example of how to handle asynchronous processing.
The @RestController section showed how the @RestController annotation eases RESTful web service development. The AsyncRestTemplate section explained the client code needed to invoke a RESTful web service asynchronously. Finally, since caching is inevitable for a high-performance, scalable web application, the caching section explained the EhCache and Spring integration used to achieve a high-availability caching solution.

Resources for Article:

Further resources on this subject:
- Getting Started with Mockito [article]
- Progressive Mockito [article]
- Understanding outside-in [article]
Going beyond Zabbix agents

Packt
03 Mar 2015
17 min read
In this article by Andrea Dalle Vacche and Stefano Kewan Lee, authors of Zabbix Network Monitoring Essentials, we will learn about the different possibilities Zabbix offers to the enterprising network administrator. There are certainly many advantages in using Zabbix's own agents and protocol when it comes to monitoring Windows and Unix operating systems or the applications that run on them. However, when it comes to network monitoring, the vast majority of monitored objects are network appliances of various kinds, where it's often impossible to install and run a dedicated agent of any type. This by no means implies that you'll be unable to fully leverage Zabbix's power to monitor your network. Whether it's a simple ICMP echo request, an SNMP query, an SNMP trap, netflow logging, or a custom script, there are many possibilities to extract meaningful data from your network. This article will show you how to set up these different methods of gathering data, and give you a few examples of how to use them.

Simple checks

An interesting use case is using one or more net.tcp.service items to make sure that some services are not running on a given interface. Take, for example, the case of a border router or firewall. Unless you have some very special and specific needs, you'll typically want to make sure that no admin consoles are available on the external interfaces. You might have double-checked the appliance's initial configuration, but a system update, a careless admin, or a security bug might change that configuration and open your appliance's admin interfaces to a far wider audience than intended. A security breach like this could pass unobserved for a long time unless you configure a few simple TCP/IP checks on your appliance's external interfaces and then set up some triggers that report a problem if those checks find an open and responsive port.

Let's take the example of a router with two production interfaces and a management interface, as shown in the section about host interfaces. If the router's HTTPS admin console is available on TCP port 8000, you'll want to configure a simple check item for every interface:

Item name                | Item key
-------------------------|-------------------------------------------
management_https_console | net.tcp.service[https,192.168.1.254,8000]
zoneA_https_console      | net.tcp.service[https,10.10.1.254,8000]
zoneB_https_console      | net.tcp.service[https,172.16.7.254,8000]

All these checks return 1 if the service is available, and 0 if it is not. What changes is how you implement the triggers on these items. For the management item, you'll have a problem if the service is not available, while for the other two, you'll have a problem if the service is indeed available, as shown in the following table:

Trigger name                  | Trigger expression
------------------------------|-----------------------------------------------------------------
Management console down       | {it-1759-r1:net.tcp.service[https,192.168.1.254,8000].last()}=0
Console available from zone A | {it-1759-r1:net.tcp.service[https,10.10.1.254,8000].last()}=1
Console available from zone B | {it-1759-r1:net.tcp.service[https,172.16.7.254,8000].last()}=1

This way, you'll always be able to make sure that your device's open and closed ports match your expected setup, and be notified when they diverge from the standard you set. To summarize, simple checks are great for all cases where you don't need complex monitoring data from your network, as they are quite fast and lightweight.
For the same reason, they could be the preferred solution if you have to monitor availability for hundreds or thousands of hosts, as they impose a relatively low overhead on your overall network traffic. When you do need more structure and more detail in your monitoring data, it's time to move to the bread and butter of all network monitoring solutions: SNMP.

Keeping SNMP simple

The Simple Network Monitoring Protocol (SNMP) is an excellent, general-purpose protocol that has become widely used beyond its original purpose. When it comes to network monitoring, it's also often the only protocol supported by many appliances, so it's often a forced, albeit natural and sensible, choice to integrate it into your monitoring scenarios. As a network administrator, you probably already know all there is to know about SNMP and how it works, so let's focus on how it's integrated into Zabbix and what you can do with it.

Mapping SNMP OIDs to Zabbix items

An SNMP value is composed of three different parts: the OID, the data type, and the value itself. When you use snmpwalk or snmpget to get values from an SNMP agent, the output looks like this:

```
SNMPv2-MIB::sysObjectID.0 = OID: CISCO-PRODUCTS-MIB::cisco3640
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (83414) 0:13:54.14
SNMPv2-MIB::sysContact.0 = STRING:
SNMPv2-MIB::sysName.0 = STRING: R1
SNMPv2-MIB::sysLocation.0 = STRING: Upper floor room 13
SNMPv2-MIB::sysServices.0 = INTEGER: 78
SNMPv2-MIB::sysORLastChange.0 = Timeticks: (0) 0:00:00.00
...
IF-MIB::ifPhysAddress.24 = STRING: c4:1:22:4:f2:f
IF-MIB::ifPhysAddress.26 = STRING:
IF-MIB::ifPhysAddress.27 = STRING: c4:1:1e:c8:0:0
IF-MIB::ifAdminStatus.1 = INTEGER: up(1)
IF-MIB::ifAdminStatus.2 = INTEGER: down(2)
...
```

And so on. The first part, before the = sign, is the OID. This goes into the SNMP OID field of the Zabbix item creation page and is the unique identifier for the metric you are interested in. Some OIDs represent a single, unique metric for the device, so they are easy to identify and address. In the above excerpt, one such OID is DISMAN-EVENT-MIB::sysUpTimeInstance. If you are interested in monitoring that OID, you only have to fill out the item creation form with the OID itself and then define an item name, a data type, and a retention policy, and you are ready to start monitoring it.

In the case of an uptime value, time ticks are expressed in seconds, so you'll choose a numeric decimal data type. We'll see in the next section how to choose Zabbix item data types and how to store values based on SNMP data types. You'll also want to store the value as is and optionally specify a unit of measure. This is because an uptime is already a relative value: it expresses the time elapsed since the device's latest boot, so there would be no point in calculating a further delta on this measurement. Finally, you'll define a polling interval and choose a retention policy. In this example, the polling interval is 5 minutes (300 seconds), the history retention policy is 3 days, and the trend storage period is one year. These are sensible values, as you don't normally need to store the detailed history of a value that either resets to zero or, by definition, grows linearly by one tick every second.
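To double-check an OID from the command line before creating the item, you can query the agent directly with snmpget. A hypothetical invocation against the router from the earlier example (the community string and address are placeholders, not from the original text):

```
$ snmpget -v 2c -c public 192.168.1.254 DISMAN-EVENT-MIB::sysUpTimeInstance
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (83414) 0:13:54.14
```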
Remember that the item's key value still has to be unique at the host/template level, as it will be referenced by all other Zabbix components, from calculated items to triggers, maps, screens, and so on. Don't forget to put in the right credentials for SNMPv3 if you are using this version of the protocol.

Many of the more interesting OIDs, though, are a bit more complex: multiple OIDs can be related to one another by means of a shared index. Let's look at another snmpwalk output excerpt:

```
IF-MIB::ifNumber.0 = INTEGER: 26
IF-MIB::ifIndex.1 = INTEGER: 1
IF-MIB::ifIndex.2 = INTEGER: 2
IF-MIB::ifIndex.3 = INTEGER: 3
...
IF-MIB::ifDescr.1 = STRING: FastEthernet0/0
IF-MIB::ifDescr.2 = STRING: Serial0/0
IF-MIB::ifDescr.3 = STRING: FastEthernet0/1
...
IF-MIB::ifType.1 = INTEGER: ethernetCsmacd(6)
IF-MIB::ifType.2 = INTEGER: propPointToPointSerial(22)
IF-MIB::ifType.3 = INTEGER: ethernetCsmacd(6)
...
IF-MIB::ifMtu.1 = INTEGER: 1500
IF-MIB::ifMtu.2 = INTEGER: 1500
IF-MIB::ifMtu.3 = INTEGER: 1500
...
IF-MIB::ifSpeed.1 = Gauge32: 10000000
IF-MIB::ifSpeed.2 = Gauge32: 1544000
IF-MIB::ifSpeed.3 = Gauge32: 10000000
...
IF-MIB::ifPhysAddress.1 = STRING: c4:1:1e:c8:0:0
IF-MIB::ifPhysAddress.2 = STRING:
IF-MIB::ifPhysAddress.3 = STRING: c4:1:1e:c8:0:1
...
IF-MIB::ifAdminStatus.1 = INTEGER: up(1)
IF-MIB::ifAdminStatus.2 = INTEGER: down(2)
IF-MIB::ifAdminStatus.3 = INTEGER: down(2)
...
IF-MIB::ifOperStatus.1 = INTEGER: up(1)
IF-MIB::ifOperStatus.2 = INTEGER: down(2)
IF-MIB::ifOperStatus.3 = INTEGER: down(2)
...
IF-MIB::ifLastChange.1 = Timeticks: (1738) 0:00:17.38
IF-MIB::ifLastChange.2 = Timeticks: (1696) 0:00:16.96
IF-MIB::ifLastChange.3 = Timeticks: (1559) 0:00:15.59
...
IF-MIB::ifInOctets.1 = Counter32: 305255
IF-MIB::ifInOctets.2 = Counter32: 0
IF-MIB::ifInOctets.3 = Counter32: 0
...
IF-MIB::ifInDiscards.1 = Counter32: 0
IF-MIB::ifInDiscards.2 = Counter32: 0
IF-MIB::ifInDiscards.3 = Counter32: 0
...
IF-MIB::ifInErrors.1 = Counter32: 0
IF-MIB::ifInErrors.2 = Counter32: 0
IF-MIB::ifInErrors.3 = Counter32: 0
...
IF-MIB::ifOutOctets.1 = Counter32: 347968
IF-MIB::ifOutOctets.2 = Counter32: 0
IF-MIB::ifOutOctets.3 = Counter32: 0
```

As you can see, for every network interface there are several OIDs, each one detailing a specific aspect of the interface: its name, its type, whether it's up or down, the amount of traffic coming in or going out, and so on. The different OIDs are related through their last number, the actual index of the OID. Looking at the preceding excerpt, we know that the device has 26 interfaces, of which we are showing some values for just the first three. By correlating the index numbers, we also know that interface 1 is called FastEthernet0/0, its MAC address is c4:1:1e:c8:0:0, the interface is up and has been up for just 17 seconds, and some traffic has already gone through it.

Now, one way to monitor several of these metrics for the same interface is to manually correlate these values when creating the items, putting the complete OID in the SNMP OID field and making sure that both the item key and its name reflect the right interface. This process is not only prone to errors during the setup phase, but it can also introduce inconsistencies down the road. There is no guarantee, in fact, that the index will remain consistent across hardware or software upgrades, or even across configurations when it comes to more volatile state, such as the number of VLANs or routing tables instead of network interfaces.
Fortunately, Zabbix provides a feature called dynamic indexes, which lets you correlate different OIDs in the same SNMP OID field, so that you can define an index based on the index exposed by another OID. This means that if you want to know the admin status of FastEthernet0/0, you don't need to find the index associated with FastEthernet0/0 (in this case, it would be 1) and then append that index to the IF-MIB::ifAdminStatus base OID, hoping that it won't ever change in the future. You can instead use the following expression:

```
IF-MIB::ifAdminStatus["index", "IF-MIB::ifDescr", "FastEthernet0/0"]
```

When you use this in the SNMP OID field of your item, the item will dynamically find the index of the IF-MIB::ifDescr OID whose value is FastEthernet0/0 and append it to IF-MIB::ifAdminStatus in order to get the right status for the right interface. If you organize your items this way, you'll always be sure that related items actually show the right related values for the component you are interested in, and not those of another component because things changed on the device's side without your knowledge. Moreover, this technique is the basis for low-level discovery of a device.

You can use the same technique to get other interesting information out of a device. Consider, for example, the following excerpt:

```
ENTITY-MIB::entPhysicalVendorType.1 = OID: CISCO-ENTITY-VENDORTYPE-OID-MIB::cevChassis3640
ENTITY-MIB::entPhysicalVendorType.2 = OID: CISCO-ENTITY-VENDORTYPE-OID-MIB::cevContainerSlot
ENTITY-MIB::entPhysicalVendorType.3 = OID: CISCO-ENTITY-VENDORTYPE-OID-MIB::cevCpu37452fe
ENTITY-MIB::entPhysicalClass.1 = INTEGER: chassis(3)
ENTITY-MIB::entPhysicalClass.2 = INTEGER: container(5)
ENTITY-MIB::entPhysicalClass.3 = INTEGER: module(9)
ENTITY-MIB::entPhysicalName.1 = STRING: 3745 chassis
ENTITY-MIB::entPhysicalName.2 = STRING: 3640 Chassis Slot 0
ENTITY-MIB::entPhysicalName.3 = STRING: c3745 Motherboard with FastEthernet on Slot 0
ENTITY-MIB::entPhysicalHardwareRev.1 = STRING: 2.0
ENTITY-MIB::entPhysicalHardwareRev.2 = STRING:
ENTITY-MIB::entPhysicalHardwareRev.3 = STRING: 2.0
ENTITY-MIB::entPhysicalSerialNum.1 = STRING: FTX0945W0MY
ENTITY-MIB::entPhysicalSerialNum.2 = STRING:
ENTITY-MIB::entPhysicalSerialNum.3 = STRING: XXXXXXXXXXX
```

It should be immediately clear that you can find the chassis's serial number by creating an item with:

```
ENTITY-MIB::entPhysicalSerialNum["index", "ENTITY-MIB::entPhysicalName", "3745 chassis"]
```

You can then specify, in the same item, that it should populate the Serial Number field of the host's inventory. This gives you a more automatic, dynamic population of inventory fields. The possibilities are endless, as we've only just scratched the surface of what any given device can expose as SNMP metrics.
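For instance, combining the dynamic index technique with the interface table shown earlier, a hypothetical item of ours for inbound traffic on FastEthernet0/0 could use the following SNMP OID, with the value stored as a delta (speed per second), as discussed in the next section:

```
IF-MIB::ifInOctets["index", "IF-MIB::ifDescr", "FastEthernet0/0"]
```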
Before you go and find your favorite OIDs to monitor, though, let's have a closer look at the preceding examples and discuss data types.

Getting data types right

We have already seen how an OID's value has a specific data type, which is usually clearly stated in the default snmpwalk output. In the preceding examples, you can clearly see the data type just after the = sign, before the actual value. There are a number of SNMP data types, some still current and some deprecated. You can find the official list and documentation in RFC 2578 (http://tools.ietf.org/html/rfc2578), but let's look at the most important ones from the perspective of a Zabbix user:

SNMP type | Description                                                 | Suggested Zabbix item type and options
----------|-------------------------------------------------------------|-----------------------------------------------------------------------
INTEGER   | Can have negative values; usually used for enumerations     | Numeric unsigned, decimal; store value as is; show with value mappings
STRING    | A regular character string; can contain newlines            | Text; store value as is
OID       | An SNMP object identifier                                   | Character; store value as is
IpAddress | IPv4 only                                                   | Character; store value as is
Counter32 | Only non-negative, non-decreasing values                    | Numeric unsigned, decimal; store value as delta (speed per second)
Gauge32   | Only non-negative values, which can decrease                | Numeric unsigned, decimal; store value as is
Counter64 | Only non-negative, non-decreasing 64-bit values             | Numeric unsigned, decimal; store value as delta (speed per second)
TimeTicks | Non-negative, non-decreasing values                         | Numeric unsigned, decimal; store value as is

First of all, remember that the above suggestions are just that: suggestions. You should always evaluate how to store your data on a case-by-case basis, but you'll probably find that in many cases these are indeed the most useful settings.

Moving on to the actual data types, remember that command-line SNMP tools parse the values by default and show already-interpreted information. This is especially true for TimeTicks values and for INTEGER values used as enumerations. In other words, you see the following from the command line:

VRRP-MIB::vrrpNotificationCntl.0 = INTEGER: disabled(2)

However, what is actually passed in the request is the bare OID:

1.3.6.1.2.1.68.1.2.0

The SNMP agent responds with just the value, in this case 2. This means that in the case of enumerations, Zabbix will receive and store a number, and not the string disabled(2) that you see from the command line. If you want to display monitoring values that are a bit clearer, you can apply value mappings to your numeric items. Value maps contain the mapping between numeric values and arbitrary string representations for a human-friendly display, and you can specify which one you need in the item configuration form.

Zabbix comes with a few predefined value mappings, and you can create your own by following the show value mappings link. Provided you have admin roles on Zabbix, you'll be taken to a page where you can configure all the value mappings used by Zabbix. From there, click on Create value map in the upper-right corner of the page, and you'll be able to create a new mapping.

Not all INTEGER values are enumerations, but those that are used as such are clearly recognizable in your command-line tools: they are defined as INTEGER values but show a string label along with the actual value, just as in the preceding example. When they are not used as enumerations, they can represent different things depending on the context. As seen in the previous paragraph, they can represent the number of indexes available for a given OID. They can also represent application-specific or protocol-specific values, such as a default MTU, a default TTL, route metrics, and so on. The main difference between gauges, counters, and integers is that integers can assume negative values, while gauges and counters cannot.
In addition to that, counters can only increase, or wrap around and start again from the bottom of their value range once they reach its upper limit. From the perspective of Zabbix, this marks the difference in how you'll want to store their values.

Gauges are usually employed when a value can vary within a given range, such as the speed of an interface, the amount of free memory, or any limits and timeouts you might find for notifications, numbers of instances, and so on. In all of these cases, the value can increase or decrease over time, so you'll want to store the values as they are: once put on a graph, they'll draw a meaningful curve.

Counters, on the other hand, can only increase by definition. They are typically used to record how many packets were processed by an interface, how many were dropped, how many errors were encountered, and so on. If you store counter values as they are, your graphs will show ever-ascending curves that won't tell you much for monitoring or capacity-planning purposes. This is why you'll usually want to track a counter's rate of change over time rather than its actual value. To do that, Zabbix offers two different ways to store deltas, or differences between successive values.

The delta (simple change) storage method does exactly what it says: it computes the difference between the currently received value and the previously received one, and stores the result. It takes into consideration neither the time elapsed between the two measurements nor the fact that the result can be negative if the counter overflows. Most of the time, though, you'll be very interested in how much time has passed between two measurements, and in treating correctly any negative values that appear as a result of overflow. The delta (speed per second) method divides the difference between the current and previous values by the difference between the current and previous timestamps:

(value - prev_value) / (time - prev_time)

This ensures that the scale of the change is always constant, as opposed to the scale of the simple-change delta, which varies every time you modify the update interval of the item, giving you inconsistent results. Moreover, the speed-per-second delta ignores any negative values and just waits for the next measurement, so you won't find any false dips in your graph due to counter overflow.
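As a quick worked example (ours, with made-up numbers): if ifInOctets reads 305,255 at 10:00:00 and 365,255 at 10:01:00, the speed-per-second delta stores (365255 - 305255) / 60 = 1,000 bytes per second. That rate stays comparable whether the item polls every 60 or every 300 seconds, whereas the simple-change delta would store 60,000 in the first case and a value five times larger in the second for the same underlying traffic rate.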
Finally, while SNMP uses specific data types for IP addresses and SNMP OIDs, there are no such types in Zabbix, so you'll need to map them to some kind of string item. The suggested type here is character, as both values won't be longer than 255 characters and won't contain any newlines. String values, on the other hand, can be quite long: the SNMP specification allows for 65,535-character texts, although text that long would be of little practical value. Even though they are usually much shorter, string values can often contain newlines and be longer than 255 characters. Consider, for example, the following sysDescr OID for this device:

```
SNMPv2-MIB::sysDescr.0 = STRING: Cisco IOS Software, 3700 Software (C3745-ADVENTERPRISEK9_SNA-M), Version 12.4(15)T14, RELEASE SOFTWARE (fc2)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2010 by Cisco Systems, Inc.
Compiled Tue 17-Aug-10 12:56 by prod_rel_tea
```

As you can see, the string spans multiple lines and is definitely longer than 255 characters. This is why the suggested type for string values is text, as it allows text of arbitrary length and structure. On the other hand, if you're sure that a specific OID value will always be much shorter and simpler, you can certainly use the character data type for the corresponding Zabbix item.

You are now truly ready to get the most out of your devices' SNMP agents: you can find the OIDs you want to monitor and map them perfectly to Zabbix items, down to how to store the values, their data types, the polling frequency, and any value mappings that might be necessary.

Summary

In this article, you have learned about the different possibilities Zabbix offers to the enterprising network administrator. You should now be able to choose, design, and implement all the monitoring items you need, based on the methods illustrated in the preceding paragraphs.

Resources for Article:

Further resources on this subject:
- Monitoring additional servers [Article]
- Bar Reports in Zabbix 1.8 [Article]
- Using Proxies to Monitor Remote Locations with Zabbix 1.8 [Article]
Basics of Programming in Julia

Packt
03 Mar 2015
17 min read
In this article by Ivo Balbaert, author of the book Getting Started with Julia Programming, we will explore how Julia interacts with the outside world, reading from standard input and writing to standard output, files, networks, and databases. Julia provides asynchronous networking I/O using the libuv library. We will see how to handle data in Julia, and we will also discover Julia's parallel processing model. In this article, the following topics are covered:

- Working with files (including CSV files)
- Using DataFrames

Working with files

To work with files, we need the IOStream type. IOStream is a type with the supertype IO and the following characteristics:

- The fields are given by names(IOStream): 4-element Array{Symbol,1}: :handle :ios :name :mark
- The types are given by IOStream.types: (Ptr{None}, Array{Uint8,1}, String, Int64)

The file handle is a pointer of the type Ptr, which is a reference to the file object. Opening and reading a line-oriented file with the name example.dat is very easy:

```julia
# code in Chapter 8\io.jl
fname = "example.dat"
f1 = open(fname)
```

fname is a string that contains the path to the file, escaping special characters with \ where necessary; for example, on Windows, when the file is in the test folder on the D: drive, this becomes "d:\\test\\example.dat". The f1 variable is now an IOStream(<file example.dat>) object.

To read all lines one after the other into an array, use data = readlines(f1), which returns:

3-element Array{Union(ASCIIString,UTF8String),1}:
"this is line 1.\r\n"
"this is line 2.\r\n"
"this is line 3."

For processing line by line, now only a simple loop is needed:

```julia
for line in data
    println(line) # or process line
end
close(f1)
```

Always close the IOStream object to clean up and save resources. If you want to read the whole file into one string, use readall. Use this only for relatively small files because of the memory consumption; this can also be a potential problem when using readlines.

There is a convenient shorthand with the do syntax for opening a file, applying a function, and closing the file automatically. It goes as follows (file is the IOStream object in this code):

```julia
open(fname) do file
    process(file)
end
```

The do command creates an anonymous function and passes it to open; thus, the previous code example is equivalent to open(process, fname). Use the same syntax to process a file fname line by line without the memory overhead of the previous methods, for example:

```julia
open(fname) do file
    for line in eachline(file)
        print(line) # or process line
    end
end
```

Writing a file requires first opening it with a "w" flag, then writing strings to it with write, print, or println, and then closing the file handle, which flushes the IOStream object to disk:

```julia
fname = "example2.dat"
f2 = open(fname, "w")
write(f2, "I write myself to a file\n") # returns 24 (bytes written)
println(f2, "even with println!")
close(f2)
```

Opening a file with the "w" option will clear the file if it exists. To append to an existing file, use "a" instead. To process all the files in the current folder (or in a given folder passed as an argument to readdir()), use this for loop:

```julia
for file in readdir()
    # process file
end
```
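Combining the pieces above, here is a small sketch of ours that appends a line to a file and immediately reads it back, assuming the example2.dat file from the previous snippet exists:

```julia
# append a line; the do block closes the file automatically
open("example2.dat", "a") do f
    println(f, "appended on a new line")
end

# read the file back line by line to verify
open("example2.dat") do f
    for line in eachline(f)
        print(line)
    end
end
```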
Reading and writing CSV files

A CSV file is a comma-separated file. The data fields in each line are separated by commas "," or another delimiter, such as a semicolon ";". These files are the de facto standard for exchanging small and medium amounts of tabular data. They are structured so that one line contains the data about one object, so we need a way to read and process the file line by line. As an example, we will use the data file Chapter 8\winequality.csv, which contains 1,599 sample measurements with 12 data columns (such as pH and alcohol) per sample, separated by semicolons.

In general, the readdlm function is used to read in the data from a CSV file:

```julia
# code in Chapter 8\csv_files.jl:
fname = "winequality.csv"
data = readdlm(fname, ';')
```

The second argument is the delimiter character (here, it is ;). The resulting data is a 1600x12 Array{Any,2} array of the type Any, because no common type could be found:

"fixed acidity"  "volatile acidity"  ...  "alcohol"  "quality"
7.4              0.7                 ...  9.4        5.0
7.8              0.88                ...  9.8        5.0
7.8              0.76                ...  9.8        5.0
...

If the data file is comma separated, reading it is even simpler with the following command:

data2 = readcsv(fname)

The problem with what we have done until now is that the headers (the column titles) were read as part of the data. Fortunately, we can pass the argument header=true to let Julia put the first line into a separate array; the data array then naturally gets the correct datatype, Float64. We can also specify the type explicitly, like this:

data3 = readdlm(fname, ';', Float64, '\n', header=true)

The third argument here is the type of the data, which is a numeric type, String, or Any. The next argument is the line separator character, and the fifth indicates whether or not there is a header line with the field (column) names. If so, data3 is a tuple with the data as the first element and the header as the second; in our case, (1599x12 Array{Float64,2}, 1x12 Array{String,2}). (There are other optional arguments to readdlm; see the help for details.) The actual data is then given by data3[1] and the header by data3[2].

Let's continue working with the variable data. The data forms a matrix, and we can get the rows and columns of data using the normal array-matrix syntax. For example, the third row is given by row3 = data[3, :], with data 7.8 0.88 0.0 2.6 0.098 25.0 67.0 0.9968 3.2 0.68 9.8 5.0, representing the measurements for all the characteristics of a certain wine.

The measurements of a certain characteristic for all wines are given by a data column; for example, col3 = data[:, 3] represents the measurements of citric acid and returns the column vector 1600-element Array{Any,1}: "citric acid" 0.0 0.0 0.04 0.56 0.0 0.0 … 0.08 0.08 0.1 0.13 0.12 0.47.

If we need columns 2-4 (volatile acidity to residual sugar) for all wines, extract the data with x = data[:, 2:4]. If we need these measurements only for the wines on rows 70-75, get them with y = data[70:75, 2:4], returning a 6x3 Array{Any,2} output such as:

0.32   0.57  2.0
0.705  0.05  1.9
...
0.675  0.26  2.1

To get a matrix with the data from columns 3, 6, and 11, execute the following command:

z = [data[:,3] data[:,6] data[:,11]]
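Because data3[1] is a homogeneous Float64 matrix, ordinary numeric functions apply directly. For instance, a sketch of ours using the dataset's column order, in which alcohol is the 11th column:

```julia
vals, headers = data3            # unpack the (data, header) tuple
mean_alcohol = mean(vals[:, 11]) # average alcohol content over all 1,599 wines
println("mean alcohol: ", mean_alcohol)
```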
It would be useful to create a type Wine in the code. For example, if the data is to be passed around functions, it will improve the code quality to encapsulate all the data in a single data type, like this:

```julia
type Wine
    fixed_acidity::Array{Float64}
    volatile_acidity::Array{Float64}
    citric_acid::Array{Float64}
    # other fields
    quality::Array{Float64}
end
```

Then, we can create objects of this type and work with them, like in any other object-oriented language, for example, wine1 = Wine(data[1, :]...), where the elements of the row are splatted with the ... operator into the Wine constructor.

To write to a CSV file, the simplest way is to use the writecsv function for a comma separator, or the writedlm function if you want to specify another separator. For example, to write an array data to a file partial.dat, you need to execute the following command:

writedlm("partial.dat", data, ';')

If more control is necessary, you can easily combine the more basic functions from the previous section. For example, the following code snippet writes 10 tuples of three numbers each to a file:

```julia
# code in Chapter 8\tuple_csv.jl
fname = "savetuple.csv"
csvfile = open(fname, "w")
# writing headers:
write(csvfile, "ColName A, ColName B, ColName C\n")
for i = 1:10
    tup(i) = tuple(rand(Float64, 3)...)
    write(csvfile, join(tup(i), ","), "\n")
end
close(csvfile)
```

Using DataFrames

If you measure n variables (each of a different type) on a single object of observation, you get a table with n columns for each object row. If there are m observations, then we have m rows of data. For example, given student grades as data, you might want to know "the average grade for each socioeconomic group", where grade and socioeconomic group are both columns in the table, and there is one row per student. A DataFrame is the most natural representation for working with such an m x n table of data. DataFrames are similar to pandas DataFrames in Python or data.frame in R. A DataFrame is a more specialized tool than a normal array for working with tabular and statistical data, and it is defined in the DataFrames package, a popular Julia library for statistical work. Install it in your environment by typing Pkg.add("DataFrames") in the REPL, then import it into your current workspace with using DataFrames. Do the same for the packages DataArrays and RDatasets (the latter contains a collection of example datasets mostly used in the R literature).

A common situation with statistical data is that some values are missing (the information is not known). The DataArrays package provides us with the special value NA, which represents a missing value and has the type NAtype. The result of a computation involving an NA value mostly cannot be determined; for example, 42 + NA returns NA. (Julia v0.4 also has a new Nullable{T} type, which allows you to specify the type of a missing value.)

A DataArray{T} array is a data structure that can be n-dimensional, behaves like a standard Julia array, and can contain values of the type T, but it can also contain missing (Not Available) values NA and can work efficiently with them. To construct one, use the @data macro:

```julia
# code in Chapter 8\dataarrays.jl
using DataArrays
using DataFrames
dv = @data([7, 3, NA, 5, 42])
```

This returns 5-element DataArray{Int64,1}: 7 3 NA 5 42. The sum of these numbers, sum(dv), is NA. One can also assign NA values to the array with dv[5] = NA; dv then becomes [7, 3, NA, 5, NA]. Converting this data structure to a normal array fails: convert(Array, dv) returns ERROR: NAException.
How do we get rid of these NA values, supposing we can do so safely? We can use the dropna function; for example, sum(dropna(dv)) returns 15. If you know that you can replace the NA values with a value v, use the array function:

```julia
repl = -1
sum(array(dv, repl)) # returns 13
```

A DataFrame is a kind of in-memory database, versatile in the ways you can work with the data. It consists of columns with names such as Col1, Col2, Col3, and so on. Each of these columns is a DataArray that has its own type, and the data it contains can be referred to by the column name as well, so we have substantially more forms of indexing. Unlike two-dimensional arrays, columns in a DataFrame can be of different types: one column might, for instance, contain the names of students and therefore be a string column, while another could contain their ages and be an integer column.

We construct a DataFrame from the program data as follows:

```julia
# code in Chapter 8\dataframes.jl
using DataFrames
# constructing a DataFrame:
df = DataFrame()
df[:Col1] = 1:4
df[:Col2] = [e, pi, sqrt(2), 42]
df[:Col3] = [true, false, true, false]
show(df)
```

Notice that the column headers are used as symbols. This returns the following 4x3 DataFrame object:

| Row | Col1 | Col2    | Col3  |
|-----|------|---------|-------|
| 1   | 1    | 2.71828 | true  |
| 2   | 2    | 3.14159 | false |
| 3   | 3    | 1.41421 | true  |
| 4   | 4    | 42.0    | false |

We could also have used the full constructor, as follows:

df = DataFrame(Col1 = 1:4, Col2 = [e, pi, sqrt(2), 42],
    Col3 = [true, false, true, false])

You can refer to columns either by an index (the column number) or by a name; both of the following expressions return the same output:

show(df[2])
show(df[:Col2])

This gives the following output:

[2.718281828459045, 3.141592653589793, 1.4142135623730951, 42.0]

To show rows or subsets of rows and columns, use the familiar splice (:) syntax, for example:

To get the first row, execute df[1, :]. This returns:

1x3 DataFrame
| Row | Col1 | Col2    | Col3 |
|-----|------|---------|------|
| 1   | 1    | 2.71828 | true |

To get the second and third rows, execute df[2:3, :].

To get only the second column from the previous result, execute df[2:3, :Col2]. This returns [3.141592653589793, 1.4142135623730951].

To get the second and third columns from the second and third rows, execute df[2:3, [:Col2, :Col3]], which returns the following output:

2x2 DataFrame
| Row | Col2    | Col3  |
|-----|---------|-------|
| 1   | 3.14159 | false |
| 2   | 1.41421 | true  |

The following functions are very useful when working with DataFrames:

- The head(df) and tail(df) functions show you the first six and the last six lines of data, respectively.
- The names function gives the names of the columns: names(df) returns 3-element Array{Symbol,1}: :Col1 :Col2 :Col3.
- The eltypes function gives the data types of the columns: eltypes(df) returns 3-element Array{Type{T<:Top},1}: Int64 Float64 Bool.
- The describe function tries to give some useful summary information about the data in the columns, depending on the type. For example, describe(df) gives for column 2 (which is numeric) the min, max, median, mean, number, and percentage of NAs:

Col2
Min      1.4142135623730951
1st Qu.  2.392264761937558
Median   2.929937241024419
Mean     12.318522011105483
3rd Qu.  12.856194490192344
Max      42.0
NAs      0
NA%      0.0%

To load data from a local CSV file, use the method readtable.
The returned object is of type DataFrame:

# code in Chapter 8\dataframes.jl
using DataFrames
fname = "winequality.csv"
data = readtable(fname, separator = ';')
typeof(data) # DataFrame
size(data)   # (1599,12)

The readtable method also supports reading in gzipped CSV files.

Writing a DataFrame to a file can be done with the writetable function, which takes the filename and the DataFrame as arguments, for example, writetable("dataframe1.csv", df). By default, writetable will use the delimiter specified by the filename extension and write the column names as headers. Both readtable and writetable support numerous options for special cases. Refer to the docs for more information (http://dataframesjl.readthedocs.org/en/latest/).

To demonstrate some of the power of DataFrames, here are some queries you can do:

- Make a vector with only the quality information: data[:quality]
- Give the wines with alcohol percentage equal to 9.5, for example: data[data[:alcohol] .== 9.5, :]

  Here, we use the .== operator, which does element-wise comparison. data[:alcohol] .== 9.5 returns an array of Boolean values (true for datapoints where :alcohol is 9.5, and false otherwise). data[boolean_array, :] selects those rows where boolean_array is true.

- Count the number of wines grouped by quality with by(data, :quality, data -> size(data, 1)), which returns the following:

  6x2 DataFrame
  | Row | quality | x1  |
  |-----|---------|-----|
  | 1   | 3       | 10  |
  | 2   | 4       | 53  |
  | 3   | 5       | 681 |
  | 4   | 6       | 638 |
  | 5   | 7       | 199 |
  | 6   | 8       | 18  |

The DataFrames package contains the by function, which takes in three arguments:

- A DataFrame; here it takes data
- A column to split the DataFrame on; here it takes quality
- A function or an expression to apply to each subset of the DataFrame; here data -> size(data, 1), which gives us the number of wines for each quality value

Another easy way to get the distribution among quality is to execute the hist function, hist(data[:quality]), which gives the counts over the range of quality: (2.0:1.0:8.0,[10,53,681,638,199,18]). More precisely, this is a tuple with the first element corresponding to the edges of the histogram bins, and the second denoting the number of items in each bin. So there are, for example, 10 wines with quality between 2 and 3, and so on. To extract the counts as a variable count of type Vector, we can execute _, count = hist(data[:quality]); the _ means that we neglect the first element of the tuple.

To obtain the quality classes as a DataArray, we will execute the following:

class = sort(unique(data[:quality]))

We can now construct a df_quality DataFrame with the class and count columns as df_quality = DataFrame(qual=class, no=count). This gives the following output:

6x2 DataFrame
| Row | qual | no  |
|-----|------|-----|
| 1   | 3    | 10  |
| 2   | 4    | 53  |
| 3   | 5    | 681 |
| 4   | 6    | 638 |
| 5   | 7    | 199 |
| 6   | 8    | 18  |

To deepen your understanding and learn about the other features of Julia DataFrames (such as joining, reshaping, and sorting), refer to the documentation available at http://dataframesjl.readthedocs.org/en/latest/.

Other file formats

Julia can work with other human-readable file formats through specialized packages:

- For JSON, use the JSON package. The parse method converts JSON strings into Dictionaries, and the json method turns any Julia object into a JSON string.
- For XML, use the LightXML package
- For YAML, use the YAML package
- For HDF5 (a common format for scientific data), use the HDF5 package
- For working with Windows INI files, use the IniFile package

Summary

In this article, we saw how to read and write CSV files in Julia, and how to work with missing values and tabular data using the DataArrays and DataFrames packages.
Introducing Splunk

Packt
03 Mar 2015
14 min read
In this article by Betsy Page Sigman, author of the book Splunk Essentials, we introduce Splunk, whose name was inspired by the process of exploring caves, or spelunking. Splunk helps analysts, operators, programmers, and many others explore data from their organizations by obtaining, analyzing, and reporting on it. This multinational company, cofounded by Michael Baum, Rob Das, and Erik Swan, has a core product called Splunk Enterprise, which manages searches, inserts, deletes, and filters, and analyzes the big data generated by machines, as well as other types of data. They also have a free version that has most of the capabilities of Splunk Enterprise and is an excellent learning tool.

Understanding events, event types, and fields in Splunk

An understanding of events and event types is important before going further.

Events

In Splunk, an event is not just one of the many local user meetings that are set up between developers to help each other out (although those can be very useful), but also refers to a record of one activity that is recorded in a log file. Each event usually has:

- A timestamp indicating the date and exact time the event was created
- Information about what happened on the system that is being tracked

Event types

An event type is a way to allow users to categorize similar events. It is a field defined by the user. You can define an event type in several ways, and the easiest way is by using the Splunk Web interface. One common reason for setting up an event type is to examine why a system has failed. Logins are often problematic for systems, and a search for failed logins can help pinpoint problems. For an interesting example of how to save a search on failed logins as an event type, visit http://docs.splunk.com/Documentation/Splunk/6.1.3/Knowledge/ClassifyAndGroupSimilarEvents#Save_a_search_as_a_new_event_type.

Why are events and event types so important in Splunk? Because without events, there would be nothing to search, of course. And event types allow us to make meaningful searches easily and quickly according to our needs, as we'll see later.

Sourcetypes

Sourcetypes are also important to understand, as they help define the rules for an event. A sourcetype is one of the default fields that Splunk assigns to data as it comes into the system. It determines what type of data it is so that Splunk can format it appropriately as it indexes it. This also allows the user who wants to search the data to easily categorize it. Some of the common sourcetypes are listed as follows:

- access_combined, for NCSA combined format HTTP web server logs
- apache_error, for standard Apache web server error logs
- cisco_syslog, for the standard syslog produced by Cisco network devices (including PIX firewalls, routers, and ACS), usually via remote syslog to a central log host
- websphere_core, a core file export from WebSphere

(Source: http://docs.splunk.com/Documentation/Splunk/latest/Data/Whysourcetypesmatter)

Fields

Each event in Splunk is associated with a number of fields. The core fields of host, source, sourcetype, and timestamp are key to Splunk. These fields are extracted from events at multiple points in the data processing pipeline that Splunk uses, and each of these fields includes a name and a value. The name describes the field (such as the userid) and the value says what that field's value is (susansmith, for example). Some of these fields are default fields that are given because of where the event came from or what it is.
When data is processed by Splunk, and when it is indexed or searched, it uses these fields. For indexing, the default fields added include those of host, source, and sourcetype. When searching, Splunk is able to select from a bevy of fields that can either be defined by the user or are very basic, such as an action that results in a purchase (for a website event). Fields are essential for doing the basic work of Splunk, that is, indexing and searching.

Getting data into Splunk

It's time to spring into action now and input some data into Splunk. Adding data is simple, easy, and quick. In this section, we will use some data and tutorials created by Splunk to learn how to add data:

1. Firstly, to obtain your data, visit the tutorial data at http://docs.splunk.com/Documentation/Splunk/6.1.5/SearchTutorial/GetthetutorialdataintoSplunk, which is readily available on Splunk. Here, download the folder tutorialdata.zip. Note that this will be a fresh dataset that has been collected over the last 7 days. Download it but don't extract the data from it just yet.
2. You then need to log in to Splunk, using admin as the username and then by using your password.
3. Once logged in, you will notice that toward the upper-right corner of your screen is the Add Data button. Click on this button.
4. Once you have clicked on this button, you'll see a screen where you add data to Splunk by choosing a data type or data source. Notice here the different types of data that you can select, as well as the different data sources.
5. Since the data we're going to use is a file, under Or Choose a Data Source, click on From files and directories. Once you have clicked on this, you can then click on the radio button next to Skip preview, since you don't need to preview the data now. You then need to click on Continue.
6. Click on Upload and index a file, find the tutorialdata.zip file you just downloaded (it is probably in your Downloads folder), and then click on More settings. (Note that you will need to select Segment in path under Host and type 1 under Segment Number.) Click on Save when you are done.
7. Following this, you should see a message confirming that your data has been successfully indexed into Splunk. Click on Start Searching; we will look at the data now.
8. You will now see the Search screen. Notice that the number of events you have will be different, as will the time of the earliest event. At this point, click on Data Summary.
9. You should see the Data Summary screen, where you can see the Hosts, Sources, and Sourcetypes tabs. Note that the hosts shown will not be the same as the ones you get. Take a quick look at what is on the Sources tab and the Sourcetypes tab. Then find the most recent data and click on it.
10. After clicking on the most recent data, which in this case is the host bps-T341s, look at the events contained there. Later, when we use streaming data, we can see how the events at the top of this list change rapidly.
Here, you will see a listing of events for that host value.

You can click on the Splunk logo in the upper-left corner of the web page to return to the home page. Under Administrator at the top-right of the page, click on Logout.

Searching Twitter data

We will start here by doing a simple search of our Twitter index, which is automatically created by the app once you have enabled Twitter input (as explained previously). In our earlier searches, we used the default index (which the tutorial data was downloaded to), so we didn't have to specify the index we wanted to use. Here, we will use just the Twitter index, so we need to specify that in the search.

A simple search

Imagine that we wanted to search for tweets containing the word coffee. We could use the code presented here and place it in the search bar:

index=twitter text=*coffee*

The preceding code searches only your Twitter index and finds all the places where the word coffee is mentioned. The asterisks are included before and after the text "coffee" because otherwise we would only get events where just "coffee" was tweeted, a rather rare occurrence, we expect. In fact, when we searched our indexed Twitter data without the asterisks around coffee, we got no results. (Note that the text field is not case sensitive, so tweets with either "coffee" or "Coffee" will be included in the search results.)

Examining the Twitter event

Before going further, it is useful to stop and closely examine the events that are collected as part of the search. A sample tweet, expanded by clicking >, shows the large number of fields that are part of each tweet. There are several items to look closely at here:

- _time: Splunk assigns a timestamp for every event. This is done in UTC (Coordinated Universal Time) format.
- contributors: The value for this field is null, as are the values of many Twitter fields.
- retweeted_status: Notice the {+} here; in the event list, you will see there are a number of fields associated with this, which can be seen when the + is selected and the list is expanded. This is the case wherever you see a {+} in a list of fields.

In addition to those shown previously, there are many other fields associated with a tweet. The 140-character (maximum) text field that most people consider to be the tweet is actually a small part of the actual data collected.

The implied AND

If you want to search on more than one term, there is no need to add AND as it is already implied. If, for example, you want to search for all tweets that include both the text "coffee" and the text "morning", then use:

index=twitter text=*coffee* text=*morning*

If you don't specify text= for the second term and just put *morning*, Splunk assumes that you want to search for *morning* in any field. Therefore, you could get that word in another field in an event. This isn't very likely in this case, although coffee could conceivably be part of a user's name, such as "coffeelover". But if you were searching for other text strings, such as a computer term like log or error, such terms could be found in a number of fields. So specifying the field you are interested in would be very important.

The need to specify OR

Unlike AND, you must always specify the word OR.
For example, to obtain all events that mention either coffee or morning, enter:

index=twitter text=*coffee* OR text=*morning*

Finding other words used

Sometimes you might want to find out what other words are used in tweets about coffee. You can do that with the following search:

index=twitter text=*coffee* | makemv text | mvexpand text | top 30 text

This search first searches for the word "coffee" in a text field, then creates a multivalued field from the tweet, and then expands it so that each word is treated as a separate piece of text. Then it takes the top 30 words that it finds. You might be asking yourself how you would use this kind of information. This type of analysis would be of interest to a marketer, who might want to use words that appear to be associated with coffee in composing the script for an advertisement. From this search, we can see that the words love, good, and cold might be words worth considering.

When you do a search like this, you will notice that there are a lot of filler words (a, to, for, and so on) that appear. You can do two things to remedy this: you can increase the limit for top words so that you can see more of the words that come up, or you can rerun the search using the following code. Note that "Coffee" (with a capital C) is listed separately from "coffee". The reason for this is that while the search is not case sensitive (thus both "coffee" and "Coffee" are picked up when you search on "coffee"), the process of putting the text fields through the makemv and mvexpand processes ends up distinguishing on the basis of case. We could rerun the search, excluding some of the filler words, using the code shown here:

index=twitter text=*coffee* | makemv text | mvexpand text | search NOT text="RT" AND NOT text="a" AND NOT text="to" AND NOT text="the" | top 30 text

Using a lookup table

Sometimes it is useful to use a lookup file to avoid having to use repetitive code. It would help us to have a list of all the small words that might be found often in a tweet just by the nature of each word's frequent use in language, so that we might eliminate them from our quest to find words that would be relevant for use in the creation of advertising. If we had a file of such small words, we could use a command indicating not to use any of these more common, irrelevant words when listing the top 30 words associated with our search topic of interest. Thus, for our search for words associated with the text "coffee", we would be interested in words like "dark", "flavorful", and "strong", but not words like "a", "the", and "then". We can do this using a lookup command. There are three types of lookup commands:

- lookup: Matches a value of one field with a value of another, based on a .csv file with the two fields. Consider a lookup table named lutable that contains fields for machine_name and owner, and what happens when the following code snippet is used after a preceding search (indicated by . . . |):

  . . . | lookup lutable owner

  Splunk will use the lookup table to match the owner's name with its machine_name and add the machine_name to each event.

- inputlookup: All fields in the .csv file are returned as results. If the following code snippet is used, both machine_name and owner would be searched:

  . . . | inputlookup lutable

- outputlookup: Outputs search results to a lookup table. The following code outputs results from the preceding search directly into a table it creates:

  . . . | outputlookup newtable.csv

The command we will use here is inputlookup, because we want to reference a .csv file we can create that will include words that we want to filter out as we seek to find possible advertising words associated with coffee. Let's call the .csv file filtered_words.csv, and give it just a single text field, containing words like "is", "the", and "then". Let's rewrite the search to look like the following code:

index=twitter text=*coffee* | makemv text | mvexpand text | search NOT [inputlookup filtered_words | fields text] | top 30 text

Using the preceding code, Splunk will search our Twitter index for *coffee*, and then expand the text field so that individual words are separated out. Then it will look for words that do NOT match any of the words in our filtered_words.csv file, and finally output the top 30 most frequently found words among those. As you can see, the lookup table can be very useful. To learn more about Splunk lookup tables, go to http://docs.splunk.com/Documentation/Splunk/6.1.5/SearchReference/Lookup.

Summary

In this article, we learned about events, event types, sourcetypes, and fields, how to get data into Splunk, and how to search and analyze Twitter data using searches and lookup tables. Splunk Enterprise Software, or Splunk, is an extremely powerful tool for searching, exploring, and visualizing data of all types. Splunk is becoming increasingly popular, as more and more businesses, both large and small, discover its ease and usefulness. Analysts, managers, students, and others can quickly learn how to use the data from their systems, networks, web traffic, and social media to make attractive and informative reports. This is a straightforward, practical, and quick introduction to Splunk that should have you making reports and gaining insights from your data in no time.
Central Air and Heating Thermostat

Packt
03 Mar 2015
15 min read
In this article by Andrew K. Dennis, author of the book Raspberry Pi Home Automation with Arduino, Second Edition, you will learn how to build a thermostat device using an Arduino. You will also learn how to use the temperature data to switch relays on and off. Relays are the main components that you can use for interaction between your Arduino and high-voltage electronic devices. The thermostat will also provide a web interface so that you can connect to it and check out the temperature.

Introducing the thermostat

A thermostat is a control device that is used to manipulate other devices based on a temperature setting. This temperature setting is known as the setpoint. When the temperature changes in relation to the setpoint, a device can be switched on or off.

For example, let's imagine a system where a simple thermostat is set to switch an electric heater on when the temperature drops below 25 degrees Celsius. Within our thermostat, we have a temperature-sensing device such as a thermistor that returns a temperature reading every few seconds. When the thermistor reads a temperature below the setpoint (25 degrees Celsius), the thermostat will switch a relay on, completing the circuit between the wall plug and our electric heater and providing it with power. Thus, we can see that a simple electronic thermostat can be used to switch on a variety of devices.

Warren S. Johnson, a college professor in Wisconsin, is credited with inventing the electric room thermostat in the 1880s. Johnson was known throughout his lifetime as a prolific inventor who worked in a variety of fields, including electricity. These electric room thermostats became a common feature in homes across the course of the twentieth century as larger parts of the world were hooked up to the electricity grid.

Now, with open hardware electronic tools such as the Arduino available, we can build custom thermostats for a variety of home projects. They can be used to control baseboard heaters, heat lamps, and air conditioner units. They can also be used for the following:

- Fish tank heaters
- Indoor gardens
- Electric heaters
- Fans

Now that we have explored the uses of thermostats, let's take a look at our project.

Setting up our hardware

In the following examples, we will list the pins to which you need to connect your hardware. However, we recommend that when you purchase any device such as the Ethernet shield, you check whether certain pins are available or not. Due to the sheer range of hardware available, it is not possible to list every potential hardware combination. Therefore, if the pin in the example is not free, you can update the circuit and source code to use a different pin. When building the example, we also recommend using a breadboard. This will allow you to experiment with building your circuit without having to solder any components.

Our first task will be to set up our thermostat device so that it has Ethernet access.

Adding the Ethernet shield

The Arduino Uno does not contain an Ethernet port. Therefore, you will need a way for your thermostat to be accessible on your home network. One simple solution is to purchase an Ethernet shield and connect it to your microcontroller. There are several shields on the market, including the Arduino Ethernet shield (http://arduino.cc/en/Main/ArduinoEthernetShield) and the Seeed Ethernet shield (http://www.seeedstudio.com/wiki/Ethernet_Shield_V1.0). These shields are plugged into the GPIO pins on the Arduino.
If you purchase one of these shields, then we would also recommend buying some extra GPIO headers. These are plugged into the existing headers attached to the Ethernet shield. Their purpose is to provide some extra clearance above the Ethernet port on the board so that you can connect other shields in future if you decide to purchase them.

Take a board of your choice and attach it to the Arduino Uno. When you plug the USB cable into your microcontroller and into your computer, the lights on both the Uno and the Ethernet shield should light up. Now our device has a medium to send and receive data over a LAN. Let's take a look at setting up our thermostat relays.

Relays

A relay is a type of switch controlled by an electromagnet. It allows us to use a small amount of power to control a much larger amount, for example, using a 9V power supply to switch 220V wall power. Relays are rated to work with different voltages and currents. A relay has three contact points: Normally Open, Common Connection, and Normally Closed. Two of these points will be wired up to our fan. In the context of an Arduino project, the relay will also have a pin for ground, a pin for 5V power, and a data pin that is used to switch the relay on and off.

A popular choice for a relay is the Pololu Basic SPDT Relay Carrier. This can be purchased from http://www.pololu.com/category/135/relay-modules. This relay has featured in some other Packt Publishing books on the Arduino, so it is a good investment.

Once you have the relay, you need to wire it up to the microcontroller. Connect a wire from the relay to digital pin 5 on the Arduino, another wire to the GND pin, and the final wire to the 5V pin. This completes the relay setup. In order to control relays, though, we need some data to trigger switching them between on and off. Our thermistor device handles the task of collecting this data.

Connecting the thermistor

A thermistor is an electronic component that, when included in a circuit, can be used to measure temperature. The device is a type of resistor whose resistance varies as the temperature changes. It can be found in a variety of devices, including thermostats and electronic thermometers. There are two categories of thermistors available: Negative Temperature Coefficient (NTC) and Positive Temperature Coefficient (PTC). The difference between them is that as the temperature increases, the resistance decreases in the case of an NTC, whereas it increases in the case of a PTC.

We are going to use a prebuilt digital device with the model number AM2302. This can be purchased at https://www.adafruit.com/products/393. This device reads both temperature and humidity. It also comes with a software library that you can use in your Arduino sketches. One of the benefits of this library is that many functions that precompute values, such as temperature in Celsius, are available and thus don't require you to write a lot of code.

Take your AM2302 and connect it to the GND pin, the 5V pin, and digital pin 4. You are now ready to move on to creating the software to test for temperature readings.

Setting up our software

We now need to write an application in the Arduino IDE to control our new thermostat device.
Our software will contain the following:

- The code responsible for collecting the temperature data
- Methods to switch relays on and off based on this data
- Code to handle accepting incoming HTTP requests so that we can view our thermostat's current temperature reading and change the setpoint
- A method to send our temperature readings to the Raspberry Pi

The next step is to hook up our Arduino thermostat with the USB port of the device we installed the IDE on. You may need to temporarily disconnect your relay from the Arduino. This will prevent your thermostat device from drawing too much power from your computer's USB port, which may result in the port being disabled.

We now need to download the DHT library that interacts with our AM2302. This can be found on GitHub, at https://github.com/adafruit/DHT-sensor-library. Click on the Download ZIP link and unzip the file to a location on your hard drive. Next, we need to install the library to make it accessible from our sketch:

1. Open the Arduino IDE.
2. Navigate to Sketch | Import Library.
3. Next, click on Add library.
4. Choose the folder on your hard drive.
5. You can now use the library.

With the library installed, we can include it in our sketch and access a number of useful functions. Let's now start creating our software.

Thermostat software

We can start adding some code to the Arduino to control our thermostat. Open a new sketch in the Arduino IDE and perform the following steps:

1. Inside the sketch, we are going to start by adding the code to include the libraries we need to use. At the top of the sketch, add the following code:

#include "DHT.h" // Include this if using the AM2302
#include <SPI.h>
#include <Ethernet.h>

2. Next, we will declare some variables to be used by our application. These will be responsible for defining:

- The pin the AM2302 thermistor is located on
- The relay pin
- The IP address we want our Arduino to use, which should be unique
- The MAC address of the Arduino, which should also be unique
- The name of the room the thermostat is located in
- The variables responsible for Ethernet communication

The IP address will depend on your own home network. Check out your wireless router to see what range of IP addresses is available. Select an address that isn't in use and update the IPAddress variable as follows:

#define DHTPIN 4      // The digital pin to read from
#define DHTTYPE DHT22 // DHT 22 (AM2302)

unsigned char relay = 5; // The relay pin
String room = "library";
byte mac[] = { 0xDE, 0xAD, 0xBE, 0xEF, 0xFE, 0xED };
IPAddress ip(192,168,3,5);
DHT dht(DHTPIN, DHTTYPE);
EthernetServer server(80);
EthernetClient client;

3. We can now include the setup() function. This is responsible for initializing some variables with their default values, and setting the pin to which our relay is connected to output mode:

void setup() {
  Serial.begin(9600);
  Ethernet.begin(mac, ip);
  server.begin();
  dht.begin();
  pinMode(relay, OUTPUT);
}

4. The next block of code we will add is the loop() function. This contains the main body of our program to be executed. Here, we will assign a value to the setpoint and grab our temperature readings:

void loop() {
  int setpoint = 25;
  float h = dht.readHumidity();
  float t = dht.readTemperature();

5. Following this, we check whether the temperature is above or below the setpoint and switch the relay on or off as needed.
Paste this code below the variables you just added:

if (t < setpoint) {
  digitalWrite(relay, HIGH);
} else {
  digitalWrite(relay, LOW);
}

6. Next, we need to handle the HTTP requests to the thermostat. We start by collecting all of the incoming data. The following code also goes inside the loop() function:

client = server.available();
if (client) {
  // an http request ends with a blank line
  boolean currentLineIsBlank = true;
  String result;
  while (client.connected()) {
    if (client.available()) {
      char c = client.read();
      result = result + c;
    }

7. With the incoming request stored in the result variable, we can examine the HTTP header to know whether we are requesting an HTML page or a JSON object. You'll learn more about JavaScript Object Notation (JSON) shortly. If we request an HTML page, this is displayed in the browser. Next, add the following code to your sketch:

if (result.indexOf("text/html") > -1) {
  client.println("HTTP/1.1 200 OK");
  client.println("Content-Type: text/html");
  client.println();
  if (isnan(h) || isnan(t)) {
    client.println("Failed to read from DHT sensor!");
    return;
  }
  client.print("<b>Thermostat</b> set to: ");
  client.print(setpoint);
  client.print(" degrees C <br />Humidity: ");
  client.print(h);
  client.print(" %\t");
  client.print("<br />Temperature: ");
  client.print(t);
  client.println(" degrees C ");
  break;
}

8. The following code handles a request for the data to be returned in JSON format. Our Raspberry Pi will make HTTP requests to the Arduino, and then process the data returned to it. At the bottom of this last block of code is a statement adding a short delay to allow the Arduino to process the request, after which the client connection is closed. Paste this final section of code in your sketch:

if (result.indexOf("application/json") > -1) {
  client.println("HTTP/1.1 200 OK");
  client.println("Content-Type: application/json;charset=utf-8");
  client.println("Server: Arduino");
  client.println("Connection: close");
  client.println();
  client.print("{\"thermostat\":[{\"location\":\"");
  client.print(room);
  client.print("\"},");
  client.print("{\"temperature\":\"");
  client.print(t);
  client.print("\"},");
  client.print("{\"humidity\":\"");
  client.print(h);
  client.print("\"},");
  client.print("{\"setpoint\":\"");
  client.print(setpoint);
  client.print("\"}");
  client.print("]}");
  client.println();
  break;
      }
    }
    delay(1);
    client.stop();
  }
}

This completes our program. We can now save it and run the Verify process. Click on the small check mark in a circle located in the top-left corner of the sketch. If you have added all of the code correctly, you should see Binary sketch size: 16,962 bytes (of a 32,256 byte maximum). Now that our code is verified and saved, we can look at uploading it to the Arduino, attaching the fan, and testing our thermostat.

Testing our thermostat and fan

We have our hardware set up and the code ready. Now we can test the thermostat and see it in action with a device connected to the mains electricity. We will first attach a fan and then run the sketch to switch it on and off.

Attaching the fan

Ensure that your Arduino is powered down and that the fan is not plugged into the wall. Using a wire stripper and cutters, cut one side of the cable that connects the plug to the fan body. Take the end of the cable attached to the plug, and attach it to the NO point on the relay. Use a screwdriver to ensure that it is fastened correctly.
Now, take the other portion of the cut cable that is attached to the fan body, and attach this to the COM point. Once again, use a screwdriver to ensure that it is fastened securely to the relay. You can now reattach your Arduino to the computer via its USB cable. However, do not plug the fan into the wall yet.

Starting your thermostat application

With the fan connected to our relay, we can upload our sketch and test it:

1. From the Arduino IDE, select the upload icon.
2. Once the code has been uploaded, disconnect your Arduino board.
3. Next, connect an Ethernet cable to your Arduino.
4. Following this, plug the Arduino into the wall to get mains power.
5. Finally, connect the fan to the wall outlet.

You should hear the clicking sound of the relay as it switches on or off depending on the room temperature. When the relay switch is on (or off), the fan will follow suit. Using a separate laptop if you have it, or from your Raspberry Pi, access the IP address you specified in the application via a web browser, for example, http://192.168.3.5/. You should see something similar to this:

Thermostat set to: 25 degrees C
Humidity: 35.70 %
Temperature: 14.90 degrees C

You can now stimulate the thermistor using an ice cube and a hair dryer to switch the relay on and off, and the fan will follow suit. If you refresh your connection to the IP address, you should see the change in the temperature output on the screen. You can use the F5 key to do this. Let's now test the JSON response.

Testing the JSON response

A format useful in transferring data between applications is JavaScript Object Notation (JSON). You can read more about this on the official JSON website, at http://www.json.org/. The purpose of us generating data in JSON format is to allow the Raspberry Pi control device we are building to query the thermostat periodically and collect the data being generated. We can verify that we are getting JSON data back from the sketch by making an HTTP request using the application/json header:

1. Load a web browser such as Google Chrome or Firefox. We are going to make an XML HTTP request directly from the browser to our thermostat. This type of request is commonly known as an Asynchronous JavaScript and XML (AJAX) request. It can be used to refresh data on a page without having to actually reload it.
2. In your web browser, locate and open the developer tools. The following link lists the location and shortcut keys in major browsers: http://webmasters.stackexchange.com/questions/8525/how-to-open-the-javascript-console-in-different-browsers
3. In the JavaScript console portion of the developer tools, type the following JavaScript code:

var xmlhttp;
xmlhttp = new XMLHttpRequest();
xmlhttp.open("POST", "http://192.168.3.5", true);
xmlhttp.setRequestHeader("Content-type", "application/json");
xmlhttp.onreadystatechange = function() { // Call a function when the state changes.
  if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
    console.log(xmlhttp);
  }
};
xmlhttp.send()

4. Press the return key or run option to execute the code. This will fire an HTTP request, and you should see a JSON object returned:

{"thermostat":
  [
    {"location":"library"},
    {"temperature":"14.90"},
    {"humidity":"29.90"},
    {"setpoint":"25"}
  ]
}

This confirms that our application can return data to the Raspberry Pi. We have tested our software and hardware and seen that they are working.

Summary

In this article, we built a thermostat device.
We looked at thermistors, and we learned how to set up an Ethernet connection. To control our thermostat, we wrote an Arduino sketch, uploaded it to the microcontroller, and then tested it with a fan plugged into the mains electricity.
Speeding Vagrant Development With Docker

Packt
03 Mar 2015
13 min read
In this article by Chad Thompson, author of Vagrant Virtual Development Environment Cookbook, we look at how Vagrant (http://vagrantup.com) and Docker can work together. Many software developers are familiar with using Vagrant to distribute and maintain development environments. In most cases, Vagrant is used to manage virtual machines running in desktop hypervisor software such as VirtualBox or the VMware desktop products (VMware Fusion for OS X and VMware Workstation for Linux and Windows environments). More recently, Docker (http://docker.io) has become increasingly popular for deploying containers—Linux processes that can run in a single operating system environment yet be isolated from one another. In practice, this means that a container includes the runtime environment for an application, down to the operating system level. While containers have been popular for deploying applications, we can also use them for desktop development.

Vagrant can use Docker in a couple of ways:

- As a target for running a process defined by Vagrant, with the Docker provider
- As a complete development environment for building and testing containers within the context of a virtual machine, with the Docker provisioner

In this example, we'll take a look at how we can use the Docker provider to build and run a web server. Running our web server with Docker will allow us to build and test our web application without the added overhead of booting and provisioning a virtual machine.

Introducing the Docker provider

The Vagrant Docker provider will build and deploy containers to a Docker runtime. There are a couple of cases to consider when using Vagrant with Docker:

- On a Linux host machine, Vagrant will use a native (locally installed) Docker environment to deploy containers. Make sure that Docker is installed before using Vagrant. Docker itself is a technology built on top of Linux Containers (LXC) technology, so Docker requires an operating system with a recent version (newer than Linux 3.8, which was released in February 2013) of the Linux kernel. Most recent Linux distributions should support the ability to run Docker.
- On non-Linux environments (namely OS X and Windows), the provider will require a local Linux runtime to be present for deploying containers. When running the Docker provider in these environments, Vagrant will download and boot a boot2docker (http://boot2docker.io) environment—in this case, a repackaging of boot2docker in Vagrant box format.

Let's take a look at two scenarios for using the Docker provider. In each of these examples, we'll start these environments from an OS X environment, so we will see some tasks that are required for using the boot2docker environment.

Installing a Docker image from a repository

We'll start with a simple case: installing a Docker container from a repository (a MySQL container) and connecting it to an external tool for development (MySQL Workbench or a client tool of your choice). We'll need to initialize the boot2docker environment and use some Vagrant tools to interact with the environment and the deployed containers. Before we can start, we'll need to find a suitable Docker image to launch. One of the unique advantages of using Docker as a development environment is its ability to select a base Docker image, then add successive build steps on top of the base image.
In this simple example, we can find a base MySQL image on the Docker Hub registry (https://registry.hub.docker.com). The MySQL project provides an official Docker image that we can build from. We'll note from the repository the command for using the image, docker pull mysql, and note that the image name is mysql.

1. Start with a Vagrantfile that defines the Docker provider:

# -*- mode: ruby -*-
# vi: set ft=ruby :

VAGRANTFILE_API_VERSION = "2"
ENV['VAGRANT_DEFAULT_PROVIDER'] = 'vmware_fusion'
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.define "database" do |db|
    db.vm.provider "docker" do |d|
      d.image = "mysql"
    end
  end
end

An important thing to note immediately is that when we define the database machine with the Docker provider, we do not specify a box file. The Docker provider will start and launch containers into a boot2docker environment, negating the need for a Vagrant box or virtual machine definition. This will introduce a bit of a complication in interacting with the Vagrant environment in later steps. Also note the mysql image taken from the Docker Hub registry.

2. We'll need to launch the image with a few basic parameters. Add the following to the Docker provider block:

db.vm.provider "docker" do |d|
  d.image = "mysql"
  d.env = {
    :MYSQL_ROOT_PASSWORD => "root",
    :MYSQL_DATABASE      => "dockertest",
    :MYSQL_USER          => "dockertest",
    :MYSQL_PASSWORD      => "d0cker"
  }
  d.ports = ["3306:3306"]
  d.remains_running = true
end

The environment variables (d.env) are taken from the documentation on the MySQL Docker image page (https://registry.hub.docker.com/_/mysql/). This is how the image expects to set certain parameters. In this case, our parameters will set the database root password (for the root user) and create a database with a new user that has full permissions to that database.

The d.ports parameter is an array of port listings that will be forwarded from the container (the default MySQL port of 3306) to the host operating system, in this case also 3306. The contained application will, thus, behave like a natively installed MySQL installation. The port forwarding here is from the container to the operating system that hosts the container (in this case, the container host is our boot2docker image). If we are developing and hosting containers natively with Vagrant on a Linux distribution, the port forwarding will be to localhost, but boot2docker introduces something of a wrinkle in doing Docker development on Windows or OS X. We'll either need to refer to our software installation by the IP of the boot2docker container, or configure a second port forwarding configuration that allows a Docker contained application to be available to the host operating system as localhost.

The final parameter (d.remains_running = true) is a flag for Vagrant to note that the Vagrant run should be marked as failed if the Docker container exits on start. In the case of software that runs as a daemon process (such as the MySQL database), a Docker container that exits immediately is an error condition.

3. Start the container using the vagrant up --provider=docker command. A few things will happen here. If this is the first time you have started the project, you'll see some messages about booting a box named mitchellh/boot2docker. This is a Vagrant-packaged version of the boot2docker project. Once the machine boots, it becomes a host for all Docker containers managed with Vagrant.
Keep in mind that boot2docker is necessary only for non-Linux operating systems that are running Docker through a virtual machine. On a Linux system running Docker natively, you will not see information about boot2docker.

After the container is booted (or if it is already running), Vagrant will display notifications about rsyncing a folder (if we are using boot2docker) and launching the image. Docker generates unique identifiers for containers and notes any port mapping information.

Let's take a look at some details on the containers that are running in the Docker host. We'll need to find a way to gain access to the Vagrant boot2docker image (and only if we are using boot2docker and not a native Linux environment), which is not quite as straightforward as a vagrant ssh; we'll need to identify the Vagrant container to access:

1. First, identify the Docker Vagrant machine from the global Vagrant status. Vagrant keeps track of running instances that can be accessed from Vagrant itself. In this case, we are only interested in the Vagrant instance named docker-host, which can be found with the vagrant global-status command. In this case, Vagrant identifies the instance as d381331 (a unique value for every Vagrant machine launched).
2. We can access this instance with a vagrant ssh command:

vagrant ssh d381331

This will display an ASCII-art boot2docker logo and a command prompt for the boot2docker instance.

3. Take a look at the Docker containers running on the system with the docker ps command. The docker ps command will provide information about the running Docker containers on the system; in this case, the unique ID of the container (output during the Vagrant startup) and other information about the container.
4. Find the IP address of the boot2docker instance (only if we're using boot2docker) to connect to the MySQL instance. In this case, execute the ifconfig command:

docker@boot2docker:~$ ifconfig

This will output information about the network interfaces on the machine; we are interested in the eth0 entry. In particular, we can note the IP address of the machine on the eth0 interface. Make a note of the IP address noted as the inet addr; in this case, 192.168.30.129.

5. Connect a MySQL client to the running Docker container. In this case, we'll need to note some information for the connection:

- The IP address of the boot2docker virtual machine (if using boot2docker). In this case, we'll note 192.168.30.129.
- The port that the MySQL instance will respond to on the Docker host. In this case, the Docker container is forwarding port 3306 in the container to port 3306 on the host.
- Information noted in the Vagrantfile for the username or password on the MySQL instance.

With this information in hand, we can configure a MySQL client. The MySQL project provides a supported GUI client named MySQL Workbench (http://www.mysql.com/products/workbench/). With the client installed on our host operating system, we can create a new connection in the Workbench client (consult the documentation for your version of Workbench, or use a MySQL client of your choice). In this case, we're connecting to the boot2docker instance. If you are running Docker natively on a Linux instance, the connection should simply forward to localhost.
If the connection is successful, the Workbench client will display an empty database. Once we've connected, we can use the MySQL database as we would any other MySQL instance, this time hosted in a Docker container, without having to install and configure the MySQL package itself.

Building a Docker image with Vagrant

While launching packaged Docker applications can be useful (particularly in the case where launching a Docker container is simpler than native installation steps), Vagrant becomes even more useful when used to launch containers that are being developed. On OS X and Windows machines, the use of Vagrant can make managing the container deployment somewhat simpler through the boot2docker environment, while on Linux, using the native Docker tools could be somewhat simpler. In this example, we'll use a simple Dockerfile to modify a base image.

1. First, start with a simple Vagrantfile. In this case, we'll specify a build directory rather than an image file:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
ENV['VAGRANT_DEFAULT_PROVIDER'] = 'vmware_fusion'

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.define "nginx" do |nginx|
    nginx.vm.provider "docker" do |d|
      d.build_dir = "build"
      d.ports = ["49153:80"]
    end
  end
end

This Vagrantfile specifies a build directory as well as the ports forwarded to the host from the container. In this case, the standard HTTP port (80) forwards to port 49153 on the host machine, which in this case is the boot2docker instance.

2. Create our build directory in the same directory as the Vagrantfile. In the build directory, create a Dockerfile. A Dockerfile is a set of instructions on how to build a Docker container. See https://docs.docker.com/reference/builder/ or James Turnbull's The Docker Book for more information on how to construct a Dockerfile. In this example, we'll use a simple Dockerfile to copy a working HTML directory to a base NGINX image:

FROM nginx
COPY content /usr/share/nginx/html

3. Create a directory in our build directory named content. In the directory, place a simple index.html file that will be served from the new container:

<html>
<body>
  <div style="text-align:center;padding-top:40px;border:dashed 2px;">
    This is an NGINX build.
  </div>
</body>
</html>

Once all the pieces are in place, our working directory will have the following structure:

.
├── Vagrantfile
└── build
    ├── Dockerfile
    └── content
        └── index.html

4. Start the container in the working directory with the command:

vagrant up nginx --provider=docker

This will start the container build and deploy process. Once the container is launched, the web server can be accessed using the IP address of the boot2docker instance (see the previous section for more information on obtaining this address) and the forwarded port.

One other item to note, especially if you have completed both steps in this section without halting or destroying the Vagrant project, is that when using the Docker provider, containers are deployed to a single shared virtual machine. If the boot2docker instance is accessed and the docker ps command is executed, it can be noted that two separate Vagrant projects deploy containers to a single host.
When using the Docker provider, the single instance has a few effects:

- The single virtual machine can use fewer resources on your development workstation
- Deploying and rebuilding containers is a process that is much faster than booting and shutting down entire operating systems

Docker development with the Docker provider can be a useful technique to create and test Docker containers, although Vagrant might not be of particular help in packaging and distributing Docker containers. If you wish to publish containers, consult the documentation or The Docker Book to get started with packaging and distributing Docker containers.

See also

- Docker: http://docker.io
- boot2docker: http://boot2docker.io
- The Docker Book: http://www.dockerbook.com
- The Docker repository: https://registry.hub.docker.com

Summary

In this article, we learned how to use the Docker provider with Vagrant, both to launch a prebuilt image from a repository and to build and run a container from a Dockerfile.
SciPy for Signal Processing

Packt
03 Mar 2015
14 min read
In this article by Sergio J. Rojas G. and Erik A. Christensen, authors of the book Learning SciPy for Numerical and Scientific Computing, Second Edition, we will focus on the usage of some of the most commonly used routines that are included in the SciPy modules scipy.signal, scipy.ndimage, and scipy.fftpack, which are used for signal processing, multidimensional image processing, and computing Fourier transforms, respectively.

We define a signal as data that measures either time-varying or spatially varying phenomena. Sound or electrocardiograms are excellent examples of time-varying quantities, while images embody the quintessential spatially varying cases. Moving images are treated with the techniques of both types of signals.

The field of signal processing treats four aspects of this kind of data: its acquisition, quality improvement, compression, and feature extraction. SciPy has many routines to treat tasks in any of the four fields effectively. All these are included in two low-level modules (scipy.signal being the main module, with an emphasis on time-varying data, and scipy.ndimage, for images). Many of the routines in these two modules are based on the Discrete Fourier Transform of the data. SciPy has an extensive package of applications and definitions of these background algorithms, scipy.fftpack, which we will start covering first.

Discrete Fourier Transforms

The Discrete Fourier Transform (DFT from now on) transforms any signal from its time/space domain into a related signal in the frequency domain. This allows us not only to analyze the different frequencies of the data, but also enables faster filtering operations, when used properly. It is possible to turn a signal in the frequency domain back to its time/spatial domain, thanks to the Inverse Fourier Transform. We will not go into detail about the mathematics behind these operators, since we assume familiarity at some level with this theory. We will focus on syntax and applications instead.

The basic routines in the scipy.fftpack module compute the DFT and its inverse for discrete signals in any dimension, which are fft and ifft (one dimension), fft2 and ifft2 (two dimensions), and fftn and ifftn (any number of dimensions). All of these routines assume that the data is complex valued. If we know beforehand that a particular dataset is actually real valued, and should offer real-valued frequencies, we use rfft and irfft instead, for a faster algorithm. All these routines are designed so that composition with their inverses always yields the identity. The syntax is the same in all cases, as follows:

fft(x[, n, axis, overwrite_x])

The first parameter, x, is always the signal in any array-like form. Note that fft performs one-dimensional transforms. This means, in particular, that if x happens to be two-dimensional, for example, fft will output another two-dimensional array where each row is the transform of the corresponding row of the original. We can change it to columns instead, with the optional parameter, axis. The rest of the parameters are also optional; n indicates the length of the transform, and overwrite_x gets rid of the original data to save memory and resources. We usually play with the integer n when we need to pad the signal with zeros, or truncate it. For a higher dimension, n is substituted by shape (a tuple), and axis by axes (another tuple).
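As a quick illustration of the inverse property and the real-valued variant (this short sketch is ours, not from the book, and the test signal is arbitrary), consider the following:

>>> import numpy
>>> from scipy.fftpack import fft, ifft, rfft
>>> x = numpy.cos(2*numpy.pi*numpy.arange(8)/8.0)  # an arbitrary real-valued test signal
>>> X = fft(x)                                     # complex-valued spectrum of x
>>> numpy.allclose(x, ifft(X))                     # composing fft with ifft recovers the signal
True
>>> rfft(x)                                        # faster, packed-format transform for real data

Note that rfft returns a real array in a packed format, so its output is not directly comparable to that of fft, but it carries the same frequency information.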
To better understand the output, it is often useful to shift the zero frequencies to the center of the output arrays with fftshift. The inverse of this operation, ifftshift, is also included in the module. The following code shows some of these routines in action, when applied to a checkerboard image:

>>> import numpy
>>> from scipy.fftpack import fft, fft2, fftshift
>>> import matplotlib.pyplot as plt
>>> B=numpy.ones((4,4)); W=numpy.zeros((4,4))
>>> signal = numpy.bmat("B,W;W,B")
>>> onedimfft = fft(signal,n=16)
>>> twodimfft = fft2(signal,shape=(16,16))
>>> plt.figure()
>>> plt.gray()
>>> plt.subplot(121,aspect='equal')
>>> plt.pcolormesh(onedimfft.real)
>>> plt.colorbar(orientation='horizontal')
>>> plt.subplot(122,aspect='equal')
>>> plt.pcolormesh(fftshift(twodimfft.real))
>>> plt.colorbar(orientation='horizontal')
>>> plt.show()

Note how the first four rows of the one-dimensional transform are equal (and so are the last four), while the two-dimensional transform (once shifted) presents a peak at the origin and nice symmetries in the frequency domain. In the resulting plot, the left-hand side image is fft and the right-hand side image is fft2 of a 2 x 2 checkerboard signal.

The scipy.fftpack module also offers the Discrete Cosine Transform with its inverse (dct, idct), as well as many differential and pseudo-differential operators defined in terms of all these transforms: diff (for derivative/integral), hilbert and ihilbert (for the Hilbert transform), tilbert and itilbert (for the h-Tilbert transform of periodic sequences), and so on.

Signal construction

To aid in the construction of signals with predetermined properties, the scipy.signal module has a nice collection of the most frequent one-dimensional waveforms in the literature: chirp and sweep_poly (for the frequency-swept cosine generator), gausspulse (a Gaussian-modulated sinusoid), and sawtooth and square (for the waveforms with those names). They all take as their main parameter a one-dimensional ndarray representing the times at which the signal is to be evaluated. Other parameters control the design of the signal, according to frequency or time constraints. Let's take a look at the following code snippet, which illustrates the use of the one-dimensional waveforms that we just discussed:

>>> import numpy
>>> from scipy.signal import chirp, sawtooth, square, gausspulse
>>> import matplotlib.pyplot as plt
>>> t=numpy.linspace(-1,1,1000)
>>> plt.subplot(221); plt.ylim([-2,2])
>>> plt.plot(t,chirp(t,f0=100,t1=0.5,f1=200))   # plot a chirp
>>> plt.subplot(222); plt.ylim([-2,2])
>>> plt.plot(t,gausspulse(t,fc=10,bw=0.5))      # Gauss pulse
>>> plt.subplot(223); plt.ylim([-2,2])
>>> t*=3*numpy.pi
>>> plt.plot(t,sawtooth(t))                     # sawtooth
>>> plt.subplot(224); plt.ylim([-2,2])
>>> plt.plot(t,square(t))                       # square wave
>>> plt.show()

The resulting diagram shows waveforms for chirp (upper-left), gausspulse (upper-right), sawtooth (lower-left), and square (lower-right).

The usual method of creating signals is to import them from a file. This is possible by using purely NumPy routines, for example, fromfile:

fromfile(file, dtype=float, count=-1, sep='')

The file argument may point to either a file or a string, the count argument is used to determine the number of items to read, and sep indicates what constitutes a separator in the original file/string.
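As a quick sketch of a round trip through a raw binary file (our own example, not from the book; the filename signal.raw is arbitrary):

>>> import numpy
>>> data = numpy.linspace(0, 1, 5)
>>> data.tofile("signal.raw")                          # raw binary dump, no header
>>> loaded = numpy.fromfile("signal.raw", dtype=float)
>>> numpy.allclose(data, loaded)
True

Since fromfile reads raw bytes with no metadata, the dtype passed when reading must match the one used when the data was written.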
For images, we have the versatile imread routine in either the scipy.ndimage or scipy.misc module:

imread(fname, flatten=False)

The fname argument is a string containing the location of an image. The routine infers the type of file, and reads the data into an array, accordingly. If the flatten argument is set to True, the image is converted to gray scale. Note that, for this to work, the Python Imaging Library (PIL) needs to be installed.

It is also possible to load .wav files for analysis, with the read and write routines from the wavfile submodule in the scipy.io module. For instance, given any audio file with this format, say audio.wav, the command, rate,data = scipy.io.wavfile.read("audio.wav"), assigns an integer value to the rate variable, indicating the sample rate of the file (in samples per second), and a NumPy ndarray to the data variable, containing the numerical values assigned to the different notes. If we wish to write some one-dimensional ndarray data into an audio file of this kind, with the sample rate given by the rate variable, we may do so by issuing the following command:

>>> scipy.io.wavfile.write("filename.wav",rate,data)

Filters

A filter is an operation on signals that either removes features or extracts some component. SciPy has a very complete set of known filters, as well as the tools to allow construction of new ones. The complete list of filters in SciPy is long, and we encourage the reader to explore the help documents of the scipy.signal and scipy.ndimage modules for the complete picture. We will introduce here, by way of exposition, some of the most used filters in the treatment of audio or image processing. We start by creating a signal worth filtering:

>>> from numpy import sin, cos, pi, linspace
>>> f=lambda t: cos(pi*t) + 0.2*sin(5*pi*t+0.1) + 0.2*sin(30*pi*t) + 0.1*sin(32*pi*t+0.1) + 0.1*sin(47*pi*t+0.8)
>>> t=linspace(0,4,400); signal=f(t)

We first test the classical smoothing filter of Wiener and Kolmogorov, wiener. We present in a plot the original signal (in black) and the corresponding filtered data, with a choice of a Wiener window of size 55 samples (in red). Next, we compare the result of applying the median filter, medfilt, with a kernel of the same size as before (in blue):

>>> from scipy.signal import wiener, medfilt
>>> import matplotlib.pylab as plt
>>> plt.plot(t,signal,'k')
>>> plt.plot(t,wiener(signal,mysize=55),'r',linewidth=3)
>>> plt.plot(t,medfilt(signal,kernel_size=55),'b',linewidth=3)
>>> plt.show()

This gives us the following graph showing the comparison of smoothing filters (wiener is the one that has its starting point just below 0.5 and medfilt has its starting point just above 0.5):

Most of the filters in the scipy.signal module can be adapted to work with arrays of any dimension. But in the particular case of images, we prefer to use the implementations in the scipy.ndimage module, since they are coded with these objects in mind. For instance, to perform a median filter on an image for smoothing, we use scipy.ndimage.median_filter. Let's see an example.
We will start by loading Lena into an array and corrupting the image with Gaussian noise (zero mean and standard deviation of 16):

>>> from scipy.stats import norm     # Gaussian distribution
>>> import matplotlib.pyplot as plt
>>> import scipy.misc
>>> import scipy.ndimage
>>> plt.gray()
>>> lena=scipy.misc.lena().astype(float)
>>> plt.subplot(221);
>>> plt.imshow(lena)
>>> lena+=norm(loc=0,scale=16).rvs(lena.shape)
>>> plt.subplot(222);
>>> plt.imshow(lena)
>>> denoised_lena = scipy.ndimage.median_filter(lena,3)
>>> plt.subplot(224);
>>> plt.imshow(denoised_lena)

The set of filters for images comes in two flavors—statistical and morphological. For example, among the filters of a statistical nature, we have the Sobel algorithm, oriented to the detection of edges (singularities along curves). Its syntax is as follows:

sobel(image, axis=-1, output=None, mode='reflect', cval=0.0)

The optional parameter, axis, indicates the dimension in which the computations are performed. By default, this is always the last axis (-1). The mode parameter, which is one of the strings 'reflect', 'constant', 'nearest', 'mirror', or 'wrap', indicates how to handle the border of the image, in case there is insufficient data to perform the computations there. In case the mode is 'constant', we may indicate the value to use in the border, with the cval parameter. Let's look at the following code snippet, which illustrates the use of the sobel filter:

>>> from scipy.ndimage.filters import sobel
>>> import numpy
>>> lena=scipy.misc.lena()
>>> sblX=sobel(lena,axis=0); sblY=sobel(lena,axis=1)
>>> sbl=numpy.hypot(sblX,sblY)
>>> plt.subplot(223);
>>> plt.imshow(sbl)
>>> plt.show()

The following screenshot illustrates Lena (upper-left) and noisy Lena (upper-right) with the preceding two filters in action—edge map with sobel (lower-left) and median filter (lower-right).
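As a quick side experiment—this is not part of the original recipe, and it reuses the norm object imported above—we might also smooth the same kind of noisy image with a Gaussian filter from the same module, to compare its blurring behavior against the median filter:

>>> noisy = scipy.misc.lena().astype(float) + norm(loc=0,scale=16).rvs((512,512))
>>> gauss_denoised = scipy.ndimage.gaussian_filter(noisy, sigma=3)   # heavier blur as sigma grows
>>> plt.figure(); plt.imshow(gauss_denoised); plt.show()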
For instance, to locate the letter e in an image of text, we could use the following command:

>>> binary_hit_or_miss(text, letterE)

For comparative purposes, let's use this command in the following code snippet:

>>> import numpy
>>> import scipy.ndimage
>>> import matplotlib.pylab as plt
>>> from scipy.ndimage.morphology import binary_hit_or_miss
>>> text = scipy.ndimage.imread('CHAP_05_input_textImage.png')
>>> letterE = text[37:53,275:291]
>>> HitorMiss = binary_hit_or_miss(text, structure1=letterE, origin1=1)
>>> eLocation = numpy.where(HitorMiss==True)
>>> x=eLocation[1]; y=eLocation[0]
>>> plt.imshow(text, cmap=plt.cm.gray, interpolation='nearest')
>>> plt.autoscale(False)
>>> plt.plot(x,y,'wo',markersize=10)
>>> plt.axis('off')
>>> plt.show()

The output for the preceding lines of code is generated as follows:

For gray-scale images, we may use a structuring element (structuring_element) or a footprint. The syntax is, therefore, a little different:

grey_operation(signal, [structuring_element, footprint, size, ...])

If we desire to use a completely flat and rectangular structuring element (all ones), then it is enough to indicate the size as a tuple. For instance, to perform gray-scale dilation with a flat element of size (15,15) on our classical image of Lena, we issue the following command:

>>> grey_dilation(lena, size=(15,15))

The last kind of morphological operation coded in the scipy.ndimage module performs distance and feature transforms. Distance transforms create a map that assigns to each pixel the distance to the nearest object. Feature transforms provide the index of the closest background element instead. These operations are used to decompose images into different labels. We may even choose different metrics such as Euclidean distance, chessboard distance, and taxicab distance. The syntax for the distance transform (distance_transform) using a brute force algorithm is as follows:

distance_transform_bf(signal, metric='euclidean', sampling=None, return_distances=True, return_indices=False, distances=None, indices=None)

We indicate the metric with strings such as 'euclidean', 'taxicab', or 'chessboard'. If we desire to provide the feature transform instead, we switch return_distances to False and return_indices to True. Similar routines are available with more sophisticated algorithms—distance_transform_cdt (using chamfering for taxicab and chessboard distances). For Euclidean distance, we also have distance_transform_edt. All these use the same syntax.

Summary

In this article, we explored signal processing (in any dimension), including the treatment of signals in frequency space by means of their Discrete Fourier Transforms. These correspond to the fftpack, signal, and ndimage modules.

Resources for Article:

Further resources on this subject:

Signal Processing Techniques [article]
SciPy for Computational Geometry [article]
Move Further with NumPy Modules [article]

PostgreSQL as an Extensible RDBMS

Packt
03 Mar 2015
18 min read
This article by Usama Dar, the author of the book PostgreSQL Server Programming - Second Edition, explains the process of creating a new operator, overloading it, optimizing it, creating index access methods, and much more. PostgreSQL is an extensible database. I hope you've learned this much by now. It is extensible by virtue of its design. As discussed before, PostgreSQL uses a catalog-driven design. In fact, PostgreSQL is more catalog-driven than most of the traditional relational databases. The key benefit here is that the catalogs can be changed or added to, in order to modify or extend the database functionality. PostgreSQL also supports dynamic loading, that is, user-written code can be provided as a shared library, and PostgreSQL will load it as required.

(For more resources related to this topic, see here.)

Extensibility is critical for many businesses, which have needs that are specific to that business or industry. Sometimes, the tools provided by the traditional database systems do not fulfill those needs. People in those businesses know best how to solve their particular problems, but they are not experts in database internals. It is often not possible for them to cook up their own database kernel or modify the core or customize it according to their needs. A truly extensible database will then allow you to do the following:

Solve domain-specific problems in a seamless way, like a native solution
Build complete features without modifying the core database engine
Extend the database without interrupting availability

PostgreSQL not only allows you to do all of the preceding things, but also does these, and more, with utmost ease. In terms of extensibility, you can do the following things in a PostgreSQL database:

Create your own data types
Create your own functions
Create your own aggregates
Create your own operators
Create your own index access methods (operator classes)
Create your own server programming language
Create foreign data wrappers (SQL/MED) and foreign tables

What can't be extended?

Although PostgreSQL is an extensible platform, there are certain things that you can't do or change without explicitly doing a fork, as follows:

You can't change or plug in a new storage engine. If you are coming from the MySQL world, this might annoy you a little. However, PostgreSQL's storage engine is tightly coupled with its executor and the rest of the system, which has its own benefits.
You can't plug in your own planner/parser. One can argue for and against the ability to do that, but at the moment, the planner, parser, optimizer, and so on are baked into the system and there is no possibility of replacing them. There has been some talk on this topic, and if you are of the curious kind, you can read some of the discussion at http://bit.ly/1yRMkK7.

We will now briefly discuss some more of the extensibility capabilities of PostgreSQL. We will not dive deep into the topics, but we will point you to the appropriate link where more information can be found.

Creating a new operator

Now, let's take a look at how we can add a new operator in PostgreSQL. Adding new operators is not too different from adding new functions. In fact, an operator is syntactically just a different way to use an existing function. For example, the + operator calls a built-in function called numeric_add and passes it the two arguments. When you define a new operator, you must define the data types that the operator expects as arguments and define which function is to be called.
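The examples that follow assume a fib function that returns the n-th Fibonacci number; its definition is not shown in this excerpt. A minimal PL/pgSQL sketch of such a function—an assumption on our part, not the author's original code—might look like this:

CREATE OR REPLACE FUNCTION fib(n integer)
RETURNS integer AS $$
DECLARE
    a integer := 0;  -- fib(0)
    b integer := 1;  -- fib(1)
    t integer;
BEGIN
    FOR i IN 1..n LOOP
        t := a + b;
        a := b;
        b := t;
    END LOOP;
    RETURN a;  -- after n steps, a holds fib(n); for example, fib(12) = 144
END;
$$ LANGUAGE plpgsql IMMUTABLE;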
Let's take a look at how to define a simple operator. You have to use the CREATE OPERATOR command to create an operator. Let's use that function to create a new Fibonacci operator, ##, which will have an integer on its left-hand side:

CREATE OPERATOR ## (PROCEDURE=fib, LEFTARG=integer);

Now, you can use this operator in your SQL to calculate a Fibonacci number:

testdb=# SELECT 12##;
 ?column?
----------
      144
(1 row)

Note that we defined that the operator will have an integer on the left-hand side. If you try to put a value on the right-hand side of the operator, you will get an error:

postgres=# SELECT ##12;
ERROR:  operator does not exist: ## integer at character 8
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.
STATEMENT:  select ##12;
ERROR:  operator does not exist: ## integer
LINE 1: select ##12;
               ^
HINT:  No operator matches the given name and argument type(s). You might need to add explicit type casts.

Overloading an operator

Operators can be overloaded in the same way as functions. This means that an operator can have the same name as an existing operator but with a different set of argument types. More than one operator can have the same name, but two operators can't share the same name if they accept the same types and positions of the arguments. As long as there is a function that accepts the same kind and number of arguments that an operator defines, it can be overloaded.

Let's overload the ## operator we defined in the last example, and also add the ability to provide an integer on the right-hand side of the operator:

CREATE OPERATOR ## (PROCEDURE=fib, RIGHTARG=integer);

Now, running the same SQL, which resulted in an error last time, should succeed, as shown here:

testdb=# SELECT ##12;
 ?column?
----------
      144
(1 row)

You can drop the operator using the DROP OPERATOR command. You can read more about creating and overloading new operators in the PostgreSQL documentation at http://www.postgresql.org/docs/current/static/sql-createoperator.html and http://www.postgresql.org/docs/current/static/xoper.html.

There are several optional clauses in the operator definition that can optimize the execution time of the operators by providing information about operator behavior. For example, you can specify the commutator and the negator of an operator to help the planner use the operators in index scans. You can read more about these optional clauses at http://www.postgresql.org/docs/current/static/xoper-optimization.html.

Since this article is just an introduction to the additional extensibility capabilities of PostgreSQL, we will just introduce a couple of optimization options; any serious production-quality operator definition should include these optimization clauses, if applicable.

Optimizing operators

The optional clauses tell the PostgreSQL server how the operators behave. These options can result in considerable speedups in the execution of queries that use the operator. However, if you provide these options incorrectly, it can result in a slowdown of the queries. Let's take a look at two optimization clauses called commutator and negator.

COMMUTATOR

This clause defines the commutator of the operator. An operator A is a commutator of operator B if it fulfils the following condition: x A y = y B x. It is important to provide this information for operators that will be used in indexes and joins. As an example, the commutator for > is <, and the commutator of = is = itself.
This helps the optimizer to flip the operator in order to use an index. For example, consider the following query:

SELECT * FROM employee WHERE new_salary > salary;

If the index is defined on the salary column, then PostgreSQL can rewrite the preceding query as shown:

SELECT * FROM employee WHERE salary < new_salary

This allows PostgreSQL to use a range scan on the index column, salary. For a user-defined operator, the optimizer can only do this flip if the commutator of the operator is defined:

CREATE OPERATOR > (LEFTARG=integer, RIGHTARG=integer, PROCEDURE=comp, COMMUTATOR = <)

NEGATOR

The negator clause defines the negator of the operator. For example, <> is the negator of =. Consider the following query:

SELECT * FROM employee WHERE NOT (dept = 10);

Since <> is defined as the negator of =, the optimizer can simplify the preceding query as follows:

SELECT * FROM employee WHERE dept <> 10;

You can even verify that using the EXPLAIN command:

postgres=# EXPLAIN SELECT * FROM employee WHERE NOT dept = 'WATER MGMNT';
QUERY PLAN
---------------------------------------------------------
Foreign Scan on employee (cost=0.00..1.10 rows=1 width=160)
  Filter: ((dept)::text <> 'WATER MGMNT'::text)
  Foreign File: /Users/usamadar/testdata.csv
  Foreign File Size: 197
(4 rows)

Creating index access methods

Let's discuss how to index new data types or user-defined types and operators. In PostgreSQL, an index is more of a framework that can be extended or customized for using different strategies. In order to create new index access methods, we have to create an operator class. Let's take a look at a simple example.

Consider a scenario where you have to store some special data, such as an ID or a social security number, in the database. The number may contain non-numeric characters, so it is defined as a text type:

CREATE TABLE test_ssn (ssn text);
INSERT INTO test_ssn VALUES ('222-11-020878');
INSERT INTO test_ssn VALUES ('111-11-020978');

Let's assume that the correct order for this data is such that it should be sorted on the last six digits and not the ASCII value of the string. The fact that these numbers need a unique sort order presents a challenge when it comes to indexing the data. This is where PostgreSQL operator classes are useful. An operator class allows a user to create a custom indexing strategy. Creating an indexing strategy is about creating your own operators and using them alongside a normal B-tree.

Let's start by writing a function that changes the order of digits in the value and also gets rid of the non-numeric characters in the string, to be able to compare the values better:

CREATE OR REPLACE FUNCTION fix_ssn(text)
RETURNS text AS $$
BEGIN
    RETURN substring($1,8) || replace(substring($1,1,7),'-','');
END;
$$
LANGUAGE 'plpgsql' IMMUTABLE;

Let's run the function and verify that it works:

testdb=# SELECT fix_ssn(ssn) FROM test_ssn;
   fix_ssn
-------------
 02087822211
 02097811111
(2 rows)

Before an index can be used with a new strategy, we may have to define some more functions, depending on the type of index.
In our case, we are planning to use a simple B-tree, so we need a comparison function:

CREATE OR REPLACE FUNCTION ssn_compareTo(text, text)
RETURNS int AS $$
BEGIN
    IF fix_ssn($1) < fix_ssn($2) THEN
        RETURN -1;
    ELSIF fix_ssn($1) > fix_ssn($2) THEN
        RETURN +1;
    ELSE
        RETURN 0;
    END IF;
END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;

It's now time to create our operator class:

CREATE OPERATOR CLASS ssn_ops
FOR TYPE text USING btree
AS
    OPERATOR 1 < ,
    OPERATOR 2 <= ,
    OPERATOR 3 = ,
    OPERATOR 4 >= ,
    OPERATOR 5 > ,
    FUNCTION 1 ssn_compareTo(text, text);

You can also overload the comparison operators if you need to compare the values in a special way, use those functions in the compareTo function, and provide them in the CREATE OPERATOR CLASS command.

We will now create our first index using our brand new operator class:

CREATE INDEX idx_ssn ON test_ssn (ssn ssn_ops);

We can check whether the optimizer is willing to use our special index, as follows:

testdb=# SET enable_seqscan=off;
testdb=# EXPLAIN SELECT * FROM test_ssn WHERE ssn = '02087822211';
QUERY PLAN
------------------------------------------------------------------
Index Only Scan using idx_ssn on test_ssn (cost=0.13..8.14 rows=1 width=32)
  Index Cond: (ssn = '02087822211'::text)
(2 rows)

Therefore, we can confirm that the optimizer is able to use our new index. You can read about index access methods in the PostgreSQL documentation at http://www.postgresql.org/docs/current/static/xindex.html.

Creating user-defined aggregates

User-defined aggregate functions are probably a unique PostgreSQL feature, yet they are quite obscure and perhaps not many people know how to create them. However, once you are able to create this kind of function, you will wonder how you lived for so long without using this feature. This functionality can be incredibly useful, because it allows you to perform custom aggregates inside the database, instead of querying all the data from the client and doing a custom aggregate in your application code—for example, counting the number of hits on your website per minute from a specific country.

PostgreSQL has a very simple process for defining aggregates. Aggregates can be defined using any functions and in any of the languages that are installed in the database. Here are the basic steps to building an aggregate function in PostgreSQL:

Define a start function that will take in the values of a result set; this function can be defined in any PL language you want.
Define an end function that will do something with the final output of the start function. This can be in any PL language you want.
Define the aggregate using the CREATE AGGREGATE command, providing the start and end functions you just created.

Let's steal an example from the PostgreSQL wiki at http://wiki.postgresql.org/wiki/Aggregate_Median. In this example, we will calculate the statistical median of a set of data. For this purpose, we will define start and end aggregate functions. Let's define the end function first, which takes an array as a parameter and calculates the median.
We are assuming here that our start function will pass an array to the following end function:

CREATE FUNCTION _final_median(anyarray) RETURNS float8 AS $$
WITH q AS
(
    SELECT val
    FROM unnest($1) val
    WHERE VAL IS NOT NULL
    ORDER BY 1
),
cnt AS
(
    SELECT COUNT(*) AS c FROM q
)
SELECT AVG(val)::float8
FROM
(
    SELECT val FROM q
    LIMIT 2 - MOD((SELECT c FROM cnt), 2)
    OFFSET GREATEST(CEIL((SELECT c FROM cnt) / 2.0) - 1, 0)
) q2;
$$ LANGUAGE sql IMMUTABLE;

Now, we create the aggregate as shown in the following code:

CREATE AGGREGATE median(anyelement) (
    SFUNC=array_append,
    STYPE=anyarray,
    FINALFUNC=_final_median,
    INITCOND='{}'
);

The array_append start function is already defined in PostgreSQL. This function appends an element to the end of an array. In our example, the start function takes all the column values and creates an intermediate array. This array is passed on to the end function, which calculates the median.

Now, let's create a table and some test data to run our function:

testdb=# CREATE TABLE median_test(t integer);
CREATE TABLE
testdb=# INSERT INTO median_test SELECT generate_series(1,10);
INSERT 0 10

The generate_series function is a set-returning function that generates a series of values, from start to stop, with a step size of one. Now, we are all set to test the function:

testdb=# SELECT median(t) FROM median_test;
 median
--------
    5.5
(1 row)

The mechanics of the preceding example are quite easy to understand. When you run the aggregate, the start function is used to append all the table data from column t into an array using the array_append PostgreSQL built-in. This array is passed on to the final function, _final_median, which calculates the median of the array and returns the result in the same data type as the input parameter. This process is done transparently to the user of the function, who simply has a convenient aggregate function available to them.

You can read more about user-defined aggregates in the PostgreSQL documentation in much more detail at http://www.postgresql.org/docs/current/static/xaggr.html.

Using foreign data wrappers

PostgreSQL foreign data wrappers (FDW) are an implementation of SQL Management of External Data (SQL/MED), which is a standard added to SQL in 2003. FDWs are drivers that allow PostgreSQL database users to read and write data to other external data sources, such as other relational databases, NoSQL data sources, files, JSON, LDAP, and even Twitter. You can query the foreign data sources using SQL and create joins across different systems, or even across different data sources.

There are several different types of data wrappers developed by different developers, and not all of them are production quality. You can see a select list of wrappers on the PostgreSQL wiki at http://wiki.postgresql.org/wiki/Foreign_data_wrappers. Another list of FDWs can be found on PGXN at http://pgxn.org/tag/fdw/.

Let's take a look at a small example of using file_fdw to access data in a CSV file. First, you need to install the file_fdw extension. If you compiled PostgreSQL from the source, you will need to install the file_fdw contrib module that is distributed with the source. You can do this by going into the contrib/file_fdw folder and running make and make install. If you used an installer or a package for your platform, this module might have been installed automatically.
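In shell terms—assuming your current directory is the root of the PostgreSQL source tree—that amounts to:

$ cd contrib/file_fdw
$ make
$ make install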
Once the file_fdw module is installed, you will need to create the extension in the database:

postgres=# CREATE EXTENSION file_fdw;
CREATE EXTENSION

Let's now create a sample CSV file that uses the pipe, |, as a separator and contains some employee data:

$ cat testdata.csv
AARON, ELVIA J|WATER RATE TAKER|WATER MGMNT|81000.00|73862.00
AARON, JEFFERY M|POLICE OFFICER|POLICE|74628.00|74628.00
AARON, KIMBERLEI R|CHIEF CONTRACT EXPEDITER|FLEET MANAGEMNT|77280.00|70174.00

Now, we should create a foreign server, which is pretty much a formality because the file is on the same server. A foreign server normally contains the connection information that a foreign data wrapper uses to access an external data resource. The server needs to be unique within the database:

CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

The next step is to create a foreign table that encapsulates our CSV file:

CREATE FOREIGN TABLE employee (
    emp_name VARCHAR,
    job_title VARCHAR,
    dept VARCHAR,
    salary NUMERIC,
    sal_after_tax NUMERIC
) SERVER file_server
OPTIONS (format 'csv', header 'false', filename '/home/pgbook/14/testdata.csv', delimiter '|', null '');

The CREATE FOREIGN TABLE command creates a foreign table, and the specifications of the file are provided in the OPTIONS section of the preceding code. You can provide the format and indicate whether the first line of the file is a header (header 'false'); in our case, there is no file header. We then provide the name and path of the file and the delimiter used in the file, which in our case is the pipe symbol |. In this example, we also specify that null values should be represented as an empty string.

Let's run a SQL command on our foreign table:

postgres=# SELECT * FROM employee;
-[ RECORD 1 ]-+-------------------------
emp_name      | AARON, ELVIA J
job_title     | WATER RATE TAKER
dept          | WATER MGMNT
salary        | 81000.00
sal_after_tax | 73862.00
-[ RECORD 2 ]-+-------------------------
emp_name      | AARON, JEFFERY M
job_title     | POLICE OFFICER
dept          | POLICE
salary        | 74628.00
sal_after_tax | 74628.00
-[ RECORD 3 ]-+-------------------------
emp_name      | AARON, KIMBERLEI R
job_title     | CHIEF CONTRACT EXPEDITER
dept          | FLEET MANAGEMNT
salary        | 77280.00
sal_after_tax | 70174.00

Great, looks like our data is successfully loaded from the file. You can also use the \d meta command to see the structure of the employee table:

postgres=# \d employee
Foreign table "public.employee"
    Column     |       Type        | Modifiers | FDW Options
---------------+-------------------+-----------+-------------
 emp_name      | character varying |           |
 job_title     | character varying |           |
 dept          | character varying |           |
 salary        | numeric           |           |
 sal_after_tax | numeric           |           |
Server: file_server
FDW Options: (format 'csv', header 'false', filename '/home/pg_book/14/testdata.csv', delimiter '|', "null" '')

You can run explain on the query to understand what is going on when you run a query on the foreign table:

postgres=# EXPLAIN SELECT * FROM employee WHERE salary > 5000;
QUERY PLAN
---------------------------------------------------------
Foreign Scan on employee (cost=0.00..1.10 rows=1 width=160)
  Filter: (salary > 5000::numeric)
  Foreign File: /home/pgbook/14/testdata.csv
  Foreign File Size: 197
(4 rows)

The ALTER FOREIGN TABLE command can be used to modify the options. More information about file_fdw is available at http://www.postgresql.org/docs/current/static/file-fdw.html. You can take a look at the CREATE SERVER and CREATE FOREIGN TABLE commands in the PostgreSQL documentation for more information on the many options available.
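Since the foreign table behaves like an ordinary table in queries, you can also aggregate over it directly. As a quick illustrative sketch against the employee table defined above:

SELECT dept, avg(salary) AS avg_salary
FROM employee
GROUP BY dept
ORDER BY avg_salary DESC;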
Each of the foreign data wrappers comes with its own documentation about how to use the wrapper. Make sure that an extension is stable enough before it is used in production. The PostgreSQL core development group does not support most of the FDW extensions. If you want to create your own data wrappers, you can find the documentation at http://www.postgresql.org/docs/current/static/fdwhandler.html an excellent starting point. The best way to learn, however, is to read the code of other available extensions.

Summary

In this article, we looked at the extensible side of PostgreSQL. This includes the ability to add new operators and new index access methods, and to create your own aggregates. You can access foreign data sources, such as other databases, files, and web services, using PostgreSQL foreign data wrappers. These wrappers are provided as extensions and should be used with caution, as most of them are not officially supported. Even though PostgreSQL is very extensible, you can't plug in a new storage engine or change the parser/planner and executor interfaces. These components are very tightly coupled with each other and are, therefore, highly optimized and mature.

Resources for Article:

Further resources on this subject:

Load balancing MSSQL [Article]
Advanced SOQL Statements [Article]
Running a PostgreSQL Database Server [Article]

MapReduce functions

Packt
03 Mar 2015
11 min read
In this article, by John Zablocki, author of the book Couchbase Essentials, you will become acquainted with MapReduce and see how you'll use it to create secondary indexes for your documents. At its simplest, MapReduce is a programming pattern used to process large amounts of data that is typically distributed across several nodes in parallel. In the NoSQL world, MapReduce implementations may be found on many platforms, from MongoDB to Hadoop, and of course, Couchbase. Even if you're new to the NoSQL landscape, it's quite possible that you've already worked with a form of MapReduce. The inspiration for MapReduce in distributed NoSQL systems was drawn from the functional programming concepts of map and reduce. While purely functional programming languages haven't quite reached mainstream status, languages such as Python, C#, and JavaScript all support map and reduce operations.

(For more resources related to this topic, see here.)

Map functions

Consider the following Python snippet:

numbers = [1, 2, 3, 4, 5]
doubled = map(lambda n: n * 2, numbers)
#doubled == [2, 4, 6, 8, 10]

These two lines of code demonstrate a very simple use of a map() function. In the first line, the numbers variable is created as a list of integers. The second line applies a function to the list to create a new mapped list. In this case, the map() function is supplied as a Python lambda, which is just an inline, unnamed function. The body of the lambda multiplies each number by two. This map() function can be made slightly more complex by doubling only odd numbers, as shown in this code:

numbers = [1, 2, 3, 4, 5]

def double_odd(num):
    if num % 2 == 0:
        return num
    else:
        return num * 2

doubled = map(double_odd, numbers)
#doubled == [2, 2, 6, 4, 10]

Map functions are implemented differently in each language or platform that supports them, but all follow the same pattern. An iterable collection of objects is passed to a map function. Each item of the collection is then iterated over, with the map function being applied to that iteration. The final result is a new collection where each of the original items is transformed by the map.

Reduce functions

Like maps, reduce functions also work by applying a provided function to an iterable data structure. The key difference between the two is that the reduce function works to produce a single value from the input iterable. Using Python's built-in reduce() function, we can see how to produce a sum of integers, as follows:

numbers = [1, 2, 3, 4, 5]
sum = reduce(lambda x, y: x + y, numbers)
#sum == 15

You probably noticed that unlike our map operation, the reduce lambda has two parameters (x and y in this case). The argument passed to x will be the accumulated value of all applications of the function so far, and y will receive the next value to be added to the accumulation. Parenthetically, the order of operations can be seen as ((((1 + 2) + 3) + 4) + 5). Alternatively, the steps are shown in the following list:

x = 1, y = 2
x = 3, y = 3
x = 6, y = 4
x = 10, y = 5
x = 15

As this list demonstrates, the value of x is the cumulative sum of the previous x and y values. As such, reduce functions are sometimes termed accumulate or fold functions. Regardless of their name, reduce functions serve the common purpose of combining pieces of a recursive data structure to produce a single value.

Couchbase MapReduce

Creating an index (or view) in Couchbase requires creating a map function written in JavaScript.
When the view is created for the first time, the map function is applied to each document in the bucket containing the view. When you update a view, only new or modified documents are indexed. This behavior is known as incremental MapReduce. You can think of a basic map function in Couchbase as being similar to a SQL CREATE INDEX statement. Effectively, you are defining a column or a set of columns, to be indexed by the server. Of course, these are not columns, but rather properties of the documents to be indexed. Basic mapping To illustrate the process of creating a view, first imagine that we have a set of JSON documents as shown here: var books=[     { "id": 1, "title": "The Bourne Identity", "author": "Robert Ludlow"     },     { "id": 2, "title": "The Godfather", "author": "Mario Puzzo"     },     { "id": 3, "title": "Wiseguy", "author": "Nicholas Pileggi"     } ]; Each document contains title and author properties. In Couchbase, to query these documents by either title or author, we'd first need to write a map function. Without considering how map functions are written in Couchbase, we're able to understand the process with vanilla JavaScript: books.map(function(book) {   return book.author; }); In the preceding snippet, we're making use of the built-in JavaScript array's map() function. Similar to the Python snippets we saw earlier, JavaScript's map() function takes a function as a parameter and returns a new array with mapped objects. In this case, we'll have an array with each book's author, as follows: ["Robert Ludlow", "Mario Puzzo", "Nicholas Pileggi"] At this point, we have a mapped collection that will be the basis for our author index. However, we haven't provided a means for the index to be able to refer back to its original document. If we were using a relational database, we'd have effectively created an index on the Title column with no way to get back to the row that contained it. With a slight modification to our map function, we are able to provide the key (the id property) of the document as well in our index: books.map(function(book) {   return [book.author, book.id]; }); In this slightly modified version, we're including the ID with the output of each author. In this way, the index has its document's key stored with its title. [["The Bourne Identity", 1], ["The Godfather", 2], ["Wiseguy", 3]] We'll soon see how this structure more closely resembles the values stored in a Couchbase index. Basic reducing Not every Couchbase index requires a reduce component. In fact, we'll see that Couchbase already comes with built-in reduce functions that will provide you with most of the reduce behavior you need. However, before relying on only those functions, it's important to understand why you'd use a reduce function in the first place. Returning to the preceding example of the map, let's imagine we have a few more documents in our set, as follows: var books=[     { "id": 1, "title": "The Bourne Identity", "author": "Robert Ludlow"     },     { "id": 2, "title": "The Bourne Ultimatum", "author": "Robert Ludlow"     },     { "id": 3, "title": "The Godfather", "author": "Mario Puzzo"     },     { "id": 4, "title": "The Bourne Supremacy", "author": "Robert Ludlow"     },     { "id": 5, "title": "The Family", "author": "Mario Puzzo"     },  { "id": 6, "title": "Wiseguy", "author": "Nicholas Pileggi"     } ]; We'll still create our index using the same map function because it provides a way of accessing a book by its author. 
Now imagine that we want to know how many books an author has written, or (assuming we had more data) the average number of pages written by an author. These questions are not possible to answer with a map function alone. Each application of the map function knows nothing about the previous application. In other words, there is no way for you to compare or accumulate information about one author's book to another book by the same author. Fortunately, there is a solution to this problem. As you've probably guessed, it's the use of a reduce function. As a somewhat contrived example, consider this JavaScript: mapped = books.map(function (book) {     return ([book.id, book.author]); });   counts = {} reduced = mapped.reduce(function(prev, cur, idx, arr) { var key = cur[1];     if (! counts[key]) counts[key] = 0;     ++counts[key] }, null); This code doesn't quite accurately reflect the way you would count books with Couchbase but it illustrates the basic idea. You look for each occurrence of a key (author) and increment a counter when it is found. With Couchbase MapReduce, the mapped structure is supplied to the reduce() function in a better format. You won't need to keep track of items in a dictionary. Couchbase views At this point, you should have a general sense of what MapReduce is, where it came from, and how it will affect the creation of a Couchbase Server view. So without further ado, let's see how to write our first Couchbase view. In fact, there were two to choose from. The bucket we'll use is beer-sample. If you didn't install it, don't worry. You can add it by opening the Couchbase Console and navigating to the Settings tab. Here, you'll find the option to install the bucket, as shown next: First, you need to understand the document structures with which you're working. The following JSON object is a beer document (abbreviated for brevity): {  "name": "Sundog",  "type": "beer",  "brewery_id": "new_holland_brewing_company",  "description": "Sundog is an amber ale...",  "style": "American-Style Amber/Red Ale",  "category": "North American Ale" } As you can see, the beer documents have several properties. We're going to create an index to let us query these documents by name. In SQL, the query would look like this: SELECT Id FROM Beers WHERE Name = ? You might be wondering why the SQL example includes only the Id column in its projection. For now, just know that to query a document using a view with Couchbase, the property by which you're querying must be included in an index. To create that index, we'll write a map function. The simplest example of a map function to query beer documents by name is as follows: function(doc) {   emit(doc.name); } This body of the map function has only one line. It calls the built-in Couchbase emit() function. This function is used to signal that a value should be indexed. The output of this map function will be an array of names. The beer-sample bucket includes brewery data as well. These documents look like the following code (abbreviated for brevity): {   "name": "Thomas Hooker Brewing",   "city": "Bloomfield",   "state": "Connecticut",   "website": "http://www.hookerbeer.com/",   "type": "brewery" } If we reexamine our map function, we'll see an obvious problem; both the brewery and beer documents have a name property. When this map function is applied to the documents in the bucket, it will create an index with documents from either the brewery or beer documents. The problem is that Couchbase documents exist in a single container—the bucket. 
There is no namespace for a set of related documents. The solution has typically involved including a type or docType property on each document. The value of this property is used to distinguish one document from another. In the case of the beer-sample database, beer documents have type = "beer" and brewery documents have type = "brewery". Therefore, we are easily able to modify our map function to create an index only on beer documents: function(doc) {   if (doc.type == "beer") {     emit(doc.name);   } } The emit() function actually takes two arguments. The first, as we've seen, emits a value to be indexed. The second argument is an optional value and is used by the reduce function. Imagine that we want to count the number of beer types in a particular category. In SQL, we would write the following query: SELECT Category, COUNT(*) FROM Beers GROUP BY Category To achieve the same functionality with Couchbase Server, we'll need to use both map and reduce functions. First, let's write the map. It will create an index on the category property: function(doc) {   if (doc.type == "beer") {     emit(doc.category, 1);   } } The only real difference between our category index and our name index is that we're including an argument for the value parameter of the emit() function. What we'll do with that value is simply count them. This counting will be done in our reduce function: function(keys, values) {   return values.length; } In this example, the values parameter will be given to the reduce function as a list of all values associated with a particular key. In our case, for each beer category, there will be a list of ones (that is, [1, 1, 1, 1, 1, 1]). Couchbase also provides a built-in _count function. It can be used in place of the entire reduce function in the preceding example. Now that we've seen the basic requirements when creating an actual Couchbase view, it's time to add a view to our bucket. The easiest way to do so is to use the Couchbase Console. Summary In this article, you learned the purpose of secondary indexes in a key/value store. We dug deep into MapReduce, both in terms of its history in functional languages and as a tool for NoSQL and big data systems. Resources for Article: Further resources on this subject: Map Reduce? [article] Introduction to Mapreduce [article] Working with Apps Splunk [article]

Performance Considerations

Packt
03 Mar 2015
13 min read
In this article by Dayong Du, the author of Apache Hive Essentials, we will look at the different performance considerations when using Hive. Although Hive is built to deal with big data, we still cannot ignore the importance of performance. Most of the time, a better Hive query can rely on the smart query optimizer to find the best execution strategy, as well as the default setting best practices from vendor packages. However, as experienced users, we should learn more about the theory and practice of performance tuning in Hive, especially when working in a performance-based project or environment. We will start with the utilities available in Hive for finding potential issues that cause poor performance. Then, we introduce the best practices of performance considerations in the areas of queries and jobs.

(For more resources related to this topic, see here.)

Performance utilities

Hive provides the EXPLAIN and ANALYZE statements that can be used as utilities to check and identify the performance of queries.

The EXPLAIN statement

Hive provides an EXPLAIN command to return a query execution plan without running the query. We can use an EXPLAIN command for queries if we have a doubt or a concern about performance. The EXPLAIN command will also help to see the difference between two or more queries written for the same purpose. The syntax for EXPLAIN is as follows:

EXPLAIN [EXTENDED|DEPENDENCY|AUTHORIZATION] hive_query

The following keywords can be used:

EXTENDED: This provides additional information for the operators in the plan, such as file pathname and abstract syntax tree.
DEPENDENCY: This provides a JSON format output that contains a list of tables and partitions that the query depends on. It is available since Hive 0.10.0.
AUTHORIZATION: This lists all entities needed to be authorized, including input and output, to run the Hive query, and authorization failures, if any. It is available since Hive 0.14.0.

A typical query plan contains the following three sections. We will also have a look at an example later:

Abstract syntax tree (AST): Hive uses a parser generator called ANTLR (see http://www.antlr.org/) to automatically generate a tree of syntax for HQL. We can usually ignore this most of the time.
Stage dependencies: This lists all dependencies and the number of stages used to run the query.
Stage plans: It contains important information, such as operators and sort orders, for running the job.

The following is what a typical query plan looks like. From the following example, we can see that the AST section is not shown since the EXTENDED keyword is not used with EXPLAIN. In the STAGE DEPENDENCIES section, both Stage-0 and Stage-1 are independent root stages. In the STAGE PLANS section, Stage-1 has one map and reduce referred to by Map Operator Tree and Reduce Operator Tree. Inside each Map/Reduce Operator Tree section, all operators corresponding to Hive query keywords as well as expressions and aggregations are listed. The Stage-0 stage does not have map and reduce. It is just a Fetch operation.

jdbc:hive2://> EXPLAIN SELECT sex_age.sex, count(*)
. . . . . . .> FROM employee_partitioned
. . . . . . .> WHERE year=2014 GROUP BY sex_age.sex LIMIT 2;
+-----------------------------------------------------------------------------+
| Explain                                                                     |
+-----------------------------------------------------------------------------+
| STAGE DEPENDENCIES:                                                         |
|   Stage-1 is a root stage                                                   |
|   Stage-0 is a root stage                                                   |
|                                                                             |
| STAGE PLANS:                                                                |
|   Stage: Stage-1                                                            |
|     Map Reduce                                                              |
|       Map Operator Tree:                                                    |
|           TableScan                                                         |
|             alias: employee_partitioned                                     |
|             Statistics: Num rows: 0 Data size: 227 Basic stats: PARTIAL     |
|             Column stats: NONE                                              |
|             Select Operator                                                 |
|               expressions: sex_age (type: struct<sex:string,age:int>)       |
|               outputColumnNames: sex_age                                    |
|               Statistics: Num rows: 0 Data size: 227 Basic stats: PARTIAL   |
|               Column stats: NONE                                            |
|               Group By Operator                                             |
|                 aggregations: count()                                       |
|                 keys: sex_age.sex (type: string)                            |
|                 mode: hash                                                  |
|                 outputColumnNames: _col0, _col1                             |
|                 Statistics: Num rows: 0 Data size: 227 Basic stats: PARTIAL |
|                 Column stats: NONE                                          |
|                 Reduce Output Operator                                      |
|                   key expressions: _col0 (type: string)                     |
|                   sort order: +                                             |
|                   Map-reduce partition columns: _col0 (type: string)        |
|                   Statistics: Num rows: 0 Data size: 227 Basic stats: PARTIAL |
|                   Column stats: NONE                                        |
|                   value expressions: _col1 (type: bigint)                   |
|       Reduce Operator Tree:                                                 |
|         Group By Operator                                                   |
|           aggregations: count(VALUE._col0)                                  |
|           keys: KEY._col0 (type: string)                                    |
|           mode: mergepartial                                                |
|           outputColumnNames: _col0, _col1                                   |
|           Statistics: Num rows: 0 Data size: 0 Basic stats: NONE            |
|           Column stats: NONE                                                |
|           Select Operator                                                   |
|             expressions: _col0 (type: string), _col1 (type: bigint)         |
|             outputColumnNames: _col0, _col1                                 |
|             Statistics: Num rows: 0 Data size: 0 Basic stats: NONE          |
|             Column stats: NONE                                              |
|             Limit                                                           |
|               Number of rows: 2                                             |
|               Statistics: Num rows: 0 Data size: 0 Basic stats: NONE        |
|               Column stats: NONE                                            |
|               File Output Operator                                          |
|                 compressed: false                                           |
|                 Statistics: Num rows: 0 Data size: 0 Basic stats: NONE      |
|                 Column stats: NONE                                          |
|                 table:                                                      |
|                     input format: org.apache.hadoop.mapred.TextInputFormat  |
|                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat |
|                     serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
|                                                                             |
|   Stage: Stage-0                                                            |
|     Fetch Operator                                                          |
|       limit: 2                                                              |
+-----------------------------------------------------------------------------+
53 rows selected (0.26 seconds)

The ANALYZE statement

Hive statistics are a collection of data that describe more details, such as the number of rows, number of files, and raw data size, of the objects in the Hive database. Statistics are the metadata of Hive data. Hive supports statistics at the table, partition, and column level. These statistics serve as an input to the Hive Cost-Based Optimizer (CBO), which is an optimizer that picks the query plan with the lowest cost in terms of the system resources required to complete the query. The statistics are gathered through the ANALYZE statement, since Hive 0.10.0, on tables, partitions, and columns, as given in the following examples:

jdbc:hive2://> ANALYZE TABLE employee COMPUTE STATISTICS;
No rows affected (27.979 seconds)
jdbc:hive2://> ANALYZE TABLE employee_partitioned
. . . . . . .> PARTITION(year=2014, month=12) COMPUTE STATISTICS;
No rows affected (45.054 seconds)
jdbc:hive2://> ANALYZE TABLE employee_id COMPUTE STATISTICS
. . . . . . .> FOR COLUMNS employee_id;
No rows affected (41.074 seconds)

Once the statistics are built, we can check them with the DESCRIBE EXTENDED/FORMATTED statement.
From the table/partition output, we can find the statistics information inside the parameters, such as parameters:{numFiles=1, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1417726247, numRows=4, totalSize=227, rawDataSize=223}. The following is an example:

jdbc:hive2://> DESCRIBE EXTENDED employee_partitioned
. . . . . . .> PARTITION(year=2014, month=12);
jdbc:hive2://> DESCRIBE EXTENDED employee;
…
parameters:{numFiles=1, COLUMN_STATS_ACCURATE=true, transient_lastDdlTime=1417726247, numRows=4, totalSize=227, rawDataSize=223}
jdbc:hive2://> DESCRIBE FORMATTED employee.name;
+--------+---------+---+---+---------+--------------+-----------+-----------+
|col_name|data_type|min|max|num_nulls|distinct_count|avg_col_len|max_col_len|
+--------+---------+---+---+---------+--------------+-----------+-----------+
| name   | string  |   |   | 0       | 5            | 5.6       | 7         |
+--------+---------+---+---+---------+--------------+-----------+-----------+
+---------+----------+-----------------+
|num_trues|num_falses| comment         |
+---------+----------+-----------------+
|         |          |from deserializer|
+---------+----------+-----------------+
3 rows selected (0.116 seconds)

Hive statistics are persisted in the metastore to avoid computing them every time. For newly created tables and/or partitions, statistics are automatically computed by default if we enable the following setting:

jdbc:hive2://> SET hive.stats.autogather=true;

Hive logs

Logs provide useful information to find out how a Hive query/job runs. By checking the Hive logs, we can identify runtime problems and issues that may cause bad performance. There are two types of logs available in Hive: the system log and the job log. The system log contains the Hive running status and issues. It is configured in {HIVE_HOME}/conf/hive-log4j.properties, where the following three lines for the Hive log can be found:

hive.root.logger=WARN,DRFA
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log

To modify the logging level, we can either modify the preceding lines in hive-log4j.properties (applies to all users) or set it from the Hive CLI (only applies to the current user and current session) as follows:

hive --hiveconf hive.root.logger=DEBUG,console

The job log contains Hive query information and is saved in the same place, /tmp/${user.name}, by default, as one file for each Hive user session. We can override it in hive-site.xml with the hive.querylog.location property. If a Hive query generates MapReduce jobs, those logs can also be viewed through the Hadoop JobTracker Web UI.

Job and query optimization

Job and query optimization covers the experience and skills to improve performance in the areas of job-running mode, JVM reuse, job parallel running, and query optimizations in JOIN.

Local mode

Hadoop can run in standalone, pseudo-distributed, and fully distributed mode. Most of the time, we need to configure Hadoop to run in fully distributed mode. When the data to process is small, it is an overhead to start distributed data processing, since the launching time of the fully distributed mode takes more time than the job processing time.
Since Hive 0.7.0, Hive supports automatic conversion of a job to run in local mode with the following settings:

jdbc:hive2://> SET hive.exec.mode.local.auto=true; --default false
jdbc:hive2://> SET hive.exec.mode.local.auto.inputbytes.max=50000000;
jdbc:hive2://> SET hive.exec.mode.local.auto.input.files.max=5; --default 4

A job must satisfy the following conditions to run in the local mode:

The total input size of the job is lower than hive.exec.mode.local.auto.inputbytes.max
The total number of map tasks is less than hive.exec.mode.local.auto.input.files.max
The total number of reduce tasks required is 1 or 0

JVM reuse

By default, Hadoop launches a new JVM for each map or reduce task and runs the map or reduce tasks in parallel. When the map or reduce task is a lightweight job running only for a few seconds, the JVM startup process could be a significant overhead. The MapReduce framework (version 1 only, not Yarn) has an option to reuse JVMs by sharing a JVM to run mappers/reducers serially instead of in parallel. JVM reuse applies to map or reduce tasks in the same job. Tasks from different jobs will always run in separate JVMs. To enable reuse, we can set the maximum number of tasks for a single job for JVM reuse using the mapred.job.reuse.jvm.num.tasks property. Its default value is 1:

jdbc:hive2://> SET mapred.job.reuse.jvm.num.tasks=5;

We can also set the value to –1 to indicate that all the tasks for a job will run in the same JVM.

Parallel execution

Hive queries are commonly translated into a number of stages that are executed in the default sequence. These stages are not always dependent on each other. Instead, they can run in parallel to save the overall job running time. We can enable this feature with the following settings:

jdbc:hive2://> SET hive.exec.parallel=true; -- default false
jdbc:hive2://> SET hive.exec.parallel.thread.number=16;
-- default 8, it defines the max number for running in parallel

Parallel execution will increase cluster utilization. If the utilization of a cluster is already very high, parallel execution will not help much in terms of overall performance.

Join optimization

Here, we'll briefly review the key settings for join improvement.

Common join

The common join is also called the reduce side join. It is a basic join in Hive and works most of the time. For common joins, we need to make sure the big table is on the right-most side or specified by a hint, as follows:

/*+ STREAMTABLE(stream_table_name) */

Map join

Map join is used when one of the join tables is small enough to fit in memory, so it is very fast but limited. Since Hive 0.7.0, Hive can convert to map join automatically with the following settings:

jdbc:hive2://> SET hive.auto.convert.join=true; --default false
jdbc:hive2://> SET hive.mapjoin.smalltable.filesize=600000000; --default 25M
jdbc:hive2://> SET hive.auto.convert.join.noconditionaltask=true;
--default false. Set to true so that map join hint is not needed
jdbc:hive2://> SET hive.auto.convert.join.noconditionaltask.size=10000000;
--The default value controls the size of table to fit in memory

Once autoconvert is enabled, Hive will automatically check whether the smaller table's file size is bigger than the value specified by hive.mapjoin.smalltable.filesize; if it is, the join is kept as a common join. If the file size is smaller than this threshold, Hive will try to convert the common join into a map join. Once autoconvert join is enabled, there is no need to provide map join hints in the query.
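On older versions, or when autoconversion is disabled, the hint form is still available. As a sketch—the table and column names here are hypothetical, not taken from the original text—a hinted map join looks like this:

SELECT /*+ MAPJOIN(d) */ e.name, d.dept_name
FROM employee e
JOIN dept d ON (e.dept_id = d.dept_id);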
Bucket map join

Bucket map join is a special type of map join applied on bucket tables. To enable bucket map join, we need to enable the following settings:

jdbc:hive2://> SET hive.auto.convert.join=true; --default false
jdbc:hive2://> SET hive.optimize.bucketmapjoin=true; --default false

In a bucket map join, all the join tables must be bucket tables and the join must be on the bucket columns. In addition, the number of buckets in the bigger tables must be a multiple of the number of buckets in the smaller tables.

Sort merge bucket (SMB) join

SMB is the join performed on bucket tables that have the same sorted, bucket, and join condition columns. It reads data from both bucket tables and performs common joins (map and reduce triggered) on the bucket tables. We need to enable the following properties to use SMB:

jdbc:hive2://> SET hive.input.format=
. . . . . . .> org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
jdbc:hive2://> SET hive.auto.convert.sortmerge.join=true;
jdbc:hive2://> SET hive.optimize.bucketmapjoin=true;
jdbc:hive2://> SET hive.optimize.bucketmapjoin.sortedmerge=true;
jdbc:hive2://> SET hive.auto.convert.sortmerge.join.noconditionaltask=true;

Sort merge bucket map (SMBM) join

SMBM join is a special bucket join that triggers a map-side join only. It can avoid caching all rows in memory as map join does. To perform SMBM joins, the join tables must have the same bucket, sort, and join condition columns. To enable such joins, we need to enable the following settings:

jdbc:hive2://> SET hive.auto.convert.join=true;
jdbc:hive2://> SET hive.auto.convert.sortmerge.join=true;
jdbc:hive2://> SET hive.optimize.bucketmapjoin=true;
jdbc:hive2://> SET hive.optimize.bucketmapjoin.sortedmerge=true;
jdbc:hive2://> SET hive.auto.convert.sortmerge.join.noconditionaltask=true;
jdbc:hive2://> SET hive.auto.convert.sortmerge.join.bigtable.selection.policy=
org.apache.hadoop.hive.ql.optimizer.TableSizeBasedBigTableSelectorForAutoSMJ;

Skew join

When working with data that has a highly uneven distribution, data skew could happen in such a way that a small number of compute nodes must handle the bulk of the computation. The following settings inform Hive to optimize properly if data skew happens:

jdbc:hive2://> SET hive.optimize.skewjoin=true;
--If there is data skew in join, set it to true. Default is false.
jdbc:hive2://> SET hive.skewjoin.key=100000;
--This is the default value. If the number of keys is bigger than
--this, the new keys will be sent to the other unused reducers.

Skew data could happen on the GROUP BY data too. To optimize that case, we need the following setting to enable skew data optimization of the GROUP BY result:

SET hive.groupby.skewindata=true;

Once configured, Hive will first trigger an additional MapReduce job whose map output will be randomly distributed to the reducers to avoid data skew.

For more information about Hive join optimization, please refer to the Apache Hive wiki available at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization and https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization.

Summary

In this article, we first covered how to identify performance bottlenecks using the EXPLAIN and ANALYZE statements. Then, we discussed job and query optimization in Hive.

Resources for Article:

Further resources on this subject:

Apache Maven and m2eclipse [Article]
Apache Karaf – Provisioning and Clusters [Article]
Introduction to Apache ZooKeeper [Article]

Starting Small and Growing in a Modular Way

Packt
02 Mar 2015
27 min read
This article, written by Carlo Russo, author of the book KnockoutJS Blueprints, describes how RequireJS gives us a simplified format to require many dependencies and to avoid parameter mismatches by using the CommonJS require format; for example, another way to write the previous code is:

define(function(require) {
  var $ = require("jquery"),
      ko = require("knockout"),
      viewModel = {};
  $(function() {
    ko.applyBindings(viewModel);
  });
});

(For more resources related to this topic, see here.)

In this way, we skip the dependencies definition, and RequireJS will add every require('xxx') call found in the function to the dependency list. The second way is better because it is cleaner and you cannot mismatch dependency names with named function arguments. For example, imagine you have a long list of dependencies; you add one or remove one, and you forget to remove the corresponding function parameter. You now have a hard-to-find bug. And, in case you think that the r.js optimizer behaves differently, I just want to assure you that it's not so; you can use both ways without any concern regarding optimization.

Just to remind you, you cannot use this form if you want to load scripts dynamically or depending on a variable value; for example, this code will not work:

var mod = require(someCondition ? "a" : "b");
if (someCondition) {
  var a = require('a');
} else {
  var a = require('a1');
}

You can learn more about this compatibility problem at this URL: http://www.requirejs.org/docs/whyamd.html#commonjscompat. You can see more about this sugar syntax at this URL: http://www.requirejs.org/docs/whyamd.html#sugar.

Now that you know the basic way to use RequireJS, let's look at the next concept.

Component binding handler
The component binding handler is one of the new features introduced in Version 3.2 of KnockoutJS. Inside the documentation of KnockoutJS, we find the following explanation:

Components are a powerful, clean way of organizing your UI code into self-contained, reusable chunks. They can represent individual controls/widgets, or entire sections of your application.

The main idea behind their inclusion was to create full-featured, reusable components, with one or more points of extensibility. A component is a combination of HTML and JavaScript. There are cases where you can use just one of them, but normally you'll use both. You can get a first simple example about this here: http://knockoutjs.com/documentation/component-binding.html.

The best way to create self-contained components is with the use of an AMD module loader, such as RequireJS; put the View Model and the template of the component inside two different files, and then you can use it from your code really easily.

Creating the bare bones of a custom module
Writing a custom module of KnockoutJS with RequireJS is a 4-step process:

Creating the JavaScript file for the View Model.
Creating the HTML file for the template of the View.
Registering the component with KnockoutJS.
Using it inside another View.

We are going to build the bare bones of the Search Form component, just to move forward with our project; in any case, this is the starting code we should use for each component that we write from scratch. Let's cover all of these steps.

Creating the JavaScript file for the View Model
We start with the View Model of this component.
Create a new empty file with the name BookingOnline/app/components/search.js and put this code inside it:

define(function(require) {
  var ko = require("knockout"),
      template = require("text!./search.html");
  function Search() {
  }
  return {
    viewModel: Search,
    template: template
  };
});

Here, we are creating a constructor called Search that we will fill later. We are also using the text plugin for RequireJS to get the template search.html from the current folder, into the argument template. Then, we return an object with the constructor and the template, using the format KnockoutJS needs to use them as a component.

Creating the HTML file for the template of the View
In the View Model we required a View called search.html in the same folder. At the moment, we don't have any code to put inside the template of the View, because there is no boilerplate code needed; but we must create the file, otherwise RequireJS will break with an error. Create a new file called BookingOnline/app/components/search.html with the following content:

<div>Hello Search</div>

Registering the component with KnockoutJS
When you use components, there are two different ways to give KnockoutJS a way to find your component:

Using the function ko.components.register
Implementing a custom component loader

The first way is the easiest one: using the default component loader of KnockoutJS. To use it with our component you should just put the following row inside the BookingOnline/app/index.js file, just before the row $(function () {:

ko.components.register("search", {require: "components/search"});

Here, we are registering a module called search, and we are telling KnockoutJS that it will have to find all the information it needs using an AMD require for the path components/search (so it will load the file BookingOnline/app/components/search.js). You can find more information and a really good example about a custom component loader at: http://knockoutjs.com/documentation/component-loaders.html#example-1-a-component-loader-that-sets-up-naming-conventions.

Using it inside another View
Now, we can simply use the new component inside our View; put the following code inside our Index View (BookingOnline/index.html), before the script tag:

   <div data-bind="component: 'search'"></div>

Here, we are using the component binding handler to use the component; another commonly used way is with custom elements. We can replace the previous row with the following one:

   <search></search>

KnockoutJS will use our search component, but with a WebComponent-like code.

If you want to support IE6-8 you should register the WebComponents you are going to use before the HTML parser can find them. Normally, this job is done inside the ko.components.register function call, but, if you are putting your script tag at the end of the body as we have done until now, your WebComponent will be discarded. Follow the guidelines mentioned here when you want to support IE6-8: http://knockoutjs.com/documentation/component-custom-elements.html#note-custom-elements-and-internet-explorer-6-to-8

Now, you can open your web application and you should see the text, Hello Search. We put that markup only to check whether everything was working here, so you can remove it now.

Writing the Search Form component
Now that we know how to create a component, and we have put in the base of our Search Form component, we can look for the requirements of this component. A designer will review the View later, so we need to keep it simple to avoid the need for multiple changes later.
From our analysis, we find that our competitors use these components:

Autocomplete field for the city
Calendar fields for check-in and check-out
Selection field for the number of rooms, number of adults and number of children, and age of children

This is a wireframe of what we should build (we got inspired by Trivago):

We could do everything by ourselves, but the easiest way to realize this component is with the help of a few external plugins; we are already using jQuery, so the most obvious idea is to use jQuery UI to get the Autocomplete Widget, the Date Picker Widget, and maybe even the Button Widget.

Adding the AMD version of jQuery UI to the project
Let's start by downloading the current version of jQuery UI (1.11.1); the best thing about this version is that it is one of the first versions that supports AMD natively. After reading the documentation of jQuery UI for the AMD (URL: http://learn.jquery.com/jquery-ui/environments/amd/) you may think that you can get the AMD version using the download link from the home page. However, if you try that you will get just a package with only the concatenated source; for this reason, if you want the AMD source files, you will have to go directly to GitHub or use Bower.

Download the package from https://github.com/jquery/jquery-ui/archive/1.11.1.zip and extract it.

Every time you use an external library, remember to check the compatibility support. In jQuery UI 1.11.1, as you can see in the release notes, they removed the support for IE7; so we must decide whether we want to support IE6 and 7 by adding specific workarounds inside our code, or we want to remove the support for those two browsers.

For our project, we need to put the following folders into these destinations:

jquery-ui-1.11.1/ui -> BookingOnline/app/ui
jquery-ui-1.11.1/theme/base -> BookingOnline/css/ui

We are going to apply the widgets by JavaScript, so the only remaining step to integrate jQuery UI is the insertion of the style sheets inside our application. We do this by adding the following rows to the top of our custom style sheet file (BookingOnline/css/styles.css):

@import url("ui/core.css");
@import url("ui/menu.css");
@import url("ui/autocomplete.css");
@import url("ui/button.css");
@import url("ui/datepicker.css");
@import url("ui/theme.css");

Now, we are ready to add the widgets to our web application. You can find more information about jQuery UI and AMD at: http://learn.jquery.com/jquery-ui/environments/amd/

Making the skeleton from the wireframe
We want to give the user a really nice user experience, but as the first step we can use the wireframe we put before to create a skeleton of the Search Form. Replace the entire content with a form inside the file BookingOnline/components/search.html:

<form data-bind="submit: execute"></form>

Then, we add the blocks inside the form, step by step, to realize the entire wireframe:

<div>
  <input type="text" placeholder="Enter a destination" />
  <label> Check In: <input type="text" /> </label>
  <label> Check Out: <input type="text" /> </label>
  <input type="submit" data-bind="enable: isValid" />
</div>

Here, we built the first row of the wireframe; we will bind data to each field later. We bound the execute function to the submit event (submit: execute), and a validity check to the button (enable: isValid); for now we will create them empty.
Update the View Model (search.js) by adding this code inside the constructor:

this.isValid = ko.computed(function() {
  return true;
}, this);

And add this function to the Search prototype:

Search.prototype.execute = function() { };

This is because the validity of the form will depend on the status of the destination field and of the check-in and check-out dates; we will update them later, in the next paragraphs. Now, we can continue with the wireframe, with the second block. Here, we should have a field to select the number of rooms, and a block for each room. Add the following markup inside the form, after the previous one, for the second row of the View (search.html):

<div>
  <fieldset>
    <legend>Rooms</legend>
    <label>
      Number of Room
      <select data-bind="options: rangeOfRooms,
                          value: numberOfRooms">
      </select>
    </label>
    <!-- ko foreach: rooms -->
      <fieldset>
        <legend>
          Room <span data-bind="text: roomNumber"></span>
        </legend>
      </fieldset>
    <!-- /ko -->
  </fieldset>
</div>

In this markup we are asking the user to choose between the values found inside the array rangeOfRooms, to save the selection inside a property called numberOfRooms, and to show a frame for each room of the array rooms with the room number, roomNumber.

When developing, if we want to check the status of the system, the easiest way to do it is with a simple item inside a View bound to the JSON of a View Model. Put the following code inside the View (search.html):

<pre data-bind="text: ko.toJSON($data, null, 2)"></pre>

With this code, you can check the status of the system after any change directly in the printed JSON. You can find more information about ko.toJSON at http://knockoutjs.com/documentation/json-data.html

Update the View Model (search.js) by adding this code inside the constructor:

this.rooms = ko.observableArray([]);
this.numberOfRooms = ko.computed({
  read: function() {
    return this.rooms().length;
  },
  write: function(value) {
    var previousValue = this.rooms().length;
    if (value > previousValue) {
      for (var i = previousValue; i < value; i++) {
        this.rooms.push(new Room(i + 1));
      }
    } else {
      this.rooms().splice(value);
      this.rooms.valueHasMutated();
    }
  },
  owner: this
});

Here, we are creating the array of rooms, and a property to update the array properly. If the new value is bigger than the previous one, it adds the missing items to the array using the Room constructor; otherwise, it removes the excess items from the array. To get this code working we have to create a module, Room, and we have to require it here; update the require block in this way:

   var ko = require("knockout"),
       template = require("text!./search.html"),
       Room = require("room");

Also, add this property to the Search prototype:

Search.prototype.rangeOfRooms = ko.utils.range(1, 10);

Here, we are asking KnockoutJS for an array with the values from the given range.

ko.utils.range is a useful method to get an array of integers. Internally, it simply makes an array from the first parameter to the second one; but if you use it inside a computed field and the parameters are observable, it re-evaluates and updates the returned array.

Now, we have to create the View Model of the Room module.
Create a new file BookingOnline/app/room.js with the following starting code:

define(function(require) {
  var ko = require("knockout");
  function Room(roomNumber) {
    this.roomNumber = roomNumber;
  }
  return Room;
});

Now, our web application should appear like so:

As you can see, we now have a fieldset for each room, so we can work on the template of the single room. Here, you can also see in action the previous tip about the pre field with the JSON data.

With KnockoutJS 3.2 it is harder to decide when it's better to use a normal template or a component. The rule of thumb is to identify the degree of encapsulation you want to manage: use a component when you want a self-enclosed black box, or a template if you want to manage the View Model directly.

What we want to show for each room is:

Room number
Number of adults
Number of children
Age of each child

We can update the Room View Model (room.js) by adding this code into the constructor:

this.numberOfAdults = ko.observable(2);
this.ageOfChildren = ko.observableArray([]);
this.numberOfChildren = ko.computed({
  read: function() {
    return this.ageOfChildren().length;
  },
  write: function(value) {
    var previousValue = this.ageOfChildren().length;
    if (value > previousValue) {
      for (var i = previousValue; i < value; i++) {
        this.ageOfChildren.push(ko.observable(0));
      }
    } else {
      this.ageOfChildren().splice(value);
      this.ageOfChildren.valueHasMutated();
    }
  },
  owner: this
});
this.hasChildren = ko.computed(function() {
  return this.numberOfChildren() > 0;
}, this);

We used the same logic we used before for mapping the room count to the count property, this time to maintain an array with the age of each child. We also created a hasChildren property to know whether we have to show the box for the age of children inside the View.

As we did before for the Search View Model, we have to add a few properties to the Room prototype:

Room.prototype.rangeOfAdults = ko.utils.range(1, 10);
Room.prototype.rangeOfChildren = ko.utils.range(0, 10);
Room.prototype.rangeOfAge = ko.utils.range(0, 17);

These are the ranges we show inside the relative select. Now, as the last step, we have to put the template for the room in search.html; add this code inside the fieldset tag, after the legend tag (as you can see here, with the external markup):

     <fieldset>
       <legend>
         Room <span data-bind="text: roomNumber"></span>
       </legend>
       <label> Number of adults
         <select data-bind="options: rangeOfAdults,
                            value: numberOfAdults"></select>
       </label>
       <label> Number of children
         <select data-bind="options: rangeOfChildren,
                            value: numberOfChildren"></select>
       </label>
       <fieldset data-bind="visible: hasChildren">
         <legend>Age of children</legend>
         <!-- ko foreach: ageOfChildren -->
           <select data-bind="options: $parent.rangeOfAge,
                              value: $rawData"></select>
         <!-- /ko -->
       </fieldset>
     </fieldset>
     <!-- /ko -->

Here, we are using the properties we have just defined. We are using rangeOfAge from $parent because inside foreach we changed context, and the property, rangeOfAge, is inside the Room context.

Why did I use $rawData to bind the value of the age of the children instead of $data? The reason is that ageOfChildren is an array of observables without any container.
If you use $data, KnockoutJS will unwrap the observable, making it one-way bound; but if you use $rawData, you will skip the unwrapping and get the two-way data binding we need here. In fact, if we use the one-way data binding our model won't get updated at all. If you really don't like that the fieldset for children goes to the next row when it appears, you can change the fieldset by adding a class, like this: <fieldset class="inline" data-bind="visible: hasChildren"> Now, your application should appear as follows: Now that we have a really nice starting form, we can update the three main fields to use the jQuery UI Widgets. Realizing an Autocomplete field for the destination As soon as we start to write the code for this field we face the first problem: how can we get the data from the backend? Our team told us that we don't have to care about the backend, so we speak to the backend team to know how to get the data. After ten minutes we get three files with the code for all the calls to the backend; all we have to do is to download these files (we already got them with the Starting Package, to avoid another download), and use the function getDestinationByTerm inside the module, services/rest. Before writing the code for the field let's think about which behavior we want for it: When you put three or more letters, it will ask the server for the list of items Each recurrence of the text inside the field into each item should be bold When you select an item, a new button should appear to clear the selection If the current selected item and the text inside the field are different when the focus exits from the field, it should be cleared The data should be taken using the function, getDestinationByTerm, inside the module, services/rest The documentation of KnockoutJS also explains how to create custom binding handlers in the context of RequireJS. The what and why about binding handlers All the bindings we use inside our View are based on the KnockoutJS default binding handler. The idea behind a binding handler is that you should put all the code to manage the DOM inside a component different from the View Model. Other than this, the binding handler should be realized with reusability in mind, so it's always better not to hard-code application logic inside. The KnockoutJS documentation about standard binding is already really good, and you can find many explanations about its inner working in the Appendix, Binding Handler. When you make a custom binding handler it is important to remember that: it is your job to clean after; you should register event handling inside the init function; and you should use the update function to update the DOM depending on the change of the observables. 
This is the standard boilerplate code when you use RequireJS:

define(function(require) {
  var ko = require("knockout"),
      $ = require("jquery");
  ko.bindingHandlers.customBindingHandler = {
    init: function(element, valueAccessor,
                   allBindingsAccessor, data, context) {
      /* Code for the initialization… */
      ko.utils.domNodeDisposal.addDisposeCallback(element,
        function () { /* Cleaning code … */ });
    },
    update: function (element, valueAccessor) {
      /* Code for the update of the DOM… */
    }
  };
});

And inside the View Model module you should require this module, as follows:

require('binding-handlers/customBindingHandler');

ko.utils.domNodeDisposal is a list of callbacks to be executed when the element is removed from the DOM; it's necessary because it's where you have to put the code to destroy the widgets, or remove the event handlers.

Binding handler for the jQuery Autocomplete widget
So, now we can write our binding handler. We will define a binding handler named autoComplete, which takes the observable in which to put the found value. We will also define two custom bindings, without any logic, to work as placeholders for the parameters we will send to the main binding handler. Our binding handler should:

Get the value for the autoCompleteOptions and autoCompleteEvents optional data bindings.
Apply the Autocomplete Widget to the item using the options of the previous step.
Register all the event listeners.
Register the disposal of the Widget.

We also should ensure that if the observable gets cleared, the input field gets cleared too. So, this is the code of the binding handler to put inside BookingOnline/app/binding-handlers/autocomplete.js (I put comments between the code to make it easier to understand):

define(function(require) {
  var ko = require("knockout"),
      $ = require("jquery"),
      autocomplete = require("ui/autocomplete");
  ko.bindingHandlers.autoComplete = {
    init: function(element, valueAccessor, allBindingsAccessor, data, context) {

Here, we are giving the name autoComplete to the new binding handler, and we are also loading the Autocomplete Widget of jQuery UI:

      var value = ko.utils.unwrapObservable(valueAccessor()),
          allBindings = ko.utils.unwrapObservable(allBindingsAccessor()),
          options = allBindings.autoCompleteOptions || {},
          events = allBindings.autoCompleteEvents || {},
          $element = $(element);

Then, we take the data from the binding for the main parameter and for the optional binding handlers; we also put the current element into a jQuery container:

      autocomplete(options, $element);
      if (options._renderItem) {
        var widget = $element.autocomplete("instance");
        widget._renderItem = options._renderItem;
      }
      for (var event in events) {
        ko.utils.registerEventHandler(element, event, events[event]);
      }

Now we can apply the Autocomplete Widget to the field. If you are questioning why we used ko.utils.registerEventHandler here, the answer is: to show you this function. If you look at the source, you can see that under the hood it uses $.bind if jQuery is registered; so in our case we could simply use $.bind or $.on without any problem. But I wanted to show you this function because sometimes you use KnockoutJS without jQuery, and you can use it to support event handling in every supported browser.
The source code of the function _renderItem is (looking at the file ui/autocomplete.js):

_renderItem: function( ul, item ) {
  return $( "<li>" ).text( item.label ).appendTo( ul );
},

As you can see, for security reasons, it uses the function text to avoid any possible code injection. It is important to know that you should do data validation each time you get data from an external source and put it in the page. In this case, the source of data is already secured (because we manage it), so we override the normal behavior to also show the HTML tag for the bold part of the text.

In the last three rows we put a cycle to check for events and we register them. The standard way to register for events is with the event binding handler. The only reason you should use a custom helper is to give the developer of the View a way to register events more than once.

Then, we add the disposal code to the init function:

      // handle disposal
      ko.utils.domNodeDisposal.addDisposeCallback(element, function() {
        $element.autocomplete("destroy");
      });

Here, we use the destroy function of the widget. It's really important to clean up after the use of any jQuery UI Widget or you'll create a really bad memory leak; it's not a big problem with simple applications, but it will be a really big problem if you build an SPA.

Now, we can add the update function:

    },
    update: function(element, valueAccessor) {
      var value = valueAccessor(),
          $element = $(element),
          data = value();
      if (!data)
        $element.val("");
    }
  };
});

Here, we read the value of the observable, and clean the field if the observable is empty.

The update function is executed as a computed observable, so we must be sure that we subscribe to the observables required inside. So, pay attention if you put conditional code before the subscription, because your update function could end up not being called anymore.

Now that the binding is ready, we should require it inside our form; update the View search.html by modifying the following row:

   <input type="text" placeholder="Enter a destination" />

Into this:

   <input type="text" placeholder="Enter a destination"
          data-bind="autoComplete: destination,
                     autoCompleteEvents: destination.events,
                     autoCompleteOptions: destination.options" />

If you try the application you will not see any error; the reason is that KnockoutJS ignores any data binding not registered inside the ko.bindingHandlers object, and we didn't require the binding handler autocomplete module. So, the last step to get everything working is the update of the View Model of the component; add these rows at the top of search.js, with the other require(…) rows:

     Room = require("room"),
     rest = require("services/rest");
require("binding-handlers/autocomplete");

We need a reference to our new binding handler, and a reference to the rest object to use it as a source of data.
Now, we must declare the properties we used inside our data binding; add all these properties to the constructor as shown in the following code:

this.destination = ko.observable();
this.destination.options = {
  minLength: 3,
  source: rest.getDestinationByTerm,
  select: function(event, data) {
    this.destination(data.item);
  }.bind(this),
  _renderItem: function(ul, item) {
    return $("<li>").append(item.label).appendTo(ul);
  }
};
this.destination.events = {
  blur: function(event) {
    if (this.destination() && (event.currentTarget.value !==
                               this.destination().value)) {
      this.destination(undefined);
    }
  }.bind(this)
};

Here, we are defining the container (destination) for the data selected inside the field, an object (destination.options) with any property we want to pass to the Autocomplete Widget (you can check all the documentation at: http://api.jqueryui.com/autocomplete/), and an object (destination.events) with any event we want to apply to the field. Here, we are clearing the field if the text inside the field and the content of the saved data (inside destination) are different.

Have you noticed .bind(this) in the previous code? You can check by yourself that the value of this inside these functions is the input field. As you can see, in our code we put references to the destination property of this, so we have to update the context to be the object itself; the easiest way to do this is with a simple call to the bind function.

Summary
In this article, we have seen some functionalities of KnockoutJS (core). The application we realized was simple enough, but we used it to learn better how to use components and custom binding handlers. If you think we put too much code for such a small project, try to think about the differences you have seen between the first and the second component: the more component and binding handler code you write, the less you will have to write in the future. The most important point about components and custom binding handlers is that you have to build them looking at future reuse; the more good code you write, the better it will be for you later. The core point of this article was AMD and RequireJS; how to use them inside a KnockoutJS project, and why you should do it.

Resources for Article:
Further resources on this subject:
Components [article]
Web Application Testing [article]
Top features of KnockoutJS [article]

Entity Framework DB First – Inheritance Relationships between Entities

Packt
02 Mar 2015
19 min read
This article is written by Rahul Rajat Singh, the author of Mastering Entity Framework. So far, we have seen how we can use various approaches of Entity Framework, how we can manage database table relationships, and how to perform model validations using Entity Framework. In this article, we will see how we can implement the inheritance relationship between the entities. We will see how we can change the generated conceptual model to implement the inheritance relationship, and how it will benefit us in using the entities in an object-oriented manner and the database tables in a relational manner. (For more resources related to this topic, see here.) Domain modeling using inheritance in Entity Framework One of the major challenges while using a relational database is to manage the domain logic in an object-oriented manner when the database itself is implemented in a relational manner. ORMs like Entity Framework provide the strongly typed objects, that is, entities for the relational tables. However, it might be possible that the entities generated for the database tables are logically related to each other, and they can be better modeled using inheritance relationships rather than having independent entities. Entity Framework lets us create inheritance relationships between the entities, so that we can work with the entities in an object-oriented manner, and internally, the data will get persisted in the respective tables. Entity Framework provides us three ways of object relational domain modeling using the inheritance relationship: The Table per Type (TPT) inheritance The Table per Class Hierarchy (TPH) inheritance The Table per Concrete Class (TPC) inheritance Let's now take a look at the scenarios where the generated entities are not logically related, and how we can use these inheritance relationships to create a better domain model by implementing inheritance relationships between entities using the Entity Framework Database First approach. The Table per Type inheritance The Table per Type (TPT) inheritance is useful when our database has tables that are related to each other using a one-to-one relationship. This relation is being maintained in the database by a shared primary key. To illustrate this, let's take a look at an example scenario. Let's assume a scenario where an organization maintains a database of all the people who work in a department. Some of them are employees getting a fixed salary, and some of them are vendors who are hired at an hourly rate. This is modeled in the database by having all the common data in a table called Person, and there are separate tables for the data that is specific to the employees and vendors. Let's visualize this scenario by looking at the database schema: The database schema showing the TPT inheritance database schema The ID column for the People table can be an auto-increment identity column, but it should not be an auto-increment identity column for the Employee and Vendors tables. In the preceding figure, the People table contains all the data common to both type of worker. The Employee table contains the data specific to the employees and the Vendors table contains the data specific to the vendors. These tables have a shared primary key and thus, there is a one-to-one relationship between the tables. To implement the TPT inheritance, we need to perform the following steps in our application: Generate the default Entity Data Model. Delete the default relationships. Add the inheritance relationship between the entities. 
Use the entities via the DBContext object. Generating the default Entity Data Model Let's add a new ADO.NET Entity Data Model to our application, and generate the conceptual Entity Model for these tables. The default generated Entity Model will look like this: The generated Entity Data Model where the TPT inheritance could be used Looking at the preceding conceptual model, we can see that Entity Framework is able to figure out the one-to-one relationship between the tables and creates the entities with the same relationship. However, if we take a look at the generated entities from our application domain perspective, it is fairly evident that these entities can be better managed if they have an inheritance relationship between them. So, let's see how we can modify the generated conceptual model to implement the inheritance relationship, and Entity Framework will take care of updating the data in the respective tables. Deleting default relationships The first thing we need to do to create the inheritance relationship is to delete the existing relationship from the Entity Model. This can be done by right-clicking on the relationship and selecting Delete from Model as follows: Deleting an existing relationship from the Entity Model Adding inheritance relationships between entities Once the relationships are deleted, we can add the new inheritance relationships in our Entity Model as follows: Adding inheritance relationships in the Entity Model When we add an inheritance relationship, the Visual Entity Designer will ask for the base class and derived class as follows: Selecting the base class and derived class participating in the inheritance relationship Once the inheritance relationship is created, the Entity Model will look like this: Inheritance relationship in the Entity Model After creating the inheritance relationship, we will get a compile error that the ID property is defined in all the entities. To resolve this problem, we need to delete the ID column from the derived classes. This will still keep the ID column that maps the derived classes as it is. So, from the application perspective, the ID column is defined in the base class but from the mapping perspective, it is mapped in both the base class and derived class, so that the data will get inserted into tables mapped in both the base and derived entities. With this inheritance relationship in place, the entities can be used in an object-oriented manner, and Entity Framework will take care of updating the respective tables for each entity. Using the entities via the DBContext object As we know, DbContext is the primary class that should be used to perform various operations on entities. Let's try to use our SampleDbContext class to create an Employee and a Vendor using this Entity Model and see how the data gets updated in the database: using (SampleDbEntities db = new SampleDbEntities()) { Employee employee = new Employee(); employee.FirstName = "Employee 1"; employee.LastName = "Employee 1"; employee.PhoneNumber = "1234567"; employee.Salary = 50000; employee.EmailID = "employee1@test.com"; Vendor vendor = new Vendor(); vendor.FirstName = "vendor 1"; vendor.LastName = "vendor 1"; vendor.PhoneNumber = "1234567"; vendor.HourlyRate = 100; vendor.EmailID = "vendor1@test.com"; db.Workers.Add(employee); db.Workers.Add(vendor); db.SaveChanges(); } In the preceding code, what we are doing is creating an object of the Employee and Vendor type, and then adding them to People using the DbContext object. 
What Entity Framework will do internally is that it will look at the mappings of the base entity and the derived entities, and then push the respective data into the respective tables. So, if we take a look at the data inserted in the database, it will look like the following: A database snapshot of the inserted data It is clearly visible from the preceding database snapshot that Entity Framework looks at our inheritance relationship and pushes the data into the Person, Employee, and Vendor tables. The Table per Class Hierarchy inheritance The Table per Class Hierarchy (TPH) inheritance is modeled by having a single database table for all the entity classes in the inheritance hierarchy. The TPH inheritance is useful in cases where all the information about the related entities is stored in a single table. For example, using the earlier scenario, let's try to model the database in such a way that it will only contain a single table called Workers to store the Employee and Vendor details. Let's try to visualize this table: A database schema showing the TPH inheritance database schema Now what will happen in this case is that the common fields will be populated whenever we create a type of worker. Salary will only contain a value if the worker is of type Employee. The HourlyRate field will be null in this case. If the worker is of type Vendor, then the HourlyRate field will have a value, and Salary will be null. This pattern is not very elegant from a database perspective. Since we are trying to keep unrelated data in a single table, our table is not normalized. There will always be some redundant columns that contain null values if we use this approach. We should try not to use this pattern unless it is absolutely needed. To implement the TPH inheritance relationship using the preceding table structure, we need to perform the following activities: Generate the default Entity Data Model. Add concrete classes to the Entity Data Model. Map the concrete class properties to their respective tables and columns. Make the base class entity abstract. Use the entities via the DBContext object. Let's discuss this in detail. Generating the default Entity Data Model Let's now generate the Entity Data Model for this table. The Entity Framework will create a single entity, Worker, for this table: The generated model for the table created for implementing the TPH inheritance Adding concrete classes to the Entity Data Model From the application perspective, it would be a much better solution if we have classes such as Employee and Vendor, which are derived from the Worker entity. The Worker class will contain all the common properties, and Employee and Vendor will contain their respective properties. So, let's add new entities for Employee and Vendor. While creating the entity, we can specify the base class entity as Worker, which is as follows: Adding a new entity in the Entity Data Model using a base class type Similarly, we will add the Vendor entity to our Entity Data Model, and specify the Worker entity as its base class entity. Once the entities are generated, our conceptual model will look like this: The Entity Data Model after adding the derived entities Next, we have to remove the Salary and HourlyRate properties from the Worker entity, and put them in the Employee and the Vendor entities respectively. 
So, once the properties are put into the respective entities, our final Entity Data model will look like this: The Entity Data Model after moving the respective properties into the derived entities Mapping the concrete class properties to the respective tables and columns After this, we have to define the column mappings in the derived classes to let the derived classes know which table and column should be used to put the data. We also need to specify the mapping condition. The Employee entity should save the Salary property's value in the Salary column of the Workers table when the Salary property is Not Null and HourlyRate is Null: Table mapping and conditions to map the Employee entity to the respective tables Once this mapping is done, we have to mark the Salary property as Nullable=false in the entity property window. This will let Entity Framework know that if someone is creating an object of the Employee type, then the Salary field is mandatory: Setting the Employee entity properties as Nullable Similarly, the Vendor entity should save the HourlyRate property's value in the HourlyRate column of the Workers table when Salary is Null and HourlyRate is Not Null: Table mapping and conditions to map the Vendor entity to the respective tables And similar to the Employee class, we also have to mark the HourlyRate property as Nullable=false in the Entity Property window. This will help Entity Framework know that if someone is creating an object of the Vendor type, then the HourlyRate field is mandatory: Setting the Vendor entity properties to Nullable Making the base class entity abstract There is one last change needed to be able to use these models. To be able to use these models, we need to mark the base class as abstract, so that Entity Framework is able to resolve the object of Employee and Vendors to the Workers table. Making the base class Workers as abstract This will also be a better model from the application perspective because the Worker entity itself has no meaning from the application domain perspective. Using the entities via the DBContext object Now we have our Entity Data Model configured to use the TPH inheritance. Let's try to create an Employee object and a Vendor object, and add them to the database using the TPH inheritance hierarchy: using (SampleDbEntities db = new SampleDbEntities()){Employee employee = new Employee();employee.FirstName = "Employee 1";employee.LastName = "Employee 1";employee.PhoneNumber = "1234567";employee.Salary = 50000;employee.EmailID = "employee1@test.com";Vendor vendor = new Vendor();vendor.FirstName = "vendor 1";vendor.LastName = "vendor 1";vendor.PhoneNumber = "1234567";vendor.HourlyRate = 100;vendor.EmailID = "vendor1@test.com";db.Workers.Add(employee);db.Workers.Add(vendor);db.SaveChanges();} In the preceding code, we created objects of the Employee and Vendor types, and then added them to the Workers collection using the DbContext object. Entity Framework will look at the mappings of the base entity and the derived entities, will check the mapping conditions and the actual values of the properties, and then push the data to the respective tables. So, let's take a look at the data inserted in the Workers table: A database snapshot after inserting the data using the Employee and Vendor entities So, we can see that for our Employee and Vendor models, the actual data is being kept in the same table using Entity Framework's TPH inheritance. 
The Table per Concrete Class inheritance The Table per Concrete Class (TPC) inheritance can be used when the database contains separate tables for all the logical entities, and these tables have some common fields. In our existing example, if there are two separate tables of Employee and Vendor, then the database schema would look like the following: The database schema showing the TPC inheritance database schema One of the major problems in such a database design is the duplication of columns in the tables, which is not recommended from the database normalization perspective. To implement the TPC inheritance, we need to perform the following tasks: Generate the default Entity Data Model. Create the abstract class. Modify the CDSL to cater to the change. Specify the mapping to implement the TPT inheritance. Use the entities via the DBContext object. Generating the default Entity Data Model Let's now take a look at the generated entities for this database schema: The default generated entities for the TPC inheritance database schema Entity Framework has given us separate entities for these two tables. From our application domain perspective, we can use these entities in a better way if all the common properties are moved to a common abstract class. The Employee and Vendor entities will contain the properties specific to them and inherit from this abstract class to use all the common properties. Creating the abstract class Let's add a new entity called Worker to our conceptual model and move the common properties into this entity: Adding a base class for all the common properties Next, we have to mark this class as abstract from the properties window: Marking the base class as abstract class Modifying the CDSL to cater to the change Next, we have to specify the mapping for these tables. Unfortunately, the Visual Entity Designer has no support for this type of mapping, so we need to perform this mapping ourselves in the EDMX XML file. The conceptual schema definition language (CSDL) part of the EDMX file is all set since we have already moved the common properties into the abstract class. So, now we should be able to use these properties with an abstract class handle. The problem will come in the storage schema definition language (SSDL) and mapping specification language (MSL). The first thing that we need to do is to change the SSDL to let Entity Framework know that the abstract class Worker is capable of saving the data in two tables. This can be done by setting the EntitySet name in the EntityContainer tags as follows: <EntityContainer Name="todoDbModelStoreContainer">   <EntitySet Name="Employee" EntityType="Self.Employee" Schema="dbo" store_Type="Tables" />   <EntitySet Name="Vendor" EntityType="Self.Vendor" Schema="dbo" store_Type="Tables" /></EntityContainer> Specifying the mapping to implement the TPT inheritance Next, we need to change the MSL to properly map the properties to the respective tables based on the actual type of object. For this, we have to specify EntitySetMapping. 
The EntitySetMapping should look like the following:

<EntityContainerMapping StorageEntityContainer="todoDbModelStoreContainer" CdmEntityContainer="SampleDbEntities">
  <EntitySetMapping Name="Workers">
    <EntityTypeMapping TypeName="IsTypeOf(SampleDbModel.Vendor)">
      <MappingFragment StoreEntitySet="Vendor">
        <ScalarProperty Name="HourlyRate" ColumnName="HourlyRate" />
        <ScalarProperty Name="EMailId" ColumnName="EMailId" />
        <ScalarProperty Name="PhoneNumber" ColumnName="PhoneNumber" />
        <ScalarProperty Name="LastName" ColumnName="LastName" />
        <ScalarProperty Name="FirstName" ColumnName="FirstName" />
        <ScalarProperty Name="ID" ColumnName="ID" />
      </MappingFragment>
    </EntityTypeMapping>
    <EntityTypeMapping TypeName="IsTypeOf(SampleDbModel.Employee)">
      <MappingFragment StoreEntitySet="Employee">
        <ScalarProperty Name="ID" ColumnName="ID" />
        <ScalarProperty Name="Salary" ColumnName="Salary" />
        <ScalarProperty Name="EMailId" ColumnName="EMailId" />
        <ScalarProperty Name="PhoneNumber" ColumnName="PhoneNumber" />
        <ScalarProperty Name="LastName" ColumnName="LastName" />
        <ScalarProperty Name="FirstName" ColumnName="FirstName" />
      </MappingFragment>
    </EntityTypeMapping>
  </EntitySetMapping>
</EntityContainerMapping>

In the preceding code, we specified that if the actual type of the object is Vendor, then the properties should map to the columns in the Vendor table, and if the actual type of the entity is Employee, the properties should map to the Employee table, as shown in the following screenshot:

After EDMX modifications, the mappings are visible in Visual Entity Designer

If we now open the EDMX file again, we can see the properties being mapped to the respective tables in the respective entities. Doing this mapping from Visual Entity Designer is not possible, unfortunately.

Using the entities via the DBContext object
Let's use these entities from our code:

using (SampleDbEntities db = new SampleDbEntities())
{
    Employee employee = new Employee();
    employee.FirstName = "Employee 1";
    employee.LastName = "Employee 1";
    employee.PhoneNumber = "1234567";
    employee.Salary = 50000;
    employee.EMailId = "employee1@test.com";
    Vendor vendor = new Vendor();
    vendor.FirstName = "vendor 1";
    vendor.LastName = "vendor 1";
    vendor.PhoneNumber = "1234567";
    vendor.HourlyRate = 100;
    vendor.EMailId = "vendor1@test.com";
    db.Workers.Add(employee);
    db.Workers.Add(vendor);
    db.SaveChanges();
}

In the preceding code, we created objects of the Employee and Vendor types and saved them using the Workers entity set, which is actually an abstract class. If we take a look at the inserted database, we will see the following:

Database snapshot of the inserted data using TPC inheritance

From the preceding screenshot, it is clear that the data is being pushed to the respective tables. The insert operation we saw in the previous code is successful, but there will be an exception in the application. This exception occurs because when Entity Framework tries to access the values that are in the abstract class, it finds two records with the same ID, and since the ID column is specified as a primary key, two records with the same value are a problem in this scenario. This exception clearly shows that store/database-generated identity columns will not work with the TPC inheritance.
If we want to use the TPC inheritance, then we either need to use GUID based IDs, or pass the ID from the application, or perhaps use some database mechanism that can maintain the uniqueness of auto-generated columns across multiple tables. Choosing the inheritance strategy Now that we know about all the inheritance strategies supported by Entity Framework, let's try to analyze these approaches. The most important thing is that there is no single strategy that will work for all the scenarios. Especially if we have a legacy database. The best option would be to analyze the application requirements and then look at the existing table structure to see which approach is best suited. The Table per Class Hierarchy inheritance tends to give us denormalized tables and have redundant columns. We should only use it when the number of properties in the derived classes is very less, so that the number of redundant columns is also less, and this denormalized structure will not create problems over a period of time. Contrary to TPH, if we have a lot of properties specific to derived classes and only a few common properties, we can use the Table per Concrete Class inheritance. However, in this approach, we will end up with some properties being repeated in all the tables. Also, this approach imposes some limitations such as we cannot use auto-increment identity columns in the database. If we have a lot of common properties that could go into a base class and a lot of properties specific to derived classes, then perhaps Table per Type is the best option to go with. In any case, complex inheritance relationships that become unmanageable in the long run should be avoided. One alternative could be to have separate domain models to implement the application logic in an object-oriented manner, and then use mappers to map these domain models to Entity Framework's generated entity models. Summary In this article, we looked at the various types of inheritance relationship using Entity Framework. We saw how these inheritance relationships can be implemented, and some guidelines on which should be used in which scenario. Resources for Article: Further resources on this subject: Working with Zend Framework 2.0 [article] Hosting the service in IIS using the TCP protocol [article] Applying LINQ to Entities to a WCF Service [article]

Dealing with Interrupts

Packt
02 Mar 2015
19 min read
This article is written by Francis Perea, the author of the book Arduino Essentials. In all our previous projects, we have been constantly looking for events to occur. We have been polling, but looking for events to occur supposes a relatively big effort and a waste of CPU cycles to only notice that nothing happened. In this article, we will learn about interrupts as a totally new way to deal with events, being notified about them instead of looking for them constantly. Interrupts may be really helpful when developing projects in which fast or unknown events may occur, and thus we will see a very interesting project which will lead us to develop a digital tachograph for a computer-controlled motor. Are you ready? Here we go! (For more resources related to this topic, see here.) The concept of an interruption As you may have intuited, an interrupt is a special mechanism the CPU incorporates to have a direct channel to be noticed when some event occurs. Most Arduino microcontrollers have two of these: Interrupt 0 on digital pin 2 Interrupt 1 on digital pin 3 But some models, such as the Mega2560, come with up to five interrupt pins. Once an interrupt has been notified, the CPU completely stops what it was doing and goes on to look at it, by running a special dedicated function in our code called Interrupt Service Routine (ISR). When I say that the CPU completely stops, I mean that even functions such as delay() or millis() won't be updated while the ISR is being executed. Interrupts can be programmed to respond on different changes of the signal connected to the corresponding pin and thus the Arduino language has four predefined constants to represent each of these four modes: LOW: It will trigger the interrupt whenever the pin gets a LOW value CHANGE: The interrupt will be triggered when the pins change their values from HIGH to LOW or vice versa RISING: It will trigger the interrupt when signal goes from LOW to HIGH FALLING: It is just the opposite of RISING; the interrupt will be triggered when the signal goes from HIGH to LOW The ISR The function that the CPU will call whenever an interrupt occurs is so important to the micro that it has to accomplish a pair of rules: They can't have any parameter They can't return anything The interrupts can be executed only one at a time Regarding the first two points, they mean that we can neither pass nor receive any data from the ISR directly, but we have other means to achieve this communication with the function. We will use global variables for it. We can set and read from a global variable inside an ISR, but even so, these variables have to be declared in a special way. We have to declare them as volatile as we will see this later on in the code. The third point, which specifies that only one ISR can be attended at a time, is what makes the function millis() not being able to be updated. The millis() function relies on an interrupt to be updated, and this doesn't happen if another interrupt is already being served. As you may understand, ISR is critical to the correct code execution in a microcontroller. As a rule of thumb, we will try to keep our ISRs as simple as possible and leave all heavy weight processing that occurs outside of it, in the main loop of our code. 
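Before moving to the project, the following minimal sketch shows this rule of thumb in practice; it is not part of the tachograph code, and the pin and variable names used are only illustrative. The ISR does nothing but raise a volatile flag, and all the real work stays in loop():

// A flag shared with the ISR, so it must be declared volatile
volatile boolean eventHappened = false;

// ISR: keep it as short as possible, just take note of the event
void onEvent() {
  eventHappened = true;
}

void setup() {
  Serial.begin(9600);
  pinMode(2, INPUT);                   // interrupt 0 lives on digital pin 2
  attachInterrupt(0, onEvent, RISING); // run onEvent() on a LOW to HIGH change
}

void loop() {
  if (eventHappened) {
    eventHappened = false;
    // Heavyweight processing belongs here, outside the ISR
    Serial.println("Interrupt received");
  }
}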
The tachograph project To understand and manage interrupts in our projects, I would like to offer you a very particular one, a tachograph, a device that is present in all our cars and whose mission is to account for revolutions, normally the engine revolutions, but also in brake systems such as Anti-lock Brake System (ABS) and others. Mechanical considerations Well, calling it mechanical perhaps is too much, but let's make some considerations regarding how we are going to make our project account for revolutions. For this example project, I have used a small DC motor driven through a small transistor and, like in lots of industrial applications, an encoded wheel is a perfect mechanism to read the number of revolutions. By simply attaching a small disc of cardboard perpendicularly to your motor shaft, it is very easy to achieve it. By using our old friend, the optocoupler, we can sense something between its two parts, even with just a piece of cardboard with a small slot in just one side of its surface. Here, you can see the template I elaborated for such a disc, the cross in the middle will help you position the disc as perfectly as possible, that is, the cross may be as close as possible to the motor shaft. The slot has to be cut off of the black rectangle as shown in the following image: The template for the motor encoder Once I printed it, I glued it to another piece of cardboard to make it more resistant and glued it all to the crown already attached to my motor shaft. If yours doesn't have a surface big enough to glue the encoder disc to its shaft, then perhaps you can find a solution by using just a small piece of dough or similar to it. Once the encoder disc is fixed to the motor and spins attached to the motor shaft, we have to find a way to place the optocoupler in a way that makes it able to read through the encoder disc slot. In my case, just a pair of drops of glue did the trick, but if your optocoupler or motor doesn't allow you to apply this solution, I'm sure that a pair of zip ties or a small piece of dough can give you another way to fix it to the motor too. In the following image, you can see my final assembled motor with its encoder disc and optocoupler ready to be connected to the breadboard through alligator clips: The complete assembly for the motor encoder Once we have prepared our motor encoder, let's perform some tests to see it working and begin to write code to deal with interruptions. A simple interrupt tester Before going deep inside the whole code project, let's perform some tests to confirm that our encoder assembly is working fine and that we can correctly trigger an interrupt whenever the motor spins and the cardboard slot passes just through the optocoupler. The only thing you have to connect to your Arduino at the moment is the optocoupler; we will now operate our motor by hand and in a later section, we will control its speed from the computer. The test's circuit schematic is as follows: A simple circuit to test the encoder Nothing new in this circuit, it is almost the same as the one used in the optical coin detector, with the only important and necessary difference of connecting the wire coming from the detector side of the optocoupler to pin 2 of our Arduino board, because, as said in the preceding text, the interrupt 0 is available only through that pin. For this first test, we will make the encoder disc spin by hand, which allows us to clearly perceive when the interrupt triggers. 
For the rest of this example, we will use the LED included with the Arduino board connected to pin 13 as a way to visually indicate that the interrupts have been triggered. Our first interrupt and its ISR Once we have connected the optocoupler to the Arduino and prepared things to trigger some interrupts, let's see the code that we will use to test our assembly. The objective of this simple sketch is to commute the status of an LED every time an interrupt occurs. In the proposed tester circuit, the LED status variable will be changed every time the slot passes through the optocoupler: /*  Chapter 09 - Dealing with interrupts  A simple tester  By Francis Perea for Packt Publishing */   // A LED will be used to notify the change #define ledPin 13   // Global variables we will use // A variable to be used inside ISR volatile int status = LOW;   // A function to be called when the interrupt occurs void revolution(){   // Invert LED status   status=!status; }   // Configuration of the board: just one output void setup() {   pinMode(ledPin, OUTPUT);   // Assign the revolution() function as an ISR of interrupt 0   // Interrupt will be triggered when the signal goes from   // LOW to HIGH   attachInterrupt(0, revolution, RISING); }   // Sketch execution loop void loop(){    // Set LED status   digitalWrite(ledPin, status); } Let's take a look at its most important aspects. The LED pin apart, we declare a variable to account for changes occurring. It will be updated in the ISR of our interrupt; so, as I told you earlier, we declare it as follows: volatile int status = LOW; Following which we declare the ISR function, revolution(), which as we already know doesn't receive any parameter nor return any value. And as we said earlier, it must be as simple as possible. In our test case, the ISR simply inverts the value of the global volatile variable to its opposite value, that is, from LOW to HIGH and from HIGH to LOW. To allow our ISR to be called whenever an interrupt 0 occurs, in the setup() function, we make a call to the attachInterrupt() function by passing three parameters to it: Interrupt: The interrupt number to assign the ISR to ISR: The name without the parentheses of the function that will act as the ISR for this interrupt Mode: One of the following already explained modes that define when exactly the interrupt will be triggered In our case, the concrete sentence is as follows: attachInterrupt(0, revolution, RISING); This makes the function revolution() be the ISR of interrupt 0 that will be triggered when the signal goes from LOW to HIGH. Finally, in our main loop there is little to do. Simply update the LED based on the current value of the status variable that is going to be updated inside the ISR. If everything went right, you should see the LED commute every time the slot passes through the optocoupler as a consequence of the interrupt being triggered and the revolution() function inverting the value of the status variable that is used in the main loop to set the LED accordingly. A dial tachograph For a more complete example in this section, we will build a tachograph, a device that will present the current revolutions per minute of the motor in a visual manner by using a dial. The motor speed will be commanded serially from our computer by reusing some of the codes in our previous projects. It is not going to be very complicated if we include some way to inform about an excessive number of revolutions and even cut the engine in an extreme case to protect it, is it? 
The complete schematic of such a big circuit is shown in the following image. Don't get scared about the number of components as we have already seen them all in action before: The tachograph circuit As you may see, we will use a total of five pins of our Arduino board to sense and command such a set of peripherals: Pin 2: This is the interrupt 0 pin and thus it will be used to connect the output of the optocoupler. Pin 3: It will be used to deal with the servo to move the dial. Pin 4: We will use this pin to activate sound alarm once the engine current has been cut off to prevent overcharge. Pin 6: This pin will be used to deal with the motor transistor that allows us to vary the motor speed based on the commands we receive serially. Remember to use a PWM pin if you choose to use another one. Pin 13: Used to indicate with an LED an excessive number of revolutions per minute prior to cutting the engine off. There are also two more pins which, although not physically connected, will be used, pins 0 and 1, given that we are going to talk to the device serially from the computer. Breadboard connections diagram There are some wires crossed in the previous schematic, and perhaps you can see the connections better in the following breadboard connection image: Breadboard connection diagram for the tachograph The complete tachograph code This is going to be a project full of features and that is why it has such a number of devices to interact with. Let's resume the functioning features of the dial tachograph: The motor speed is commanded from the computer via a serial communication with up to five commands: Increase motor speed (+) Decrease motor speed (-) Totally stop the motor (0) Put the motor at full throttle (*) Reset the motor after a stall (R) Motor revolutions will be detected and accounted by using an encoder and an optocoupler Current revolutions per minute will be visually presented with a dial operated with a servomotor It gives visual indication via an LED of a high number of revolutions In case a maximum number of revolutions is reached, the motor current will be cut off and an acoustic alarm will sound With such a number of features, it is normal that the code for this project is going to be a bit longer than our previous sketches. 
Here is the code: /*  Chapter 09 - Dealing with interrupt  Complete tachograph system  By Francis Perea for Packt Publishing */   #include <Servo.h>   //The pins that will be used #define ledPin 13 #define motorPin 6 #define buzzerPin 4 #define servoPin 3   #define NOTE_A4 440 // Milliseconds between every sample #define sampleTime 500 // Motor speed increment #define motorIncrement 10 // Range of valir RPMs, alarm and stop #define minRPM  0 #define maxRPM 10000 #define alarmRPM 8000 #define stopRPM 9000   // Global variables we will use // A variable to be used inside ISR volatile unsigned long revolutions = 0; // Total number of revolutions in every sample long lastSampleRevolutions = 0; // A variable to convert revolutions per sample to RPM int rpm = 0; // LED Status int ledStatus = LOW; // An instace on the Servo class Servo myServo; // A flag to know if the motor has been stalled boolean motorStalled = false; // Thr current dial angle int dialAngle = 0; // A variable to store serial data int dataReceived; // The current motor speed int speed = 0; // A time variable to compare in every sample unsigned long lastCheckTime;   // A function to be called when the interrupt occurs void revolution(){   // Increment the total number of   // revolutions in the current sample   revolutions++; }   // Configuration of the board void setup() {   // Set output pins   pinMode(motorPin, OUTPUT);   pinMode(ledPin, OUTPUT);   pinMode(buzzerPin, OUTPUT);   // Set revolution() as ISR of interrupt 0   attachInterrupt(0, revolution, CHANGE);   // Init serial communication   Serial.begin(9600);   // Initialize the servo   myServo.attach(servoPin);   //Set the dial   myServo.write(dialAngle);   // Initialize the counter for sample time   lastCheckTime = millis(); }   // Sketch execution loop void loop(){    // If we have received serial data   if (Serial.available()) {     // read the next char      dataReceived = Serial.read();      // Act depending on it      switch (dataReceived){        // Increment speed        case '+':          if (speed<250) {            speed += motorIncrement;          }          break;        // Decrement speed        case '-':          if (speed>5) {            speed -= motorIncrement;          }          break;                // Stop motor        case '0':          speed = 0;          break;            // Full throttle           case '*':          speed = 255;          break;        // Reactivate motor after stall        case 'R':          speed = 0;          motorStalled = false;          break;      }     //Only if motor is active set new motor speed     if (motorStalled == false){       // Set the speed motor speed       analogWrite(motorPin, speed);     }   }   // If a sample time has passed   // We have to take another sample   if (millis() - lastCheckTime > sampleTime){     // Store current revolutions     lastSampleRevolutions = revolutions;     // Reset the global variable     // So the ISR can begin to count again     revolutions = 0;     // Calculate revolution per minute     rpm = lastSampleRevolutions * (1000 / sampleTime) * 60;     // Update last sample time     lastCheckTime = millis();     // Set the dial according new reading     dialAngle = map(rpm,minRPM,maxRPM,180,0);     myServo.write(dialAngle);   }   // If the motor is running in the red zone   if (rpm > alarmRPM){     // Turn on LED     digitalWrite(ledPin, HIGH);   }   else{     // Otherwise turn it off     digitalWrite(ledPin, LOW);   }   // If the motor has exceed maximum RPM   if (rpm > stopRPM){     // 
Stop the motor     speed = 0;     analogWrite(motorPin, speed);     // Disable it until a 'R' command is received     motorStalled = true;     // Make alarm sound     tone(buzzerPin, NOTE_A4, 1000);   }   // Send data back to the computer   Serial.print("RPM: ");   Serial.print(rpm);   Serial.print(" SPEED: ");   Serial.print(speed);   Serial.print(" STALL: ");   Serial.println(motorStalled); } It is the first time in this article that I think I have nothing to explain regarding the code that hasn't been already explained before. I have commented everything so that the code can be easily read and understood. In general lines, the code declares both constants and global variables that will be used and the ISR for the interrupt. In the setup section, all initializations of different subsystems that need to be set up before use are made: pins, interrupts, serials, and servos. The main loop begins by looking for serial commands and basically updates the speed value and the stall flag if command R is received. The final motor speed setting only occurs in case the stall flag is not on, which will occur in case the motor reaches the stopRPM value. Following with the main loop, the code looks if it has passed a sample time, in which case the revolutions are stored to compute real revolutions per minute (rpm), and the global revolutions counter incremented inside the ISR is set to 0 to begin again. The current rpm value is mapped to an angle to be presented by the dial and thus the servo is set accordingly. Next, a pair of controls is made: One to see if the motor is getting into the red zone by exceeding the max alarmRPM value and thus turning the alarm LED on And another to check if the stopRPM value has been reached, in which case the motor will be automatically cut off, the motorStalled flag is set to true, and the acoustic alarm is triggered When the motor has been stalled, it won't accept changes in its speed until it has been reset by issuing an R command via serial communication. In the last action, the code sends back some info to the Serial Monitor as another way of feedback with the operator at the computer and this should look something like the following screenshot: Serial Monitor showing the tachograph in action Modular development It has been quite a complex project in that it incorporates up to six different subsystems: optocoupler, motor, LED, buzzer, servo, and serial, but it has also helped us to understand that projects need to be developed by using a modular approach. We have worked and tested every one of these subsystems before, and that is the way it should usually be done. By developing your projects in such a submodular way, it will be easy to assemble and program the whole of the system. As you may see in the following screenshot, only by using such a modular way of working will you be able to connect and understand such a mess of wires: A working desktop may get a bit messy Summary I'm sure you have got the point regarding interrupts with all the things we have seen in this article. We have met and understood what an interrupt is and how does the CPU attend to it by running an ISR, and we have even learned about their special characteristics and restrictions and that we should keep them as little as possible. On the programming side, the only thing necessary to work with interrupts is to correctly attach the ISR with a call to the attachInterrupt() function. 
From the point of view of hardware, we have assembled an encoder that has been attached to a spinning motor to account for its revolutions. Finally, we have the code. We have seen a relatively long sketch, which is a sign that we are beginning to master the platform, are able to deal with a bigger number of peripherals, and that our projects require more complex software every time we have to deal with these peripherals and to accomplish all the other necessary tasks to meet what is specified in the project specifications. Resources for Article: Further resources on this subject: The Arduino Mobile Robot? [article] Using the Leap Motion Controller with Arduino [article] Android and Udoo Home Automation [article]
A Quick Start Guide to Flume

Packt
02 Mar 2015
15 min read
In this article by Steve Hoffman, the author of the book, Apache Flume: Distributed Log Collection for Hadoop - Second Edition, we will learn about the basics that are required to be known before we start working with Apache Flume. This article will help you get started with Flume. So, let's start with the first step: downloading and configuring Flume. (For more resources related to this topic, see here.) Downloading Flume Let's download Flume from http://flume.apache.org/. Look for the download link in the side navigation. You'll see two compressed .tar archives available along with the checksum and GPG signature files used to verify the archives. Instructions to verify the download are on the website, so I won't cover them here. Checking the checksum file contents against the actual checksum verifies that the download was not corrupted. Checking the signature file validates that all the files you are downloading (including the checksum and signature) came from Apache and not some nefarious location. Do you really need to verify your downloads? In general, it is a good idea and it is recommended by Apache that you do so. If you choose not to, I won't tell. The binary distribution archive has bin in the name, and the source archive is marked with src. The source archive contains just the Flume source code. The binary distribution is much larger because it contains not only the Flume source and the compiled Flume components (jars, javadocs, and so on), but also all the dependent Java libraries. The binary package contains the same Maven POM file as the source archive, so you can always recompile the code even if you start with the binary distribution. Go ahead, download and verify the binary distribution to save us some time in getting started. Flume in Hadoop distributions Flume is available with some Hadoop distributions. The distributions supposedly provide bundles of Hadoop's core components and satellite projects (such as Flume) in a way that ensures things such as version compatibility and additional bug fixes are taken into account. These distributions aren't better or worse; they're just different. There are benefits to using a distribution. Someone else has already done the work of pulling together all the version-compatible components. Today, this is less of an issue since the Apache BigTop project started (http://bigtop.apache.org/). Nevertheless, having prebuilt standard OS packages, such as RPMs and DEBs, ease installation as well as provide startup/shutdown scripts. Each distribution has different levels of free and paid options, including paid professional services if you really get into a situation you just can't handle. There are downsides, of course. The version of Flume bundled in a distribution will often lag quite a bit behind the Apache releases. If there is a new or bleeding-edge feature you are interested in using, you'll either be waiting for your distribution's provider to backport it for you, or you'll be stuck patching it yourself. Furthermore, while the distribution providers do a fair amount of testing, such as any general-purpose platform, you will most likely encounter something that their testing didn't cover, in which case, you are still on the hook to come up with a workaround or dive into the code, fix it, and hopefully, submit that patch back to the open source community (where, at a future point, it'll make it into an update of your distribution or the next version). So, things move slower in a Hadoop distribution world. You can see that as good or bad. 
Usually, large companies don't like the instability of bleeding-edge technology or making changes often, as change can be the most common cause of unplanned outages. You'd be hard pressed to find such a company using the bleeding-edge Linux kernel rather than something like Red Hat Enterprise Linux (RHEL), CentOS, Ubuntu LTS, or any of the other distributions whose target is stability and compatibility. If you are a startup building the next Internet fad, you might need that bleeding-edge feature to get a leg up on the established competition. If you are considering a distribution, do the research and see what you are getting (or not getting) with each. Remember that each of these offerings is hoping that you'll eventually want and/or need their Enterprise offering, which usually doesn't come cheap. Do your homework.

Here's a short, nondefinitive list of some of the more established players. For more information, refer to the following links:

Cloudera: http://cloudera.com/
Hortonworks: http://hortonworks.com/
MapR: http://mapr.com/

An overview of the Flume configuration file

Now that we've downloaded Flume, let's spend some time going over how to configure an agent. A Flume agent's default configuration provider uses a simple Java property file of key/value pairs that you pass as an argument to the agent upon startup. As you can configure more than one agent in a single file, you will need to additionally pass an agent identifier (called a name) so that it knows which configurations to use. In my examples where I'm only specifying one agent, I'm going to use the name agent.

By default, the configuration property file is monitored for changes every 30 seconds. If a change is detected, Flume will attempt to reconfigure itself. In practice, many of the configuration settings cannot be changed after the agent has started. Save yourself some trouble and pass the undocumented --no-reload-conf argument when starting the agent (except in development situations perhaps). If you use the Cloudera distribution, the passing of this flag is currently not possible. I've opened a ticket to fix that at https://issues.cloudera.org/browse/DISTRO-648. If this is important to you, please vote it up.

Each agent is configured, starting with three parameters:

agent.sources=<list of sources>
agent.channels=<list of channels>
agent.sinks=<list of sinks>

Each source, channel, and sink also has a unique name within the context of that agent. For example, if I'm going to transport my Apache access logs, I might define a channel named access. The configurations for this channel would all start with the agent.channels.access prefix. Each configuration item has a type property that tells Flume what kind of source, channel, or sink it is. In this case, we are going to use an in-memory channel whose type is memory. The complete configuration for the channel named access in the agent named agent would be:

agent.channels.access.type=memory

Any arguments to a source, channel, or sink are added as additional properties using the same prefix. The memory channel has a capacity parameter to indicate the maximum number of Flume events it can hold. Let's say we didn't want to use the default value of 100; our configuration would now look like this:

agent.channels.access.type=memory
agent.channels.access.capacity=200

Finally, we need to add the access channel name to the agent.channels property so that the agent knows to load it:

agent.channels=access

Let's look at a complete example using the canonical "Hello, World!" example.
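Before we get to that complete example, it is worth seeing the three access-channel fragments side by side, as that makes the prefixing scheme easier to follow. This is everything the properties file ends up saying about the access channel (nothing new, just the lines from the preceding paragraphs gathered together; sources and sinks are declared with their own agent.sources.* and agent.sinks.* prefixes in exactly the same way):

agent.channels=access
agent.channels.access.type=memory
agent.channels.access.capacity=200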
Starting up with "Hello, World!"

No technical article would be complete without a "Hello, World!" example. Here is the configuration file we'll be using:

agent.sources=s1
agent.channels=c1
agent.sinks=k1

agent.sources.s1.type=netcat
agent.sources.s1.channels=c1
agent.sources.s1.bind=0.0.0.0
agent.sources.s1.port=12345

agent.channels.c1.type=memory

agent.sinks.k1.type=logger
agent.sinks.k1.channel=c1

Here, I've defined one agent (called agent) that has a source named s1, a channel named c1, and a sink named k1. The s1 source's type is netcat, which simply opens a socket listening for events (one line of text per event). It requires two parameters: a bind IP and a port number. In this example, we are using 0.0.0.0 for a bind address (the Java convention to specify listen on any address) and port 12345. The source configuration also has a parameter called channels (plural), which is the name of the channel(s) the source will append events to, in this case, c1. It is plural, because you can configure a source to write to more than one channel; we just aren't doing that in this simple example. The channel named c1 is a memory channel with a default configuration. The sink named k1 is of the logger type. This is a sink that is mostly used for debugging and testing. It will log all events at the INFO level using Log4j, which it receives from the configured channel, in this case, c1. Here, the channel keyword is singular because a sink can only be fed data from one channel.

Using this configuration, let's run the agent and connect to it using the Linux netcat utility to send an event. First, explode the .tar archive of the binary distribution we downloaded earlier:

$ tar -zxf apache-flume-1.5.2-bin.tar.gz
$ cd apache-flume-1.5.2-bin

Next, let's briefly look at the help. Run the flume-ng command with the help command:

$ ./bin/flume-ng help
Usage: ./bin/flume-ng <command> [options]...

commands:
  help                      display this help text
  agent                     run a Flume agent
  avro-client               run an avro Flume client
  version                   show Flume version info

global options:
  --conf,-c <conf>          use configs in <conf> directory
  --classpath,-C <cp>       append to the classpath
  --dryrun,-d               do not actually start Flume, just print the command
  --plugins-path <dirs>     colon-separated list of plugins.d directories. See the
                            plugins.d section in the user guide for more details.
                            Default: $FLUME_HOME/plugins.d
  -Dproperty=value          sets a Java system property value
  -Xproperty=value          sets a Java -X option

agent options:
  --conf-file,-f <file>     specify a config file (required)
  --name,-n <name>          the name of this agent (required)
  --help,-h                 display help text

avro-client options:
  --rpcProps,-P <file>      RPC client properties file with server connection params
  --host,-H <host>          hostname to which events will be sent
  --port,-p <port>          port of the avro source
  --dirname <dir>           directory to stream to avro source
  --filename,-F <file>      text file to stream to avro source (default: std input)
  --headerFile,-R <file>    File containing event headers as key/value pairs on each new line
  --help,-h                 display help text

  Either --rpcProps or both --host and --port must be specified.

Note that if <conf> directory is specified, then it is always included first in the classpath.

As you can see, there are two ways with which you can invoke the command (other than the simple help and version commands). We will be using the agent command.
The use of avro-client will be covered later. The agent command has two required parameters: a configuration file to use and the agent name (in case your configuration contains multiple agents). Let's take our sample configuration and open an editor (vi in my case, but use whatever you like): $ vi conf/hw.conf Next, place the contents of the preceding configuration into the editor, save, and exit back to the shell. Now you can start the agent: $ ./bin/flume-ng agent -n agent -c conf -f conf/hw.conf -Dflume.root.logger=INFO,console The -Dflume.root.logger property overrides the root logger in conf/log4j.properties to use the console appender. If we didn't override the root logger, everything would still work, but the output would go to the log/flume.log file instead of being based on the contents of the default configuration file. Of course, you can edit the conf/log4j.properties file and change the flume.root.logger property (or anything else you like). To change just the path or filename, you can set the flume.log.dir and flume.log.file properties in the configuration file or pass additional flags on the command line as follows: $ ./bin/flume-ng agent -n agent -c conf -f conf/hw.conf -Dflume.root.logger=INFO,console -Dflume.log.dir=/tmp -Dflume.log.file=flume-agent.log You might ask why you need to specify the -c parameter, as the -f parameter contains the complete relative path to the configuration. The reason for this is that the Log4j configuration file should be included on the class path. If you left the -c parameter off the command, you'll see this error: Warning: No configuration directory set! Use --conf <dir> to override.log4j:WARN No appenders could be found for logger (org.apache.flume.lifecycle.LifecycleSupervisor).log4j:WARN Please initialize the log4j system properly.log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info But you didn't do that so you should see these key log lines: 2014-10-05 15:39:06,109 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration foragents: [agent] This line tells you that your agent starts with the name agent. Usually you'd look for this line only to be sure you started the right configuration when you have multiple configurations defined in your configuration file. 2014-10-05 15:39:06,076 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:133)] Reloadingconfiguration file:conf/hw.conf This is another sanity check to make sure you are loading the correct file, in this case our hw.conf file. 2014-10-05 15:39:06,221 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:138)]Starting new configuration:{ sourceRunners:{s1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:s1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@442fbe47 counterGroup:{ name:null counters:{} } }}channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} } Once all the configurations have been parsed, you will see this message, which shows you everything that was configured. You can see s1, c1, and k1, and which Java classes are actually doing the work. As you probably guessed, netcat is a convenience for org.apache.flume.source.NetcatSource. We could have used the class name if we wanted. 
In fact, if I had my own custom source written, I would use its class name for the source's type parameter. You cannot define your own short names without patching the Flume distribution. 2014-10-05 15:39:06,427 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:164)] CreatedserverSocket:sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:12345] Here, we see that our source is now listening on port 12345 for the input. So, let's send some data to it. Finally, open a second terminal. We'll use the nc command (you can use Telnet or anything else similar) to send the Hello World string and press the Return (Enter) key to mark the end of the event: % nc localhost 12345Hello WorldOK The OK message came from the agent after we pressed the Return key, signifying that it accepted the line of text as a single Flume event. If you look at the agent log, you will see the following: 2014-10-05 15:44:11,215 (SinkRunner-PollingRunner-DefaultSinkProcessor)[INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)] Event: { headers:{} body: 48 65 6C 6C 6F 20 57 6F 72 6C 64Hello World } This log message shows you that the Flume event contains no headers (NetcatSource doesn't add any itself). The body is shown in hexadecimal along with a string representation (for us humans to read, in this case, our Hello World message). If I send the following line and then press the Enter key, you'll get an OK message: The quick brown fox jumped over the lazy dog. You'll see this in the agent's log: 2014-10-05 15:44:57,232 (SinkRunner-PollingRunner-DefaultSinkProcessor)[INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:70)]Event: { headers:{} body: 54 68 65 20 71 75 69 63 6B 20 62 72 6F 77 6E 20The quick brown } The event appears to have been truncated. The logger sink, by design, limits the body content to 16 bytes to keep your screen from being filled with more than what you'd need in a debugging context. If you need to see the full contents for debugging, you should use a different sink, perhaps the file_roll sink, which would write to the local filesystem. Summary In this article, we covered how to download the Flume binary distribution. We created a simple configuration file that included one source writing to one channel, feeding one sink. The source listened on a socket for network clients to connect to and to send it event data. These events were written to an in-memory channel and then fed to a Log4j sink to become the output. We then connected to our listening agent using the Linux netcat utility and sent some string events to our Flume agent's source. Finally, we verified that our Log4j-based sink wrote the events out. Resources for Article: Further resources on this subject: About Cassandra [article] Introducing Kafka [article] Transformation [article]
Model-View-ViewModel

Packt
02 Mar 2015
24 min read
In this article, by Einar Ingebrigtsen, author of the book, SignalR Blueprints, we will focus on a different programming model for client development: Model-View-ViewModel (MVVM). It will reiterate what you have already learned about SignalR, but you will also start to see a recurring theme in how you should architect decoupled software that adheres to the SOLID principles. It will also show the benefit of thinking in single page application terms (often referred to as Single Page Application (SPA)), and how SignalR really fits well with this idea. (For more resources related to this topic, see here.) The goal – an imagined dashboard A counterpart to any application is often a part of monitoring its health. Is it running? and are there any failures?. Getting this information in real time when the failure occurs is important and also getting some statistics from it is interesting. From a SignalR perspective, we will still use the hub abstraction to do pretty much what we have been doing, but the goal is to give ideas of how and what we can use SignalR for. Another goal is to dive into the architectural patterns, making it ready for larger applications. MVVM allows better separation and is very applicable for client development in general. A question that you might ask yourself is why KnockoutJS instead of something like AngularJS? It boils down to the personal preference to a certain degree. AngularJS is described as a MVW where W stands for Whatever. I find AngularJS less focused on the same things I focus on and I also find it very verbose to get it up and running. I'm not in any way an expert in AngularJS, but I have used it on a project and I found myself writing a lot to make it work the way I wanted it to in terms of MVVM. However, I don't think it's fair to compare the two. KnockoutJS is very focused in what it's trying to solve, which is just a little piece of the puzzle, while AngularJS is a full client end-to-end framework. On this note, let's just jump straight to it. Decoupling it all MVVM is a pattern for client development that became very popular in the XAML stack, enabled by Microsoft based on Martin Fowlers presentation model. Its principle is that you have a ViewModel that holds the state and exposes behavior that can be utilized from a view. The view observes any changes of the state the ViewModel exposes, making the ViewModel totally unaware that there is a view. The ViewModel is decoupled and can be put in isolation and is perfect for automated testing. As part of the state that the ViewModel typically holds is the model part, which is something it usually gets from the server, and a SignalR hub is the perfect transport to get this. It boils down to recognizing the different concerns that make up the frontend and separating it all. This gives us the following diagram: Back to basics This time we will go back in time, going down what might be considered a more purist path; use the browser elements (HTML, JavaScript, and CSS) and don't rely on any server-side rendering. Clients today are powerful and very capable and offloading the composition of what the user sees onto the client frees up server resources. You can also rely on the infrastructure of the Web for caching with static HTML files not rendered by the server. In fact, you could actually put these resources on a content delivery network, making the files available as close as possible to the end user. This would result in better load times for the user. 
You might have other reasons to perform server-side rendering and not just plain HTML. Leveraging existing infrastructure or third-party party tools could be those reasons. It boils down to what's right for you. But this particular sample will focus on things that the client can do. Anyways, let's get started. Open Visual Studio and create a new project by navigating to FILE | New | Project. The following dialog box will show up: From the left-hand side menu, select Web and then ASP.NET Web Application. Enter Chapter4 in the Name textbox and select your location. Select the Empty template from the template selector and make sure you deselect the Host in the cloud option. Then, click on OK, as shown in the following screenshot: Setting up the packages First, we want Twitter bootstrap. To get this, follow these steps: Add a NuGet package reference. Right-click on References in Solution Explorer and select Manage NuGet Packages and type Bootstrap in the search dialog box. Select it and then click on Install. We want a slightly different look, so we'll download one of the many bootstrap themes out here. Add a NuGet package reference called metro-bootstrap. As jQuery is still a part of this, let's add a NuGet package reference to it as well. For the MVVM part, we will use something called KnockoutJS; add it through NuGet as well. Add a NuGet package reference, as in the previous steps, but this time, type SignalR in the search dialog box. Find the package called Microsoft ASP.NET SignalR. Making any SignalR hubs available for the client Add a file called Startup.cs file to the root of the project. Add a Configuration method that will expose any SignalR hubs, as follows: public void Configuration(IAppBuilder app) { app.MapSignalR(); } At the top of the Startup.cs file, above the namespace declaration, but right below the using statements, add the following code:  [assembly: OwinStartupAttribute(typeof(Chapter4.Startup))] Knocking it out of the park KnockoutJS is a framework that implements a lot of the principles found in MVVM and makes it easier to apply. We're going to use the following two features of KnockoutJS, and it's therefore important to understand what they are and what significance they have: Observables: In order for a view to be able to know when state change in a ViewModel occurs, KnockoutJS has something called an observable for single objects or values and observable array for arrays. BindingHandlers: In the view, the counterparts that are able to recognize the observables and know how to deal with its content are known as BindingHandlers. We create binding expression in the view that instructs the view to get its content from the properties found in the binding context. The default binding context will be the ViewModel, but there are more advanced scenarios where this changes. In fact, there is a BindingHandler that enables you to specify the context at any given time called with. Our single page Whether one should strive towards having an SPA is widely discussed on the Web these days. My opinion on the subject, in the interest of the user, is that we should really try to push things in this direction. Having not to post back and cause a full reload of the page and all its resources and getting into the correct state gives the user a better experience. Some of the arguments to perform post-backs every now and then go in the direction of fixing potential memory leaks happening in the browser. 
Although, the technique is sound and the result is right, it really just camouflages a problem one has in the system. However, as with everything, it really depends on the situation. At the core of an SPA is a single page (pun intended), which is usually the index.html file sitting at the root of the project. Add the new index.html file and edit it as follows: Add a new HTML file (index.html) at the root of the project by right- clicking on the Chapter4 project in Solution Explorer. Navigate to Add | New Item | Web from the left-hand side menu, and then select HTML Page and name it index.html. Finally, click on Add. Let's put in the things we've added dependencies to, starting with the style sheets. In the index.html file, you'll find the <head> tag; add the following code snippet under the <title></title> tag: <link href="Content/bootstrap.min.css" rel="stylesheet" /> <link href="Content/metro-bootstrap.min.css" rel="stylesheet" /> Next, add the following code snippet right beneath the preceding code: <script type="text/javascript" src="Scripts/jquery- 1.9.0.min.js"></script> <script type="text/javascript" src="Scripts/jquery.signalR- 2.1.1.js"></script> <script type="text/javascript" src="signalr/hubs"></script> <script type="text/javascript" src="Scripts/knockout- 3.2.0.js"></script> Another thing we will need in this is something that helps us visualize things; Google has a free, open source charting library that we will use. We will take a dependency to the JavaScript APIs from Google. To do this, add the following script tag after the others: <script type="text/javascript" src="https://www.google.com/jsapi"></script> Now, we can start filling in the view part. Inside the <body> tag, we start by putting in a header, as shown here: <div class="navbar navbar-default navbar-static-top bsnavbar">     <div class="container">         <div class="navbar-header">             <h1>My Dashboard</h1>         </div>     </div> </div> The server side of things In this little dashboard thing, we will look at web requests, both successful and failed. We will perform some minor things for us to be able to do this in a very naive way, without having to flesh out a full mechanism to deal with error situations. Let's start by enabling all requests even static resources, such as HTML files, to run through all HTTP modules. A word of warning: there are performance implications of putting all requests through the managed pipeline, so normally, you wouldn't necessarily want to do this on a production system, but for this sample, it will be fine to show the concepts. Open Web.config in the project and add the following code snippet within the <configuration> tag: <system.webServer>   <modules runAllManagedModulesForAllRequests="true" /> </system.webServer> The hub In this sample, we will only have one hub, the one that will be responsible for dealing with reporting requests and failed requests. Let's add a new class called RequestStatisticsHub. Right-click on the project in Solution Explorer, select Class from Add, name it RequestStatisticsHub.cs, and then click on Add. The new class should inherit from the hub. Add the following using statement at the top: using Microsoft.AspNet.SignalR; We're going to keep a track of the count of requests and failed requests per time with a resolution of not more than every 30 seconds in the memory on the server. 
Obviously, if one wants to scale across multiple servers, this is way too naive and one should choose an out-of-process shared key-value store that goes across servers. However, for our purpose, this will be fine. Let's add a using statement at the top, as shown here:

using System.Collections.Generic;

At the top of the class, add the two dictionaries that we will use to hold this information:

static Dictionary<string, int> _requestsLog = new Dictionary<string, int>();
static Dictionary<string, int> _failedRequestsLog = new Dictionary<string, int>();

In our client, we want to access these logs at startup. So let's add two methods to do so:

public Dictionary<string, int> GetRequests()
{
    return _requestsLog;
}

public Dictionary<string, int> GetFailedRequests()
{
    return _failedRequestsLog;
}

Remember the resolution of only keeping track of the number of requests per 30 seconds at a time. There is no default mechanism in the .NET Framework to do this, so we need to add a few helper methods to deal with the rounding of time. Let's add a class called DateTimeRounding at the root of the project. Mark the class as a public static class and put the following extension methods in the class:

public static DateTime RoundUp(this DateTime dt, TimeSpan d)
{
    var delta = (d.Ticks - (dt.Ticks % d.Ticks)) % d.Ticks;
    return new DateTime(dt.Ticks + delta);
}

public static DateTime RoundDown(this DateTime dt, TimeSpan d)
{
    var delta = dt.Ticks % d.Ticks;
    return new DateTime(dt.Ticks - delta);
}

public static DateTime RoundToNearest(this DateTime dt, TimeSpan d)
{
    var delta = dt.Ticks % d.Ticks;
    bool roundUp = delta > d.Ticks / 2;

    return roundUp ? dt.RoundUp(d) : dt.RoundDown(d);
}

Let's go back to the RequestStatisticsHub class and add some more functionality, now that we can deal with the rounding of time:

static void Register(Dictionary<string, int> log, Action<dynamic, string, int> hubCallback)
{
    var now = DateTime.Now.RoundToNearest(TimeSpan.FromSeconds(30));
    var key = now.ToString("HH:mm");

    if (log.ContainsKey(key))
        log[key] = log[key] + 1;
    else
        log[key] = 1;

    var hub = GlobalHost.ConnectionManager.GetHubContext<RequestStatisticsHub>();
    hubCallback(hub.Clients.All, key, log[key]);
}

public static void Request()
{
    Register(_requestsLog, (hub, key, value) => hub.requestCountChanged(key, value));
}

public static void FailedRequest()
{
    // Failed requests go into the failed-requests log
    Register(_failedRequestsLog, (hub, key, value) => hub.failedRequestCountChanged(key, value));
}

This enables us to have a place to call in order to report requests, and these get published back to any clients connected to this particular hub. Note the usage of GlobalHost and its ConnectionManager property. When we want to get a hub instance and we are not in the hub context of a method being called from a client, we use ConnectionManager to get it. It gives us a proxy for the hub and enables us to call methods on any connected client.

Naively dealing with requests

With all this in place, we will be able to easily and naively deal with what we consider correct and failed requests. Let's add a Global.asax file by right-clicking on the project in Solution Explorer and selecting New Item from the Add menu. Navigate to Web and find Global Application Class, then click on Add.
In the new file, we want to replace the BindingHandlers method with the following code snippet: protected void Application_AuthenticateRequest(object sender, EventArgs e) {     var path = HttpContext.Current.Request.Path;     if (path == "/") path = "index.html";       if (path.ToLowerInvariant().IndexOf(".html") < 0) return;       var physicalPath = HttpContext.Current.Request.MapPath(path);     if (File.Exists(physicalPath))     {         RequestStatisticsHub.Request();     }     else     {         RequestStatisticsHub.FailedRequest();     } } Basically, with this, we are only measuring requests with .html in its path, and if it's only "/", we assume it's "index.html". Any file that does not exist, accordingly, is considered an error; typically a 404 error and we register it as a failed request. Bringing it all back to the client With the server taken care of, we can start consuming all this in the client. We will now be heading down the path of creating a ViewModel and hook everything up. ViewModel Let's start by adding a JavaScript file sitting next to our index.html file at the root level of the project, call it index.js. This file will represent our ViewModel. Also, this scenario will be responsible to set up KnockoutJS, so that the ViewModel is in fact activated and applied to the page. As we only have this one page for this sample, this will be fine. Let's start by hooking up the jQuery document that is ready: $(function() { }); Inside the function created here, we will enter our viewModel definition, which will start off being an empty one: var viewModel = function() { }; KnockoutJS has a function to apply a viewModel to the document, meaning that the document or body will be associated with the viewModel instance given. Right under the definition of viewModel, add the following line: ko.applyBindings(new viewModel()); Compiling this and running it should at the very least not give you any errors but nothing more than a header saying My Dashboard. So, we need to lighten this up a bit. Inside the viewModel function definition, add the following code snippet: var self = this; this.requests = ko.observableArray(); this.failedRequests = ko.observableArray(); We enter a reference to this as a variant called self. This will help us with scoping issues later on. The arrays we added are now KnockoutJS's observable arrays that allows the view or any BindingHandler to observe the changes that are coming in. The ko.observableArray() and ko.observable() arrays both return a new function. So, if you want to access any values in it, you must unwrap it by calling it something that might seem counterintuitive at first. You might consider your variable as just another property. However, for the observableArray(), KnockoutJS adds most of the functions found in the array type in JavaScript and they can be used directly on the function without unwrapping. If you look at a variable that is an observableArray in the console of the browser, you'll see that it looks as if it actually is just any array. This is not really true though; to get to the values, you will have to unwrap it by adding () after accessing the variable. However, all the functions you're used to having on an array are here. Let's add a function that will know how to handle an entry into the viewModel function. 
An entry coming in is either an existing one or a new one; the key of the entry is the giveaway to decide. Note that we use some() rather than forEach() here, because some() stops as soon as the callback returns true and hands that result back, which is what tells us whether the key was found:

function handleEntry(log, key, value) {
    var result = log().some(function (entry) {
        if (entry[0] == key) {
            entry[1](value);
            return true;
        }
    });

    if (result !== true) {
        log.push([key, ko.observable(value)]);
    }
};

Let's set up the hub and add the following code to the viewModel function:

var hub = $.connection.requestStatisticsHub;
var initializedCount = 0;

hub.client.requestCountChanged = function (key, value) {
    if (initializedCount < 2) return;
    handleEntry(self.requests, key, value);
};

hub.client.failedRequestCountChanged = function (key, value) {
    if (initializedCount < 2) return;
    handleEntry(self.failedRequests, key, value);
};

You might notice the initializedCount variable. Its purpose is to keep us from dealing with change notifications until we are completely initialized, which comes next. Add the following code snippet to the viewModel function:

$.connection.hub.start().done(function () {
    hub.server.getRequests().done(function (requests) {
        for (var property in requests) {
            handleEntry(self.requests, property, requests[property]);
        }

        initializedCount++;
    });
    hub.server.getFailedRequests().done(function (requests) {
        for (var property in requests) {
            handleEntry(self.failedRequests, property, requests[property]);
        }

        initializedCount++;
    });
});

We should now have enough logic in our viewModel function to actually be able to get any requests already sitting there and also respond to new ones coming in.

BindingHandler

The key element of KnockoutJS is its BindingHandler mechanism. In KnockoutJS, everything starts with a data-bind="" attribute on an element in the HTML view. Inside the attribute, one puts binding expressions, and the BindingHandlers are the key to this. Every expression starts with the name of the handler. For instance, if you have an <input> tag and you want to get the value from the input into a property on the ViewModel, you would use the BindingHandler value. There are a few BindingHandlers out of the box to deal with the common scenarios (text, value, foreach, and more). All of the BindingHandlers are very well documented on the KnockoutJS site. For this sample, we will actually create our own BindingHandler. KnockoutJS is highly extensible and allows you to do just this, amongst other extensibility points.

Let's add a JavaScript file called googleCharts.js at the root of the project. Inside it, add the following code:

google.load('visualization', '1.0', { 'packages': ['corechart'] });

This will tell the Google API to enable the charting package. The next thing we want to do is to define the BindingHandler. Any handler has the option of setting up an init function and an update function. The init function should only occur once, when it's first initialized. Actually, it's when the binding context is set. If the parent binding context of the element changes, it will be called again. The update function will be called whenever there is a change in an observable or more observables that the binding expression is referring to. For our sample, we will use the init function only and actually respond to changes manually, because we have a more involved scenario than what the default mechanism would provide us with.
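To get a feel for the shape of a handler before we write the real one, here is a deliberately tiny, hypothetical handler; labelledText is not part of the dashboard, just an illustration of the init/update pair and of unwrapping the bound value with ko.unwrap:

ko.bindingHandlers.labelledText = {
    init: function (element, valueAccessor) {
        // Runs once when the binding is first applied to the element
        element.textContent = 'Value: ' + ko.unwrap(valueAccessor());
    },
    update: function (element, valueAccessor) {
        // Runs again every time an observable used in the expression changes
        element.textContent = 'Value: ' + ko.unwrap(valueAccessor());
    }
};

In the view, it would be used like any built-in handler, for example <span data-bind="labelledText: someObservable"></span>.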
The update function that you can add to a BindingHandler has the exact same signature as the init function; hence, it is called an update. Let's add the following code underneath the load call: ko.bindingHandlers.lineChart = {     init: function (element, valueAccessor, allValueAccessors, viewModel, bindingContext) {     } }; This is the core structure of a BindingHandler. As you can see, we've named the BindingHandler as lineChart. This is the name we will use in our view later on. The signature of init and update are the same. The first parameter represents the element that holds the binding expression, whereas the second valueAccessor parameter holds a function that enables us to access the value, which is a result of the expression. KnockoutJS deals with the expression internally and parses any expression and figures out how to expand any values, and so on. Add the following code into the init function: optionsInput = valueAccessor();   var options = {     title: optionsInput.title,     width: optionsInput.width || 300,     height: optionsInput.height || 300,     backgroundColor: 'transparent',     animation: {         duration: 1000,         easing: 'out'     } };   var dataHash = {};   var chart = new google.visualization.LineChart(element); var data = new google.visualization.DataTable(); data.addColumn('string', 'x'); data.addColumn('number', 'y');   function addRow(row, rowIndex) {     var value = row[1];     if (ko.isObservable(value)) {         value.subscribe(function (newValue) {             data.setValue(rowIndex, 1, newValue);             chart.draw(data, options);         });     }       var actualValue = ko.unwrap(value);     data.addRow([row[0], actualValue]);       dataHash[row[0]] = actualValue; };   optionsInput.data().forEach(addRow);   optionsInput.data.subscribe(function (newValue) {     newValue.forEach(function(row, rowIndex) {         if( !dataHash.hasOwnProperty(row[0])) {             addRow(row,rowIndex);         }     });       chart.draw(data, options); });         chart.draw(data, options); As you can see, observables has a function called subscribe(), which is the same for both an observable array and a regular observable. The code adds a subscription to the array itself; if there is any change to the array, we will find the change and add any new row to the chart. In addition, when we create a new row, we subscribe to any change in its value so that we can update the chart. In the ViewModel, the values were converted into observable values to accommodate this. View Go back to the index.html file; we need the UI for the two charts we're going to have. Plus, we need to get both the new BindingHandler loaded and also the ViewModel. Add the following script references after the last script reference already present, as shown here: <script type="text/javascript" src="googleCharts.js"></script> <script type="text/javascript" src="index.js"></script> Inside the <body> tag below the header, we want to add a bootstrap container and a row to hold two metro styled tiles and utilize our new BindingHandler. 
Also, we want a footer sitting at the bottom, as shown in the following code: <div class="container">     <div class="row">         <div class="col-sm-6 col-md-4">             <div class="thumbnail tile tile-green-sea tile-large">                 <div data-bind="lineChart: { title: 'Web Requests', width: 300, height: 300, data: requests }"></div>             </div>         </div>           <div class="col-sm-6 col-md-4">             <div class="thumbnail tile tile-pomegranate tile- large">                 <div data-bind="lineChart: { title: 'Failed Web Requests', width: 300, height: 300, data: failedRequests }"></div>             </div>         </div>     </div>       <hr />     <footer class="bs-footer" role="contentinfo">         <div class="container">             The Dashboard         </div>     </footer> </div> Note the data: requests and data: failedRequests are a part of the binding expressions. These will be handled and resolved by KnockoutJS internally and pointed to the observable arrays on the ViewModel. The other properties are options that go into the BindingHandler and something it forwards to the Google Charting APIs. Trying it all out Running the preceding code (Ctrl + F5) should yield the following result: If you open a second browser and go to the same URL, you will see the change in the chart in real time. Waiting approximately for 30 seconds and refreshing the browser should add a second point automatically and also animate the chart accordingly. Typing a URL with a file that does exist should have the same effect on the failed requests chart. Summary In this article, we had a brief encounter with MVVM as a pattern with the sole purpose of establishing good practices for your client code. We added this to a single page application setting, sprinkling on top the SignalR to communicate from the server to any connected client. Resources for Article: Further resources on this subject: Using R for Statistics Research and Graphics? [article] Aspects Data Manipulation in R [article] Learning Data Analytics R and Hadoop [article]