Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7009 Articles
article-image-hyper-v-architecture-and-components
Packt
04 Jan 2017
15 min read
Save for later

Hyper-V Architecture and Components

Packt
04 Jan 2017
15 min read
In this article by Charbel Nemnom and Patrick Lownds, the author of the book Windows Server 2016 Hyper-V Cookbook, Second Edition, we will see Hyper-V architecture along with the most important components in Hyper-V and also differences between Windows Server 2016 Hyper-V, Nano Server, Hyper-V Server, Hyper-V Client, and VMware. Virtualization is not a new feature or technology that everyone decided to have in their environment overnight. Actually, it's quite old. There are a couple of computers in the mid-60s that were using virtualization already, such as the IBM M44/44X, where you could run multiple VMs using hardware and software abstraction. It is known as the first virtualization system and the creation of the term virtual machine. Although Hyper-V is in its fifth version, Microsoft virtualization technology is very mature. Everything started in 1988 with a company named Connectix. It had innovative products such as Connectix Virtual PC and Virtual Server, an x86 software emulation for Mac, Windows, and OS/2. In 2003, Microsoft acquired Connectix and a year later released Microsoft Virtual PC and Microsoft Virtual Server 2005. After lots of improvements in the architecture during the project Viridian, Microsoft released Hyper-V in 2008, the second version in 2009 (Windows Server 2008 R2), the third version in 2012 (Windows Server 2012), a year later in 2013 the fourth version was released (Windows Server 2012 R2), the current and fifth version in 2016 (Windows Server 2016). In the past years, Microsoft has proven that Hyper-V is a strong and competitive solution for server virtualization and provides scalability, flexible infrastructure, high availability, and resiliency. To better understand the different virtualization models, and how the VMs are created and managed by Hyper-V, it is very important to know its core, architecture, and components. By doing so, you will understand how it works, you can compare with other solutions, and troubleshoot problems easily. Microsoft has long told customers that Azure datacenters are powered by Microsoft Hyper-V, and the forthcoming Azure Stack will actually allow us to run Azure in our own datacenters on top of Windows Server 2016 Hyper-V as well. For more information about Azure Stack, please refer to the following link: https://azure.microsoft.com/en-us/overview/azure-stack/ Microsoft Hyper-V proves over the years that it's a very scalable platform to virtualize any and every workload without exception. This appendix includes well-explained topics with the most important Hyper-V architecture components compared with other versions. (For more resources related to this topic, see here.) Understanding Hypervisors The Virtual Machine Manager (VMM), also known as Hypervisor, is the software application responsible for running multiple VMs in a single system. It is also responsible for creation, preservation, division, system access, and VM management running on the Hypervisor layer. These are the types of Hypervisors: VMM Type 2 VMM Hybrid VMM Type 1 VMM Type 2 This type runs Hypervisor on top of an OS, as shown in the following diagram, we have the hardware at the bottom, the OS and then the Hypervisor running on top. Microsoft Virtual PC and VMware Workstation is an example of software that uses VMM Type 2. VMs pass hardware requests to the Hypervisor, to the host OS, and finally reaching the hardware. That leads to performance and management limitation imposed by the host OS. Type 2 is common for test environments—VMs with hardware restrictions—to run on software applications that are installed in the host OS. VMM Hybrid When using the VMM Hybrid type, the Hypervisor runs on the same level as the OS, as shown in the following diagram. As both Hypervisor and the OS are sharing the same access to the hardware with the same priority, it is not as fast and safe as it could be. This is the type used by the Hyper-V predecessor named Microsoft Virtual Server 2005: VMM Type 1 VMM Type 1 is a type that has the Hypervisor running in a tiny software layer between the hardware and the partitions, managing and orchestrating the hardware access. The host OS, known as Parent Partition, run on the same level as the Child Partition, known as VMs, as shown in the next diagram. Due to the privileged access that the Hypervisor has on the hardware, it provides more security, performance, and control over the partitions. This is the type used by Hyper-V since its first release: Hyper-V architecture Knowing how Hyper-V works and how its architecture is constructed will make it easier to understand its concepts and operations. The following sections will explore the most important components in Hyper-V. Windows before Hyper-V Before we dive into the Hyper-V architecture details, it will be easy to understand what happens after Hyper-V is installed, by looking at Windows without Hyper-V, as shown in the following diagram: In a normal Windows installation, the instructions access is divided by four privileged levels in the processor called Rings. The most privileged level is Ring 0, with direct access to the hardware and where the Windows Kernel sits. Ring 3 is responsible for hosting the user level, where most common applications run and with the least privileged access. Windows after Hyper-V When Hyper-V is installed, it needs a higher privilege than Ring 0. Also, it must have dedicated access to the hardware. This is possible due to the capabilities of the new processor created by Intel and AMD, called Intel-VT and AMD-V respectively, that allows the creation of a fifth ring called Ring -1. Hyper-V uses this ring to add its Hypervisor, having a higher privilege and running under Ring 0, controlling all the access to the physical components, as shown in the following diagram: The OS architecture suffers several changes after Hyper-V installation. Right after the first boot, the Operating System Boot Loader file (winload.exe) checks the processor that is being used and loads the Hypervisor image on Ring -1 (using the files Hvix64.exe for Intel processors and Hvax64.exe for AMD processors). Then, Windows Server is initiated running on top of the Hypervisor and every VM that runs beside it. After Hyper-V installation, Windows Server has the same privilege level as a VM and is responsible for managing VMs using several components. Differences between Windows Server 2016 Hyper-V, Nano Server, Hyper-V Server, Hyper-V Client, and VMware There are four different versions of Hyper-V—the role that is installed on Windows Server 2016 (Core or Full Server), the role that can be installed on a Nano Server, its free version called Hyper-V Server and the Hyper-V that comes in Windows 10 called Hyper-V Client. The following sections will explain the differences between all the versions and a comparison between Hyper-V and its competitor, VMware. Windows Server 2016 Hyper-V Hyper-V is one of the most fascinating and improved role on Windows Server 2016. Its fifth version goes beyond virtualization and helps us deliver the correct infrastructure to host your cloud environment. Hyper-V can be installed as a role in both Windows Server Standard and Datacenter editions. The only difference in Windows Server 2012 and 2012 R2 in the Standard edition, two free Windows Server OSes are licensed whereas there are unlimited licenses in the Datacenter edition. However, in Windows Server 2016 there are significant changes between the two editions. The following table will show the difference between Windows Server 2016 Standard and Datacenter editions: Resource Windows Server 2016 Datacenter edition Windows Server 2016 Standard edition Core functionality of Windows Server Yes Yes OSes/Hyper-V Containers Unlimited 2 Windows Server Containers Unlimited Unlimited Nano Server Yes Yes Storage features for software-defined datacenter including Storage Spaces Direct and Storage Replica Yes N/A Shielded VMs Yes N/A Networking stack for software-defined datacenter Yes N/A Licensing Model Core + CAL Core + CAL As you can see in preceding table, the Datacenter edition is designed for highly virtualized private and hybrid cloud environments and Standard edition is for low density or non-virtualized (physical) environments. In Windows Server 2016, Microsoft is also changing the licensing model from a per-processor to per-core licensing for Standard and Datacenter editions. The following points will guide you in order to license Windows Server 2016 Standard and Datacenter edition: All physical cores in the server must be licensed. In other words, servers are licensed based on the number of processor cores in the physical server. You need a minimum of 16 core licenses for each server. You need a minimum of 8 core licenses for each physical processor. The core licenses will be sold in packs of two. Eight 2-core packs will be the minimum required to license each physical server. The 2-core pack for each edition is one-eighth the price of a 2-processor license for corresponding Windows Server 2012 R2 editions. The Standard edition provides rights for up to two OSEs or Hyper-V containers when all physical cores in the server are licensed. For every two additional VMs, all the cores in the server have to be licensed again. The price of 16-core licenses of Windows Server 2016 Datacenter and Standard edition will be the same price as the 2-processor license of the corresponding editions of the Windows Server 2012 R2 version. Existing customers' servers under Software Assurance agreement will receive core grants as required, with documentation. The following table illustrates the new licensing model based on number of 2-core pack licenses: Legend: Gray cells represent licensing costs White cells represent additional licensing is required Windows Server 2016 Standard edition may need additional licensing. Nano Server Nano Server is a new headless, 64-bit only installation option that installs "just enough OS" resulting in a dramatically smaller footprint that results in more uptime and a smaller attack surface. Users can choose to add server roles as needed, including Hyper-V, Scale out File Server, DNS Server and IIS server roles. User can also choose to install features, including Container support, Defender, Clustering, Desired State Configuration (DSC), and Shielded VM support. Nano Server is available in Windows Server 2016 for: Physical Machines Virtual Machines Hyper-V Containers Windows Server Containers Supports the following inbox optional roles and features: Hyper-V, including container and shielded VM support Datacenter Bridging Defender DNS Server Desired State Configuration Clustering IIS Network Performance Diagnostics Service (NPDS) System Center Virtual Machine Manager and System Center Operations Manager Secure Startup Scale out File Server, including Storage Replica, MPIO, iSCSI initiator, Data Deduplication The Windows Server 2016 Hyper-V role can be installed on a Nano Server; this is a key Nano Server role, shrinking the OS footprint and minimizing reboots required when Hyper-V is used to run virtualization hosts. Nano server can be clustered, including Hyper-V failover clusters. Hyper-V works the same on Nano Server including all features does in Windows Server 2016, aside from a few caveats: All management must be performed remotely, using another Windows Server 2016 computer. Remote management consoles such as Hyper-V Manager, Failover Cluster Manager, PowerShell remoting, and management tools like System Center Virtual Machine Manager as well as the new Azure web-based Server Management Tool (SMT) can all be used to manage a Nano Server environment. RemoteFX is not available. Microsoft Hyper-V Server 2016 Hyper-V Server 2016, the free virtualization solution from Microsoft has all the features included on Windows Server 2016 Hyper-V. The only difference is that Microsoft Hyper-V Server does not include VM licenses and a graphical interface. The management can be done remotely using PowerShell, Hyper-V Manager from another Windows Server 2016 or Windows 10. All the other Hyper-V features and limits in Windows Server 2016, including Failover Cluster, Shared Nothing Live Migration, RemoteFX, Discrete Device Assignment and Hyper-V Replica are included in the Hyper-V free version. Hyper-V Client In Windows 8, Microsoft introduced the first Hyper-V Client version. Its third version now with Windows 10. Users can have the same experience from Windows Server 2016 Hyper-V on their desktops or tablet, making their test and development virtualized scenarios much easier. Hyper-V Client in Windows 10 goes beyond only virtualization and helps Windows developers to use containers by bringing Hyper-V Containers natively into Windows 10. This will further empower developers to build amazing cloud applications benefiting from native container capabilities right in Windows. Since Hyper-V Containers utilize their own instance of the Windows kernel, the container is truly a server container all the way down to the kernel. Plus, with the flexibility of Windows container runtimes (Windows Server Containers or Hyper-V Containers), containers built on Windows 10 can be run on Windows Server 2016 as either Windows Server Containers or Hyper-V Containers. Because Windows 10 only supports Hyper-V containers, the Hyper-V feature must also be enabled. Hyper-V Client is present only in the Windows 10 Pro or Enterprise version and requires the same CPU feature as in Windows Server 2016 called Second Level Address Translation (SLAT). Although Hyper-V client is very similar to the server version, there are some components that are only present on Windows Server 2016 Hyper-V. Here is a list of components you will find only on the server version: Hyper-V Replica Remote FX capability to virtualize GPUs Discrete Device Assignment (DDA) Live Migration and Shared Nothing Live Migration ReFS Accelerated VHDX Operations SR-IOV Networks Remote Direct Memory Access (RDMA) and Switch Embedded Teaming (SET) Virtual Fibre Channel Network Virtualization Failover Clustering Shielded VMs VM Monitoring Even with these limitations, Hyper-V Client has very interesting features such as Storage Migration, VHDX, VMs running on SMB 3.1 File Shares, PowerShell integration, Hyper-V Manager, Hyper-V Extensible Switch, Quality of Services, Production Checkpoints, the same VM hardware limits as Windows Server 2016 Hyper-V, Dynamic Memory, Runtime Memory Resize, Nested Virtualization, DHCP Guard, Port Mirroring, NIC Device Naming and much more. In Windows 8, Microsoft introduced the first Hyper-V Client version. Its third version now with Windows 10. Users can have the same experience from Windows Server 2016 Hyper-V on their desktops or tablet, making their test and development virtualized scenarios much easier. Hyper-V Client in Windows 10 goes beyond only virtualization and helps Windows developers to use containers by bringing Hyper-V Containers natively into Windows 10. This will further empower developers to build amazing cloud applications benefiting from native container capabilities right in Windows. Since Hyper-V Containers utilize their own instance of the Windows kernel, the container is truly a server container all the way down to the kernel. Plus, with the flexibility of Windows container runtimes (Windows Server Containers or Hyper-V Containers), containers built on Windows 10 can be run on Windows Server 2016 as either Windows Server Containers or Hyper-V Containers. Because Windows 10 only supports Hyper-V containers, the Hyper-V feature must also be enabled. Hyper-V Client is present only in the Windows 10 Pro or Enterprise version and requires the same CPU feature as in Windows Server 2016 called Second Level Address Translation (SLAT). Although Hyper-V client is very similar to the server version, there are some components that are only present on Windows Server 2016 Hyper-V. Here is a list of components you will find only on the server version: Hyper-V Replica Remote FX capability to virtualize GPUs Discrete Device Assignment (DDA) Live Migration and Shared Nothing Live Migration ReFS Accelerated VHDX Operations SR-IOV Networks Remote Direct Memory Access (RDMA) and Switch Embedded Teaming (SET) Virtual Fibre Channel Network Virtualization Failover Clustering Shielded VMs VM Monitoring Even with these limitations, Hyper-V Client has very interesting features such as Storage Migration, VHDX, VMs running on SMB 3.1 File Shares, PowerShell integration, Hyper-V Manager, Hyper-V Extensible Switch, Quality of Services, Production Checkpoints, the same VM hardware limits as Windows Server 2016 Hyper-V, Dynamic Memory, Runtime Memory Resize, Nested Virtualization, DHCP Guard, Port Mirroring, NIC Device Naming and much more. Windows Server 2016 Hyper-V X VMware vSphere 6.0 VMware is the existing competitor of Hyper-V and the current version 6.0 offers the VMware vSphere as a free and a standalone Hypervisor, vSphere Standard, Enterprise, and Enterprise Plus. The following list compares all the features existing in the free version of Hyper-V with VMware Sphere and Enterprise Plus: Feature Windows Server 2012 R2 Windows Server 2016 VMware vSphere 6.0 VMware vSphere 6.0 Enterprise Plus Logical Processors 320 512 480 480 Physical Memory 4TB 24TB 6TB 6TB/12TB Virtual CPU per Host 2,048 2,048 4,096 4,096 Virtual CPU per VM 64 240 8 128 Memory per VM 1TB 12TB 4TB 4TB Active VMs per Host 1,024 1,024 1,024 1,024 Guest NUMA Yes Yes Yes Yes Maximum Nodes 64 64 N/A 64 Maximum VMs per Cluster 8,000 8,000 N/A 8,000 VM Live Migration Yes Yes No Yes VM Live Migration with Compression Yes Yes N/A No VM Live Migration using RDMA Yes Yes N/A No 1GB Simultaneous Live Migrations Unlimited Unlimited N/A 4 10GB Simultaneous Live Migrations Unlimited Unlimited N/A 8 Live Storage Migration Yes Yes No Yes Shared Nothing Live Migration Yes Yes No Yes Cluster Rolling Upgrades Yes Yes N/A Yes VM Replica Hot/Add virtual Disk Yes Yes Yes Yes Native 4-KB Disk Support Yes Yes No No Maximum Virtual Disk Size 64TB 64TB 2TB 62TB Maximum Pass Through Disk Size 256TB or more 256TB or more 64TB 64TB Extensible Network Switch Yes Yes No Third party vendors   Network Virtualization Yes Yes No Requires vCloud networking and security IPsec Task Offload Yes Yes No No SR-IOV Yes Yes N/A Yes Virtual NICs per VM 12 12 10 10 VM NIC Device Naming No Yes N/A No Guest OS Application Monitoring Yes Yes No No Guest Clustering with Live Migration Yes Yes N/A No Guest Clustering with Dynamic Memory Yes Yes N/A No Shielded VMs No Yes N/A No Summary In this article, we have covered Hyper-V architecture along with the most important components in Hyper-V and also differences between Windows Server 2016 Hyper-V, Nano Server, Hyper-V Server, Hyper-V Client, and VMware. Resources for Article: Further resources on this subject: Storage Practices and Migration to Hyper-V 2016 [article] Proxmox VE Fundamentals [article] Designing and Building a vRealize Automation 6.2 Infrastructure [article]
Read more
  • 0
  • 0
  • 33224

article-image-task-parallel-library-multi-threading-net-core
Aaron Lazar
16 Aug 2018
11 min read
Save for later

Task parallel library for easy multi-threading in .NET Core [Tutorial]

Aaron Lazar
16 Aug 2018
11 min read
Compared to the classic threading model in .NET, Task Parallel Library minimizes the complexity of using threads and provides an abstraction through a set of APIs that help developers focus more on the application program instead of focusing on how the threads will be provisioned. In this article, we'll learn how TPL benefits of using traditional threading techniques for concurrency and high performance. There are several benefits of using TPL over threads: It autoscales the concurrency to a multicore level It autoscales LINQ queries to a multicore level It handles the partitioning of the work and uses ThreadPool where required It is easy to use and reduces the complexity of working with threads directly This tutorial is an extract from the book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Creating a task using TPL TPL APIs are available in the System.Threading and System.Threading.Tasks namespaces. They work around the task, which is a program or a block of code that runs asynchronously. An asynchronous task can be run by calling either the Task.Run or TaskFactory.StartNew methods. When we create a task, we provide a named delegate, anonymous method, or a lambda expression that the task executes. Here is a code snippet that uses a lambda expression to execute the ExecuteLongRunningTasksmethod using Task.Run: class Program { static void Main(string[] args) { Task t = Task.Run(()=>ExecuteLongRunningTask(5000)); t.Wait(); } public static void ExecuteLongRunningTask(int millis) { Thread.Sleep(millis); Console.WriteLine("Hello World"); } } In the preceding code snippet, we have executed the ExecuteLongRunningTask method asynchronously using the Task.Run method. The Task.Run method returns the Task object that can be used to further wait for the asynchronous piece of code to be executed completely before the program ends. To wait for the task, we have used the Wait method. Alternatively, we can also use the Task.Factory.StartNew method, which is more advanced and provides more options. While calling the Task.Factory.StartNew method, we can specify CancellationToken, TaskCreationOptions, and TaskScheduler to set the state, specify other options, and schedule tasks. TPL uses multiple cores of the CPU out of the box. When the task is executed using the TPL API, it automatically splits the task into one or more threads and utilizes multiple processors, if they are available. The decision as to how many threads will be created is calculated at runtime by CLR. Whereas a thread only has an affinity to a single processor, running any task on multiple processors needs a proper manual implementation. Task-based asynchronous pattern (TAP) When developing any software, it is always good to implement the best practices while designing its architecture. The task-based asynchronous pattern is one of the recommended patterns that can be used when working with TPL. There are, however, a few things to bear in mind while implementing TAP. Naming convention The method executing asynchronously should have the naming suffix Async. For example, if the method name starts with ExecuteLongRunningOperation, it should have the suffix Async, with the resulting name of ExecuteLongRunningOperationAsync. Return type The method signature should return either a System.Threading.Tasks.Task or System.Threading.Tasks.Task<TResult>. The task's return type is equivalent to the method that returns void, whereas TResult is the data type. Parameters The out and ref parameters are not allowed as parameters in the method signature. If multiple values need to be returned, tuples or a custom data structure can be used. The method should always return Task or Task<TResult>, as discussed previously. Here are a few signatures for both synchronous and asynchronous methods: Synchronous methodAsynchronous methodVoid Execute();Task ExecuteAsync();List<string> GetCountries();Task<List<string>> GetCountriesAsync();Tuple<int, string> GetState(int stateID);Task<Tuple<int, string>> GetStateAsync(int stateID);Person GetPerson(int personID);Task<Person> GetPersonAsync(int personID); Exceptions The asynchronous method should always throw exceptions that are assigned to the returning task. However, the usage errors, such as passing null parameters to the asynchronous method, should be properly handled. Let's suppose we want to generate several documents dynamically based on a predefined templates list, where each template populates the placeholders with dynamic values and writes it on the filesystem. We assume that this operation will take a sufficient amount of time to generate a document for each template. Here is a code snippet showing how the exceptions can be handled: static void Main(string[] args) { List<Template> templates = GetTemplates(); IEnumerable<Task> asyncDocs = from template in templates select GenerateDocumentAsync(template); try { Task.WaitAll(asyncDocs.ToArray()); }catch(Exception ex) { Console.WriteLine(ex); } Console.Read(); } private static async Task<int> GenerateDocumentAsync(Template template) { //To automate long running operation Thread.Sleep(3000); //Throwing exception intentionally throw new Exception(); } In the preceding code, we have a GenerateDocumentAsync method that performs a long running operation, such as reading the template from the database, populating placeholders, and writing a document to the filesystem. To automate this process, we used Thread.Sleep to sleep the thread for three seconds and then throw an exception that will be propagated to the calling method. The Main method loops the templates list and calls the GenerateDocumentAsync method for each template. Each GenerateDocumentAsync method returns a task. When calling an asynchronous method, the exception is actually hidden until the Wait, WaitAll, WhenAll, and other methods are called. In the preceding example, the exception will be thrown once the Task.WaitAll method is called, and will log the exception on the console. Task status The task object provides a TaskStatus that is used to know whether the task is executing the method running, has completed the method, has encountered a fault, or whether some other occurrence has taken place. The task initialized using Task.Run initially has the status of Created, but when the Start method is called, its status is changed to Running. When applying the TAP pattern, all the methods return the Task object, and whether they are using the Task.Run inside, the method body should be activated. That means that the status should be anything other than Created. The TAP pattern ensures the consumer that the task is activated and the starting task is not required. Task cancellation Cancellation is an optional thing for TAP-based asynchronous methods. If the method accepts the CancellationToken as the parameter, it can be used by the caller party to cancel a task. However, for a TAP, the cancellation should be properly handled. Here is a basic example showing how cancellation can be implemented: static void Main(string[] args) { CancellationTokenSource tokenSource = new CancellationTokenSource(); CancellationToken token = tokenSource.Token; Task.Factory.StartNew(() => SaveFileAsync(path, bytes, token)); } static Task<int> SaveFileAsync(string path, byte[] fileBytes, CancellationToken cancellationToken) { if (cancellationToken.IsCancellationRequested) { Console.WriteLine("Cancellation is requested..."); cancellationToken.ThrowIfCancellationRequested } //Do some file save operation File.WriteAllBytes(path, fileBytes); return Task.FromResult<int>(0); } In the preceding code, we have a SaveFileAsync method that takes the byte array and the CancellationToken as parameters. In the Main method, we initialize the CancellationTokenSource that can be used to cancel the asynchronous operation later in the program. To test the cancellation scenario, we will just call the Cancel method of the tokenSource after the Task.Factory.StartNew method and the operation will be canceled. Moreover, when the task is canceled, its status is set to Cancelled and the IsCompleted property is set to true. Task progress reporting With TPL, we can use the IProgress<T> interface to get real-time progress notifications from the asynchronous operations. This can be used in scenarios where we need to update the user interface or the console app of asynchronous operations. When defining the TAP-based asynchronous methods, defining IProgress<T> in a parameter is optional. We can have overloaded methods that can help consumers to use in the case of specific needs. However, they should only be used if the asynchronous method supports them.  Here is the modified version of SaveFileAsync that updates the user about the real progress: static void Main(string[] args) { var progressHandler = new Progress<string>(value => { Console.WriteLine(value); }); var progress = progressHandler as IProgress<string>; CancellationTokenSource tokenSource = new CancellationTokenSource(); CancellationToken token = tokenSource.Token; Task.Factory.StartNew(() => SaveFileAsync(path, bytes, token, progress)); Console.Read(); } static Task<int> SaveFileAsync(string path, byte[] fileBytes, CancellationToken cancellationToken, IProgress<string> progress) { if (cancellationToken.IsCancellationRequested) { progress.Report("Cancellation is called"); Console.WriteLine("Cancellation is requested..."); } progress.Report("Saving File"); File.WriteAllBytes(path, fileBytes); progress.Report("File Saved"); return Task.FromResult<int>(0); } Implementing TAP using compilers Any method that is attributed with the async keyword (for C#) or Async for (Visual Basic) is called an asynchronous method. The async keyword can be applied to a method, anonymous method, or a Lambda expression, and the language compiler can execute that task asynchronously. Here is a simple implementation of the TAP method using the compiler approach: static void Main(string[] args) { var t = ExecuteLongRunningOperationAsync(100000); Console.WriteLine("Called ExecuteLongRunningOperationAsync method, now waiting for it to complete"); t.Wait(); Console.Read(); } public static async Task<int> ExecuteLongRunningOperationAsync(int millis) { Task t = Task.Factory.StartNew(() => RunLoopAsync(millis)); await t; Console.WriteLine("Executed RunLoopAsync method"); return 0; } public static void RunLoopAsync(int millis) { Console.WriteLine("Inside RunLoopAsync method"); for(int i=0;i< millis; i++) { Debug.WriteLine($"Counter = {i}"); } Console.WriteLine("Exiting RunLoopAsync method"); } In the preceding code, we have the ExecuteLongRunningOperationAsync method, which is implemented as per the compiler approach. It calls the RunLoopAsync that executes a loop for a certain number of milliseconds that is passed in the parameter. The async keyword on the ExecuteLongRunningOperationAsync method actually tells the compiler that this method has to be executed asynchronously, and, once the await statement is reached, the method returns to the Main method that writes the line on a console and waits for the task to be completed. Once the RunLoopAsync is executed, the control comes back to await and starts executing the next statements in the ExecuteLongRunningOperationAsync method. Implementing TAP with greater control over Task As we know, that the TPL is centered on the Task and Task<TResult> objects. We can execute an asynchronous task by calling the Task.Run method and execute a delegate method or a block of code asynchronously and use Wait or other methods on that task. However, this approach is not always adequate, and there are scenarios where we may have different approaches to executing asynchronous operations, and we may use an Event-based Asynchronous Pattern (EAP) or an Asynchronous Programming Model (APM). To implement TAP principles here, and to get the same control over asynchronous operations executing with different models, we can use the TaskCompletionSource<TResult> object. The TaskCompletionSource<TResult> object is used to create a task that executes an asynchronous operation. When the asynchronous operation completes, we can use the TaskCompletionSource<TResult> object to set the result, exception, or state of the task. Here is a basic example that executes the ExecuteTask method that returns Task, where the ExecuteTask method uses the TaskCompletionSource<TResult> object to wrap the response as a Task and executes the ExecuteLongRunningTask through the Task.StartNew method: static void Main(string[] args) { var t = ExecuteTask(); t.Wait(); Console.Read(); } public static Task<int> ExecuteTask() { var tcs = new TaskCompletionSource<int>(); Task<int> t1 = tcs.Task; Task.Factory.StartNew(() => { try { ExecuteLongRunningTask(10000); tcs.SetResult(1); }catch(Exception ex) { tcs.SetException(ex); } }); return tcs.Task; } public static void ExecuteLongRunningTask(int millis) { Thread.Sleep(millis); Console.WriteLine("Executed"); } So now, we've been able to use TPL and TAP over traditional threads, thus improving performance. If you liked this article and would like to learn more such techniques, pick up this book, C# 7 and .NET Core 2.0 High Performance, authored by Ovais Mehboob Ahmed Khan. Get to know ASP.NET Core Web API [Tutorial] .NET Core completes move to the new compiler – RyuJIT Applying Single Responsibility principle from SOLID in .NET Core
Read more
  • 0
  • 0
  • 33211

article-image-why-choose-opencv-over-matlab-for-your-next-computer-vision-project
Vincy Davis
20 Dec 2019
6 min read
Save for later

Why choose OpenCV over MATLAB for your next Computer Vision project

Vincy Davis
20 Dec 2019
6 min read
Scientific Computing relies on executing computer algorithms coded in different programming languages. One such interdisciplinary scientific field is the study of Computer Vision, often abbreviated as CV. Computer Vision is used to develop techniques that can automate tasks like acquiring, processing, analyzing and understanding digital images. It is also utilized for extracting high-dimensional data from the real world to produce symbolic information. In simple words, Computer Vision gives computers the ability to see, understand and process images and videos like humans. The vast advances in hardware, machine learning tools, and frameworks have resulted in the implementation of Computer Vision in various fields like IoT, manufacturing, healthcare, security, etc. Major tech firms like Amazon, Google, Microsoft, and Facebook are investing immensely in the research and development of this field. Out of the many tools and libraries available for Computer Vision nowadays, there are two major tools OpenCV and Matlab that stand out in terms of their speed and efficiency. In this article, we will have a detailed look at both of them. Further Reading [box type="shadow" align="" class="" width=""]To learn how to build interesting image recognition models like setting up license plate recognition using OpenCV, read the book “Computer Vision Projects with OpenCV and Python 3” by author Matthew Rever. The book will also guide you to design and develop production-grade Computer Vision projects by tackling real-world problems.[/box] OpenCV: An open-source multiplatform solution tailored for Computer Vision OpenCV, developed by Intel and now supported by Willow Garage, is released under the BSD 3-Clause license and is free for commercial use. It is one of the most popular computer vision tools aimed at providing a well-optimized, well tested, and open-source (C++)-based implementation for computer vision algorithms. The open-source library has interfaces for multiple languages like C++, Python, and Java and supports Linux, macOS, Windows, iOS, and Android. Many of its functions are implemented on GPU. The first stable release of OpenCV version 1.0 was in the year 2006. The OpenCV community has grown rapidly ever since and with its latest release, OpenCV version 4.1.1, it also brings improvements in the dnn (Deep Neural Networks) module, which is a popular module in the library that implements forward pass (inferencing) with deep networks, which are pre-trained using popular deep learning frameworks.  Some of the features offered by OpenCV include: imread function to read the images in the BGR (Blue-Green-Red) format by default. Easy up and downscaling for resizing an image. Supports various interpolation and downsampling methods like INTER_NEAREST to represent the nearest neighbor interpolation. Supports multiple variations of thresholding like adaptive thresholding, bitwise operations, edge detection, image filtering, image contours, and more. Enables image segmentation (Watershed Algorithm) to classify each pixel in an image to a particular class of background and foreground. Enables multiple feature-matching algorithms, like brute force matching, knn feature matching, among others. With its active community and regular updates for Machine Learning, OpenCV is only going to grow by leaps and bounds in the field of Computer Vision projects.  MATLAB: A licensed quick prototyping tool with OpenCV integration One disadvantage of OpenCV, which makes novice computer vision users tilt towards Matlab is the former's complex nature. OpenCV is comparatively harder to learn due to lack of documentation and error handling codes. Matlab, developed by MathWorks is a proprietary programming language with a multi-paradigm numerical computing environment. It has over 3 million users worldwide and is considered one of the easiest and most productive software for engineers and scientists. It has a very powerful and swift matrix library.  Matlab also works in integration with OpenCV. This enables MATLAB users to explore, analyze, and debug designs that incorporate OpenCV algorithms. The support package of MATLAB includes the data type conversions necessary for MATLAB and OpenCV. MathWorks provided Computer Vision Toolbox renders algorithms, functions, and apps for designing and testing computer vision, 3D vision, and video processing systems. It also allows detection, tracking, feature extraction, and matching of objects. Matlab can also train custom object detectors using deep learning and machine learning algorithms such as YOLO v2, Faster R-CNN, and ACF. Most of the toolbox algorithms in Matlab support C/C++ code generation for integrating with existing code, desktop prototyping, and embedded vision system deployment. However, Matlab does not contain as many functions for computer vision as OpenCV, which has more of its functions implemented on GPU. Another issue with Matlab is that it's not open-source, it’s license is costly and the programs are not portable.  Another important factor which matters a lot in computer vision is the performance of a code, especially when working on real-time video processing.  Which has a faster execution time? OpenCV or Matlab? Along with Computer Vision, other fields also require faster execution while choosing a programming language or library for implementing any function. This factor is analyzed in detail in a paper titled “Matlab vs. OpenCV: A Comparative Study of Different Machine Learning Algorithms”.  The paper provides a very practical comparative study between Matlab and OpenCV using 20 different real datasets. The differentiation is based on the execution time for various machine learning algorithms like Classification and Regression Trees (CART), Naive Bayes, Boosting, Random Forest and K-Nearest Neighbor (KNN). The experiments were run on an Intel core 2 duo P7450 machine, with 3GB RAM, and Ubuntu 11.04 32-bit operating system on Matlab version 7.12.0.635 (R2011a), and OpenCV C++ version 2.1.  The paper states, “To compare the speed of Matlab and OpenCV for a particular machine learning algorithm, we run the algorithm 1000 times and take the average of the execution times. Averaging over 1000 experiments is more than necessary since convergence is reached after a few hundred.” The outcome of all the experiments revealed that though Matlab is a successful scientific computing environment, it is outrun by OpenCV for almost all the experiments when their execution time is considered. The paper points out that this could be due to a combination of a number of dimensionalities, sample size, and the use of training sets. One of the listed machine learning algorithms KNN produced a log time ratio of 0.8 and 0.9 on datasets D16 and D17 respectively.  Clearly, Matlab is great for exploring and fiddling with computer vision concepts as researchers and students at universities that can afford the software. However, when it comes to building production-ready real-world computer vision projects, OpenCV beats Matlab hand down. You can learn about building more Computer Vision projects like human pose estimation using TensorFlow from our book ‘Computer Vision Projects with OpenCV and Python 3’. Master the art of face swapping with OpenCV and Python by Sylwek Brzęczkowski, developer at TrustStamp NVIDIA releases Kaolin, a PyTorch library to accelerate research in 3D computer vision and AI Generating automated image captions using NLP and computer vision [Tutorial] Computer vision is growing quickly. Here’s why. Introducing Intel’s OpenVINO computer vision toolkit for edge computing
Read more
  • 0
  • 0
  • 33178

article-image-anatomy-wordpress-plugin
Packt
25 Mar 2011
7 min read
Save for later

Anatomy of a WordPress Plugin

Packt
25 Mar 2011
7 min read
  WordPress 3 Plugin Development Essentials Create your own powerful, interactive plugins to extend and add features to your WordPress site         Read more about this book       WordPress is a popular content management system (CMS), most renowned for its use as a blogging / publishing application. According to usage statistics tracker, BuiltWith (http://builtWith.com), WordPress is considered to be the most popular blogging software on the planet—not bad for something that has only been around officially since 2003. Before we develop any substantial plugins of our own, let's take a few moments to look at what other people have done, so we get an idea of what the final product might look like. By this point, you should have a fresh version of WordPress installed and running somewhere for you to play with. It is important that your installation of WordPress is one with which you can tinker. In this article by Brian Bondari and Everett Griffiths, authors of WordPress 3 Plugin Development Essentials, we will purposely break a few things to help see how they work, so please don't try anything in this article on a live production site. Deconstructing an existing plugin: "Hello Dolly" WordPress ships with a simple plugin named "Hello Dolly". Its name is a whimsical take on the programmer's obligatory "Hello, World!", and it is trotted out only for pedantic explanations like the one that follows (unless, of course, you really do want random lyrics by Jerry Herman to grace your administration screens). Activating the plugin Let's activate this plugin so we can have a look at what it does: Browse to your WordPress Dashboard at http://yoursite.com/wp-admin/. Navigate to the Plugins section. Under the Hello Dolly title, click on the Activate link. You should now see a random lyric appear in the top-right portion of the Dashboard. Refresh the page a few times to get the full effect. Examining the hello.php file Now that we've tried out the "Hello Dolly" plugin, let's have a closer look. In your favorite text editor, open up the /wp-content/plugins/hello.php file. Can you identify the following integral parts? The Information Header which describes details about the plugin (author and description). This is contained in a large PHP /* comment */. User-defined functions, such as the hello_dolly() function. The add_action() and/or add_filter() functions, which hook a WordPress event to a user-defined function. It looks pretty simple, right? That's all you need for a plugin: An information header Some user-defined functions add_action() and/or add_filter() functions In your WordPress Dashboard, ensure that the "Hello Dolly" plugin has been activated. If applicable, use your preferred (s)FTP program to connect to your WordPress installation. Using your text editor, temporarily delete the information header from wpcontent/ plugins/hello.php and save the file (you can save the header elsewhere for now). Save the file. Refresh the Plugins page in your browser. You should get a warning from WordPress stating that the plugin does not have a valid header: Ensure that the "Hello Dolly" plugin is active. Open the /wp-content/plugins/hello.php file in your text editor. Immediately before the line that contains function hello_dolly_get_lyric, type in some gibberish text, such as "asdfasdf" and save the file. Reload the plugins page in your browser. This should generate a parse error, something like: pre width="70"> Parse error: syntax error, unexpected T_FUNCTION in /path/to/ wordpress/html/wp-content/plugins/hello.php on line 16 Author: Listed below the plugin name Author URI: Together with "Author", this creates a link to the author's site Description: Main block of text describing the plugin Plugin Name: The displayed name of the plugin Plugin URI: Destination of the "Visit plugin site" link Version: Use this to track your changes over time Now that we've identified the critical component parts, let's examine them in more detail. Information header Don't just skim this section thinking it's a waste of breath on the self-explanatory header fields. Unlike a normal PHP file in which the comments are purely optional, in WordPress plugin and theme files, the Information Header is required! It is this block of text that causes a file to show up on WordPress' radar so that you can activate it or deactivate it. If your plugin is missing a valid information header, you cannot use it! Exercise—breaking the header To reinforce that the information header is an integral part of a plugin, try the following exercise: After you've seen the tragic consequences, put the header information back into the hello.php file. This should make it abundantly clear to you that the information header is absolutely vital for every WordPress plugin. If your plugin has multiple files, the header should be inside the primary file—in this article we use index.php as our primary file, but many plugins use a file named after the plugin name as their primary file. Location, name, and format The header itself is similar in form and function to other content management systems, such as Drupal's module.info files or Joomla's XML module configurations—it offers a way to store additional information about a plugin in a standardized format. The values can be extended, but the most common header values are listed below: For more information about header blocks, see the WordPress codex at: http://codex.wordpress.org/File_Header. In order for a PHP file to show up in WordPress' Plugins menu: The file must have a .php extension. The file must contain the information header somewhere in it (preferably at the beginning). The file must be either in the /wp-content/plugins directory, or in a subdirectory of the plugins directory. It cannot be more deeply nested. Understanding the Includes When you activate a plugin, the name of the file containing the information header is stored in the WordPress database. Each time a page is requested, WordPress goes through a laundry list of PHP files it needs to load, so activating a plugin ensures that your own files are on that list. To help illustrate this concept, let's break WordPress again. Exercise – parse errors Try the following exercise: Yikes! Your site is now broken. Why did this happen? We introduced errors into the plugin's main file (hello.php), so including it caused PHP and WordPress to choke. Delete the gibberish line from the hello.php file and save to return the plugin back to normal. The parse error only occurs if there is an error in an active plugin. Deactivated plugins are not included by WordPress and therefore their code is not parsed. You can try the same exercise after deactivating the plugin and you'll notice that WordPress does not raise any errors. Bonus for the curious In case you're wondering exactly where and how WordPress stores the information about activated plugins, have a look in the database. Using your MySQL client, you can browse the wp_options table or execute the following query: SELECT option_value FROM wp_options WHERE option_name='active_ plugins'; The active plugins are stored as a serialized PHP hash, referencing the file containing the header. The following is an example of what the serialized hash might contain if you had activated a plugin named "Bad Example". You can use PHP's unserialize() function to parse the contents of this string into a PHP variable as in the following script: <?php $active_plugin_str = 'a:1:{i:0;s:27:"bad-example/bad-example. php";}'; print_r( unserialize($active_plugin_str) ); ?> And here's its output: Array ( [0] => bad-example/bad-example.php )
Read more
  • 0
  • 1
  • 33166

article-image-jim-balsillie-on-data-governance-challenges-and-6-recommendations-to-tackle-them
Savia Lobo
05 Jun 2019
5 min read
Save for later

Jim Balsillie on Data Governance Challenges and 6 Recommendations to tackle them

Savia Lobo
05 Jun 2019
5 min read
The Canadian Parliament's Standing Committee on Access to Information, Privacy and Ethics hosted the hearing of the International Grand Committee on Big Data, Privacy and Democracy from Monday, May 27 to Wednesday, May 29.  Witnesses from at least 11 countries appeared before representatives to testify on how governments can protect democracy and citizen rights in the age of big data. This section of the hearing, which took place on May 28, includes Jim Balsillie’s take on Data Governance. Jim Balsillie, Chair, Centre for International Governance Innovation; Retired Chairman and co-CEO of BlackBerry, starts off by talking about how Data governance is the most important public policy issue of our time. It is cross-cutting with economic, social and security dimensions. It requires both national policy frameworks and international coordination. He applauded the seriousness and integrity of Mr. Zimmer Angus and Erskine Smith who have spearheaded a Canadian bipartisan effort to deal with data governance over the past three years. “My perspective is that of a capitalist and global tech entrepreneur for 30 years and counting. I'm the retired Chairman and co-CEO of Research in Motion, a Canadian technology company [that] we scaled from an idea to 20 billion in sales. While most are familiar with the iconic BlackBerry smartphones, ours was actually a platform business that connected tens of millions of users to thousands of consumer and enterprise applications via some 600 cellular carriers in over 150 countries. We understood how to leverage Metcalfe's law of network effects to create a category-defining company, so I'm deeply familiar with multi-sided platform business model strategies as well as navigating the interface between business and public policy.”, he adds. He further talks about his different observations about the nature, scale, and breadth of some collective challenges for the committee’s consideration: Disinformation in fake news is just two of the negative outcomes of unregulated attention based business models. They cannot be addressed in isolation; they have to be tackled horizontally as part of an integrated whole. To agonize over social media’s role in the proliferation of online hate, conspiracy theories, politically motivated misinformation, and harassment, is to miss the root and scale of the problem. Social media’s toxicity is not a bug, it's a feature. Technology works exactly as designed. Technology products services and networks are not built in a vacuum. Usage patterns drive product development decisions. Behavioral scientists involved with today's platforms helped design user experiences that capitalize on negative reactions because they produce far more engagement than positive reactions. Among the many valuable insights provided by whistleblowers inside the tech industry is this quote, “the dynamics of the attention economy are structurally set up to undermine the human will.” Democracy and markets work when people can make choices align with their interests. The online advertisement driven business model subverts choice and represents a fundamental threat to markets election integrity and democracy itself. Technology gets its power through the control of data. Data at the micro-personal level gives technology unprecedented power to influence. “Data is not the new oil, it's the new plutonium amazingly powerful dangerous when it spreads difficult to clean up and with serious consequences when improperly used.” Data deployed through next-generation 5G networks are transforming passive in infrastructure into veritable digital nervous systems. Our current domestic and global institutions rules and regulatory frameworks are not designed to deal with any of these emerging challenges. Because cyberspace knows no natural borders, digital transformation effects cannot be hermetically sealed within national boundaries; international coordination is critical. With these observations, Balsillie has further provided six recommendations: Eliminate tax deductibility of specific categories of online ads. Ban personalized online advertising for elections. Implement strict data governance regulations for political parties. Provide effective whistleblower protections. Add explicit personal liability alongside corporate responsibility to effect the CEO and board of directors’ decision-making. Create a new institution for like-minded nations to address digital cooperation and stability. Technology is becoming the new 4th Estate Technology is disrupting governance and if left unchecked could render liberal democracy obsolete. By displacing the print and broadcast media and influencing public opinion, technology is becoming the new Fourth Estate. In our system of checks and balances, this makes technology co-equal with the executive that led the legislative and the judiciary. When this new Fourth Estate declines to appear before this committee, as Silicon Valley executives are currently doing, it is symbolically asserting this aspirational co-equal status. But is asserting the status and claiming its privileges without the traditions, disciplines, legitimacy, or transparency that checked the power of the traditional Fourth Estate. The work of this international grand committee is a vital first step towards reset redress of this untenable current situation. Referring to what Professor Zuboff said last night, we Canadians are currently in a historic battle for the future of our democracy with a charade called sidewalk Toronto. He concludes by saying, “I'm here to tell you that we will win that battle.” To know more you can listen to the full hearing video titled, “Meeting No. 152 ETHI - Standing Committee on Access to Information, Privacy, and Ethics” on ParlVU. Speech2Face: A neural network that “imagines” faces from hearing voices. Is it too soon to worry about ethnic profiling? UK lawmakers to social media: “You’re accessories to radicalization, accessories to crimes”, hearing on spread of extremist content Key Takeaways from Sundar Pichai’s Congress hearing over user data, political bias, and Project Dragonfly
Read more
  • 0
  • 0
  • 33154

article-image-internationalization
Packt
10 Nov 2016
16 min read
Save for later

Internationalization

Packt
10 Nov 2016
16 min read
In this article by Jérémie Bouchet author of the book Magento Extensions Development. We will see how to handle this aspect of our extension and how it is handled in a complex extension using an EAV table structure. In this article, we will cover the following topics: The EAV approach Store relation table Translation of template interface texts (For more resources related to this topic, see here.) The EAV approach The EAV structure in Magento is used for complex models, such as customer and product entities. In our extension, if we want to add a new field for our events, we would have to add a new column in the main table. With the EAV structure, each attribute is stored in a separate table depending on its type. For example, catalog_product_entity, catalog_product_entity_varchar and catalog_product_entity_int. Each row in the subtables has a foreign key reference to the main table. In order to handle multiple store views in this structure, we will add a column for the store ID in the subtables. Let's see an example for a product entity, where our main table contains only the main attribute: The varchar table structure is as follows: The 70 attribute corresponds to the product name and is linked to our 1 entity. There is a different product name for the store view, 0 (default) and 2 (in French in this example). In order to create an EAV model, you will have to extend the right class in your code. You can inspire your development on the existing modules, such as customers or products. Store relation table In our extension, we will handle the store views scope by using a relation table. This behavior is also used for the CMS pages or blocks, reviews, ratings, and all the models that are not EAV-based and need to be store views-related. Creating the new table The first step is to create the new table to store the new data: Create the [extension_path]/Setup/UpgradeSchema.php file and add the following code: <?php namespace BlackbirdTicketBlasterSetup; use MagentoEavSetupEavSetup; use MagentoEavSetupEavSetupFactory; use MagentoFrameworkSetupUpgradeSchemaInterface; use MagentoFrameworkSetupModuleContextInterface; use MagentoFrameworkSetupSchemaSetupInterface; /** * @codeCoverageIgnore */ class UpgradeSchema implements UpgradeSchemaInterface { /** * EAV setup factory * * @varEavSetupFactory */ private $eavSetupFactory; /** * Init * * @paramEavSetupFactory $eavSetupFactory */ public function __construct(EavSetupFactory $eavSetupFactory) { $this->eavSetupFactory = $eavSetupFactory; } public function upgrade(SchemaSetupInterface $setup, ModuleContextInterface $context) { if (version_compare($context->getVersion(), '1.3.0', '<')) { $installer = $setup; $installer->startSetup(); /** * Create table 'blackbird_ticketblaster_event_store' */ $table = $installer->getConnection()->newTable( $installer->getTable('blackbird_ticketblaster_event_store') )->addColumn( 'event_id', MagentoFrameworkDBDdlTable::TYPE_SMALLINT, null, ['nullable' => false, 'primary' => true], 'Event ID' )->addColumn( 'store_id', MagentoFrameworkDBDdlTable::TYPE_SMALLINT, null, ['unsigned' => true, 'nullable' => false, 'primary' => true], 'Store ID' )->addIndex( $installer->getIdxName('blackbird_ticketblaster_event_store', ['store_id']), ['store_id'] )->addForeignKey( $installer->getFkName('blackbird_ticketblaster_event_store', 'event_id', 'blackbird_ticketblaster_event', 'event_id'), 'event_id', $installer->getTable('blackbird_ticketblaster_event'), 'event_id', MagentoFrameworkDBDdlTable::ACTION_CASCADE )->addForeignKey( $installer->getFkName('blackbird_ticketblaster_event_store', 'store_id', 'store', 'store_id'), 'store_id', $installer->getTable('store'), 'store_id', MagentoFrameworkDBDdlTable::ACTION_CASCADE )->setComment( 'TicketBlaster Event To Store Linkage Table' ); $installer->getConnection()->createTable($table); $installer->endSetup(); } } } The upgrade method will handle all the necessary updates in our database for our extension. In order to differentiate the update for a different version of the extension, we surround the script with a version_compare() condition. Once this code is set, we need to tell Magento that our extension has new database upgrades to process. Open the [extension_path]/etc/module.xml file and change the version number 1.2.0 to 1.3.0: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="../../../../../lib/internal/Magento/Framework/Module/etc/module.xsd"> <module name="Blackbird_TicketBlaster" setup_version="1.3.0"> <sequence> <module name="Magento_Catalog"/> <module name="Blackbird_AnotherModule"/> </sequence> </module> </config> In your terminal, run the upgrade by typing the following command: php bin/magentosetup:upgrade The new table structure now contains two columns: event_id and store_id. This table will store which events are available for store views: If you have previously created events, we recommend emptying the existing blackbird_ticketblaster_event table, because they won't have a default store view and this may trigger an error output. Adding the new input to the edit form In order to select the store view for the content, we will need to add the new input to the edit form. Before running this code, you should add a new store view: Here's how to do that: Open the [extension_path]/Block/Adminhtml/Event/Edit/Form.php file and add the following code in the _prepareForm() method, below the last addField() call: /* Check is single store mode */ if (!$this->_storeManager->isSingleStoreMode()) { $field = $fieldset->addField( 'store_id', 'multiselect', [ 'name' => 'stores[]', 'label' => __('Store View'), 'title' => __('Store View'), 'required' => true, 'values' => $this->_systemStore->getStoreValuesForForm(false, true) ] ); $renderer = $this->getLayout()->createBlock( 'MagentoBackendBlockStoreSwitcherFormRendererFieldsetElement' ); $field->setRenderer($renderer); } else { $fieldset->addField( 'store_id', 'hidden', ['name' => 'stores[]', 'value' => $this->_storeManager->getStore(true)->getId()] ); $model->setStoreId($this->_storeManager->getStore(true)->getId()); } This results in a new multiselect field in the form. Saving the new data in the new table Now we have the form and the database table, we have to write the code to save the data from the form: Open the [extension_path]/Model/Event.php file and add the following method at its end: /** * Receive page store ids * * @return int[] */ public function getStores() { return $this->hasData('stores') ? $this->getData('stores') : $this->getData('store_id'); } Open the [extension_path]/Model/ResourceModel/Event.php file and replace all the code with the following code: <?php namespace BlackbirdTicketBlasterModelResourceModel; class Event extends MagentoFrameworkModelResourceModelDbAbstractDb { [...] The afterSave() method is handling our insert queries in the new table. The afterload() and getLoadSelect() methods are handling the new load mode to select the right events. Your new table is now filled when you save your events; they are also properly loaded when you go back to your edit form. Showing the store views in the admin grid In order to inform admin users of the selected store views for one event, we will add a new column in the admin grid: Open the [extension_path]/Model/ResourceModel/Event/Collection.php file and replace all the code with the following code: <?php namespace BlackbirdTicketBlasterModelResourceModelEvent; class Collection extends MagentoFrameworkModelResourceModelDbCollectionAbstractCollection { [...] Open the [extention_path]/view/adminhtml/ui_component/ticketblaster_event_listing.xml file and add the following XML instructions before the end of the </filters> tag: <filterSelect name="store_id"> <argument name="optionsProvider" xsi_type="configurableObject"> <argument name="class" xsi_type="string">MagentoCmsUiComponentListingColumnCmsOptions</argument> </argument> <argument name="data" xsi_type="array"> <item name="config" xsi_type="array"> <item name="dataScope" xsi_type="string">store_id</item> <item name="label" xsi_type="string" translate="true">Store View</item> <item name="captionValue" xsi_type="string">0</item> </item> </argument> </filterSelect> Before the actionsColumn tag, add the new column: <column name="store_id" class="MagentoStoreUiComponentListingColumnStore"> <argument name="data" xsi_type="array"> <item name="config" xsi_type="array"> <item name="bodyTmpl" xsi_type="string">ui/grid/cells/html</item> <item name="sortable" xsi_type="boolean">false</item> <item name="label" xsi_type="string" translate="true">Store View</item> </item> </argument> </column> You can refresh your grid page and see the new column added at the end. Magento remembers the previous column's order. If you add a new column, it will always be added at the end of the table. You will have to manually reorder them by dragging and dropping them. Modifying the frontend event list Our frontend list (/events) is still listing all the events. In order to list only the events available for our current store view, we need to change a file: Edit the [extension_path]/Block/EventList.php file and replace the code with the following code: <?php namespace BlackbirdTicketBlasterBlock; use BlackbirdTicketBlasterApiDataEventInterface; use BlackbirdTicketBlasterModelResourceModelEventCollection as EventCollection; use MagentoCustomerModelContext; class EventList extends MagentoFrameworkViewElementTemplate implements MagentoFrameworkDataObjectIdentityInterface { /** * Store manager * * @var MagentoStoreModelStoreManagerInterface */ protected $_storeManager; /** * @var MagentoCustomerModelSession */ protected $_customerSession; /** * Construct * * @param MagentoFrameworkViewElementTemplateContext $context * @param BlackbirdTicketBlasterModelResourceModelEventCollectionFactory $eventCollectionFactory, * @param array $data */ public function __construct( MagentoFrameworkViewElementTemplateContext $context, BlackbirdTicketBlasterModelResourceModelEventCollectionFactory $eventCollectionFactory, MagentoStoreModelStoreManagerInterface $storeManager, MagentoCustomerModelSession $customerSession, array $data = [] ) { parent::__construct($context, $data); $this->_storeManager = $storeManager; $this->_eventCollectionFactory = $eventCollectionFactory; $this->_customerSession = $customerSession; } /** * @return BlackbirdTicketBlasterModelResourceModelEventCollection */ public function getEvents() { if (!$this->hasData('events')) { $events = $this->_eventCollectionFactory ->create() ->addOrder( EventInterface::CREATION_TIME, EventCollection::SORT_ORDER_DESC ) ->addStoreFilter($this->_storeManager->getStore()->getId()); $this->setData('events', $events); } return $this->getData('events'); } /** * Return identifiers for produced content * * @return array */ public function getIdentities() { return [BlackbirdTicketBlasterModelEvent::CACHE_TAG . '_' . 'list']; } /** * Is logged in * * @return bool */ public function isLoggedIn() { return $this->_customerSession->isLoggedIn(); } } Note that we have a new property available and instantiated in our constructor: storeManager. Thanks to this class, we can filter our collection with the store view ID by calling the addStoreFilter() method on our events collection. Restricting the frontend access by store view The events will not be listed in our list page if they are not available for the current store view, but they can still be accessed with their direct URL, for example http://[magento_url]/events/view/index/event_id/2. We will change this to restrict the frontend access by store view: Open the [extention_path]/Helper/Event.php file and replace the code with the following code: <?php namespace BlackbirdTicketBlasterHelper; use BlackbirdTicketBlasterApiDataEventInterface; use BlackbirdTicketBlasterModelResourceModelEventCollection as EventCollection; use MagentoFrameworkAppActionAction; class Event extends MagentoFrameworkAppHelperAbstractHelper { /** * @var BlackbirdTicketBlasterModelEvent */ protected $_event; /** * @var MagentoFrameworkViewResultPageFactory */ protected $resultPageFactory; /** * Store manager * * @var MagentoStoreModelStoreManagerInterface */ protected $_storeManager; /** * Constructor * * @param MagentoFrameworkAppHelperContext $context * @param BlackbirdTicketBlasterModelEvent $event * @param MagentoFrameworkViewResultPageFactory $resultPageFactory * @SuppressWarnings(PHPMD.ExcessiveParameterList) */ public function __construct( MagentoFrameworkAppHelperContext $context, BlackbirdTicketBlasterModelEvent $event, MagentoFrameworkViewResultPageFactory $resultPageFactory, MagentoStoreModelStoreManagerInterface $storeManager, ) { $this->_event = $event; $this->_storeManager = $storeManager; $this->resultPageFactory = $resultPageFactory; $this->_customerSession = $customerSession; parent::__construct($context); } /** * Return an event from given event id. * * @param Action $action * @param null $eventId * @return MagentoFrameworkViewResultPage|bool */ public function prepareResultEvent(Action $action, $eventId = null) { if ($eventId !== null && $eventId !== $this->_event->getId()) { $delimiterPosition = strrpos($eventId, '|'); if ($delimiterPosition) { $eventId = substr($eventId, 0, $delimiterPosition); } $this->_event->setStoreId($this->_storeManager->getStore()->getId()); if (!$this->_event->load($eventId)) { return false; } } if (!$this->_event->getId()) { return false; } /** @var MagentoFrameworkViewResultPage $resultPage */ $resultPage = $this->resultPageFactory->create(); // We can add our own custom page handles for layout easily. $resultPage->addHandle('ticketblaster_event_view'); // This will generate a layout handle like: ticketblaster_event_view_id_1 // giving us a unique handle to target specific event if we wish to. $resultPage->addPageLayoutHandles(['id' => $this->_event->getId()]); // Magento is event driven after all, lets remember to dispatch our own, to help people // who might want to add additional functionality, or filter the events somehow! $this->_eventManager->dispatch( 'blackbird_ticketblaster_event_render', ['event' => $this->_event, 'controller_action' => $action] ); return $resultPage; } } The setStoreId() method called on our model will load the model only for the given ID. The events are no longer available through their direct URL if we are not on their available store view. Translation of template interface texts In order to translate the texts written directly in the template file, for the interface or in your PHP class, you need to use the __('Your text here') method. Magento looks for a corresponding match within all the translation CSV files. There is nothing to be declared in XML; you simply have to create a new folder at the root of your module and create the required CSV: Create the [extension_path]/i18n folder. Create [extension_path]/i18n/en_US.csv and add the following code: "Event time:","Event time:" "Please sign in to read more details.","Please sign in to read more details." "Read more","Read more" Create [extension_path]/i18n/en_US.csv and add the following code: "Event time:","Date de l'évènement :" "Pleasesign in to read more details.","Merci de vous inscrire pour plus de détails." "Read more","Lire la suite" The CSV file contains the correspondences between the key used in the code and the value in its final language. Translation of e-mail templates: creating and translating the e-mails We will add a new form in the Details page to share the event to a friend. The first step is to declare your e-mail template. To declare your e-mail template, create a new [extension_path]/etc/email_templates.xml file and add the following code: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="urn:magento:module:Magento_Email:etc/email_templates.xsd"> <template id="ticketblaster_email_email_template" label="Share Form" file="share_form.html" type="text" module="Blackbird_TicketBlaster" area="adminhtml"/> </config> This XML line declares a new template ID, label, file path, module, and area (frontend or adminhtml). Next, create the corresponding template by creating the [extension_path]/view/adminhtml/email/share_form.html file and add the following code: <!--@subject Share Form@--> <!--@vars { "varpost.email":"Sharer Email", "varevent.title":"Event Title", "varevent.venue":"Event Venue" } @--> <p>{{trans "Your friend %email is sharing an event with you:" email=$post.email}}</p> {{trans "Title: %title" title=$event.title}}<br/> {{trans "Venue: %venue" venue=$event.venue}}<br/> <p>{{trans "View the detailed page: %url" url=$event.url}}</p> Note that in order to translate texts within the HTML file, we use the trans function, which works like the default PHP printf() function. The function will also use our i18n CSV files to find a match for the text. Your e-mail template can also be overridden directly from the backoffice: Marketing | Email templates. The e-mail template is ready; we will also add the ability to change it in the system configuration and allow users to determine the sender's e-mail and name: Create the [extension_path]/etc/adminhtml/system.xml file and add the following code: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="urn:magento:module:Magento_Config:etc/system_file.xsd"> <system> <section id="ticketblaster" translate="label" type="text" sortOrder="100" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Ticket Blaster</label> <tab>general</tab> <resource>Blackbird_TicketBlaster::event</resource> <group id="email" translate="label" type="text" sortOrder="50" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Email Options</label> <field id="recipient_email" translate="label" type="text" sortOrder="10" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Send Emails To</label> <validate>validate-email</validate> </field> <field id="sender_email_identity" translate="label" type="select" sortOrder="20" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Email Sender</label> <source_model>MagentoConfigModelConfigSourceEmailIdentity</source_model> </field> <field id="email_template" translate="label comment" type="select" sortOrder="30" showInDefault="1" showInWebsite="1" showInStore="1"> <label>Email Template</label> <comment>Email template chosen based on theme fallback when "Default" option is selected.</comment> <source_model>MagentoConfigModelConfigSourceEmailTemplate</source_model> </field> </group> </section> </system> </config> Create the [extension_path]/etc/config.xml file and add the following code: <?xml version="1.0"?> <config xsi_noNamespaceSchemaLocation="urn:magento:module:Magento_Store:etc/config.xsd"> <default> <ticketblaster> <email> <recipient_email> <![CDATA[hello@example.com]]> </recipient_email> <sender_email_identity>custom2</sender_email_identity> <email_template>ticketblaster_email_email_template</email_template> </email> </ticketblaster> </default> </config> Thanks to these two files, you can change the configuration for the e-mail template in the Admin panel (Stores | Configuration). Let's create our HTML form and the controller that will handle our submission: Open the existing [extension_path]/view/frontend/templates/view.phtml file and add the following code at the end: <form action="<?php echo $block->getUrl('events/view/share', array('event_id' => $event->getId())); ?>" method="post" id="form-validate" class="form"> <h3> <?php echo __('Share this event to my friend'); ?> </h3> <input type="email" name="email" class="input-text" placeholder="email" /> <button type="submit" class="button"><?php echo __('Share'); ?></button> </form> Create the [extension_path]/Controller/View/Share.php file and add the following code: <?php namespace BlackbirdTicketBlasterControllerView; use MagentoFrameworkExceptionNotFoundException; use MagentoFrameworkAppRequestInterface; use MagentoStoreModelScopeInterface; use BlackbirdTicketBlasterApiDataEventInterface; class Share extends MagentoFrameworkAppActionAction { [...] This controller will get the necessary configuration entirely from the admin and generate the e-mail to be sent. Testing our code by sending the e-mail Go to the page of an event and fill in the form we prepared. When you submit it, Magento will send the e-mail immediately. Summary In this article, we addressed all the main processes that are run for internationalization. We can now create and control the availability of our events with regards to Magento's stores and translate the contents of our pages and e-mails. Resources for Article: Further resources on this subject: Magento Theme Distribution [article] Installing Magento [article] Magento 2 – the New E-commerce Era [article]
Read more
  • 0
  • 0
  • 33122
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-introducing-r-rstudio-and-shiny
Packt
25 Sep 2015
9 min read
Save for later

Introducing R, RStudio, and Shiny

Packt
25 Sep 2015
9 min read
 In this article, by Hernán G. Resnizky, author of the book Learning Shiny, the main objective will be to learn how to install all the needed components to build an application in R with Shiny. Additionally, some general ideas about what R is will be covered in order to be able to dive deeper into programming using R. The following topics will be covered: A brief introduction to R, RStudio, and Shiny Installation of R and Shiny General tips and tricks (For more resources related to this topic, see here.) About R As stated on the R-project main website: "R is a language and environment for statistical computing and graphics." R is a successor of S and is a GNU project. This means, briefly, that anyone can have access to its source codes and can modify or adapt it to their needs. Nowadays, it is gaining territory over classic commercial software, and it is, along with Python, the most used language for statistics and data science. Regarding R's main characteristics, the following can be considered: Object oriented: R is a language that is composed mainly of objects and functions. Can be easily contributed to: Similar to GNU projects, R is constantly being enriched by user's contributions either by making their codes accessible via "packages" or libraries, or by editing/improving its source code. There are actually almost 7000 packages in the common R repository, Comprehensive R Archive Network (CRAN). Additionally, there are R repositories of public access, such as bioconductor project that contains packages for bioinformatics. Runtime execution: Unlike C or Java, R does not need compilation. This means that you can, for instance, write 2 + 2 in the console and it will return the value. Extensibility: The R functionalities can be extended through the installation of packages and libraries. Standard proven libraries can be found in CRAN repositories and are accessible directly from R by typing install.packages(). Installing R R can be installed in every operating system. It is highly recommended to download the program directly from http://cran.rstudio.com/ when working on Windows or Mac OS. On Ubuntu, R can be easily installed from the terminal as follows: sudo apt-get update sudo apt-get install r-base sudo apt-get install r-base-dev The installation of r-base-dev is highly recommended as it is a package that enables users to compile the R packages from source, that is, maintain the packages or install additional R packages directly from the R console using the install.packages() command. To install R on other UNIX-based operating systems, visit the following links: http://cran.rstudio.com/ http://cran.r-project.org/doc/manuals/r-release/R-admin.html#Obtaining-R A quick guide to R When working on Windows, R can be launched via its application. After the installation, it is available as any other program on Windows. When opening the program, a window like this will appear: When working on Linux, you can access the R console directly by typing R on the command line: In both the cases, R executes in runtime. This means that you can type in code, press Enter, and the result will be given immediately as follows: > 2+2 [1] 4 The R application in any operating system does not provide an easy environment to develop code. For this reason, it is highly recommended (not only to write web applications in R with Shiny, but for any task you want to perform in R) to use an Integrated Development Environment (IDE). About RStudio As with other programming languages, there is a huge variety of IDEs available for R. IDEs are applications that make code development easier and clearer for the programmer. RStudio is one of the most important ones for R, and it is especially recommended to write web applications in R with Shiny because this contains features specially designed for R. Additionally, RStudio provides facilities to write C++, Latex, or HTML documents and also integrates them to the R code. RStudio also provides version control, project management, and debugging features among many others. Installing RStudio RStudio for desktop computers can be downloaded from its official website at http://www.rstudio.com/products/rstudio/download/ where you can get versions of the software for Windows, MAC OS X, Ubuntu, Debian, and Fedora. Quick guide to RStudio Before installing and running RStudio, it is important to have R installed. As it is an IDE and not the programming language, it will not work at all. The following screenshot shows RStudio's starting view: At the first glance, the following four main windows are available: Text editor: This provides facilities to write the R scripts such as highlighting and a code completer (when hitting Tab, you can see the available options to complete the code written). It is also possible to include the R code in an HTML, Latex, or C++ piece of code. Environment and history: They are defined as follows: In the Environment section, you can see the active objects in each environment. By clicking on Global Environment (which is the environment shown by default), you can change the environment and see the active objects. In the History tab, the pieces of codes executed are stored line by line. You can select one or more lines and send them either to the editor or to the console. In addition, you can look up for a certain specific piece of code by typing it in the textbox in the top right part of this window. Console: This is an exact equivalent of R console, as described in Quick guide of R. Tabs: The different tabs are defined as follows: Files: This consists of a file browser with several additional features (renaming, deleting, and copying). Clicking on a file will open it in editor or the Environment tab depending on the type of the file. If it is a .rda or .RData file, it will open in both. If it is a text file, it will open in one of them. Plots: Whenever a plot is executed, it will be displayed in that tab. Packages: This shows a list of available and active packages. When the package is active, it will appear as clicked. Packages can also be installed interactively by clicking on Install Packages. Help: This is a window to seek and read active packages' documentation. Viewer: This enables us to see the HTML-generated content within RStudio. Along with numerous features, RStudio also provides keyboard shortcuts. A few of them are listed as follows: Description Windows/Linux OSX Complete the code. Tab Tab Run the selected piece of code. If no piece of code is selected, the active line is run. Ctrl + Enter ⌘ + Enter Comment the selected block of code. Ctrl + Shift + C ⌘ + / Create a section of code, which can be expanded or compressed by clicking on the arrow to the left. Additionally, it can be accessed by clicking on it in the bottom left menu. ##### ##### Find and replace. Ctrl + F ⌘ + F The following screenshots show how a block of code can be collapsed by clicking on the arrow and how it can be accessed quickly by clicking on its name in the bottom-left part of the window: Clicking on the circled arrow will collapse the Section 1 block, as follows: The full list of shortcuts can be found at https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts. For further information about other RStudio features, the full documentation is available at https://support.rstudio.com/hc/en-us/categories/200035113-Documentation. About Shiny Shiny is a package created by RStudio, which enables to easily interface R with a web browser. As stated in its official documentation, Shiny is a web application framework for R that makes it incredibly easy to build interactive web applications with R. One of its main advantages is that there is no need to combine R code with HTML/JavaScript code as the framework already contains prebuilt features that cover the most commonly used functionalities in a web interactive application. There is a wide range of software that has web application functionalities, especially oriented to interactive data visualization. What are the advantages of using R/Shiny then, you ask? They are as follows: It is free not only in terms of money, but as all GNU projects, in terms of freedom. As stated in the GNU main page: To understand the concept (GNU), you should think of free as in free speech, not as in free beer. Free software is a matter of the users' freedom to run, copy, distribute, study, change, and improve the software. All the possibilities of a powerful language such as R is available. Thanks to its contributive essence, you can develop a web application that can display any R-generated output. This means that you can, for instance, run complex statistical models and return the output in a friendly way in the browser, obtain and integrate data from the various sources and formats (for instance, SQL, XML, JSON, and so on) the way you need, and subset, process, and dynamically aggregate the data the way you want. These options are not available (or are much more difficult to accomplish) under most of the commercial BI tools. Installing and loading Shiny As with any other package available in the CRAN repositories, the easiest way to install Shiny is by executing install.packages("shiny"). The following output should appear on the console: Due to R's extensibility, many of its packages use elements (mostly functions) from other packages. For this reason, these packages are loaded or installed when the package that is dependent on them is loaded or installed. This is called dependency. Shiny (on its 0.10.2.1 version) depends on Rcpp, httpuv, mime, htmltools, and R6. An R session is started only with the minimal packages loaded. So if functions from other packages are used, they need to be loaded before using them. The corresponding command for this is as follows: library(shiny) When installing a package, the package name must be quoted but when loading the package, it must be unquoted. Summary After these instructions, the reader should be able to install all the fundamental elements to create a web application with Shiny. Additionally, he or she must have acquired at least a general idea of what R and the R project is. Resources for Article: Further resources on this subject: R ─ Classification and Regression Trees[article] An overview of common machine learning tasks[article] Taking Control of Reactivity, Inputs, and Outputs [article]
Read more
  • 0
  • 0
  • 33103

article-image-4-popular-algorithms-distance-based-outlier-detection
Sugandha Lahoti
01 Dec 2017
7 min read
Save for later

4 popular algorithms for Distance-based outlier detection

Sugandha Lahoti
01 Dec 2017
7 min read
[box type="note" align="" class="" width=""]The article is an excerpt from our book titled Mastering Java Machine Learning by Dr. Uday Kamath and  Krishna Choppella.[/box] This book introduces you to an array of expert machine learning techniques, including classification, clustering, anomaly detection, stream learning, active learning, semi-supervised learning, probabilistic graph modelling and a lot more. The article given below is extracted from Chapter 5 of the book - Real-time Stream Machine Learning, explaining 4 popular algorithms for Distance-based outlier detection. Distance-based outlier detection is the most studied, researched, and implemented method in the area of stream learning. There are many variants of the distance-based methods, based on sliding windows, the number of nearest neighbors, radius and thresholds, and other measures for considering outliers in the data. We will try to give a sampling of the most important algorithms in this article. Inputs and outputs Most algorithms take the following parameters as inputs: Window size w, corresponding to the fixed size on which the algorithm looks for outlier patterns. Sliding size s, corresponds to the number of new instances that will be added to the window, and old ones removed. The count threshold k of instances when using nearest neighbor computation. The distance threshold R used to define the outlier threshold in distances. Outliers as labels or scores (based on neighbors and distance) are outputs. How does it work? We present different variants of distance-based stream outlier algorithms, giving insights into what they do differently or uniquely. The unique elements in each algorithm define what happens when the slide expires, how a new slide is processed, and how outliers are reported. Exact Storm Exact Storm stores the data in the current window w in a well-known index structure, so that the range query search or query to find neighbors within the distance R for a given point is done efficiently. It also stores k preceding and succeeding neighbors of all data points: Expired Slide: Instances in expired slides are removed from the index structure that affects range queries but are preserved in the preceding list of neighbors. New Slide: For each data point in the new slide, range query R is executed, results are used to update the preceding and succeeding list for the instance, and the instance is stored in the index structure. Outlier Reporting: In any window, after the processing of expired and new slide elements is complete, any instance with at least k elements from the succeeding list and non-expired preceding list is reported as an outlier. Abstract-C Abstract-C keeps the index structure similar to Exact Storm but instead of preceding and succeeding lists for every object it just maintains a list of counts of neighbors for the windows the instance is participating in: Expired Slide: Instances in expired slides are removed from the index structure that affects range queries and the first element from the list of counts is removed corresponding to the last window. New Slide: For each data point in the new slide, range query R is executed and results are used to update the list count. For existing instances, the count gets updated with new neighbors and instances are added to the index structure. Outlier Reporting: In any window, after the processing of expired and new slide elements is complete, all instances with a neighbors count less than k in the current window are considered outliers. Direct Update of Events (DUE) DUE keeps the index structure for efficient range queries exactly like the other algorithms but has a different assumption, that when an expired slide occurs, not every instance is affected in the same way. It maintains two priority queues: the unsafe inlier queue and the outlier list. The unsafe inlier queue has sorted instances based on the increasing order of smallest expiration time of their preceding neighbors. The outlier list has all the outliers in the current window: Expired Slide: Instances in expired slides are removed from the index structure that affects range queries and the unsafe inlier queue is updated for expired neighbors. Those unsafe inliers which become outliers are removed from the priority queue and moved to the outlier list. New Slide: For each data point in the new slide, range query R is executed, results are used to update the succeeding neighbors of the point, and only the most recent preceding points are updated for the instance. Based on the updates, the point is added to the unsafe inlier priority queue or removed from the queue and added to the outlier list. Outlier Reporting: In any window, after the processing of expired and new slide elements is complete, all instances in the outlier list are reported as outliers. Micro Clustering based Algorithm (MCOD) Micro-clustering based outlier detection overcomes the computational issues of performing range queries for every data point. The micro-cluster data structure is used instead of range queries in these algorithms. A micro-cluster is centered around an instance and has a radius of R. All the points belonging to the micro-clusters become inliers. The points that are outside can be outliers or inliers and stored in a separate list. It also has a data structure similar to DUE to keep a priority queue of unsafe inliers: Expired Slide: Instances in expired slides are removed from both microclusters and the data structure with outliers and inliers. The unsafe inlier queue is updated for expired neighbors as in the DUE algorithm. Microclusters are also updated for non-expired data points. New Slide: For each data point in the new slide, the instance either becomes a center of a micro-cluster, or part of a micro-cluster or added to the event queue and the data structure of the outliers. If the point is within the distance R, it gets assigned to an existing micro-cluster; otherwise, if there are k points within R, it becomes the center of the new micro cluster; if not, it goes into the two structures of the event queue and possible outliers.   Outlier Reporting: In any window, after the processing of expired and new slide elements is complete, any instance in the outlier structure with less than k neighboring instances is reported as an outlier. Advantages and limitations The advantages and limitations are as follows:   Exact Storm is demanding in storage and CPU for storing lists and retrieving neighbors. Also, it introduces delays; even though they are implemented in efficient data structures, range queries can be slow.   Abstract-C has a small advantage over Exact Storm, as no time is spent on finding active neighbors for each instance in the window. The storage and time spent is still very much dependent on the window and slide chosen.   DUE has some advantage over Exact Storm and Abstract-C as it can efficiently re-evaluate the "inlierness" of points (that is, whether unsafe inliers remain inliers or become outliers) but sorting the structure impacts both CPU and memory. MCOD has distinct advantages in memory and CPU owing to the use of the micro-cluster structure and removing the pairwise distance computation. Storing the neighborhood information in micro-clusters helps memory too. Validation and evaluation of stream-based outliers is still an open research area. By varying parameters such as window-size, neighbors within radius, and so on, we determine the sensitivity to the performance metrics (time to evaluate in terms of CPU times per object, Number of outliers detected in the streams,TP/Precision/Recall/ Area under PRC curve) and determine the robustness. If you liked the above article, checkout our book Mastering Java Machine Learning to explore more on advanced machine learning techniques using the best Java-based tools available.
Read more
  • 0
  • 0
  • 33076

article-image-salesforce-is-buying-tableau-in-a-15-7-billion-all-stock-deal
Richard Gall
10 Jun 2019
4 min read
Save for later

Salesforce is buying Tableau in a $15.7 billion all-stock deal

Richard Gall
10 Jun 2019
4 min read
Salesforce, one of the world's leading CRM platforms, is buying data visualization software Tableau in an all-stock deal worth $15.7 billion. The news comes just days after it emerged that Google is buying one of Tableau's competitors in the data visualization market, Looker. Taken together, the stories highlight the importance of analytics to some of the planet's biggest companies. They suggest that despite years of the big data revolution, it's only now that market-leading platforms are starting to realise that their customers want the level of capabilities offered by the best in the data visualization space. Salesforce shareholders will use their stock to purchase Tableau. As the press release published on the Salesforce site explains "each share of Tableau Class A and Class B common stock will be exchanged for 1.103 shares of Salesforce common stock, representing an enterprise value of $15.7 billion (net of cash), based on the trailing 3-day volume weighted average price of Salesforce's shares as of June 7, 2019." The acquisition is expected to be completed by the end of October 2019. https://twitter.com/tableau/status/1138040596604575750 Why is Salesforce buying Tableau? The deal is an incredible result for Tableau shareholders. At the end of last week, its market cap was $10.7 billion. This has led to some scepticism about just how good a deal this is for Salesforce. One commenter on Hacker News said "this seems really high for a company without earnings and a weird growth curve. Their ticker is cool and maybe sales force [sic] wants to be DATA on nasdaq. Otherwise, it will be hard to justify this high markup for a tool company." With Salesforce shares dropping 4.5% as markets opened this week, it seems investors are inclined to agree - Salesforce is certainly paying a premium for Tableau. However, whatever the long term impact of the acquisition, the price paid underlines the fact that Salesforce views Tableau as exceptionally important to its long term strategy. It opens up an opportunity for Salesforce to reposition and redefine itself as much more than just a CRM platform. It means it can start compete with the likes of Microsoft, which has a full suite of professional and business intelligence tools. Moreover, it also provides the platform with another way of potentially onboarding customers - given Tableau is well-known as a powerful yet accessible data visualization tool, it create an avenue through which new users can find their way to the Salesforce product. Marc Benioff, Chair and co-CEO of Salesforce, said "we are bringing together the world’s #1 CRM with the #1 analytics platform. Tableau helps people see and understand data, and Salesforce helps people engage and understand customers. It’s truly the best of both worlds for our customers--bringing together two critical platforms that every customer needs to understand their world.” Tableau has been a target for Salesforce for some time. Leaked documents from 2016 found that the data visualization was one of 14 companies that Salesforce had an interest in (another was LinkedIn, which would eventually be purchased by Microsoft). Read next: Alteryx vs. Tableau: Choosing the right data analytics tool for your business What's in it for Tableau (aside from the money...)? For Tableau, there are many other benefits of being purchased by Salesforce alongside the money. Primarily this is about expanding the platform's reach - Salesforce users are people who are interested in data with a huge range of use cases. By joining up with Salesforce, Tableau will become their go-to data visualization tool. "As our two companies began joint discussions," Tableau CEO Adam Selipsky said, "the possibilities of what we might do together became more and more intriguing. They have leading capabilities across many CRM areas including sales, marketing, service, application integration, AI for analytics and more. They have a vast number of field personnel selling to and servicing customers. They have incredible reach into the fabric of so many customers, all of whom need rich analytics capabilities and visual interfaces... On behalf of our customers, we began to dream about we might accomplish if we could combine our ability to help people see and understand data with their ability to help people engage and understand customers." What will happen to Tableau? Tableau won't be going anywhere. It will continue to exist under its own brand with the current leadership all remaining, including Selipsky. What does this all mean for the technology market? At the moment, it's too early to say - but the last year or so has seen some major high-profile acquisitions by tech companies. Perhaps we're seeing the emergence of a tooling arms race as the biggest organizations attempt to arm themselves with ecosystems of established market-leading tools. Whether this is good or bad for users remains to be seen, however.  
Read more
  • 0
  • 0
  • 33073

article-image-create-local-ubuntu-repository-using-apt-mirror-and-apt-cacher
Packt
05 Oct 2009
7 min read
Save for later

Create a Local Ubuntu Repository using Apt-Mirror and Apt-Cacher

Packt
05 Oct 2009
7 min read
How can a company or organization minimize bandwidth costs when maintaining multiple Ubuntu installations? With bandwidth becoming the currency of the new millennium, being responsible with the bandwidth you have can be a real concern. In this article by Christer Edwards, we will learn how to create, maintain and make available a local Ubuntu repository mirror, allowing you to save bandwidth and improve network efficiency with each machine you add to your network. (For more resources on Ubuntu, see here.) Background Before I begin, let me give you some background into what prompted me to initially come up with these solutions. I used to volunteer at local university user groups whenever they held "installfests". I always enjoyed offering my support and expertise with new Linux users. I have to say, the distribution that we helped deploy the most was Ubuntu. The fact that Ubuntu's parent company, Canonical, provided us with pressed CDs didn't hurt! Despite having more CDs than we could handle we still regularly ran into the same problem. Bandwidth. Fetching updates and security errata was always our bottleneck, consistently slowing down our deployments. Imagine you are in a conference room with dozens of other Linux users, each of them trying to use the internet to fetch updates. Imagine how slow and bogged down that internet connection would be. If you've ever been to a large tech conference you've probably experienced this first hand! After a few of these "installfests" I started looking for a solution to our bandwidth issue. I knew that all of these machines were downloading the same updates and security errata, therefore duplicating work that another user had already done. It just seemed so inefficient. There had to be a better way. Eventually I came across two solutions, which I will outline below. Apt-Mirror The first method makes use of a tool called “apt-mirror”. It is a perl-based utility for downloading and mirroring the entire contents of a public repository. This may likely include packages that you don't use and will not use, but anything stored in a public repository will also be stored in your mirror. To configure apt-mirror you will need the following: apt-mirror package (sudo aptitude install apt-mirror) apache2 package (sudo aptitude install apache2) roughly 15G of storage per release, per architecture Once these requirements are met you can begin configuring the apt-mirror tool. The main items that you'll need to define are: storage location (base_path) number of download threads (nthreads) release(s) and architecture(s) you want The configuration is done in the /etc/apt/mirror.list file. A default config was put in place when you installed the package, but it is generic. You'll need to update the values as mentioned above. Below I have included a complete configuration which will mirror Ubuntu 8.04 LTS for both 32 and 64 bit installations. It will require nearly 30G of storage space, which I will be putting on a removable drive. The drive will be mounted at /media/STORAGE/. # apt-mirror configuration file#### The default configuration options (uncomment and change to override)###set base_path /media/STORAGE/# set mirror_path $base_path/mirror# set skel_path $base_path/skel# set var_path $base_path/var## set defaultarch <running host architecture>set nthreads 20## 8.04 "hardy" i386 mirrordeb-i386 http://us.archive.ubuntu.com/ubuntu hardy main restricted universe multiversedeb-i386 http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted universe multiversedeb-i386 http://us.archive.ubuntu.com/ubuntu hardy-security main restricted universe multiversedeb-i386 http://us.archive.ubuntu.com/ubuntu hardy-backports main restricted universe multiversedeb-i386 http://us.archive.ubuntu.com/ubuntu hardy-proposed main restricted universe multiversedeb-i386 http://us.archive.ubuntu.com/ubuntu hardy main/debian-installer restricted/debian-installer universe/debian-installer multiverse/debian-installerdeb-i386 http://packages.medibuntu.org/ hardy free non-free# 8.04 "hardy" amd64 mirrordeb-amd64 http://us.archive.ubuntu.com/ubuntu hardy main restricted universe multiversedeb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-updates main restricted universe multiversedeb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-security main restricted universe multiversedeb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-backports main restricted universe multiversedeb-amd64 http://us.archive.ubuntu.com/ubuntu hardy-proposed main restricted universe multiversedeb-amd64 http://us.archive.ubuntu.com/ubuntu hardy main/debian-installer restricted/debian-installer universe/debian-installer multiverse/debian-installerdeb-amd64 http://packages.medibuntu.org/ hardy free non-free# Cleaning sectionclean http://us.archive.ubuntu.com/clean http://packages.medibuntu.org/ It should be noted that each of the repository lines within the file should begin with deb-i386 or deb-amd64. Formatting may have changed based on the web formatting. Once your configuration is saved you can begin to populate your mirror by running the command: apt-mirror Be warned that, based on your internet connection speeds, this could take quite a long time. Hours, if not more than a day for the initial 30G to download. Each time you run the command after the initial download will be much faster, as only incremental updates are downloaded. You may be wondering if it is possible to cancel the transfer before it is finished and begin again where it left off. Yes, I have done this countless times and I have not run into any issues. You should now have a copy of the repository stored locally on your machine. In order for this to be available to other clients you'll need to share the contents over http. This is why we listed Apache as a requirement above. In my example configuration above I mirrored the repository on a removable drive, and mounted it at /media/STORAGE. What I need to do now is make that address available over the web. This can be done by way of a symbolic link. cd /var/www/sudo ln -s /media/STORAGE/mirror/us.archive.ubuntu.com/ubuntu/ ubuntu The commands above will tell the filesystem to follow any requests for “ubuntu” back up to the mounted external drive, where it will find your mirrored contents. If you have any problems with this linking double-check your paths (as compared to those suggested here) and make sure your link points to the ubuntu directory, which is a subdirectory of the mirror you pulled from. If you point anywhere below this point your clients will not properly be able to find the contents. The additional task of keeping this newly downloaded mirror updated with the latest updates can be automated by way of a cron job. By activating the cron job your machine will automatically run the apt-mirror command on a regular daily basis, keeping your mirror up to date without any additional effort on your part. To activate the automated cron job, edit the file /etc/cron.d/apt-mirror. There will be a sample entry in the file. Simply uncomment the line and (optionally) change the “4” to an hour more convenient for you. If left with its defaults it will run the apt-mirror command, updating and synchronizing your mirror, every morning at 4:00am. The final step needed, now that you have your repository created, shared over http and configured to automatically update, is to configure your clients to use it. Ubuntu clients define where they should grab errata and security updates in the /etc/apt/sources.list file. This will usually point to the default or regional mirror. To update your clients to use your local mirror instead you'll need to edit the /etc/apt/sources.list and comment the existing entries (you may want to revert to them later!) Once you've commented the entries you can create new entries, this time pointing to your mirror. A sample entry pointing to a local mirror might look something like this: deb http://192.168.0.10/ubuntu hardy main restricted universe multiversedeb http://192.168.0.10/ubuntu hardy-updates main restricted universe multiversedeb http://192.168.0.10/ubuntu hardy-security main restricted universe multiverse Basically what you are doing is recreating your existing entries but replacing the archive.ubuntu.com with the IP of your local mirror. As long as the mirrored contents are made available over http on the mirror-server itself you should have no problems. If you do run into problems check your apache logs for details. By following these simple steps you'll have your own private (even portable!) Ubuntu repository available within your LAN, saving you future bandwidth and avoiding redundant downloads.
Read more
  • 0
  • 0
  • 33055
article-image-working-with-kibana-in-elasticsearch-5-x
Savia Lobo
26 Jan 2018
9 min read
Save for later

Working with Kibana in Elasticsearch 5.x

Savia Lobo
26 Jan 2018
9 min read
[box type="note" align="" class="" width=""]Below given post is a book excerpt from Mastering Elasticsearch 5.x written by  Bharvi Dixit. This book introduces you to the new features of Elasticsearch 5.[/box] The following article showcases Kibana, a tool belongs to the Elastic Stack, and used for visualization and exploration of data residing in Elasticsearch. One can install Kibana and start to explore Elasticsearch indices in minutes — no code, no additional infrastructure required. If you have been using an older version of Kibana, you will notice that it has transformed altogether in terms of functionality. Note: This URL has all the latest changes done in Kibana 5.0: https://www.elastic.co/guide/en/kibana/current/breaking-changes- 5.0.html. Installing Kibana Similar to other Elastic Stack tools, you can visit the following URL to download Kibana 5.0.0, as per your operating system distribution: https://www.elastic.co/downloads/past-releases/kibana-5-0-0 An example of downloading and installing Kibana from the Debian package. First of all, download the package: https://artifacts.elastic.co/downloads/kibana/kibana-5.0.0-amd64.deb Then install it using the following command: sudo dpkg -i kibana-5.0.0-amd64.deb Kibana configuration Once installed, you can find the Kibana configuration file, kibana.yml, inside the/etc/kibana/ directory. All the settings related to Kibana are done only in this file. There is a big list of configuration options available inside the Kibana settings which you can learn about here: https://www.elastic.co/guide/en/kibana/current/settings.html. Starting Kibana Kibana can be started using the following command and it will be started on port 5601 bounded on localhost by default: sudo service kibana start Exploring and visualizing data on Kibana Now all the components of Elastic Stack are installed and configured, we can start exploring the awesomeness of Kibana visualizations. Kibana 5.x is supported on almost all of the latest major web browsers, including Internet Explorer 11+. To load Kibana, you just need to type localhost:5601 in your web browser. You will see different options available in the left panel of the screen, as shown in following figure: These different options are used for the following purposes: Discover: Used for data exploration where you get the access of each field along with a default time. Visualize: Used for creating visualizations of the data in your Elasticsearch indices. You can then build dashboards that display related visualizations. Dashboard: Used to display a collection of saved visualizations. Timelion: A time series data visualizer that enables you to combine totally independent data sources within a single visualization. It is based on simple expression  language. Management: A place where you perform your runtime configuration of Kibana, including both the initial setup and ongoing configuration of index patterns, advanced settings that tweak the behaviors of Kibana itself and saved objects. Dev Tools: Contains the console which is based on the Sense plugin and allows you to write Elasticsearch commands in one tab and see the responses of those commands in the other tab. Understanding the Kibana Management screen The Management screen has three tabs available: Index Patterns: For selecting and configuring index names Saved Objects: Where all of your saved visualizations, searches, and dashboards are located Advanced Settings: Contains advanced settings of Kibana: As you can see on the management screen, the very first tab is for Index Patterns. Kibana is asking you to configure an index pattern so that it can load all the mappings and settings from the defined index. It defaults to logstash-*; you can add as many index patterns or absolute index names as you want and can select them while creating the visualization. Since we do have an index already available with the logstash-* pattern, when you click on the Time-field name drop-down list, you will find that it will show you two fields, @timestamp and received_at, which are of the date type, as shown in following screenshot: We will select the @timestamp field and hit the Create button. As soon as you do it, the following screen appears: In the above screenshot, you can see that Kibana has loaded all the mappings from our Logstash index. In addition, you can see three labels in blue (for marking this index as the default), yellow (for reloading the mappings; this is needed if you have updated the mapping after selecting the index pattern), and red (for deleting this index pattern altogether from Kibana). The second tab on the management screen is about saved objects, which contain all of your saved visualizations, searches, and dashboards as you can see in the following screenshot. Please note that you can see the imported dashboards and visualizations from Metricbeat here, which we have done a while ago. The third option is for Advanced Settings and you should not play with the settings shown on this page if you are not aware of the tweaking effects. Discovering data on Kibana When you move to the Discover page of Kibana, you will see a screen similar to the following: Setting the time range and auto-refresh interval Please note that Kibana by default loads the data of the last 15 minutes, which you change by clicking on the clock sign which you can find in the top-right corner of the screen and selecting the desired time range. We have shown it in the following screenshot: One more thing to take look out for is that, after clicking on this clock sign, apart from time- based settings, you will see one more option in the top corner with the name Auto-refresh. This setting tells Kibana how often it needs to query Elasticsearch. When you click on this setting, you will get the option to choose either to completely turn off the auto-refresh or select the desired time interval. Adding fields for exploration and using the search panel As you can see in the following screenshot, you have all your fields available inside your index. On the Visualization screen, by default Kibana shows the timestamp and _source field but you can add your selected fields from the left panel by just moving the cursor on them and then clicking Add. Similarly, if you want to remove the field from the column, just move the cursor to the field's name on the column heading and click on the cross icon. In addition, Kibana also provides you with a search panel in which you can write queries. For example, in the following screenshot, I have searched for the logstash keyword inside the syslog_message field. When you hit the search button, the search text gets highlighted inside the rendered responses: Exploring more options on the Visualization page On Kibana, you will see lots of small arrow signs to open or collapse the sections/settings. You will see one of these arrows in the following image, in the bottom-left corner (I have also added a custom text on the image just beside the arrow): When you click on this arrow, the time series histogram gets hidden and you get to see the following screen, which contains multiple properties such as Table, which contains the histogram data in tabular format; Request, which contains the actual JSON query sent to Elasticsearch; Response, which contains the JSON response returned from Elasticsearch; and Statistics, which shows the query execution time and number of hits matching the query: Using the Dashboard screen to create/load dashboards When you click on the Dashboard panel, you first get a blank screen with some options, such as New for creating a dashboard and Open to open an existing dashboard, along with some more options. If you are creating a dashboard from scratch, you will have to add the built visualizations onto it and then save it using some name. But since we already have a dashboard available which we imported using Metricbeat, we will click Open and you will see something similar to the following screenshot on your Kibana page: Please note that if you do not have Apache installed on your system, selecting the first option, Metricbeat – Apache HTTPD server status, will load a blank dashboard. You can select any other title; for example, if you select the second option, you will see a dashboard similar to the following: Editing an existing visualization When you move the cursor on the visualizations presented on the dashboard, you will notice that a pencil sign appears, as shown in the following screenshot: When you click on that pencil sign, it will open that particular visualization inside the visualization editor panel, as shown in the following screenshot. Here you can edit the properties and either override the same visualization or save it using some other name: Please note that if you want to create a visualization from scratch, just click on the Visualize option on the left-hand side and it will guide you through the steps of creating the visualization. Kibana provides almost 10 types of visualizations. To get the details about working with each type of visualization, please follow the official documentation of Kibana on this link: https://www.elastic.co/guide/en/kibana/master/createvis.html. Using Sense Inside the Dev-Tools option, you can find the console for Kibana, which was previously known as Sense Editor. This is one of the most wonderful tools to help you speed up the learning curve of Elasticsearch since it provides auto-suggestions for all the endpoints and queries, as shown in the following screenshot: You will see that the Kibana Console is divided into two parts; the left part is where you write your queries/requests, and after clicking the green arrow, the response from Elasticsearch is rendered inside the right-hand panel:   To summarize we explained how to work with the Kibana tool in Elasticsearch 5.x. We explored installation of  Kibana, Kibana configuration, and moving ahead with exploring and visualizing data using Kibana. If you enjoyed this excerpt, and want to get an understanding of how you can scale your ElasticSearch cluster to contextualize it and improve its performance, check out the book Mastering Elasticsearch 5.x.    
Read more
  • 0
  • 0
  • 33036

article-image-how-sql-server-handles-data-under-the-hood
Sunith Shetty
27 Feb 2018
11 min read
Save for later

How SQL Server handles data under the hood

Sunith Shetty
27 Feb 2018
11 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Marek Chmel and Vladimír Mužný titled SQL Server 2017 Administrator's Guide. In this book, you will learn the required skills needed to successfully create, design, and deploy database using SQL Server 2017.[/box] Today, we will explore how SQL Server handles data as it is of utmost importance to get an understanding of what, when, and why data should be backed. Data structures and transaction logging We can think about a database as of physical database structure consisting of tables and indexes. However, this is just a human point of view. From the SQL Server's perspective, a database is a set of precisely structured files described in a form of metadata also saved in database structures. A conceptual imagination of how every database works is very helpful when the database has to be backed up correctly. How data is stored Every database on SQL Server must have at least two files: The primary data file with the usual suffix, mdf The transaction log file with the usual suffix, ldf For lots of databases, this minimal set of files is not enough. When the database contains big amounts of data such as historical tables, or the database has big data contention such as production tracking systems, it's good practise to design more data files. Another situation when a basic set of files is not sufficient can arise when documents or pictures would be saved along with relational data. However, SQL Server still is able to store all of our data in the basic file set, but it can lead to a performance bottlenecks and management issues. That's why we need to know all possible storage types useful for different scenarios of deployment. A complete structure of files is depicted in the following image: Database A relational database is defined as a complex data type consisting of tables with a given amount of columns, and each column has its domain that is actually a data type (such as an integer or a date) optionally complemented by some constraints. From SQL Server's perspective, the database is a record written in metadata and containing the name of the database, properties of the database, and names and locations of all files or folders representing storage for the database. This is the same for user databases as well as for system databases. System databases are created automatically during SQL Server installation and are crucial for correct running of SQL Server. We know five system databases. Database master Database master is crucial for the correct running of SQL Server service. In this database is stored data about logins, all databases and their files, instance configurations, linked servers, and so on. SQL Server finds this database at startup via two startup parameters, -d and -l, followed by paths to mdf and ldf files. These parameters are very important in situations when the administrator wants to move the master's files to a different location. Changing their values is possible in the SQL Server Configuration Manager in the SQL Server service Properties dialog on the tab called startup parameters. Database msdb The database msdb serves as the SQL Server Agent service, Database Mail, and Service Broker. In this database are stored job definitions, operators, and other objects needed for administration automation. This database also stores some logs such as backup and restore events of each database. If this database is corrupted or missing, SQL Server Agent cannot start. Database model Database model can be understood as a template for every new database while it is created. During a database creation (see the CREATE DATABASE statement on MSDN), files are created on defined paths and all objects, data and properties of database model are created, copied, and set into the new database during its creation. This database must always exist on the instance, because when it's corrupted, database tempdb can be created at instance start up! Database tempdb Even if database tempdb seems to be a regular database like many others, it plays a very special role in every SQL Server instance. This database is used by SQL Server itself as well as by developers to save temporary data such as table variables or static cursors. As this database is intended for a short lifespan (temporary data only, which can be stored during execution of stored procedure or until session is disconnected), SQL Server clears this database by truncating all data from it or by dropping and recreating this database every time when it's started. As the tempdb database will never contain durable data, it has some special internal behavior and it's the reason why accessing data in this database is several times faster than accessing durable data in other databases. If this database is corrupted, restart SQL Server. Database resourcedb The resourcedb is fifth in our enumeration and consists of definitions for all system objects of SQL Server, for example, sys.objects. This database is hidden and we don't need to care about it that much. It is not configurable and we don't use regular backup strategies for it. It is always placed in the installation path of SQL Server (to the binn directory) and it's backed up within the filesystem backup. In case of an accident, it is recovered as a part of the filesystem as well. Filegroup Filegroup is an organizational metadata object containing one or more data files. Filegroup does not have its own representation in the filesystem--it's just a group of files. When any database is created, a filegroup called primary is always created. This primary filegroup always contains the primary data file. Filegroups can be divided into the following: Row storage filegroups: These filegroup can contain data files (mdf or ndf). Filestream filegroups: This kind of filegroups can contain not files but folders to store binary data. In-memory filegroup: Only one instance of this kind of filegroup can be created in a database. Internally, it is a special case of filestream filegroup and it's used by SQL Server to persist data from in-memory tables. Every filegroup has three simple properties: Name: This is a descriptive name of the filegroup. The name must fulfill the naming convention criteria. Default: In a set of filegroups of the same type, one of these filegroups has this option set to on. This means that when a new table or index is created without explicitly specified to which filegroup it has to store data in, the default filegroup is used. By default, the primary filegroup is the default one. Read-only: Every filegroup, except the primary filegroup, could be set to read- only. Let's say that a filegroup is created for last year's history. When data is moved from the current period to tables created in this historical filegroup, the filegroup could be set as read-only, and later the filegroup cannot be backed up again and again. It is a very good approach to divide the database into smaller parts-- filegroups with more files. It helps in distributing data across more physical storage and also makes the database more manageable; backups can be done part by part in shorter times, which better fit into a service window. Data files Every database must have at least one data file called primary data file. This file is always bound to the primary filegroup. In this file is all the metadata of the database, such as structure descriptions (could be seen through views such as sys.objects, sys.columns, and others), users, and so on. If the database does not have other data files (in the same or other filegroups), all user data is also stored in this file, but this approach is good enough just for smaller databases. Considering how the volume of data in the database grows over time, it is a good practice to add more data files. These files are called secondary data files. Secondary data files are optional and contain user data only. Both types of data files have the same internal structure. Every file is divided into 8 KB small parts called data pages. SQL Server maintains several types of data pages such as data, data pages, index pages, index allocation maps (IAM) pages to locate data pages of tables or indexes, global allocation map (GAM) and shared global allocation maps (SGAM) pages to address objects in the database, and so on. Regardless of the type of a certain data page, SQL Server uses a data page as the smallest unit of I/O operations between hard disk and memory. Let's describe some common properties: A data page never contains data of several objects Data pages don't know each other (and that's why SQL Server uses IAMs to allocate all pages of an object) Data pages don't have any special physical ordering A data row must always fit in size to a data page These properties could seem to be useless but we have to keep in mind that when we know these properties, we can better optimize and manage our databases. Did you know that a data page is the smallest storage unit that can be restored from backup? As a data page is quite a small storage unit, SQL Server groups data pages into bigger logical units called extents. An extent is a logical allocation unit containing eight coherent data pages. When SQL Server requests data from disk, extents are read into memory. This is the reason why 64 KB NTFS clusters are recommended to format disk volumes for data files. Extents could be uniform or mixed. Uniform extent is a kind of extent containing data pages belonging to one object only; on the other hand, a mixed extent contains data pages of several objects. Transaction log When SQL Server processes any transaction, it works in a way called two-phase commit. When a client starts a transaction by sending a single DML request or by calling the BEGIN TRAN command, SQL Server requests data pages from disk to memory called buffer cache and makes the requested changes in these data pages in memory. When the DML request is fulfilled or the COMMIT command comes from the client, the first phase of the commit is finished, but data pages in memory differ from their original versions in a data file on disk. The data page in memory is in a state called dirty. When a transaction runs, a transaction log file is used by SQL Server for a very detailed chronological description of every single action done during the transaction. This description is called write-ahead-logging, shortly WAL, and is one of the oldest processes known on SQL Server. The second phase of the commit usually does not depend on the client's request and is an internal process called checkpoint. Checkpoint is a periodical action that: searches for dirty pages in buffer cache, saves dirty pages to their original data file location, marks these data pages as clean (or drops them out of memory to free memory space), marks the transaction as checkpoint or inactive in the transaction log. Write-ahead-logging is needed for SQL Server during recovery process. Recovery process is started on every database every time SQL Server service starts. When SQL Server service stops, some pages could remain in a dirty state and they are lost from memory. This can lead to two possible situations: The transaction is completely described in the transaction log, the new content of the data page is lost from memory, and data pages are not changed in the data file The transaction was not completed at the moment SQL Server stopped, so the transaction cannot be completely described in the transaction log as well, data pages in memory were not in a stable state (because the transaction was not finished and SQL Server cannot know if COMMIT or ROLLBACK will occur), and the original version of data pages in data files is intact SQL Server decides these two situations when it's starting. If a transaction is complete in the transaction log but was not marked as checkpoint, SQL Server executes this transaction again with both phases of COMMIT. If the transaction was not complete in the transaction log when SQL Server stopped, SQL Server will never know what was the user's intention with the transaction and the incomplete transaction is erased from the transaction log as if it had never started. The aforementioned described recovery process ensures that every database is in the last known consistent state after SQL Server's startup. It's crucial for DBAs to understand write-ahead-logging when planning a backup strategy because when restoring the database, the administrator has to recognize if it's time to run the recovery process or not. To summarize, we introduced internal data handling as it is important not only during performance backups and restores but also for optimizing a database. If you are interested to know more about how to backup, recover and secure SQL Server, do checkout this book SQL Server 2017 Administrator's Guide.  
Read more
  • 0
  • 0
  • 33020

article-image-writing-perform-test-functions-in-golang-tutorial
Natasha Mathur
10 Jul 2018
9 min read
Save for later

Writing test functions in Golang [Tutorial]

Natasha Mathur
10 Jul 2018
9 min read
Go is a modern programming language built for the 21st-century application development. Hardware and technology have advanced significantly over the past decade, and most of the other languages do not take advantage of these technological advancements.  Go allows us to build network applications that take advantage of concurrency and parallelism made available with multicore systems. Testing is an important part of programming, whether it is in Go or in any other language. Go has a straightforward approach to writing tests, and in this tutorial, we will look at some important tools to help with testing. This tutorial is an excerpt from the book ‘Distributed computing with Go’, written by V.N. Nikhil Anurag. There are certain rules and conventions we need to follow to test our code. They are as follows: Source files and associated test files are placed in the same package/folder The name of the test file for any given source file is <source-file-name>_test.go Test functions need to have the "Test" prefix, and the next character in the function name should be capitalized In the remainder of this tutorial, we will look at three files and their associated tests: variadic.go and variadic_test.go addInt.go and addInt_test.go nil_test.go (there isn't any source file for these tests) Along the way, we will introduce any concepts we might use. variadic.go function In order to understand the first set of tests, we need to understand what a variadic function is and how Go handles it. Let's start with the definition: Variadic function is a function that can accept any number of arguments during function call. Given that Go is a statically typed language, the only limitation imposed by the type system on a variadic function is that the indefinite number of arguments passed to it should be of the same data type. However, this does not limit us from passing other variable types. The arguments are received by the function as a slice of elements if arguments are passed, else nil, when none are passed. Let's look at the code to get a better idea: // variadic.go package main func simpleVariadicToSlice(numbers ...int) []int { return numbers } func mixedVariadicToSlice(name string, numbers ...int) (string, []int) { return name, numbers } // Does not work. // func badVariadic(name ...string, numbers ...int) {} We use the ... prefix before the data type to define a function as a variadic function. Note that we can have only one variadic parameter per function and it has to be the last parameter. We can see this error if we uncomment the line for badVariadic and try to test the code. variadic_test.go We would like to test the two valid functions, simpleVariadicToSlice, and mixedVariadicToSlice, for various rules defined above. However, for the sake of brevity, we will test these: simpleVariadicToSlice: This is for no arguments, three arguments, and also to look at how to pass a slice to a variadic function mixedVariadicToSlice: This is to accept a simple argument and a variadic argument Let's now look at the code to test these two functions: // variadic_test.go package main import "testing" func TestSimpleVariadicToSlice(t *testing.T) { // Test for no arguments if val := simpleVariadicToSlice(); val != nil { t.Error("value should be nil", nil) } else { t.Log("simpleVariadicToSlice() -> nil") } // Test for random set of values vals := simpleVariadicToSlice(1, 2, 3) expected := []int{1, 2, 3} isErr := false for i := 0; i < 3; i++ { if vals[i] != expected[i] { isErr = true break } } if isErr { t.Error("value should be []int{1, 2, 3}", vals) } else { t.Log("simpleVariadicToSlice(1, 2, 3) -> []int{1, 2, 3}") } // Test for a slice vals = simpleVariadicToSlice(expected...) isErr = false for i := 0; i < 3; i++ { if vals[i] != expected[i] { isErr = true break } } if isErr { t.Error("value should be []int{1, 2, 3}", vals) } else { t.Log("simpleVariadicToSlice([]int{1, 2, 3}...) -> []int{1, 2, 3}") } } func TestMixedVariadicToSlice(t *testing.T) { // Test for simple argument & no variadic arguments name, numbers := mixedVariadicToSlice("Bob") if name == "Bob" && numbers == nil { t.Log("Recieved as expected: Bob, <nil slice>") } else { t.Errorf("Received unexpected values: %s, %s", name, numbers) } } Running tests in variadic_test.go Let's run these tests and see the output. We'll use the -v flag while running the tests to see the output of each individual test: $ go test -v ./{variadic_test.go,variadic.go} === RUN TestSimpleVariadicToSlice --- PASS: TestSimpleVariadicToSlice (0.00s) variadic_test.go:10: simpleVariadicToSlice() -> nil variadic_test.go:26: simpleVariadicToSlice(1, 2, 3) -> []int{1, 2, 3} variadic_test.go:41: simpleVariadicToSlice([]int{1, 2, 3}...) -> []int{1, 2, 3} === RUN TestMixedVariadicToSlice --- PASS: TestMixedVariadicToSlice (0.00s) variadic_test.go:49: Received as expected: Bob, <nil slice> PASS ok command-line-arguments 0.001s addInt.go The tests in variadic_test.go elaborated on the rules for the variadic function. However, you might have noticed that TestSimpleVariadicToSlice ran three tests in its function body, but go test treats it as a single test. Go provides a good way to run multiple tests within a single function, and we shall look them in addInt_test.go. For this example, we will use a very simple function as shown in this code: // addInt.go package main func addInt(numbers ...int) int { sum := 0 for _, num := range numbers { sum += num } return sum } addInt_test.go You might have also noticed in TestSimpleVariadicToSlice that we duplicated a lot of logic, while the only varying factor was the input and expected values. One style of testing, known as Table-driven development, defines a table of all the required data to run a test, iterates over the "rows" of the table and runs tests against them. Let's look at the tests we will be testing against no arguments and variadic arguments: // addInt_test.go package main import ( "testing" ) func TestAddInt(t *testing.T) { testCases := []struct { Name string Values []int Expected int }{ {"addInt() -> 0", []int{}, 0}, {"addInt([]int{10, 20, 100}) -> 130", []int{10, 20, 100}, 130}, } for _, tc := range testCases { t.Run(tc.Name, func(t *testing.T) { sum := addInt(tc.Values...) if sum != tc.Expected { t.Errorf("%d != %d", sum, tc.Expected) } else { t.Logf("%d == %d", sum, tc.Expected) } }) } } Running tests in addInt_test.go Let's now run the tests in this file, and we are expecting each of the row in the testCases table, which we ran, to be treated as a separate test: $ go test -v ./{addInt.go,addInt_test.go} === RUN TestAddInt === RUN TestAddInt/addInt()_->_0 === RUN TestAddInt/addInt([]int{10,_20,_100})_->_130 --- PASS: TestAddInt (0.00s) --- PASS: TestAddInt/addInt()_->_0 (0.00s) addInt_test.go:23: 0 == 0 --- PASS: TestAddInt/addInt([]int{10,_20,_100})_->_130 (0.00s) addInt_test.go:23: 130 == 130 PASS ok command-line-arguments 0.001s nil_test.go We can also create tests that are not specific to any particular source file; the only criteria is that the filename needs to have the <text>_test.go form. The tests in nil_test.go elucidate on some useful features of the language which the developer might find useful while writing tests. They are as follows: httptest.NewServer: Imagine the case where we have to test our code against a server that sends back some data. Starting and coordinating a full blown server to access some data is hard. The http.NewServer solves this issue for us. t.Helper: If we use the same logic to pass or fail a lot of testCases, it would make sense to segregate this logic into a separate function. However, this would skew the test run call stack. We can see this by commenting t.Helper() in the tests and rerunning go test. We can also format our command-line output to print pretty results. We will show a simple example of adding a tick mark for passed cases and cross mark for failed cases. In the test, we will run a test server, make GET requests on it, and then test the expected output versus actual output: // nil_test.go package main import ( "fmt" "io/ioutil" "net/http" "net/http/httptest" "testing" ) const passMark = "u2713" const failMark = "u2717" func assertResponseEqual(t *testing.T, expected string, actual string) { t.Helper() // comment this line to see tests fail due to 'if expected != actual' if expected != actual { t.Errorf("%s != %s %s", expected, actual, failMark) } else { t.Logf("%s == %s %s", expected, actual, passMark) } } func TestServer(t *testing.T) { testServer := httptest.NewServer( http.HandlerFunc( func(w http.ResponseWriter, r *http.Request) { path := r.RequestURI if path == "/1" { w.Write([]byte("Got 1.")) } else { w.Write([]byte("Got None.")) } })) defer testServer.Close() for _, testCase := range []struct { Name string Path string Expected string }{ {"Request correct URL", "/1", "Got 1."}, {"Request incorrect URL", "/12345", "Got None."}, } { t.Run(testCase.Name, func(t *testing.T) { res, err := http.Get(testServer.URL + testCase.Path) if err != nil { t.Fatal(err) } actual, err := ioutil.ReadAll(res.Body) res.Body.Close() if err != nil { t.Fatal(err) } assertResponseEqual(t, testCase.Expected, fmt.Sprintf("%s", actual)) }) } t.Run("Fail for no reason", func(t *testing.T) { assertResponseEqual(t, "+", "-") }) } Running tests in nil_test.go We run three tests, where two test cases will pass and one will fail. This way we can see the tick mark and cross mark in action: $ go test -v ./nil_test.go === RUN TestServer === RUN TestServer/Request_correct_URL === RUN TestServer/Request_incorrect_URL === RUN TestServer/Fail_for_no_reason --- FAIL: TestServer (0.00s) --- PASS: TestServer/Request_correct_URL (0.00s) nil_test.go:55: Got 1. == Got 1. --- PASS: TestServer/Request_incorrect_URL (0.00s) nil_test.go:55: Got None. == Got None. --- FAIL: TestServer/Fail_for_no_reason (0.00s) nil_test.go:59: + != - FAIL exit status 1 FAIL command-line-arguments 0.003s We looked at how to write test functions in Go, and learned a few interesting concepts when dealing with a variadic function and other useful test functions. If you found this post useful, do check out the book  'Distributed Computing with Go' to learn more about testing, Goroutines, RESTful web services, and other concepts in Go. Why is Go the go-to language for cloud-native development? – An interview with Mina Andrawos Systems programming with Go in UNIX and Linux How to build a basic server-side chatbot using Go
Read more
  • 0
  • 0
  • 32940
article-image-big-data-analysis-using-googles-pagerank
Sugandha Lahoti
14 Dec 2017
8 min read
Save for later

Getting started with big data analysis using Google's PageRank algorithm

Sugandha Lahoti
14 Dec 2017
8 min read
[box type="note" align="" class="" width=""]The article given below is a book excerpt from Java Data Analysis written by John R. Hubbard. Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the aim of discovering useful information. Java is one of the most popular languages to perform your data analysis tasks. This book will help you learn the tools and techniques in Java to conduct data analysis without any hassle. [/box] This post aims to help you learn how to analyse big data using Google’s PageRank algorithm. The term big data generally refers to algorithms for the storage, retrieval, and analysis of massive datasets that are too large to be managed by a single file server. Commercially, these algorithms were pioneered by Google, Google’s PageRank being one of them is considered in this article. Google PageRank algorithm Within a few years of the birth of the web in 1990, there were over a dozen search engines that users could use to search for information. Shortly after it was introduced in 1995, AltaVista became the most popular among them. These search engines would categorize web pages according to the topics that the pages themselves specified. But the problem with these early search engines was that unscrupulous web page writers used deceptive techniques to attract traffic to their pages. For example, a local rug-cleaning service might list "pizza" as a topic in their web page header, just to attract people looking to order a pizza for dinner. These and other tricks rendered early search engines nearly useless. To overcome the problem, various page ranking systems were attempted. The objective was to rank a page based upon its popularity among users who really did want to view its contents. One way to estimate that is to count how many other pages have a link to that page. For example, there might be 100,000 links to https://en.wikipedia.org/wiki/Renaissance, but only 100 to https://en.wikipedia.org/wiki/Ernest_Renan, so the former would be given a much higher rank than the latter. But simply counting the links to a page will not work either. For example, the rug-cleaning service could simply create 100 bogus web pages, each containing a link to the page they want users to view. In 1996, Larry Page and Sergey Brin, while students at Stanford University, invented their PageRank algorithm. It simulates the web itself, represented by a very large directed graph, in which each web page is represented by a node in the graph, and each page link is represented by a directed edge in the graph. The directed graph shown in the figure below could represent a very small network with the same properties: This has four nodes, representing four web pages, A, B, C, and D. The arrows connecting them represent page links. So, for example, page A has a link to each of the other three pages, but page B has a link only to A. To analyze this tiny network, we first identify its transition matrix, M : This square has 16 entries, mij, for 1 ≤ i ≤ 4 and 1 ≤ j ≤ 4. If we assume that a web crawler always picks a link at random to move from one page to another, then mij, equals the probability that it will move to node i from node j, (numbering the nodes A, B, C, and D as 1, 2, 3, and 4). So m12 = 1 means that if it's at node B, there's a 100% chance that it will move next to A. Similarly, m13 = m43 = ½ means that if it's at node C, there's a 50% chance of it moving to A and a 50% chance of it moving to D. Suppose a web crawler picks one of those four pages at random, and then moves to another page, once a minute, picking each link at random. After several hours, what percentage of the time will it have spent at each of the four pages? Here is a similar question. Suppose there are 1,000 web crawlers who obey that transition matrix as we've just described, and that 250 of them start at each of the four pages. After several hours, how many will be on each of the four pages? This process is called a Markov chain. It is a mathematical model that has many applications in physics, chemistry, computer science, queueing theory, economics, and even finance. The diagram in the above figure is called the state diagram for the process, and the nodes of the graph are called the states of the process. Once the state diagram is given, the meaning of the nodes (web pages, in this case) becomes irrelevant. Only the structure of the diagram defines the transition matrix M, and from that we can answer the question. A more general Markov chain would also specify transition probabilities between the nodes, instead of assuming that all transition choices are made at random. In that case, those transition probabilities become the non-zero entries of the M. A Markov chain is called irreducible if it is possible to get to any state from any other state. According to the mathematical theory of Markov chains, if the chain is irreducible, then we can compute the answer to the preceding question using the transition matrix. What we want is the steady state solution; that is, a distribution of crawlers that doesn't change. The crawlers themselves will change, but the number at each node will remain the same. To calculate the steady state solution mathematically, we first have to realize how to apply the transition matrix M. The fact is that if x = (x1 , x2 , x3 , x4 ) is the distribution of crawlers at one minute, and the next minute the distribution is y = (y1 , y2 , y3 , y4 ), then y = Mx , using matrix multiplication. So now, if x is the steady state solution for the Markov chain, then Mx = x. This vector equation gives us four scalar equations in four unknowns: One of these equations is redundant (linearly dependent). But we also know that x1 + x2 + x3 + x4 = 1, since x is a probability vector. So, we're back to four equations in four unknowns. The solution is: The point of that example is to show that we can compute the steady state solution to a static Markov chain by solving an n × n matrix equation, where n is the number of states. By static here, we mean that the transition probabilities mij do not change. Of course, that does not mean that we can mathematically compute the web. In the first place, n > 30,000,000,000,000 nodes! And in the second place, the web is certainly not static. Nevertheless, this analysis does give some insight about the web; and it clearly influenced the thinking of Larry Page and Sergey Brin when they invented the PageRank algorithm. The purpose of the PageRank algorithm is to rank the web pages according to some criteria that would resemble their importance, or at least their frequency of access. The original simple (pre-PageRank) idea was to count the number of links to each page and use something proportional to that count for the rank. Following that line of thought, we can imagine that, if x = (x1 , x2 ,..., xn )T is the page rank for the web (that is, if xj is the relative rank of page j and ∑xj = 1), then Mx = x, at least approximately. Another way to put that is that repeated applications of M to x should nudge x closer and closer to that (unattainable) steady state. That brings us (finally) to the PageRank formula: where ε is a very small positive constant, z is the vector of all 1s, and n is the number of nodes. The vector expression on the right defines the transformation function f which replaces a page rank estimate x with an improved page rank estimate. Repeated applications of this function gradually converge to the unknown steady state. Note that in the formula, f is a function of more than just x. There are really four inputs: x, M, ε , and n. Of course, x is being updated, so it changes with each iteration. But M, ε , and n change too. M is the transition matrix, n is the number of nodes, and ε is a coefficient that determines how much influence the z/n vector has. For example, if we set ε to 0.00005, then the formula becomes: This is how Google's PageRank algorithm can be utilized for the analysis of very large datasets. To learn how to implement this algorithm and various other machine learning algorithms for big data, data visualization, and more using Java, check out this book Java Data Analysis.  
Read more
  • 0
  • 0
  • 32934

article-image-getting-started-kinect
Packt
30 Aug 2013
8 min read
Save for later

Getting Started with Kinect

Packt
30 Aug 2013
8 min read
(For more resources related to this topic, see here.) Before the birth of Microsoft Kinect, few people were familiar with the technology of motion sensing. Similar devices have been invented and developed originally for monitoring aerial and undersea aggressors in wars. Then in the non-military cases, motion sensors are widely used in alarm systems, lighting systems and so on, which could detect if someone or something disrupts the waves throughout a room and trigger predefined events. Although radar sensors and modern infrared motion sensors are used more popularly in our life, we seldom notice their existence, and can hardly make use of these devices in our own applications. But Kinect changed everything from the time it was launched in North America at the end of 2010. Different from most other user input controllers, Kinect enables users to interact with programs without really touching a mouse or a pad, but only through gestures. In a top-level view, a Kinect sensor is made up of an RGB camera, a depth sensor, an IR emitter, and a microphone array, which consists of several microphones for sound and voice recognition. A standard Kinect (for Windows) equipment is shown as follows: The Kinect device The Kinect drivers and software, which are either from Microsoft or from third-party companies, can even track and analyze advanced gestures and skeletons of multiple players. All these features make it possible to design brilliant and exciting applications with handsfree user inputs. And until now, Kinect had already brought a lot of games and software to an entirely new level. It is believed to be the bridge between the physical world we exist in and the virtual reality we create, and a completely new way of interacting with arts and a pro fitable business opportunity for individuals and companies. In this article, we will try to make an interesting game with the popular Kinect technology for user inputs, As Kinect captures the camera and depth images as video streams, we can also merge this view of our real-world environment with virtual elements, which is called Augmented Reality (AR) . This enables users to feel as if they appear and live in a nonexistent world, or something unbelievable exists in the physical earth. In this article, we will first introduce the installation of Kinect hardware and software on personal computers, and then consider a good enough idea compounded of Kinect and augmented reality elements. Before installing the Kinect device on your PCs, obviously you should buy Kinect equipment first. In this article, we will depend on Kinect for Windows or Kinect for Xbox 360, which can be learned about and bought at: http://www.microsoft.com/en-us/kinectforwindows/ http://www.xbox.com/en-US/kinect Please note that you don't need to buy an Xbox 360 at all. Kinect will be connected to PCs so that we can make custom programs for it. An alternative choice is Kinect for Windows, which is located at: http://www.microsoft.com/en-us/kinectforwindows/purchase/ The uses and developments of both will be of no difference for our cases. Installation of Kinect It is strongly suggested that you have a Windows 7 operating system or higher. It can be either 32-bit or 64-bit and with dual-core or faster processors. Linux developers can also benefit from third-party drivers and SDKs to manipulate Kinect components. Before we start to discuss the software installation, you can download both the Microsoft Kinect SDK and the Developer Toolkit from: http://www.microsoft.com/en-us/kinectforwindows/develop/developerdownloads.aspx In this article, we prefer to develop Kinect-based applications using Kinect SDK Version 1.5 (or higher versions) and the C++ language. Later versions should be backward compatible so that the source code provided in this article doesn't need to be changed. Setting up your Kinect software on PCs After we have downloaded the SDK and the Developer Toolkit, it's time for us to install them on the PC and ensure that they can work with the Kinect hardware. Let's perform the following steps: Run the setup executable with administrator permissions. Select I agree to the license terms and conditions after reading the License Agreement. The Kinect SDK setup dialog Follow the steps until the SDK installation has finished. And then, install the toolkit following similar instructions. The hardware installation is easy: plug the ends of the cable into the USB port and a power point, and plug the USB into your PC. Wait for the drivers to be found automatically. Now, start the Developer Toolkit Browser, choose Samples: C++ from the tabs, and find and run the sample with the name Skeletal Viewer. You should be able to see a new window demonstrating the depth/ skeleton/color images of the current physical scene, which is similar to the following image: The depth (left), skeleton (middle), and color (right) images read from Kinect Why did I do that? We chose to set up the SDK software at first so that it will install the motor and camera drivers, the APIs, and the documentations, as well as the toolkit including resources and samples onto the PC. If the operation steps are inversed, that is, the hardware is connected before installing the SDK, your Windows OS may not be able to recognize the device. Just start the SDK setup at this time and the device should be identified again during the installation process. But before actually using Kinect, you still have to ensure there is nothing between the device and you (the player). And it's best to keep the play space at least 1.8 m wide and about 1.8 m to 3.6 m long from the sensor. If you have more than one Kinect device, don't keep them face-to-face as there may be infrared interference between them. If you have multiple Kinects to install on the same PC, please note that one USB root hub can have one and only one Kinect connected. The problem happens because Kinect takes over 50 percent of the USB bandwidth, and it needs an individual USB controller to run. So plugging more than one device on the same USB hub means only one of them will work. The depth image at the left in the preceding image shows a human (in fact, the author) standing in front of the camera. Some parts may be totally black if they are too near (often less than 80 cm), or too far (often more than 4 m). If you are using Kinect for Windows, you can turn on Near Mode to show objects that are near the camera; however, Kinect for Xbox 360 doesn't have such features. You can read more about the software and hardware setup at: http://www.microsoft.com/en-us/kinectforwindows/purchase/sensor_setup.aspx The idea of the AR-based Fruit Ninja game Now it's time for us to define the goal we are going to achieve in this article. As a quick but practical guide for Kinect and augmented reality, we should be able to make use of the depth detection, video streaming, and motion tracking functionalities in our project. 3D graphics APIs are also important here because virtual elements should also be included and interacted with irregular user inputs not common mouse or keyboard inputs). A fine example is the Fruit Ninja game, which is already a very popular game all over the world. Especially on mobile devices like smartphones and pads, you can see people destroy different kinds of fruits by touching and swiping their fingers on the screen. With the help of Kinect, our arms can act as blades to cut off flying fruits, and our images can also be shown along with the virtual environment so that we can determine the posture of our bodies and position of our arms through the screen display. Unfortunately, this idea is not fresh enough for now. Already, there are commercial products with similar purposes available in the market; for example: http://marketplace.xbox.com/en-US/Product/Fruit-Ninja-Kinect/66acd000-77fe-1000-9115-d80258410b79 But please note that we are not going to design a completely different product here, or even bring it to the market after finishing this article. We will only learn how to develop Kinect-based applications, work in our own way from the very beginning, and benefit from the experience in our professional work or as an amateur. So it is okay to reinvent the wheel this time, and have fun in the process and the results. Summary Kinect, which is a portmanteau of the words "kinetic" and "connect", is a motion sensor developed and released by Microsoft. It provides a natural user interface (NUI) for tracking and manipulating handsfree user inputs such as gestures and skeleton motions. It can be considered as one of the most successful consumer electronics device in recent years, and we will be using this novel device to build the Fruit Ninja game in this article. We will focus on developing Kinect and AR-based applications on Windows 7 or higher using the Microsoft Kinect SDK 1.5 (or higher) and the C++ programming language. Mainly, we have introduced how to install Kinect for Windows SDK in this article. Resources for Article : Further resources on this subject: So, what is KineticJS? [Article] Mission Running in EVE Online [Article] Making Money with Your Game [Article]
Read more
  • 0
  • 0
  • 32932
Modal Close icon
Modal Close icon