How-To Tutorials

article-image-understanding-text-search-and-hierarchies-sap-hana

20 Oct 2015

9 min read

Understanding Text Search and Hierarchies in SAP HANA

20 Oct 2015

0
0
16391

article-image-determining-resource-utilization-requirements

Packt

17 Feb 2014

11 min read

Determining resource utilization requirements

Packt

17 Feb 2014

11 min read

(For more resources related to this topic, see here.) For those hoping to find a magical catch-all formula that will work in every scenario, you'll have to keep looking. Remember every environment is unique, and even where similarities may arise, the use case your organization has, will most likely be different from another organization. Beyond your specific VM resource requirements, the hosts you are installing ESXi on will also vary; the hardware available to you will affect your consolidation ratio (the number of virtual machines you can fit on a single host). For example, if you have 10 servers that you want to virtualize, and you have determined each requires 4 GB of RAM, you might easily virtualize those 10 servers on a host with 48 GB of memory. However, if your host only has 16 GB of memory, you may need two or three hosts in order to achieve the required performance. Another important aspect to consider is when to collect resource utilization statistics about your servers. Think about the requirements you have for a specific server; let's use your finance department as an example. You can certainly collect resource statistics over a period of time in the middle of the month, and that might work just fine; however, the people in your finance department are more likely to utilize the system heavily during the first few days of the month as they are working on their month end processes. If you collect resource statistics on the 15th, you might miss a huge increase in resource utilization requirements, which could lead to the system not working as expected, making unhappy users. One last thing before we jump into some example statistics; you should consider collecting these statistics over at least two periods for each server: First, during the normal business hours of your organization or the specific department, during a time when systems are likely to be heavily utilized The second round should include an entire day or week so you are aware of the impact of after hours tasks such as backups and anti-virus scans on your environment It's important to have a strong understanding of the use cases for all the systems you will be virtualizing. If you are running your test during the middle of the month, you might miss the increase of traffic for systems utilized heavily only at the end of the month, for example, accounting systems. The more information you collect, the better prepared you will be to determine your resource utilization requirements. There are quite a few commercial tools available to help determine the specific resource requirements for your environment. In fact, if you have an active project and/or budget, check with your server and storage vendor as they can most likely provide tools to assess your environment over a period of time to help you collect this information. If you work with a VMware Partner or the VMware Professional Services Organization (PSO), you could also work with them to run a tool called VMware Capacity Planner. This tool is only available to partners who have passed the corresponding partner exams. For purposes of this article, however, we will look at the statistics we can capture natively within an operating system, for example, using Performance Monitor on Windows and the sar command in Linux. If you are an OS X user, you might be wondering why we are not touching OS X. This is because while Apple allows virtualizing OS X 10.5 and later, it is only supported on the Apple hardware and is not likely an everyday use case. If your organization requires virtualizing OSX, ESXi 5.1 is supported on specific Mac Pro desktops with Intel Xeon 5600 series processors and 5.0 is supported on Xserve using Xeon 5500 series processors. The current Apple license agreement allows virtualizing OSX 10.5 and up; of course, you should check for the latest agreement to ensure you are adhering to the license agreement. Monitoring common resource statistics From a statistics perspective, there are four main types of resources you generally monitor: CPU, memory, disk, and network. Unless you have a very chatty application, network utilization is generally very low, but this doesn't mean we won't check on it; however, we probably won't dedicate as much time to it as we do for CPU, memory, and disk. As we think about the CPU and memory, we generally look at utilization in terms of percentages. When we look at example servers, you will see that having an accurate inventory of the physical server is important so we can properly gauge the virtual CPU and memory requirements when we virtualize. If a physical server has dual quad core CPUs and 16 GB of memory, it does not necessarily mean we want to provide the same amount of virtual resources. Disk performance is where many people spend the least amount of time, and those people generally have the most headaches after they have virtualized. Disk performance is probably the most critical aspect to think about when you are planning your virtualization project. Most people only think of storage in terms of storage capacity, generally gigabytes (GB) or terabytes (TB). However, from a server perspective, we are mostly concerned with the amount of input and output per second, otherwise known as IOPS and throughput. We break down IOPS in into reads and writes per second and then their ratio by comparing one with the other. Understanding your I/O patterns will help you design your storage architecture to properly support all your applications. Storage design and understanding is an art and science by itself. Sample workload Let's break this down into a practical example so we can see how we are applying these concepts. In this example, we will look at two different types of servers that are likely to have various resource requirements: Windows Active Directory Domain Controller and a CentOS Apache web server. In this scenario, let's assume that each of these server operating systems and applications are running on dedicated hardware, that is, they are not yet virtual machines. The first step you should take, if you do not have this already, is to document the physical systems, their components, and other relevant information such as computer or DNS name, IP address (es), location, and so on. For larger environments, you may also want to document installed software, user groups or departments, and so on. Collecting statistics on Windows On Windows servers, your first step would be to start performance monitoring. Perform the following steps to do so: Navigate to Start | Run and enter perfmon. Once the Performance Monitor window opens, expand Monitoring Tools and click on Performance Monitor. Here, you could start adding various counters; however, as of Windows 2008/Windows 7, Performance Monitor includes Data Collector Sets. Expand the Data Collector Sets folder and then the System folder; right-click on System Performance and select Start. Performance Monitor will start to collect key statistics about your system and its resource utilization. When you are satisfied that you have collected an appropriate amount of data, click on System Performance and select Stop. Your reports will be saved into the Reports folder; navigate to Reports| System, click on the System Performance folder, and finally double-click on the report to see the report. In the following screenshot for our domain controller, you can see we were using 10 percent of the total CPU resources available, 54 percent of the memory, a low 18 IOPS, and 0 percent of the available network resources (this is not really uncommon; I have had busy application servers that barely break 2 percent). Now let's compare what we are utilizing with the actual physical resources available to the server. This server has two dual core processors (four total cores) running at 2 GHz per core (8 GHz total available), 4 GB of memory, two 200 GB SAS drives configured in a RAID 1, and a 1 Gbps network card. Here, performance monitor shows averages, but you should also investigate peak usage. If you scroll down in the report, you will find a menu labeled CPU. Navigate to CPU | Process. Here you will see quite a bit of data, more than the space we have to review in this book; however, if you scroll down, you will see a section called Processor User Time by CPU. Here, your mean (that is, average) column should match fairly closely to the report overview provided for the total, but we also want to look at any spikes we may have encountered. As you can see, this CPU had one core that received a maximum of 35 percent utilization, slightly more than the average suggested. If we take the average CPU utilization at 10 percent of the total CPU, it means we will theoretically require only 800 MHz of CPU power, something a single physical core could easily support. The, memory is also using only half of what is available, so we can most likely reduce the amount of memory to 3 GB and still have room for various changes in operating conditions we might have not encountered during our collection window. Finally, having only 18 IOPS used means that we have plenty of performance left in the drives; even a SATA 7200 RPM drive can provide around 80 IOPS. Collecting statistics on Linux Now let's look at the Linux web server to see how we can collect this same set of information using sar in an additional package with sysstat that can monitor resource utilization over time. This is similar to what you might get from top or iotop. The sysstat package can easily be added to your system by running yum install sysstat, as it is a part of the base repository (yum install sysstat is command format). Once the sysstat package is installed, it will start collecting information about resource utilization every 10 minutes and keep this information for a period of seven days. To see the information, you just need to run the sar command; there are different options to display different sets of information , which we will look at next. Here, we can see that our system is idle right now by viewing the %idle column. A simple way to generate some load on your system is to run dd if=/dev/zero of=/dev/null, which will spike your CPU load to 100 percent, so, don't do this on production systems! Let's look at the output with some CPU load. In this example, you can see that the CPU was under load for about half of the 10-minute collection window. One problem here is that unless the CPU spike, in this case to 100 percent, was not consistent for at least 10 minutes, we would potentially miss these spikes using sar with a 10-minute window. This is easily changed by editing /etc/cron.d/sysstat, which tells the system to run this every 10 minutes. During a collection window, one or two minutes may provide more valuable detail. In this example, you can see I am now logging at a five-minute interval instead of 10, so I will have a better chance to find maximum CPU usage during my monitoring period. Now, we are not only concerned with the CPU, but we also want to see memory and disk utilization. To access those statistics, run sar with the following options: The sar –r command will show RAM (memory) statistics. At a basic level, the items we are concerned with here would be the percentage of memory used, which we could use to determine how much memory is actually being utilized. The sar –b command will show disk I/O. From a disk perspective, sar –b will tell us the total number of transactions per second (tps), read transactions per second (rtps), and write transactions per second (wtps). As you can see, you are able to natively collect quite a bit of relevant data about resource utilization on our systems. However, without the help of a vendor or VMware PSO who has access to VMware Capacity Planner, another commercial tool, or a good automation system, this can become difficult to do on a large scale (hundreds or thousands of servers), but certainly not impossible. Resources for Article: Further resources on this subject: Windows 8 with VMware View [Article] Troubleshooting Storage Contention [Article] Networking Performance Design [Article]

0
0
16389

article-image-getting-started-with-chatgpt-advanced-data-analysis-part-2

Joshua Arvin Lat

08 Nov 2023

10 min read

Getting Started with ChatGPT Advanced Data Analysis- Part 2

Joshua Arvin Lat

08 Nov 2023

10 min read

0
0
16362

article-image-android-native-application-api

Packt

13 May 2013

21 min read

Android Native Application API

Packt

13 May 2013

21 min read

0
0
16359

How-To Tutorials

Packt

03 Mar 2017

17 min read

Data Pipelines

Packt

03 Mar 2017

17 min read

In this article by Andrew Morgan, Antoine Amend, Matthew Hallett, David George, the author of the book Mastering Spark for Data Science, readers will learn how to construct a content registerand use it to track all input loaded to the system, and to deliver metrics on ingestion pipelines, so that these flows can be reliably run as an automated, lights-out process. Readers will learn how to construct a content registerand use it to track all input loaded to the system, and to deliver metrics on ingestion pipelines, so that these flows can be reliably run as an automated, lights-out process. In this article we will cover the following topics: Welcome the GDELT Dataset Data Pipelines Universal Ingestion Framework Real-time monitoring for new data Receiving Streaming Data via Kafka Registering new content and vaulting for tracking purposes Visualization of content metrics in Kibana - to monitor ingestion processes & data health (For more resources related to this topic, see here.) Data Pipelines Even with the most basic of analytics, we always require some data. In fact, finding the right data is probably among the hardest problems to solve in data science (but that’s a whole topic for another book!). We have already seen that the way in which we obtain our data can be as simple or complicated as is needed. In practice, we can break this decision into two distinct areas: Ad-hoc and scheduled. Ad-hoc data acquisition is the most common method during prototyping and small scale analytics as it usually doesn’t require any additional software to implement - the user requires some data and simply downloads it from source as and when required. This method is often a matter of clicking on a web link and storing the data somewhere convenient, although the data may still need to be versioned and secure. Scheduled data acquisition is used in more controlled environments for large scale and production analytics, there is also an excellent case for ingesting a dataset into a data lake for possible future use. With Internet of Things (IoT) on the increase, huge volumes of data are being produced, in many cases if the data is not ingested now it is lost forever. Much of this data may not have an immediate or apparent use today, but could do in the future; so the mind-set is to gather all of the data in case it is needed and delete it later when sure it is not. It’s clear we need a flexible approach to data acquisition that supports a variety of procurement options. Universal Ingestion Framework There are many ways to approach data acquisition ranging from home grown bash scripts through to high-end commercial tools. The aim of this section is to introduce a highly flexible framework that we can use for small scale data ingest, and then grow as our requirements change - all the way through to a full corporately managed workflow if needed - that framework will be build using Apache NiFi. NiFi enables us to build large-scale integrated data pipelines that move data around the planet. In addition, it’s also incredibly flexible and easy to build simple pipelines - usually quicker even than using Bash or any other traditional scripting method. If an ad-hoc approach is taken to source the same dataset on a number of occasions, then some serious thought should be given as to whether it falls into the scheduled category, or at least whether a more robust storage and versioning setup should be introduced. We have chosen to use Apache NiFi as it offers a solution that provides the ability to create many, varied complexity pipelines that can be scaled to truly Big Data and IoT levels, and it also provides a great drag & drop interface (using what’s known as flow-based programming[1]). With patterns, templates and modules for workflow production, it automatically takes care of many of the complex features that traditionally plague developers such as multi-threading, connection management and scalable processing. For our purposes it will enable us to quickly build simple pipelines for prototyping, and scale these to full production where required. It’s pretty well documented and easy to get running https://nifi.apache.org/download.html, it runs in a browser and looks like this: https://en.wikipedia.org/wiki/Flow-based_programming We leave the installation of NiFi as an exercise for the reader - which we would encourage you to do - as we will be using it in the following section. Introducing the GDELT News Stream Hopefully, we have NiFi up and running now and can start to ingest some data. So let’s start with some global news media data from GDELT. Here’s our brief, taken from the GDELT website http://blog.gdeltproject.org/gdelt-2-0-our-global-world-in-realtime/: “Within 15 minutes of GDELT monitoring a news report breaking anywhere the world, it has translated it, processed it to identify all events, counts, quotes, people, organizations, locations, themes, emotions, relevant imagery, video, and embedded social media posts, placed it into global context, and made all of this available via a live open metadata firehose enabling open research on the planet itself. [As] the single largest deployment in the world of sentiment analysis, we hope that by bringing together so many emotional and thematic dimensions crossing so many languages and disciplines, and applying all of it in realtime to breaking news from across the planet, that this will spur an entirely new era in how we think about emotion and the ways in which it can help us better understand how we contextualize, interpret, respond to, and understand global events.” In order to start consuming this open data, we’ll need to hook into that metadata firehose and ingest the news streams onto our platform. How do we do this? Let’s start by finding out what data is available. Discover GDELT Real-time GDELT publish a list of the latest files on their website - this list is updated every 15 minutes. In NiFi, we can setup a dataflow that will poll the GDELT website, source a file from this list and save it to HDFS so we can use it later. Inside the NiFi dataflow designer, create a HTTP connector by dragging a processor onto the canvas and selecting GetHTTP. To configure this processor, you’ll need to enter the URL of the file list as: http://data.gdeltproject.org/gdeltv2/lastupdate.txt And also provide a temporary filename for the file list you will download. In the example below, we’ve used the NiFi’s expression language to generate a universally unique key so that files are not overwritten (UUID()). It’s worth noting that with this type of processor (GetHTTP), NiFi supports a number of scheduling and timing options for the polling and retrieval. For now, we’re just going to use the default options and let NiFi manage the polling intervals for us. An example of latest file list from GDELT is shown below. Next, we will parse the URL of the GKG news stream so that we can fetch it in a moment. Create a Regular Expression parser by dragging a processor onto the canvas and selecting ExtractText. Now position the new processor underneath the existing one and drag a line from the top processor to the bottom one. Finish by selecting the success relationship in the connection dialog that pops up. This is shown in the example below. Next, let’s configure the ExtractText processor to use a regular expression that matches only the relevant text of the file list, for example: ([^ ]*gkg.csv.*) From this regular expression, NiFi will create a new property (in this case, called url) associated with the flow design, which will take on a new value as each particular instance goes through the flow. It can even be configured to support multiple threads. Again, this is example is shown below. It’s worth noting here that while this is a fairly specific example, the technique is deliberately general purpose and can be used in many situations. Our First GDELT Feed Now that we have the URL of the GKG feed, we fetch it by configuring an InvokeHTTP processor to use the url property we previously created as it’s remote endpoint, and dragging the line as before. All that remains is to decompress the zipped content with a UnpackContent processor (using the basic zip format) and save to HDFS using a PutHDFS processor, like so: Improving with Publish and Subscribe So far, this flow looks very “point-to-point”, meaning that if we were to introduce a new consumer of data, for example, a Spark-streaming job, the flow must be changed. For example, the flow design might have to change to look like this: If we add yet another, the flow must change again. In fact, each time we add a new consumer, the flow gets a little more complicated, particularly when all the error handling is added. This is clearly not always desirable, as introducing or removing consumers (or producers) of data, might be something we want to do often, even frequently. Plus, it’s also a good idea to try to keep your flows as simple and reusable as possible. Therefore, for a more flexible pattern, instead of writing directly to HDFS, we can publish to Apache Kafka. This gives us the ability to add and remove consumers at any time without changing the data ingestion pipeline. We can also still write to HDFS from Kafka if needed, possibly even by designing a separate NiFi flow, or connect directly to Kafka using Spark-streaming. To do this, we create a Kafka writer by dragging a processor onto the canvas and selecting PutKafka. We now have a simple flow that continuously polls for an available file list, routinely retrieving the latest copy of a new stream over the web as it becomes available, decompressing the content and streaming it record-by-record into Kafka, a durable, fault-tolerant, distributed message queue, for processing by spark-streaming or storage in HDFS. And what’s more, without writing a single line of bash! Content Registry We have seen in this article that data ingestion is an area that is often overlooked, and that its importance cannot be underestimated. At this point we have a pipeline that enables us to ingest data from a source, schedule that ingest and direct the data to our repository of choice. But the story does not end there. Now we have the data, we need to fulfil our data management responsibilities. Enter the content registry. We’re going to build an index of metadata related to that data we have ingested. The data itself will still be directed to storage (HDFS, in our example) but, in addition, we will store metadata about the data, so that we can track what we’ve received and understand basic information about it, such as, when we received it, where it came from, how big it is, what type it is, etc. Choices and More Choices The choice of which technology we use to store this metadata is, as we have seen, one based upon knowledge and experience. For metadata indexing, we will require at least the following attributes: Easily searchable Scalable Parallel write ability Redundancy There are many ways to meet these requirements, for example we could write the metadata to Parquet, store in HDFS and search using Spark SQL. However, here we will use Elasticsearch as it meets the requirements a little better, most notably because it facilitates low latency queries of our metadata over a REST API - very useful for creating dashboards. In fact, Elasticsearch has the advantage of integrating directly with Kibana, meaning it can quickly produce rich visualizations of our content registry. For this reason, we will proceed with Elasticsearch in mind. Going with the Flow Using our current NiFi pipeline flow, let’s fork the output from “Fetch GKG files from URL” to add an additional set of steps to allow us to capture and store this metadata in Elasticsearch. These are: Replace the flow content with our metadata model Capture the metadata Store directly in Elasticsearch Here’s what this looks like in NiFi: Metadata Model So, the first step here is to define our metadata model. And there are many areas we could consider, but let’s select a set that helps tackle a few key points from earlier discussions. This will provide a good basis upon which further data can be added in the future, if required. So, let’s keep it simple and use the following three attributes: File size Date ingested File name These will provide basic registration of received files. Next, inside the NiFi flow, we’ll need to replace the actual data content with this new metadata model. An easy way to do this, is to create a JSON template file from our model. We’ll save it to local disk and use it inside a FetchFile processor to replace the flow’s content with this skeleton object. This template will look something like: { "FileSize": SIZE, "FileName": "FILENAME", "IngestedDate": "DATE" } Note the use of placeholder names (SIZE, FILENAME, DATE) in place of the attribute values. These will be substituted, one-by-one, by a sequence of ReplaceText processors, that swap the placeholder names for an appropriate flow attribute using regular expressions provided by the NiFi Expression Language, for example DATE becomes ${now()}. The last step is to output the new metadata payload to Elasticsearch. Once again, NiFi comes ready with a processor for this; the PutElasticsearch processor. An example metadata entry in Elasticsearch: { "_index": "gkg", "_type": "files", "_id": "AVZHCvGIV6x-JwdgvCzW", "_score": 1, "source": { "FileSize": 11279827, "FileName": "20150218233000.gkg.csv.zip", "IngestedDate": "2016-08-01T17:43:00+01:00" } } Now that we have added the ability to collect and interrogate metadata, we now have access to more statistics that can be used for analysis. This includes: Time based analysis e.g. file sizes over time Loss of data, for example are there data “holes” in the timeline? If there is a particular analytic that is required, the NIFI metadata component can be adjusted to provide the relevant data points. Indeed, an analytic could be built to look at historical data and update the index accordingly if the metadata does not exist in current data. Kibana Dashboard We have mentioned Kibana a number of times in this article, now that we have an index of metadata in Elasticsearch, we can use the tool to visualize some analytics. The purpose of this brief section is to demonstrate that we can immediately start to model and visualize our data. In this simple example we have completed the following steps: Added the Elasticsearch index for our GDELT metadata to the “Settings” tab Selected “file size” under the “Discover” tab Selected Visualize for “file size” Changed the Aggregation field to “Range” Entered values for the ranges The resultant graph displays the file size distribution: From here we are free to create new visualizations or even a fully featured dashboard that can be used to monitor the status of our file ingest. By increasing the variety of metadata written to Elasticsearch from NiFi, we can make more fields available in Kibana and even start our data science journey right here with some ingest based actionable insights. Now that we have a fully-functioning data pipeline delivering us real-time feeds of data, how do we ensure data quality of the payload we are receiving? Let’s take a look at the options. Quality Assurance With an initial data ingestion capability implemented, and data streaming onto your platform, you will need to decide how much quality assurance is required at the front door. It’s perfectly viable to start with no initial quality controls and build them up over time (retrospectively scanning historical data as time and resources allow). However, it may be prudent to install a basic level of verification to begin with. For example, basic checks such as file integrity, parity checking, completeness, checksums, type checking, field counting, overdue files, security field pre-population, denormalization, etc. You should take care that your up-front checks do not take too long. Depending on the intensity of your examinations and the size of your data, it’s not uncommon to encounter a situation where there is not enough time to perform all processing before the next dataset arrives. You will always need to monitor your cluster resources and calculate the most efficient use of time. Here are some examples of the type of rough capacity planning calculation you can perform: Example 1: Basic Quality Checking, No Contending Users Data is ingested every 15 minutes and takes 1 minute to pull from the source Quality checking (integrity, field count, field pre-population) takes 4 minutes There are no other users on the compute cluster There are 10 minutes of resources available for other tasks. As there are no other users on the cluster, this is satisfactory - no action needs to be taken. Example 2: Advanced Quality Checking, No Contending Users Data is ingested every 15 minutes and takes 1 minute to pull from the source Quality checking (integrity, field count, field pre-population, denormalization, sub dataset building) takes 13 minutes There are no other users on the compute cluster There is only 1 minute of resource available for other tasks. We probably need to consider, either: Configuring a resource scheduling policy Reducing the amount of data ingested Reducing the amount of processing we undertake Adding additional compute resources to the cluster Example 3: Basic Quality Checking, 50% Utility Due to Contending Users Data is ingested every 15 minutes and takes 1 minute to pull from the source Quality checking (integrity, field count, field pre-population) takes 4 minutes (100% utility) There are other users on the compute cluster There are 6 minutes of resources available for other tasks (15 - 1 - (4 * (100 / 50))). Since there are other users there is a danger that, at least some of the time, we will not be able to complete our processing and a backlog of jobs will occur. When you run into timing issues, you have a number of options available to you in order to circumvent any backlog: Negotiating sole use of the resources at certain times Configuring a resource scheduling policy, including: YARN Fair Scheduler: allows you to define queues with differing priorities and target your Spark jobs by setting the spark.yarn.queue property on start-up so your job always takes precedence Dynamicandr Resource Allocation: allows concurrently running jobs to automatically scale to match their utilization Spark Scheduler Pool: allows you to define queues when sharing a SparkContext using multithreading model, and target your Spark job by setting the spark.scheduler.pool property per execution thread so your thread takes precedence Running processing jobs overnight when the cluster is quiet In any case, you will eventually get a good idea of how the various parts to your jobs perform and will then be in a position to calculate what changes could be made to improve efficiency. There’s always the option of throwing more resources at the problem, especially when using a cloud provider, but we would certainly encourage the intelligent use of existing resources - this is far more scalable, cheaper and builds data expertise. Summary In this article we walked through the full setup of an Apache NiFi GDELT ingest pipeline, complete with metadata forks and a brief introduction to visualizing the resultant data. This section is particularly important as GDELT is used extensively throughout the book and the NiFi method is a highly effective way to source data in a scalable and modular way. Resources for Article: Further resources on this subject: Integration with Continuous Delivery [article] Amazon Web Services [article] AWS Fundamentals [article]

0
1
16359

article-image-sentiment-analysis-with-generative-ai

Sangita Mahala

09 Nov 2023

8 min read

Sentiment Analysis with Generative AI

Sangita Mahala

09 Nov 2023

8 min read

0
0
16355

M.T. White

22 Aug 2023

14 min read

ChatGPT for Everyday Use

M.T. White

22 Aug 2023

14 min read

IntroductionChatGPT is a revolutionary new technology that is making a large impact on society. The full impact of ChatGPT cannot be fully known at the time of writing this article because of how novel the technology is. However, what can be said is that since its introduction many industries have been trying to leverage it and increase productivity. Simultaneously, everyday people are trying to learn to leverage it as well. Overall, ChatGPT and similar systems are very new and the full impact of how to leverage them will take some more time to fully manifest. This article is going to explore how ChatGPT can be used for everyday life by exploring a few use cases.What is ChatGPT? Before we begin, it is important to understand what ChatGPT is and what it isn’t. To begin ChatGPT is in a lay sense a super advanced chatbot. More specifically, ChatGPT is known as a generative AI that uses Natural Language Processing (NLP) to create a dialog between a user and itself. ChatGPT and similar systems are what are known as Large Language Models (LLMs). In short, for AI models to work they have to be trained using data. To train LLMs engineers use vast amounts such as books, articles, journals, and so on. The result is a system like ChatGPT that has a vast knowledge base on many different subjects. Before we can explore how to use ChatGPT for everyday life we need to explore how NOT to use ChatGPT. How not to use ChatGPT?ChatGPT is very powerful and can be used for many different things; however, is important to understand that ChatGPT is not a sage nor infallible. Remember ChatGPT only knows about what it was trained on. This means if the information it was taught was wrong or outdated so too will be the response it provides. This means that as of writing this article ChatGPT cannot and should not be used as a replacement for humans. Even with the answer ChatGPT gives, a decent level of domain knowledge is still required properly format and use its response. As such, it is important to take suggestions with a certain amount of skepticism. The best way to think of ChatGPT is as an extremely smart friend with a wide range of knowledge. Though that friend may be very knowledgeable they can and will be wrong from time to time. With that how does one use ChatGPT?How to use ChatGPT for everyday life?Using ChatGPT starts with what is called a prompt. A prompt can be thought of as a query that the user feeds into ChatGPT. This prompt is kind of like a Google search. It will be a statement that asks ChatGPT a question or to perform a certain task. Since ChatGPT used NLP, the prompt does not have to be formatted a certain way nor does it require a specific syntax. In other words, a prompt can be entered the same way a user would pose the instruction to another person. However, it is good to provide ChatGPT will enough information to adequately describe the problem. An example, of a prompt is as follows:How can ChatGPT be used for everyday life?Once the prompt is run ChatGPT will produce text with suggestions, answers, programming software, or whatever was asked for. That output that is produced is called a response which will have the answer to the request/question. There are a lot of ways ChatGPT can be used for everyday life. In terms of professional life, ChatGPT can make an excellent assistant. For most “text-based” jobs, that is jobs that don’t require design such as drawing diagrams, ChatGPT can be an excellent assistant. For example, suppose a programmer needs a small piece of software for a project, assume they need a SQL query that can pull all the rows between two dates. The user could use a prompt like the following:Write a SQL query that will pull all rows between January 3 and March 31.This prompt will produce the following response:To pull all rows between January 3 and March 31 from a table, you'll need to use the WHERE clause to filter the results based on the date column. Assuming you have a date column named "date_column" in your table, here's the SQL query: sqlCopy code SELECT * FROM your_table WHERE date_column >= '2023-01-03' AND date_column <= '2023-03-31'; Replace "your_table" with the actual name of your table, and "date_column" with the name of the column that contains the dates you want to filter. The query will select all rows where the "date_column" is on or after January 3, 2023, and on or before March 31, 2023.For those that are not familiar with SQL, the prompt produced a simple query with generic placeholders that will pull the correct data out of a database. Not everyone is a programmer, so how could a non-programmer use ChatGPT as an assistant? One area in which ChatGPT can be used is by composing emails. Suppose a salesperson needs to inform Tom, their boss, that a $1 million deal was closed today via email. They could use a prompt like the following:Write Tom an email saying that the deal with Company X went through, and we made $1 million.The response would be something along the lines of:Subject: Successful Deal with Company X - $1 Million Revenue Achieved Hi Tom, I hope this email finds you well. I am thrilled to share some exciting news with you! The deal with Company X has successfully gone through, and I am delighted to inform you that we have made an astounding $1 million in revenue from this collaboration. This achievement wouldn't have been possible without the hard work and dedication of our entire team. It reflects the trust and confidence our partners have in our capabilities, and I believe it marks a significant milestone for our company's growth and success. I want to express my gratitude to you for your valuable contributions and expertise that played a crucial role in making this deal a reality. Your efforts have been commendable, and I am truly grateful for your commitment to the success of this venture. Let's take a moment to celebrate this triumph together. Your exceptional skills and commitment to excellence have made this accomplishment possible, and I am genuinely proud to have you as part of our team. Looking forward to even more prosperous ventures in the future. Once again, congratulations to you and the entire team! Best regards, [Your Name] [Your Title/Position] [Company Name] [Email] [Phone Number]In this case, all the users would have to do is proofread the email, copy it, and send it. Now, for applications such as this proofreading is very important. As stated before, ChatGPT is not a mind reader nor a sage, as such it may produce emails with wrong information, illogical wording, or anything else. So, it’ll pay to read the message before sending it. Learning ToolA very powerful use of ChatGPT is as a learning tool. For example, students can use this tool to help them master a new subject, generate practice problems, and even check answers to an extent. Assume the user needs a few extra practice problems for their English homework, they could enter a prompt like the following:Generate some grammar practice problems for a 1st year English classWhich would yield the following output:Of course! Here are some grammar practice problems suitable for a 1st-year English class: 1. Subject-Verb Agreement: Choose the correct verb form that agrees with the subject in each sentence: a) The cat (purr/purrs) when it is happy. b) My friend and I (is/are) going to the movies tonight. c) Neither the students nor the teacher (understands/understand) the assignment. 2. Pronoun Usage: Select the appropriate pronoun to complete each sentence: a) She invited Sarah and (I/me) to the party. b) My sister and (her/she) are going shopping. c) The book is for (he/him) and his brother. 3. Articles (a/an/the): Insert the correct article (a, an, the, or no article) in each sentence: a) Can you pass me (a/an) apple, please? b) I need to buy (the) new shoes for the dance. c) My friend has (no article) car. 4. Prepositions: Choose the correct preposition to complete each sentence: a) We arrived (in/on/at) the airport fifteen minutes early. b) The cat is sleeping (under/below/above) the table. c) She's going to the store (by/with/for) her mom. 5. Verb Tenses: Identify the correct verb tense used in each sentence: a) They (will be/was) studying for the exam all night. b) By the time we arrived, the party (had started/has started). c) I (am/was) watching a movie right now. 6. Adjective vs. Adverb: Decide whether the word in parentheses should be an adjective or adverb in the sentence: a) She sings (beautiful/beautifully). b) The (happy/happily) children played in the park. c) The (quick/quickly) fox jumped over the fence. 7. Negation: Add the necessary negative word to each sentence: a) I have (no/not) seen him today. b) She does (not/none) like coffee. c) We will (not/none) be attending the party. 8. Passive Voice: Rewrite the following sentences in the passive voice: a) The chef cooked the meal. b) The teacher is explaining the lesson. c) The storm destroyed the old building. 9. Comparative and Superlative Forms: Complete each sentence with the correct comparative or superlative form of the adjective: a) This book is (interesting/more interesting/most interesting) than the last one. b) She is (tall/taller/tallest) than her sister. c) It was the (big/biggest/bigger) cake at the party. 10. Reported Speech: Rewrite the following sentences using reported speech: a) Mary said, "I am going to the store." b) "I will call you later," said John. c) "They have arrived," she said. These grammar practice problems aim to reinforce essential grammar rules and concepts, helping 1st-year English students develop their language skills and proficiency.As can be seen, ChatGPT generated a number of questions that one would usually find in an English 101 class. Now, if the user wanted a more specific answer that could reword the prompt to accommodate. ChatGPT is also a great teaching tool as well. Teachers can also use this tool to generate lesson plans, write intros for learning modules, generate problems, and more. Overall, ChatGPT can be used as an excellent tool for academia in general.Now, it is important to remember that ChatGPT is not all-knowing. It can only generate responses based on the data that it was trained on. This means if the user is asking questions about a subject where there is little or no publicly available data the response probably isn’t going to be of the highest quality. For example, if someone askedHow do I build a working time machine?They’re going to get the following:Building a working time machine, as depicted in science fiction, is currently beyond our technological capabilities and goes against the known laws of physics. Time travel, as popularly imagined, involves traveling backward or forward in time, similar to how we move through space. As of my last update in September 2021, time travel remains a concept explored mainly in theoretical physics, and no practical time machine has been built or proven to be possible. Several theories, such as the theory of relativity, suggest that time travel could be theoretically feasible under certain conditions, but these ideas are still purely theoretical and have not been put into practice. It's important to remember that building a time machine would present enormous scientific, technological, and ethical challenges. It could raise paradoxes, such as the "grandfather paradox," where a time traveler could potentially alter past events and create inconsistencies in history. As of now, time travel remains a fascinating topic in science fiction and theoretical physics, but it is not something that can be practically accomplished with our current understanding and technology. The scientific community continues to explore the nature of time and the universe, but building a working time machine as portrayed in movies and books is not yet a reality.ChatGPT gave some general information on time travel but unfortunately, it didn’t generate any schematics. Again, this goes back to ChatGPT not being a sage as it only knows what it was “taught”. Since there is no data on how to build a working time machine ChatGPT could not fully answer the prompt. Now, If one looks at the English question prompt above, one will notice that practice questions were quite broad. It is common to have to “dig” with ChatGPT. In other words, a person may have to refine the queriers to get what they need. To get some practice try re-wording the prompt to generate practice questions for only passive writing. SummaryChatGPT is a tool, and like any other tool, what it’s used for is up to the user. As was seen in this article, ChatGPT is an excellent tool for helping a person through their day by generating software, emails, and so on. ChatGPT can also be used as a great learning or teaching device to help students and teachers generate practice problems, create lesson plans, and so much more. However, as was stated so many numerous times. Unless ChatGPT has been trained on something it does not know about it. This means that asking it things like how to build a time machine or domain specific concepts aren’t going to return quality responses. Also, even if ChatGPT has been trained on the prompt, it may not always generate a quality response. No matter the use case, the response should be vetted for accuracy. This may mean doing a little extra research with the response given, testing the output, or whatever needs to be done to verify the response. Overall, ChatGPT at the time of writing this article is less than a year old. This means that the full implication of using ChatGPT are not fully understood. Also, how to fully leverage ChatGPT is not understood yet either. What can be said is that ChatGPT and similar LLM systems will probably be the next Google. In terms of everyday use, the only true inhibitors are the user's imagination and the data that was used to train ChatGPT.Author BioM.T. White has been programming since the age of 12. His fascination with robotics flourished when he was a child programming microcontrollers such as Arduino. M.T. currently holds an undergraduate degree in mathematics, and a master's degree in software engineering, and is currently working on an MBA in IT project management. M.T. is currently working as a software developer for a major US defense contractor and is an adjunct CIS instructor at ECPI University. His background mostly stems from the automation industry where he programmed PLCs and HMIs for many different types of applications. M.T. has programmed many different brands of PLCs over the years and has developed HMIs using many different tools.Author of the book: Mastering PLC Programming

0
0
16351

article-image-creating-jee-application-ejb

Packt

24 Sep 2015

11 min read

Creating a JEE Application with EJB

Packt

24 Sep 2015

11 min read

In this article by Ram Kulkarni, author of Java EE Development with Eclipse (e2), we will be using EJBs (Enterprise Java Beans) to implement business logic. This is ideal in scenarios where you want components that process business logic to be distributed across different servers. But that is just one of the advantages of EJB. Even if you use EJBs on the same server as the web application, you may gain from a number of services that the EJB container provides to the applications through EJBs. You can specify security constraints for calling EJB methods declaratively (using annotations), and you can also easily specify transaction boundaries (specify which method calls from a part of one transaction) using annotations. In addition to this, the container handles the life cycle of EJBs, including pooling of certain types of EJB objects so that more objects can be created when the load on the application increases. (For more resources related to this topic, see here.) In this article, we will create the same application using EJBs and deploy it in a Glassfish 4 server. But before that, you need to understand some basic concepts of EJBs. Types of EJB EJBs can be of following types as per the EJB 3 specifications: Session bean: Stateful session bean Stateless session bean Singleton session bean Message-driven bean In this article, we will focus on session beans. Session beans In general, session beans are meant for containing methods used to execute the main business logic of enterprise applications. Any Plain Old Java Object (POJO) can be annotated with the appropriate EJB-3-specific annotations to make it a session bean. Session beans come in three types, as follows. Stateful session bean One stateful session bean serves requests for one client only. There is a one-to-one mapping between the stateful session bean and the client. Therefore, stateful beans can hold state data for the client between multiple method calls. In our CourseManagement application, we can use a stateful bean to hold the Student data (student profile and the courses taken by him/her) after a student logs-in. The state maintained by the Stateful bean is lost when the server restarts or when the session times out. Since there is one stateful bean per client, using a stateful bean might impact the scalability of the application. We use the @Stateful annotation to create a stateful session bean. Stateless session bean A stateless session bean does not hold any state information for any client. Therefore, one session bean can be shared across multiple clients. The EJB container maintains pools of stateless beans, and when a client request comes, it takes out a bean from the pool, executes methods, and returns the bean to the pool. Stateless session beans provide excellent scalability because they can be shared and need not be created for each client. We use the @Stateless annotation to create a stateless session bean. Singleton session bean As the name suggests, there is only one instance of a singleton bean class in the EJB container (this is true in the clustered environment too; each EJB container will have an instance of a singleton bean). This means that they are shared by multiple clients, and they are not pooled by EJB containers (because there can be only one instance). Since a singleton session bean is a shared resource, we need to manage concurrency in it. Java EE provides two concurrency management options for singleton session beans: container-managed concurrency and bean-managed concurrency. Container-managed concurrency can easily be specified by annotations. See https://docs.oracle.com/javaee/7/tutorial/ejb-basicexamples002.htm#GIPSZ for more information on managing concurrency in a singleton session bean. Using a singleton bean could have an impact on the scalability of the application if there are resource contentions in the code. We use the @Singleton annotation to create a singleton session bean Accessing a session bean from the client Session beans can be designed to be accessed locally (within the same application as a session bean) or remotely (from a client running in a different application or JVM) or both. In the case of remote access, session beans are required to implement a remote interface. For local access, session beans can implement a local interface or no interface (the no-interface view of a session bean). Remote and local interfaces that session beans implement are sometimes also called business interfaces, because they typically expose the primary business functionality. Creating a no-interface session bean To create a session bean with a no-interface view, create a POJO and annotate it with the appropriate EJB annotation type and @LocalBean. For example, we can create a local stateful Student bean as follows: import javax.ejb.LocalBean; import javax.ejb.Singleton; @Singleton @LocalBean public class Student { ... } Accessing a session bean using dependency injection You can access session beans by either using the @EJBannotation (for dependency injection) or performing a Java Naming and Directory Interface (JNDI) lookup. EJB containers are required to make the JNDI URLs of EJBs available to clients. Dependency injection of session beans using @EJB work only for managed components, that is, components of the application whose life cycle is managed by the EJB container. When a component is managed by the container, it is created (instantiated) by the container and also destroyed by the container. You do not create managed components using the new operator. JEE-managed components that support direct injection of EJBs are servlets, managed beans of JSF pages and EJBs themselves (one EJB can have other EJBs injected into it). Unfortunately, you cannot have a web container injecting EJBs into JSPs or JSP beans. Also, you cannot have EJBs injected into any custom classes that you create and are instantiated using the new operator. We can use the Student bean (created previously) from a managed bean of JSF, as follows: import javax.ejb.EJB; import javax.faces.bean.ManagedBean; @ManagedBean public class StudentJSFBean { @EJB private Student studentEJB; } Note that if you create an EJB with a no-interface view, then all the public methods in that EJB will be exposed to the clients. If you want to control which methods can be called by clients, then you should implement the business interface. Creating a session bean using a local business interface A business interface for EJB is a simple Java interface with either the @Remote or @Local annotation. So we can create a local interface for the Student bean as follows: import java.util.List; import javax.ejb.Local; @Local public interface StudentLocal { public List<Course> getCourses(); } We implement a session bean like this: import java.util.List; import javax.ejb.Local; import javax.ejb.Stateful; @Stateful @Local public class Student implements StudentLocal { @Override public List<CourseDTO> getCourses() { //get courses are return … } } Clients can access the Student EJB only through the local interface: import javax.ejb.EJB; import javax.faces.bean.ManagedBean; @ManagedBean public class StudentJSFBean { @EJB private StudentLocal student; } The session bean can implement multiple business interfaces. Accessing a session bean using a JNDI lookup Though accessing EJB using dependency injection is the easiest way, it works only if the container manages the class that accesses the EJB. If you want to access EJB from a POJO that is not a managed bean, then dependency injection will not work. Another scenario where dependency injection does not work is when EJB is deployed in a separate JVM (this could be on a remote server). In such cases, you will have to access EJB using a JNDI lookup (visit https://docs.oracle.com/javase/tutorial/jndi/ for more information on JNDI). JEE applications can be packaged in an Enterprise Application Archive (EAR), which contains a .jar file for EJBs and a WAR file for web applications (and the lib folder contains the libraries required for both). If, for example, the name of an EAR file is CourseManagement.ear and the name of an EJB JAR file in it is CourseManagementEJBs.jar, then the name of the application is CourseManagement (the name of the EAR file) and the module name is CourseManagementEJBs. The EJB container uses these names to create a JNDI URL for lookup EJBs. A global JNDI URL for EJB is created as follows: "java:global/<application_name>/<module_name>/<bean_name>![<bean_interface>]" java:global: Indicates that it is a global JNDI URL. <application_name>: The application name is typically the name of the EAR file. <module_name>: This is the name of the EJB JAR. <bean_name>: This is the name of the EJB bean class. <bean_interface>: This is optional if EJB has a no-interface view, or if it implements only one business interface. Otherwise, it is a fully qualified name of a business interface. EJB containers are also required to publish two more variations of JNDI URLs for each EJB. These are not global URLs, which means that they can't be used to access EJBs from clients that are not in the same JEE application (in the same EAR): "java:app/[<module_name>]/<bean_name>![<bean_interface>]" "java:module/<bean_name>![<bean_interface>]" The first URL can be used if the EJB client is in the same application, and the second URL can be used if the client is in the same module (the same JAR file as the EJB). Before you look up any URL in a JNDI server, you need to create an InitialContext that includes information, among other things such as the hostname of JNDI server and the port on which it is running. If you are creating InitialContext in the same server, then there is no need to specify these attributes: InitialContext initCtx = new InitialContext(); Object obj = initCtx.lookup("jndi_url"); We can use the following JNDI URLs to access a no-interface (LocalBean) Student EJB (assuming that the name of the EAR file is CourseManagement and the name of the JAR file for EJBs is CourseManagementEJBs): URL When to use java:global/CourseManagement/ CourseManagementEJBs/Student The client can be anywhere in the EAR file, because we are using a global URL. Note that we haven't specified the interface name because we are assuming that the Student bean provides a no-interface view in this example. java:app/CourseManagementEJBs/Student The client can be anywhere in the EAR. We skipped the application name because the client is expected to be in the same application. This is because the namespace of the URL is java:app. java:module/Student The client must be in the same JAR file as EJB. We can use the following JNDI URLs to access the Student EJB that implemented a local interface, StudentLocal: URL When to use java:global/CourseManagement/ CourseManagementEJBs/Student!packt.jee.book.ch6.StudentLocal The client can be anywhere in the EAR file, because we are using a global URL. java:global/CourseManagement/ CourseManagementEJBs/Student The client can be anywhere in the EAR. We skipped the interface name because the bean implements only one business interface. Note that the object returned from this call will be of the StudentLocal type, and not Student. java:app/CourseManagementEJBs/Student Or java:app/CourseManagementEJBs/Student!packt.jee.book.ch6.StudentLocal The client can be anywhere in the EAR. We skipped the application name because the JNDI namespace is java:app. java:module/Student Or java:module/Student!packt.jee.book.ch6.StudentLocal The client must be in the same EAR as the EJB. Here is an example of how we can call the Student bean with the local business interface from one of the objects (that is not managed by the web container) in our web application: InitialContext ctx = new InitialContext(); StudentLocal student = (StudentLocal) ctx.loopup ("java:app/CourseManagementEJBs/Student"); return student.getCourses(id) ; //get courses from Student EJB Creating EAR for Deployment outside Eclipse. Summary EJBs are ideal for writing business logic in web applications. They can act as the perfect bridge between web interface components, such as a JSF, servlet, or JSP, and data access objects, such as JDO. EJBs can be distributed across multiple JEE application servers (this could improve application scalability) and their life cycle is managed by the container. EJBs can easily be injected into managed objects or can be looked up using JNDI. The Eclipse JEE makes creating and consuming EJBs very easy. The JEE application server Glassfish can also be managed and applications can be deployed from within Eclipse. Resources for Article: Further resources on this subject: Contexts and Dependency Injection in NetBeans[article] WebSockets in Wildfly[article] Creating Java EE Applications [article]

0
0
16346

Packt

17 Feb 2016

27 min read

Understanding PHP basics

Packt

17 Feb 2016

27 min read

In this article by Antonio Lopez Zapata, the author of the book Learning PHP 7, you need to understand not only the syntax of the language, but also its grammatical rules, that is, when and why to use each element of the language. Luckily, for you, some languages come from the same root. For example, Spanish and French are romance languages as they both evolved from spoken Latin; this means that these two languages share a lot of rules, and learning Spanish if you already know French is much easier. (For more resources related to this topic, see here.) Programming languages are quite the same. If you already know another programming language, it will be very easy for you to go through this chapter. If it is your first time though, you will need to understand from scratch all the grammatical rules, so it might take some more time. But fear not! We are here to help you in this endeavor. In this chapter, you will learn about these topics: PHP in web applications Control structures Functions PHP in web applications Even though the main purpose of this chapter is to show you the basics of PHP, doing so in a reference-manual way is not interesting enough. If we were to copy paste what the official documentation says, you might as well go there and read it by yourself. Instead, let's not forget the main purpose of this book and your main goal—to write web applications with PHP. We will show you how can you apply everything you are learning as soon as possible, before you get too bored. In order to do that, we will go through the journey of building an online bookstore. At the very beginning, you might not see the usefulness of it, but that is just because we still haven't seen all that PHP can do. Getting information from the user Let's start by building a home page. In this page, we are going to figure out whether the user is looking for a book or just browsing. How do we find this out? The easiest way right now is to inspect the URL that the user used to access our application and extract some information from there. Save this content as your index.php file: <?php $looking = isset($_GET['title']) || isset($_GET['author']); ?> <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Bookstore</title> </head> <body> You lookin'? <?php echo (int) $looking; ?> The book you are looking for is <ul> <li>Title: <?php echo $_GET['title']; ?></li> <li>Author: <?php echo $_GET['author']; ?></li> </ul> </body> </html> And now, access http://localhost:8000/?author=Harper Lee&title=To Kill a Mockingbird. You will see that the page is printing some of the information that you passed on to the URL. For each request, PHP stores in an array—called $_GET- all the parameters that are coming from the query string. Each key of the array is the name of the parameter, and its associated value is the value of the parameter. So, $_GET contains two entries: $_GET['author'] contains Harper Lee and $_GET['title'] contains To Kill a Mockingbird. On the first highlighted line, we are assigning a Boolean value to the $looking variable. If either $_GET['title'] or $_GET['author'] exists, this variable will be true; otherwise, false. Just after that, we close the PHP tag and then we start printing some HTML, but as you can see, we are actually mixing HTML with PHP code. Another interesting line here is the second highlighted line. We are printing the content of $looking, but before that, we cast the value. Casting means forcing PHP to transform a type of value to another one. Casting a Boolean to an integer means that the resultant value will be 1 if the Boolean is true or 0 if the Boolean is false. As $looking is true since $_GET contains valid keys, the page shows 1. If we try to access the same page without sending any information as in http://localhost:8000, the browser will say "Are you looking for a book? 0". Depending on the settings of your PHP configuration, you will see two notice messages complaining that you are trying to access the keys of the array that do not exist. Casting versus type juggling We already knew that when PHP needs a specific type of variable, it will try to transform it, which is called type juggling. But PHP is quite flexible, so sometimes, you have to be the one specifying the type that you need. When printing something with echo, PHP tries to transform everything it gets into strings. Since the string version of the false Boolean is an empty string, this would not be useful for our application. Casting the Boolean to an integer first assures that we will see a value, even if it is just "0". HTML forms HTML forms are one of the most popular ways to collect information from users. They consist a series of fields called inputs in the HTML world and a final submit button. In HTML, the form tag contains two attributes: the action points, where the form will be submitted and method that specifies which HTTP method the form will use—GET or POST. Let's see how it works. Save the following content as login.html and go to http://localhost:8000/login.html: <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Bookstore - Login</title> </head> <body> Enter your details to login: <form action="authenticate.php" method="post"> <label>Username</label> <input type="text" name="username" /> <label>Password</label> <input type="password" name="password" /> <input type="submit" value="Login"/> </form> </body> </html> This form contains two fields, one for the username and one for the password. You can see that they are identified by the name attribute. If you try to submit this form, the browser will show you a Page Not Found message, as it is trying to access http://localhost:8000/authenticate.phpand the web server cannot find it. Let's create it then: <?php $submitted = !empty($_POST); ?> <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Bookstore</title> </head> <body> Form submitted? <?php echo (int) $submitted; ?> Your login info is <ul> <li>username: <?php echo $_POST['username']; ?></li> <li>password: <?php echo $_POST['password']; ?></li> </ul> </body> </html> As with $_GET, $_POST is an array that contains the parameters received by POST. In this piece of code, we are first asking whether that array is not empty—note the ! operator. Afterwards, we just display the information received, just as in index.php. Note that the keys of the $_POST array are the values for the name argument of each input field. Control structures So far, our files have been executed line by line. Due to that, we are getting some notices on some scenarios, such as when the array does not contain what we are looking for. Would it not be nice if we could choose which lines to execute? Control structures to the rescue! A control structure is like a traffic diversion sign. It directs the execution flow depending on some predefined conditions. There are different control structures, but we can categorize them in conditionals and loops. A conditional allows us to choose whether to execute a statement or not. A loop will execute a statement as many times as you need. Let's take a look at each one of them. Conditionals A conditional evaluates a Boolean expression, that is, something that returns a value. If the expression is true, it will execute everything inside its block of code. A block of code is a group of statements enclosed by {}. Let's see how it works: <?php echo "Before the conditional."; if (4 > 3) { echo "Inside the conditional."; } if (3 > 4) { echo "This will not be printed."; } echo "After the conditional."; In this piece of code, we are using two conditionals. A conditional is defined by the keyword if followed by a Boolean expression in parentheses and by a block of code. If the expression is true, it will execute the block; otherwise, it will skip it. You can increase the power of conditionals by adding the keyword else. This tells PHP to execute a block of code if the previous conditions were not satisfied. Let's see an example: if (2 > 3) { echo "Inside the conditional."; } else { echo "Inside the else."; } This will execute the code inside else as the condition of if was not satisfied. Finally, you can also add an elseif keyword followed by another condition and block of code to continue asking PHP for more conditions. You can add as many elseif as you need after if. If you add else, it has to be the last one of the chain of conditions. Also keep in mind that as soon as PHP finds a condition that resolves to true, it will stop evaluating the rest of the conditions: <?php if (4 > 5) { echo "Not printed"; } elseif (4 > 4) { echo "Not printed"; } elseif (4 == 4) { echo "Printed."; } elseif (4 > 2) { echo "Not evaluated."; } else { echo "Not evaluated."; } if (4 == 4) { echo "Printed"; } In this last example, the first condition that evaluates to true is the one that is highlighted. After that, PHP does not evaluate any more conditions until a new if starts. With this knowledge, let's try to clean up a bit of our application, executing statements only when needed. Copy this code to your index.php file: <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Bookstore</title> </head> <body> <?php if (isset($_COOKIE[username'])) { echo "You are " . $_COOKIE['username']; } else { echo "You are not authenticated."; } ?> <?php if (isset($_GET['title']) && isset($_GET['author'])) { ?> The book you are looking for is <ul> <li>Title: <?php echo $_GET['title']; ?></li> <li>Author: <?php echo $_GET['author']; ?></li> </ul> <?php } else { ?> You are not looking for a book? <?php } ?> </body> </html> In this new code, we are mixing conditionals and HTML code in two different ways. The first one opens a PHP tag and adds an if-else clause that will print whether we are authenticated or not with echo. No HTML is merged within the conditionals, which makes it clear. The second option—the second highlighted block—shows an uglier solution, but this is sometimes necessary. When you have to print a lot of HTML code, echo is not that handy, and it is better to close the PHP tag; print all the HTML you need and then open the tag again. You can do that even inside the code block of an if clause, as you can see in the code. Mixing PHP and HTML If you feel like the last file we edited looks rather ugly, you are right. Mixing PHP and HTML is confusing, and you have to avoid it by all means. Let's edit our authenticate.php file too, as it is trying to access $_POST entries that might not be there. The new content of the file would be as follows: <?php $submitted = isset($_POST['username']) && isset($_POST['password']); if ($submitted) { setcookie('username', $_POST['username']); } ?> <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <title>Bookstore</title> </head> <body> <?php if ($submitted): ?> Your login info is <ul> <li>username: <?php echo $_POST['username']; ?></li> <li>password: <?php echo $_POST['password']; ?></li> </ul> <?php else: ?> You did not submitted anything. <?php endif; ?> </body> </html> This code also contains conditionals, which we already know. We are setting a variable to know whether we've submitted a login or not and to set the cookies if we have. However, the highlighted lines show you a new way of including conditionals with HTML. This way, tries to be more readable when working with HTML code, avoiding the use of {} and instead using : and endif. Both syntaxes are correct, and you should use the one that you consider more readable in each case. Switch-case Another control structure similar to if-else is switch-case. This structure evaluates only one expression and executes the block depending on its value. Let's see an example: <?php switch ($title) { case 'Harry Potter': echo "Nice story, a bit too long."; break; case 'Lord of the Rings': echo "A classic!"; break; default: echo "Dunno that one."; break; } The switch case takes an expression; in this case, a variable. It then defines a series of cases. When the case matches the current value of the expression, PHP executes the code inside it. As soon as PHP finds break, it will exit switch-case. In case none of the cases are suitable for the expression, if there is a default case , PHP will execute it, but this is optional. You also need to know that breaks are mandatory if you want to exit switch-case. If you do not specify any, PHP will keep on executing statements, even if it encounters a new case. Let's see a similar example but without breaks: <?php $title = 'Twilight'; switch ($title) { case 'Harry Potter': echo "Nice story, a bit too long."; case 'Twilight': echo 'Uh...'; case 'Lord of the Rings': echo "A classic!"; default: echo "Dunno that one."; } If you test this code in your browser, you will see that it is printing "Uh...A classic!Dunno that one.". PHP found that the second case is valid so it executes its content. But as there are no breaks, it keeps on executing until the end. This might be the desired behavior sometimes, but not usually, so we need to be careful when using it! Loops Loops are control structures that allow you to execute certain statements several times—as many times as you need. You might use them on several different scenarios, but the most common one is when interacting with arrays. For example, imagine you have an array with elements but you do not know what is in it. You want to print all its elements so you loop through all of them. There are four types of loops. Each of them has their own use cases, but in general, you can transform one type of loop into another. Let's see them closely While While is the simplest of the loops. It executes a block of code until the expression to evaluate returns false. Let's see one example: <?php $i = 1; while ($i < 4) { echo $i . " "; $i++; } Here, we are defining a variable with the value 1. Then, we have a while clause in which the expression to evaluate is $i < 4. This loop will execute the content of the block of code until that expression is false. As you can see, inside the loop we are incrementing the value of $i by 1 each time, so after 4 iterations, the loop will end. Check out the output of that script, and you will see "0 1 2 3". The last value printed is 3, so by that time, $i was 3. After that, we increased its value to 4, so when the while was evaluating whether $i < 4, the result was false. Whiles and infinite loops One of the most common problems with while loops is creating an infinite loop. If you do not add any code inside while, which updates any of the variables considered in the while expression so it can be false at some point, PHP will never exit the loop! For This is the most complex of the four loops. For defines an initialization expression, an exit condition, and the end of the iteration expression. When PHP first encounters the loop, it executes what is defined as the initialization expression. Then, it evaluates the exit condition, and if it resolves to true, it enters the loop. After executing everything inside the loop, it executes the end of the iteration expression. Once this is done, it will evaluate the end condition again, going through the loop code and the end of iteration expression until it evaluates to false. As always, an example will help clarify this: <?php for ($i = 1; $i < 10; $i++) { echo $i . " "; } The initialization expression is $i = 1 and is executed only the first time. The exit condition is $i < 10, and it is evaluated at the beginning of each iteration. The end of the iteration expression is $i++, which is executed at the end of each iteration. This example prints numbers from 1 to 9. Another more common usage of the for loop is with arrays: <?php $names = ['Harry', 'Ron', 'Hermione']; for ($i = 0; $i < count($names); $i++) { echo $names[$i] . " "; } In this example, we have an array of names. As it is defined as a list, its keys will be 0, 1, and 2. The loop initializes the $i variable to 0, and it will iterate until the value of $i is not less than the amount of elements in the array 3 The first iteration $i is 0, the second will be 1, and the third one will be 2. When $i is 3, it will not enter the loop as the exit condition evaluates to false. On each iteration, we are printing the content of the $i position of the array; hence, the result of this code will be all three names in the array. We careful with exit conditions It is very common to set an exit condition. This is not exactly what we need, especially with arrays. Remember that arrays start with 0 if they are a list, so an array of 3 elements will have entries 0, 1, and 2. Defining the exit condition as $i <= count($array) will cause an error on your code, as when $i is 3, it also satisfies the exit condition and will try to access to the key 3, which does not exist. Foreach The last, but not least, type of loop is foreach. This loop is exclusive for arrays, and it allows you to iterate an array entirely, even if you do not know its keys. There are two options for the syntax, as you can see in these examples: <?php $names = ['Harry', 'Ron', 'Hermione']; foreach ($names as $name) { echo $name . " "; } foreach ($names as $key => $name) { echo $key . " -> " . $name . " "; } The foreach loop accepts an array; in this case, $names. It specifies a variable, which will contain the value of the entry of the array. You can see that we do not need to specify any end condition, as PHP will know when the array has been iterated. Optionally, you can specify a variable that will contain the key of each iteration, as in the second loop. Foreach loops are also useful with maps, where the keys are not necessarily numeric. The order in which PHP will iterate the array will be the same order in which you used to insert the content in the array. Let's use some loops in our application. We want to show the available books in our home page. We have the list of books in an array, so we will have to iterate all of them with a foreach loop, printing some information from each one. Append the following code to the body tag in index.php: <?php endif; $books = [ [ 'title' => 'To Kill A Mockingbird', 'author' => 'Harper Lee', 'available' => true, 'pages' => 336, 'isbn' => 9780061120084 ], [ 'title' => '1984', 'author' => 'George Orwell', 'available' => true, 'pages' => 267, 'isbn' => 9780547249643 ], [ 'title' => 'One Hundred Years Of Solitude', 'author' => 'Gabriel Garcia Marquez', 'available' => false, 'pages' => 457, 'isbn' => 9785267006323 ], ]; ?> <ul> <?php foreach ($books as $book): ?> <li> <?php echo $book['title']; ?> - <?php echo $book['author']; ?> <?php if (!$book['available']): ?> Not available <?php endif; ?> </li> <?php endforeach; ?> </ul> The highlighted code shows a foreach loop using the : notation, which is better when mixing it with HTML. It iterates all the $books arrays, and for each book, it will print some information as a HTML list. Also note that we have a conditional inside a loop, which is perfectly fine. Of course, this conditional will be executed for each entry in the array, so you should keep the block of code of your loops as simple as possible. Functions A function is a reusable block of code that, given an input, performs some actions and optionally returns a result. You already know several predefined functions, such as empty, in_array, or var_dump. These functions come with PHP so you do not have to reinvent the wheel, but you can create your own very easily. You can define functions when you identify portions of your application that have to be executed several times or just to encapsulate some functionality. Function declaration Declaring a function means to write it down so that it can be used later. A function has a name, takes arguments, and has a block of code. Optionally, it can define what kind of value is returning. The name of the function has to follow the same rules as variable names; that is, it has to start by a letter or underscore and can contain any letter, number, or underscore. It cannot be a reserved word. Let's see a simple example: function addNumbers($a, $b) { $sum = $a + $b; return $sum; } $result = addNumbers(2, 3); Here, the function's name is addNumbers, and it takes two arguments: $a and $b. The block of code defines a new variable $sum that is the sum of both the arguments and then returns its content with return. In order to use this function, you just need to call it by its name, sending all the required arguments, as shown in the highlighted line. PHP does not support overloaded functions. Overloading refers to the ability of declaring two or more functions with the same name but different arguments. As you can see, you can declare the arguments without knowing what their types are, so PHP would not be able to decide which function to use. Another important thing to note is the variable scope. We are declaring a $sum variable inside the block of code, so once the function ends, the variable will not be accessible any more. This means that the scope of variables declared inside the function is just the function itself. Furthermore, if you had a $sum variable declared outside the function, it would not be affected at all since the function cannot access that variable unless we send it as an argument. Function arguments A function gets information from outside via arguments. You can define any number of arguments—including 0. These arguments need at least a name so that they can be used inside the function, and there cannot be two arguments with the same name. When invoking the function, you need to send the arguments in the same order as we declared them. A function may contain optional arguments; that is, you are not forced to provide a value for those arguments. When declaring the function, you need to provide a default value for those arguments, so in case the user does not provide a value, the function will use the default one: function addNumbers($a, $b, $printResult = false) { $sum = $a + $b; if ($printResult) { echo 'The result is ' . $sum; } return $sum; } $sum1 = addNumbers(1, 2); $sum1 = addNumbers(3, 4, false); $sum1 = addNumbers(5, 6, true); // it will print the result This new function takes two mandatory arguments and an optional one. The default value is false, and is used as a normal value inside the function. The function will print the result of the sum if the user provides true as the third argument, which happens only the third time that the function is invoked. For the first two times, $printResult is set to false. The arguments that the function receives are just copies of the values that the user provided. This means that if you modify these arguments inside the function, it will not affect the original values. This feature is known as sending arguments by a value. Let's see an example: function modify($a) { $a = 3; } $a = 2; modify($a); var_dump($a); // prints 2 We are declaring the $a variable with the value 2, and then we are calling the modify method, sending $a. The modify method modifies the $a argument, setting its value to 3. However, this does not affect the original value of $a, which reminds to 2 as you can see in the var_dump function. If what you want is to actually change the value of the original variable used in the invocation, you need to pass the argument by reference. To do that, you add & in front of the argument when declaring the function: function modify(&$a) { $a = 3; } Now, after invoking the modify function, $a will be always 3. Arguments by value versus by reference PHP allows you to do it, and in fact, some native functions of PHP use arguments by reference—remember the array sorting functions; they did not return the sorted array; instead, they sorted the array provided. But using arguments by reference is a way of confusing developers. Usually, when someone uses a function, they expect a result, and they do not want their provided arguments to be modified. So, try to avoid it; people will be grateful! The return statement You can have as many return statements as you want inside your function, but PHP will exit the function as soon as it finds one. This means that if you have two consecutive return statements, the second one will never be executed. Still, having multiple return statements can be useful if they are inside conditionals. Add this function inside your functions.php file: function loginMessage() { if (isset($_COOKIE['username'])) { return "You are " . $_COOKIE['username']; } else { return "You are not authenticated."; } } Let's use it in your index.php file by replacing the highlighted content—note that to save some tees, I replaced most of the code that was not changed at all with //…: //... <body> <?php echo loginMessage(); ?> <?php if (isset($_GET['title']) && isset($_GET['author'])): ?> //... Additionally, you can omit the return statement if you do not want the function to return anything. In this case, the function will end once it reaches the end of the block of code. Type hinting and return types With the release of PHP7, the language allows developers to be more specific about what functions get and return. You can—always optionally—specify the type of argument that the function needs, for example, type hinting, and the type of result the function will return—return type. Let's first see an example: <?php declare(strict_types=1); function addNumbers(int $a, int $b, bool $printSum): int { $sum = $a + $b; if ($printSum) { echo 'The sum is ' . $sum; } return $sum; } addNumbers(1, 2, true); addNumbers(1, '2', true); // it fails when strict_types is 1 addNumbers(1, 'something', true); // it always fails This function states that the arguments need to be an integer, and Boolean, and that the result will be an integer. Now, you know that PHP has type juggling, so it can usually transform a value of one type to its equivalent value of another type, for example, the string 2 can be used as integer 2. To stop PHP from using type juggling with the arguments and results of functions, you can declare the strict_types directive as shown in the first highlighted line. This directive has to be declared on the top of each file, where you want to enforce this behavior. The three invocations work as follows: The first invocation sends two integers and a Boolean, which is what the function expects. So, regardless of the value of strict_types, it will always work. The second invocation sends an integer, a string, and a Boolean. The string has a valid integer value, so if PHP was allowed to use type juggling, the invocation would resolve to just normal. But in this example, it will fail because of the declaration on top of the file. The third invocation will always fail as the something string cannot be transformed into a valid integer. Let's try to use a function within our project. In our index.php file, we have a foreach loop that iterates the books and prints them. The code inside the loop is kind of hard to understand as it is mixing HTML with PHP, and there is a conditional too. Let's try to abstract the logic inside the loop into a function. First, create the new functions.php file with the following content: <?php function printableTitle(array $book): string { $result = '' . $book['title'] . ' - ' . $book['author']; if (!$book['available']) { $result .= ' Not available'; } return $result; } This file will contain our functions. The first one, printableTitle, takes an array representing a book and builds a string with a nice representation of the book in HTML. The code is the same as before, just encapsulated in a function. Now, index.php will have to include the functions.php file and then use the function inside the loop. Let's see how this is done: <?php require_once 'functions.php' ?> <!DOCTYPE html> <html lang="en"> //... ?> <ul> <?php foreach ($books as $book): ?> <li><?php echo printableTitle($book); ?> </li> <?php endforeach; ?> </ul> //... Well, now our loop looks way cleaner, right? Also, if we need to print the title of the book somewhere else, we can reuse the function instead of duplicating code! Summary In this article, we went through all the basics of procedural PHP while writing simple examples in order to practice them. You now know how to use variables and arrays with control structures and functions and how to get information from HTTP requests among others. Resources for Article: Further resources on this subject: Getting started with Modernizr using PHP IDE[article] PHP 5 Social Networking: Implementing Public Messages[article] Working with JSON in PHP jQuery[article]

0
0
16342

How-To Tutorials

article-image-customizing-kernel-and-boot-sequence

Packt

08 Nov 2016

35 min read

Customizing Kernel and Boot Sequence

Packt

08 Nov 2016

35 min read

0
0
16342

How-To Tutorials

article-image-performing-sentiment-analysis-with-r-on-obamas-state-of-the-union-speeches-tutorial

Sugandha Lahoti

23 Sep 2018

16 min read

Performing Sentiment Analysis with R on Obama's State of the Union speeches [Tutorial]

Sugandha Lahoti

23 Sep 2018

16 min read

For this article, we will take a look at former President Obama's State of the Union speeches. We will be performing Sentiment Analysis with R on Obama's State of the Union speeches. The two main analytical goals are to build topic models on the six State of the Union speeches and then compare the first speech in 2010 and the last in January 2016 for sentence-based textual measures, such as sentiment and dispersion. This tutorial is taken from the book Mastering Machine Learning with R - Second Edition by Cory Lesmeister. In this book, you will master machine learning techniques with R to deliver insights in complex projects. Preparing our Data and performing text transformations The primary package that we will use is tm, the text mining package. We will also need SnowballC for the stemming of the words, RColorBrewer for the color palettes in wordclouds, and the wordcloud package. Please ensure that you have these packages installed before attempting to load them: > library(tm) > library(wordcloud) > library(RColorBrewer) The data files are available for download in https://github.com/datameister66/data. Please ensure you put the text files into a separate directory because it will all go into our corpus for analysis. Download the seven .txt files, for example, sou2012.txt, into your working R directory. You can identify your current working directory and set it with these functions: > getwd() > setwd(".../data") We can now begin to create the corpus by first creating an object with the path to the speeches and then seeing how many files are in this directory and what they are named: > name <- file.path(".../text") > length(dir(name)) [1] 7 > dir(name) [1] "sou2010.txt" "sou2011.txt" "sou2012.txt" "sou2013.txt" [5] "sou2014.txt" "sou2015.txt" "sou2016.txt" We will name our corpus docs and create it with the Corpus() function, wrapped around the directory source function, DirSource(), which is also part of the tm package: > docs <- Corpus(DirSource(name)) > docs <<VCorpus>> Metadata: corpus specific: 0, document level (indexed): 0 Content: documents: 7 Note that there is no corpus or document level metadata. There are functions in the tm package to apply things such as author's names and timestamp information, among others, at both document level and corpus. We will not utilize this for our purposes. We can now begin the text transformations using the tm_map() function from the tm package. These will be the transformations that we discussed previously--lowercase letters, remove numbers, remove punctuation, remove stop words, strip out the whitespace, and stem the words: > docs <- tm_map(docs, tolower) > docs <- tm_map(docs, removeNumbers) > docs <- tm_map(docs, removePunctuation) > docs <- tm_map(docs, removeWords, stopwords("english")) > docs <- tm_map(docs, stripWhitespace) At this point, it is a good idea to eliminate unnecessary words. For example, during the speeches, when Congress applauds a statement, you will find (Applause) in the text. This must be removed: > docs <- tm_map(docs, removeWords, c("applause", "can", "cant", "will", "that", "weve", "dont", "wont", "youll", "youre")) After completing the transformations and removal of other words, make sure that your documents are plain text, put it in a document-term matrix, and check the dimensions: > docs = tm_map(docs, PlainTextDocument) > dtm = DocumentTermMatrix(docs) > dim(dtm) [1] 7 4738 The six speeches contain 4738 words. It is optional, but one can remove the sparse terms with the removeSparseTerms() function. You will need to specify a number between zero and one where the higher the number, the higher the percentage of sparsity in the matrix. Sparsity is the relative frequency of a term in the documents. So, if your sparsity threshold is 0.75, only terms with sparsity greater than 0.75 are removed. For us that would be (1 - 0.75) * 7, which is equal to 1.75. Therefore, any term in fewer than two documents would be removed: > dtm <- removeSparseTerms(dtm, 0.75) > dim(dtm) [1] 7 2254 As we don't have the metadata on the documents, it is important to name the rows of the matrix so that we know which document is which: > rownames(dtm) <- c("2010", "2011", "2012", "2013", "2014", "2015", "2016") Using the inspect() function, you can examine the matrix. Here, we will look at the seven rows and the first five columns: > inspect(dtm[1:7, 1:5]) Terms Docs abandon ability able abroad absolutely 2010 0 1 1 2 2 2011 1 0 4 3 0 2012 0 0 3 1 1 2013 0 3 3 2 1 2014 0 0 1 4 0 2015 1 0 1 1 0 2016 0 0 1 0 0 It appears that our data is ready for analysis, starting with looking at the word frequency counts. Let me point out that the output demonstrates why I've been trained to not favor wholesale stemming. You may be thinking that 'ability' and 'able' could be combined. If you stemmed the document you would end up with 'abl'. How does that help the analysis? I think you lose context, at least in the initial analysis. Again, I recommend applying stemming thoughtfully and judiciously. Data Modeling and evaluation Modeling will be broken into two distinct parts. The first will focus on word frequency and correlation and culminate in the building of a topic model. In the next portion, we will examine many different quantitative techniques by utilizing the power of the qdap package in order to compare two different speeches. Word frequency and topic models As we have everything set up in the document-term matrix, we can move on to exploring word frequencies by creating an object with the column sums, sorted in descending order. It is necessary to use as.matrix() in the code to sum the columns. The default order is ascending, so putting - in front of freq will change it to descending: > freq <- colSums(as.matrix(dtm)) > ord <- order(-freq) We will examine the head and tail of the object with the following code: > freq[head(ord)] new america people jobs now years 193 174 168 163 157 148 > freq[tail(ord)] wright written yearold youngest youngstown zero 2 2 2 2 2 2 The most frequent word is new and, as you might expect, the president mentions america frequently. Also, notice how important employment is with the frequency of jobs. I find it interesting that he mentions Youngstown, for Youngstown, OH, a couple of times. To look at the frequency of the word frequency, you can create tables, as follows: > head(table(freq)) freq 2 3 4 5 6 7 596 354 230 141 137 89 > tail(table(freq)) freq 148 157 163 168 174 193 1 1 1 1 1 1 What these tables show is the number of words with that specific frequency. So 354 words occurred three times; and one word, new in our case, occurred 193 times. Using findFreqTerms(), we can see which words occurred at least 125 times: > findFreqTerms(dtm, 125) [1] "america" "american" "americans" "jobs" "make" "new" [7] "now" "people" "work" "year" "years" You can find associations with words by correlation with the findAssocs() function. Let's look at jobs as two examples using 0.85 as the correlation cutoff: > findAssocs(dtm, "jobs", corlimit = 0.85) $jobs colleges serve market shouldnt defense put tax came 0.97 0.91 0.89 0.88 0.87 0.87 0.87 0.86 For visual portrayal, we can produce wordclouds and a bar chart. We will do two wordclouds to show the different ways to produce them: one with a minimum frequency and the other by specifying the maximum number of words to include. The first one with a minimum frequency also includes code to specify the color. The scale syntax determines the minimum and maximum word size by frequency; in this case, the minimum frequency is 70: > wordcloud(names(freq), freq, min.freq = 70, scale = c(3, .5), colors = brewer.pal(6, "Dark2")) The output of the preceding command is as follows: One can forgo all the fancy graphics, as we will in the following image, capturing the 25 most frequent words: > wordcloud(names(freq), freq, max.words = 25) The output of the preceding command is as follows: To produce a bar chart, the code can get a bit complicated, whether you use base R, ggplot2, or lattice. The following code will show you how to produce a bar chart for the 10 most frequent words in base R: > freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE) > wf <- data.frame(word = names(freq), freq = freq) > wf <- wf[1:10, ] > barplot(wf$freq, names = wf$word, main = "Word Frequency", xlab = "Words", ylab = "Counts", ylim = c(0, 250)) The output of the preceding command is as follows: We will now move on to building topic models using the topicmodels package, which offers the LDA() function. The question now is how many topics to create. It seems logical to solve for three topics (k=3). Certainly, I encourage you to try other numbers of topics: > library(topicmodels) > set.seed(123) > lda3 <- LDA(dtm, k = 3, method = "Gibbs") > topics(lda3) 2010 2011 2012 2013 2014 2015 2016 2 1 1 1 3 3 2 We can see an interesting transition over time. The first and last addresses have the same topic grouping, almost as if he opened and closed his tenure with the same themes. Using the terms() function produces a list of an ordered word frequency for each topic. The list of words is specified in the function, so let's look at the top 20 per topic: > terms(lda3, 25) Topic 1 Topic 2 Topic 3 [1,] "jobs" "people" "america" [2,] "now" "one" "new" [3,] "get" "work" "every" [4,] "tonight" "just" "years" [5,] "last" "year" "like" [6,] "energy" "know" "make" [7,] "tax" "economy" "time" [8,] "right" "americans" "need" [9,] "also" "businesses" "american" [10,] "government" "even" "world" [11,] "home" "give" "help" [12,] "well" "many" "lets" [13,] "american" "security" "want" [14,] "two" "better" "states" [15,] "congress" "come" "first" [16,] "country" "still" "country" [17,] "reform" "workers" "together" [18,] "must" "change" "keep" [19,] "deficit" "take" "back" [20,] "support" "health" "americans" [21,] "business" "care" "way" [22,] "education" "families" "hard" [23,] "companies" "made" "today" [24,] "million" "future" "working" [25,] "nation" "small" "good" Topic 2 covers the first and last speeches. Nothing really stands out as compelling in that topic as the others. It will be interesting to see how the next analysis can yield insights into those speeches. Topic 1 covers the next three speeches. Here, the message transitions to "jobs", "energy", "reform", and the "deficit", not to mention the comments about "education" and as we saw above, the correlation of "jobs" and "colleges". Topic 3 brings us to the next two speeches. The focus seems to really shift on to the economy and business with mentions to "security" and healthcare. In the next section, we can dig into the exact speech content further, along with comparing and contrasting the first and last State of the Union addresses. Additional quantitative analysis This portion of the analysis will focus on the power of the qdap package. It allows you to compare multiple documents over a wide array of measures. Our effort will be on comparing the 2010 and 2016 speeches. For starters, we will need to turn the text into data frames, perform sentence splitting, and then combine them into one data frame with a variable created that specifies the year of the speech. We will use this as our grouping variable in the analyses. Dealing with text data, even in R, can be tricky. The code that follows seemed to work the best, in this case, to get the data loaded and ready for analysis. We first load the qdap package. Then, to bring in the data from a text file, we will use the readLines() function from base R, collapsing the results to eliminate unnecessary whitespace. I also recommend putting your text encoding to ASCII, otherwise, you may run into some bizarre text that will mess up your analysis. That is done with the iconv() function: > library(qdap) > speech16 <- paste(readLines("sou2016.txt"), collapse=" ") Warning message: In readLines("sou2016.txt") : incomplete final line found on 'sou2016.txt' > speech16 <- iconv(speech16, "latin1", "ASCII", "") The warning message is not an issue as it is just telling us that the final line of text is not the same length as the other lines in the .txt file. We now apply the qprep() function from qdap. This function is a wrapper for a number of other replacement functions and using it will speed pre-processing, but it should be used with caution if a more detailed analysis is required. The functions it passes through are as follows: bracketX(): apply bracket removal replace_abbreviation(): replaces abbreviations replace_number(): numbers to words, for example '100' becomes 'one hundred' replace_symbol(): symbols become words, for example @ becomes 'at' > prep16 <- qprep(speech16) The other pre-processing we should do is to replace contractions (can't to cannot), remove stopwords, in our case the top 100, and remove unwanted characters, with the exception of periods and question marks. They will come in handy shortly: > prep16 <- replace_contraction(prep16) > prep16 <- rm_stopwords(prep16, Top100Words, separate = F) > prep16 <- strip(prep16, char.keep = c("?", ".")) Critical to this analysis is to now split it into sentences and add what will be the grouping variable, the year of the speech. This also creates the tot variable, which stands for Turn of Talk, serving as an indicator of sentence order. This is especially helpful in a situation where you are analyzing dialogue, say in a debate or question and answer session: > sent16 <- data.frame(speech = prep16) > sent16 <- sentSplit(sent16, "speech") > sent16$year <- "2016" Repeat the steps for the 2010 speech: > speech10 <- paste(readLines("sou2010.txt"), collapse=" ") > speech10 <- iconv(speech10, "latin1", "ASCII", "") > speech10 <- gsub("(Applause.)", "", speech10) > prep10 <- qprep(speech10) > prep10 <- replace_contraction(prep10) > prep10 <- rm_stopwords(prep10, Top100Words, separate = F) > prep10 <- strip(prep10, char.keep = c("?", ".")) > sent10 <- data.frame(speech = prep10) > sent10 <- sentSplit(sent10, "speech") > sent10$year <- "2010" Concatenate the separate years into one dataframe: > sentences <- data.frame(rbind(sent10, sent16)) One of the great things about the qdap package is that it facilitates basic text exploration, as we did before. Let's see a plot of frequent terms: > plot(freq_terms(sentences$speech)) The output of the preceding command is as follows: You can create a word frequency matrix that provides the counts for each word by speech: > wordMat <- wfm(sentences$speech, sentences$year) > head(wordMat[order(wordMat[, 1], wordMat[, 2],decreasing = TRUE),]) 2010 2016 our 120 85 us 33 33 year 29 17 americans 28 15 why 27 10 jobs 23 8 This can also be converted into a document-term matrix with the function as.dtm() should you so desire. Let's next build wordclouds, by year with qdap functionality: > trans_cloud(sentences$speech, sentences$year, min.freq = 10) The preceding command produces the following two images: Comprehensive word statistics are available. Here is a plot of the stats available in the package. The plot loses some of its visual appeal with just two speeches but is revealing nonetheless. A complete explanation of the stats is available under ?word_stats: > ws <- word_stats(sentences$speech, sentences$year, rm.incomplete = T) > plot(ws, label = T, lab.digits = 2) The output of the preceding command is as follows: Notice that the 2016 speech was much shorter, with over a hundred fewer sentences and almost a thousand fewer words. Also, there seems to be the use of asking questions as a rhetorical device in 2016 versus 2010 (n.quest 10 versus n.quest 4). To compare the polarity (sentiment scores), use the polarity() function, specifying the text and grouping variables: > pol = polarity(sentences$speech, sentences$year) > pol year total.sentences total.words ave.polarity sd.polarity stan.mean.polarity 1 2010 435 3900 0.052 0.432 0.121 2 2016 299 2982 0.105 0.395 0.267 The stan.mean.polarity value represents the standardized mean polarity, which is the average polarity divided by the standard deviation. We see that 2015 was slightly higher (0.267) than 2010 (0.121). This is in line with what we would expect, wanting to end on a more positive note. You can also plot the data. The plot produces two charts. The first shows the polarity by sentences over time and the second shows the distribution of the polarity: > plot(pol) The output of the preceding command is as follows: This plot may be a challenge to read in this text, but let me do my best to interpret it. The 2010 speech starts out with a strong negative sentiment and is slightly more negative than 2016. We can identify the most negative sentiment sentence by creating a dataframe of the pol object, find the sentence number, and produce it: > pol.df <- pol$all > which.min(pol.df$polarity) [1] 12 > pol.df$text.var[12] [1] "One year ago, I took office amid two wars, an economy rocked by a severe recession, a financial system on the verge of collapse, and a government deeply in debt. Now that is negative sentiment! Ironically, the government is even more in debt today. We will look at the readability index next: > ari <- automated_readability_index(sentences$speech, sentences$year) > ari$Readability year word.count sentence.count character.count 1 2010 3900 435 23859 2 2016 2982 299 17957 Automated_Readability_Index 1 11.86709 2 11.91929 I think it is no surprise that they are basically the same. Formality analysis is next. This takes a couple of minutes to run in R: > form <- formality(sentences$speech, sentences$year) > form year word.count formality 1 2016 2983 65.61 2 2010 3900 63.88 This looks to be very similar. We can examine the proportion of the parts of the speech. A plot is available, but adds nothing to the analysis, in this instance: > form$form.prop.by year word.count noun adj prep articles pronoun 1 2010 3900 44.18 15.95 3.67 0 4.51 2 2016 2982 43.46 17.37 4.49 0 4.96 verb adverb interj other 1 23.49 7.77 0.05 0.38 2 21.73 7.41 0.00 0.57 Now, the diversity measures are produced. Again, they are nearly identical. A plot is also available, (plot(div)), but being so similar, it once again adds no value. It is important to note that Obama's speechwriter for 2010 was Jon Favreau, and in 2016, it was Cody Keenan: > div <- diversity(sentences$speech, sentences$year) > div year wc simpson shannon collision berger_parker brillouin 1 2010 3900 0.998 6.825 5.970 0.031 6.326 2 2015 2982 0.998 6.824 6.008 0.029 6.248 One of my favorite plots is the dispersion plot. This shows the dispersion of a word throughout the text. Let's examine the dispersion of "jobs", "families", and "economy": > dispersion_plot(sentences$speech, rm.vars = sentences$year, c("security", "jobs", "economy"), color = "black", bg.color = "white") The output of the preceding command is as follows: This completes our analysis of the two speeches. The analysis showed that, although the speeches had a similar style, the core messages changed over time as the political landscape changed. This extract is taken from the book Mastering Machine Learning with R - Second Edition. Read the book to know more advanced prediction, algorithms, and learning methods with R. Understanding Sentiment Analysis and other key NLP concepts Twitter Sentiment Analysis Sentiment Analysis of the 2017 US elections on Twitter

0
0
16337

How-To Tutorials

article-image-overview-horizon-view-architecture-and-its-components

Packt

27 Mar 2015

31 min read

An Overview of Horizon View Architecture and its Components

Packt

27 Mar 2015

31 min read

0
0
16331

article-image-getting-inside-c-plus-plus-multithreaded-application

Maya Posch

13 Feb 2018

8 min read

Getting Inside a C++ Multithreaded Application

Maya Posch

13 Feb 2018

8 min read

This C++ programming tutorial is taken from Maya Posch's Mastering C++ Multithreading. In its most basic form, a multithreaded application consists of a singular process with two or more threads. These threads can be used in a variety of ways, for example, to allow the process to respond to events in an asynchronous manner by using one thread per incoming event or type of event, or to speed up the processing of data by splitting the work across multiple threads. Examples of asynchronous response to events include the processing of user interface (GUI) and network events on separate threads so that neither type of event has to wait on the other, or can block events from being responded to in time. Generally, a single thread performs a single task, such as the processing of GUI or network events, or the processing of data. For this basic example, the application will start with a singular thread, which will then launch a number of threads, and wait for them to finish. Each of these new threads will perform its own task before finishing. Let's start with the includes and global variables for our application: #include <iostream> #include <thread> #include <mutex> #include <vector> #include <random> using namespace std; // --- Globals mutex values_mtx; mutex cout_mtx; Both the I/O stream and vector headers should be familiar to anyone who has ever used C++: the former is here used for the standard output (cout), and vector for storing a sequence of values. The random header is new in c++11, and as the name suggests, it offers classes and methods for generating random sequences. We use it here to make our threads do something interesting. Finally, the thread and mutex includes are the core of our multithreaded application; they provide the basic means for creating threads, and allow for thread-safe interactions between them. Moving on, we create two mutexes: one for the global vector and one for cout, since the latter is not thread-safe. Next we create the main function as follows: int main() { values.push_back(42); We then push a fixed value onto the vector instance; this one will be used by the threads we create in a moment: thread tr1(threadFnc, 1); thread tr2(threadFnc, 2); thread tr3(threadFnc, 3); thread tr4(threadFnc, 4); We create new threads, and provide them with the name of the method to use, passing along any parameters--in this case, just a single integer: tr1.join(); tr2.join(); tr3.join(); tr4.join(); Next, we wait for each thread to finish before we continue by calling join() on each thread instance: cout << "Input: " << values[0] << ", Result 1: " << values[1] << ", Result 2: " << values[2] << ", Result 3: " << values[3] << ", Result 4: " << values[4] << "n"; return 1; } At this point, we expect that each thread has done whatever it's supposed to do, and added the result to the vector, which we then read out and show the user. Of course, this shows almost nothing of what really happens in the application, mostly just the essential simplicity of using threads. Next, let's see what happens inside this method that we pass to each thread instance: void threadFnc(int tid) { cout_mtx.lock(); cout << "Starting thread " << tid << ".n"; cout_mtx.unlock(); When we obtain the initial value set in the vector, we copy it to a local variable so that we can immediately release the mutex for the vector to enable other threads to use it: int rval = randGen(0, 10); val += rval; These last two lines contain the essence of what the threads created do: they take the initial value, and add a randomly generated value to it. The randGen() method takes two parameters, defining the range of the returned value: cout_mtx.lock(); cout << "Thread " << tid << " adding " << rval << ". New value: " << val << ".n"; cout_mtx.unlock(); values_mtx.lock(); values.push_back(val); values_mtx.unlock(); } Finally, we (safely) log a message informing the user of the result of this action before adding the new value to the vector. In both cases, we use the respective mutex to ensure that there can be no overlap with any of the other threads. Once the method reaches this point, the thread containing it will terminate, and the main thread will have one fewer thread to wait for to rejoin. Lastly, we'll take a look at the randGen() method. Here we can see some multithreaded specific additions as well: int randGen(const int& min, const int& max) { static thread_local mt19937 generator(hash<thread::id>()(this_thread::get_id())); uniform_int_distribution<int> distribution(min, max); return distribution(generator) } This preceding method takes a minimum and maximum value as explained earlier, which limit the range of the random numbers this method can return. At its core, it uses a mt19937-based generator, which employs a 32-bit Mersenne Twister algorithm with a state size of 19937 bits. This is a common and appropriate choice for most applications. Of note here is the use of the thread_local keyword. What this means is that even though it is defined as a static variable, its scope will be limited to the thread using it. Every thread will thus create its own generator instance, which is important when using the random number API in the STL. A hash of the internal thread identifier (not our own) is used as seed for the generator. This ensures that each thread gets a fairly unique seed for its generator instance, allowing for better random number sequences. Finally, we create a new uniform_int_distribution instance using the provided minimum and maximum limits, and use it together with the generator instance to generate the random number which we return. Makefile In order to compile the code described earlier, one could use an IDE, or type the command on the command line. As mentioned in the beginning of this chapter, we'll be using makefiles for the examples in this book. The big advantages of this are that one does not have to repeatedly type in the same extensive command, and it is portable to any system which supports make. The makefile for this example is rather basic: GCC := g++ OUTPUT := ch01_mt_example SOURCES := $(wildcard *.cpp) CCFLAGS := -std=c++11 all: $(OUTPUT) $(OUTPUT): clean: rm $(OUTPUT) .PHONY: all From the top down, we first define the compiler that we'll use (g++), set the name of the output binary (the .exe extension on Windows will be post-fixed automatically), followed by the gathering of the sources and any important compiler flags. The wildcard feature allows one to collect the names of all files matching the string following it in one go without having to define the name of each source file in the folder individually. For the compiler flags, we're only really interested in enabling the c++11 features, for which GCC still requires one to supply this compiler flag. For the all method, we just tell make to run g++ with the supplied information. Next we define a simple clean method which just removes the produced binary, and finally, we tell make to not interpret any folder or file named all in the folder, but to use the internal method with the .PHONY section. When we run this makefile, we see the following command-line output: $ make Afterwards, we find an executable file called ch01_mt_example (with the .exe extension attached on Windows) in the same folder. Executing this binary will result in a command-line output akin to the following: $ ./ch01_mt_example.exe Starting thread 1. Thread 1 adding 8. New value: 50. Starting thread 2. Thread 2 adding 2. New value: 44. Starting thread 3. Starting thread 4. Thread 3 adding 0. New value: 42. Thread 4 adding 8. New value: 50. Input: 42, Result 1: 50, Result 2: 44, Result 3: 42, Result 4: 50 What one can see here already is the somewhat asynchronous nature of threads and their output. While threads 1 and 2 appear to run synchronously, threads 3 and 4 clearly run asynchronously. For this reason, and especially in longer-running threads, it's virtually impossible to say in which order the log output and results will be returned. While we use a simple vector to collect the results of the threads, there is no saying whether Result 1 truly originates from the thread which we assigned ID 1 in the beginning. If we need this information, we need to extend the data we return by using an information structure with details on the processing thread or similar. One could, for example, use struct like this: struct result { int tid; int result; }; The vector would then be changed to contain result instances rather than integer instances. One could pass the initial integer value directly to the thread as part of its parameters, or pass it via some other way. Want to learn C++ multithreading in detail? You can find Mastering C++ Multithreading here, or explore all our latest C++ eBooks and videos here.

0
0
16318

article-image-cross-platform-solution-xamarinforms-and-mvvm-architecture

Packt

22 Feb 2016

9 min read

A cross-platform solution with Xamarin.Forms and MVVM architecture

Packt

22 Feb 2016

9 min read

In this article by George Taskos, the author of the book, Xamarin Cross Platform Development Cookbook, we will discuss a cross-platform solution with Xamarin.Forms and MVVM architecture. Creating a cross-platform solution correctly requires a lot of things to be taken under consideration. In this article, we will quickly provide you with a starter MVVM architecture showing data retrieved over the network in a ListView control. (For more resources related to this topic, see here.) How to do it... In Xamarin Studio, click on File | New | Xamarin.Forms App. Provide the name XamFormsMVVM. Add the NuGet dependencies by right-clicking on each project in the solution and choosing Add | Add NuGet Packages…. Search for the packages XLabs.Forms and modernhttpclient, and install them. Repeat step 2 for the XamFormsMVVM portable class library and add the packages Microsoft.Net.Http and Newtonsoft.Json. In the XamFormsMVVM portable class library, create the following folders: Models, ViewModels, and Views. To create a folder, right-click on the project and select Add | New Folder. Right-click on the Models folder and select Add | New File…, choose the General | Empty Interface template, name it IDataService, and click on New, and add the following code: public interface IDataService { Task<IEnumerable<OrderModel>> GetOrdersAsync (); } Right-click on the Models folder again and select Add | New File…, choose the General | Empty Class template, name it DataService, and click on New, and add the following code: [assembly: Xamarin.Forms.Dependency (typeof (DataService))] namespace XamFormsMVVM{ public class DataService : IDataService { protected const string BaseUrlAddress = @"https://api.parse.com/1/classes"; protected virtual HttpClient GetHttpClient() { HttpClient httpClient = new HttpClient(new NativeMessageHandler()); httpClient.BaseAddress = new Uri(BaseUrlAddress); httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue ("application/json")); return httpClient; } public async Task<IEnumerable<OrderModel>> GetOrdersAsync () { using (HttpClient client = GetHttpClient ()) { HttpRequestMessage requestMessage = new HttpRequestMessage(HttpMethod.Get, client.BaseAddress + "/Order"); requestMessage.Headers.Add("X-Parse- Application-Id", "fwpMhK1Ot1hM9ZA4iVRj49VFz DePwILBPjY7wVFy"); requestMessage.Headers.Add("X-Parse-REST- API-Key", "egeLQVTC7IsQJGd8GtRj3ttJV RECIZgFgR2uvmsr"); HttpResponseMessage response = await client.SendAsync(requestMessage); response.EnsureSuccessStatusCode (); string ordersJson = await response.Content.ReadAsStringAsync(); JObject jsonObj = JObject.Parse (ordersJson); JArray ordersResults = (JArray)jsonObj ["results"]; return JsonConvert.DeserializeObject <List<OrderModel>> (ordersResults.ToString ()); } } } } Right-click on the Models folder and select Add | New File…, choose the General | Empty Interface template, name it IDataRepository, and click on New, and add the following code: public interface IDataRepository { Task<IEnumerable<OrderViewModel>> GetOrdersAsync (); } Right-click on the Models folder and select Add | New File…, choose the General | Empty Class template, name it DataRepository, and click on New, and add the following code in that file: [assembly: Xamarin.Forms.Dependency (typeof (DataRepository))] namespace XamFormsMVVM { public class DataRepository : IDataRepository { private IDataService DataService { get; set; } public DataRepository () : this(DependencyService.Get<IDataService> ()) { } public DataRepository (IDataService dataService) { DataService = dataService; } public async Task<IEnumerable<OrderViewModel>> GetOrdersAsync () { IEnumerable<OrderModel> orders = await DataService.GetOrdersAsync ().ConfigureAwait (false); return orders.Select (o => new OrderViewModel (o)); } } } In the ViewModels folder, right-click on Add | New File… and name it OrderViewModel. Add the following code in that file: public class OrderViewModel : XLabs.Forms.Mvvm.ViewModel { string _orderNumber; public string OrderNumber { get { return _orderNumber; } set { SetProperty (ref _orderNumber, value); } } public OrderViewModel (OrderModel order) { OrderNumber = order.OrderNumber; } public override string ToString () { return string.Format ("[{0}]", OrderNumber); } } Repeat step 5 and create a class named OrderListViewModel.cs: public class OrderListViewModel : XLabs.Forms.Mvvm.ViewModel{ protected IDataRepository DataRepository { get; set; } ObservableCollection<OrderViewModel> _orders; public ObservableCollection<OrderViewModel> Orders { get { return _orders; } set { SetProperty (ref _orders, value); } } public OrderListViewModel () : this(DependencyService.Get<IDataRepository> ()) { } public OrderListViewModel (IDataRepository dataRepository) { DataRepository = dataRepository; DataRepository.GetOrdersAsync ().ContinueWith (antecedent => { if (antecedent.Status == TaskStatus.RanToCompletion) { Orders = new ObservableCollection<OrderViewModel> (antecedent.Result); } }, TaskScheduler. FromCurrentSynchronizationContext ()); } } Right-click on the Views folder and choose Add | New File…, select the Forms | Forms Content Page Xaml, name it OrderListView, and click on New: <?xml version="1.0" encoding="UTF-8"?> <ContentPage x_Class="XamFormsMVVM.OrderListView" Title="Orders"> <ContentPage.Content> <ListView ItemsSource="{Binding Orders}"/> </ContentPage.Content> </ContentPage> Go to XmaFormsMVVM.cs and replace the contents with the following code: public App() { if (!Resolver.IsSet) { SetIoc (); } RegisterViews(); MainPage = new NavigationPage((Page)ViewFactory. CreatePage<OrderListViewModel, OrderListView>()); } private void SetIoc() { var resolverContainer = new SimpleContainer(); Resolver.SetResolver (resolverContainer.GetResolver()); } private void RegisterViews() { ViewFactory.Register<OrderListView, OrderListViewModel>(); } Run the application, and you will get results like the following screenshots: For Android: For iOS: How it works… A cross-platform solution should share as much logic and common operations as possible, such as retrieving and/or updating data in a local database or over the network, having your logic centralized, and coordinating components. With Xamarin.Forms, you even have a cross-platform UI, but this shouldn't stop you from separating the concerns correctly; the more abstracted you are from the user interface and programming against interfaces, the easier it is to adapt to changes and remove or add components. Starting with models and creating a DataService implementation class with its equivalent interface, IDataService retrieves raw JSON data over the network from the Parse API and converts it to a list of OrderModel, which are POCO classes with just one property. Every time you invoke the GetOrdersAsync method, you get the same 100 orders from the server. Notice how we used the Dependency attribute declaration above the namespace to instruct DependencyService that we want to register this implementation class for the interface. We took a step to improve the performance of the REST client API; although we do use the HTTPClient package, we pass a delegate handler, NativeMessageHandler, when constructing in the GetClient() method. This handler is part of the modernhttpclient NuGet package and it manages undercover to use a native REST API for each platform: NSURLSession in iOS and OkHttp in Android. The IDataService interface is used by the DataRepository implementation, which acts as a simple intermediate repository layer converting the POCO OrderModel received from the server in OrderViewModel instances. Any model that is meant to be used on a view is a ViewModel, the view's model, and also, when retrieving and updating data, you don't carry business logic. Only data logic that is known should be included as data transfer objects. Dependencies, such as in our case, where we have a dependency of IDataService for the DataRepository to work, should be clear to classes that will use the component, which is why we create a default empty constructor required from the XLabs ViewFactory class, but in reality, we always invoke the constructor that accepts an IDataService instance; this way, when we unit test this unit, we can pass our mock IDataService class and test the functionality of the methods. We are using the DependencyService class to register the implementation to its equivalent IDataRepository interface here as well. OrderViewModel inherits XLabs.Forms.ViewModel; it is a simple ViewModel class with one property raising property change notifications and accepting an OrderModel instance as a dependency in the default constructor. We override the ToString() method too for a default string representation of the object, which simplifies the ListView control without requiring us, in our example, to use a custom cell with DataTemplate. The second ViewModel in our architecture is the OrderListViewModel, which inherits XLabs.Forms.ViewModel too and has a dependency of IDataRepository, following the same pattern with a default constructor and a constructor with the dependency argument. This ViewModel is responsible for retrieving a list of OrderViewModel and holding it to an ObservableCollection<OrderViewModel> instance that raises collection change notifications. In the constructor, we invoke the GetOrdersAsync() method and register an action delegate handler to be invoked on the main thread when the task has finished passing the orders received in a new ObservableCollection<OrderViewModel> instance set to the Orders property. The view of this recipe is super simple: in XAML, we set the title property which is used in the navigation bar for each platform and we leverage the built-in data-binding mechanism of Xamarin.Forms to bind the Orders property in the ListView ItemsSource property. This is how we abstract the ViewModel from the view. But we need to provide a BindingContext class to the view while still not coupling the ViewModel to the view, and Xamarin Forms Labs is a great framework for filling the gap. XLabs has a ViewFactory class; with this API, we can register the mapping between a view and a ViewModel, and the framework will take care of injecting our ViewModel into the BindingContext class of the view. When a page is required in our application, we use the ViewFactory.CreatePage class, which will construct and provide us with the desired instance. Xamarin Forms Labs uses a dependency resolver internally; this has to be set up early in the application startup entry point, so it is handled in the App.cs constructor. Run the iOS application in the simulator or device and in your preferred Android emulator or device; the result is the same with the equivalent native themes for each platform. Summary Xamarin.Forms is a great cross-platform UI framework that you can use to describe your user interface code declaratives in XAML, and it will be translated into the equivalent native views and pages with the ability of customizing each native application layer. Xamarin.Forms and MVVM are made for each other; the pattern fits naturally into the design of native cross-platform mobile applications and abstracts the view from the data easy using the built-in data-binding mechanism. Resources for Article: Further resources on this subject: Code Sharing Between iOS and Android [Article] Working with Xamarin.Android [Article] Sharing with MvvmCross [Article]

0
0
16314

article-image-rendering-stereoscopic-3d-models-using-opengl

Packt

24 Aug 2015

8 min read

Rendering Stereoscopic 3D Models using OpenGL

Packt

24 Aug 2015

8 min read

In this article, by Raymond C. H. Lo and William C. Y. Lo, authors of the book OpenGL Data Visualization Cookbook, we will demonstrate how to visualize data with stunning stereoscopic 3D technology using OpenGL. Stereoscopic 3D devices are becoming increasingly popular, and the latest generation's wearable computing devices (such as the 3D vision glasses from NVIDIA, Epson, and more recently, the augmented reality 3D glasses from Meta) can now support this feature natively. The ability to visualize data in a stereoscopic 3D environment provides a powerful and highly intuitive platform for the interactive display of data in many applications. For example, we may acquire data from the 3D scan of a model (such as in architecture, engineering, and dentistry or medicine) and would like to visualize or manipulate 3D objects in real time. Unfortunately, OpenGL does not provide any mechanism to load, save, or manipulate 3D models. Thus, to support this, we will integrate a new library named Open Asset Import Library (Assimp) into our code. The main dependencies include the GLFW library that requires OpenGL version 3.2 and higher. (For more resources related to this topic, see here.) Stereoscopic 3D rendering 3D television and 3D glasses are becoming much more prevalent with the latest trends in consumer electronics and technological advances in wearable computing. In the market, there are currently many hardware options that allow us to visualize information with stereoscopic 3D technology. One common format is side-by-side 3D, which is supported by many 3D glasses as each eye sees an image of the same scene from a different perspective. In OpenGL, creating side-by-side 3D rendering requires asymmetric adjustment as well as viewport adjustment (that is, the area to be rendered) – asymmetric frustum parallel projection or equivalently to lens-shift in photography. This technique introduces no vertical parallax and widely adopted in the stereoscopic rendering. To illustrate this concept, the following diagram shows the geometry of the scene that a user sees from the right eye: The intraocular distance (IOD) is the distance between two eyes. As we can see from the diagram, the Frustum Shift represents the amount of skew/shift for asymmetric frustrum adjustment. Similarly, for the left eye image, we perform the transformation with a mirrored setting. The implementation of this setup is described in the next section. How to do it... The following code illustrates the steps to construct the projection and view matrices for stereoscopic 3D visualization. The code uses the intraocular distance, the distance of the image plane, and the distance of the near clipping plane to compute the appropriate frustum shifts value. In the source file, common/controls.cpp, we add the implementation for the stereo 3D matrix setup: void computeStereoViewProjectionMatrices(GLFWwindow* window, float IOD, float depthZ, bool left_eye){ int width, height; glfwGetWindowSize(window, &width, &height); //up vector glm::vec3 up = glm::vec3(0,-1,0); glm::vec3 direction_z(0, 0, -1); //mirror the parameters with the right eye float left_right_direction = -1.0f; if(left_eye) left_right_direction = 1.0f; float aspect_ratio = (float)width/(float)height; float nearZ = 1.0f; float farZ = 100.0f; double frustumshift = (IOD/2)*nearZ/depthZ; float top = tan(g_initial_fov/2)*nearZ; float right = aspect_ratio*top+frustumshift*left_right_direction; //half screen float left = -aspect_ratio*top+frustumshift*left_right_direction; float bottom = -top; g_projection_matrix = glm::frustum(left, right, bottom, top, nearZ, farZ); // update the view matrix g_view_matrix = glm::lookAt( g_position-direction_z+ glm::vec3(left_right_direction*IOD/2, 0, 0), //eye position g_position+ glm::vec3(left_right_direction*IOD/2, 0, 0), //centre position up //up direction ); In the rendering loop in main.cpp, we define the viewports for each eye (left and right) and set up the projection and view matrices accordingly. For each eye, we translate our camera position by half of the intraocular distance, as illustrated in the previous figure: if(stereo){ //draw the LEFT eye, left half of the screen glViewport(0, 0, width/2, height); //computes the MVP matrix from the IOD and virtual image plane distance computeStereoViewProjectionMatrices(g_window, IOD, depthZ, true); //gets the View and Model Matrix and apply to the rendering glm::mat4 projection_matrix = getProjectionMatrix(); glm::mat4 view_matrix = getViewMatrix(); glm::mat4 model_matrix = glm::mat4(1.0); model_matrix = glm::translate(model_matrix, glm::vec3(0.0f, 0.0f, -depthZ)); model_matrix = glm::rotate(model_matrix, glm::pi<float>() * rotateY, glm::vec3(0.0f, 1.0f, 0.0f)); model_matrix = glm::rotate(model_matrix, glm::pi<float>() * rotateX, glm::vec3(1.0f, 0.0f, 0.0f)); glm::mat4 mvp = projection_matrix * view_matrix * model_matrix; //sends our transformation to the currently bound shader, //in the "MVP" uniform variable glUniformMatrix4fv(matrix_id, 1, GL_FALSE, &mvp[0][0]); //render scene, with different drawing modes if(drawTriangles) obj_loader->draw(GL_TRIANGLES); if(drawPoints) obj_loader->draw(GL_POINTS); if(drawLines) obj_loader->draw(GL_LINES); //Draw the RIGHT eye, right half of the screen glViewport(width/2, 0, width/2, height); computeStereoViewProjectionMatrices(g_window, IOD, depthZ, false); projection_matrix = getProjectionMatrix(); view_matrix = getViewMatrix(); model_matrix = glm::mat4(1.0); model_matrix = glm::translate(model_matrix, glm::vec3(0.0f, 0.0f, -depthZ)); model_matrix = glm::rotate(model_matrix, glm::pi<float>() * rotateY, glm::vec3(0.0f, 1.0f, 0.0f)); model_matrix = glm::rotate(model_matrix, glm::pi<float>() * rotateX, glm::vec3(1.0f, 0.0f, 0.0f)); mvp = projection_matrix * view_matrix * model_matrix; glUniformMatrix4fv(matrix_id, 1, GL_FALSE, &mvp[0][0]); if(drawTriangles) obj_loader->draw(GL_TRIANGLES); if(drawPoints) obj_loader->draw(GL_POINTS); if(drawLines) obj_loader->draw(GL_LINES); } The final rendering result consists of two separate images on each side of the display, and note that each image is compressed horizontally by a scaling factor of two. For some display systems, each side of the display is required to preserve the same aspect ratio depending on the specifications of the display. Here are the final screenshots of the same models in true 3D using stereoscopic 3D rendering: Here's the rendering of the architectural model in stereoscopic 3D: How it works... The stereoscopic 3D rendering technique is based on the parallel axis and asymmetric frustum perspective projection principle. In simpler terms, we rendered a separate image for each eye as if the object was seen at a different eye position but viewed on the same plane. Parameters such as the intraocular distance and frustum shift can be dynamically adjusted to provide the desired 3D stereo effects. For example, by increasing or decreasing the frustum asymmetry parameter, the object will appear to be moved in front or behind the plane of the screen. By default, the zero parallax plane is set to the middle of the view volume. That is, the object is set up so that the center position of the object is positioned at the screen level, and some parts of the object will appear in front of or behind the screen. By increasing the frustum asymmetry (that is, positive parallax), the scene will appear to be pushed behind the screen. Likewise, by decreasing the frustum asymmetry (that is, negative parallax), the scene will appear to be pulled in front of the screen. The glm::frustum function sets up the projection matrix, and we implemented the asymmetric frustum projection concept illustrated in the drawing. Then, we use the glm::lookAt function to adjust the eye position based on the IOP value we have selected. To project the images side by side, we use the glViewport function to constrain the area within which the graphics can be rendered. The function basically performs an affine transformation (that is, scale and translation) which maps the normalized device coordinate to the window coordinate. Note that the final result is a side-by-side image in which the graphic is scaled by a factor of two vertically (or compressed horizontally). Depending on the hardware configuration, we may need to adjust the aspect ratio. The current implementation supports side-by-side 3D, which is commonly used in most wearable Augmented Reality (AR) or Virtual Reality (VR) glasses. Fundamentally, the rendering technique, namely the asymmetric frustum perspective projection described in our article, is platform-independent. For example, we have successfully tested our implementation on the Meta 1 Developer Kit (https://www.getameta.com/products) and rendered the final results on the optical see-through stereoscopic 3D display: Here is the front view of the Meta 1 Developer Kit, showing the optical see-through stereoscopic 3D display and 3D range-sensing camera: The result is shown as follows, with the stereoscopic 3D graphics rendered onto the real world (which forms the basis of augmented reality): See also In addition, we can easily extend our code to support shutter glasses-based 3D monitors by utilizing the Quad Buffered OpenGL APIs (refer to the GL_BACK_RIGHT and GL_BACK_LEFT flags in the glDrawBuffer function). Unfortunately, such 3D formats require specific hardware synchronization and often require higher frame rate display (for example, 120Hz) as well as a professional graphics card. Further information on how to implement stereoscopic 3D in your application can be found at http://www.nvidia.com/content/GTC-2010/pdfs/2010_GTC2010.pdf. Summary In this article, we covered how to visualize data with stunning stereoscopic 3D technology using OpenGL. OpenGL does not provide any mechanism to load, save, or manipulate 3D models. Thus, to support this, we have integrated a new library named Assimp into the code. Resources for Article: Further resources on this subject: Organizing a Virtual Filesystem [article] Using OpenCL [article] Introduction to Modern OpenGL [article]

0
1
16295

Understanding Text Search and Hierarchies in SAP HANA

Determining resource utilization requirements

Getting Started with ChatGPT Advanced Data Analysis- Part 2

Android Native Application API

Data Pipelines

Sentiment Analysis with Generative AI

ChatGPT for Everyday Use

Creating a JEE Application with EJB

Understanding PHP basics

Customizing Kernel and Boot Sequence

Trending Topics

Performing Sentiment Analysis with R on Obama's State of the Union speeches [Tutorial]

An Overview of Horizon View Architecture and its Components

Getting Inside a C++ Multithreaded Application

A cross-platform solution with Xamarin.Forms and MVVM architecture

Rendering Stereoscopic 3D Models using OpenGL

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access