How-To Tutorials

article-image-prompt-engineering-with-azure-prompt-flow

06 May 2024

10 min read

Prompt Engineering with Azure Prompt Flow

06 May 2024

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionThe ability to generate relevant and creative prompts is one of the imperative aspects of the natural language processing system. Especially when the world is evolving in the landscape of artificial intelligence, it is one of the crucial prospects. During this situation, Microsoft's Azure prompt flow provides groundbreaking solutions while empowering the data, scientists, and developers to engineer prompts effectively. Here, let us explore the nuances of Azure prompt flow while delving deep into the realm of prompt engineering. Significance of Prompt Engineering With the help of prompt engineering, one can construct problems, helping the user with the guide of machine learning models effectively. However, it involves Formulating contextually relevant and specific questions or statements that elicit the desired responses from the artificial intelligence models. Azure prompt flow is one of the sophisticated tools by Microsoft Azure that simplifies intricate processes while enabling the developers to create brands that can have meaningful and accurate outcomes. Getting started with Azure prompt flow Even before exploring the practical applications of Azure Prompt flow, it is necessary to understand the few essential components of Azure prompt flow. The core of prompt flow utilizes the GPT 3.5 architecture to generate various relevant responses to prompts. With the integration of Azure, one can expect a secure and seamless environment for prompt engineering. Let us consider a practical example of a chatbot application. from azure.ai.textanalytics import TextAnalyticsClient from azure.core.credentials import AzureKeyCredential # Set up Azure Text Analytics client key = "YOUR_AZURE_TEXT_ANALYTICS_KEY" endpoint = "YOUR_AZURE_TEXT_ANALYTICS_ENDPOINT" credential = AzureKeyCredential(key) text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=credential) # User input user_input = "Tell me a joke." # Generate a prompt using Azure Promptflow prompt = f"User: {user_input}\nChatbot:" # Get chatbot's response response = text_analytics_client.analyze_sentiment(prompt) # Output the response print(f"Chatbot: {response[0]['sentiment']}") In this particular example, we can see that the user inputs a request. Azure prompt flow constructs the required form for the chatbot while generating a sentiment analysis response. Here is the output:Chatbot: Positive Tuning prompts using Azure Promptflow Crafting good prompts can be a challenging task. With the concept of variants, the user would be able to test the behavior of the model under various conditions. Example: If the user wants to create a chatbot using Azure Promptflow, then this example might help one to respond creatively to the queries about movies. Prompt Tuning: User: "Tell me about your favorite movie." Chatbot: "Certainly! One of my favorite movies is 'Inception.' Directed by Christopher Nolan, it's a mind-bending sci-fi thriller that explores the depths of the human mind." Python code: from azure.ai.textanalytics import TextAnalyticsClient from azure.core.credentials import AzureKeyCredential # Set up Azure Text Analytics client key = "YOUR_AZURE_TEXT_ANALYTICS_KEY" endpoint = "YOUR_AZURE_TEXT_ANALYTICS_ENDPOINT" credential = AzureKeyCredential(key) text_analytics_client = TextAnalyticsClient(endpoint=endpoint, credential=credential) # User input user_input = "Tell me about your favorite movie." # Generate a creative prompt using Azure Promptflow prompt = f"User: {user_input}\nChatbot: Certainly! One of my favorite movies is 'Inception.' Directed by Christopher Nolan, it's a mind-bending sci-fi thriller that explores the depths of the human mind." # Get chatbot's response response = text_analytics_client.analyze_sentiment(prompt) # Output the response print(f"Chatbot: {response[0]['sentiment']}") In this example, Azure Promptflow is used to create prompts tailored to specific user queries, providing creative and contextually relevant responses. The analyze_sentiment function from the Azure Text Analytics client is used to assess the sentiment of the generated prompts. Replace "YOUR_AZURE_TEXT_ANALYTICS_KEY" and "YOUR_AZURE_TEXT_ANALYTICS_ENDPOINT" with your actual Azure Text Analytics API key and endpoint. Here are a few examples: URL: https://music.apple.com/us/app/apple-music/id1108187390 Text Content: Apple Music is a comprehensive music streaming app that boasts an extensive library of songs, albums, and playlists. Users can enjoy curated playlists, radio shows, and exclusive content from their favorite artists. Apple Music allows offline downloads and offers a family plan for multiple users. It also integrates with the user's existing music library, making it seamless to access purchased and uploaded music. OUTPUT: {"category": "App", "evidence": "Both"} URL: https://www.youtube.com/user/premierleague Text Content: Premier League Pass, in collaboration with the English Premier League, delivers live football matches, highlights, and exclusive behind-the-scenes content on YouTube. Football aficionados can stay updated with their favorite teams and players through this official channel. Subscribing to Premier League Pass on YouTube ensures fans never miss a moment from the most exciting football league in the world. OUTPUT: {"category": "Channel", "evidence": "URL"} URL: https://arxiv.org/abs/2305.06858 Text Content: This research paper explores the realm of image captioning, where advanced algorithms generate descriptive captions for images. The study delves into techniques that combine computer vision and natural language processing to achieve accurate and contextually relevant image captions. The paper discusses various models, evaluates their performance, and presents findings that contribute to the field of image captioning technology. OUTPUT: {"category": "Academic", "evidence": "Text content"} URL: https://exampleconstructionsite.com/ Text Content: This website is currently under construction. Please check back later for updates and exciting content. OUTPUT: {"category": "None", "evidence": "None"} For a given URL: {{url}}, and text content: {{text_content}}. Classified Category: Travel Evidence: The text contains information about popular tourist destinations, travel itineraries, and hotel recommendations. OUTPUT: After summarizing, here is the final Promptflow with 2 variants for the summarize_text_content node. Benefits of using Azure ML prompt flow Apart from offering a wider range of benefits, Azure ML promptflow helps users to make the transition from ideation to experimentation. This ultimately results in production ready LLM based applications. Prompt engineering agility Azure prompt flow offers a visual representation of the struct of the flow structure. It allows the users to understand and navigate the projects while offering a notebook-like coding experience for debugging and efficient flow development. At the same time, users can create as well as compare more than one prompt variant which helps in facilitating an iterative refinement process. Enterprise readiness The prompt flow streamlines the entire prompt engineering process and leverages robust enterprise readiness solutions. It thus offers a secure, reliable, and scalable foundation for experimentation and development. Besides, it supports team collaboration where multiple users can work together, share knowledge, and maintain version control. Application development The well-defined process of Azure prompt facilitates the seamless development of AI applications. Only by leveraging it the user can progress effectively through the consequent stages of developing, testing, tuning, and deploying flows. All these ultimately result in creating a fully-fledged AI applications. However, when the user follows this methodical and structured approach, it empowers them to develop fine-tune and test rigorously to deploy with confidence. Real-world applications of Azure Promptflow Content creation One of the applications of Azure promptflow lies in the content creation tunes. Various content creators can generate outlines and creative ideas by creating engineering tailored to specific topics. One can even generate entire paragraphs using the prompt flow engineering method. This helps streamline the content creation process while making it look more inspiring and efficient. Language Translation Developers are now leveraging Azure promptflow to build large language translation applications. With the help of constructing prompts in the source language, one can let the system translate the inputs by providing accurate outputs required in the desired language. Such a profound implication can only be possible with the help of Azure prompt flow. It has the propensity to break all the language barriers in the globalized world. Custom support chat box By integrating Azure prompt flow within the customer support chatbots, one can enhance the user experience. However, the prompt engineering techniques help ensure the queries are accurately understood. This process would result in relevant and precise responses. It significantly reduces the response time while improving customer satisfaction. Azure prompt flow simplifies prompt engineering Prompt engineering is an iterative and challenging process. With the help of Azure prompt flow, one can simplify the development, comparisons, and evaluation of problems. The process makes it easier for the user to find the best prompt for use cases. Besides, developing a chatbot that utilizes large language models, including GPT3.5, can help companies provide personalized product recommendations based on customer input. Here, Azure prompt flow allows users to evaluate, create, and even deploy from the machine learning models. It speeds up the whole process of developing and deploying artificial intelligence solutions. At the same time, it also allows the user to create connections to the large language model. Such models include GPT 3.5 and Azure open AI. Users can also use these models for different purposes, including chat computation or creating embeddings. Designing and modifying prompts Designing and modifying alarms for effective use is crucial, especially when using them for large language models. Azure prompt flow enables users to test, create, and deploy various prompt versions for recommendation purposes. To effectively utilize the large language model, especially while dealing with multiple prompts, it is imperative to modify them and design accordingly for better results. Once you can create the problems, it is time to evaluate and test them in multiple scenarios. For instance, if you are creating prompts for a product company, you must explain the process of prompts and their flow to handle the user queries. Also, one can mention the need for custom coding and deployment of end-to-end solutions with the help of Azure's prompt flow feature. Conclusion With powerful prompt engineering capabilities, Azure prompt flow enables the developers to construct contextually relevant prompts. It enhances the efficiency and accuracy of AI applications over various domains. The potential of prompt engineering makes the future of AI development promising. However, it can only be possible with the help of Azure AI leading the way. Author BioShankar Narayanan (aka Shanky) has worked on numerous different cloud and emerging technologies like Azure, AWS, Google Cloud, IoT, Industry 4.0, and DevOps to name a few. He has led the architecture design and implementation for many Enterprise customers and helped enable them to break the barrier and take the first step towards a long and successful cloud journey. He was one of the early adopters of Microsoft Azure and Snowflake Data Cloud. Shanky likes to contribute back to the community. He contributes to open source is a frequently sought-after speaker and has delivered numerous talks on Microsoft Technologies and Snowflake. He is recognized as a Data Superhero by Snowflake and SAP Community Topic leader by SAP.

0
0
51493

Packt

05 Jul 2017

16 min read

Lambda Functions

Packt

05 Jul 2017

16 min read

In this article, by Udita Gupta and Yohan Wadia, the authors of the book Mastering AWS Lambda, we are going to take things a step further by learning the anatomy of a typical Lambda Function and also how to actually write your own functions. We will cover the programming model for a Lambda function using simple functions as examples, the use of logs and exceptions and error handling. (For more resources related to this topic, see here.) The Lambda programming model Certain applications can be broken down into one or more simple nuggets of code called as functions and uploaded to AWS Lambda for execution. Lambda then takes care of provisioning the necessary resources to run your function along with other management activities such as auto-scaling of your functions, their availability, and so on. So what exactly are we supposed to do in all this? A developer basically has three tasks to perform when it comes to working with Lambda: Writing the code Packaging it for deployment Finally monitoring its execution and fine tuning In this section, we are going to explore the different components that actually make up a Lambda Function by understanding what AWS calls as a programming model or a programming pattern. As of date, AWS officially supports Node.js, Java, Python, and C# as the programming languages for writing Lambda functions, with each language following a generic programming pattern that comprises of certain concepts which we will see in the following sections. Handler The handler function is basically a function that Lambda calls first for execution. A handler function is capable of processing incoming event data that is passed to it as well as invoking other functions or methods from your code. We will be concentrating a lot of our code and development on Node.js; however, the programming model remains more or less the same for the other supported languages as well. A skeleton structure of a handler function is shown as follows: exports.myHandler = function(event, context, callback) { // Your code goes here. callback(); } Where, myHandler is the name of your handler function. By exporting it we make sure that Lambda knows which function it has to invoke first. The other parameters that are passed with the handler function are: event: Lambda uses this parameter to pass any event related data back to the handler. context: Lambda again uses this parameter to provide the handler with the function's runtime information such as the name of the function, the time it took to execute, and so on . callback: This parameter is used to return any data back to its caller. The callback parameter is the only optional parameter that gets passed when writing handlers. If not specified, AWS Lambda will call it implicitly and return the value as null. The callback parameter also supports two optional parameters in the form of error and result where error will return any of the function's error information back to the caller while result will return any result of your function's successful execution. Here are a few simple examples of invoking callbacks in your handler: callback() callback(null, 'Hello from Lambda') callback(error) The callback parameter is supported only in Node.js runtime v4.3. You will have to use the context methods in case your code supports earlier Node.js runtime (v0.10.42) Let us try out a simple handler example with a code: exports.myHandler = function(event, context, callback) { console.log("value = " + event.key); console.log("functionName = ", context.functionName); callback(null, "Yippee! Something worked!"); }; The following code snippet will print the value of an event (key) that we will pass to the function, print the function's name as part of the context object and finally print the success message Yippee! Something worked! if all goes well! Login to the AWS Management Console and select AWS Lambda from the dashboard. Select the Create a Lambda function option. From the Select blueprint page, select the Blank Function blueprint. Since we are not configuring any triggers for now, simple click on Next at the Configure triggers page. Provide a suitable Name and Description for your Lambda function and paste the preceding code snippet in the inline code editor as shown: Next, in the Lambda function handler and role section on the same page, type in the correct name of your Handler as shown. The handler name should match with the handler name in your function to work. Remember also to select the basic-lambda-role for your function's execution before selecting the Next button: In the Review page, select the Create function option. With your function now created, select the Test option to pass the sample event to our function. In the Sample event, pass the following event and select the Save and test option: { "key": "My Printed Value!!" } With your code execution completed, you should get a similar execution result as shown in the following figure. The important things to note here are the values for the event, context and callback parameters. You can note the callback message being returned back to the caller as the function executed successfully. The other event and context object values are printed in the Log output section as highlighted in the following figure: In case you end up with any errors, make sure the handler function name matches the handler name that you passed during the function's configuration. Context object The context object is a really useful utility when it comes to obtaining runtime information about your function. The context object can provide information such as the executing function's name, the time remaining before Lambda terminates your function's execution, the log name and stream associated with your function and much more. The context object also comes with its own methods that you can call to correctly terminate your function's executions such as context.succed(), context.fail(), context.done(), and so on. However, post April 2016, Lambda has transitioned the Node.js runtime from v0.10.42 to v4.3 which does support these methods however encourages to use the callback() for performing the same actions. Here are some of the commonly used context object methods and properties described as follows: getRemainingTimeInMillis(): This property returns the number of milliseconds left for execution before Lambda terminates your function. This comes in really handy when you want to perform some corrective actions before your function exits or gets timed out. callbackWaitsForEmptyEventLoop: This property is used to override the default behaviour of a callback() function, such as to wait till the entire event loop is processed and only then return back to the caller. If set to false, this property causes the callback() function to stop any further processing in the event loop even if there are any other tasks to be performed. The default value is set to true. functionName: This property returns the name of the executing Lambda function. functionVersion: The current version of the executing Lambda function. memoryLimitInMB: The amount of resource in terms of memory set for your Lambda function. logGroupName: This property returns the name of the CloudWatch Log Group that stores function's execution logs. logStreamName: This property returns the name of the CloudWatch Log Stream that stores function's execution logs. awsRequestID: This property returns the request ID associated with that particular function's execution. If you are using Lambda functions as mobile backend processing services, you can then extract additional information about your mobile application using the context of identity and clientContext objects. These are invoked using the AWS Mobile SDK. To learn more, click here http://docs.aws.amazon.com/lambda/latest/dg/nodejs-prog-model-context.html. Let us look at a simple example to understand the context object a bit better. In this example, we are using the context object callbackWaitsForEmptyEventLoop and demonstrating its working by setting the object's value to either yes or no on invocation: Login to the AWS Management Console and select AWS Lambda from the dashboard. Select the Create a Lambda function option. From the Select blueprint page, select the Blank Function blueprint. Since we are not configuring any triggers for now, simple click on Next at the Configure triggers page. Provide a suitable Name and Description for your Lambda function and paste the following code in the inline code editor: exports.myHandler = (event, context, callback) => { console.log('remaining time =', context.getRemainingTimeInMillis()); console.log('functionName =', context.functionName); console.log('AWSrequestID =', context.awsRequestId); console.log('logGroupName =', context.logGroupName); console.log('logStreamName =', context.logStreamName); switch (event.contextCallbackOption) { case "no": setTimeout(function(){ console.log("I am back from my timeout of 30 seconds!!"); },30000); // 30 seconds break break; case "yes": console.log("The callback won't wait for the setTimeout() n if the callbackWaitsForEmptyEventLoop is set to false"); setTimeout(function(){ console.log("I am back from my timeout of 30 seconds!!"); },30000); // 30 seconds break context.callbackWaitsForEmptyEventLoop = false; break; default: console.log("The Default code block"); } callback(null, 'Hello from Lambda'); }; Next, in the Lambda function handler and role section on the same page, type in the correct name of your Handler as shown. The handler name should match with the handler name in your function to work. Remember also to select the basic-lambda-role for your function's execution. The final change that we will do is change the Timeout value of our function from the default 3 seconds to 1 minute specifically for this example. Click Next to continue: In the Review page, select the Create function option. With your function now created, select the Test option to pass the sample event to our function. In the Sample event, pass the following event and select the Save and test option. You should see a similar output in the Log output window as shown: With the contextCallbackOption set to yes, the function does not wait for the 30 seconds setTimeout() function and will exit, however it prints the function's runtime information such as the remaining execution time, the function name, and so on. Now set the contextCallbackOption to no and re-run the test and verify the output. This time, you can see the setTimeout() function getting called and verify the same by comparing the remaining time left for execution with the earlier test run. Logging You can always log your code's execution and activities using simple log statements. The following statements are supported for logging with Node.js runtime: console.log() console.error() console.warn() console.info() The logs can be viewed using both the Management Console as well as the CLI. Let us quickly explore both the options. Using the Management Console We have already been using Lambda's dashboard to view the function's execution logs, however the logs are only for the current execution. To view your function's logs from the past, you need to view them using the CloudWatch Logs section: To do so, search and select CloudWatch option from the AWS Management Console. Next, select the Logs option to display the function's logs as shown in the following figure: You can use the Filter option to filter out your Lambda logs by typing in the log group name prefix as /aws/lambda. Select any of the present Log Groups and its corresponding Log Stream Name to view the complete and detailed execution logs of your function. If you do not see any Lambda logs listed out here it is mostly due to your Lambda execution role. Make sure your role has the necessary access rights to create the log group and log stream along with the capability to put log events. Using the CLI The CLI provides two ways using which you can view your function's execution logs: The first is using the Lambda function's invoke command itself. The invoke command when used with the --log-type parameter will print the latest 4 KB of log data that is written to CloudWatch Logs. To do so, first list out all available functions in your current region using the following command: # aws lambda list-functions Next, pick a Lambda function that you wish to invoke and substitute that function's name and payload with the following example snippet: # aws lambda invoke --invocation-type RequestResponse --function-name myFirstFunction --log-type Tail --payload '{"key1":"Lambda","key2":"is","key3":"awesome!"}' output.txt The second way is by using a combination of the context() object and the CloudWatch CLI. You can obtain your function's log group name and the log stream name using the context.logGroupName and the context.logStreamName. Next, substitute the data gathered from the output of these parameters in the following command: # aws logs get-log-events --log-group-name "/aws/lambda/myFirstFunction" --log-stream-name "2017/02/07/[$LATEST]1ae6ac9c77384794a3202802c683179a" If you run into the error The specified log stream does not exist in spite of providing correct values for the log group name and stream name; then make sure to add the "" escape character in the [$LATEST] as shown. Let us look at a few options that you can additionally pass with the get-log-events command: --start-time: The start of the log's time range. All times are in UTC. --end-time: The end of the log's time range. All times are in UTC. --next-token: The token for the next set of items to return. (You received this token from a previous call.) --limit: Used to set the maximum number of log events returned. By default the limit is set to either 10,000 log events. Alternatively, if you don't wish to use the context() objects in your code, you can still filter out the log group name and log stream name by using a combination of the following commands: # aws logs describe-log-groups --log-group-name-prefix "/aws/lambda/" The describe-log-groups command will list all the log groups that are prefixed with /aws/lambda. Make a note of your function's log group name from this output. Next, execute the following command to list your log group name's associated log stream names: # aws logs describe-log-streams --log-group-name "/aws/lambda/myFirstFunction" Make a note of the log stream name and substitute the same in the next and final command to view your log events for that particular log stream name: # aws logs get-log-events --log-group-name "/aws/lambda/myFirstFunction" --log-stream-name "2017/02/07/[$LATEST]1ae6ac9c77384794a3202802c683179a" Once again, make sure to add the backslash "" in the [$LATEST] to avoid the The specified log stream does not exist error. With the logging done, let's move on to the next piece of the programming model called exceptions. Exceptions and error handling Functions have the ability to notify AWS Lambda in case it failed to execute correctly. This is primarily done by the function passing the error object to Lambda which converts the same to a string and returns it to the user as an error message. The error messages that are returned also depend on the invocation type of the function; for example, if your function performs a synchronous execution (RequestResponse invocation type), then the error is returned back to the user and displayed on the Management Console as well as in the CloudWatch Logs. For any asynchronous executions (event invocation type), Lambda will not return anything. Instead it logs the error messages to CloudWatch Logs. Let us examine a function's error and exception handling capabilities with a simple example of a calculator function that accepts two numbers and an operand as the test events during invocation: Login to the AWS Management Console and select AWS Lambda from the dashboard. Select the Create a Lambda function option. From the Select blueprint page, select the Blank Function blueprint. Since we are not configuring any triggers for now, simple click on Next at the Configure triggers page. Provide a suitable Name and Description for your Lambda function and paste the following code in the inline code editor: exports.myHandler = (event, context, callback) => { console.log("Hello, Starting the "+ context.functionName +" Lambda Function"); console.log("The event we pass will have two numbers and an operand value"); // operand can be +, -, /, *, add, sub, mul, div console.log('Received event:', JSON.stringify(event, null, 2)); var error, result; if (isNaN(event.num1) || isNaN(event.num2)) { console.error("Invalid Numbers"); // different logging error = new Error("Invalid Numbers!"); // Exception Handling callback(error); } switch(event.operand) { case "+": case "add": result = event.num1 + event.num2; break; case "-": case "sub": result = event.num1 - event.num2; break; case "*": case "mul": result = event.num1 * event.num2; break; case "/": case "div": if(event.num2 === 0){ console.error("The divisor cannot be 0"); error = new Error("The divisor cannot be 0"); callback(error, null); } else{ result = event.num1/event.num2; } break; default: callback("Invalid Operand"); break; } console.log("The Result is: " + result); callback(null, result); }; Next, in the Lambda function handler and role section on the same page, type in the correct name of your Handler. The handler name should match with the handler name in your function to work. Remember also to select the basic-lambda-role for your function's execution. Leave the rest of the values to their defaults and click Next to continue. In the Review page, select the Create function option. With your function now created, select the Test option to pass the sample event to our function. In the Sample event, pass the following event and select the Save and test option. You should see a similar output in the Log output window as shown: { "num1": 3, "num2": 0, "operand": "div" } So what just happened there? Well first, we can print simple user friendly error messages with the help of the console.error() statement. Additionally, we can also print the stackTrace array of the error by passing the error in the callback() as shown: error = new Error("The divisor cannot be 0"); callback(error, null); You can also view the custom error message and the stackTrace JSON array both from the Lambda dashboard as well as from the CloudWatch Logs section. Next, give this code a couple of tries with some different permutations and combinations of events and check out the results. You can even write your own custom error messages and error handlers that can perform some additional task when an error is returned by the function. With this we come towards the end of a function's generic programming model and its components. Summary We deep dived into the Lambda programming model and understood each of its sub components (handlers, context objects, errors and exceptions) with easy to follow examples.

0
0
51421

article-image-how-to-manage-complex-applications-using-kubernetes-based-helm-tool-tutorial

Savia Lobo

16 Jul 2019

16 min read

How to manage complex applications using Kubernetes-based Helm tool [Tutorial]

Savia Lobo

16 Jul 2019

16 min read

0
0
51279

How-To Tutorials

Packt

04 Sep 2013

14 min read

Using Gerrit with GitHub

Packt

04 Sep 2013

14 min read

In this article by Luca Milanesio, author of the book Learning Gerrit Code review, we will learn about Gerrit Code revew. GitHub is the world's largest platform for the free hosting of Git Projects, with over 4.5 million registered developers. We will now provide a step-by-step example of how to connect Gerrit to an external GitHub server so as to share the same set of repositories. Additionally, we will provide guidance on how to use the Gerrit Code Review workflow and GitHub concurrently. By the end of this article we will have our Gerrit installation fully integrated and ready to be used for both open source public projects and private projects on GitHub. (For more resources related to this topic, see here.) GitHub workflow GitHub has become the most popular website for open source projects, thanks to the migration of some major projects to Git (for example, Eclipse) and new projects adopting it, along with the introduction of the social aspect of software projects that piggybacks on the Facebook hype. The following diagram shows the GitHub collaboration model: The key aspects of the GitHub workflow are as follows: Each developer pushes to their own repository and pulls from others Developers who want to make a change to another repository, create a fork on GitHub and work on their own clone When forked repositories are ready to be merged, pull requests are sent to the original repository maintainer The pull requests include all of the proposed changes and their associated discussion threads Whenever a pull request is accepted, the change is merged by the maintainer and pushed to their repository on GitHub GitHub controversy The preceding workflow works very effectively for most open source projects; however, when the projects gets bigger and more complex, the tools provided by GitHub are too unstructured, and a more defined review process with proper tools, additional security, and governance is needed. In May 2012 Linus Torvalds , the inventor of Git version control, openly criticized GitHub as a commit editing tool directly on the pull request discussion thread: " I consider GitHub useless for these kinds of things. It's fine for hosting, but the pull requests and the online commit editing, are just pure garbage " and additionally, " the way you can clone a (code repository), make changes on the web, and write total crap commit messages, without GitHub in any way making sure that the end result looks good. " See https://github.com/torvalds/linux/pull/17#issuecomment-5654674. Gerrit provides the additional value that Linus Torvalds claimed was missing in the GitHub workflow: Gerrit and GitHub together allows the open source development community to reuse the extended hosting reach and social integration of GitHub with the power of governance of the Gerrit review engine. GitHub authentication The list of authentication backends supported by Gerrit does not include GitHub and it cannot be used out of the box, as it does not support OpenID authentication. However, a GitHub plugin for Gerrit has been recently released in order to fill the gaps and allow a seamless integration. GitHub implements OAuth 2.0 for allowing external applications, such as Gerrit, to integrate using a three-step browser-based authentication. Using this scheme, a user can leverage their existing GitHub account without the need to provision and manage a separate one in Gerrit. Additionally, the Gerrit instance will be able to self-provision the SSH public keys needed for pushing changes for review. In order for us to use GitHub OAuth authentication with Gerrit, we need to do the following: Build the Gerrit GitHub plugin Install the GitHub OAuth filter into the Gerrit libraries (/lib under the Gerrit site directory) Reconfigure Gerrit to use the HTTP authentication type Building the GitHub plugin The Gerrit GitHub plugin can be found under the Gerrit plugins/github repository on https://gerrit-review.googlesource.com/#/admin/projects/plugins/github. It is open source under the Apache 2.0 license and can be cloned and built using the Java 6 JDK and Maven. Refer to the following example: $ git clone https://gerrit.googlesource.com/plugins/github $ cd github $ mvn install […] [INFO] BUILD SUCCESS [INFO] ------------------------------------------------------- [INFO] Total time: 9.591s [INFO] Finished at: Wed Jun 19 18:38:44 BST 2013 [INFO] Final Memory: 12M/145M [INFO] ------------------------------------------------------- The Maven build should generate the following artifacts: github-oauth/target/github-oauth*.jar, the GitHub OAuth library for authenticating Gerrit users github-plugin/target/github-plugin*.jar, the Gerrit plugin for integrating with GitHub repositories and pull requests Installing GitHub OAuth library The GitHub OAuth JAR file needs to copied to the Gerrit /lib directory; this is required to allow Gerrit to use it for filtering all HTTP requests and enforcing the GitHub three-step authentication process: $ cp github-oauth/target/github-oauth-*.jar /opt/gerrit/lib/ Installing GitHub plugin The GitHub plugin includes the additional support for the overall configuration, the advanced GitHub repositories replication, and the integration of pull requests into the Code Review process. We now need to install the plugin before running the Gerrit init again so that we can benefit from the simplified automatic configuration steps: $ cp github-plugin/target/github-plugin-*.jar /opt/gerrit/plugins/github.jar Register Gerrit as a GitHub OAuth application Before going through the Gerrit init, we need to tell GitHub to trust Gerrit as a partner application. This is done through the generation of a ClientId/ClientSecret pair associated to the exact Gerrit URLs that will be used for initiating the 3-step OAuth authentication. We can register a new application in GitHub through the URL https://github.com/settings/applications/new, where the following three fields are requested: Application name : It is the logical name of the application authorized to access GitHub, for example, Gerrit. Main URL : The Gerrit canonical web URL used for redirecting to GitHub OAuth authentication, for example, https://myhost.mydomain:8443. Callback URL : The URL that GitHub should redirect to when the OAuth authentication is successfully completed, for example, https://myhost.mydomain:8443/oauth. GitHub will automatically generate a unique pair ClientId/ClientSecret that has to be provided to Gerrit identifying them as a trusted authentication partner. ClientId/ClientSecret are not GitHub credentials and cannot be used by an interactive user to access any GitHub data or information. They are only used for authorizing the integration between a Gerrit instance and GitHub. Running Gerrit init to configure GitHub OAuth We now need to stop Gerrit and go through the init steps again in order to reconfigure the Gerrit authentication. We need to enable HTTP authentication by choosing an HTTP header to be used to verify the user's credentials, and to go through the GitHub settings wizard to configure the OAuth authentication. $ /opt/gerrit/bin/gerrit.sh stop Stopping Gerrit Code Review: OK $ cd /opt/gerrit $ java -jar gerrit.war init [...] *** User Authentication *** Authentication method []: HTTP RETURN Get username from custom HTTP header [Y/n]? Y RETURN Username HTTP header []: GITHUB_USER RETURN SSO logout URL : /oauth/reset RETURN *** GitHub Integration *** GitHub URL [https://github.com]: RETURN Use GitHub for Gerrit login ? [Y/n]? Y RETURN ClientId []: 384cbe2e8d98192f9799 RETURN ClientSecret []: f82c3f9b3802666f2adcc4 RETURN Initialized /opt/gerrit $ /opt/gerrit/bin/gerrit.sh start Starting Gerrit Code Review: OK Using GitHub login for Gerrit Gerrit is now fully configured to register and authenticate users through GitHub OAuth. When opening the browser to access any Gerrit web pages, we are automatically redirected to the GitHub for login. If we have already visited and authenticated with GitHub previously, the browser cookie will be automatically recognized and used for the authentication, instead of presenting the GitHub login page. Alternatively, if we do not yet have a GitHub account, we create a new GitHub profile by clicking on the SignUp button. Once the authentication process is successfully completed, GitHub requests the user's authorization to grant access to their public profile information. The following screenshot shows GitHub OAuth authorization for Gerrit: The authorization status is then stored under the user's GitHub applications preferences on https://github.com/settings/applications. Finally, GitHub redirects back to Gerrit propagating the user's profile securely using a one-time code which is used to retrieve the full data profile including username, full name, e-mail, and associated SSH public keys. Replication to GitHub The next steps in the Gerrit to GitHub integration is to share the same Git repositories and then keep them up-to-date; this can easily be achieved by using the Gerrit replication plugin. The standard Gerrit replication is a master-slave, where Gerrit always plays the role of the master node and pushes to remote slaves. We will refer to this scheme as push replication because the actual control of the action is given to Gerrit through a git push operation of new commits and branches. Configure Gerrit replication plugin In order to configure push replication we need to enable the Gerrit replication plugin through Gerrit init: $ /opt/gerrit/bin/gerrit.sh stop Stopping Gerrit Code Review: OK $ cd /opt/gerrit $ java -jar gerrit.war init [...] *** Plugins *** Prompt to install core plugins [y/N]? y RETURN Install plugin reviewnotes version 2.7-rc4 [y/N]? RETURN Install plugin commit-message-length-validator version 2.7-rc4 [y/N]? RETURN Install plugin replication version 2.6-rc3 [y/N]? y RETURN Initialized /opt/gerrit $ /opt/gerrit/bin/gerrit.sh start Starting Gerrit Code Review: OK The Gerrit replication plugin relies on the replication.config file under the /opt/gerrit/etc directory to identify the list of target Git repositories to push to. The configuration syntax is a standard .ini format where each group section represents a target replica slave. See the following simplest replication.config script for replicating to GitHub: [remote "github"] url = git@github.com:myorganisation/${name}.git The preceding configuration enables all of the repositories in Gerrit to be replicated to GitHub under the myorganisa tion GitHub Team account. Authorizing Gerrit to push to GitHub Now, that Gerrit knows where to push, we need GitHub to authorize the write operations to its repositories. To do so, we need to upload the SSH public key of the underlying OS user where Gerrit is running to one of the accounts in the GitHub myorganisation team, with the permissions to push to any of the GitHub repositories. Assuming that Gerrit runs under the OS user gerrit, we can copy and paste the SSH public key values from the ~gerrit/.ssh/id_rsa.pub (or ~gerrit/.ssh/id_dsa.pub) to the Add an SSH Key section of the GitHub account under target URL to be set to: https://github.com/settings/ssh Start working with Gerrit replication Everything is now ready to start playing with Gerrit to GitHub replication. Whenever a change to a repository is made on Gerrit, it will be automatically replicated to the corresponding GitHub repository. In reality there is one additional operation that is needed on the GitHub side: the actual creation of the empty repositories using https://github.com/new associated to the ones created in Gerrit. We need to make sure that we select the organization name and repository name, consistent with the ones defined in Gerrit and in the replication.config file. Never initialize the repository from GitHub with an empty commit or readme file; otherwise the first replication attempt from Gerrit will result in a conflict and will then fail. Now GitHub and Gerrit are fully connected and whenever a repository in GitHub matches one of the repositories in Gerrit, it will be linked and synchronized with the latest set of commits pushed in Gerrit. Thanks to the Gerrit-GitHub authentication previously configured, Gerrit and GitHub share the same set of users and the commits authors will be automatically recognized and formatted by GitHub. The following screenshot shows Gerrit commits replicated to GitHub: Reviewing and merging to GitHub branches The final goal of the Code Review process is to agree and merge changes to their branches. The merging strategies need to be aligned with real-life scenarios that may arise when using Gerrit and GitHub concurrently. During the Code Review process the alignment between Gerrit and GitHub was at the change level, not influenced by the evolution of their target branches. Gerrit changes and GitHub pull requests are isolated branches managed by their review lifecycle. When a change is merged, it needs to align with the latest status of its target branch using a fast-forward, merge, rebase, or cherry-pick strategy. Using the standard Gerrit merge functionality, we can apply the configured project merge strategy to the current status of the target branch on Gerrit. The situation on GitHub may have changed as well, so even if the Gerrit merge has succeeded there is no guarantee that the actual subsequent synchronization to GitHub will do the same! The GitHub plugin mitigates this risk by implementing a two-phase submit + merge operation for merging opened changes as follows: Phase-1 : The change target branch is checked against its remote peer on GitHub and fast forwarded if needed. If two branches diverge, the submit + merge is aborted and manual merge intervention is requested. Phase-2 : The change is merged on its target branch in Gerrit and an additional ad hoc replication is triggered. If the merge succeeds then the GitHub pull request is marked as completed. At the end of Phase-2 the Gerrit and GitHub statuses will be completely aligned. The pull request author will then receive the notification that his/her commit has been merged. Using Gerrit and GitHub on http://gerrithub.io When using Gerrit and GitHub on the web with public or private repositories, all of the commits are replicated from Gerrit to GitHub, and each one of them has a complete copy of the data. If we are using a Git and collaboration server on GitHub over the Internet, why can't we do the same for its Gerrit counterpart? Can we avoid installing a standalone instance of Gerrit just for the purpose of going through a formal Code Review? One hassle-free solution is to use the GerritHub service (http://gerrithub.io), which offers a free Gerrit instance on the cloud already configured and connected with GitHub through the github-plugin and github-oauth authentication library. All of the flows that we have covered in this article are completely automated, including the replication and automatic pull request to change automation. As accounts are shared with GitHub, we do not need to register or create another account to use GerritHub; we can just visit http://gerrithub.io and start using Gerrit Code Review with our existing GitHub projects without having to teach our existing community about a new tool. GerritHub also includes an initial setup Wizard for the configuration and automation of the Gerrit projects and the option to configure the Gerrit groups using the existing GitHub. Once Gerrit is configured, the Code Review and GitHub can be used seamlessly for achieving maximum control and social reach within your developer community. Summary We have now integrated our Gerrit installation with GitHub authentication for a seamless Single-Sign-On experience. Using an existing GitHub account we started using Gerrit replication to automatically mirror all the commits to GitHub repositories, allowing our projects to have an extended reach to external users, free to fork our repositories, and to contribute changes as pull requests. Finally, we have completed our Code Review in Gerrit and managed the merge to GitHub with a two-phase change submit + merge process to ensure that the target branches on both Gerrit and GitHub have been merged and aligned accordingly. Similarly to GitHub, this Gerrit setup can be leveraged for free on the web without having to manage a separate private instance, thanks to the free set target URL to http://gerrithub.io service available on the cloud. Resources for Article : Further resources on this subject: Getting Dynamics NAV 2013 on Your Computer – For (Almost) Free [Article] Building Your First Zend Framework Application [Article] Quick start - your first Sinatra application [Article]

0
1
51232

article-image-building-your-own-basic-behavior-tree-tutorial

Natasha Mathur

11 Oct 2018

12 min read

Building your own Basic Behavior tree in Unity [Tutorial]

Natasha Mathur

11 Oct 2018

12 min read

0
5
51130

How-To Tutorials

article-image-exploring-the-strategy-behavioral-design-pattern-in-node-js

Expert Network

02 Jun 2021

10 min read

Exploring the Strategy Behavioral Design Pattern in Node.js

Expert Network

02 Jun 2021

10 min read

A design pattern is a reusable solution to a recurring problem. The term is really broad in its definition and can span multiple domains of an application. However, the term is often associated with a well-known set of object-oriented patterns that were popularized in the 90s by the book, Design Patterns: Elements of Reusable Object- Oriented Software, Pearson Education, by the almost legendary Gang of Four (GoF): Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. This article is an excerpt from the book Node.js Design Patterns, Third Edition by Mario Casciaro and Luciano Mammino – a comprehensive guide for learning proven patterns, techniques, and tricks to take full advantage of the Node.js platform. In this article, we’ll look at the behavior of components in software design. We’ll learn how to combine objects and how to define the way they communicate so that the behavior of the resulting structure becomes extensible, modular, reusable, and adaptable. After introducing all the behavioral design patterns, we will dive deep into the details of the strategy pattern. Now, it's time to roll up your sleeves and get your hands dirty with some behavioral design patterns. Types of Behavioral Design Patterns The Strategy pattern allows us to extract the common parts of a family of closely related components into a component called the context and allows us to define strategy objects that the context can use to implement specific behaviors. The State pattern is a variation of the Strategy pattern where the strategies are used to model the behavior of a component when under different states. The Template pattern, instead, can be considered the "static" version of the Strategy pattern, where the different specific behaviors are implemented as subclasses of the template class, which models the common parts of the algorithm. The Iterator pattern provides us with a common interface to iterate over a collection. It has now become a core pattern in Node.js. JavaScript offers native support for the pattern (with the iterator and iterable protocols). Iterators can be used as an alternative to complex async iteration patterns and even to Node.js streams. The Middleware pattern allows us to define a modular chain of processing steps. This is a very distinctive pattern born from within the Node.js ecosystem. It can be used to preprocess and postprocess data and requests. The Command pattern materializes the information required to execute a routine, allowing such information to be easily transferred, stored, and processed. The Strategy Pattern The Strategy pattern enables an object, called the context, to support variations in its logic by extracting the variable parts into separate, interchangeable objects called strategies. The context implements the common logic of a family of algorithms, while a strategy implements the mutable parts, allowing the context to adapt its behavior depending on different factors, such as an input value, a system configuration, or user preferences. Strategies are usually part of a family of solutions and all of them implement the same interface expected by the context. The following figure shows the situation we just described: Figure 1: General structure of the Strategy pattern Figure 1 shows you how the context object can plug different strategies into its structure as if they were replaceable parts of a piece of machinery. Imagine a car; its tires can be considered its strategy for adapting to different road conditions. We can fit winter tires to go on snowy roads thanks to their studs, while we can decide to fit high-performance tires for traveling mainly on motorways for a long trip. On the one hand, we don't want to change the entire car for this to be possible, and on the other, we don't want a car with eight wheels so that it can go on every possible road. The Strategy pattern is particularly useful in all those situations where supporting variations in the behavior of a component requires complex conditional logic (lots of if...else or switch statements) or mixing different components of the same family. Imagine an object called Order that represents an online order on an e-commerce website. The object has a method called pay() that, as it says, finalizes the order and transfers the funds from the user to the online store. To support different payment systems, we have a couple of options: Use an ..elsestatement in the pay() method to complete the operation based on the chosen payment option Delegate the logic of the payment to a strategy object that implements the logic for the specific payment gateway selected by the user In the first solution, our Order object cannot support other payment methods unless its code is modified. Also, this can become quite complex when the number of payment options grows. Instead, using the Strategy pattern enables the Order object to support a virtually unlimited number of payment methods and keeps its scope limited to only managing the details of the user, the purchased items, and the relative price while delegating the job of completing the payment to another object. Let's now demonstrate this pattern with a simple, realistic example. Multi-format configuration objects Let's consider an object called Config that holds a set of configuration parameters used by an application, such as the database URL, the listening port of the server, and so on. The Config object should be able to provide a simple interface to access these parameters, but also a way to import and export the configuration using persistent storage, such as a file. We want to be able to support different formats to store the configuration, for example, JSON, INI, or YAML. By applying what we learned about the Strategy pattern, we can immediately identify the variable part of the Config object, which is the functionality that allows us to serialize and deserialize the configuration. This is going to be our strategy. Creating a new module Let's create a new module called config.js, and let's define the generic part of our configuration manager: import { promises as fs } from 'fs' import objectPath from 'object-path' export class Config { constructor (formatStrategy) { // (1) this.data = {} this.formatStrategy = formatStrategy } get (configPath) { // (2) return objectPath.get(this.data, configPath) } set (configPath, value) { // (2) return objectPath.set(this.data, configPath, value) } async load (filePath) { // (3) console.log(`Deserializing from ${filePath}`) this.data = this.formatStrategy.deserialize( await fs.readFile(filePath, 'utf-8') ) } async save (filePath) { // (3) console.log(`Serializing to ${filePath}`) await fs.writeFile(filePath, this.formatStrategy.serialize(this.data)) } } This is what's happening in the preceding code: In the constructor, we create an instance variable called data to hold the configuration data. Then we also store formatStrategy, which represents the component that we will use to parse and serialize the data. We provide two methods, set()and get(), to access the configuration properties using a dotted path notation (for example, property.subProperty) by leveraging a library called object-path (nodejsdp.link/object-path). The load() and save() methods are where we delegate, respectively, the deserialization and serialization of the data to our strategy. This is where the logic of the Config class is altered based on the formatStrategy passed as an input in the constructor. As we can see, this very simple and neat design allows the Config object to seamlessly support different file formats when loading and saving its data. The best part is that the logic to support those various formats is not hardcoded anywhere, so the Config class can adapt without any modification to virtually any file format, given the right strategy. Creating format Strategies To demonstrate this characteristic, let's now create a couple of format strategies in a file called strategies.js. Let's start with a strategy for parsing and serializing data using the INI file format, which is a widely used configuration format (more info about it here: nodejsdp.link/ini-format). For the task, we will use an npm package called ini (nodejsdp.link/ini): import ini from 'ini' export const iniStrategy = { deserialize: data => ini.parse(data), serialize: data => ini.stringify(data) } Nothing really complicated! Our strategy simply implements the agreed interface, so that it can be used by the Config object. Similarly, the next strategy that we are going to create allows us to support the JSON file format, widely used in JavaScript and in the web development ecosystem in general: export const jsonStrategy = { deserialize: data => JSON.parse(data), serialize: data => JSON.stringify(data, null, ' ') } Now, to show you how everything comes together, let's create a file named index.js, and let's try to load and save a sample configuration using different formats: import { Config } from './config.js' import { jsonStrategy, iniStrategy } from './strategies.js' async function main () { const iniConfig = new Config(iniStrategy) await iniConfig.load('samples/conf.ini') iniConfig.set('book.nodejs', 'design patterns') await iniConfig.save('samples/conf_mod.ini') const jsonConfig = new Config(jsonStrategy) await jsonConfig.load('samples/conf.json') jsonConfig.set('book.nodejs', 'design patterns') await jsonConfig.save('samples/conf_mod.json') } main() Our test module reveals the core properties of the Strategy pattern. We defined only one Config class, which implements the common parts of our configuration manager, then, by using different strategies for serializing and deserializing data, we created different Config class instances supporting different file formats. The example we've just seen shows us only one of the possible alternatives that we had for selecting a strategy. Other valid approaches might have been the following: Creating two different strategy families: One for the deserialization and the other for the serialization. This would have allowed reading from a format and saving to another. Dynamically selecting the strategy: Depending on the extension of the file provided; the Config object could have maintained a map extension → strategy and used it to select the right algorithm for the given extension. As we can see, we have several options for selecting the strategy to use, and the right one only depends on your requirements and the tradeoff in terms of features and the simplicity you want to obtain. Furthermore, the implementation of the pattern itself can vary a lot as well. For example, in its simplest form, the context and the strategy can both be simple functions: function context(strategy) {...} Even though this may seem insignificant, it should not be underestimated in a programming language such as JavaScript, where functions are first-class citizens and used as much as fully-fledged objects. Between all these variations, though, what does not change is the idea behind the pattern; as always, the implementation can slightly change but the core concepts that drive the pattern are always the same. Summary In this article, we dive deep into the details of the strategy pattern, one of the Behavioral Design Patterns in Node.js. Learn more in the book, Node.js Design Patterns, Third Edition by Mario Casciaro and Luciano Mammino. About the Authors Mario Casciaro is a software engineer and entrepreneur. Mario worked at IBM for a number of years, first in Rome, then in Dublin Software Lab. He currently splits his time between Var7 Technologies-his own software company-and his role as lead engineer at D4H Technologies where he creates software for emergency response teams. Luciano Mammino wrote his first line of code at the age of 12 on his father's old i386. Since then he has never stopped coding. He is currently working at FabFitFun as principal software engineer where he builds microservices to serve millions of users every day.

0
0
51059

Packt

04 Jun 2015

25 min read

Installing jQuery

Packt

04 Jun 2015

25 min read

In this article by Alex Libby, author of the book Mastering jQuery, we will examine some of the options available to help develop your skills even further. (For more resources related to this topic, see here.) Local or CDN, I wonder…? Which version…? Do I support old IE…? Installing jQuery is a thankless task that has to be done countless times by any developer—it is easy to imagine that person asking some of the questions. It is easy to imagine why most people go with the option of using a Content Delivery Network (CDN) link, but there is more to installing jQuery than taking the easy route! There are more options available, where we can be really specific about what we need to use—throughout this article, we will. We'll cover a number of topics, which include: Downloading and installing jQuery Customizing jQuery downloads Building from Git Using other sources to install jQuery Adding source map support Working with Modernizr as a fallback Intrigued? Let's get started. Downloading and installing jQuery As with all projects that require the use of jQuery, we must start somewhere—no doubt you've downloaded and installed jQuery a thousand times; let's just quickly recap to bring ourselves up to speed. If we browse to http://www.jquery.com/download, we can download jQuery using one of the two methods: downloading the compressed production version or the uncompressed development version. If we don't need to support old IE (IE6, 7, and 8), then we can choose the 2.x branch. If, however, you still have some diehards who can't (or don't want to) upgrade, then the 1.x branch must be used instead. To include jQuery, we just need to add this link to our page: <script src="http://code.jquery.com/jquery-X.X.X.js"></script> Here, X.X.X marks the version number of jQuery or the Migrate plugin that is being used in the page. Conventional wisdom states that the jQuery plugin (and this includes the Migrate plugin too) should be added to the <head> tag, although there are valid arguments to add it as the last statement before the closing <body> tag; placing it here may help speed up loading times to your site. This argument is not set in stone; there may be instances where placing it in the <head> tag is necessary and this choice should be left to the developer's requirements. My personal preference is to place it in the <head> tag as it provides a clean separation of the script (and the CSS) code from the main markup in the body of the page, particularly on lighter sites. I have even seen some developers argue that there is little perceived difference if jQuery is added at the top, rather than at the bottom; some systems, such as WordPress, include jQuery in the <head> section too, so either will work. The key here though is if you are perceiving slowness, then move your scripts to just before the <body> tag, which is considered a better practice. Using jQuery in a development capacity A useful point to note at this stage is that best practice recommends that CDN links should not be used within a development capacity; instead, the uncompressed files should be downloaded and referenced locally. Once the site is complete and is ready to be uploaded, then CDN links can be used. Adding the jQuery Migrate plugin If you've used any version of jQuery prior to 1.9, then it is worth adding the jQuery Migrate plugin to your pages. The jQuery Core team made some significant changes to jQuery from this version; the Migrate plugin will temporarily restore the functionality until such time that the old code can be updated or replaced. The plugin adds three properties and a method to the jQuery object, which we can use to control its behavior: Property or Method Comments jQuery.migrateWarnings This is an array of string warning messages that have been generated by the code on the page, in the order in which they were generated. Messages appear in the array only once even if the condition has occurred multiple times, unless jQuery.migrateReset() is called. jQuery.migrateMute Set this property to true in order to prevent console warnings from being generated in the debugging version. If this property is set, the jQuery.migrateWarnings array is still maintained, which allows programmatic inspection without console output. jQuery.migrateTrace Set this property to false if you want warnings but don't want traces to appear on the console. jQuery.migrateReset() This method clears the jQuery.migrateWarnings array and "forgets" the list of messages that have been seen already. Adding the plugin is equally simple—all you need to do is add a link similar to this, where X represents the version number of the plugin that is used: <script src="http://code.jquery.com/jquery-migrate- X.X.X.js"></script> If you want to learn more about the plugin and obtain the source code, then it is available for download from https://github.com/jquery/jquery-migrate. Using a CDN We can equally use a CDN link to provide our jQuery library—the principal link is provided by MaxCDN for the jQuery team, with the current version available at http://code.jquery.com. We can, of course, use CDN links from some alternative sources, if preferred—a reminder of these is as follows: Google (https://developers.google.com/speed/libraries/devguide#jquery) Microsoft (http://www.asp.net/ajaxlibrary/cdn.ashx#jQuery_Releases_on_the_CDN_0) CDNJS (http://cdnjs.com/libraries/jquery/) jsDelivr (http://www.jsdelivr.com/#%!jquery) Don't forget though that if you need, we can always save a copy of the file provided on CDN locally and reference this instead. The jQuery CDN will always have the latest version, although it may take a couple of days for updates to appear via the other links. Using other sources to install jQuery Right. Okay, let's move on and develop some code! "What's next?" I hear you ask. Aha! If you thought downloading and installing jQuery from the main site was the only way to do this, then you are wrong! After all, this is about mastering jQuery, so you didn't think I will only talk about something that I am sure you are already familiar with, right? Yes, there are more options available to us to install jQuery than simply using the CDN or main download page. Let's begin by taking a look at using Node. Each demo is based on Windows, as this is the author's preferred platform; alternatives are given, where possible, for other platforms. Using Node JS to install jQuery So far, we've seen how to download and reference jQuery, which is to use the download from the main jQuery site or via a CDN. The downside of this method is the manual work required to keep our versions of jQuery up to date! Instead, we can use a package manager to help manage our assets. Node.js is one such system. Let's take a look at the steps that need to be performed in order to get jQuery installed: We first need to install Node.js—head over to http://www.nodejs.org in order to download the package for your chosen platform; accept all the defaults when working through the wizard (for Mac and PC). Next, fire up a Node Command Prompt and then change to your project folder. In the prompt, enter this command: npm install jquery Node will fetch and install jQuery—it displays a confirmation message when the installation is complete: You can then reference jQuery by using this link: <name of drive>:websitenode_modulesjquerydistjquery.min.js. Node is now installed and ready for use—although we've installed it in a folder locally, in reality, we will most likely install it within a subfolder of our local web server. For example, if we're running WampServer, we can install it, then copy it into the /wamp/www/js folder, and reference it using http://localhost/js/jquery.min.js. If you want to take a look at the source of the jQuery Node Package Manager (NPM) package, then check out https://www.npmjs.org/package/jquery. Using Node to install jQuery makes our work simpler, but at a cost. Node.js (and its package manager, NPM) is primarily aimed at installing and managing JavaScript components and expects packages to follow the CommonJS standard. The downside of this is that there is no scope to manage any of the other assets that are often used within websites, such as fonts, images, CSS files, or even HTML pages. "Why will this be an issue?," I hear you ask. Simple, why make life hard for ourselves when we can manage all of these assets automatically and still use Node? Installing jQuery using Bower A relatively new addition to the library is the support for installation using Bower—based on Node, it's a package manager that takes care of the fetching and installing of packages from over the Internet. It is designed to be far more flexible about managing the handling of multiple types of assets (such as images, fonts, and CSS files) and does not interfere with how these components are used within a page (unlike Node). For the purpose of this demo, I will assume that you have already installed it; if not, you will need to revisit it before continuing with the following steps: Bring up the Node Command Prompt, change to the drive where you want to install jQuery, and enter this command: bower install jquery This will download and install the script, displaying the confirmation of the version installed when it has completed. The library is installed in the bower_components folder on your PC. It will look similar to this example, where I've navigated to the jquery subfolder underneath. By default, Bower will install jQuery in its bower_components folder. Within bower_components/jquery/dist/, we will find an uncompressed version, compressed release, and source map file. We can then reference jQuery in our script using this line: <script src="/bower_components/jquery/jquery.js"></script> We can take this further though. If we don't want to install the extra files that come with a Bower installation by default, we can simply enter this in a Command Prompt instead to just install the minified version 2.1 of jQuery: bower install http://code.jquery.com/jquery-2.1.0.min.js Now, we can be really clever at this point; as Bower uses Node's JSON files to control what should be installed, we can use this to be really selective and set Bower to install additional components at the same time. Let's take a look and see how this will work—in the following example, we'll use Bower to install jQuery 2.1 and 1.10 (the latter to provide support for IE6-8). In the Node Command Prompt, enter the following command: bower init This will prompt you for answers to a series of questions, at which point you can either fill out information or press Enter to accept the defaults. Look in the project folder; you should find a bower.json file within. Open it in your favorite text editor and then alter the code as shown here: {"ignore": [ "**/.*", "node_modules", "bower_components","test", "tests" ] ,"dependencies": {"jquery-legacy": "jquery#1.11.1","jquery-modern": "jquery#2.10"}} At this point, you have a bower.json file that is ready for use. Bower is built on top of Git, so in order to install jQuery using your file, you will normally need to publish it to the Bower repository. Instead, you can install an additional Bower package, which will allow you to install your custom package without the need to publish it to the Bower repository: In the Node Command Prompt window, enter the following at the prompt: npm install -g bower-installer When the installation is complete, change to your project folder and then enter this command line: bower-installer The bower-installer command will now download and install both the versions of jQuery. At this stage, you now have jQuery installed using Bower. You're free to upgrade or remove jQuery using the normal Bower process at some point in the future. If you want to learn more about how to use Bower, there are plenty of references online; https://www.openshift.com/blogs/day-1-bower-manage-your-client-side-dependencies is a good example of a tutorial that will help you get accustomed to using Bower. In addition, there is a useful article that discusses both Bower and Node, available at http://tech.pro/tutorial/1190/package-managers-an-introductory-guide-for-the-uninitiated-front-end-developer. Bower isn't the only way to install jQuery though—while we can use it to install multiple versions of jQuery, for example, we're still limited to installing the entire jQuery library. We can improve on this by referencing only the elements we need within the library. Thanks to some extensive work undertaken by the jQuery Core team, we can use the Asynchronous Module Definition (AMD) approach to reference only those modules that are needed within our website or online application. Using the AMD approach to load jQuery In most instances, when using jQuery, developers are likely to simply include a reference to the main library in their code. There is nothing wrong with it per se, but it loads a lot of extra code that is surplus to our requirements. A more efficient method, although one that takes a little effort in getting used to, is to use the AMD approach. In a nutshell, the jQuery team has made the library more modular; this allows you to use a loader such as require.js to load individual modules when needed. It's not suitable for every approach, particularly if you are a heavy user of different parts of the library. However, for those instances where you only need a limited number of modules, then this is a perfect route to take. Let's work through a simple example to see what it looks like in practice. Before we start, we need one additional item—the code uses the Fira Sans regular custom font, which is available from Font Squirrel at http://www.fontsquirrel.com/fonts/fira-sans. Let's make a start using the following steps: The Fira Sans font doesn't come with a web format by default, so we need to convert the font to use the web font format. Go ahead and upload the FiraSans-Regular.otf file to Font Squirrel's web font generator at http://www.fontsquirrel.com/tools/webfont-generator. When prompted, save the converted file to your project folder in a subfolder called fonts. We need to install jQuery and RequireJS into our project folder, so fire up a Node.js Command Prompt and change to the project folder. Next, enter these commands one by one, pressing Enter after each: bower install jquerybower install requirejs We need to extract a copy of the amd.html and amd.css files—it contains some simple markup along with a link to require.js; the amd.css file contains some basic styling that we will use in our demo. We now need to add in this code block, immediately below the link for require.js—this handles the calls to jQuery and RequireJS, where we're calling in both jQuery and Sizzle, the selector engine for jQuery: <script>require.config({paths: {"jquery": "bower_components/jquery/src","sizzle": "bower_components/jquery/src/sizzle/dist/sizzle"}});require(["js/app"]);</script> Now that jQuery has been defined, we need to call in the relevant modules. In a new file, go ahead and add the following code, saving it as app.js in a subfolder marked js within our project folder: define(["jquery/core/init", "jquery/attributes/classes"],function($) {$("div").addClass("decoration");}); We used app.js as the filename to tie in with the require(["js/app"]); reference in the code. If all went well, when previewing the results of our work in a browser. Although we've only worked with a simple example here, it's enough to demonstrate how easy it is to only call those modules we need to use in our code rather than call the entire jQuery library. True, we still have to provide a link to the library, but this is only to tell our code where to find it; our module code weighs in at 29 KB (10 KB when gzipped), against 242 KB for the uncompressed version of the full library! Now, there may be instances where simply referencing modules using this method isn't the right approach; this may apply if you need to reference lots of different modules regularly. A better alternative is to build a custom version of the jQuery library that only contains the modules that we need to use and the rest are removed during build. It's a little more involved but worth the effort—let's take a look at what is involved in the process. Customizing the downloads of jQuery from Git If we feel so inclined, we can really push the boat out and build a custom version of jQuery using the JavaScript task runner, Grunt. The process is relatively straightforward but involves a few steps; it will certainly help if you have some prior familiarity with Git! The demo assumes that you have already installed Node.js—if you haven't, then you will need to do this first before continuing with the exercise. Okay, let's make a start by performing the following steps: You first need to install Grunt if it isn't already present on your system—bring up the Node.js Command Prompt and enter this command: npm install -g grunt-cli Next, install Git—for this, browse to http://msysgit.github.io/ in order to download the package. Double-click on the setup file to launch the wizard, accepting all the defaults is sufficient for our needs. If you want more information on how to install Git, head over and take a look at https://github.com/msysgit/msysgit/wiki/InstallMSysGit for more details. Once Git is installed, change to the jquery folder from within the Command Prompt and enter this command to download and install the dependencies needed to build jQuery: npm install The final stage of the build process is to build the library into the file we all know and love; from the same Command Prompt, enter this command: grunt Browse to the jquery folder—within this will be a folder called dist, which contains our custom build of jQuery, ready for use. If there are modules within the library that we don't need, we can run a custom build. We can set the Grunt task to remove these when building the library, leaving in those that are needed for our project. For a complete list of all the modules that we can exclude, see https://github.com/jquery/jquery#modules. For example, to remove AJAX support from our build, we can run this command in place of step 5, as shown previously: grunt custom:-ajax This results in a file saving on the original raw version of 30 KB as shown in the following screenshot: The JavaScript and map files can now be incorporated into our projects in the usual way. For a detailed tutorial on the build process, this article by Dan Wellman is worth a read (https://www.packtpub.com/books/content/building-custom-version-jquery). Using a GUI as an alternative There is an online GUI available, which performs much the same tasks, without the need to install Git or Grunt. It's available at hhttp://projects.jga.me/jquery-builder/, although it is worth noting that it hasn't been updated for a while! Okay, so we have jQuery installed; let's take a look at one more useful function that will help in the event of debugging errors in our code. Support for source maps has been made available within jQuery since version 1.9. Let's take a look at how they work and see a simple example in action. Adding source map support Imagine a scenario, if you will, where you've created a killer site, which is running well, until you start getting complaints about problems with some of the jQuery-based functionality that is used on the site. Sounds familiar? Using an uncompressed version of jQuery on a production site is not an option; instead we can use source maps. Simply put, these map a compressed version of jQuery against the relevant line in the original source. Historically, source maps have given developers a lot of heartache when implementing, to the extent that the jQuery Team had to revert to disabling the automatic use of maps! For best effects, it is recommended that you use a local web server, such as WAMP (PC) or MAMP (Mac), to view this demo and that you use Chrome as your browser. Source maps are not difficult to implement; let's run through how you can implement them: Extract a copy of the sourcemap folder and save it to your project area locally. Press Ctrl + Shift + I to bring up the Developer Tools in Chrome. Click on Sources, then double-click on the sourcemap.html file—in the code window, and finally click on 17. Now, run the demo in Chrome—we will see it paused; revert back to the developer toolbar where line 17 is highlighted. The relevant calls to the jQuery library are shown on the right-hand side of the screen: If we double-click on the n.event.dispatch entry on the right, Chrome refreshes the toolbar and displays the original source line (highlighted) from the jQuery library, as shown here: It is well worth spending the time to get to know source maps—all the latest browsers support it, including IE11. Even though we've only used a simple example here, it doesn't matter as the principle is exactly the same, no matter how much code is used in the site. For a more in-depth tutorial that covers all the browsers, it is worth heading over to http://blogs.msdn.com/b/davrous/archive/2014/08/22/enhance-your-javascript-debugging-life-thanks-to-the-source-map-support-available-in-ie11-chrome-opera-amp-firefox.aspx—it is worth a read! Adding support for source maps We've just previewed the source map, source map support has already been added to the library. It is worth noting though that source maps are not included with the current versions of jQuery by default. If you need to download a more recent version or add support for the first time, then follow these steps: Source maps can be downloaded from the main site using http://code.jquery.com/jquery-X.X.X.min.map, where X represents the version number of jQuery being used. Open a copy of the minified version of the library and then add this line at the end of the file: //# sourceMappingURL=jquery.min.map Save it and then store it in the JavaScript folder of your project. Make sure you have copies of both the compressed and uncompressed versions of the library within the same folder. Let's move on and look at one more critical part of loading jQuery: if, for some unknown reason, jQuery becomes completely unavailable, then we can add a fallback position to our site that allows graceful degradation. It's a small but crucial part of any site and presents a better user experience than your site simply falling over! Working with Modernizr as a fallback A best practice when working with jQuery is to ensure that a fallback is provided for the library, should the primary version not be available. (Yes, it's irritating when it happens, but it can happen!) Typically, we might use a little JavaScript, such as the following example, in the best practice suggestions. This would work perfectly well but doesn't provide a graceful fallback. Instead, we can use Modernizr to perform the check for us and provide a graceful degradation if all fails. Modernizr is a feature detection library for HTML5/CSS3, which can be used to provide a standardized fallback mechanism in the event of a functionality not being available. You can learn more at http://www.modernizr.com. As an example, the code might look like this at the end of our website page. We first try to load jQuery using the CDN link, falling back to a local copy if that hasn't worked or an alternative if both fail: <body><script src="js/modernizr.js"></script><script type="text/javascript">Modernizr.load([{load: 'http://code.jquery.com/jquery-2.1.1.min.js',complete: function () {// Confirm if jQuery was loaded using CDN link// if not, fall back to local versionif ( !window.jQuery ) {Modernizr.load('js/jquery-latest.min.js');}}},// This script would wait until fallback is loaded, beforeloading{ load: 'jquery-example.js' }]);</script></body> In this way, we can ensure that jQuery either loads locally or from the CDN link—if all else fails, then we can at least make a graceful exit. Best practices for loading jQuery So far, we've examined several ways of loading jQuery into our pages, over and above the usual route of downloading the library locally or using a CDN link in our code. Now that we have it installed, it's a good opportunity to cover some of the best practices we should try to incorporate into our pages when loading jQuery: Always try to use a CDN to include jQuery on your production site. We can take advantage of the high availability and low latency offered by CDN services; the library may already be precached too, avoiding the need to download it again. Try to implement a fallback on your locally hosted library of the same version. If CDN links become unavailable (and they are not 100 percent infallible), then the local version will kick in automatically, until the CDN link becomes available again: <script type="text/javascript" src="//code.jquery.com/jquery-1.11.1.min.js"></script><script>window.jQuery || document.write('<scriptsrc="js/jquery-1.11.1.min.js"></script>')</script> Note that although this will work equally well as using Modernizr, it doesn't provide a graceful fallback if both the versions of jQuery should become unavailable. Although one hopes to never be in this position, at least we can use CSS to provide a graceful exit! Use protocol-relative/protocol-independent URLs; the browser will automatically determine which protocol to use. If HTTPS is not available, then it will fall back to HTTP. If you look carefully at the code in the previous point, it shows a perfect example of a protocol-independent URL, with the call to jQuery from the main jQuery Core site. If possible, keep all your JavaScript and jQuery inclusions at the bottom of your page—scripts block the rendering of the rest of the page until they have been fully rendered. Use the jQuery 2.x branch, unless you need to support IE6-8; in this case, use jQuery 1.x instead—do not load multiple jQuery versions. If you load jQuery using a CDN link, always specify the complete version number you want to load, such as jquery-1.11.1.min.js. If you are using other libraries, such as Prototype, MooTools, Zepto, and so on, that use the $ sign as well, try not to use $ to call jQuery functions and simply use jQuery instead. You can return the control of $ back to the other library with a call to the $.noConflict() function. For advanced browser feature detection, use Modernizr. It is worth noting that there may be instances where it isn't always possible to follow best practices; circumstances may dictate that we need to make allowances for requirements, where best practices can't be used. However, this should be kept to a minimum where possible; one might argue that there are flaws in our design if most of the code doesn't follow best practices! Summary If you thought that the only methods to include jQuery were via a manual download or using a CDN link, then hopefully this article has opened your eyes to some alternatives—let's take a moment to recap what we have learned. We kicked off with a customary look at how most developers are likely to include jQuery before quickly moving on to look at other sources. We started with a look at how to use Node, before turning our attention to using the Bower package manager. Next, we had a look at how we can reference individual modules within jQuery using the AMD approach. We then moved on and turned our attention to creating custom builds of the library using Git. We then covered how we can use source maps to debug our code, with a look at enabling support for them within Google's Chrome browser. To round out our journey of loading jQuery, we saw what might happen if we can't load jQuery at all and how we can get around this, by using Modernizr to allow our pages to degrade gracefully. We then finished the article with some of the best practices that we can follow when referencing jQuery. Resources for Article: Further resources on this subject: Using different jQuery event listeners for responsive interaction [Article] Building a Custom Version of jQuery [Article] Learning jQuery [Article]

0
0
51051

How-To Tutorials

article-image-create-conversational-assistant-chatbot-using-python

Savia Lobo

21 Feb 2018

5 min read

How to create a conversational assistant or chatbot using Python

Savia Lobo

21 Feb 2018

5 min read

[box type="note" align="" class="" width=""]This article is an excerpt taken from a book Natural Language Processing with Python Cookbook written by Krishna Bhavsar, Naresh Kumar, and Pratap Dangeti. This book includes unique recipes to teach various aspects of performing Natural Language Processing with NLTK—the leading Python platform for the task.[/box] Today we will learn to create a conversational assistant or chatbot using Python programming language. Conversational assistants or chatbots are not very new. One of the foremost of this kind is ELIZA, which was created in the early 1960s and is worth exploring. In order to successfully build a conversational engine, it should take care of the following things: 1. Understand the target audience 2. Understand the natural language in which communication happens. 3. Understand the intent of the user 4. Come up with responses that can answer the user and give further clues NLTK has a module, nltk.chat, which simplifies building these engines by providing a generic framework. Let's see the available engines in NLTK: Engines Modules Eliza nltk.chat.eliza Python module Iesha nltk.chat.iesha Python module Rude nltk.chat.rudep ython module Suntsu Suntsu nltk.chat.suntsu module Zen nltk.chat.zen module In order to interact with these engines we can just load these modules in our Python program and invoke the demo() function. This recipe will show us how to use built-in engines and also write our own simple conversational engine using the framework provided by the nltk.chat module. Getting ready You should have Python installed, along with the nltk library. Having an understanding of regular expressions also helps. How to do it... Open atom editor (or your favorite programming editor). Create a new file called Conversational.py. Type the following source code: Save the file. Run the program using the Python interpreter. You will see the following output: How it works... Let's try to understand what we are trying to achieve here. import nltk This instruction imports the nltk library into the current program. def builtinEngines(whichOne): This instruction defines a new function called builtinEngines that takes a string parameter, whichOne: if whichOne == 'eliza': nltk.chat.eliza.demo() elif whichOne == 'iesha': nltk.chat.iesha.demo() elif whichOne == 'rude': nltk.chat.rude.demo() elif whichOne == 'suntsu': nltk.chat.suntsu.demo() elif whichOne == 'zen': nltk.chat.zen.demo() else: print("unknown built-in chat engine {}".format(whichOne)) These if, elif, else instructions are typical branching instructions that decide which chat engine's demo() function is to be invoked depending on the argument that is present in the whichOne variable. When the user passes an unknown engine name, it displays a message to the user that it's not aware of this engine. It's a good practice to handle all known and unknown cases also; it makes our programs more robust in handling unknown situations def myEngine():. This instruction defines a new function called myEngine(); this function does not take any parameters. chatpairs = ( (r"(.*?)Stock price(.*)", ("Today stock price is 100", "I am unable to find out the stock price.")), (r"(.*?)not well(.*)", ("Oh, take care. May be you should visit a doctor", "Did you take some medicine ?")), (r"(.*?)raining(.*)", ("Its monsoon season, what more do you expect ?", "Yes, its good for farmers")), (r"How(.*?)health(.*)", ("I am always healthy.", "I am a program, super healthy!")), (r".*", ("I am good. How are you today ?", "What brings you here ?")) ) This is a single instruction where we are defining a nested tuple data structure and assigning it to chat pairs. Let's pay close attention to the data structure: We are defining a tuple of tuples Each subtuple consists of two elements: The first member is a regular expression (this is the user's question in regex format) The second member of the tuple is another set of tuples (these are the answers) def chat(): print("!"*80) print(" >> my Engine << ") print("Talk to the program using normal english") print("="*80) print("Enter 'quit' when done") chatbot = nltk.chat.util.Chat(chatpairs, nltk.chat.util.reflections) chatbot.converse() We are defining a subfunction called chat()inside the myEngine() function. This is permitted in Python. This chat() function displays some information to the user on the screen and calls the nltk built-in nltk.chat.util.Chat() class with the chatpairs variable. It passes nltk.chat.util.reflections as the second argument. Finally we call the chatbot.converse() function on the object that's created using the chat() class. chat() This instruction calls the chat() function, which shows a prompt on the screen and accepts the user's requests. It shows responses according to the regular expressions that we have built before: if name == ' main ': for engine in ['eliza', 'iesha', 'rude', 'suntsu', 'zen']: print("=== demo of {} ===".format(engine)) builtinEngines(engine) print() myEngine() These instructions will be called when the program is invoked as a standalone program (not using import). They do these two things: Invoke the built-in engines one after another (so that we can experience them) Once all the five built-in engines are excited, they call our myEngine(), where our customer engine comes into play We have learned to create a chatbot of our own using the easiest programming language ‘Python’. To know more about how to efficiently use NLTK and implement text classification, identify parts of speech, tag words, etc check out Natural Language Processing with Python Cookbook.

0
0
50958

article-image-react-native-vs-xamarin-which-is-the-better-cross-platform-mobile-development-framework

Guest Contributor

25 May 2019

10 min read

React Native VS Xamarin: Which is the better cross-platform mobile development framework?

Guest Contributor

25 May 2019

10 min read

0
0
50885

article-image-what-are-rest-verbs-and-status-codes-tutorial

Sugandha Lahoti

02 Oct 2018

12 min read

What are REST verbs and status codes [Tutorial]

Sugandha Lahoti

02 Oct 2018

12 min read

0
0
50831

How-To Tutorials

Packt

17 Feb 2016

26 min read

Understanding Docker

Packt

17 Feb 2016

26 min read

This article will cover the Docker basics that you should already have a pretty good handle on. But if you don't already have the required knowledge at this point, this article will help give you the basics. (For more resources related to this topic, see here.) In this article, we're going to review the following higher level topics with subtopics in each section: Understanding Docker Docker versus typical VMs The Dockerfile and its function Docker networking/linking Docker installers/installation Types of installers and how they operate Controlling your Docker daemon The Kitematic GUI Docker commands Useful commands for Docker, Docker images, and Docker containers Understanding Docker In this section, we will be covering the structure of Docker and the flow of what happens behind the scenes in this world. We will also take a look at Dockerfile and all the magic it can do. Lastly, in this section, we will look at the Docker networking/linking. Difference between Docker and typical VMs First, we must know what exactly Docker is and does. Docker is a container management system that helps easily manage Linux Containers (LXC) in an easier and universal fashion. This lets you create images in virtual environments on your laptop and run commands or operations against them. The actions you do to the containers that you run in these environments locally on your own machine will be the same commands or operations you run against them when they are running in your production environment. This helps in not having to do things differently when you go from a development environment like that on your local machine to a production environment on your server. Now, let's take a look at the differences between Docker containers and the typical virtual machine environments. In the following illustration, we can see the typical Docker setup on the right-hand side versus the typical VM setup on the left-hand side: This illustration gives us a lot of insight into the biggest key benefit of Docker; and that is its no need for a full operating system every time we need to bring up a new container, which cuts down on the overall size of containers. Docker relies on using the host OS's Linux kernel (since almost all the versions of Linux use the standard kernel models) for the OS it was built upon, such as Red Hat, CentOS, Ubuntu, and so on. For this reason, you can have almost any Linux OS as your host operating system (Ubuntu in the previous illustration) and be able to layer other OSes on top of the host. For example, in the earlier illustration, we could have Red Hat running for one app (the one on the left) and Debian running for the other app (the one on the right), but there would never be a need to actually install Red Hat or Debian on the host. Thus, another benefit of Docker is the size of images when they are born. They are not built with the largest piece: the kernel or the operating system. This makes them incredibly small, compact, and easy to ship. Dockerfile Next, let's take a look at the most important file pertaining to Docker: Dockerfile. Dockerfile is the core file that contains instructions to be performed when an image is built. For example, in an Ubuntu-based system, if you want to install the Apache package, you would first do an apt-get update followed by an apt-get install -y apache2. These would be the type of instructions you would find inside a typical Dockerfile. Items such as commands, calls to other scripts, setting environmental variables, adding files, and setting permissions can all be done via Dockerfile. Dockerfile is also where you specify what image is to be used as your base image for the build. Let's take a look at a very basic Dockerfile and then go over the individual pieces that make one up and what they all do: FROM ubuntu:latest MAINTAINER Scott P. Gallagher <email@somewhere.com> RUN apt-get update && apt-get install -y apache2 ADD 000-default.conf /etc/apache2/sites-available/ RUN chown root:root /etc/apache2/sites-available/000-default.conf EXPOSE 80 CMD ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"] These are the typical items you would find in a basic Dockerfile. The first line states the image we want to start off with when we build the container. In this example, we will be using Ubuntu; the item after the colon can be called if you want a specific version of it. In this case, I am just going to say use the latest version of Ubuntu; but you will also specify trusty, precise, raring, and so on. The second line is the line that is relevant to the maintainer of Dockerfile. In this case, I just have my information in there; well, at least, my name is there. This is for people to contact you if they have any questions or find any errors in your file. Typically, most people just include their name and e-mail address. The next line is a typical line you will see while pulling updates and packages in a Ubuntu environment. You might think they should be separate and wonder why they should be put on the same line separated by &&. Well, in the Dockerfile, it helps by only having to run one process to encompass the entire line. If you were to split it into separate lines, it would have to run one process, finish the process, then start the next process, and finish it. With this, it helps speed up the process by pairing the processes together. They still run one after another, but with more efficiency. The next two lines complement each other. The first adds your custom configurations to the path you specified and changes the owner and group owner to the root user. The EXPOSE line will expose the ports to anything external to the container and to the host it is running on. (This will, by default, expose the container externally beyond the host, unless the firewall is enabled and protecting it.) The last line is the command that is run when the container is launched. This particular command in a Dockerfile should only be used once. If it is used more than once, the last CMD in the Dockerfile will be launched upon the container that is running. This also helps emphasize the one process per container rule. The idea is to spread out the processes so that each process runs in its own container, thus the value of the containers will become more understandable. Essentially, something that runs in the foreground, such as the earlier command to keep the Apache running in the foreground. If we were to use CMD ["service apache2 start"], the container would start and then immediately stop. There is nothing to keep the container running. You can also have other instructions, such as ENV to specify the environmental variables that users can pass upon runtime. These are typically used and are useful while using shell scripts to perform actions such as specifying a database to be created in MySQL or setting permission databases. Docker networking/linking Another important aspect that needs to be understood is how Docker containers are networked or linked together. The way they are networked or linked together highlights another important and large benefit of Docker. When a container is created, it creates a bridge network adapter for which it is assigns an address; it is through these network adapters that the communication flows when you link containers together. Docker doesn't have the need to expose ports to link containers. Let's take a look at it with the help of the following illustration: In the preceding illustration, we can see that the typical VM has to expose ports for others to be able to communicate with each other. This can be dangerous if you don't set up your firewalls or, in this case with MySQL, your MySQL permissions correctly. This can also cause unwanted traffic to the open ports. In the case of Docker, you can link your containers together, so there is no need to expose the ports. This adds security to your setup, as there is now a secure connection between your containers. We've looked at the differences between Docker and typical VMs, as well as the Dockerfile structure and the components that make up the file. We also looked at how Docker containers are linked together for security purposes as opposed to typical VMs. Now, let's review the installers for Docker and the structure behind the installation once they are installed, manipulating them to ensure they are operating correctly. Docker installers/installation Installers are one of the first pieces you need to get up and running with Docker on both your local machine as well as your server environments. Let's first take a look at what environments you can install Docker in: Apple OS X (Mac) Windows Linux (various Linux flavors) Cloud (AWS, DigitalOcean, Microsoft Azure, and so on) Types of installers With the various types of installers listed earlier, there are different ways Docker actually operates on the operating system. Docker natively runs on Linux; so if you are using Linux, then it's pretty straightforward how Docker runs right on your system. However, if you are using Windows or Mac OS X, then it operates a little differently, since it relies on using Linux. With these operating systems, they need Linux in some sort of way, thus enters the virtual machine needed to run the Linux part that Docker operates on, which is called boot2docker. The installers for both Windows and Mac OS X are bundled with the boot2docker package alongside the virtual machine software that, by default, is the Oracle VirtualBox. Now, it is worthwhile to note that Docker recently moved away from offering boot2docker. But, I feel, it is important to understand the boot2docker terms and commands in case you run across anyone running the previous version of the Docker installer. This will help you understand what is going on and move forward to the new installer(s). Currently, they are offering up Docker Toolbox that, like the name implies, includes a lot of items that the installer will install for you. The installers for each OS contain different applications with regards to Docker such as: Docker Toolbox piece Mac OS X Windows Docker Client X X Docker Machine X X Docker Compose X Docker Kitematic X X VirtualBox X X First, let's take a look at the older style commands of boot2docker. Then, we will take a look at the new commands or application that you can use to achieve these outcomes. Controlling the Docker VM (boot2docker) Now, there are ways to run boot2docker on different VM software. But to start off, VirtualBox is the best and easiest way to operate boot2docker: $ boot2docker Usage: boot2docker [<options>] {help|init|up|ssh|save|down|poweroff|reset|restart|config|status|info|ip|shellinit|delete|download|upgrade|version} [<args>] Now, after we have installed Docker on Linux, OS X, or Windows, how do we go about controlling this virtual machine in the events when we need to start it up, restart it, or even shut it down? This is where the boot2docker command-line parameters come into play. As you can see in the earlier illustration, there are a lot of options you can use for your boot2docker instance. The options you will use mostly are up, down, poweroff, restart, status, ip, upgrade, and version. Some of these commands you will use mostly to troubleshoot items when you are trying to see why the Docker commands might hang, or when you run into any other issues with your boot2docker virtual machine. You can see what each command does by executing the following command: $ boot2docker help The most useful command that I have found while troubleshooting is the boot2docker status command: $ boot2docker status Another useful boot2docker command is: $ boot2docker version This command will help see what version of boot2docker you are currently running. This is helpful in knowing when to use the boot2docker upgrade command. The last command we will look at with respect to boot2docker is the boot2docker ip command. This command is very useful when you need to know what IP address is to be used to access the machines you have been running on a particular host: $ boot2docker ip 192.168.59.103 As you can see, the earlier command gives us the IP address of the boot2docker client running on my OS X machine inside VirtualBox. By using this IP, I can now access the containers I might have been running using the IP address alongside any of the open ports I have exposed. Docker Machine – the new boot2docker So, with boot2docker on its way out, there needs to be a new way to do what boot2docker does. This being said, enter Docker Machine. With Docker Machine, you can do the same things you did with boot2docker, but now in Machine. The following table shows the commands you used in boot2docker and what they are now in Machine: Command boot2docker Docker Machine command boot2docker docker-machine help boot2docker help docker-machine help status boot2docker status docker-machine status version boot2docker version docker-machine sionus i ip boot2docker ip docker-machine ip Kitematic Now that we have covered all the basics of controlling your boot2docker VM, let's take a look at another way you can run Docker containers on your local machine. Let's take a look at Kitematic. Kitematic is a recent addition to the Docker portfolio. Up until now, everything we have done has been command line-based. With Kitematic, you can manage your Docker containers through a GUI. Kitematic can be used either on Windows or OS X, just not on Linux; besides who needs a GUI on Linux anyways! Kitematic, just like boot2docker, operates on a VM defaulting to VirtualBox. Pictures are worth a thousand words, so let's take a look at some screenshots of Kitematic: The previous screenshot depicts what you will see when you launch Kitematic for the first time. After you start running the containers, they will show up on the left-hand side column. You can manipulate and get information about them through the GUI. You can search for prebuilt images on the Docker Hub and click on the CREATE button once you have found the one you want to use or test. In the preceding screenshot, we have created and are running the hello-world-nginx image inside Kitematic. We can now use the STOP, RESTART, and EXEC commands against the container as well as view the settings of the running container. In the following screenshot, we can go to settings and view what ports are exposed from the container to the outside: In the following screenshot, you can see that you can use your login credentials to log in to the Docker Hub and view the repositories you have created and pushed there: The Docker commands We have covered the types of installers and what they can be run on. We have also seen how to control the Docker VM that gets created for you and how to use Kitematic. Let's look at some Docker commands that you should be familiar with already. We will start with some common commands and then take a peek at the commands that are used for the Docker images. We will then take a dive into the commands that are used for the containers. The first command we will be taking a look at will be one of the most useful commands not only in Docker but in any command-line utility you use—the help command. It is run simply by executing the command as follows: $ docker help The earlier command will give you a full list of all the Docker commands at your disposal and a brief description of what each command does. For further help with a particular command, you can run the following: $ docker COMMAND --help You will then receive additional information on using the command, such as the switches, arguments, and descriptions of the arguments. Similar to the boot2docker version command we ran earlier, there is also a version command for the Docker daemon: $ docker version Now, this command will give us a little bit more information than the boot2docker command output, as follows: Client version: 1.7.0 Client API version: 1.19 Go version (client): go1.4.2 Git commit (client): 0baf609 OS/Arch (client): darwin/amd64 Server version: 1.7.0 Server API version: 1.19 Go version (server): go1.4.2 Git commit (server): 0baf609 OS/Arch (server): linux/amd64 This is helpful when you want to see the version of the Docker daemon you may be running to see if you need/want to upgrade. The Docker images Next, let's take a dive into the Docker images. You will learn how to view the images you currently have that you can run, search for images on the Docker Hub, and pull them down to your environment, so you can run them. Let's first take a look at the docker images command. Upon running the command, we will get an output similar to the following output: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE ubuntu 14.10 ab57dbafeeea 11 days ago 194.5 MB ubuntu trusty 6d4946999d4f 11 days ago 188.3 MB ubuntu latest 6d4946999d4f 11 days ago 188.3 MB Your output will differ based on whether you have any images at all in your Docker environment or upon what images you do have. There are a few important pieces you need to understand from the output you see. Let's go over the columns and what is contained in each. The first column you see is the REPOSITORY column; this column contains the name of the repository as it exists in the Docker Hub. If you were to have a repository that was from someone's user account, it may show up as follows: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE scottpgallagher/mysql latest 57df9c7989a1 9 weeks ago 321.7 MB The next column, the TAG column, will show you different versions of a repository. As you can see in the preceding example with the Ubuntu repository, there are tag names for the different versions. So, if you want to specify a particular version of a repository in your Dockerfile (as we saw earlier), you are able to. This is useful, so you're not always reliant on having to use the latest version of an operating system and can use the one your application supports the best. It can also help you do backward compatibility testing for your application. The next column is labeled IMAGE ID and it is based on a unique 64 hexadecimal digit string of characters. The image ID simplifies this down to the first 12 digits for easier viewing. Imagine if you had to view all 64 bits on one line! You will learn when to use this unique image ID for later tasks. The last two columns are pretty straightforward; the first being the creation date for the repository, followed by the virtual size of the image. The size is very important as you want to keep or use images that are very small in size if you plan to be moving them around a lot. The smaller the image, the faster is the load time; and who doesn't like it faster? Searching for the Docker images Okay, so let's look at how we can search for the images that are in the Docker Hub using the Docker commands. The command we will be looking at is docker search. With the docker search command, you can search based on the different criteria you are looking for. For example, we can search for all the images with the term ubuntu in them and see what all is available. Here is what we would get back in our results; it would go as follows: $ docker search ubuntu We would get back our results: NAME DESCRIPTION STARS OFFICIAL AUTOMATED ubuntu Ubuntu is a Debian-based Linux operating s... 1835 [OK] ubuntu-upstart Upstart is an event-based replacement for ... 26 [OK] tutum/ubuntu Ubuntu image with SSH access. For the root... 25 [OK] torusware/speedus-ubuntu Always updated official Ubuntu docker imag... 25 [OK] ubuntu-debootstrap debootstrap --variant=minbase --components... 10 [OK] rastasheep/ubuntu-sshd Dockerized SSH service, built on top of of... 4 [OK] maxexcloo/ubuntu Docker base image built on Ubuntu with Sup... 2 [OK] nuagebec/ubuntu Simple always updated Ubuntu docker images... 2 [OK] nimmis/ubuntu This is a docker images different LTS vers... 1 [OK] alsanium/ubuntu Ubuntu Core image for Docker 1 [OK] Based on these results, we can now decipher some information. We can see the name of the repository, a reduced description, how many people have starred and think it is a good repository, whether it's an official repository; which means it's been approved by the Docker team, as well as if it's an automated build. An automated build is typically a Docker image that is built automatically when a Git repository it is linked to is updated. The code gets updated, the web hook is called, and a new Docker image is built in the Docker Hub. If we find an image we want to use, we can simply pull it using its repository name with the docker pull command, as follows: $ docker pull tutum/ubuntu The image will be downloaded and show up in our list when we perform the docker images command we ran earlier. We now know how to search for Docker images and pull them down to our machine. What if we want to get rid of them? That's where the docker rmi command comes into play. With the docker rmi command, you can remove unwanted images from your machine(s). So, let's take look at the images we currently have on our machine with the docker images command. We will get the following: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE ubuntu 14.10 ab57dbafeeea 11 days ago 194.5 MB ubuntu trusty 6d4946999d4f 11 days ago 188.3 MB ubuntu latest 6d4946999d4f 11 days ago 188.3 MB We can see that we have duplicate images here taking up space. We can see this by looking at the image ID and seeing the exact image ID for both ubuntu:trusty and ubuntu:latest. We now know that ubuntu:trusty is the latest Ubuntu image, so there is no need to keep them both around. Let's free up some space by removing ubuntu:trusty and just keeping ubuntu:latest. We do this by using the docker rmi command, as follows: $ docker rmi ubuntu:trusty If you issue the docker images command now, you will see that ubuntu:trusty no longer shows up in your images list and has been removed. Now, you can remove machines based on their image ID as well. But be careful while you do so; in this scenario, not only will you remove ubuntu:trusty, but you will also remove ubuntu:latest as they have the same image ID. Manipulating the Docker images We have gone over the images and know how to obtain and manipulate them in some ways. Next, we are going to take a look at what it takes to fire them up and manipulate them. This is the part where the images become containers! Let's first go over the basics of the docker run command and how to run containers. We will cover some basic docker run items in this article. So, let's just look at how to get images up, running, and turned into containers. The most basic way to run a container is as follows: $ docker run -i -t <image_name>:<tag> /bin/bash Upon closer inspection of the earlier command, we start off with the docker run command, followed by two switches: -i and -t. The -i gives us an interactive shell into the running container, the -t will allocate a pseudo-tty that, while using interactive processes, must be used together with the -i switch. You can also use switches together; for example, -it is commonly used for these two switches. This will help you test the container to see how it operates before running it as a daemon. Once you are comfortable with your container, you can test how it operates in the daemon mode: $ docker run -d <image_name>:<tag> If the container is set up correctly and has an entry point setup, you should be able to see the running container by issuing the docker ps command. You will see something similar to the following: $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES cc1fefcfa098 ubuntu:14.10 "/bin/bash" 3 seconds ago Up 3 seconds boring_mccarthy Based on the earlier command, we get a lot of other important information indicating that the container is running. We can see the container ID, the image name that is running, the command that is running to keep the image alive, when the container started, its current status, if any ports were exposed they would be listed here, as well as the name given to the container. Now, these names are random, unless it is specified otherwise by the --name= switch. You can also the expose the ports on your containers by using the -p switch as follows: $ docker run -d -p <host_port>:<container_port> <image>:<tag> $ docker run -d -p 8080:80 ubuntu:14.10 This will run the ubuntu 14.10 container in the demonized mode, exposing port 8080 on the Docker host to port 80 on the running container: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 55cfdcb6beb6 ubuntu:14.10 "/bin/bash" 2 seconds ago Up 2 seconds 0.0.0.0:8080->80/tcp babbage Now, there will come a time when containers don't want to behave. For this, you can see the issues you have by using the docker logs command. The command is very straightforward. You specify the container you want to see the logs off. For this command, you need to use the container ID or the name of the container from the docker ps output: $ docker logs 55cfdcb6beb6 Or: $ docker logs babbage You can also get this ID when you first initiate the docker run command: $ docker run -d ubuntu:14.10 /bin/bash da92261485db98c7463fffadb43e3f684ea9f47949f287f92408fd0f3e4f2bad Stopping containers Now, let's take a look at how we can stop these containers. For various reasons, we would want to do this. There are a few commands we could use; they are docker kill, docker stop, docker pause, and docker unpause. Let's cover them briefly as they are fairly straightforward. First, let's look at the difference between docker kill and docker stop. The docker kill command will do just that—kill the container immediately. For a graceful shutdown of the container, you would want to use the docker stop command. Mostly, when you are testing, you will be using docker kill. When you're in your production environments, you will want to use docker stop to ensure you don't corrupt any data you might have in the Docker volumes. The commands are used exactly like the docker logs command, where you can use the container ID, the random name given to the container, or the one you might specify with the --name= switch. Now, let's take a dive into how we can execute some commands, view information on our running containers, and manipulate them in a small sense. The first thing we want to take a look at, which will make things a little easier with the upcoming commands, is the docker rename command. With the docker rename command, we can change the name that has been randomly generated for the container. When we performed the docker run command, a random name was assigned to our container; most times, these names are fine. But if you are looking for an easy way to manage the containers, a name can be sometimes easier to remember. For this, you can use the docker rename command as follows: $ docker rename <current_container_name> <new_container_name> Now that we have an easily recognizable and rememberable name, let's take a peek inside our containers with the docker stats and docker top commands, taking them in order: $ docker stats <container_name> CONTAINER CPU % MEM USAGE/LIMIT MEM % NET I/O web1 0.00% 1.016 MB/2.099 GB 0.05% 0 B/0 B The other command docker top provides a list of all running processes inside the container. Again, we can use the name of the container to pull the information: $ docker top <container_name> We will receive an output similar to the following one based on what processes are running inside the container: UID PID PPID C STIME TTY TIME CMD root 8057 1380 0 13:02 pts/0 00:00:00 /bin/bash We can see who is running the process (in this case, the root user), the command being run (in this case, /bin/bash), as well as the other information that might be useful. Lastly, let's cover how we can remove the containers. The same way we looked at removing images earlier with the docker rmi command, we can use the docker rm command to remove unwanted containers. This is useful if you want to reuse a name you provided to a container: $ docker rm <container_name> Summary In this article, we have gone over the basics of what Docker is and how it is compared to typical virtual machines. We looked at the Dockerfile structure and the networking and linking of containers. We went over the installers, how they operate on different operating systems, and how to control them through the command line. We briefly looked at the latest Docker addition Kitematic for those interested in a GUI version for Windows or OS X. Then, we took a small but deep dive into the basic Docker commands to get you started. Resources for Article: Further resources on this subject: Introduction to Docker[article] Docker in Production[article] Speeding Vagrant Development With Docker[article]

0
0
50741

article-image-how-to-secure-elasticcache-in-aws

Savia Lobo

11 May 2018

5 min read

How to secure ElasticCache in AWS

Savia Lobo

11 May 2018

5 min read

AWS offers services to handle the cache management process. Earlier, we were using Memcached or Redis installed on VM, which was a very complex and tough task to manage in terms of ensuring availability, patching, scalability, and security. [box type="shadow" align="" class="" width=""]This article is an excerpt taken from the book,'Cloud Security Automation'. In this book, you'll learn the basics of why cloud security is important and how automation can be the most effective way of controlling cloud security.[/box] On AWS, we have this service available as ElastiCache. This gives you the option to use any engine (Redis or Memcached) to manage your cache. It's a scalable platform that will be managed by AWS in the backend. ElastiCache provides a scalable and high-performance caching solution. It removes the complexity associated with creating and managing distributed cache clusters using Memcached or Redis. Now, let's look at how to secure ElastiCache. Secure ElastiCache in AWS For enhanced security, we deploy ElastiCache clusters inside VPC. When they are deployed inside VPC, we can use a security group and NACL to add a level of security on the communication ports at network level. Apart from this, there are multiple ways to enable security for ElastiCache. VPC-level security Using a security group at VPC—when we deploy AWS ElastiCache in VPC, it gets associated with a subnet, a security group, and the routing policy of that VPC. Here, we define a rule to communicate with the ElastiCache cluster on a specific port. ElastiCache clusters can also be accessed from on-premise applications using VPN and Direct Connect. Authentication and access control We use IAM in order to implement the authentication and access control on ElastiCache. For authentication, you can have the following identity type: Root user: It's a superuser that is created while setting up an AWS account. It has super administrator privileges for all the AWS services. However, it's not recommended to use the root user to access any of the services. IAM user: It's a user identity in your AWS account that will have a specific set of permissions for accessing the ElastiCache service. IAM role: We also can define an IAM role with a specific set of permissions and associate it with the services that want to access ElastiCache. It basically generates temporary access keys to use ElastiCache. Apart from this, we can also specify federated access to services where we have an IAM role with temporary credentials for accessing the service. To access ElastiCache, service users or services must have a specific set of permissions such as create, modify, and reboot the cluster. For this, we define an IAM policy and associate it with users or roles. Let's see an example of an IAM policy where users will have permission to perform system administration activity for ElastiCache cluster: { "Version": "2012-10-17", "Statement":[{ "Sid": "ECAllowSpecific", "Effect":"Allow", "Action":[ "elasticache:ModifyCacheCluster", "elasticache:RebootCacheCluster", "elasticache:DescribeCacheClusters", "elasticache:DescribeEvents", "elasticache:ModifyCacheParameterGroup", "elasticache:DescribeCacheParameterGroups", "elasticache:DescribeCacheParameters", "elasticache:ResetCacheParameterGroup", "elasticache:DescribeEngineDefaultParameters"], "Resource":"*" } ] } Authenticating with Redis authentication AWS ElastiCache also adds an additional layer of security with the Redis authentication command, which asks users to enter a password before they are granted permission to execute Redis commands on a password-protected Redis server. When we use Redis authentication, there are the following few constraints for the authentication token while using ElastiCache: Passwords must have at least 16 and a maximum of 128 characters Characters such as @, ", and / cannot be used in passwords Authentication can only be enabled when you are creating clusters with the in-transit encryption option enabled The password defined during cluster creation cannot be changed To make the policy harder or more complex, there are the following rules related to defining the strength of a password: A password must include at least three characters of the following character types: Uppercase characters Lowercase characters Digits Non-alphanumeric characters (!, &, #, $, ^, <, >, -) A password must not contain any word that is commonly used A password must be unique; it should not be similar to previous passwords Data encryption AWS ElastiCache and EC2 instances have mechanisms to protect against unauthorized access of your data on the server. ElastiCache for Redis also has methods of encryption for data run-in on Redis clusters. Here, too, you have data-in-transit and data-at-rest encryption methods. Data-in-transit encryption ElastiCache ensures the encryption of data when in transit from one location to another. ElastiCache in-transit encryption implements the following features: Encrypted connections: In this mode, SSL-based encryption is enabled for server and client communication Encrypted replication: Any data moving between the primary node and the replication node are encrypted Server authentication: Using data-in-transit encryption, the client checks the authenticity of a connection—whether it is connected to the right server Client authentication: After using data-in-transit encryption, the server can check the authenticity of the client using the Redis authentication feature Data-at-rest encryption ElastiCache for Redis at-rest encryption is an optional feature that increases data security by encrypting data stored on disk during sync and backup or snapshot operations. However, there are the following few constraints for data-at-rest encryption: It is supported only on replication groups running Redis version 3.2.6. It is not supported on clusters running Memcached. It is supported only for replication groups running inside VPC. Data-at-rest encryption is supported for replication groups running on any node type. During the creation of the replication group, you can define data-at-rest encryption. Data-at-rest encryption once enabled, cannot be disabled. To summarize, we learned how to secure ElastiCache and ensured security for PaaS services, such as database and analytics services. If you've enjoyed reading this article, do check out 'Cloud Security Automation' for hands-on experience of automating your cloud security and governance. How to start using AWS AWS Sydney Summit 2018 is all about IoT AWS Fargate makes Container infrastructure management a piece of cake

0
0
50680

article-image-working-forms-using-rest-api

Packt

11 Jul 2016

21 min read

Working with Forms using REST API

Packt

11 Jul 2016

21 min read

WordPress, being an ever-improving content management system, is now moving toward becoming a full-fledged application framework, which brings up the necessity for new APIs. The WordPress REST API has been created to create necessary and reliable APIs. The plugin provides an easy-to-use REST API, available via HTTP that grabs your site's data in the JSON format and further retrieves it. WordPress REST API is now at its second version and has brought a few core differences, compared to its previous one, including route registration via functions, endpoints that take a single parameter, and all built-in endpoints that use a common controller. In this article by Sufyan bin Uzayr, author of the book Learning WordPress REST API, you'll learn how to write a functional plugin to create and edit posts using the latest version of the WordPress REST API. This article will also cover the process on how to work efficiently with data to update your page dynamically based on results. This tutorial comes to serve as a basis and introduction to processing form data using the REST API and AJAX and not as a redo of the WordPress post editor or a frontend editing plugin. REST API's first task is to make your WordPress powered websites more dynamic, and for this precise reason, I have created a thorough tutorial that will take you step by step in this process. After you understand how the framework works, you will be able to implement it on your sites, thus making them more dynamic. (For more resources related to this topic, see here.) Fundamentals In this article, you will be doing something similar, but instead of using the WordPress HTTP API and PHP, you'll use jQuery's AJAX methods. All of the code for that project should go in its plugin file.Another important tip before starting is to have the required JavaScript client installed that uses the WordPress REST API. You will be using the JavaScript client to make it possible to authorize via the current user's cookies. As a note for this tip would be the fact that you can actually substitute another authorization method such as OAuth if you would find it suitable. Setup the plugin During the course of this tutorial, you'll only need one PHP and one JavaScript file. Nothing else is necessary for the creation of our plugin. We will be starting off with writing a simple PHP file that will do the following three key things for us: Enqueue the JavaScript file Localize a dynamically created JavaScript object into the DOM when you use the said file Create the HTML markup for our future form All that is required of us is to have two functions and two hooks. To get this done, we will be creating a new folder in our plugin directory with one of the PHP files inside it. This will serve as the foundation for our future plugin. We will give the file a conventional name, such as my-rest-post-editor.php. In the following you can see our starting PHP file with the necessary empty functions that we will be expanding in the next steps: <?php /* Plugin Name: My REST API Post Editor */ add_shortcode( 'My-Post-EditorR', 'my_rest_post_editor_form'); function my_rest_post_editor_form( ) { } add_action( 'wp_enqueue_scripts', 'my_rest_api_scripts' ); function my_rest_api_scripts() { } For this demonstration, notice that you're working only with the post title and post content. This means that in the form editor function, you only need the HTML for a simple form for those two fields. Creating the form with HTML markup As you can notice, we are only working with the post title and post content. This makes it necessary only to have the HTML for a simple form for those two fields in the editor form function. The necessary code excerpt is as follows: function my_rest_post_editor_form( ) { $form = ' <form id="editor"> <input type="text" name="title" id="title" value="My title"> <textarea id="content"></textarea> <input type="submit" value="Submit" id="submit"> </form> <div id="results"> </div>'; return $form; } Our aim is to show this only to those users who are logged in on the site and have the ability to edit posts. We will be wrapping the variable containing the form in some conditional checks that will allow us to fulfill the said aim. These tests will check whether the user is logged-inin the system or not, and if he's not,he will be provided with a link to the default WordPress login page. The code excerpt with the required function is as follows: function my_rest_post_editor_form( ) { $form = ' <form id="editor"> <input type="text" name="title" id="title" value="My title"> <textarea id="content"></textarea> <input type="submit" value="Submit" id="submit"> </form> <div id="results"> </div> '; if ( is_user_logged_in() ) { if ( user_can( get_current_user_id(), 'edit_posts' ) ) { return $form; } else { return __( 'You do not have permissions to do this.', 'my-rest-post-editor' ); } } else { return sprintf( '<a href="%1s" title="Login">%2s</a>', wp_login_url( get_permalink( get_ queried_object_id() ) ), __( 'You must be logged in to do this, please click here to log in.', 'my-rest-post-editor') ); } } To avoid confusions, we do not want our page to be processed automatically or somehow cause a page reload upon submitting it, which is why our form will not have either a method or an action set. This is an important thing to notice because that's how we are avoiding the unnecessary automatic processes. Enqueueing your JavaScript file Another necessary thing to do is to enqueue your JavaScript file. This step is important because this function provides a systematic and organized way of loading Javascript files and styles. Using the wp_enqueue_script function, you will tell WordPress when to load a script, where to load it, and what are its dependencies. By doing this, everyone utilizes the built-in JavaScript libraries that come bundled with WordPress rather than loading the same third-party script several times. Another big advantage of doing this is that it helps reduce the page load time and avoids potential code conflicts with other plugins. We use this method instead the wrong method of loading in the head section of our site because that's how we avoid loading two different plugins twice, in case we add one more manually. Once the enqueuing is done, we will be localizing an array of data into it, which you'll need to include in the JavaScript that needs to be generated dynamically. This will include the base URL for the REST API, as that can change with a filter, mainly for security purposes. Our next step is to make this piece as useable and user-friendly as possible, and for this, we will be creating both a failure and success message in an array so that our strings would be translation friendly. When done with this, you'll need to know the current user's ID and include that one in the code as well. The result we have accomplished so far is owed to the wp_enqueue_script()and wp_localize_script()functions. It would also be possible to add custom styles to the editor, and that would be achieved by using the wp_enqueue_style()function. While we have assessed the importance and functionality of wp_enqueue_script(), let's take a close look at the other ones as well. The wp_localize_script()function allows you to localize a registered script with data for a JavaScript variable. By this, we will be offered a properly localized translation for any used string within our script. As WordPress currently offers localization API in PHP; this comes as a necessary measure. Though the localization is the main use of the function, it can be used to make any data available to your script that you can usually only get from the server side of WordPress. The wp_enqueue_stylefunctionis the best solution for adding stylesheets within your WordPress plugins, as this will handle all of the stylesheets that need to be added to the page and will do it in one place. If you have two plugins using the same stylesheet and both of them use the same handle, then WordPress will only add the stylesheet on the page once. When adding things to wp_enqueue_style, it adds your styles to a list of stylesheets it needs to add on the page when it is loaded. If a handle already exists, it will not add a new stylesheet to the list. The function is as follows: function my_rest_api_scripts() { wp_enqueue_script( 'my-api-post-editor', plugins_url( 'my-api-post-editor.js', __FILE__ ), array( 'jquery' ), false, true ); wp_localize_script( 'my-api-post-editor', 'my_post_editor', array( 'root' => esc_url_raw( rest_url() ), 'nonce' => wp_create_nonce( 'wp_json' ), 'successMessage' => __( 'Post Creation Successful.', 'my-rest-post-editor' ), 'failureMessage' => __( 'An error has occurred.', 'my-rest-post-editor' ), 'userID' => get_current_user_id(), ) ); } That will be all the PHP you need as everything else is handled via JavaScript. Creating a new page with the editor shortcode (MY-POST-EDITOR) is what you should be doing next and then proceed to that new page. If you've followed the instructions precisely, then you should see the post editor form on that page. It will obviously not be functional just yet, not before we write some JavaScript that will add functionality to it. Issuing requests for creating posts To create posts from our form, we will need to use a POST request, which we can make by using jQuery's AJAX method. This should be a familiar and very simple process for you, yet if you're not acquitted with it,you may want to take a look through the documentation and guiding offered by the guys at jQuery themselves (http://api.jquery.com/jquery.ajax/). You will also need to create two things that may be new to you, such as the JSON array and adding the authorization header. In the following, we will be walking through each of them in details. To create the JSON object for your AJAX request, you must firstly create a JavaScript array from the input and then use the JSON.stringify()to convert it into JSON. The JSON.strinfiy() method will convert a JavaScript value to a JSON string by replacing values if a replacer function is specified or optionally including only the specified properties if a replacer array is specified. The following code excerpt is the beginning of the JavaScript file that shows how to build the JSON array: (function($){ $( '#editor' ).on( 'submit', function(e) { e.preventDefault(); var title = $( '#title' ).val(); var content = $( '#content' ).val(); var JSONObj = { "title" :title, "content_raw" :content, "status" :'publish' }; var data = JSON.stringify(JSONObj); })(jQuery); Before passing the variable data to the AJAX request, you will have first to set the URL for the request. This step is as simple as appending wp.v2/posts to the root URL for the API, which is accessible via _POST_EDITOR.root: var url = _POST_EDITOR.root; url = url + 'wp/v2/posts'; The AJAX request will look a lot like any other AJAX request you would make, with the sole exception of the authorization headers. Because of the REST API's JavaScript client, the only thing that you will be required to do is to add a header to the request containing the nonce set in the _POST_EDITOR object. Another method that could work as an alternative would be the OAuth authorization method. Nonce is an authorization method that generates a number for specific use, such as a session authentication. In this context, nonce stands for number used once or number once. OAuth authorization method OAuth authorization method provides users with secure access to server resources on behalf of a resource owner. It specifies a process for resource owners to authorize third-party access to their server resources without sharing any user credentials. It is important to state that is has been designed to work with HTTP protocols, allowing an authorization server to issue access tokens to third-party clients. The third party would then use the access token to access the protected resources hosted on the server. Using the nonce method to verify cookie authentication involves setting a request header with the name X-WP-Nonce, which will contain the said nonce value. You can then use the beforeSend function of the request to send the nonce. Following is what that looks like in the AJAX request: $.ajax({ type:"POST", url: url, dataType : 'json', data: data, beforeSend : function( xhr ) { xhr.setRequestHeader( 'X-WP-Nonce', MY_POST_EDITOR.nonce ); }, }); As you might have noticed, the only missing things are the functions that would display success and failure. These alerts can be easily created by using the messages that we localized into the script earlier. We will now output the result of the provided request as a simple JSON array so that we would see how it looks like. Following is the complete code for the JavaScript to create a post editor that can now create new posts: (function($){ $( '#editor' ).on( 'submit', function(e) { e.preventDefault(); var title = $( '#title' ).val(); var content = $( '#content' ).val(); var JSONObj = { "title" :title, "content_raw" :content, "status" :'publish' }; var data = JSON.stringify(JSONObj); var url = MY_POST_EDITOR.root; url += 'wp/v2/posts'; $.ajax({ type:"POST", url: url, dataType : 'json', data: data, beforeSend : function( xhr ) { xhr.setRequestHeader( 'X-WP-Nonce', MY_POST_EDITOR.nonce ); }, success: function(response) { alert( MY_POST_EDITOR.successMessage ); $( "#results").append( JSON.stringify( response ) ); }, failure: function( response ) { alert( MY_POST_EDITOR.failureMessage ); } }); }); })(jQuery); This is how we can create a basic editor in WP REST API. If you are a logged in and the API is still active, you should create a new post and then create an alert telling you that the post has been created. The returned JSON object would then be placed into the #results container. Insert image_B05401_04_01.png If you followed each and every step precisely, you should now have a basic editor ready. You may want to give it a try and see how it works for you. So far, we have created and set up a basic editor that allows you to create posts. In our next steps, we will go through the process of adding functionality to our plugin, which will enable us to edit existing posts. Issuing requests for editing posts In this section, we will go together through the process of adding functionality to our editor so that we could edit existing posts. This part may be a little bit more detailed, mainly because the first part of our tutorial covered the basics and setup of the editor. To edit posts, we would need to have the following two things: A list of posts by author, with all of the posts titles and post content A new form field to hold the ID of the post you're editing As you can understand, the list of posts by author and the form field would lay the foundation for the functionality of editing posts. Before adding that hidden field to your form, add the following HTMLcode: <input type="hidden" name="post-id" id="post-id" value=""> In this step, we will need to get the value of the field for creating new posts. This will be achieved by writing a few lines of code in the JavaScript function. This code will then allow us to automatically change the URL, thus making it possible to edit the post of the said ID, rather than having to create a new one every time we would go through the process. This would be easily achieved by writing down a simple code piece, like the following one: var postID = $( '#post-id').val(); if ( undefined !== postID ) { url += '/'; url += postID; } As we move on, the preceding code will be placed before the AJAX section of the editor form processor. It is important to understand that the variable URL in the AJAX function will have the ID of the post that you are editing only if the field has value as well. The case in which no such value is present for the field, it will yield in the creation of a new post, which would be identical to the process you have been taken through previously. It is important to understand that to populate the said field, including the post title and post content field, you will be required to add a second form. This will result in all posts to be retrieved by the current user, by using a GET request. Based on the selection provided in the said form, you can set the editor form to update. In the PHP, you will then add the second form, which will look similar to the following: <form id="select-post"> <select id="posts" name="posts"> </select> <input type="submit" value="Select a Post to edit" id="choose-post"> </form> REST API will now be used to populate the options within the #posts select. For us to achieve that, we will have to create a request for posts by the current user. To accomplish our goal, we will be using the available results. We will now have to form the URL for requesting posts by the current user, which will happen if you will set the current user ID as a part of the _POST_EDITOR object during the processes of the script setup. A function needs to be created to get posts by the current author and populate the select field. This is very similar to what we did when we made our posts update, yet it is way simpler. This function will not require any authentication, and given the fact that you have already been taken through the process of creating a similar function, creating this one shouldn't be any more of a hassle for you. The success function loops through the results and adds them to the postselector form as options for its one field and will generate a similar code, something like the following: function getPostsByUser( defaultID ) { url += '?filter[author]='; url += my_POST_EDITOR.userID; url += '&filter[per_page]=20'; $.ajax({ type:"GET", url: url, dataType : 'json', success: function(response) { var posts = {}; $.each(response, function(i, val) { $( "#posts" ).append(new Option( val.title, val.ID ) ); }); if ( undefined != defaultID ) { $('[name=posts]').val( defaultID ) } } }); } You can notice that the function we have created has one of the parameters set for defaultID, but this shouldn't be a matter of concern for you just now. The parameter, if defined, would be used to set the default value of the select field, yet, for now, we will ignore it. We will use the very same function, but without the default value, and will then set it to run on document ready. This is simply achieved by a small piece of code like the following: $( document ).ready( function() { getPostsByUser(); }); Having a list of posts by the current user isn't enough, and you will have to get the title and the content of the selected post and push it into the form for further editing. This is will assure the proper editing possibility and make it possible to achieve the projected result. Moving on, we will need the other GET request to run on the submission of the postselector form. This should be something of the kind: $( '#select-post' ).on( 'submit', function(e) { e.preventDefault(); var ID = $( '#posts' ).val(); var postURL = MY_POST_EDITOR.root; postURL += 'wp/v2/posts/'; postURL += ID; $.ajax({ type:"GET", url: postURL, dataType : 'json', success: function(post) { var title = post.title; var content = post.content; var postID = postID; $( '#editor #title').val( title ); $( '#editor #content').val( content ); $( '#select-post #posts').val( postID ); } }); }); In the form of <json-url>wp/v2/posts/<post-id>, we will build a new URL that will be used to scrape post data for any selected post. This will result in us making an actual request that will be used to take the returned data and then set it as the value of any of the three fields there in the editor form. Upon refreshing the page, you will be able to see all posts by the current user in a specific selector. Submitting the data by a click will yield in the following: The content and title of the post that you have selected will be visible to the editor, given that you have followed the preceding steps correctly. And the second occurrence will be in the fact that the hidden field for the post ID you have added should now be set. Even though the content and title of the post will be visible, we would still be unable to edit the actual posts as the function for the editor form was not set for this specific purpose, just yet. To achieve that, we will need to make a small modification to the function that will make it possible for the content to be editable. Besides, at the moment, we would only get our content and title displayed in raw JSON data; however, applying the method described previously will improve the success function for that request so that the title and content of the post displays in the proper container, #results. In order to achieve this, you will need a function that is going to update the said container with the appropriate data. The code piece for this function will be something like the following: function results( val ) { $( "#results").empty(); $( "#results" ).append( '<div class="post-title">' + val.title + '</div>' ); $( "#results" ).append( '<div class="post-content">' + val.content + '</div>' ); } The preceding code makes use of some very simple jQuery techniques, but that doesn't make it any worse as a proper introduction to updating page content by making use of data from the REST API. There are countless ways of getting a lot more detailed or creative with this if you dive in the markup or start adding any additional fields. That will always be an option for you if you're more of a savvy developer, but as an introductory tutorial, we're trying not to keep this tutorial extremely technical, which is why we'll stick to the provided example for now. Insert image_B05401_04_02.png As we move forward, you can use it in your modified form procession function, which will be something like the following: $( '#editor' ).on( 'submit', function(e) { e.preventDefault(); var title = $( '#title' ).val(); var content = $( '#content' ).val(); console.log( content ); var JSONObj = { "title" "content_raw" "status" }; :title, :content, :'publish' var data = JSON.stringify(JSONObj); var postID = $( '#post-id').val(); if ( undefined !== postID ) { url += '/'; url += postID; } $.ajax({ type:"POST", url: url, dataType : 'json', data: data, beforeSend : function( xhr ) { xhr.setRequestHeader( 'X-WP-Nonce', MY_POST_EDITOR.nonce ); }, success: function(response) { alert( MY_POST_EDITOR.successMessage ); getPostsByUser( response.ID ); results( response ); }, failure: function( response ) { alert( MY_POST_EDITOR.failureMessage ); } }); }); As you have noticed, a few changes have been applied, and we will go through each of them in specific: The first thing that has changed is the Post ID that's being edited is now conditionally added. This implies that we will make use of the form and it will serve to create new posts by POSTing to the endpoint. Another change with the POST ID is that it will now update posts via posts/<post-id>. The second change regards the success function. A new result() function was used to output the post title and content during the process of editing. Another thing is that we also reran the getPostsbyUser() function, yet it has been set in a way that posts will automatically offer the functionality of editing, just after you will createthem. Summary With this, we havefinishedoff this article, and if you have followed each step with precision, you should now have a simple yet functional plugin that can create and edit posts by using the WordPress REST API. This article also covered techniques on how to work with data in order to update your page dynamically based on the available results. We will now progress toward further complicated actions with REST API. Resources for Article: Further resources on this subject: Implementing a Log-in screen using Ext JS [article] Cluster Computing Using Scala [article] Understanding PHP basics [article]

0
0
50652

article-image-using-machine-learning-for-phishing-domain-detection-tutorial

Prasad Ramesh

24 Nov 2018

11 min read

Using machine learning for phishing domain detection [Tutorial]

Prasad Ramesh

24 Nov 2018

11 min read

Social engineering is one of the most dangerous threats facing every individual and modern organization. Phishing is a well-known, computer-based, social engineering technique. Attackers use disguised email addresses as a weapon to target large companies. With the huge number of phishing emails received every day, companies are not able to detect all of them. That is why new techniques and safeguards are needed to defend against phishing. This article will present the steps required to build three different machine learning-based projects to detect phishing attempts, using cutting-edge Python machine learning libraries. We will use the following Python libraries: scikit-learn Python (≥ 2.7 or ≥ 3.3) NumPy (≥ 1.8.2) NLTK Make sure that they are installed before moving forward. You can find the code files here. This article is an excerpt from a book written by Chiheb Chebbi titled Mastering Machine Learning for Penetration Testing. In this book, you will you learn how to identify loopholes in a self-learning security system and will be able to efficiently breach a machine learning system. Social engineering overview Social engineering, by definition, is the psychological manipulation of a person to get useful and sensitive information from them, which can later be used to compromise a system. In other words, criminals use social engineering to gain confidential information from people, by taking advantage of human behavior. Social Engineering Engagement Framework The Social Engineering Engagement Framework (SEEF) is a framework developed by Dominique C. Brack and Alexander Bahmram. It summarizes years of experience in information security and defending against social engineering. The stakeholders of the framework are organizations, governments, and individuals (personals). Social engineering engagement management goes through three steps: Pre-engagement process: Preparing the social engineering operation During-engagement process: The engagement occurs Post-engagement process: Delivering a report There are many social engineering techniques used by criminals: Baiting: Convincing the victim to reveal information, promising him a reward or a gift. Impersonation: Pretending to be someone else. Dumpster diving: Collecting valuable information (papers with addresses, emails, and so on) from dumpsters. Shoulder surfing: Spying on other peoples' machines from behind them, while they are typing. Phishing: This is the most often used technique; it occurs when an attacker, masquerading as a trusted entity, dupes a victim into opening an email, instant message, or text message. Steps of social engineering penetration testing Penetration testing simulates a black hat hacker attack in order to evaluate the security posture of a company for deploying the required safeguard. Penetration testing is a methodological process, and it goes through well-defined steps. There are many types of penetration testing: White box pentesting Black box pentesting Grey box pentesting To perform a social engineering penetration test, you need to follow the following steps: Building real-time phishing attack detectors using different machine learning models In the next sections, we are going to learn how to build machine learning phishing detectors. We will cover the following two methods: Phishing detection with logistic regression Phishing detection with decision trees Phishing detection with logistic regression In this section, we are going to build a phishing detector from scratch with a logistic regression algorithm. Logistic regression is a well-known statistical technique used to make binomial predictions (two classes). Like in every machine learning project, we will need data to feed our machine learning model. For our model, we are going to use the UCI Machine Learning Repository (Phishing Websites Data Set). You can check it out at https://archive.ics.uci.edu/ml/datasets/Phishing+Websites: The dataset is provided as an arff file: The following is a snapshot from the dataset: For better manipulation, we have organized the dataset into a csv file: As you probably noticed from the attributes, each line of the dataset is represented in the following format – {30 Attributes (having_IP_Address URL_Length, abnormal_URL and so on)} + {1 Attribute (Result)}: For our model, we are going to import two machine learning libraries, NumPy and scikit-learn. Let's open the Python environment and load the required libraries: >>> import numpy as np >>> from sklearn import * >>> from sklearn.linear_model import LogisticRegression >>> from sklearn.metrics import accuracy_score Next, load the data: training_data = np.genfromtxt('dataset.csv', delimiter=',', dtype=np.int32) Identify the inputs (all of the attributes, except for the last one) and the outputs (the last attribute): >>> inputs = training_data[:,:-1] >>> outputs = training_data[:, -1] We need to divide the dataset into training data and testing data: training_inputs = inputs[:2000] training_outputs = outputs[:2000] testing_inputs = inputs[2000:] testing_outputs = outputs[2000:] Create the scikit-learn logistic regression classifier: classifier = LogisticRegression() Train the classifier: classifier.fit(training_inputs, training_outputs) Make predictions: predictions = classifier.predict(testing_inputs) Let's print out the accuracy of our phishing detector model: accuracy = 100.0 * accuracy_score(testing_outputs, predictions) print ("The accuracy of your Logistic Regression on testing data is: " + str(accuracy)) The accuracy of our model is approximately 85%. This is a good accuracy, since our model detected 85 phishing URLs out of 100. But let's try to make an even better model with decision trees, using the same data. Phishing detection with decision trees To build the second model, we are going to use the same machine learning libraries, so there is no need to import them again. However, we are going to import the decision tree classifier from sklearn: >>> from sklearn import tree Create the tree.DecisionTreeClassifier() scikit-learn classifier: classifier = tree.DecisionTreeClassifier() Train the model: classifier.fit(training_inputs, training_outputs) Compute the predictions: predictions = classifier.predict(testing_inputs) Calculate the accuracy: accuracy = 100.0 * accuracy_score(testing_outputs, predictions) Then, print out the results: print ("The accuracy of your decision tree on testing data is: " + str(accuracy)) The accuracy of the second model is approximately 90.4%, which is a great result, compared to the first model. We have now learned how to build two phishing detectors, using two machine learning techniques. NLP in-depth overview NLP is the art of analyzing and understanding human languages by machines. According to many studies, more than 75% of the used data is unstructured. Unstructured data does not have a predefined data model or not organized in a predefined manner. Emails, tweets, daily messages and even our recorded speeches are forms of unstructured data. NLP is a way for machines to analyze, understand, and derive meaning from natural language. NLP is widely used in many fields and applications, such as: Real-time translation Automatic summarization Sentiment analysis Speech recognition Build chatbots Generally, there are two different components of NLP: Natural Language Understanding (NLU): This refers to mapping input into a useful representation. Natural Language Generation (NLG): This refers to transforming internal representations into useful representations. In other words, it is transforming data into written or spoken narrative. Written analysis for business intelligence dashboards is one of NLG applications. Every NLP project goes through five steps. To build an NLP project the first step is identifying and analyzing the structure of words. This step involves dividing the data into paragraphs, sentences, and words. Later we analyze the words in the sentences and relationships among them. The third step involves checking the text for meaningfulness. Then, analyzing the meaning of consecutive sentences. Finally, we finish the project by the pragmatic analysis. Open source NLP libraries There are many open source Python libraries that provide the structures required to build real-world NLP applications, such as: Apache OpenNLP GATE NLP library Stanford NLP And, of course, Natural Language Toolkit (NLTK) Let's fire up our Linux machine and try some hands-on techniques. Open the Python terminal and import nltk: >>> import nltk Download a book type, as follows: >>> nltk.download() You can also type: >> from nltk.book import * To get text from a link, it is recommended to use the urllib module to crawl a website: >>> from urllib import urlopen >>> url = "http://www.URL_HERE/file.txt" As a demonstration, we are going to load a text called Security.in.Wireless.Ad.Hoc.and.Sensor.Networks: We crawled the text file, and used len to check its length and raw[:50] to display some content. As you can see from the screenshot, the text contains a lot of symbols that are useless for our projects. To get only what we need, we use tokenization: >>> tokens = nltk.word_tokenize(raw) >>> len(tokens) > tokens[:10] To summarize what we learned in the previous section, we saw how to download a web page, tokenize the text, and normalize the words. Spam detection with NLTK Now it is time to build our spam detector using the NLTK. The principle of this type of classifier is simple; we need to detect the words used by spammers. We are going to build a spam/non-spam binary classifier using Python and the nltk library, to detect whether or not an email is spam. First, we need to import the library as usual: >>> import nltk We need to load data and feed our model with an emails dataset. To achieve that, we can use the dataset delivered by the Internet CONtent FIltering Group. You can visit the website at https://labs-repos.iit.demokritos.gr/skel/i-config/: Basically, the website provides four datasets: Ling-spam PU1 PU123A Enron-spam For our project, we are going to use the Enron-spam dataset: Let's download the dataset using the wget command: Extract the tar.gz file by using the tar -xzf enron1.tar.gz command: Shuffle the cp spam/* emails && cp ham/* emails object: To shuffle the emails, let's write a small Python script, Shuffle.py, to do the job: import os import random #initiate a list called emails_list emails_list = [] Directory = '/home/azureuser/spam_filter/enron1/emails/' Dir_list = os.listdir(Directory) for file in Dir_list: f = open(Directory + file, 'r') emails_list.append(f.read()) f.close() Just change the directory variable, and it will shuffle the files: After preparing the dataset, you should be aware that, as we learned previously, we need to tokenize the emails: >> from nltk import word_tokenize Also, we need to perform another step, called lemmatizing. Lemmatizing connects words that have different forms, like hacker/hackers and is/are. We need to import WordNetLemmatizer: >>> from nltk import WordNetLemmatizer Create a sentence for the demonstration, and print out the result of the lemmatizer: >>> [lemmatizer.lemmatize(word.lower()) for word in word_tokenize(unicode(sentence, errors='ignore'))] Then, we need to remove stopwords, such as of, is, the, and so on: from nltk.corpus import stopwords stop = stopwords.words('english') To process the email, a function called Process must be created, to lemmatize and tokenize our dataset: def Process(data): lemmatizer = WordNetLemmatizer() return [lemmatizer.lemmatize(word.lower()) for word in word_tokenize(unicode(sentence, errors='ignore'))] The second step is feature extraction, by reading the emails' words: from collections import Counter def Features_Extraction(text, setting): if setting=='bow': # Bow means bag-of-words return {word: count for word, count in Counter(Process(text)).items() if not word in stop} else: return {word: True for word in Process(text) if not word in stop} Extract the features: features = [(Features_Extraction(email, 'bow'), label) for (email, label) in emails] Now, let's define training the model Python function: def training_Model (Features, samples): Size = int(len(Features) * samples) training , testing = Features[:Size], Features[Size:] print ('Training = ' + str(len(training)) + ' emails') print ('Testing = ' + str(len(testing)) + ' emails') As a classification algorithm, we are going to use NaiveBayesClassifier: from nltk import NaiveBayesClassifier, classify classifier = NaiveBayesClassifier.train(training) Finally, we define the evaluation Python function: def evaluate(training, tesing, classifier): print ('Training Accuracy is ' + str(classify.accuracy(classifier, train_set))) print ('Testing Accuracy i ' + str(classify.accuracy(classifier, test_set))) In this article, we learned to detect phishing attempts by building three different projects from scratch. First, we discovered how to develop a phishing detector using two different machine learning techniques—logistic regression and decision trees. The third project was a spam filter, based on NLP and Naive Bayes classification. To become a master at penetration testing using machine learning with Python, check out this book Mastering Machine Learning for Penetration Testing. Google’s Protect your Election program: Security policies to defend against state-sponsored phishing attacks, and influence campaigns How the Titan M chip will improve Android security New cybersecurity threats posed by artificial intelligence

0
0
50588

How-To Tutorials

article-image-implementing-autocompletion-in-a-react-material-ui-application-tutorial

Bhagyashree R

16 May 2019

14 min read

Implementing autocompletion in a React Material UI application [Tutorial]

Bhagyashree R

16 May 2019

14 min read

0
0
50489

How-To Tutorials

Prompt Engineering with Azure Prompt Flow

Lambda Functions

How to manage complex applications using Kubernetes-based Helm tool [Tutorial]

Using Gerrit with GitHub

Building your own Basic Behavior tree in Unity [Tutorial]

Exploring the Strategy Behavioral Design Pattern in Node.js

Installing jQuery

How to create a conversational assistant or chatbot using Python

React Native VS Xamarin: Which is the better cross-platform mobile development framework?

What are REST verbs and status codes [Tutorial]

Trending Topics

Understanding Docker

How to secure ElasticCache in AWS

Working with Forms using REST API

Using machine learning for phishing domain detection [Tutorial]

Implementing autocompletion in a React Material UI application [Tutorial]

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access