Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-make-your-own-siri-with-openai-whisper-and-bark
Louis Owen
18 Oct 2023
7 min read
Save for later

Make your own Siri with OpenAI Whisper and Bark

Louis Owen
18 Oct 2023
7 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionChatGPT has earned its reputation as a versatile and capable assistant. From helping you craft the perfect piece of writing, planning your next adventure, aiding your coding endeavors, or simply engaging in light-hearted conversations, ChatGPT can do it all. It's like having a digital Swiss Army knife at your fingertips. But have you ever wondered what it would be like if ChatGPT could communicate with you not just through text, but also through speech? Imagine the convenience of issuing voice commands and receiving spoken responses, just like your own personal Siri. Well, the good news is, that this is now possible thanks to the remarkable combination of OpenAI Whisper and Bark.Bringing the power of voice interaction to ChatGPT is a game-changer. Instead of typing out your queries and waiting for text-based responses, you can seamlessly converse with ChatGPT, making your interactions more natural and efficient. Whether you're a multitasking enthusiast, a visually impaired individual, or someone who prefers spoken communication, this development holds incredible potential.So, how is this magic achieved? The answer lies in the fusion of two crucial components: Speech-to-Text (STT) and Text-to-Speech (TTS) modules.STT, as the name suggests, is the technology responsible for converting spoken words into text. OpenAI's Whisper is a groundbreaking pre-trained model for Automatic Speech Recognition (ASR) and speech translation. The model has been trained on an astonishing 680,000 hours of labeled data, giving it an impressive ability to adapt to a variety of datasets and domains without the need for fine-tuning.Whisper comes in two flavors: English-only and multilingual models. The English-only models are trained for the specific task of speech recognition, where they accurately predict transcriptions in the same language as the spoken audio. The multilingual models, on the other hand, are trained to handle both speech recognition and speech translation. In this case, the model predicts transcriptions in a language different from the source audio, adding an extra layer of versatility. Imagine speaking in one language and having ChatGPT instantly respond in another - Whisper makes it possible.On the other side of the conversation, we have Text-to-Speech (TTS) technology. This essential component converts ChatGPT's textual responses into lifelike speech. Bark, an open-source model developed by Suno AI, is a transformer-based text-to-speech marvel. It's what makes ChatGPT's spoken responses sound as engaging and dynamic as Siri's.Just like with Whisper, Bark is a reliable choice for its remarkable ability to turn text into speech, creating a human-like conversational experience. ChatGPT now not only thinks like a human but speaks like one too, thanks to Bark.The beauty of this integration is that it doesn't require you to be a tech genius. HuggingFace, a leading platform for natural language processing, supports both the TTS and STT pipeline. In simpler terms, it streamlines the entire process, making it accessible to anyone.You don't need to be a master coder or AI specialist to make it work. All you have to do is select the model you prefer for STT (Whisper) and another for TTS (Bark). Input your commands and queries, and let HuggingFace take care of the rest. The result? An intelligent, voice-activated ChatGPT can assist you with whatever you need.Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to utilize both Whisper and Bark along with OpenAI GPT-3.5-Turbo to create your own Siri!Building the STTOpenAI Whisper is a powerful ASR/STT model that can be seamlessly integrated into your projects. It has been pre-trained on an extensive dataset, making it highly capable of recognizing and transcribing spoken language.Here's how you can use OpenAI Whisper for STT with HuggingFace pipeline. Note that the `sample_audio` here will be the user’s command to the ChatGPT.from transformers import pipeline stt = pipeline( "automatic-speech-recognition", model="openai/whisper-medium", chunk_length_s=30, device=device, ) text = stt(sample_audio, return_timestamps=True)["text"]The foundation of any AI model's prowess lies in the data it's exposed to during its training. Whisper is no exception. This ASR model has been trained on a staggering 680,000 hours of audio data and the corresponding transcripts, all carefully gathered from the vast landscape of the internet.Here's how that massive amount of data is divided:● English Dominance (65%): A substantial 65% of the training data, which equates to a whopping 438,000 hours, is dedicated to English-language audio and matched English transcripts. This abundance of English data ensures that Whisper excels in transcribing English speech accurately.● Multilingual Versatility (18%): Whisper doesn't stop at English. About 18% of its training data, roughly 126,000 hours, focuses on non-English audio paired with English transcripts. This diversity makes Whisper a versatile ASR model capable of handling different languages while still providing English transcriptions.● Global Reach (17%): The remaining 17%, which translates to 117,000 hours, is dedicated to non-English audio and the corresponding transcripts. This extensive collection represents a stunning 98 different languages. Whisper's proficiency in transcribing non-English languages is a testament to its global reach.Getting the LLM ResponseWith the user's speech command now transcribed into text, the next step is to harness the power of ChatGPT or GPT-3.5-Turbo. This is where the real magic happens. These advanced language models have achieved fame for their diverse capabilities, whether you need help with writing, travel planning, coding, or simply engaging in a friendly conversation.There are several ways to integrate ChatGPT into your system:LangChain: LangChain offers a seamless and efficient way to connect with ChatGPT. It enables you to interact with the model programmatically, making it a preferred choice for developers.OpenAI Python Client: The OpenAI Python client provides a user-friendly interface for accessing ChatGPT. It simplifies the integration process and is a go-to choice for Python developers.cURL Request: For those who prefer more direct control, cURL requests to the OpenAI endpoint allow you to interact with ChatGPT through a RESTful API. This method is versatile and can be integrated into various programming languages.No matter which method you choose, ChatGPT will take your transcribed speech command and generate a thoughtful, context-aware text-based response, ready to assist you in any way you desire. We’ll not deep dive into this in this article since there are numerous articles explaining this already.Building the TTSThe final piece of the puzzle is Bark, an open-source TTS model. Bark works its magic by converting ChatGPT's textual responses into lifelike speech, much like Siri talks to you. It adds that crucial human touch to the conversation, making your interactions with ChatGPT feel more natural and engaging.Again, we can build the TTS pipeline very easily with the help of HuggingFace pipeline. Here's how you can use Bark for TTS with HuggingFace pipeline. Note that the `text` here will be the ChatGPT response to the user’s command.from transformers import pipeline tts = pipeline("text-to-speech", model="suno/bark-small") response = tts(text) from IPython.display import Audio Audio(response["audio"], rate=response["sampling_rate"])You can see the example quality of the Bark model in this Google Colab notebook.ConclusionCongratulations on keeping up to this point! Throughout this article, you have learned how to build your own Siri with the help of OpenAI Whisper, ChatGPT, and Bark. Hope the best for your experiment in creating your own Siri and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects.Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.
Read more
  • 0
  • 0
  • 7844

article-image-building-chat-application
Packt
27 Jun 2013
4 min read
Save for later

Building a Chat Application

Packt
27 Jun 2013
4 min read
(For more resources related to this topic, see here.) The following is a screenshot of our chat application: Creating a project To begin developing our chat application, we need to create an Opa project using the following Opa command: opa create chat This command will create an empty Opa project. Also, it will generate the required directories and files automatically with the structure as shown in the following screenshot: Let's have a brief look at what these source code files do: controller.opa: This file serves as the entry point of the chat application; we start the web server in controller.opa view.opa: This file serves as an user interface model.opa: This is the model of the chat application; it defines the message, network, and the chat room style.css: This is an external stylesheet file Makefile: This file is used to build an application As we do not need database support in the chat application, we can remove --import-package stdlib.database.mongo from the FLAG option in Makefile. Type make and make run to run the empty application. Launching the web server Let's begin with controller.opa, the entry point of our chat application where we launch the web server. We have already discussed the function Server.start in the Server module section. In our chat application, we will use a handlers group to handle users requests. Server.start(Server.http, [ {resources: @static_resource_directory("resources")}, {register: [{css:["/resources/css/style.css"]}]}, {title:"Opa Chat", page: View.page } ]) So, what exactly are the arguments that we are passing to the Server.start function? The line {resources: @static_resource_direcotry("resources")} registers a resource handler and will serve resource files in the resources directory. Next, the line {register: [{css:["/resources/css/style.css"]}]} registers an external CSS file—style.css. This permits us to use styles in the style.css application scope. Finally, the line {title:"Opa Chat", page: View.page} registers a single page handler that will dispatch all other requests to the function View.page. The server uses the default configuration Server.http and will run on port 8080. Designing user interface When the application starts, all the requests (except requests for resources) will be distributed to the function View.page, which displays the chat page on the browser. Let's take a look at the view part; we define a module named View in view.opa. import stdlib.themes.bootstrap.css module View { function page(){ user = Random.string(8) <div id=#title class="navbar navbar-inverse navbar-fixed-top"> <div class=navbar-inner> <div id=#logo /> </div> </div> <div id=#conversation class=container-fluid onready={function(_){Model.join(updatemsg)}} /> <div id=#footer class="navbar navbar-fixed-bottom"> <div class=input-append> <input type=text id=#entry class=input-xxlarge onnewline={broadcast(user)}/> <button class="btn btn-primary" onclick={broadcast(user)}>Post</button> </div> </div> } ... } The module View contains functions to display the page on the browser. In the first line, import stdlib.themes.bootstrap.css, we import Bootstrap styles. This permits us to use Bootstrap markup in our code, such as navbar, navbar-fixtop, and btn-primary. We also registered an external style.css file so we can use styles in style.css such as conversation and footer. As we can see, this code in the function page follows almost the same syntax as HTML. As discussed in earlier, we can use HTML freely in the Opa code, the HTML values having a predefined type xhtml in Opa. Summary In this article, we started by creating and a project and launching the web server. Resources for Article : Further resources on this subject: MySQL 5.1 Plugin: HTML Storage Engine—Reads and Writes [Article] Using jQuery and jQueryUI Widget Factory plugins with RequireJS [Article] Oracle Web RowSet - Part1 [Article]
Read more
  • 0
  • 0
  • 7844

article-image-setting-development-environment-android-wear-applications
Packt
16 Nov 2016
8 min read
Save for later

Setting up Development Environment for Android Wear Applications

Packt
16 Nov 2016
8 min read
"Give me six hours to chop down a tree and I will spend the first four sharpening the axe." -Abraham Lincoln In this article by Siddique Hameed, author of the book, Mastering Android Wear Application Development, they have discussed the steps, topics and process involved in setting up the development environment using Android Studio. If you have been doing Android application development using Android Studio, some of the items discussed here might already be familiar to you. However, there are some Android Wear platform specific items that may be of interest to you. (For more resources related to this topic, see here.) Android Studio Android Studio Integrated Development Environment (IDE) is based on IntelliJ IDEA platform. If you had done Java development using IntelliJ IDEA platform, you'll be feeling at home working with Android Studio IDE. Android Studio platform comes bundled with all the necessary tools and libraries needed for Android application development. If this is the first time you are setting up Android Studio on your development system, please make sure that you have satisfied all the requirements before installation. Please refer developer site (http://developer.android.com/sdk/index.html#Requirements) for checking on the items needed for the operating system of your choice. Please note that you need at least JDK version 7 installed on your machine for Android Studio to work. You can verify your JDK's version that by typing following commands shown in the following terminal window: If your system does not meet that requirement, please upgrade it using the method that is specific to your operating system. Installation Android Studio platform includes Android Studio IDE, SDK Tools, Google API Libraries and systems images needed for Android application development. Visit the http://developer.android.com/sdk/index.html URL for downloading Android Studio for your corresponding operating system and following the installation instruction. Git and GitHub Git is a distributed version control system that is used widely for open-source projects. We'll be using Git for sample code and sample projects as we go along the way. Please make sure that you have Git installed on your system by typing the command as shown in the following a terminal window. If you don't have it installed, please download and install it using this link for your corresponding operating system by visting https://git-scm.com/downloads. If you are working on Apple's Macintosh OS like El Capitan or Yosemite or Linux distributions like Ubuntu, Kubuntu or Mint, chances are you already have Git installed on it. GitHub (http://github.com) is a free and popular hosting service for Git based open-source projects. They make checking out and contributing to open-source projects easier than ever. Sign up with GitHub for a free account if you don't have an account already. We don't need to be an expert on Git for doing Android application development. But, we do need to be familiar with basic usages of Git commands for working with the project. Android Studio comes by default with Git and GitHub integration. It helps importing sample code from Google's GitHub repository and helps you learn by checking out various application code samples. Gradle Android application development uses Gradle (http://gradle.org/)as the build system. It is used to build, test, run and package the apps for running and testing Android applications. Gradle is declarative and uses convention over configuration for build settings and configurations. It manages all the library dependencies for compiling and building the code artifacts. Fortunately, Android Studio abstracts most of the common Gradle tasks and operations needed for development. However, there may be some cases where having some extra knowledge on Gradle would be very helpful. We won't be digging into Gradle now, we'll be discussing about it as and when needed during the course of our journey. Android SDK packages When you install Android Studio, it doesn't include all the Android SDK packages that are needed for development. The Android SDK separates tools, platforms and other components & libraries into packages that can be downloaded, as needed using the Android SDK Manager. Before we start creating an application, we need to add some required packages into the Android SDK. Launch SDK Manager from Android Studio Tools | Android | SDK Manager. Let's quickly go over a few items from what's displayed in the preceding screenshot. As you can see, the Android SDK's location is /opt/android-sdk on my machine, it may very well be different on your machine depending on what you selected during Android Studio installation setup. The important point to note is that the Android SDK is installed on a different location than the Android Studio's path (/Applications/Android Studio.app/). This is considered a good practice because the Android SDK installation can be unaffected depending on a new installation or upgrade of Android Studio or vice versa. On the SDK Platforms tab, select some recent Android SDK versions like Android 6.0, 5.1.1, and 5.0.1. Depending on the Android versions you are planning on supporting in your wearable apps, you can select other older Android versions. Checking on Show Package Details option on the bottom right, the SDK Manager will show all the packages that will be installed for a given Android SDK version. To be on the safer side, select all the packages. As you may have noticed already Android Wear ARM and Intel system images are included in the package selection. Now when you click on SDK Tools tab, please make sure the following items are selected: Android SDK Build Tools Android SDK Tools 24.4.1 (Latest version) Android SDK Platform-Tools Android Support Repository, rev 25 (Latest version) Android Support Library, rev 23.1.1 (Latest version) Google Play services, rev 29 (Latest version) Google Repository, rev 24 (Latest version) Intel X86 Emulator Accelerator (HAXM installer), rev 6.0.1 (Latest version) Documentation for Android SDK (Optional) Please do not change anything on SDK Update Sites. Keep the update sites as it was configured by default. Clicking on OK will take some time downloading and installing all the components and packages selected. Android Virtual Device Android Virtual Devices will enable us to test the code using Android Emulators. It lets us pick and choose various Android system target versions and form factors needed for testing. Launch Android Virtual Device Manager from Tools | Android | AVD Manager From Android Virtual Device Manager window, click on Create New Virtual Device button on the bottom left and proceed to the next screen and select Wear category Select Marshmallow API Level 23 on x86 and everything else as default, as shown in the following screenshot: Note that the current latest Android version is Marshmallow of API level 23 at the time of this writing. It may or may not be the latest version while you are reading this article. Feel free to select the latest version that is available during that time. Also, if you'd like to support or test in earlier Android versions, please feel free to do so in that screen. After the virtual device is selected successfully, you should see that listed on the Android Virtual Devices list as show in the following screenshot: Although it's not required to use real Android Wear device during development, sometimes it may be convenient and faster developing it in a real physical device. Let's build a skeleton App Since we have all the components and configurations needed for building wearable app, let's build a skeleton app and test out what we have so far. From Android Studio's Quick Start menu, click on Import an Android code sample tab: Select the Skeleton Wearable App from Wearable category as shown in following screenshot: Click Next and select your preferred project location. As you can see the skeleton project is cloned from Google's sample code repository from GitHub. Clicking on Finish button will pull the source code and Android Studio will compile and build the code and get it ready for execution. The following screenshot indicates that the Gradle build finished successfully without any errors. Click on Run configuration to run the app: When the app starts running, Android Studio will prompt us to select the deployment targets, we can select the emulator we created earlier and click OK. After the code compiles and uploaded to the emulator, the main activity of the Skeleton App will be launched as shown below: Clicking on SHOW NOTIFICATION will show the notification as below: Clicking on START TIMER will start the timer and run for five seconds and clicking on Finish Activity will close the activity take the emulator to the home screen. Summary We discussed the process involved in setting up the Android Studio development environment by covering the installation instruction, requirements and SDK tools, packages and other components needed for Android Wear development. We also checked out source code for Skeleton Wearable App from Google's sample code repository and successfully ran and tested it on Android device emulator. Resources for Article: Further resources on this subject: Building your first Android Wear Application [Article] Getting started with Android Development [Article] The Art of Android Development Using Android Studio [Article]
Read more
  • 0
  • 0
  • 7841

article-image-tips-and-tricks-backtrack-4
Packt
26 May 2011
7 min read
Save for later

Tips and Tricks on BackTrack 4

Packt
26 May 2011
7 min read
  BackTrack 4: Assuring Security by Penetration Testing Master the art of penetration testing with BackTrack         Read more about this book       (For more resources on this subject, see here.) Updating the kernel The update process is enough for updating the software applications. However, sometimes you may want to update your kernel, because your existing kernel doesn't support your new device. Please remember that because the kernel is the heart of the operating system, failure to upgrade may cause your BackTrack to be unbootable. You need to make a backup of your kernel and configuration. You should ONLY update your kernel with the one made available by the BackTrack developers. This Linux kernel is modified to make certain "features" available to the BackTrack users and updating with other kernel versions could disable those features.   Multiple Customized installations One of the drawbacks we found while using BackTrack 4 is that you need to perform a big upgrade (300MB to download) after you've installed it from the ISO or from the VMWare image provided. If you have one machine and a high speed Internet connection, there's nothing much to worry about. However, imagine installing BackTrack 4 in several machines, in several locations, with a slow internet connection. The solution to this problem is by creating an ISO image with all the upgrades already installed. If you want to install BackTrack 4, you can just install it from the newly created ISO image. You won't have to download the big upgrade again. While for the VMWare image, you can solve the problem by doing the upgrade in the virtual image, then copying that updated virtual image to be used in the new VMWare installation.   Efficient methodology Combining the power of both methodologies, Open Source Security Testing Methodology Manual (OSSTMM) and Information Systems Security Assessment Framework (ISSAF) does provide sufficient knowledge base to assess the security of an enterprise environment efficiently.   Can't find the dnsmap program In our testing, the dnsmap-bulk script is not working because it can't find the dnsmap program. To fix it, you need to define the location of the dnsmap executable. Make sure you are in the dnsmap directory (/pentest/enumeration/dns/dnsmap). Edit the dnsmap-bulk.sh file using nano text editor and change the following: dnsmap $i elif [[ $# -eq 2 ]] then dnsmap $i -r $2 to ./dnsmap $i elif [[ $# -eq 2 ]] then ./dnsmap $i -r $2 and save your changes.   fierce Version Currently, the fierce Version 1 included with BackTrack 4 is no longer maintained by the author (Rsnake). He has suggested using fierce Version 2 that is still actively maintained by Jabra. fierce Version 2 is a rewrite of fierce Version 1. It also has several new features such as virtual host detection, subdomain and extension bruteforcing, template based output system, and XML support to integrate with Nmap. Since fierce Version 2 is not released yet and there is no BackTrack package for it, you need to get it from the development server by issuing the Subversion check out command: #svn co https://svn.assembla.com/svn/fierce/fierce2/trunk/fierce2/ Make sure you are in the /pentest/enumeration directory first before issuing the above command. You may need to install several Perl modules before you can use fierce v2 correctly.   Relationship between "Vulnerability" and "Exploit" A vulnerability is a security weakness found in the system which can be used by the attacker to perform unauthorized operations, while the exploit is a piece of code (proof-of-concept or PoC) written to take advantage of that vulnerability or bug in an automated fashion.   CISCO Privilege modes There are 16 different privilege modes available for the Cisco devices, ranging from 0 (most restricted level) to 15 (least restricted level). All the accounts created should have been configured to work under the specific privilege level. More information on this is available at http://www.cisco.com/en/US/docs/ios/12_2t/12_2t13/feature/guide/ftprienh.html.   Cisco Passwd Scanner The Cisco Passwd Scanner has been developed to scan the whole bunch of IP addresses in a specific network class. This class can be represented as A, B, or C in terms of network computing. Each class has it own definition for a number of hosts to be scanned. The tool is much faster and efficient in handling multiple threads in a single instance. It discovers those Cisco devices carrying default telnet password "cisco". We have found a number of Cisco devices vulnerable to default telnet password "cisco".   Common User Passwords Profiler (CUPP) As a professional penetration tester you may find a situation where you hold the target's personal information but are unable to retrieve or socially engineer his e-mail account credentials due to certain variable conditions, such as, the target does not use the Internet often, doesn't like to talk to strangers on the phone, and may be too paranoid to open unknown e-mails. This all comes to guessing and breaking the password based on various password cracking techniques (dictionary or brute force method). CUPP is purely designed to generate a list of common passwords by profiling the target name, birthday, nickname, family member's information, pet name, company, lifestyle patterns, likes, dislikes, interests, passions, and hobbies. This activity serves as crucial input to the dictionary-based attack method while attempting to crack the target's e-mail account password.   Extract particular information from the exploits list Using the power of bash commands we can manipulate the output of any text file in order to retrieve meaningful data. This can be accomplished by typing in cat files.csv |grep '"' |cut -d";" -f3 on your console. It will extract the list of exploit titles from a files.csv. To learn the basic shell commands please refer to an online source at: http://tldp.org/LDP/abs/html/index.html.   "inline" and "stager" type payload An inline payload is a single self-contained shell code that is to be executed with one instance of an exploit. While the stager payload creates a communication channel between the attacker and victim machine to read-off the rest of the staging shell code to perform the specific task, it is often common practice to choose stager payloads because they are much smaller in size than inline payloads.   Extending attack landscape by gaining deeper access to the target's network that is inaccessible from outside Metasploit provides a capability to view and add new routes to the destination network using the "route add targetSubnet targetSubnetMask SessionId" command (for example, route add 10.2.4.0 255.255.255.0 1). The "SessionId" is pointing to the existing meterpreter session (also called gateway) created after successful exploitation. The "targetSubnet" is another network address (also called dual homed Ethernet IP-address) attached to our compromised host. Once you set a metasploit to route all the traffic through a compromised host session, we are then ready to penetrate further into a network which is normally non-routable from our side. This terminology is commonly known as Pivoting or Foot-holding.   Evading Antivirus Protection Using Metasploit Using a tool called msfencode located at /pentest/exploits/framework3, we can generate a self-protected executable file with encoded payload. This should be parallel to the msfpayload file generation process. A "raw" output from Msfpayload will be piped into Msfencode to use specific encoding technique before outputting the final binary. For instance, execute ./msfpayload windows/shell/reverse_tcp LHOST=192.168.0.3 LPORT=32323 R | ./msfencode -e x86/shikata_ga_nai -t exe > /tmp/tictoe.exe to generate an encoded version of a reverse shell executable file. We strongly suggest you to use the "stager" type payloads instead of "inline" payloads, as they have a greater probability of success in bypassing major malware defenses due to their indefinite code signatures.   Stunnel version 3 BackTrack also comes with Stunnel version 3. The difference with Stunnel version 4 is that the version 4 uses a configuration file. If you want to run the version 3 style command-line arguments, you can call the command stunnel or stunnel3 with all of the needed arguments. Summary In this article we will take a look at some tips and tricks to make the best use of BackTrack OS. Further resources on this subject: FAQs on BackTrack 4 [Article] BackTrack 4: Target Scoping [Article] BackTrack 4: Security with Penetration Testing Methodology [Article] Blocking Common Attacks using ModSecurity 2.5 [Article] Telecommunications and Network Security Concepts for CISSP Exam [Article] Preventing SQL Injection Attacks on your Joomla Websites [Article]
Read more
  • 0
  • 0
  • 7840

article-image-selecting-dom-elements-using-mootools-12-part-1
Packt
07 Jan 2010
7 min read
Save for later

Selecting DOM Elements using MooTools 1.2: Part 1

Packt
07 Jan 2010
7 min read
In order to successfully and effortlessly write unobtrusive JavaScript, we must have a way to point to the Document Object Model (DOM) elements that we want to manipulate. The DOM is a representation of objects in our HTML and a way to access and manipulate them. In traditional JavaScript, this involves a lot (like, seriously a lot) of code authoring, and in many instances, a lot of head-banging-against-wall-and-pulling-out-hair as you discover a browser quirk that you must solve. Let me save you some bumps, bruises, and hair by showing you how to select DOM elements the MooTools way. This article will cover how you can utilize MooTools to select/match simple (like, "All div elements") up to the most complex and specific elements (like, "All links that are children of a span that has a class of awesomeLink and points to http://superAwesomeSite.com MooTools and CSS selectors MooTools selects an element (or a set of elements) in the DOM using CSS selectors syntax. Just a quick refresher on CSS terminology, a CSS style rule consists of: selector { property: property value; } selector: indicates what elements will be affected by the style rule. property: the CSS property (also referred to as attribute). For example, color is a CSS property, so is font-style. You can have multiple property declarations in one style rule property value: the value you want assigned to the property. For example, bold is a possible CSS property value of the font-weight CSS property. For example, if you wanted to select a paragraph element with an ID of awesomeParagraph to give it a red color (in hexadecimal, this is #ff0), in CSS syntax you'd write: #awesomeParagraph { color: #ff0; } Also, if I wanted to increase its specificity and make sure that only paragraph elements that have an ID of awesomeParagraph is selected: p#awesomeParagraph { color: #ff0;} You'll be happy to know that this same syntax ports over to MooTools. What's more is that you'll be able to take advantage of all of CSS3's more complex selection syntax because even though CSS3 isn't supported by all browsers yet, MooTools supports them already. So you don't have to learn another syntax for writing selectors, you can use your existing knowledge of CSS - awesome, isn't it? Working with the $() and $$() functions The $() and $$() functions are the bread and butter of MooTools. When you're working with unobtrusive JavaScript, you need to specify which elements you want to operate on. The dollar and dollars functions help you do just that - they will allow you to specify what elements you want to work on. Notice, the dollar sign $ is shaped like an S, which we can interpret to mean 'select'. The $() dollar function The dollar function is used for getting an element by it's ID, which returns a single element that is extended with MooTools Element methods or null if nothing matches. Let's go back to awesomeParagraph in the earlier example. If I wanted to select awesomeParagraph, this is what I would write: $('awesomeParagraph’) By doing so, we can now operate on it by passing methods to the selection. For example, if you wanted to change its style to have a color of red. You can use the .setStyle() method which allows you to specify a CSS property and its matching property value, like so $('awesomeParagraph').setStyle('color', '#ff0'); The $$() dollars function The $$() function is the $() big brother (that's why it gets an extra dollar sign). The $$() function can do more robust and complex selections, can select a group, or groups of element’s and always returns an array object, with or without selected elements in it. Likewise, it can be interchanged with the dollar function. If we wanted to select awesomeParagraph using the dollars function, we would write: $$('#awesomeParagraph') Note that you have to use the pound sign (#) in this case as if you were using CSS selectors. When to use which If you need to select just one element that has an ID, you should use the $() function because it performs better in terms of speed than the $$() function. Use the $$() function to select multiple elements. In fact, when in doubt, use the$$() function because it can do what the $() function can do (and more). A note on single quotes (') versus double quotes ("")The example above would work even if we used double quotes such as $("awesomeParagraph") or $$("#awesomeParagraph") ") - but many MooTools developers prefer single quotes so they don’t have to escape characters as much (since the double quote is often used in HTML, you'll have to do "in order not to prematurely end your strings). It's highly recommended that you use single quotes - but hey, it's your life! Now, let's see these functions in action. Let's start with the HTML markup first. Put the following block of code in an HTML document: <body> <ul id="superList"> <li>List item</li> <li><a href="#">List item link</a></li> <li id="listItem">List item with an ID.</li> <li class="lastItem">Last list item.</li> </ul></body> What we have here is an unordered list. We'll use it to explore the dollar and dollars function. If you view this in your web browser, you should see something like this: Time for action – selecting an element with the dollar function Let's select the list item with an ID of listItem and then give it a red color using the .setStyle() MooTools method. Inside your window.addEvent('domready') method, use the following code: $('listItem').setStyle('color', 'red'); Save the HTML document and view it on your web browser. You should see the 3rd list item will be red. Now let's select the entire unordered list (it has an ID of superList), then give it a black border (in hexadecimal, this is #000000). Place the following code, right below the code the line we wrote in step 1: $('superList').setStyle('border', '1px solid #000000'); If you didn't close your HTML document, hit your browser's refresh. You should now see something like this: Time for action – selecting elements with the dollars function We'll be using the same HTML document, but this time, let's explore the dollars function. We're going to select the last list item (it has a class of lastItem). Using the .getFirst(), we select the first element from the array $$() returned. Then, we're going to use the .get() method to get its text. To show us what it is, we'll pass it to the alert() function. The code to write to accomplish this is: alert( $$('.lastItem').getFirst().get('text') ); Save the HTML document and view it in your web browser (or just hit your browser's refresh button if you still have the HTML document from the preview Time for action open). You should now see the following: Time for action – selecting multiple sets of elements with the dollars function What if we wanted to select multiple sets of elements and run the same method (or methods) on them? Let's do that now. Let's select the list item that has an <a> element inside it and the last list item (class="lastItem") and then animate them to the right by transitioning their margin-left CSS property using the .tween() method. Right below the line we wrote previously, place the following line: $$('li a, .lastItem').tween('margin-left', '50px'); View your work in your web browser. You should see the second and last list item move to the right by 50 pixels. What just happened? We just explored the dollar and dollars function to see how to select different elements and apply methods on them. You just learned to select: An element with an ID (#listItem and #superList) using the dollar $() function An element with a class (.lastItem) using the dollars $$() function Multiple elements by separating them with commas (li and .lastItem) You also saw how you can execute methods on your selected elements. In the example, we used the .setStyle(), .getFirst(), get(), and .tween() MooTools methods. Because DOM selection is imperative to writing MooTools code, before we go any further, let's talk about some important points to note about what we just did.
Read more
  • 0
  • 0
  • 7823

article-image-designing-sizing-building-and-configuring-citrix-vdi-box
Packt
13 Sep 2013
7 min read
Save for later

Designing, Sizing, Building, and Configuring Citrix VDI-in-a-Box

Packt
13 Sep 2013
7 min read
(For more resources related to this topic, see here.) Sizing the servers There are a number of tools and guidelines to help you to size Citrix VIAB appliances. Essentially, the guides cover the following topics: CPU Memory Disk IO Storage In their sizing guides, Citrix classifies users into the following two groups: 4kers Knowledge workers Therefore, the first thing to determine is how many of your proposed VIAB users are task workers, and how many are knowledge workers? Task workers Citrix would define task workers as users who run a small set of simple applications, not very graphical in nature or CPU- or memory-intensive, for example, Microsoft Office and a simple line of business applications. Knowledge workers Citrix would define knowledge workers as users who run multimedia and CPU- and memory-intensive applications. They may include large spreadsheet files, graphics packages, video playback, and so on. CPU Citrix offers recommendations based on CPU cores, such as the following: 3 x desktops per core per knowledge worker 6 x desktops per core per task user 1 x core for the hypervisor These figures can be increased slightly if the CPUs have hyper-threading. You should also add another 15 percent if delivering personal desktops. The sizing information has been gathered from the Citrix VIAB sizing guide PDF. Example 1 If you wanted to size a server appliance to support 50 x task-based users running pooled desktops, you would require 50 / 6 = 8.3 + 1 (for the hypervisor) = 9.3 cores, rounded up to 10 cores. Therefore, a dual CPU with six cores would provide 12 x CPU cores for this requirement. Example 2 If you wanted to size a server appliance to support 15 x task and 10 x knowledge workers you would require (15 / 6 = 2.5) + (10 / 3 = 3.3) + 1 (for the hypervisor) = 7 cores. Therefore, a dual CPU with 4 cores would provide 8 x CPU cores for this requirement. Memory The memory required depends on the desktop OS that you are running and also on the amount of optimization that you have done to the image. Citrix recommends the following guidelines: Task worker for Windows 7 should be 1.5 GB Task worker for Windows XP should be 0.5 GB Knowledge worker Windows 7 should be 2 GB Knowledge worker Windows XP should be 1 GB It is also important to allocate memory for the hypervisor and the VIAB virtual appliance. This can vary depending on the number of users, so we would recommend using the sizing spreadsheet calculator available in the Resources section of the VIAB website. However, as a guide, we would allocate 3 GB memory (based on 50 users) for the hypervisor and 1 GB for VIAB. The amount of memory required by the hypervisor will grow as the number of users on the server grows. Citrix also recommends adding 10 percent more memory for server operations. Example 1 If you wanted to size a server appliance to support 50 x task-based users, with Windows 7, you would require 50 x 1.5 + 4 GB (for VIAB and the hypervisor) = 75 GB + 10% = 87 GB. Therefore, you would typically round this up to a 96 GB memory, providing an ideal configuration for this requirement. Example 2 Therefore, if you wanted to size a server appliance to support 15 x task and 10 x knowledge workers, with Windows 7, you would require (15 x 1.5) + (10 x 2) + 4 GB (for VIAB and the hypervisor) = 75 GB + 10% = 51.5 GB. Therefore, a 64 GB memory would be an ideal configuration for this requirement. Disk IO As multiple Windows images run on the appliances, disk IO becomes very important and can often become the first bottleneck for VIAB.Citrix calculates IOPS with a 40-60 split between read and write OPS, during end user desktop access.Citrix doesn't recommend using slow disks for VIAB and has statistic information for SAS 10 and 15K and SSD disks.The following table shows the IOPS delivered from the following disks: Hard drive RPM IOPS RAID 0 IOPS RAID 1 SSD 6000   15000 175 122.5 10000 125 87.7 The following table shows the IOPS required for task and knowledge workers for Windows XP and Windows 7: Desktop IOPS Windows XP Windows 7 Task user 5 IOS 10 IOPS Knowledge user 10 IOPS 20 IOPS Some organizations decide to implement RAID 1 or 10 on the appliances to reduce the chance of an appliance failure. This does require many more disks however, and significantly increases the cost of the solution. SSD SSD is becoming an attractive proposition for organizations that want to run a larger number of users on each appliance. SSD is roughly 30 times faster than 15K SAS drives, so it will eliminate desktop IO bottlenecks completely. SSD continues to come down in price, so can be well worth considering at the start of a VIAB project. SSDs have no moving mechanical components. Compared with electromechanical disks, SSDs are typically less susceptible to physical shock, run more quietly, have lower access time, and less latency. However, while the price of SSDs has continued to decline, SSDs are still about 7 to 8 times more expensive per unit of storage than HDDs. A further option to consider would be Fusion-IO, which is based on NAND flash memory technology and can deliver an exceptional number of IOPS. Example 1 If you wanted to size a server appliance to support 50 x task workers, with Windows 7, using 15K SAS drives, you would require 175 / 10 = 17.5 users on each disk, therefore, 50 / 17. 5 = 3 x 15K SAS disks. Example 2 If you wanted to size a server appliance to support 15 x task workers and 10 knowledge workers, with Windows 7, you would require the following: 175 / 10 = 17.5 task users on each disk, therefore 15 / 17.5 = 0.8 x 15K SAS disks 175 / 20 = 8.75 knowledge users on each disk, therefore 10 / 8.75 = 1.1 x 15K SAS disks Therefore, 2 x 15K SAS drives would be required. Storage Storage capacity is determined by the number of images, number of desktops, and types of desktop. It is best practice to store user profile information and data elsewhere. Citrix uses the following formula to determine the storage capacity requirement: 2 x golden image x number of images (assume 20 GB for an image) 70 GB for VDI-in-a-Box 15 percent of the size of the image / desktop (achieved with linked clone technology) Example 1 Therefore, if you wanted to size a server appliance to support 50 x task-based users, with two golden Windows 7 images, you would require the following: Space for the golden image: 2 x 20 GB x 2 = 80 GB VIAB appliance space: 70 GB Image space/desktop: 15% x 20 GB x 50 = 150 GB Extra room for swap and transient activity: 100 GB Total: 400 GB Recommended: 500 GB to 1 TB per server We have already specified 3 x 15K SAS drives for our IO requirements. If those were 300-GB disks, they should provide enough storage. This section of the article provides you with a step-by-step guide to help you to build and configure a VIAB solution; starting with the hypervisor install. It then goes onto to cover adding an SSL certificate, the benefits of using the GRID IP Address feature, and how you can use the Kiosk mode to deliver a standard desktop to public access areas. It then covers adding a license file and provides details on the useful features contained within Citrix profile management. It then highlights how VIAB can integrate with other Citrix products such as NetScaler VPX, to enable secure connections across the Internet and GoToAssist, a support and monitoring package which is very useful if you are supporting a number of VIAB appliances across multiple sites. ShareFile can again be a very useful tool to enable data files to follow the user, whether they are connecting to a local device or a virtual desktop. This can avoid the problems of files being copied across the network, delaying users. We then move on to a discussion on the options available for connecting to VIAB, including existing PCs, thin clients, and other devices, including mobile devices. The chapter finishes with some useful information on support for VIAB, including the support services included with subscription and the knowledge forums. Installing the hypervisor All the hypervisors have two elements; the bare metal hypervisor that installs on the server and its management tools that you would typically install on the IT administrator workstations. Bare Metal Hypervisor Management tool Citrix XenServer XenCenter Microsoft Hyper-V Hyper V Manager VMware ESXi vSphere Client It is relatively straightforward to install the hypervisor. Make sure you enable linked clones in XenServer, because this is required for the linked clone technology. Give the hypervisor a static IP address and make a note of the administrator's username and password. You will need to download ISO images for the installation media; if you don't already have them, they can be found on the Internet.
Read more
  • 0
  • 0
  • 7821
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-overview-architecture-and-modeling-cassandra
Packt
21 Jan 2014
5 min read
Save for later

An overview of architecture and modeling in Cassandra

Packt
21 Jan 2014
5 min read
(For more resources related to this topic, see here.) Cassandra uses a peer-to-peer architecture, unlike a master-slave architecture, which is prone to single point of failure (SPOF) problems. Cassandra is deployed on multiple machines with each machine acting as a node in a cluster. Data is autosharded, that is, automatically distributed across nodes using key-based sharding, which means that the keys are used to distribute the data across the cluster. Each key-value data element in Cassandra is replicated across the cluster on other nodes (the default replication is 3) for high availability and fault tolerance. If a node goes down, the data can be served from another node having a copy of the original data. Sharding is an old concept used for distributing data across different systems. Sharding can be horizontal or vertical. In horizontal sharding, in case of RDBMS, data is distributed on the basis of rows, with some rows residing on a single machine and the other rows residing on other machines. Vertical sharding is similar to columnar storage, where columns can be stored separately in different locations. Hadoop Distributed File Systems (HDFS) use data-volumes-based sharding, where a single big file is sharded and distributed across multiple machines using the block size. So, as an example, if the block size is 64 MB, a 640 MB file will be split into 10 chunks and placed in multiple machines. The same autosharding capability is used when new nodes are added to Cassandra, where the new node becomes responsible for a specific key range of data. The details of what node holds what key ranges is coordinated and shared across the cluster using the gossip protocol. So, whenever a client wants to access a specific key, each node locates the key and its associated data quickly within a few milliseconds. When the client writes data to the cluster, the data will be written to the nodes responsible for that key range. However, if the node responsible for that key range is down or not reachable, Cassandra uses a clever solution called Hinted Handoff that allows the data to be managed by another node in the cluster and to be written back on the responsible node once that node is back in the cluster. The replication of data raises the concern of data inconsistency when the replicas might have different states for the same data. Cassandra uses mechanisms such as anti-entropy and read repair for solving this problem and synchronizing data across the replicas. Anti-entropy is used at the time of compaction, where compaction is a concept borrowed from Google BigTable. Compaction in Cassandra refers to the merging of SSTable and helps in optimizing data storage and increasing read performance by reducing the number of seeks across SSTables. Another problem that compaction solves is handling deletion in Cassandra. Unlike traditional RDBMS, all deletes in Cassandra are soft deletes, which means that the records still exist in the underlying data store but are marked with a special flag so that these deleted records do not appear in query results. The records marked as deleted records are called tombstone records. Major compactions handle these soft deletes or tombstones by removing them from the SSTable in the underlying file stores. Cassandra, like Dynamo, uses a Merkle tree data structure to represent the data state at a column family level in a node. This Merkle tree representation is used during major compactions to find the difference in the data states across nodes and reconciled. The Merkle tree or Hash tree is a data structure in the form of a tree where every non-leaf node is labeled with the hash of children nodes, allowing the efficient and secure verification of the contents of the large data structure. Cassandra, like Dynamo, falls under the AP part of the CAP theorem and offers a tunable consistency level. Cassandra provides multiple consistency levels, as illustrated in the following table: Operation ZERO ANY ONE QUORUM ALL Read Not supported Not supported Reads from one node   Read from a majority of nodes with replicas Read from all the nodes with replicas Write Asynchronous write Writes on one node including hints Writes on one node with commit log and Memtable Writes on a majority of nodes with replicas Writes on all the nodes with replicas A summary of the features in Cassandra The following table summarizes the key features of Cassandra with respect to its origins in Google BigTable and Amazon Dynamo: Feature Cassandra implementation Google BigTable Amazon Dynamo Architecture Peer-to-peer architecture, ring-based deployment architecture No Yes   Data model Multidimensional map (row,column, timestamp) -> bytes Yes   No CAP theorem AP with tunable consistency No Yes   Storage architecture SSTable, Memtables Yes   No Storage layer Local filesystem storage No No Fast reads and efficient storage Bloom filters, compactions Yes   No Programming language Java No Yes   Client programming language Multiple languages supported: Java, PHP, Python, REST, C++, .NET, and so on. Not known Not known Scalability model Horizontal scalability; multiple nodes deployment than a single machine deployment Yes   Yes   Version conflicts Timestamp field (not a vector clock as usually assumed) No No Hard deletes/updates Data is always appended using the timestamp field—deletes/updates are soft appends and are cleaned asynchronously as part of major compactions Yes   No Summary Cassandra packs the best features of two technologies proven at scale—Google BigTable and Amazon Dynamo. However, today Cassandra has evolved beyond these origins with new unique and enterprise-ready features such as Cassandra Query Language (CQL), support for collection columns, lightweight transactions, and triggers. Resources for Article: Further resources on this subject: Basic Concepts and Architecture of Cassandra [Article] About Cassandra [Article] Getting Started with Apache Cassandra [Article]
Read more
  • 0
  • 0
  • 7820

article-image-security-and-interoperability
Packt
03 Feb 2015
28 min read
Save for later

Security and Interoperability

Packt
03 Feb 2015
28 min read
 This article by Peter Waher, author of the book, Learning Internet of Things, we will focus on the security, interoperability, and what issues we need to address during the design of the overall architecture of Internet of Things (IoT) to avoid many of the unnecessary problems that might otherwise arise and minimize the risk of painting yourself into a corner. You will learn the following: Risks with IoT Modes of attacking a system and some counter measures The importance of interoperability in IoT (For more resources related to this topic, see here.) Understanding the risks There are many solutions and products marketed today under the label IoT that lack basic security architectures. It is very easy for a knowledgeable person to take control of devices for malicious purposes. Not only devices at home are at risk, but cars, trains, airports, stores, ships, logistics applications, building automation, utility metering applications, industrial automation applications, health services, and so on, are also at risk because of the lack of security measures in their underlying architecture. It has gone so far that many western countries have identified the lack of security measures in automation applications as a risk to national security, and rightly so. It is just a matter of time before somebody is literally killed as a result of an attack by a hacker on some vulnerable equipment connected to the Internet. And what are the economic consequences for a company that rolls out a product for use on the Internet that results into something that is vulnerable to well-known attacks? How has it come to this? After all the trouble Internet companies and applications have experienced during the rollout of the first two generations of the Web, do we repeat the same mistakes with IoT? Reinventing the wheel, but an inverted one One reason for what we discussed in the previous section might be the dissonance between management and engineers. While management knows how to manage known risks, they don't know how to measure them in the field of IoT and computer communication. This makes them incapable of understanding the consequences of architectural decisions made by its engineers. The engineers in turn might not be interested in focusing on risks, but on functionality, which is the fun part. Another reason might be that the generation of engineers who tackle IoT are not the same type of engineers who tackled application development on the Internet. Electronics engineers now resolve many problems already solved by computer science engineers decades earlier. Engineers working on machine-to-machine (M2M) communication paradigms, such as industrial automation, might have considered the problem solved when they discovered that machines could talk to each other over the Internet, that is, when the message-exchanging problem was solved. This is simply relabeling their previous M2M solutions as IoT solutions because the transport now occurs over the IP protocol. But, in the realm of the Internet, this is when the problems start. Transport is just one of the many problems that need to be solved. The third reason is that when engineers actually re-use solutions and previous experience, they don't really fit well in many cases. The old communication patterns designed for web applications on the Internet are not applicable for IoT. So, even if the wheel in many cases is reinvented, it's not the same wheel. In previous paradigms, publishers are a relatively few number of centralized high-value entities that reside on the Internet. On the other hand, consumers are many but distributed low-value entities, safely situated behind firewalls and well protected by antivirus software and operating systems that automatically update themselves. But in IoT, it might be the other way around: publishers (sensors) are distributed, very low-value entities that reside behind firewalls, and consumers (server applications) might be high-value centralized entities, residing on the Internet. It can also be the case that both the consumer and publisher are distributed, low-value entities who reside behind the same or different firewalls. They are not protected by antivirus software, and they do not autoupdate themselves regularly as new threats are discovered and countermeasures added. These firewalls might be installed and then expected to work for 10 years with no modification or update being made. The architectural solutions and security patterns developed for web applications do not solve these cases well. Knowing your neighbor When you decide to move into a new neighborhood, it might be a good idea to know your neighbors first. It's the same when you move a M2M application to IoT. As soon as you connect the cable, you have billions of neighbors around the world, all with access to your device. What kind of neighbors are they? Even though there are a lot of nice and ignorant neighbors on the Internet, you also have a lot of criminals, con artists, perverts, hackers, trolls, drug dealers, drug addicts, rapists, pedophiles, burglars, politicians, corrupt police, curious government agencies, murderers, demented people, agents from hostile countries, disgruntled ex-employees, adolescents with a strange sense of humor, and so on. Would you like such people to have access to your things or access to the things that belong to your children? If the answer is no (as it should be), then you must take security into account from the start of any development project you do, aimed at IoT. Remember that the Internet is the foulest cesspit there is on this planet. When you move from the M2M way of thinking to IoT, you move from a nice and security gated community to the roughest neighborhood in the world. Would you go unprotected or unprepared into such an area? IoT is not the same as M2M communication in a secure and controlled network. For an application to work, it needs to work for some time, not just in the laboratory or just after installation, hoping that nobody finds out about the system. It is not sufficient to just get machines to talk with each other over the Internet. Modes of attack To write an exhaustive list of different modes of attack that you can expect would require a book by itself. Instead, just a brief introduction to some of the most common forms of attack is provided here. It is important to have these methods in mind when designing the communication architecture to use for IoT applications. Denial of Service A Denial of Service (DoS) or Distributed Denial of Service (DDoS) attack is normally used to make a service on the Internet crash or become unresponsive, and in some cases, behave in a way that it can be exploited. The attack consists in making repetitive requests to a server until its resources gets exhausted. In a distributed version, the requests are made by many clients at the same time, which obviously increases the load on the target. It is often used for blackmailing or political purposes. However, as the attack gets more effective and difficult to defend against when the attack is distributed and the target centralized, the attack gets less effective if the solution itself is distributed. To guard against this form of attack, you need to build decentralized solutions where possible. In decentralized solutions, each target's worth is less, making it less interesting to attack. Guessing the credentials One way to get access to a system is to impersonate a client in the system by trying to guess the client's credentials. To make this type of attack less effective, make sure each client and each device has a long and unique, perhaps randomly generated, set of credentials. Never use preset user credentials that are the same for many clients or devices or factory default credentials that are easy to reset. Furthermore, set a limit to the number of authentication attempts per time unit permitted by the system; also, log an event whenever this limit is reached, from where to which credentials were used. This makes it possible for operators to detect systematic attempts to enter the system. Getting access to stored credentials One common way to illicitly enter a system is when user credentials are found somewhere else and reused. Often, people reuse credentials in different systems. There are various ways to avoid this risk from happening. One is to make sure that credentials are not reused in different devices or across different services and applications. Another is to randomize credentials, lessening the desire to reuse memorized credentials. A third way is to never store actual credentials centrally, even encrypted if possible, and instead store hashed values of these credentials. This is often possible since authentication methods use hash values of credentials in their computations. Furthermore, these hashes should be unique to the current installation. Even though some hashing functions are vulnerable in such a way that a new string can be found that generates the same hash value, the probability that this string is equal to the original credentials is miniscule. And if the hash is computed uniquely for each installation, the probability that this string can be reused somewhere else is even more remote. Man in the middle Another way to gain access to a system is to try and impersonate a server component in a system instead of a client. This is often referred to as a Man in the middle (MITM) attack. The reason for the middle part is that the attacker often does not know how to act in the server and simply forwards the messages between the real client and the server. In this process, the attacker gains access to confidential information within the messages, such as client credentials, even if the communication is encrypted. The attacker might even try to modify messages for their own purposes. To avoid this type of attack, it's important for all clients (not just a few) to always validate the identity of the server it connects to. If it is a high-value entity, it is often identified using a certificate. This certificate can both be used to verify the domain of the server and encrypt the communication. Make sure this validation is performed correctly, and do not accept a connection that is invalid or where the certificate has been revoked, is self-signed, or has expired. Another thing to remember is to never use an unsecure authentication method when the client authenticates itself with the server. If a server has been compromised, it might try to fool clients into using a less secure authentication method when they connect. By doing so, they can extract the client credentials and reuse them somewhere else. By using a secure authentication method, the server, even if compromised, will not be able to replay the authentication again or use it somewhere else. The communication is valid only once. Sniffing network communication If communication is not encrypted, everybody with access to the communication stream can read the messages using simple sniffing applications, such as Wireshark. If the communication is point-to-point, this means the communication can be heard by any application on the sending machine, the receiving machine, or any of the bridges or routers in between. If a simple hub is used instead of a switch somewhere, everybody on that network will also be able to eavesdrop. If the communication is performed using multicast messaging service, as can be done in UPnP and CoAP, anybody within the range of the Time to live (TTL) parameter (maximum number of router hops) can eavesdrop. Remember to always use encryption if sensitive data is communicated. If data is private, encryption should still be used, even if the data might not be sensitive at first glance. A burglar can know if you're at home by simply monitoring temperature sensors, water flow meters, electricity meters, or light switches at your home. Small variations in temperature alert to the presence of human beings. Change in the consumption of electrical energy shows whether somebody is cooking food or watching television. The flow of water shows whether somebody is drinking water, flushing a toilet, or taking a shower. No flow of water or a relatively regular consumption of electrical energy tells the burglar that nobody is at home. Light switches can also be used to detect presence, even though there are applications today that simulate somebody being home by switching the lights on and off. If you haven't done so already, make sure to download a sniffer to get a feel of what you can and cannot see by sniffing the network traffic. Wireshark can be downloaded from https://www.wireshark.org/download.html. Port scanning and web crawling Port scanning is a method where you systematically test a range of ports across a range of IP addresses to see which ports are open and serviced by applications. This method can be combined with different tests to see the applications that might be behind these ports. If HTTP servers are found, standard page names and web-crawling techniques can be used to try to figure out which web resources lie behind each HTTP server. CoAP is even simpler since devices often publish well-known resources. Using such simple brute-force methods, it is relatively easy to find (and later exploit) anything available on the Internet that is not secured. To avoid any private resources being published unknowingly, make sure to close all the incoming ports in any firewalls you use. Don't use protocols that require incoming connections. Instead, use protocols that create the connections from inside the firewall. Any resources published on the Internet should be authenticated so that any automatic attempt to get access to them fails. Always remember that information that might seem trivial to an individual might be very interesting if collected en masse. This information might be coveted not only by teenage pranksters but by public relations and marketing agencies, burglars, and government agencies (some would say this is a repetition). Search features and wildcards Don't make the mistake of thinking it's difficult to find the identities of devices published on the Internet. Often, it's the reverse. For devices that use multicast communication, such as those using UPnP and CoAP, anybody can listen in and see who sends the messages. For devices that use single-cast communication, such as those using HTTP or CoAP, port-scanning techniques can be used. For devices that are protected by firewalls and use message brokers to protect against incoming attacks, such as those that use XMPP and MQTT, search features or wildcards can be used to find the identities of devices managed by the broker, and in the case of MQTT, even what they communicate. You should always assume that the identity of all devices can be found, and that there's an interest in exploiting the device. For this reason, it's very important that each device authenticates any requests made to it if possible. Some protocols help you more with this than others, while others make such authentication impossible. XMPP only permits messages from accepted friends. The only thing the device needs to worry about is which friend requests to accept. This can be either configured by somebody else with access to the account or by using a provisioning server if the device cannot make such decisions by itself. The device does not need to worry about client authentication, as this is done by the brokers themselves, and the XMPP brokers always propagate the authenticated identities of everybody who send them messages. MQTT, on the other hand, resides in the other side of the spectrum. Here, devices cannot make any decision about who sees the published data or who makes a request since identities are stripped away by the protocol. The only way to control who gets access to the data is by building a proprietary end-to-end encryption layer on top of the MQTT protocol, thereby limiting interoperability. In between the two resides protocols such as HTTP and CoAP that support some level of local client authentication but lacks a good distributed identity and authentication mechanism. This is vital for IoT even though this problem can be partially solved in local intranets. Breaking ciphers Many believe that by using encryption, data is secure. This is not the case, as discussed previously, since the encryption is often only done between connected parties and not between end users of data (the so-called end-to-end encryption). At most, such encryption safeguards from eavesdropping to some extent. But even such encryption can be broken, partially or wholly, with some effort. Ciphers can be broken using known vulnerabilities in code where attackers exploit program implementations rather than the underlying algorithm of the cipher. This has been the method used in the latest spectacular breaches in code based on the OpenSSL library. To protect yourselves from such attacks, you need to be able to update code in devices remotely, which is not always possible. Other methods use irregularities in how the cipher works to figure out, partly or wholly, what is being communicated over the encrypted channel. This sometimes requires a considerable amount of effort. To safeguard against such attacks, it's important to realize that an attacker does not spend more effort into an attack than what is expected to be gained by the attack. By storing massive amounts of sensitive data centrally or controlling massive amounts of devices from one point, you increase the value of the target, increasing the interest of attacking it. On the other hand, by decentralizing storage and control logic, the interest in attacking a single target decreases since the value of each entity is comparatively lower. Decentralized architecture is an important tool to both mitigate the effects of attacks and decrease the interest in attacking a target. However, by increasing the number of participants, the number of actual attacks can increase, but the effort that can be invested behind each attack when there are many targets also decreases, making it easier to defend each one of the attacks using standard techniques. Tools for achieving security There are a number of tools that architects and developers can use to protect against malicious use of the system. An exhaustive discussion would fill a smaller library. Here, we will mention just a few techniques and how they not only affect security but also interoperability. Virtual Private Networks A method that is often used to protect unsecured solutions on the Internet is to protect them using Virtual Private Networks (VPNs). Often, traditional M2M solutions working well in local intranets need to expand across the Internet. One way to achieve this is to create such VPNs that allow the devices to believe they are in a local intranet, even though communication is transported across the Internet. Even though transport is done over the Internet, it's difficult to see this as a true IoT application. It's rather a M2M solution using the Internet as the mode of transport. Because telephone operators use the Internet to transport long distance calls, it doesn't make it Voice over IP (VoIP). Using VPNs might protect the solution, but it completely eliminates the possibility to interoperate with others on the Internet, something that is seen as the biggest advantage of using the IoT technology. X.509 certificates and encryption We've mentioned the use of certificates to validate the identity of high-value entities on the Internet. Certificates allow you to validate not only the identity, but also to check whether the certificate has been revoked or any of the issuers of the certificate have had their certificates revoked, which might be the case if a certificate has been compromised. Certificates also provide a Public Key Infrastructure (PKI) architecture that handles encryption. Each certificate has a public and private part. The public part of the certificate can be freely distributed and is used to encrypt data, whereas only the holder of the private part of the certificate can decrypt the data. Using certificates incurs a cost in the production or installation of a device or item. They also have a limited life span, so they need to be given either a long lifespan or updated remotely during the life span of the device. Certificates also require a scalable infrastructure for validating them. For these reasons, it's difficult to see that certificates will be used by other than high-value entities that are easy to administer in a network. It's difficult to see a cost-effective, yet secure and meaningful, implementation of validating certificates in low-value devices such as lamps, temperature sensors, and so on, even though it's theoretically possible to do so. Authentication of identities Authentication is the process of validating whether the identity provided is actually correct or not. Authenticating a server might be as simple as validating a domain certificate provided by the server, making sure it has not been revoked and that it corresponds to the domain name used to connect to the server. Authenticating a client might be more involved, as it has to authenticate the credentials provided by the client. Normally, this can be done in many different ways. It is vital for developers and architects to understand the available authentication methods and how they work to be able to assess the level of security used by the systems they develop. Some protocols, such as HTTP and XMPP, use the standardized Simple Authentication and Security Layer (SASL) to publish an extensible set of authentication methods that the client can choose from. This is good since it allows for new authentication methods to be added. But it also provides a weakness: clients can be tricked into choosing an unsecure authentication mechanism, thus unwittingly revealing their user credentials to an impostor. Make sure clients do not use unsecured or obsolete methods, such as PLAIN, BASIC, MD5-CRAM, MD5-DIGEST, and so on, even if they are the only options available. Instead, use secure methods such as SCRAM-SHA-1 or SCRAM-SHA-1-PLUS, or if client certificates are used, EXTERNAL or no method at all. If you're using an unsecured method anyway, make sure to log it to the event log as a warning, making it possible to detect impostors or at least warn operators that unsecure methods are being used. Other protocols do not use secure authentication at all. MQTT, for instance, sends user credentials in clear text (corresponding to PLAIN), making it a requirement to use encryption to hide user credentials from eavesdroppers or client-side certificates or pre-shared keys for authentication. Other protocols do not have a standardized way of performing authentication. In CoAP, for instance, such authentication is built on top of the protocol as security options. The lack of such options in the standard affects interoperability negatively. Usernames and passwords A common method to provide user credentials during authentication is by providing a simple username and password to the server. This is a very human concept. Some solutions use the concept of a pre-shared key (PSK) instead, as it is more applicable to machines, conceptually at least. If you're using usernames and passwords, do not reuse them between devices, just because it is simple. One way to generate secure, difficult-to-guess usernames and passwords is to randomly create them. In this way, they correspond more to pre-shared keys. One problem in using randomly created user credentials is how to administer them. Both the server and the client need to be aware of this information. The identity must also be distributed among the entities that are to communicate with the device. Here, the device creates its own random identity and creates the corresponding account in the XMPP server in a secure manner. There is no need for a common factory default setting. It then reports its identity to a thing registry or provisioning server where the owner can claim it and learn the newly created identity. This method never compromises the credentials and does not affect the cost of production negatively. Furthermore, passwords should never be stored in clear text if it can be avoided. This is especially important on servers where many passwords are stored. Instead, hashes of the passwords should be stored. Most modern authentication algorithms support the use of password hashes. Storing hashes minimizes the risk of unwanted generation of original passwords for attempted reuse in other systems. Using message brokers and provisioning servers Using message brokers can greatly enhance security in an IoT application and lower the complexity of implementation when it comes to authentication, as long as message brokers provide authenticated identity information in messages it forwards. In XMPP, all the federated XMPP servers authenticate clients connected to them as well as the federated servers themselves when they intercommunicate to transport messages between domains. This relieves clients from the burden of having to authenticate each entity in trying to communicate with it since they all have been securely authenticated. It's sufficient to manage security on an identity level. Even this step can be relieved further by the use of provisioning. Unfortunately, not all protocols using message brokers provide this added security since they do not provide information about the sender of packets. MQTT is an example of such a protocol. Centralization versus decentralization Comparing centralized and decentralized architectures is like comparing the process of putting all the eggs in the same basket and distributing them in many much smaller baskets. The effect of a breach of security is much smaller in the decentralized case; fewer eggs get smashed when you trip over. Even though there are more baskets, which might increase the risk of an attack, the expected gain of an attack is much smaller. This limits the motivation of performing a costly attack, which in turn makes it simpler to protect it against. When designing IoT architecture, try to consider the following points: Avoid storing data in a central position if possible. Only store the data centrally that is actually needed to bind things together. Distribute logic, data, and workload. Perform work as far out in the network as possible. This makes the solution more scalable, and it utilizes existing resources better. Use linked data to spread data across the Internet, and use standardized grid computation technologies to assemble distributed data (for example, SPARQL) to avoid the need to store and replicate data centrally. Use a federated set of small local brokers instead of trying to get all the devices on the same broker. Not all brokered protocols support federation, for example, XMPP supports it but MQTT does not. Let devices talk directly to each other instead of having a centralized proprietary API to store data or interpret communication between the two. Contemplate the use of cheap small and energy-efficient microcomputers such as the Raspberry Pi in local installations as an alternative to centralized operation and management from a datacenter. The need for interoperability What has made the Internet great is not a series of isolated services, but the ability to coexist, interchange data, and interact with the users. This is important to keep in mind when developing for IoT. Avoid the mistakes made by many operators who failed during the first Internet bubble. You cannot take responsibility for everything in a service. The new Internet economy is based on the interaction and cooperation between services and its users. Solves complexity The same must be true with the new IoT. Those companies that believe they can control the entire value chain, from things to services, middleware, administration, operation, apps, and so on, will fail, as the companies in the first Internet bubble failed. Companies that built devices with proprietary protocols, middleware, and mobile phone applications, where you can control your things, will fail. Why? Imagine a future where you have a thousand different things in your apartment from a hundred manufacturers. Would you want to download a hundred smart phone apps to control them? Would you like five different applications just to control your lights at home, just because you have light bulbs from five different manufacturers? An alternative would be to have one app to rule them all. There might be a hundred different such apps available (or more), but you can choose which one to use based on your taste and user feedback. And you can change if you want to. But for this to be possible, things need to be interoperable, meaning they should communicate using a commonly understood language. Reduces cost Interoperability does not only affect simplicity of installation and management, but also the price of solutions. Consider a factory that uses thousands (or hundreds of thousands) of devices to control and automate all processes within. Would you like to be able to buy things cheaply or expensively? Companies that promote proprietary solutions, where you're forced to use their system to control your devices, can force their clients to pay a high price for future devices and maintenance, or the large investment made originally might be lost. Will such a solution be able to survive against competitors who sell interoperable solutions where you can buy devices from multiple manufacturers? Interoperability provides competition, and competition drives down cost and increases functionality and quality. This might be a reason for a company to work against interoperability, as it threatens its current business model. But the alternative might be worse. A competitor, possibly a new one, might provide such a solution, and when that happens, the business model with proprietary solutions is dead anyway. The companies that are quickest in adapting a new paradigm are the ones who would most probably survive a paradigm shift, as the shift from M2M to IoT undoubtedly is. Allows new kinds of services and reuse of devices There are many things you cannot do unless you have an interoperable communication model from the start. Consider a future smart city. Here, new applications and services will be built that will reuse existing devices, which were installed perhaps as part of other systems and services. These applications will deliver new value to the inhabitants of the city without the need of installing new duplicate devices for each service being built. But such multiple use of devices is only possible if the devices communicate in an open and interoperable way. However, care has to be taken at the same time since installing devices in an open environment requires the communication infrastructure to be secure as well. To achieve the goal of building smart cities, it is vitally important to use technologies that allow you to have both a secure communication infrastructure and an interoperable one. Combining security and interoperability As we have seen, there are times where security is contradictory to interoperability. If security is meant to be taken as exclusivity, it opposes the idea of interoperability, which is by its very nature inclusive. Depending on the choice of communication infrastructure, you might have to use security measures that directly oppose the idea of an interoperable infrastructure, prohibiting third parties from accessing existing devices in a secure fashion. It is important during the architecture design phase, before implementation, to thoroughly investigate what communication technologies are available, and what they provide and what they do not provide. You might think that this is a minor issue, thinking that you can easily build what is missing on top of the chosen infrastructure. This is not true. All such implementation is by its very nature proprietary, and therefore not interoperable. This might drastically limit your options in the future, which in turn might drastically reduce anyone else's willingness to use your solution. The more a technology includes, in the form of global identity, authentication, authorization, different communication patterns, common language for interchange of sensor data, control operations and access privileges, provisioning, and so on, the more interoperable the solution becomes. If the technology at the same time provides a secure infrastructure, you have the possibility to create a solution that is both secure and interoperable without the need to build proprietary or exclusive solutions on top of it. Summary In this article, we presented the basic reasons why security and interoperability must be contemplated early on in the project and not added as late patchwork because it was shown to be necessary. Not only does such late addition limit interoperability and future use of the solution, it also creates solutions that can jeopardize not only yourself your company and your customers, but in the end, even national security. This article also presented some basic modes of attack and some basic defense systems to counter them. Resources for Article: Further resources on this subject: Rich Internet Application (RIA) – Canvas [article] ExtGWT Rich Internet Application: Crafting UI Real Estate [article] Sending Data to Google Docs [article]
Read more
  • 0
  • 0
  • 7820

article-image-how-perform-iteration-sets-mdx
Packt
05 Aug 2011
5 min read
Save for later

How to Perform Iteration on Sets in MDX

Packt
05 Aug 2011
5 min read
  MDX with Microsoft SQL Server 2008 R2 Analysis Services Cookbook More than 80 recipes for enriching your Business Intelligence solutions with high-performance MDX calculations and flexible MDX queries in this book and eBook Iteration is a very natural way of thinking for us humans. We set a starting point, we step into a loop, and we end when a condition is met. While we're looping, we can do whatever we want: check, take, leave, and modify items in that set. Being able to break down the problems in steps makes us feel that we have things under control. However, by breaking down the problem, the query performance often breaks down as well. Therefore, we have to be extra careful with iterations when data is concerned. If there's a way to manipulate the collection of members as one item, one set, without cutting that set into small pieces and iterating on individual members, we should use it. It's not always easy to find that way, but we should at least try. Iterating on a set in order to reduce it Getting ready Start a new query in SSMS and check that you're working on the right database. Then write the following query: SELECT { [Measures].[Customer Count], [Measures].[Growth in Customer Base] } ON 0, NON EMPTY { [Date].[Fiscal].[Month].MEMBERS } ON 1 FROM [Adventure Works] WHERE ( [Product].[Product Categories].[Subcategory].&[1] ) The query returns fiscal months on rows and two measures: a count of customers and their growth compared to the previous month. Mountain bikes are in slicer. Now let's see how we can get the number of days the growth was positive for each period. How to do it... Follow these steps to reduce the initial set: Create a new calculated measure in the query and name it Positive growth days. Specify that you need descendants of current member on leaves. Wrap around the FILTER() function and specify the condition which says that the growth measure should be greater than zero. Apply the COUNT() function on a complete expression to get count of days. The new calculated member's definition should look as follows, verify that it does. WITH MEMBER [Measures].[Positive growth days] AS FILTER( DESCENDANTS([Date].[Fiscal].CurrentMember, , leaves), [Measures].[Growth in Customer Base] > 0 ).COUNT Add the measure on columns. Run the query and observe if the results match the following image: How it works... The task says we need to count days for each time period and use only positive ones. Therefore, it might seem appropriate to perform iteration, which, in this case, can be performed using the FILTER() function. But, there's a potential problem. We cannot expect to have days on rows, so we must use the DESCENDANTS() function to get all dates in the current context. Finally, in order to get the number of items that came up upon filtering, we use the COUNT function. There's more... Filter function is an iterative function which doesn't run in block mode, hence it will slow down the query. In the introduction, we said that it's always wise to search for an alternative if available. Let's see if something can be done here. A keen eye will notice a "count of filtered items" pattern in this expression. That pattern suggests the use of a set-based approach in the form of SUM-IF combination. The trick is to provide 1 for the True part of the condition taken from the FILTER() statement and null for the False part. The sum of one will be equivalent to the count of filtered items. In other words, once rewritten, that same calculated member would look like this: MEMBER [Measures].[Positive growth days] AS SUM( Descendants([Date].[Fiscal].CurrentMember, , leaves), IIF( [Measures].[Growth in Customer Base] > 0, 1, null) ) Execute the query using the new definition. Both the SUM() and the IIF() functions are optimized to run in the block mode, especially when one of the branches in IIF() is null. In this particular example, the impact on performance was not noticeable because the set of rows was relatively small. Applying this technique on large sets will result in drastic performance improvement as compared to the FILTER-COUNT approach. Be sure to remember that in future. More information about this type of optimization can be found in Mosha Pasumansky's blog: http://tinyurl.com/SumIIF Hints for query improvements There are several ways you can avoid the FILTER() function in order to improve performance. When you need to filter by non-numeric values (i.e. properties or other metadata), you should consider creating an attribute hierarchy for often-searched items and then do one of the following: Use a tuple when you need to get a value sliced by that new member Use the EXCEPT() function when you need to negate that member on its own hierarchy (NOT or <>) Use the EXISTS() function when you need to limit other hierarchies of the same dimension by that member Use the NONEMPTY() function when you need to operate on other dimensions, that is, subcubes created with that new member Use the 3-argument EXISTS() function instead of the NONEMPTY() function if you also want to get combinations with nulls in the corresponding measure group (nulls are available only when the NullProcessing property for a measure is set to Preserve) When you need to filter by values and then count a member in that set, you should consider aggregate functions like SUM() with IIF() part in its expression, as described earlier.  
Read more
  • 0
  • 0
  • 7814

article-image-spam-filtering-natural-language-processing-approach
Packt
08 Mar 2018
16 min read
Save for later

Spam Filtering - Natural Language Processing Approach

Packt
08 Mar 2018
16 min read
In this article, by Jalaj Thanaki, the author of the book Python Natural Language Processing discusses how to develop natural language processing (NLP) application. In this article, we will be developing a spam filtering. In order to develop spam filtering we will be using supervised machine learning (ML) algorithm named logisticregression. You can also use decision tree, NaiveBayes,or support vector machine (SVM).Tomake this happen the following steps will be covered: Understandlogistic regression with MLalgorithm Data collection and exploration Split dataset into training-dataset and testing-dataset (For more resources related to this topic, see here.) Understanding logistic regression ML algorithm Let's understand logistic regression algorithm first.For this classification algorithm, I will give you intuition how logistic regression algorithm works and we will see some basic mathematics related to it. Then we will see the spam filtering application. First we are considering the binary classes like spam or not-spam, good or bad, win or lose, 0 or 1, and so on for understanding the algorithm and its application. Suppose I want to classify emails into spam and non-spam (ham)category so the spam and non-spam are discrete output label or target concept here. Our goal here is that we want to predict that whether the new email is spam or not-spam. Not-spam also known asham. In order to build this NLP application we are going to use logistic regression. Let's step back a while and understand the technicality of algorithm first. Here I'm stating the facts related to mathematics and this algorithm in very simple manner so everyone can understand the logic. General approach for understanding this algorithm is as follows. If you know some part of ML then you can connect your dot and if you are new to ML then don't worry because we are going to understand every part which I will describe as follows: We are defining our hypothesis function which helps us to generate our target output or target concept We are defining the cost function or error function and we choose error function in such a way that we can derive the partial derivate of error function easily so we can calculate gradient descent easily Over the time we are trying to minimize the error so we can generate the more accurate label and classify data accurately In statistics, logistic regression is also called as logitregression or logitmodel. This algorithm is mostly used as binary class classifier that means there should be two different class in which you want to classify the data. The binary logistic model is used to estimate the probability of a binary response and it generates the response based on one or more predictor or independent variables or features. By the way the ML algorithm that basic mathematics concepts used in deep learning (DL) as well. First I want to explain that why this algorithm called logistic regression? The reason is that the algorithm uses logistic function or sigmoid function and that is the reason it called logistic regression. Logistic function or sigmoid function are the synonyms of each other. We use sigmoid function as hypothesis function and this function belongs to the hypothesis class. Now if you want to say thatwhat do you mean by the hypothesis function? well as we have seen earlier that machine has to learn mapping between data attributes and given label in such a way so it can predict the label for new data. This can be achieved by machine if it learns this mapping using mathematical function. So the mathematical function is called hypothesis function,which machine will use to classify the data and predict the labels or target concept. Here, as I said, we want to build binary classifier so our label is either spam or ham. So mathematically I can assign 0 for ham or not-spam and 1 for spam or viceversa as per your choice. These mathematically assigned labels are our dependent variables. Now we need that our output labels should be either zero or one. Mathematically,we can say that label is y and y ∈ {0, 1}. So we need to choose that kind of hypothesis function which convert our output value either in zero or one and logistic function or sigmoid function is exactly doing that and this is the main reason why logistic regression uses sigmoid function as hypothesis function. Logistic or Sigmoid Function Let me provide you the mathematical equation for logistic or sigmoid function. Refer to Figure 1: Figure 1: Logistic or sigmoid function You can see the plot which is showing g(z). Here, g(z)= Φ(z). Refer to Figure 2: Figure 2: Graph of sigmoid or logistic function From the preceding graph you can see following facts:  If you have z value greater than or equal to zero then logistic function gives the output value one.  If you have value of z less than zero then logistic function or sigmoid function generate the output zero. You can see the following mathematical condition for logistic function. Refer to Figure 3:   Figure 3: Logistic function mathematical property Because of the preceding mathematical property, we can use this function to perform binary classification. Now it's time to show the hypothesis function how this sigmoid function will be represented as hypothesis function. Refer to Figure 4: Figure 4: Hypothesis function for logistic regression If we take the preceding equation and substitute the value of z with θTx then equation given in Figure 1gets convertedas following. Refer to Figure 5: Figure 5: Actual hypothesis function after mathematical manipulation Here hθx is the hypothesis function,θT is the matrix of the feature or matrix of the independent variables and transpose representation of it, x is the stand for all independent variables or for all possible feature set. In order to generate the hypothesis equation we replace the z value of logistic function with θTx. By using hypothesis equation machine actually tries to learn mapping between input variables or input features, and output labels. Let's talk a bit about the interpretation of this hypothesis function. Here for logistic regression, can you think what is the best way to predict the class label? Accordingly, we can predict the target class label by using probability concept. We need to generate the probability for both classes and whatever class has high probability we will assign that class label for that particular instance of feature. So in binary classification the value of y or target class is either zero or one. So if you are familiar with probability then you can represent the probability equation as given in Figure 6: Figure 6: Interpretation of hypothesis function using probabilistic representation So those who are not familiar with probability the P(y=1|x;θ) can be read like this. Probability of y =1, given x, and parameterized by θ. In simple language you can say like this hypothesis function will generate the probability value for target output 1 where we give features matrix x and some parameter θ. This seems intuitive concept, so for a while, you can keep all these in your mind. I will later on given you the reason why we need to generate probability as well as let you know how we can generate probability values for each of the class. Here we complete first step of general approach to understand the logistic regression. Cost or Error function for logistic regression First, let's understand what is cost function or the error function? Cost function or lose function, or error function are all the same things. In ML it is very important concept so here we understand definition of cost function and what is the purpose of defining the cost function. Cost function is the function which we use to check how accurate our ML classifier performs. So let me simplify this for you, in our training dataset we have data and we have labels. Now, when we use hypothesis function and generate the output we need to check how much near we are from the actual prediction and if we predict the actual output label then the difference between our hypothesis function output and actual label is zero or minimum and if our hypothesis function output and actual label are not same then we have big difference between them. So suppose if actual label of email is spam which is 1 and our hypothesis function also generate the result 1 then difference between actual target value and predicated output value is zero and therefore error in prediction is also zero and if our predicted output is 1 and actual output is zero then we have maximum error between our actual target concept and prediction. So it is important for us to have minimum error in our predication. This is the very basic concept of error function. We will get in to the mathematics in some minutes. There are several types of error function available like r2 error, sum of squared error, and so on. As per the ML algorithm and as per the hypothesis function our error function also changes. Now I know you wanted to know what will be the error function for logistic regression? and I have put θ in our hypothesis function so you also want to know what is θ and if I need to choose some value of the θ then how can I approach it? So here I will give all answers. Let me give you some background what we used to do in linear regression so it will help you to understand the logistic regression. We generally used sum of squared error or residuals error, or cost function. In linear regression we used to use it. So, just to give you background about sum of squared error. In linear regression we are trying to generate the line of best fit for our dataset so as I stated the example earlier given height I want to predict the weight and in this case we fist draw a line and measure the distance from each of the data point to line. We will square these distance and sum them and try to minimize this error function. Refer to Figure 7: Figure 7: Sum of squared error representation for reference You can see the distance of each data point from the line which is denoted using red line we will take this distance, square them, and sum them. This error function we will use in linear regression. We use this error function and we have generate partial derivative with respect to slop of line m and with respect to intercept b. Every time we calculate error and update the value of m and b so we can generate the line of best fit. The process of updating m and b is called gradient descent. By using gradient descent we update m and b in such a way so our error function has minimum error value and we can generate line of best fit. Gradient descent gives us a direction in which we need to plot a line so we can generate the line of best fit. You can find the detail example in Chapter 9,Deep Learning for NLU and NLG Problems. So by defining error function and generating partial derivatives we can apply gradient descent algorithm which help us to minimize our error or cost function. Now back to the main question which error function can we use for logistic regression? What you think can we use this as sum of squared error function for logistic regression as well? If you know function and calculus very well, then probably your answer is no. That is the correct answer. Let me explain this for those who aren't familiar with function and calculus. This is important so be careful. In linear regression our hypothesis function is linear so it is very easy for us to calculate sum of squared errors but here we are using sigmoid function which is non-linear function if you apply same function which we used in linear regression will not turn out well because if you take sigmoid function and put into the sum of squared error function then and if you try to visualized the all possible values then you will get non-convex curve. Refer to Figure 8: Figure 8: Non-convex with (Image credit: http://www.yuthon.com/images/non-convex_and_convex_function.png) In machine learning we majorly use function which are able to provide convex curve because then we can use gradient descent algorithm to minimize the error function and able to reach at global minimum certainly. As you saw in Figure 8, non-convex curve has many local minimum so in order to reach to global minimum is very challenging and very time consuming because then you need to apply second order or nth order optimization in order to reach to global minimum where in convex curve you can reach to global minimum certainly and fast as well. So if we plug our sigmoid function in sum of squared error then you get the non-convex function so we are not going to define same error function which we use in linear regression. So, we need to define a different cost function which is convex so we can apply gradient descent algorithm and generate global minimum. So here we are using the statistical concept called likelihood. To derive likelihood function we will use the equation of the probability which is given in Figure 6 and we are considering all data points in training set. So we can generate the following equation which is the likelihood function. Refer to Figure 9: Figure 9: likelihood function for logistic regression (Image credit: http://cs229.stanford.edu/notes/cs229-notes1.pdf) Now in order to simplify the derivative process we need to convert the likelihood function into monotonically increasing function which can be achieved by taking natural logarithm of the likelihood function and this is called loglikelihood. This log likelihood is our cost function for logistic regression. See the following equation given in Figure 10: Figure 10: Cost function for logistic regression Here to gain some intuition about the given cost function we will plot it and understand what benefit it provides to us. Here in xaxis we have our hypothesis function. Our hypothesis function range is 0 to 1 so we have these two points on xaxis. Start with the first case where y =1. You can see the generated curve which is on top right hand side in Figure 11: Figure 11: Logistic function cost function graphs If you see any log function plot and then flip that curve because here we have negative sign then you get the same curve as we plot in Figure 11. you can see the log graph as well as flipped graph in Figure 12: Figure 12:comparing log(x) and –log(x) graph for better understanding of cost function (Image credit : http://www.sosmath.com/algebra/logs/log4/log42/log422/gl30.gif) So here we are interested for value 0 and 1 so we are considering that part of the graph which we have depicted in Figure 11. This cost function has some interesting and useful properties. If predict or candidate label is same as the actual target label then cost will be zero so you can put like this if y=1 and hypothesis function predict hθ(x) = 1 then cost is 0 but if hθ(x) tends to 0 means more towards the zero then cost function blows up to ∞. Now you can see for the y = 0 you can see the graph which is on top left hand side inside the Figure 11. This case condition also have same advantages and properties which we have seen earlier. It will go to ∞ when actual value is 0 and hypothesis function predicts 1. If hypothesis function predict 0 and actual target is also 0 then cost =0. As I told you earlier that I will give you reason why we are choosing this cost function then the reason is that this function makes our optimization easy as we are using maximum log likelihood function as we as this function has convex curve which help us to run gradient decent. In order to apply gradient decent we need to generate the partial derivative with respect to θ and we can generate the following equation which is given in Figure 13: Figure 13: Partial derivative for performing gradient descent (Image credit : http://2.bp.blogspot.com) This equation is used for updating the parameter value of θ and α is here define the learning rate. This is the parameter which you can use how fast or how slow your algorithm should learn or train. If you set learning rate too high then algorithm can not learn and if you set it too low then it take lot of time to train. So you need to choose learning rate wisely. Now let's start building the spam filtering application. Data loading and exploration To build the spam filtering application we need dataset. Here we are using small size dataset. This dataset is simply straight forward. This dataset has two attribute. The first attribute is the label and second attribute is the text content of the email. Let's discuss more about the first attribute. Here the presence of label make this dataset a tagged data. This label indicated that the email content is belong to thespam category or ham category. Let's jump into the practical part. Here we are using numpy, pandas, andscikit-learnas dependency libraries. So let's explore or dataset first.We read dataset using pandas library.I have also checked how many total data records we have and basic details of the dataset. Once we load data,we will check its first ten records and then we will replace the spam and ham categories with number. As we have seen that machine can understand numerical format only so here all labels ham is converted into 0 and all labels spam is converted into 1.Refer to Figure 14: Figure 14: Code snippet for converting labels into numerical format Split dataset intotrainingdataset and testingdataset In this part we divide our dataset into two parts one part is called training set and other part is called testing set. Refer to Figure 15: Figure 15: Code snippet for dividing dataset into trainingdataset and testingdataset We are dividing dataset into two partsbecause we will perform training by using our trainingdataset and one our ML algorithm trained on that dataset and generate ML-model after that we will use generated ML-model and feed testing into that model as result our ML-model will generate the prediction. Based on that result we evaluate out ML-model Summary Resources for Article:   Further resources on this subject: [article] [article] [article]
Read more
  • 0
  • 0
  • 7810
article-image-bug-tracking
Packt
04 Jan 2017
11 min read
Save for later

Bug Tracking

Packt
04 Jan 2017
11 min read
In this article by Eduardo Freitas, the author of the book Building Bots with Node.js, we will learn about Internet Relay Chat (IRC). It enables us to communicate in real time in the form of text. This chat runs on a TCP protocol in a client server model. IRC supports group messaging which is called as channels and also supports private message. (For more resources related to this topic, see here.) IRC is organized into many networks with different audiences. IRC being a client server, users need IRC clients to connect to IRC servers. IRC Client software comes as a packaged software as well as web based clients. Some browsers are also providing IRC clients as add-ons. Users can either install on their systems and then can be used to connect to IRC servers or networks. While connecting these IRC Servers, users will have to provide unique nick or nickname and choose existing channel for communication or users can start a new channel while connecting to these servers. In this article, we are going to develop one of such IRC bots for bug tracking purpose. This bug tracking bot will provide information about bugs as well as details about a particular bug. All this will be done seamlessly within IRC channels itself. It's going to be one window operations for a team when it comes to knowing about their bugs or defects. Great!! IRC Client and server As mentioned in introduction, to initiate an IRC communication, we need an IRC Client and Server or a Network to which our client will be connected. We will be using freenode network for our client to connect to. Freenode is the largest free and open source software focused IRC network. IRC Web-based Client I will be using IRC web based client using URL(https://webchat.freenode.net/). After opening the URL, you will see the following screen, As mentioned earlier, while connecting, we need to provide Nickname: and Channels:. I have provided Nickname: as Madan and at Channels: as #BugsChannel. In IRC, channels are always identified with #, so I provided # for my bugs channel. This is the new channel that we will be starting for communication. All the developers or team members can similarly provide their nicknames and this channel name to join for communication. Now let's ensure Humanity: by selecting I'm not a robot and click button Connect. Once connected, you will see the following screen. With this, our IRC client is connected to freenode network. You can also see username on right hand side as @Madan within this #BugsChannel. Whoever is joining this channel using this channel name and a network, will be shown on right hand side. In the next article, we will ask our bot to join this channel and the same network and will see how it appears within the channel. IRC bots IRC bot is a program which connects to IRC as one of the clients and appears as one of the users in IRC channels. These IRC bots are used for providing IRC Services or to host chat based custom implementations which will help teams for efficient collaboration. Creating our first IRC bot using IRC and NodeJS Let's start by creating a folder in our local drive in order to store our bot program from the command prompt. mkdir ircbot cd ircbot Assuming we have Node.js and NPM installed and let's create and initialize our package.json, which will store our bot's dependencies and definitions. npm init Once you go through the npm init options (which are very easy to follow), you'll see something similar to this. On your project folder you'll see the result which is your package.json file. Let's install irc package from NPM. This can be located at https://www.npmjs.com/package/irc. In order to install it, run this npm command. npm install –-save irc You should then see something similar to this. Having done this, the next thing to do is to update your package.json in order to include the "engines" attribute. Open with a text editor the package.json file and update it as follows. "engines": { "node": ">=5.6.0" } Your package.json should then look like this. Let's create our app.js file which will be the entry point to our bot as mentioned while setting up our node package. Our app.js should like this. var irc = require('irc'); var client = new irc.Client('irc.freenode.net', 'BugTrackerIRCBot', { autoConnect: false }); client.connect(5, function(serverReply) { console.log("Connected!n", serverReply); client.join('#BugsChannel', function(input) { console.log("Joined #BugsChannel"); client.say('#BugsChannel', "Hi, there. I am an IRC Bot which track bugs or defects for your team.n I can help you using following commands.n BUGREPORT n BUG # <BUG. NO>"); }); }); Now let's run our Node.js program and at first see how our console looks. If everything works well, our console should show our bot as connected to the required network and also joined a channel. Console can be seen as the following, Now if you look at our channel #BugsChannel in our web client, you should see our bot has joined and also sent a welcome message as well. Refer the following screen: If you look at the the preceding screen, our bot program got has executed successfully. Our bot BugTrackerIRCBot has joined the channel #BugsChannel and also bot sent an introduction message to all whoever is on channel. If you look at the right side of the screen under usernames, we are seeing BugTrackerIRCBot below @Madan Code understanding of our basic bot After seeing how our bot looks in IRC client, let's look at basic code implementation from app.js. We used irc library with the following lines, var irc = require('irc'); Using irc library, we instantiated client to connect one of the IRC networks using the following code snippet, var client = new irc.Client('irc.freenode.net', 'BugTrackerIRCBot', { autoConnect: false }); Here we connected to network irc.freenode.net and provided a nickname as BugTrackerIRCBot. This name has been given as I would like my bot to track and report the bugs in future. Now we ask client to connect and join a specific channel using the following code snippet, client.connect(5, function(serverReply) { console.log("Connected!n", serverReply); client.join('#BugsChannel', function(input) { console.log("Joined #BugsChannel"); client.say('#BugsChannel', "Hi, there. I am an IRC Bot which track bugs or defects for your team.n I can help you using following commands.n BUGREPORT n BUG # <BUG. NO>"); }); }); In preceeding code snippet, once client is connected, we get reply from server. This reply we are showing on a console. Once successfully connected, we ask bot to join a channel using the following code lines: client.join('#BugsChannel', function(input) { Remember, #BugsChannel is where we have joined from web client at the start. Now using client.join(), I am asking my bot to join the same channel. Once bot is joined, bot is saying a welcome message in the same channel using function client.say(). Hope this has given some basic understanding of our bot and it's code implementations. In the next article, we will enhance our bot so that our teams can have effective communication experience while chatting itself. Enhancing our BugTrackerIRCBot Having built a very basic IRC bot, let's enhance our BugTrackerIRCBot. As developers, we always would like to know how our programs or a system is functioning. To do this typically our testing teams carry out testing of a system or a program and log their bugs or defects into a bug tracking software or a system. We developers later can take a look at those bugs and address them as a part of our development life cycle. During this journey, developers will collaborate and communicate over messaging platforms like IRC. We would like to provide unique experience during their development by leveraging IRC bots. So here is what exactly we are doing. We are creating a channel for communication all the team members will be joined and our bot will also be there. In this channel, bugs will be reported and communicated based on developers' request. Also if developers need some additional information about a bug, chat bot can also help them by providing a URL from the bug tracking system. Awesome!! But before going in to details, let me summarize using the following steps about how we are going to do this, Enhance our basic bot program for more conversational experience Bug tracking system or bug storage where bugs will be stored and tracked for developers Here we mentioned about bug storage system. In this article, I would like to explain DocumentDB which is a NoSQL JSON based cloud storage system. What is DocumentDB? I have already explained NoSQLs. DocumentDB is also one of such NoSQLs where data is stored in JSON documents and offered by Microsoft Azure platform. Details of DocumentDB can be referred from (https://azure.microsoft.com/en-in/services/documentdb/) Setting up a DocumentDB for our BugTrackerIRCBot Assuming you already have a Microsoft Azure subscription follow these steps to configure DocumentDB for your bot. Create account ID for DocumentDB Let's create a new account called botdb using the following screenshot from Azure portal. Select NoSQL API as of DocumentDB. Select appropriate subscription and resources. I am using existing resources for this account. You can also create a new dedicated resource for this account. Once you enter all the required information, hit Create button at the bottom to create new account for DocumentDB. Newly created account botdb can be seen as the following, Create collection and database Select a botdb account from account lists shown precedingly. This will show various menu options like Properties, Settings, Collections etc. Under this account we need to create a collection to store bugs data. To create a new collection, click on Add Collection option as shown in the following screenshot, On click of Add Collection option, following screen will be shown on right side of the screen. Please enter the details as shown in the following screenshot: In the preceding screen, we are creating a new database along with our new collection Bugs. This new database will be named as BugDB. Once this database is created, we can add other bugs related collections in future in the same database. This can be done in future using option Use existing from the preceding screen. Once you enter all the relevant data, click OK to create database as well as collection. Refer the following screenshot: From the preceding screen, COLLECTION ID and DATABASE shown will be used during enhancing our bot. Create data for our BugTrackerIRCBot Now we have BugsDB with Bugs collection which will hold all the data for bugs. Let's add some data into our collection. To add a data let's use menu option Document Explorer shown in the following screenshot: This will open up a screen showing list of Databases and Collections created so far. Select our database as BugDB and collection as Bugs from the available list. Refer the following screenshot: To create a JSON document for our Bugs collection, click on Create option. This will open up a New Document screen to enter JSON based data. Please enter a data as per the following screenshot: We will be storing id, status, title, description, priority,assignedto, url attributes for our single bug document which will get stored in Bugs collection. To save JOSN document in our collection click Save button. Refer the following screenshot: This way we can create sample records in bugs collection which will be later wired up in NodeJS program. Sample list of bugs can be seen in the following screenshot: Summary Every development team needs bug tracking and reporting tools. There are typical needs of bug reporting and bug assignment. In case of critical projects these needs become also very critical for project timelines. This article showed us how we can provide a seamless experience to developers while they are communicating with peers within a channel. To summarize so far, we understood how to use DocumentDB from Microsoft Azure. Using DocumentDB, we created a new collection along with new database to store bugs data. We also added some sample JSON documents in Bugs collection. In today's world of collaboration, development teams who would be using such integrations and automations would be efficient and effective while delivering their quality products. Resources for Article: Further resources on this subject: Talking to Bot using Browser [article] Asynchronous Control Flow Patterns with ES2015 and beyond [article] Basic Website using Node.js and MySQL database [article]
Read more
  • 0
  • 0
  • 7809

article-image-introducing-android-platform
Packt
12 Sep 2013
9 min read
Save for later

Introducing an Android platform

Packt
12 Sep 2013
9 min read
(For more resources related to this topic, see here.) Introducing an Android app Mobile software application that runs on Android is an Android app. The apps use the extension of .apk as the installer file extension. There are several popular examples of mobile apps, such as Foursquare, Angry Birds, and Fruit Ninja. Primarily in an Eclipse environment, we use Java, which is then compiled into Dalvik bytecode (not the ordinary Java bytecode). Android provides Dalvik virtual machine (DVM) inside Android (not Java virtual machine JVM). Dalvik VM does not ally with Java SE and Java ME libraries, and is built on an Apache Harmony java implementation. What is Dalvik virtual machine? Dalvik VM is a register-based architecture, authored by Dan Bornstein. It is being optimized for low memory requirements, and the virtual machine was slimmed down to use less space and less power consumption. Preparing for Android development Eclipse ADT In this part of the article, we will see how to install the development environment for Android on Eclipse Juno (4.2). Eclipse is a major IDE for Android development. We need to install an Eclipse extension ADT Android Development Toolkit (ADT) for development of the Android application. Debugging an Android project It is advisable to use the Log class for this purpose, the reason being we can filter, print different colors, and define log types. This could be one of the ways of debugging your program, by displaying variables value or parameters. To use Log, import android.util.Log, and use one the following methods to print messages to LogCat: v(String, String) (verbose) d(String, String) (debug) i(String, String) (information) w(String, String) (warning) e(String, String) (error) LogCat is used to view the internal log of the Android system. It is useful to trace any activity happening inside the device or emulator through the Android Debug Bridge (ADB). The Android project structure The following table illustrates the brief description of the important folders and files available in an Android project: Folder Functions /src The Java codes are placed in this folder. /gen It is generated automatically. /assets You can put your fonts, videos, and sounds here. It is more like a filesystem, and can also place CSS, JavaScript files, and so on. /libs It is an external library (normally in JAR). /res It contains images, layout, and global variables. /drawable-xhdpi It is used for extra high specification devices (for example, Tablet, Galaxy SIII, HTC One X). /drawable-hdpi  It is used for high specification phones (for example, SGSI, SGSII) /drawable-mdpi It is used for medium specification phones (for example, Galaxy W and HTC Desire). /drawable-ldpi  It is used for low specification phones (for example: Galaxy Y and HTC WildFire). /layout  It includes all the XML file for the screen(s) layout. /menu XML files for screen menu. /values It includes global constants. /values-v11 These are template styles definitions for devices with Honeycomb (Android API level 11). /values-v14 These are template styles definitions for devices with ICS (Android API level 14). AndroidManifest.xml This is one of important files to define the apps. This is the first file located by the Android OS in order to run the app. It contains the app's properties, activity declarations, and list of permissions.   Dalvik Debug Monitor Server (DDMS) DDMS is a must have tool to view the emulator/device activities. To access DDMS in the Eclipse, navigate to Windows | Open Perspective | Other, and then choose DDMS. By default it is available in the Android SDK (it's inside the folder android-sdk/tools by the file ddms). From this perspective the following aspects are available: Devices: The list of the devices and AVD that are connected to ADB Emulator control: It helps to carry out device functions LogCat: It views real-time system log messages Threads: It gives an idea of currently running threads within a VM Heap: It shows heap usage by application Allocation tracker: It provides information on memory allocation of objects File explorer: It explores the device filesystem Creating a new Android project using Eclipse ADT To create a new Android project in Eclipse, navigate to File | New | Project. A new project window will appear, from there choose Android | Android Application Project from the list. Then click on the Next button. Application Name: This is the name of your application, it will appear side-by-side to the launcher icon. Choose a project name that relevant to your application. Project Name: This is typically similar to your application name. Avoid having the same name with existing projects in Eclipse, it is not permitted. Package Name: This is the package name of the application. It will act as an ID in the Google Play app store if we wish to publish. Typically it will be the reverse of your domain name if we have one (since this is unique) followed by the application name, and a valid Java package name else we can have anything now and refactor it before publishing. Running the application on an Android device To run and deploy on real device, first install the driver of the device. This varies as per device model and manufacturer. These are a few links you could refer to: For Google Android devices refer to http://developer.android.com/sdk/win-usb.html. For others refer to http://www.teamandroid.com/download-android-usb-drivers/. Make sure the Android phone is connected to the computer through the USB cable. To check whether the phone is properly connected to your PC and in debug mode, please switch to the DDMS perspective. Adding multiple activity in Android application This exercise is to add an information screen on the SimpleNumb3r5 app. The information regarding the developer, e-mail, Facebook fan page, and other information is displayed. Since the screen contains a lot of text information including several pictures, so we make use of an HTML page as our approach here: Create an activity class to handle the new screen. Open the src folder, right-click on the package name (net.kerul.SimpleNumb3r5), and choose New | Other... from the selections, choose to add a new Android activity, and click on the Next button. Then, choose a blank activity and click on Next. Set the activity name as Info, and the wizard will suggest the screen layout as info_activity. Click on the Finish button. Adding the RadioGroup or RadioButton controls Android SDK provides two types of radio controls to be used in conjunction, where only one control can be chosen at a given time. RadioGroup (Android widget RadioGroup) is used to encapsulate a set of RadioButton controls for this purpose. Defining the preference screen Preferences are an important aspect of the Android applications. It allows users to have the choice to modify and personalize it. Preferences can be set in two ways: first method is to create the preferences.xml file in the res/xml directory, and second method is to set the preferences from the code. We will use the former also the easier one, by creating the preferences.xml file. Usually, there are five different preference views as listed in the following table: Views Description CheckBoxPreference It is a simple checkbox which returns true/false ListPreference It shows RadioGroup, where only 1 item can be selected EditTextPreference It shows dialog box to edit TextView, and returns String RingTonePreference It is a radioGroup that shows ringtone PreferenceCategory It is a category with preferences Fragment A fragment is an independent component that can be connected to an Activity or simply is subactivity. Typically it defines a part of UI but can also exist with no user interface, that is, headless. An instance of fragment must exist within an activity. Fragments ease the reuse of components for different layouts. Fragments are the way to support UI variances across different types of screens. The most popular use is of building single pane layouts for phone and multipane layouts for tablets (large screens). Adding an external library Android project – AdMob An Android application cannot achieve everything on its own, it will always need the company of external jars/libraries to achieve different goals and serve various purposes. Almost every free Android application published on store has advertisement embedded in it, which makes use of external component to achieve it. Incorporating advertisement in the Android application is a vital aspect of today's application development. In this article, we will continue on our DistanceConverter application, and make use of the external library, AdMob to incorporate advertisement in our application. Adding the AdMob SDK to the project Let's extract the previously downloaded AdMob SDK zip file, and we should get the folder GoogleAdMobAdsSdkAndroid-6.*.*, under that folder there is GoogleAdMobAdsSdk-6.x.x.jar. You should copy this JAR file in the libs folder of the project. Signing and distributing APK The Android package (APK), in simple terms is similar to the runnable JAR or executable file (on Windows OS) which consists of everything that is needed to run the application. The Android ecosystem uses a virtual machine, that is, Dalvik virtual machine (DVM) to run the Java applications. Dalvik uses its own bytecode which is quite different from the Java bytecode. Generating a private key An android application must be signed with our own private key. It identifies a person, corporate, or entity associated with the application. This can be generated using the program keytool from the Java SDK. The following command is used for generating the key: keytool -genkey -v -keystore <filename>.keystore -alias <key-name> -keyalg RSA -keysize 2048 -validity 10000 We can use different key for each published application, and specify different name to identify it. Also, Google expects validity of at least 25 years or more. A very important thing to consider is to keep back up and securely store key, because once it is compromised it impossible to update an already published application. Publishing to Google Play Publishing at Google Play is very simple and involves register for Google play. You just have to visit and register it at https://play.google.com/. It requires $25 USD to register, and is fairly straight and can take a few days until you get the final access. Summary In this article, we learned how to install the Eclipse Juno (the IDE), the Android SDK and the testing platform. Also, we learned about the fragment and its usage, and used it to have different layouts for landscape mode for our application DistanceConverter. We also learned about handling different screen types and persisting state during screen mode changes. Resources for Article: Further resources on this subject: Installing Alfresco Software Development Kit (SDK) [Article] JBoss AS plug-in and the Eclipse Web Tools Platform [Article] Creating a pop-up menu [Article]
Read more
  • 0
  • 0
  • 7809

article-image-creating-and-using-composer-packages
Packt
29 Oct 2013
7 min read
Save for later

Creating and Using Composer Packages

Packt
29 Oct 2013
7 min read
(For more resources related to this topic, see here.) Using Bundles One of the great features in Laravel is the ease in which we can include the class libraries that others have made using bundles. On the Laravel site, there are already many useful bundles, some of which automate certain tasks while others easily integrate with third-party APIs. A recent addition to the PHP world is Composer, which allows us to use libraries (or packages) that aren't specific to Laravel. In this article, we'll get up-and-running with using bundles, and we'll even create our own bundle that others can download. We'll also see how to incorporate Composer into our Laravel installation to open up a wide range of PHP libraries that we can use in our application. Downloading and installing packages One of the best features of Laravel is how modular it is. Most of the framework is built using libraries, or packages, that are well tested and widely used in other projects. By using Composer for dependency management, we can easily include other packages and seamlessly integrate them into our Laravel app. For this recipe, we'll be installing two popular packages into our app: Jeffrey Way's Laravel 4 Generators and the Imagine image processing packages. Getting ready For this recipe, we need a standard installation of Laravel using Composer. How to do it... For this recipe, we will follow these steps: Go to https://packagist.org/. In the search box, search for way generator as shown in the following screenshot: Click on the link for way/generators : View the details at https://packagist.org/packages/way/generators and take notice of the require line to get the package's version. For our purposes, we'll use "way/generators": "1.0.*" . In our application's root directory, open up the composer.json file and add in the package to the require section so it looks like this: "require": { "laravel/framework": "4.0.*", "way/generators": "1.0.*" }, Go back to http://packagist.org and perform a search for imagine as shown in the following screenshot: Click on the link to imagine/imagine and copy the require code for dev-master : Go back to our composer.json file and update the require section to include the imagine package . It should now look similar to the following code: "require": { "laravel/framework": "4.0.*", "way/generators": "1.0.*", "imagine/imagine": "dev-master" }, Open the command line, and in the root of our application, run the Composer update as follows: php composer.phar update Finally, we'll add the Generator Service Provider, so open the app/config/app.php file and in the providers array, add the following line: 'WayGeneratorsGeneratorsServiceProvider' How it works... To get our package, we first go to packagist.org and search for the package we want. We could also click on the Browse packages link. It will display a list of the most recent packages as well as the most popular. After clicking on the package we want, we'll be taken to the detail page, which lists various links including the package's repository and home page. We could also click on the package's maintainer link to see other packages they have released. Underneath, we'll see the various versions of the package. If we open that version's detail page, we'll find the code we need to use for our composer.json file. We could either choose to use a strict version number, add a wildcard to the version, or use dev-master, which will install whatever is updated on the package's master branch. For the Generators package, we'll only use Version 1.0, but allow any minor fixes to that version. For the imagine package, we'll use dev-master, so whatever is in their repository's master branch will be downloaded, regardless of version number. We then run update on Composer and it will automatically download and install all of the packages we chose. Finally, to use Generators in our app, we need to register the service provider in our app's config file. Using the Generators package to set up an app Generators is a popular Laravel package that automates quite a bit of file creation. In addition to controllers and models, it can also generate views, migrations, seeds, and more, all through a command-line interface. Getting ready For this recipe, we'll be using the Laravel 4 Generators package maintained by Jeffrey Way that was installed in the Downloading and installing packages recipe. We'll also need a properly configured MySQL database. How to do it… Follow these steps for this recipe: Open the command line in the root of our app and, using the generator, create a scaffold for our cities as follows: php artisan generate:scaffold cities --fields="city:string" In the command line, create a scaffold for our superheroes as follows: php artisan generate:scaffold superheroes --fields="name:string, city_id:integer:unsigned" In our project, look in the app/database/seeds directory and find a file named CitiesTableSeeder.php. Open it and add some data to the $cities array as follows: <?php class CitiesTableSeeder extends Seeder { public function run() { DB::table('cities')->delete(); $cities = array( array( 'id' => 1, 'city' => 'New York', 'created_at' => date('Y-m-d g:i:s',time()) ), array( 'id' => 2, 'city' => 'Metropolis', 'created_at' => date('Y-m-d g:i:s',time()) ), array( 'id' => 3, 'city' => 'Gotham', 'created_at' => date('Y-m-d g:i:s',time()) ) ); DB::table('cities')->insert($cities); } } In the app/database/seeds directory, open SuperheroesTableSeeder.php and add some data to it: <?php class SuperheroesTableSeeder extends Seeder { public function run() { DB::table('superheroes')->delete(); $superheroes = array( array( 'name' => 'Spiderman', 'city_id' => 1, 'created_at' => date('Y-m-d g:i:s', time()) ), array( 'name' => 'Superman', 'city_id' => 2, 'created_at' => date('Y-m-d g:i:s', time()) ), array( 'name' => 'Batman', 'city_id' => 3, 'created_at' => date('Y-m-d g:i:s', time()) ), array( 'name' => 'The Thing', 'city_id' => 1, 'created_at' => date('Y-m-d g:i:s', time()) ) ); DB::table('superheroes')->insert($superheroes); } } In the command line, run the migration then seed the database as follows: php artisan migrate php artisan db:seed Open up a web browser and go to http://{your-server}/cities. We will see our data as shown in the following screenshot: Now, navigate to http://{your-server}/superheroes and we will see our data as shown in the following screenshot: How it works... We begin by running the scaffold generator for our cities and superheroes tables. Using the --fields tag, we can determine which columns we want in our table and also set options such as data type. For our cities table, we'll only need the name of the city. For our superheroes table, we'll want the name of the hero as well as the ID of the city where they live. When we run the generator, many files will automatically be created for us. For example, with cities, we'll get City.php in our models, CitiesController.php in controllers, and a cities directory in our views with the index, show, create, and edit views. We then get a migration named Create_cities_table.php, a CitiesTableSeeder.php seed file, and CitiesTest.php in our tests directory. We'll also have our DatabaseSeeder.php file and our routes.php file updated to include everything we need. To add some data to our tables, we opened the CitiesTableSeeder.php file and updated our $cities array with arrays that represent each row we want to add. We did the same thing for our SuperheroesTableSeeder.php file. Finally, we run the migrations and seeder and our database will be created and all the data will be inserted. The Generators package has already created the views and controllers we need to manipulate the data, so we can easily go to our browser and see all of our data. We can also create new rows, update existing rows, and delete rows.
Read more
  • 0
  • 0
  • 7807
article-image-using-resources
Packt
06 Oct 2015
6 min read
Save for later

Using Resources

Packt
06 Oct 2015
6 min read
In this article written by Mathieu Nayrolles, author of the book Xamarin Studio for Android Programming: A C# Cookbook, wants us to learn of how to play sound and how to play a movie by using an user-interactive—press on button in a programmative way. (For more resources related to this topic, see here.) Playing sound There are an infinite number of occasions in which you want your applications to play sound. In this section, we will learn how to play a song from our application in a user-interactive—press on a button—or programmative way. Getting ready For using this you need to create a project on your own: How to do it… Add the following using parameter to your MainActivity.cs file. using Android.Media; Create a class variable named _myPlayer of the MediaPlayer class: MediaPlayer _myPlayer; Create a subfolder named raw under Resources. Place the sound you wish to play inside the newly created folder. We will use the Mario theme, free of rights for non-commercial use, downloaded from http://mp3skull.com. Add the following lines at the end of the OnCreate() method of your MainActivity. _myPlayer = MediaPlayer.Create (this, Resource.Raw.mario); Button button = FindViewById<Button> (Resource.Id.myButton); button.Click += delegate { _myPlayer.Start(); }; In the preceding code sample, the first line creates an instance of the MediaPlayer using the this statement as Context and Resource.Raw.mario as the file to play with this MediaPlayer. The rest of the code is trivial, we just acquired a reference to the button and create a behavior for the OnClick() event of the button. In this event, we call the Start() method of the _myPlayer() variable. Run your application and press on the button as shown on the following screenshot: You should hear the Mario theme playing right after you pressed the button, even if you are running the application on the emulator. How it works... Playing sound (and video) is an activity handled by the Media Player class of the Android platform. This class involves some serious implementations and a multitude of states in the same way as activities. However, as an Android Applications Developer and not as an Android Platform Developer we only require a little background on this. The Android multimedia framework includes—thought the Media Player class—a support for playing a very large variety of media such MP3 from the filesystem or from the Internet. Also, you can only play a song on the current sound device which could be the phone speakers, headset or even a Bluetooth enabled speaker. In other words, even if there are many sound outputs available on the phone, the current default settled by the user is the one where your sound will be played. Finally, you cannot play a sound during a call. There's more... Playing sound which is not stored locally Obviously, you may want to play sounds that are not stored locally—in the raw folder—but anywhere else on the phone, like SDcard or so. To do it, you have to use the following code sample: Uri myUri = new Uri ("uriString"); _myPlayer = new MediaPlayer(); _myPlayer.SetAudioStreamType (AudioManager.UseDefaultStreamType); _myPlayer.SetDataSource(myUri); _myPlayer.Prepare(); The first line defines a Uri for the targeting file to play. The three following lines set the StreamType, the Uri prepares the MediaPlayer. The Prepare() is a method which prepares the player for playback in a synchrone manner. Meaning that this instruction blocks the program until the player is ready to play, until the player has loaded the file. You could also call the PrepareAsync() which returns immediately and performs the loading in an asynchronous way. Playing online sound Using a code very similar to the one required to play sounds stored somewhere on the phone, we could play sound from the Internet. Basically, we just have to replace the Uri parameter by some HTTP address just like the following: String url = "http://myWebsite/mario.mp3"; _myPlayer = new MediaPlayer(); _myPlayer.SetAudioStreamType (AudioManager.UseDefaultStreamType); _myPlayer.SetDataSource(url); _myPlayer.Prepare(); Also, you must request the permission to access the Internet with your application. This is done in the manifest by adding an <uses-permission> tag for your application as shown by the following code sample: <application android_icon="@drawable/Icon" android_label="Splash"> <uses-permission android_name="android.permission.INTERNET" /> </application> See Also See also the next recipe for playing video. Playing a movie As a final recipe in this article, we will see how to play a movie with your Android application. Playing video, unlike playing audio, involves some special views for displaying it to the users. Getting ready For the last time, we will reuse the same project, and more specifically, we will play a Mario video under the button for playing the Mario theme seen in the previous recipe. How to do it... Add the following code to your Main.axml under Layout file: <VideoView android_id="@+id/myVideoView" android_layout_width="fill_parent" android_layout_height="fill_parent"> </VideoView> As a result, the content of your Main.axml file should look like the following screenshot: Add the following code to the MainActivity.cs file in the OnCreate() method: var videoView = FindViewById<VideoView> (Resource.Id.SampleVideoView); var uri = Android.Net.Uri.Parse ("url of your video"); videoView.SetVideoURI (uri); Finally, invoke the videoView.Start() method. Note that playing Internet based video, even short one, will take a very long time as the video need to be fully loaded while using this technique. How it works... Playing video should involve some special view to display it to users. This special view is named the VideoView tag and should be used as same as simple TextView tag. <VideoView android_id="@+id/myVideoView" android_layout_width="fill_parent" android_layout_height="fill_parent"> </VideoView> As you can see in the preceding code sample, you can apply the same parameters to VideoView tag as TextView tag such as layout based options. The VideoView tag, like the MediaPlayer for audio, have a method to set the video URI named SetVideoURI and another one to start the video named Start();. Summary In this article, we've learned how to play a sound clip as well as video on the Android application which we've developed. Resources for Article: Further resources on this subject: Heads up to MvvmCross [article] XamChat – a Cross-platform App [article] Gesture [article]
Read more
  • 0
  • 0
  • 7799

article-image-microsoft-sql-server-2008-r2-master-data-services-overview
Packt
19 Jul 2011
5 min read
Save for later

Microsoft SQL Server 2008 R2 Master Data Services Overview

Packt
19 Jul 2011
5 min read
Microsoft SQL Server 2008 R2 Master Data Services Master Data Services overview Master Data Services is Microsoft's Master Data Management product that ships with SQL Server 2008 R2. Much like other parts of SQL Server, such as Analysis Services (SSAS) or Reporting Services (SSRS), MDS doesn't get installed along with the database engine, but is a separate product in its own right. Unlike SSAS or SSRS, it's worth noting that MDS is only available in the Enterprise and Data Centre Editions of SQL Server, and that the server must be 64-bit. MDS is a product that has grown from acquisition, as it is based on the +EDM product that Microsoft obtained when they bought the Atlanta-based company called Stratature in 2007. A great deal of work has been carried out since the acquisition, including changing the user interface and putting in a web service layer. At a high level, the new product has the following features: Entity maintenance—MDS supports data stewardship by allowing users to add, edit, and delete members. The tool is not specific to a particular industry or area, but instead is generic enough to work across a variety of subject domains. Modeling capability—MDS contains interfaces that allow administrative users to create data models to hold entity members. Hierarchy management—Relationships between members can be utilized to produce hierarchies that users can alter. Version management—Copies of entity data and related metadata can be archived to create an entirely separate version of the data. Business rules and workflow—A comprehensive business rules engine is included in order to enforce data quality and assist with data stewardship via workflow. Alerts can be sent to users using e-mail when the business rules encounter a particular condition. Security—A granular security model is included, where it is possible, for example, to prevent a given user from accessing certain entities, attributes, and members. Master Data Services architecture Technically, Master Data Services consists of the following components: SQL Server Database—The database holds the entities such as Customer or Product, whether they are imported from other systems or created in MDS. Master Data Manager—A web-based data stewardship and administration portal that amongst many other features allows data stewards to add, edit, and delete entity members. Web Service Layer—All calls to the database from the front-end go through a WCF (Windows Communication Foundation) web service. Internet Information Services (IIS) is used to host the web services and the Master Data Manager application. Workflow Integration Service—A Windows service that acts as a broker between MDS and SharePoint in order to allow MDS business rules to use SharePoint workflows. Configuration Manager—A windows application that allows key settings to be altered by an administrator. The following diagram shows how the components interact with one another: MDS SQL Server database The MDS database uses a mix of components in order to be the master data store and to support the functionality found in the Master Data Manager, including stored procedures, views, and functions. Separate tables are created both for entities and for their supporting objects, all of which happens on the fly when a new object gets created in Master Data Manager. The data for all entities across all subject areas are stored in the same database, meaning that the database could get quite big if several subject domains are being managed. The tables themselves are created with a code name. For example, on my local installation, the Product entity is not stored in a table called "Product" as you might expect, but in a table called "tbl_2_10_EN". Locating entity data The exact table that contains the data for a particular entity can be found by writing a select statement against the view called viw_SYSTEM_SCHEMA_ENTITY in the mdm schema. As well as containing a number of standard SQL Server table-valued and scalar functions, the MDS database also contains a handful of .Net CLR (Common Language Runtime)-based functions, which can be found in the mdq schema. The functions utilize the Microsoft.MasterDataServices.DataQuality assembly and are used to assist with data quality and the merging, de-duplication, and survivorship exercises that are often required in a master data management solution. Some of the actions in MDS, such as the e-mail alerts or loading of large amounts of data, need to happen in an asynchronous manner. Service Broker is utilized to allow users to continue to use the front-end without having to wait for long running processes to complete. Although strictly outside the MDS database, SQL Server Database Mail, which resides in the system msdb database, is used as the mechanism to send e-mail alerts to subscribing users. In addition to the tables that hold the master data entities, the MDS database also contains a set of staging tables that should be used when importing data into MDS. Once the staging tables have been populated correctly, the staged data can be loaded into the master data store in a single batch. Internet Information Services (IIS) During the initial configuration of MDS, an IIS Web Application will get created within an IIS website. The name of the application that gets created is called "MDS", although this can be over-ridden if needed. The Web Application contains a Virtual Directory that points to the physical path of <Drive>:<install location>WebApplication, where the various components of the Master Data Manager application are stored. The MDS WCF Service called Service.svc is also located in the same directory. The service can be exposed in order to provide a unified access point to any MDS functionality (for example, creating an entity or member, retrieving all entity members) that is needed by other applications. Master Data Manager connects to the WCF service, which then connects to the database, so this is the route that should be taken by other applications, instead of connecting to the database directly.
Read more
  • 0
  • 0
  • 7784
Modal Close icon
Modal Close icon