

Proxmox VE Fundamentals

Packt
04 Apr 2016
12 min read
In this article, written by Rik Goldman, author of the book Learning Proxmox VE, we introduce Proxmox Virtual Environment (PVE): a mature, complete, well-supported, enterprise-class virtualization environment for servers. It is an open source tool, based on the Debian GNU/Linux distribution, that manages containers, virtual machines, storage, virtualized networks, and high-availability clustering through a well-designed web-based interface or via the command-line interface.

Developers provided the first stable release of Proxmox VE in 2008; four years and eight point releases later, ZDNet's Ken Hess boldly, but quite sensibly, declared it the ultimate hypervisor in his article Proxmox: The Ultimate Hypervisor (http://www.zdnet.com/article/proxmox-the-ultimate-hypervisor/). Four years after that, PVE is on version 4.1, in use on at least 90,000 hosts, with more than 500 commercial customers in 140 countries; the web-based administrative interface itself is translated into nineteen languages.

This article explores the fundamental technologies underlying PVE's hypervisor features: LXC, KVM, and QEMU. To do so, we will develop a working understanding of virtual machines, containers, and their appropriate use. We will cover the following topics:

- Proxmox VE in brief
- Virtualization and containerization with PVE
- Proxmox VE virtual machines, KVM, and QEMU
- Containerization with PVE and LXC

Proxmox VE in brief

With Proxmox VE, Proxmox Server Solutions GmbH (https://www.proxmox.com/en/about) provides us with an enterprise-ready, open source type II hypervisor. Below are some of the features that make Proxmox VE such a strong enterprise candidate.

The license for Proxmox VE is very deliberately the GNU Affero General Public License (V3) (https://www.gnu.org/licenses/agpl-3.0.html). From among the many free and open source compatible licenses available, this is a significant choice because it is "specifically designed to ensure cooperation with the community in the case of network server software."

PVE is primarily administered from an integrated web interface, or from the command line locally or via SSH. Consequently, there is no need for a separate management server and the associated expenditure. In this way, Proxmox VE contrasts significantly with alternative enterprise virtualization solutions by vendors such as VMware.

Proxmox VE instances/nodes can be incorporated into PVE clusters and centrally administered from a unified web interface.

Proxmox VE provides for live migration: the movement of a virtual machine or container from one cluster node to another without any disruption of services. This is a rather unique feature of PVE and not common to competing products.

Feature | Proxmox VE | VMware vSphere
Hardware requirements | Flexible | Strict compliance with HCL
Integrated management interface | Web- and shell-based (browser and SSH) | No; requires dedicated management server at additional cost
Simple subscription structure | Yes; based on number of premium support tickets per year and CPU socket count | No
High availability | Yes | Yes
VM live migration | Yes | Yes
Supports containers | Yes | No
Virtual machine OS support | Windows and Linux | Windows, Linux, and Unix
Community support | Yes | No
Live VM snapshots | Yes | Yes

Contrasting Proxmox VE and VMware vSphere features

For a complete catalog of features, see the Proxmox VE datasheet at https://www.proxmox.com/images/download/pve/docs/Proxmox-VE-Datasheet.pdf.
Like its competitors, PVE is a hypervisor: software that creates, runs, configures, and manages virtual machines based on an administrator's or engineer's choices. PVE is known as a type II hypervisor because the virtualization layer is built upon an operating system. By contrast, a type I hypervisor (such as VMware's ESXi) runs directly on bare metal, without the mediation of an operating system; it has no additional function beyond managing virtualization and the physical hardware.

As a type II hypervisor, Proxmox VE is built on the Debian project. Debian is a GNU/Linux distribution renowned for its reliability, its commitment to security, and its thriving, dedicated community of contributing developers. In Proxmox VE's case, the operating system is Debian; since the release of PVE 4.0, the underlying operating system has been Debian "Jessie."

Debian-based GNU/Linux distributions are arguably the most popular GNU/Linux distributions for the desktop. One characteristic that distinguishes Debian from competing distributions is its release policy: Debian releases only when its development community can stand behind it for its stability, security, and usability. Debian does not distinguish between long-term support releases and regular releases as some other distributions do. Instead, all Debian releases receive strong support and critical updates through the first year following the next release. (Since 2007, a major release of Debian has been made about every two years; Debian 8, Jessie, was released just about on schedule in 2015.) Proxmox VE's reliance on Debian is thus a testament to its commitment to these values: stability, security, and usability over scheduled releases that favor cutting-edge features.

PVE provides its virtualization functionality through three open technologies, and through the efficiency with which they are integrated by its administrative web interface:

- LXC
- KVM
- QEMU

To understand how this foundation serves Proxmox VE, we must first clearly understand the relationship between virtualization (or, specifically, hardware virtualization) and containerization (OS virtualization). As we proceed, their respective use cases should become clear.

Virtualization and containerization with Proxmox VE

It is correct to ultimately understand containerization as a type of virtualization. However, here we will first conceptually distinguish a virtual machine from a container by focusing on contrasting characteristics.

Simply put, virtualization is a technique through which we provide fully functional computing resources without a demand for the resources' physical organization, location, or relative proximity. Briefly put, virtualization technology allows you to share and allocate the resources of a physical computer into multiple execution environments. Without context, virtualization is a vague term, encapsulating the abstraction of such resources as storage, networks, servers, desktop environments, and even applications from their concrete hardware requirements through software implementation solutions called hypervisors.
Virtualization thus affords us more flexibility, more functionality, and a significant positive impact on our budgets, often realized with merely the resources we have at hand.

In terms of PVE, virtualization most commonly refers to the abstraction of all aspects of a discrete computing system from its hardware. In this context, virtualization is the creation, in other words, of a virtual machine or VM, with its own operating system and applications. A VM may be initially understood as a computer that has the same functionality as a physical machine. Likewise, it may be incorporated into, and communicated with via, a network exactly as a machine with physical hardware would be. Put yet another way, from inside a VM we will experience no difference by which we can distinguish it from a physical computer.

The virtual machine, moreover, does not have the physical footprint of its physical counterparts. The hardware it relies on is, in fact, provided by software that borrows hardware resources from a host installed on a physical machine (or bare metal). Nevertheless, the software components of the virtual machine, from the applications to the operating system, are distinctly separated from those of the host machine. This advantage is realized when it comes to allocating physical space for resources.

For example, we may have a PVE server running a web server, database server, firewall, and log management system, all as discrete virtual machines. Rather than consuming the physical space, resources, and labor of maintaining four physical machines, we simply make physical room for the single Proxmox VE server and configure an appropriate virtual LAN as necessary.

In a white paper entitled Putting Server Virtualization to Work, AMD articulates well the benefits of virtualization to businesses and developers (https://www.amd.com/Documents/32951B_Virtual_WP.pdf):

Top 5 business benefits of virtualization:
- Increases server utilization
- Improves service levels
- Streamlines manageability and security
- Decreases hardware costs
- Reduces facility costs

The benefits of virtualization for a development and test environment:
- Lowers capital and space requirements
- Lowers power and cooling costs
- Increases efficiency through shorter test cycles
- Faster time-to-market

To these benefits, let's add portability and encapsulation: the unique ability to migrate a live VM from one PVE host to another without suffering a service outage.

Proxmox VE makes the creation and control of virtual machines possible through the combined use of two free and open source technologies: Kernel-based Virtual Machine (KVM) and Quick Emulator (QEMU). Used together, we refer to this integration of tools as KVM-QEMU.

KVM

KVM has been an integral part of the Linux kernel since February 2007. This kernel module allows GNU/Linux users and administrators to take advantage of an architecture's hardware virtualization extensions; for our purposes, these extensions are AMD's AMD-V and Intel's VT-x for the x86_64 architecture. To really make the most of Proxmox VE's feature set, you'll therefore very much want to install on an x86_64 machine whose CPU has integrated virtualization extensions. For a full list of AMD and Intel processors supported by KVM, visit Intel at http://ark.intel.com/Products/VirtualizationTechnology or AMD at http://support.amd.com/en-us/kb-articles/Pages/GPU120AMDRVICPUsHyperVWin8.aspx.
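As a practical aside, on a Linux host you can confirm that the CPU exposes these extensions by looking for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. The following minimal Python sketch is an illustration of that check only; it is not code from the book, and it assumes a Linux host:

# check_virt_extensions.py - minimal sketch; assumes a Linux host with /proc/cpuinfo
def has_virtualization_extensions(cpuinfo_path="/proc/cpuinfo"):
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V), or None if neither flag is present."""
    with open(cpuinfo_path) as f:
        for line in f:
            if line.startswith("flags"):
                flags = line.split(":", 1)[1].split()
                if "vmx" in flags:
                    return "vmx"
                if "svm" in flags:
                    return "svm"
    return None

if __name__ == "__main__":
    ext = has_virtualization_extensions()
    if ext:
        print("Hardware virtualization extensions found: %s" % ext)
    else:
        print("No vmx/svm flag found; KVM acceleration will not be available.")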
QEMU

QEMU provides an emulation and virtualization interface that can be scripted or otherwise controlled by a user.

Visualizing the relationship between KVM and QEMU

Without Proxmox VE, we could essentially define the hardware, create a virtual disk, and start and stop a virtualized server from the command line using QEMU. Alternatively, we could rely on any one of an array of GUI frontends for QEMU (a list of GUIs available for various platforms can be found at http://wiki.qemu.org/Links#GUI_Front_Ends).

Of course, working with these solutions is productive only if you're interested in what goes on behind the scenes in PVE when virtual machines are defined. Proxmox VE's management of virtual machines is itself managing QEMU through its API.

Managing QEMU from the command line can be tedious. The following is a line from a script that launched Raspbian, a Debian remix intended for the architecture of the Raspberry Pi, on an x86 Intel machine running Ubuntu. When we see how easy it is to manage VMs from Proxmox VE's administrative interfaces, we'll sincerely appreciate that relative simplicity:

qemu-system-arm -kernel kernel-qemu -cpu arm1176 -m 256 -M versatilepb -no-reboot -serial stdio -append "root=/dev/sda2 panic=1" -hda ./$raspbian_img -hdb swap

If you're familiar with QEMU's emulation features, it's perhaps important to note that we can't manage emulation through the tools and features Proxmox VE provides, despite its reliance on QEMU. It is possible from a bash shell provided by Debian, but the emulation can't be controlled through PVE's administration and management interfaces.

Containerization with Proxmox VE

Containers are a class of virtual machines (as containerization has enjoyed a renaissance since 2005, the term OS virtualization has become synonymous with containerization and is often used for clarity). However, by way of contrast with VMs, containers share operating system components, such as libraries and binaries, with the host operating system; a virtual machine does not.

Visually contrasting virtual machines with containers

The container advantage

This arrangement potentially allows a container to run leaner and with fewer hardware resources borrowed from the host. For many authors, pundits, and users, containers also offer a demonstrable advantage in terms of speed and efficiency. (However, it should be noted here that as resources such as RAM and more powerful CPUs become cheaper, this advantage will diminish.)

The Proxmox VE container is made possible through LXC as of version 4.0 (it was made possible through OpenVZ in previous PVE versions). LXC is the third fundamental technology serving Proxmox VE's ultimate interest. Like KVM and QEMU, LXC (or Linux Containers) is an open source technology. It allows a host to run, and an administrator to manage, multiple operating system instances as isolated containers on a single physical host.

Conceptually, then, a container very clearly represents a class of virtualization rather than an opposing concept. Nevertheless, it's helpful to maintain a clear distinction between a virtual machine and a container as we come to terms with PVE. The ideal implementation of a Proxmox VE guest is contingent on our distinguishing and choosing between a virtual-machine solution and a container solution. Since Proxmox VE containers share components of the host operating system and can offer advantages in terms of efficiency, this text will guide you through the creation of containers whenever the intended guest can be fully realized with Debian Jessie as our hypervisor's operating system without sacrificing features.
When our intent is a guest running a Microsoft Windows operating system, for example, a Proxmox VE container ceases to be a solution. In such a case, we turn instead to creating a virtual machine. We must rely on a VM precisely because the operating system components that Debian can share with a Linux container are not components a Microsoft Windows operating system can make use of.

Summary

In this article, we have come to terms with the three open source technologies that provide Proxmox VE's foundational features: containerization and virtualization with LXC, KVM, and QEMU. Along the way, we've come to understand that containers, while being a type of virtualization, have characteristics that distinguish them from virtual machines. These differences will be crucial as we determine which technology to rely on for a virtual server solution with Proxmox VE.

Morphology – Getting Our Feet Wet

Packt
04 Apr 2016
20 min read
In this article by Deepti Chopra, Nisheeth Joshi, and Iti Mathur, authors of the book Mastering Natural Language Processing with Python, morphology is defined as the study of the composition of words using morphemes. A morpheme is the smallest unit of a language that has a meaning. In this article, we will discuss stemming and lemmatizing, creating a stemmer and lemmatizer for non-English languages, developing a morphological analyzer and a morphological generator using machine learning tools, creating a search engine, and many other concepts. In brief, this article will cover the following topics:

- Introducing morphology
- Creating a stemmer and lemmatizer
- Developing a stemmer for non-English languages
- Creating a morphological analyzer
- Creating a morphological generator
- Creating a search engine

Introducing morphology

Morphology may be defined as the study of the production of tokens with the help of morphemes. A morpheme is the basic unit of language that carries a meaning. There are two types of morphemes: stems and affixes (suffixes, prefixes, infixes, and circumfixes). Stems are also referred to as free morphemes, since they can exist even without adding affixes. Affixes are referred to as bound morphemes, since they cannot exist in a free form; they always exist along with free morphemes. Consider the word "unbelievable". Here, "believe" is a stem or free morpheme; it can exist on its own. The morphemes "un" and "able" are affixes or bound morphemes; they cannot exist in a free form, but exist together with a stem.

There are three kinds of languages, namely isolating languages, agglutinative languages, and inflecting languages. Morphology has a different character in each of them. Isolating languages are those in which words are merely free morphemes and do not carry any tense (past, present, or future) or number (singular or plural) information. Mandarin Chinese is an example of an isolating language. Agglutinative languages are those in which small words combine together to convey compound information. Turkish is an example of an agglutinative language. Inflecting languages are those in which words can be broken down into simpler units, but all these simpler units exhibit different meanings. Latin is an example of an inflecting language.

There are morphological processes such as inflection, derivation, semi-affixes, combining forms, and cliticization. Inflection refers to transforming a word into a form that represents person, number, tense, gender, case, aspect, or mood; here, the syntactic category of the token remains the same. In derivation, the syntactic category of the word is also changed. Semi-affixes are bound morphemes that exhibit a word-like quality, for example, noteworthy, antisocial, anticlockwise, and so on.

Understanding stemmers

Stemming may be defined as the process of obtaining a stem from a word by eliminating the affixes from it. For example, for the word "raining", a stemmer would return the root word or stem "rain" by removing the affix "ing". In order to increase the accuracy of information retrieval, search engines mostly use stemming to obtain a stem and store it as an index word. Treating words that share a stem as if they had the same meaning is a kind of query expansion known as conflation. Martin Porter designed a well-known stemming algorithm known as the Porter stemming algorithm.
This algorithm is basically designed to replace and eliminate some well-known suffixes present in English words. To perform stemming in NLTK, we simply instantiate the PorterStemmer class and then perform stemming by calling the stem() method. Let's take a look at the code for stemming using the PorterStemmer class in NLTK:

>>> import nltk
>>> from nltk.stem import PorterStemmer
>>> stemmerporter = PorterStemmer()
>>> stemmerporter.stem('working')
'work'
>>> stemmerporter.stem('happiness')
'happi'

The PorterStemmer class is trained and has knowledge of many stems and word forms in the English language. The process of stemming takes place in a series of steps and transforms a word into a shorter word that may have a similar meaning to the root word. The StemmerI interface defines the stem() method, and all stemmers inherit from this interface. The inheritance diagram is depicted here:

Another stemming algorithm, known as the Lancaster stemming algorithm, was introduced at Lancaster University. Similar to the PorterStemmer class, the LancasterStemmer class is used in NLTK to implement Lancaster stemming. Let's consider the following code, which depicts Lancaster stemming in NLTK:

>>> import nltk
>>> from nltk.stem import LancasterStemmer
>>> stemmerlan = LancasterStemmer()
>>> stemmerlan.stem('working')
'work'
>>> stemmerlan.stem('happiness')
'happy'

We can also build our own stemmer in NLTK using RegexpStemmer. It works by accepting a regular expression and removing any matching prefix or suffix from a word when a match is found. Let's consider an example of stemming using RegexpStemmer in NLTK:

>>> import nltk
>>> from nltk.stem import RegexpStemmer
>>> stemmerregexp = RegexpStemmer('ing')
>>> stemmerregexp.stem('working')
'work'
>>> stemmerregexp.stem('happiness')
'happiness'
>>> stemmerregexp.stem('pairing')
'pair'

We can use RegexpStemmer in cases where stemming cannot be performed using PorterStemmer or LancasterStemmer.

The SnowballStemmer class is used to perform stemming in 13 languages other than English. In order to perform stemming using SnowballStemmer, first an instance is created for the language in which stemming needs to be performed, and then stemming is performed using the stem() method. Consider the following example, which performs stemming in Spanish and French in NLTK using SnowballStemmer:

>>> import nltk
>>> from nltk.stem import SnowballStemmer
>>> SnowballStemmer.languages
('danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian', 'italian', 'norwegian', 'porter', 'portuguese', 'romanian', 'russian', 'spanish', 'swedish')
>>> spanishstemmer = SnowballStemmer('spanish')
>>> spanishstemmer.stem('comiendo')
'com'
>>> frenchstemmer = SnowballStemmer('french')
>>> frenchstemmer.stem('manger')
'mang'

nltk.stem.api contains the StemmerI class, in which the stem() method is defined. Consider the following code present in NLTK, which enables stemming to be performed:

class StemmerI(object):
    """
    An interface that helps to eliminate morphological affixes from
    tokens; this process is known as stemming.
    """
    def stem(self, token):
        """
        Eliminate affixes from the token and return the stem.
        """
        raise NotImplementedError()
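Because every stemmer implements this interface, we can also sketch one of our own. The following toy suffix stripper is an illustration only; it is not part of NLTK or the book, and its suffix list is invented:

from nltk.stem.api import StemmerI

class SuffixStemmer(StemmerI):
    """A toy stemmer that strips a small, fixed set of suffixes (illustration only)."""
    suffixes = ('ing', 'ness', 'es', 's')

    def stem(self, token):
        for suffix in self.suffixes:
            # Only strip when enough of the word remains to look like a stem
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[:-len(suffix)]
        return token

print(SuffixStemmer().stem('working'))    # 'work'
print(SuffixStemmer().stem('happiness'))  # 'happi'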
""" raise NotImplementedError() Here's the code used to perform stemming using multiple stemmers: >>> import nltk >>> from nltk.stem.porter import PorterStemmer >>> from nltk.stem.lancaster import LancasterStemmer >>> from nltk.stem import SnowballStemmer >>> def obtain_tokens(): With open('/home/p/NLTK/sample1.txt') as stem: tok = nltk.word_tokenize(stem.read()) return tokens >>> def stemming(filtered): stem=[] for x in filtered: stem.append(PorterStemmer().stem(x)) return stem >>> if_name_=="_main_": tok= obtain_tokens() >>> print("tokens is %s")%(tok) >>> stem_tokens= stemming(tok) >>> print("After stemming is %s")%stem_tokens >>> res=dict(zip(tok,stem_tokens)) >>> print("{tok:stemmed}=%s")%(result) Understanding lemmatization Lemmatization is the process in which we transform a word into a form that has a different word category. The word formed after lemmatization is entirely different from what it was initially. Consider an example of lemmatization in NLTK: >>> import nltk >>> from nltk.stem import WordNetLemmatizer >>> lemmatizer_output=WordNetLemmatizer() >>> lemmatizer_output.lemmatize('working') 'working' >>> lemmatizer_output.lemmatize('working',pos='v') 'work' >>> lemmatizer_output.lemmatize('works') 'work' WordNetLemmatizer may be defined as a wrapper around the so-called WordNet corpus, and it makes use of the morphy() function present in WordNetCorpusReader to extract a lemma. If no lemma is extracted, then the word is only returned in its original form. For example, for 'works', the lemma that is returned is in the singular form 'work'. This code snippet illustrates the difference between stemming and lemmatization: >>> import nltk >>> from nltk.stem import PorterStemmer >>> stemmer_output=PorterStemmer() >>> stemmer_output.stem('happiness') 'happi' >>> from nltk.stem import WordNetLemmatizer >>> lemmatizer_output.lemmatize('happiness') 'happiness' In the preceding code, 'happiness' is converted to 'happi' by stemming it. Lemmatization can't find the root word for 'happiness', so it returns the word "happiness". Developing a stemmer for non-English languages Polyglot is a software that is used to provide models called morfessor models, which are used to obtain morphemes from tokens. The Morpho project's goal is to create unsupervised data-driven processes. Its focuses on the creation of morphemes, which are the smallest units of syntax. Morphemes play an important role in natural language processing. They are useful in automatic recognition and the creation of language. With the help of the vocabulary dictionaries of polyglot, morfessor models on 50,000 tokens of different languages was used. Here's the code to obtain a language table using a polyglot: from polyglot.downloader import downloader print(downloader.supported_languages_table("morph2")) The output obtained from the preceding code is in the form of these languages listed as follows: 1. Piedmontese language 2. Lombard language 3. Gan Chinese 4. Sicilian 5. Scots 6. Kirghiz, Kyrgyz 7. Pashto, Pushto 8. Kurdish 9. Portuguese 10. Kannada 11. Korean 12. Khmer 13. Kazakh 14. Ilokano 15. Polish 16. Panjabi, Punjabi 17. Georgian 18. Chuvash 19. Alemannic 20. Czech 21. Welsh 22. Chechen 23. Catalan; Valencian 24. Northern Sami 25. Sanskrit (Saṁskṛta) 26. Slovene 27. Javanese 28. Slovak 29. Bosnian-Croatian-Serbian 30. Bavarian 31. Swedish 32. Swahili 33. Sundanese 34. Serbian 35. Albanian 36. Japanese 37. Western Frisian 38. French 39. Finnish 40. Upper Sorbian 41. Faroese 42. Persian 43. Sinhala, Sinhalese 44. Italian 45. 
Amharic 46. Aragonese 47. Volapük 48. Icelandic 49. Sakha 50. Afrikaans 51. Indonesian 52. Interlingua 53. Azerbaijani 54. Ido 55. Arabic 56. Assamese 57. Yoruba 58. Yiddish 59. Waray-Waray 60. Croatian 61. Hungarian 62. Haitian; Haitian Creole 63. Quechua 64. Armenian 65. Hebrew (modern) 66. Silesian 67. Hindi 68. Divehi; Dhivehi; Mald... 69. German 70. Danish 71. Occitan 72. Tagalog 73. Turkmen 74. Thai 75. Tajik 76. Greek, Modern 77. Telugu 78. Tamil 79. Oriya 80. Ossetian, Ossetic 81. Tatar 82. Turkish 83. Kapampangan 84. Venetian 85. Manx 86. Gujarati 87. Galician 88. Irish 89. Scottish Gaelic; Gaelic 90. Nepali 91. Cebuano 92. Zazaki 93. Walloon 94. Dutch 95. Norwegian 96. Norwegian Nynorsk 97. West Flemish 98. Chinese 99. Bosnian 100. Breton 101. Belarusian 102. Bulgarian 103. Bashkir 104. Egyptian Arabic 105. Tibetan Standard, Tib... 106. Bengali 107. Burmese 108. Romansh 109. Marathi (Marāthī) 110. Malay 111. Maltese 112. Russian 113. Macedonian 114. Malayalam 115. Mongolian 116. Malagasy 117. Vietnamese 118. Spanish; Castilian 119. Estonian 120. Basque 121. Bishnupriya Manipuri 122. Asturian 123. English 124. Esperanto 125. Luxembourgish, Letzeb... 126. Latin 127. Uighur, Uyghur 128. Ukrainian 129. Limburgish, Limburgan... 130. Latvian 131. Urdu 132. Lithuanian 133. Fiji Hindi 134. Uzbek 135. Romanian, Moldavian, ...

The necessary models can be downloaded using the following code:

%%bash
polyglot download morph2.en morph2.ar

[polyglot_data] Downloading package morph2.en to
[polyglot_data]     /home/rmyeid/polyglot_data...
[polyglot_data]   Package morph2.en is already up-to-date!
[polyglot_data] Downloading package morph2.ar to
[polyglot_data]     /home/rmyeid/polyglot_data...
[polyglot_data]   Package morph2.ar is already up-to-date!

Consider this example, which obtains morpheme output from Polyglot:

from polyglot.text import Text, Word

tokens = ["unconditional", "precooked", "impossible", "painful", "entered"]
for s in tokens:
    s = Word(s, language="en")
    print("{:<20}{}".format(s, s.morphemes))

unconditional       ['un', 'conditional']
precooked           ['pre', 'cook', 'ed']
impossible          ['im', 'possible']
painful             ['pain', 'ful']
entered             ['enter', 'ed']

If tokenization has not been performed properly, we can perform morphological analysis to split the text into its original constituents:

sent = "Ihopeyoufindthebookinteresting"
para = Text(sent)
para.language = "en"
para.morphemes
WordList(['I', 'hope', 'you', 'find', 'the', 'book', 'interesting'])

Morphological analyzers

Morphological analysis may be defined as the process of obtaining grammatical information about a token, given its suffix information. Morphological analysis can be performed in three ways: morpheme-based morphology (or the item and arrangement approach), lexeme-based morphology (or the item and process approach), and word-based morphology (or the word and paradigm approach). A morphological analyzer may be defined as a program that is responsible for the analysis of the morphology of a given input token: it analyzes the token and generates morphological information, such as gender, number, class, and so on, as output. In order to perform morphological analysis on a given non-whitespace token, the PyEnchant dictionary is used.
Consider the following code, which performs morphological analysis:

>>> import enchant
>>> s = enchant.Dict("en_US")
>>> tok = []
>>> def tokenize(st1):
        if not st1:
            return
        for j in range(len(st1), -1, -1):
            if s.check(st1[0:j]):
                tok.append(st1[0:j])
                st1 = st1[j:]
                tokenize(st1)
                break
>>> tokenize("itismyfavouritebook")
>>> tok
['it', 'is', 'my', 'favourite', 'book']
>>> tok = []
>>> tokenize("ihopeyoufindthebookinteresting")
>>> tok
['i', 'hope', 'you', 'find', 'the', 'book', 'interesting']

We can determine the category of a word using the following kinds of information:

- Morphological hints: suffix information helps us detect the category of a word. For example, the -ness and -ment suffixes occur with nouns.
- Syntactic hints: contextual information is conducive to determining the category of a word. For example, if we have found a word with the noun category, then syntactic hints will be useful in determining whether an adjective appears before or after that noun.
- Semantic hints: a semantic hint is also useful in determining the category of a word. For example, if we already know that a word represents the name of a location, then it will fall under the noun category.
- Open class: this refers to the class of words that is not fixed; its membership keeps increasing as new words are added to the language. Words in an open class are usually nouns. Prepositions, by contrast, are mostly a closed class.
- Morphology captured by the part-of-speech tagset: a part-of-speech tagset captures information that helps us perform morphology. For example, the word 'plays' would appear with a tag indicating the third person singular.

Omorfi (open morphology of Finnish) is a package licensed under version 3 of the GNU GPL. It is used for performing numerous tasks such as language modeling, morphological analysis, rule-based machine translation, information retrieval, statistical machine translation, morphological segmentation, ontologies, and spell checking and correction.

A morphological generator

A morphological generator is a program that performs the task of morphological generation. Morphological generation may be considered the opposite of morphological analysis: here, given the description of a word in terms of its number, category, stem, and so on, the original word is retrieved. For example, if root = go, part of speech = verb, tense = present, and the word occurs along with a third person singular subject, then a morphological generator would generate its surface form, that is, goes.
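A minimal sketch of this idea in Python (a toy illustration with an invented rule set; it is not one of the packages listed next) might look like this:

def generate_surface_form(root, pos, tense=None, person=None, number=None):
    """Toy morphological generator: combine a root with feature values to produce a surface form."""
    if pos == "verb" and tense == "present" and person == 3 and number == "singular":
        if root.endswith(("s", "sh", "ch", "x", "o")):
            return root + "es"   # go -> goes, watch -> watches
        return root + "s"        # walk -> walks
    return root                  # fall back to the bare root

print(generate_surface_form("go", "verb", tense="present", person=3, number="singular"))  # 'goes'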
There are many Python-based packages that perform morphological analysis and generation. Some of them are as follows:

- ParaMorfo: used to perform morphological generation and analysis of Spanish and Guarani nouns, adjectives, and verbs
- HornMorpho: used for morphological generation and analysis of Oromo and Amharic nouns and verbs, as well as Tigrinya verbs
- AntiMorfo: used for morphological generation and analysis of Quechua adjectives, verbs, and nouns, as well as Spanish verbs
- MorfoMelayu: used for morphological analysis of Malay words

Other examples of software used to perform morphological analysis and generation are as follows:

- Morph is a morphological generator and analyzer for the English language and the RASP system
- Morphy is a morphological generator, analyzer, and POS tagger for German
- Morphisto is a morphological generator and analyzer for German
- Morfette performs supervised learning (inflectional morphology) for Spanish and French

Search engines

PyStemmer 1.0.1 consists of Snowball stemming algorithms that are useful for performing information retrieval tasks and constructing a search engine. It consists of the Porter stemming algorithm and many other stemming algorithms that are useful for performing stemming and information retrieval in many languages, including many European languages.

We can construct a vector space search engine by converting texts into vectors. Here are the steps needed to construct a vector space search engine:

1. Stemming and elimination of stop words. A stemmer is a program that accepts words and converts them into stems. Tokens that have the same stem have almost the same meaning. Stop words are also eliminated from the text. Consider the following code for the removal of stop words and tokenization:

def eliminatestopwords(self, list):
    """
    Eliminate words which occur often and have little significance from a context point of view.
    """
    return [word for word in list if word not in self.stopwords]

def tokenize(self, string):
    """
    Split the text into tokens and stem them.
    """
    string = self.clean(string)
    words = string.split(" ")
    return [self.stemmer.stem(word, 0, len(word) - 1) for word in words]

2. Mapping keywords into vector dimensions. Here's the code required to perform the mapping of keywords into vector dimensions:

def obtainvectorkeywordindex(self, documentList):
    """
    In the document vectors, generate the keyword index for the given position of an element.
    """
    # Map the documents into a single string
    vocabstring = " ".join(documentList)
    vocablist = self.parser.tokenize(vocabstring)
    # Eliminate common words that have no search significance
    vocablist = self.parser.eliminatestopwords(vocablist)
    uniqueVocablist = util.removeDuplicates(vocablist)
    vectorIndex = {}
    offset = 0
    # Attach a position to each keyword; it maps to the dimension used to depict this token
    for word in uniqueVocablist:
        vectorIndex[word] = offset
        offset += 1
    return vectorIndex  # (keyword:position)

3. Mapping of text strings to vectors. Here, a simple term count model is used.
The code to convert text strings into vectors is as follows:

def constructVector(self, wordString):
    # Initialise the vector with 0's
    vector = [0] * len(self.vectorKeywordIndex)
    tokList = self.parser.tokenize(wordString)
    tokList = self.parser.eliminatestopwords(tokList)
    for word in tokList:
        vector[self.vectorKeywordIndex[word]] += 1  # simple term count model is used
    return vector

Searching similar documents

By finding the cosine of the angle between the vectors of two documents, we can determine whether the documents are similar or not. If the cosine value is 1, then the angle is 0 degrees and the vectors are parallel (this means that the documents are related). If the cosine value is 0 and the angle is 90 degrees, then the vectors are perpendicular (this means that the documents are not related). This is the code to compute the cosine between text vectors, using the dot and norm functions:

from numpy import dot
from numpy.linalg import norm

def cosine(vec1, vec2):
    """
    cosine = ( X * Y ) / ||X|| x ||Y||
    """
    return float(dot(vec1, vec2) / (norm(vec1) * norm(vec2)))

Search keywords

We perform the mapping of keywords to a vector space. We construct a temporary text that represents the items to be searched and then compare it with the document vectors with the help of a cosine measurement. Here is the code needed to search the vector space:

def searching(self, searchinglist):
    """
    Search for documents that match a list of search items
    """
    askVector = self.buildQueryVector(searchinglist)
    ratings = [util.cosine(askVector, textVector) for textVector in self.documentVectors]
    ratings.sort(reverse=True)
    return ratings

The following code can be used to detect the language of a source text:

>>> import nltk
>>> import sys
>>> try:
        from nltk import wordpunct_tokenize
        from nltk.corpus import stopwords
    except ImportError:
        print('Error has occurred')

>>> def _calculate_languages_ratios(text):
        """
        Compute a score for each language the given document could be written in
        and return a dictionary that looks like {'german': 2, 'french': 4, 'english': 1}
        """
        languages_ratios = {}
        # nltk.wordpunct_tokenize() splits all punctuation into separate tokens:
        # wordpunct_tokenize("I hope you like the book interesting .")
        # ['I', 'hope', 'you', 'like', 'the', 'book', 'interesting', '.']
        tok = wordpunct_tokenize(text)
        words = [word.lower() for word in tok]
        # Compute the occurrence of unique stopwords in the text
        for language in stopwords.fileids():
            stopwords_set = set(stopwords.words(language))
            words_set = set(words)
            common_elements = words_set.intersection(stopwords_set)
            languages_ratios[language] = len(common_elements)  # language "score"
        return languages_ratios

>>> def detect_language(text):
        """
        Compute the score of the given text for different languages and return
        the highest-scoring one. It uses a stopword-based approach, finding the
        unique stopwords present in the analyzed text.
        """
        ratios = _calculate_languages_ratios(text)
        most_rated_language = max(ratios, key=ratios.get)
        return most_rated_language

if __name__ == '__main__':
    text = '''
All over this cosmos, most of the people believe that there is an invisible supreme power that is the creator and the runner of this world. Human being is supposed to be the most intelligent and loved creation by that power and that is being searched by human beings in different ways into different things.
As a result people reveal His assumed form as per their own perceptions and beliefs. It has given birth to different religions and people are divided on the name of religion viz. Hindu, Muslim, Sikhs, Christian etc. People do not stop at this. They debate the superiority of one over the other and fight to establish their views. Shrewd people like politicians oppose and support them at their own convenience to divide them and control them. It has intensified to the extent that even parents of a new born baby teach it about religious differences and recommend their own religion superior to that of others and let the child learn to hate other people just because of religion. Jonathan Swift, an eighteenth century novelist, observes that we have just enough religion to make us hate, but not enough to make us love one another. The word 'religion' does not have a derogatory meaning - A literal meaning of religion is 'A personal or institutionalized system grounded in belief in a God or Gods and the activities connected with this'. At its basic level, 'religion is just a set of teachings that tells people how to lead a good life'. It has never been the purpose of religion to divide people into groups of isolated followers that cannot live in harmony together. No religion claims to teach intolerance or even instructs its believers to segregate a certain religious group or even take the fundamental rights of an individual solely based on their religious choices. It is also said that 'Majhab nhi sikhata aaps mai bair krna'. But this very majhab or religion takes a very heinous form when it is misused by the shrewd politicians and the fanatics e.g. in Ayodhya on 6th December, 1992 some right wing political parties and communal organizations incited the Hindus to demolish the 16th century Babri Masjid in the name of religion to polarize Hindus votes. Muslim fanatics in Bangladesh retaliated and destroyed a number of temples, assassinated innocent Hindus and raped Hindu girls who had nothing to do with the demolition of Babri Masjid. This very inhuman act has been presented by Taslima Nasrin, a Banglsdeshi Doctor-cum-Writer in her controversial novel 'Lajja' (1993) in which, she seems to utilizes fiction's mass emotional appeal, rather than its potential for nuance and universality.
    '''
    language = detect_language(text)
    print(language)

The preceding code will search for stop words and detect the language of the text, which is English.

Summary

In this article, we discussed stemming, lemmatization, and morphological analysis and generation.

How To Get Started with Redux in React Native

Emilio Rodriguez
04 Apr 2016
5 min read
In mobile development there is a need for architectural frameworks, but complex frameworks designed for web environments may end up damaging the development process or even the performance of our app. Because of this, some time ago I decided to introduce into all of my React Native projects the leanest framework I have ever worked with: Redux.

Redux is basically a state container for JavaScript apps. It is 100 percent library-agnostic, so you can use it with React, Backbone, or any other view library. Moreover, it is really small and has no dependencies, which makes it an awesome tool for React Native projects.

Step 1: Install Redux in your React Native project.

Redux can be added as an npm dependency to your project. Just navigate to your project's main folder and type:

npm install --save react-redux

At the time this article was written, React Native still depended on React Redux 3.1.0, since later versions depend on React 0.14, which is not 100 percent compatible with React Native. Because of this, you will need to force version 3.1.0 as the one your project depends on.

Step 2: Set up a Redux-friendly folder structure.

Of course, setting up the folder structure for your project is totally up to every developer, but you need to take into account that you will need to maintain a number of actions, reducers, and components. Besides, it's also useful to keep a separate folder for your API and utility functions so these won't be mixed with your app's core functionality. With this in mind, this is my preferred folder structure under the src folder in any React Native project:

Step 3: Create your first action.

In this article we will be implementing a simple login functionality to illustrate how to integrate Redux inside React Native. A good point to start this implementation is the action: a basic function called from the component whenever we want the whole state of the app to be changed (i.e. changing from the logged-out state into the logged-in state). To keep this example as concise as possible, we won't be doing any API calls to a backend; only the pure Redux integration will be explained.

Our action creator is a simple function returning an object (the action itself) with a type attribute expressing what happened with the app. No business logic should be placed here; our action creators should be really plain and descriptive.

Step 4: Create your first reducer.

Reducers are the ones in charge of updating the state of the app. Unlike in Flux, Redux has only one store for the whole app, but it will be conveniently namespaced automatically by Redux once the reducers have been applied. In our example, the user reducer needs to be aware of when the user is logged in. Because of that, it needs to import the LOGIN_SUCCESS constant we defined in our actions before and export a default function, which will be called by Redux every time an action occurs in the app. Redux will automatically pass in the current state of the app and the action that occurred. It's up to the reducer to decide whether it needs to modify the state or not, based on the action.type. That's why, almost every time, our reducer will be a function containing a switch statement, which modifies and returns the state based on what action occurred. It's important to state that Redux uses object references to identify when the state has changed. Because of this, the state should be cloned before any modification.
It's also interesting to know that the action passed to the reducers can contain attributes other than type. For example, when doing a more complex login, the user's first name and last name can be added to the action by the action creator and used by the reducer to update the state of the app.

Step 5: Create your component.

This step is almost pure React Native coding. We need a component to trigger the action and to respond to the change of state in the app. In our case it will be a simple View containing a button that disappears when logged in. This is a normal React Native component except for some pieces of Redux boilerplate:

- The three import lines at the top will require everything we need from Redux.
- 'mapStateToProps' and 'mapDispatchToProps' are two functions bound with 'connect' to the component: this lets Redux know that this component needs to be passed a piece of the state (everything under 'userReducers') and all the actions available in the app.
- Just by doing this, we will have access to the login action (as it is used in onLoginButtonPress) and to the state of the app (as it is used in the !this.props.user.loggedIn statement).

Step 6: Glue it all together from your index.ios.js.

For Redux to apply its magic, some initialization should be done in the main file of your React Native project (index.ios.js). This is pure boilerplate and only done once: Redux needs to inject a store holding the app state into the app. To do so, it requires a 'Provider' wrapping the whole app. This store is basically a combination of reducers. For this article we only need one reducer, but a full app will include many others, and each of them should be passed into the combineReducers function to be taken into account by Redux whenever an action is triggered.

About the Author

Emilio Rodriguez started working as a software engineer for Sun Microsystems in 2006. Since then, he has focused his efforts on building a number of mobile apps with React Native while contributing to the React Native project. These contributions helped him understand how deep and powerful this framework is.

Machine Learning Tasks

Packt
01 Apr 2016
16 min read
In this article, written by David Julian, author of the book Designing Machine Learning Systems with Python, we first introduce the basic machine learning tasks. Classification is probably the most common task, due in part to the fact that it is relatively easy, well understood, and solves a lot of common problems. Multiclass classification (for instance, handwriting recognition) can sometimes be achieved by chaining binary classification tasks. However, we lose information this way, and we become unable to define a single decision boundary. For this reason, multiclass classification is often treated separately from binary classification.

There are cases where we are not interested in discrete classes but rather in a real number, for instance, a probability. These types of problems are regression problems. Both classification and regression require a training set of correctly labelled data. They are supervised learning problems.

Originating from these basic machine learning tasks are a number of derived tasks. In many applications, this may simply be applying the learning model to make a prediction. We must remember that explaining and predicting are not the same. A model can make a prediction, but unless we know explicitly how it made the prediction, we cannot begin to form a comprehensible explanation. An explanation requires human knowledge of the domain.

We can also use a prediction model to find exceptions to a general pattern. Here, we are interested in the individual cases that deviate from the predictions. This is often called anomaly detection and has wide applications in areas such as detecting bank fraud, noise filtering, and even the search for extraterrestrial life.

An important and potentially useful task is subgroup discovery. Our goal here is not, as in clustering, to partition the entire domain, but rather to find a subgroup that has a substantially different distribution. In essence, subgroup discovery is trying to find relationships between a dependent target variable and many independent explaining variables. We are not trying to find a complete relationship, but rather a group of instances that differ in ways that are important in the domain. For instance, establishing the subgroup smoker = true and family history = true for the target variable heart disease = true.

Finally, we consider control-type tasks. These act to optimize control settings to maximize a payoff under different conditions. This can be achieved in several ways. We can clone expert behavior: the machine learns directly from a human and makes predictions of actions given different conditions. The task is to learn a prediction model for the expert's actions. This is similar to reinforcement learning, where the task is to learn about the relationship between conditions and optimal actions.

Clustering, on the other hand, is the task of grouping items without any information on those groups; this is an unsupervised learning task. Clustering is basically making a measurement of similarity. Related to clustering is association, which is an unsupervised task for finding a certain type of pattern in the data. This task is behind movie recommender systems and the "customers who bought this also bought..." features on the checkout pages of online stores.
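As a toy illustration of this kind of association (an invented example with hypothetical data, not code from the book), we can count how often items co-occur in past baskets and recommend the most frequent co-purchases:

from collections import Counter
from itertools import combinations

# Hypothetical purchase histories (one basket per customer)
baskets = [
    {"book", "laptop stand", "usb cable"},
    {"book", "usb cable"},
    {"laptop stand", "monitor"},
    {"book", "monitor", "usb cable"},
]

# Count how often every pair of items appears together in a basket
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

def also_bought(item, top_n=2):
    """Return the items most frequently bought together with `item`."""
    related = Counter()
    for (a, b), count in pair_counts.items():
        if item == a:
            related[b] += count
        elif item == b:
            related[a] += count
    return [other for other, _ in related.most_common(top_n)]

print(also_bought("book"))  # ['usb cable', 'laptop stand'] with this toy data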
Data for machine learning

When considering raw data for machine learning applications, there are three separate aspects:

- The volume of the data
- The velocity of the data
- The variety of the data

Data volume

The volume problem can be approached from three different directions: efficiency, scalability, and parallelism. Efficiency is about minimizing the time it takes for an algorithm to process a unit of information. A component of this is the underlying processing power of the hardware. The other component, and the one we have more control over, is ensuring that our algorithms are not wasting precious processing cycles on unnecessary tasks.

Scalability is really about brute force: throwing as much hardware at a problem as you can. With Moore's law (the trend of computing power doubling roughly every two years) reaching its limit, it is clear that scalability is not, by itself, going to be able to keep pace with the ever-increasing amounts of data. Simply adding more memory and faster processors is, in many cases, not going to be a cost-effective solution.

Parallelism is a growing area of machine learning, and it encompasses a number of different approaches, from harnessing the capabilities of multi-core processors to large-scale distributed computing on many different platforms. Probably the most common method is to simply run the same algorithm on many machines, each with a different set of parameters. Another method is to decompose a learning algorithm into an adaptive sequence of queries and have these queries processed in parallel. A common implementation of this technique is known as MapReduce, or its open source version, Hadoop.

Data velocity

The velocity problem is often approached in terms of data producers and data consumers. The data transfer rate between the two is its velocity, and it can be measured in interactive response times. This is the time it takes from a query being made to its response being delivered. Response times are constrained by latencies such as hard disk read and write times and the time it takes to transmit data across a network.

Data is being produced at ever greater rates, and this is largely driven by the rapid expansion of mobile networks and devices. The increasing instrumentation of daily life is revolutionizing the way products and services are delivered. This increasing flow of data has led to the idea of stream processing. When input data arrives at a velocity that makes it impossible to store in its entirety, some level of analysis is necessary as the data streams: in essence, deciding what data is useful and should be stored, and what data can be thrown away. An extreme example is the Large Hadron Collider at CERN, where the vast majority of data is discarded. A sophisticated algorithm must scan the data as it is being generated, looking for the information needle in the data haystack. Another instance where processing data streams might be important is when an application requires an immediate response. This is becoming increasingly common in applications such as online gaming and stock market trading.

It is not just the velocity of incoming data that we are interested in. In many applications, particularly on the web, the velocity of a system's output is also important. Consider applications such as recommender systems, which need to process large amounts of data and present a response in the time it takes for a web page to load.
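Returning to the idea of deciding what to keep as data streams past: one classic technique for retaining a bounded, representative subset of a stream too large to store is reservoir sampling. The following minimal Python sketch illustrates the idea and is not code from the book:

import random

def reservoir_sample(stream, k, seed=0):
    # Keep a uniform random sample of k items from a stream of unknown length,
    # using only O(k) memory (the classic "Algorithm R" reservoir sampling).
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

# A stand-in for an unbounded event stream that cannot be stored in its entirety
events = (x * x for x in range(1000000))
print(reservoir_sample(events, k=5))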
Data variety

Collecting data from different sources invariably means dealing with misaligned data structures and incompatible formats. It also often means dealing with different semantics and having to understand a data system that may have been built on a rather different set of logical principles. We have to remember that, very often, data is repurposed for an entirely different application from the one it was originally intended for. There is a huge variety of data formats and underlying platforms, and significant time can be spent converting data into one consistent format. Even when this is done, the data itself needs to be aligned so that each record consists of the same number of features and is measured in the same units.

Models

The goal in machine learning is not just to solve an instance of a problem, but to create a model that will solve unique problems from new data. This is the essence of learning. A learning model must have a mechanism for evaluating its output and, in turn, changing its behavior to a state that is closer to a solution.

A model is essentially a hypothesis: a proposed explanation for a phenomenon. The goal is to apply a generalization to the problem. In the case of supervised learning, problem knowledge gained from the training set is applied to the unlabeled test set. In the case of an unsupervised learning problem, such as clustering, the system does not learn from a training set. It must learn from the characteristics of the dataset itself, such as the degree of similarity. In both cases, the process is iterative: it repeats a well-defined set of tasks that moves the model closer to a correct hypothesis.

There are many models, and as many variations on these models as there are unique solutions. We can see that the problems that machine learning systems solve (regression, classification, association, and so on) come up in many different settings. They have been used successfully in almost all branches of science, engineering, mathematics, and commerce, and also in the social sciences; they are as diverse as the domains they operate in. This diversity of models gives machine learning systems great problem-solving power. However, it can also be a bit daunting for the designer to decide which is the best model, or models, for a particular problem. To complicate things further, there are often several models that may solve your task, or your task may need several models. The most accurate and efficient pathway through an original problem is something you simply cannot know when you embark upon such a project.

There are several modeling approaches. These are really different perspectives that we can use to help us understand the problem landscape. A distinction can be made regarding how a model divides up the instance space. The instance space can be considered all possible instances of your data, regardless of whether each instance actually appears in the data; the data is a subset of the instance space. There are two approaches to dividing up this space: grouping and grading. The key difference between the two is that grouping models divide the instance space into fixed discrete units called segments. Each segment has a finite resolution and cannot distinguish between classes beyond this resolution. Grading, on the other hand, forms a global model over the entire instance space, rather than dividing the space into segments. In theory, the resolution of a grading model is infinite, and it can distinguish between instances no matter how similar they are.
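As a concrete illustration of the two approaches (a minimal sketch using scikit-learn, not code from the book), a clustering model such as k-means assigns every instance to one of a fixed number of segments, whereas a logistic regression assigns a score to any point in the instance space:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # a simple two-class labelling

# A grouping model: k-means carves the instance space into k fixed segments,
# and every instance falls into exactly one of them.
segments = KMeans(n_clusters=4, random_state=0).fit_predict(X)

# A grading model: logistic regression defines a score over the whole instance
# space, so any two distinct points can receive different scores.
grader = LogisticRegression().fit(X, y)
scores = grader.predict_proba(X)[:, 1]

print("distinct segment labels:", np.unique(segments))      # a finite set of segments
print("example graded scores:", np.round(scores[:5], 3))    # continuous values over the space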
The distinction between grouping and grading is not absolute, and many models contain elements of both.

Geometric models

One of the most useful approaches to machine learning modeling is through geometry. Geometric models use the concept of instance space. The most obvious example is when all the features are numerical and can become coordinates in a Cartesian coordinate system. When we only have two or three features, they are easy to visualize. Since many machine learning problems have hundreds or thousands of features, and therefore dimensions, visualizing these spaces is impossible. Importantly, many of the geometric concepts, such as linear transformations, still apply in this hyperspace. This can help us better understand our models. For instance, we expect many learning algorithms to be translation invariant, which means that it does not matter where we place the origin in the coordinate system. Also, we can use the geometric concept of Euclidean distance to measure similarity between instances; this gives us a method to cluster similar instances and form a decision boundary between them.

Probabilistic models

Often, we will want our models to output probabilities rather than just binary true or false. When we take a probabilistic approach, we assume that there is an underlying random process that creates a well-defined, but unknown, probability distribution.

Probabilistic models are often expressed in the form of a tree. Tree models are ubiquitous in machine learning, and one of their main advantages is that they can inform us about the underlying structure of a problem. Decision trees are naturally easy to visualize and conceptualize. They allow inspection and do not just give an answer. For example, if we have to predict a category, we can also expose the logical steps that gave rise to a particular result. Also, tree models generally require less data preparation than other models and can handle numerical and categorical data. On the downside, tree models can create overly complex models that do not generalize very well to new data. Another potential problem with tree models is that they can become very sensitive to changes in the input data, and as we will see later, this problem can be mitigated by using them as ensemble learners.

Linear models

A key concept in machine learning is that of the linear model. Linear models form the foundation of many advanced nonlinear techniques such as support vector machines and neural networks. They can be applied to any predictive task such as classification, regression, or probability estimation. When responding to small changes in the input data, and provided that our data consists of entirely uncorrelated features, linear models tend to be more stable than tree models. Tree models can over-respond to small variations in training data. This is because splits at the root of a tree have consequences that are not recoverable further down a branch, potentially making the rest of the tree significantly different. Linear models, on the other hand, are relatively stable, being less sensitive to initial conditions. However, as you would expect, this has the opposite effect of making them less sensitive to nuanced data. This is described by the terms variance (for overfitting models) and bias (for underfitting models). A linear model is typically low variance and high bias. Linear models are generally best approached from a geometric perspective.
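To make the geometric view concrete, here is a minimal sketch (using NumPy, with made-up feature values) that treats instances as points in instance space and uses Euclidean distance as the similarity measure mentioned above:

import numpy as np

# Three instances, each described by four numerical features,
# viewed as points (coordinates) in a four-dimensional instance space.
instances = np.array([
    [5.1, 3.5, 1.4, 0.2],
    [4.9, 3.0, 1.4, 0.2],
    [6.7, 3.1, 5.6, 2.4],
])

query = np.array([5.0, 3.4, 1.5, 0.2])

# Euclidean distance from the query point to every instance.
distances = np.linalg.norm(instances - query, axis=1)
print(distances)

# The nearest instance is the most similar one; this is the basic idea
# behind nearest-neighbour classification and distance-based clustering.
print('most similar instance:', np.argmin(distances))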
We know we can easily plot two dimensions of space in a Cartesian coordinate system, and we can use the illusion of perspective to illustrate a third. We have also been taught to think of time as being a fourth dimension, but when we start speaking of n dimensions, the physical analogy breaks down. Intriguingly, we can still use many of the mathematical tools that we intuitively apply to three dimensions of space. While it becomes difficult to visualize these extra dimensions, we can still use the same geometric concepts (such as lines, planes, angles, and distance) to describe them. With geometric models, we describe each instance as having a set of real-valued features, each of which is a dimension in a space.

Model ensembles

Ensemble techniques can be divided broadly into two types:

- Averaging methods: With these, several estimators are run independently and their predictions are averaged. This includes the random forest and bagging methods.
- Boosting methods: With these, weak learners are built sequentially using weighted distributions of the data, based on the error rates.

Ensemble methods use multiple models to obtain better performance than any single constituent model. The aim is to not only build diverse and robust models, but also to work within limitations such as processing speed and return times. When working with large datasets and quick response times, this can be a significant developmental bottleneck. Troubleshooting and diagnostics are important aspects of working with all machine learning models, but they are especially important when dealing with models that might take days to run. The types of machine learning ensembles that can be created are as diverse as the models themselves, and the main considerations revolve around three things: how we divide our data, how we select the models, and the methods we use to combine their results. This simplistic statement actually encompasses a very large and diverse space.

Neural nets

When we approach the problem of trying to mimic the brain, we are faced with a number of difficulties. Considering all the different things the brain does, we might first think that it consists of a number of different algorithms, each specialized to do a particular task, and each hard-wired into a different part of the brain. This approach translates to considering the brain as a number of subsystems, each with its own program and task. For example, the auditory cortex for perceiving sound has its own algorithm that, say, does a Fourier transform on an incoming sound wave to detect the pitch. The visual cortex, on the other hand, has its own distinct algorithm for decoding the signals from the optic nerve and translating them into the sensation of sight. There is, however, growing evidence that the brain does not function like this at all. It appears, from biological studies, that brain tissue in different parts of the brain can relearn how to interpret inputs. So, rather than consisting of specialized subsystems that are programmed to perform specific tasks, the brain uses the same algorithm to learn different tasks. This single-algorithm approach has many advantages, not least of which is that it is relatively easy to implement. It also means that we can create generalized models and then train them to perform specialized tasks.
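As a rough illustration of this "one general model, many tasks" idea (a sketch assuming scikit-learn is available; the datasets are just convenient built-in examples), the very same multi-layer perceptron class can be trained on two unrelated problems simply by feeding it different data:

from sklearn.datasets import load_digits, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def train_and_score(X, y):
    """Fit the same generic network architecture on whatever task we are given."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0)
    net.fit(X_tr, y_tr)
    return net.score(X_te, y_te)

# Task 1: recognizing handwritten digits.
digits = load_digits()
print('digits accuracy:', train_and_score(digits.data, digits.target))

# Task 2: a completely different domain, tumor classification.
cancer = load_breast_cancer()
print('cancer accuracy:', train_and_score(cancer.data, cancer.target))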
As in real brains, using a single algorithm to describe how each neuron communicates with the neurons around it allows artificial neural networks to be adaptable and to carry out multiple higher-level tasks. Much of the most important work being done with neural net models, and indeed machine learning in general, is through the use of very complex neural nets with many layers and features. This approach is often called deep architecture or deep learning.

Human and animal learning occurs at a rate and depth that no machine can match. Many of the elements of biological learning still remain a mystery. One of the key areas of research, and one of the most useful in application, is that of object recognition. This is something quite fundamental to living systems, and higher animals have evolved an extraordinary ability to learn complex relationships between objects. Biological brains have many layers; each synaptic event exists in a long chain of synaptic processes. In order to recognize complex objects, such as people's faces or handwritten digits, a fundamental task is to create a hierarchy of representations from the raw input to higher and higher levels of abstraction. The goal is to transform raw data, such as a set of pixel values, into something that we can describe as, say, a person riding a bicycle.

Resources for Article:

Further resources on this subject:
- Python Data Structures [article]
- Exception Handling in MySQL for Python [article]
- Python Data Analysis Utilities [article]

Integrating with Objective-C

Packt
01 Apr 2016
11 min read
In this article written by Kyle Begeman author of the book Swift 2 Cookbook, we will cover the following recipes: Porting your code from one language to another Replacing the user interface classes Upgrading the app delegate Introduction Swift 2 is out, and we can see that it is going to replace Objective-C on iOS development sooner or later, however how should you migrate your Objective-C app? Is it necessary to rewrite everything again? Of course you don't have to rewrite a whole application in Swift from scratch, you can gradually migrate it. Imagine a four years app developed by 10 developers, it would take a long time to be rewritten. Actually, you've already seen that some of the codes we've used in this book have some kind of "old Objective-C fashion". The reason is that not even Apple computers could migrate the whole Objective-C code into Swift. (For more resources related to this topic, see here.) Porting your code from one language to another In the previous recipe we learned how to add a new code into an existing Objective-C project, however you shouldn't only add new code but also, as far as possible, you should migrate your old code to the new Swift language. If you would like to keep your application core on Objective-C that's ok, but remember that new features are going to be added on Swift and it will be difficult keeping two languages on the same project. In this recipe we are going to port part of the code, which is written in Objective-C to Swift. Getting ready Make a copy of the previous recipe, if you are using any version control it's a good time for committing your changes. How to do it… Open the project and add a new file called Setup.swift, here we are going to add a new class with the same name (Setup): class Setup { class func generate() -> [Car]{ var result = [Car]() for distance in [1.2, 0.5, 5.0] { var car = Car() car.distance = Float(distance) result.append(car) } var car = Car() car.distance = 4 var van = Van() van.distance = 3.8 result += [car, van] return result } } Now that we have this car array generator we can call it on the viewDidLoad method replacing the previous code: - (void)viewDidLoad { [super viewDidLoad]; vehicles = [Setup generate]; [self->tableView reloadData]; } Again press play and check that the application is still working. How it works… The reason we had to create a class instead of creating a function is that you can only export to Objective-C classes, protocols, properties, and subscripts. Bear that in mind in case of developing with the two languages. If you would like to export a class to Objective-C you have two choices, the first one is inheriting from NSObject and the other one is adding the @objc attribute before your class, protocol, property, or subscript. If you paid attention, our method returns a Swift array but it was converted to an NSArray, but as you might know, they are different kinds of array. Firstly, because Swift arrays are mutable and NSArray are not, and the other reason is that their methods are different. Can we use NSArray in Swift? The answer is yes, but I would recommend avoiding it, imagine once finished migrating to Swift your code still follows the old way, it would be another migration. 
There's more…

Migrating from Objective-C is something that you should do with care. Don't try to change the whole application at once, and remember that some Swift objects behave differently from their Objective-C counterparts; for example, dictionaries in Swift have the key and the value types specified, but in Objective-C they can be of any type.

Replacing the user interface classes

At this moment you know how to migrate the model part of an application; however, in real life we also have to replace the graphical classes. Doing it is not complicated, but there are quite a few details to take care of.

Getting ready

Continuing with the previous recipe, make a copy of it, or if you are using version control just commit the changes you have, and let's continue with our migration.

How to do it…

First create a new file called MainViewController.swift and start by importing UIKit: import UIKit The next step is creating a class called MainViewController; this class must inherit from UIViewController and implement the protocols UITableViewDataSource and UITableViewDelegate: class MainViewController: UIViewController, UITableViewDataSource, UITableViewDelegate {  Then, add the attributes we had in the previous view controller, keeping the same names you used before: private var vehicles = [Car]() @IBOutlet var tableView:UITableView! Next, we need to implement the methods; let's start with the table view data source methods: func tableView(tableView: UITableView, numberOfRowsInSection section: Int) -> Int { return vehicles.count } func tableView(tableView: UITableView, cellForRowAtIndexPath indexPath: NSIndexPath) -> UITableViewCell { var cell:UITableViewCell? = self.tableView.dequeueReusableCellWithIdentifier("vehiclecell") if cell == nil { cell = UITableViewCell(style: .Subtitle, reuseIdentifier: "vehiclecell") } var currentCar = self.vehicles[indexPath.row] cell!.textLabel?.numberOfLines = 1 cell!.textLabel?.text = "Distance \(currentCar.distance * 1000) meters" var detailText = "Pax: \(currentCar.pax) Fare: \(currentCar.fare)" if currentCar is Van { detailText += ", Volume: \((currentCar as Van).capacity)" } cell!.detailTextLabel?.text = detailText cell!.imageView?.image = currentCar.image return cell! } Note that this conversion is not 100% equivalent; the fare, for example, isn't going to be shown with two digits of precision. There is an explanation later of why we are not going to fix this now.  The next step is adding the event; in this case, we have to handle the action performed when the user selects a car: func tableView(tableView: UITableView, willSelectRowAtIndexPath indexPath: NSIndexPath) -> NSIndexPath? { var currentCar = self.vehicles[indexPath.row] var time = currentCar.distance / 50.0 * 60.0 UIAlertView(title: "Car booked", message: "The car will arrive in \(time) minutes", delegate: nil, cancelButtonTitle: "OK").show() return indexPath } As you can see, we only need one more step to complete our code; in this case it's viewDidLoad. Note that another difference between Objective-C and Swift is that in Swift you have to specify that you are overriding an existing method: override func viewDidLoad() { super.viewDidLoad() vehicles = Setup.generate() self.tableView.reloadData() } } // end of class Our code is complete, but of course our application is still using the old code.
To complete this operation, click on the storyboard, if the document outline isn't being displayed, click on the Editor menu and then on Show Document Outline: Now that you can see the document outline, click on View Controller that appears with a yellow circle with a square inside: Then on the right-hand side, click on the identity inspector, next go to the custom class and change the value of the class from ViewController to MainViewController. After that, press play and check that your application is running, select a car and check that it is working. Be sure that it is working with your new Swift class by paying attention on the fare value, which in this case isn't shown with two digits of precision. Is everything done? I would say no, it's a good time to commit your changes. Lastly, delete the original Objective-C files, because you won't need them anymore. How it works… As you can see, it's not so hard replacing an old view controller with a Swift one, the first thing you need to do is create a new view controller class with its protocols. Keep the same names you had on your old code for attributes and methods that are linked as IBActions, it will make the switch very straightforward otherwise you will have to link again. Bear in mind that you need to be sure that your changes are applied and that they are working, but sometimes it is a good idea to have something different, otherwise your application can be using the old Objective-C and you didn't realize it. Try to modernize our code using the Swift way instead of the old Objective-C style, for example, nowadays it's preferable using interpolation rather than using stringWithFormat. We also learned that you don't need to relink any action or outlet if you keep the same name. If you want to change the name of anything you might first keep its original name, test your app, and after that you can refactor it following the traditional factoring steps. Don't delete the original Objective-C files until you are sure that the equivalent Swift file is working on every functionality. There's more… This application had only one view controller, however applications usually have more than one view controller. In this case, the best way you can update them is one by one instead of all of them at the same time. Upgrading the app delegate As you know there is an object that controls the events of an application, which is called application delegate. Usually you shouldn't have much code here, but a few of them you might have. For example, you may deactivate the camera or the GPS requests when your application goes to the background and reactivate them when the app returns active. Certainly it is a good idea to update this file even if you don't have any new code on it, so it won't be a problem in the future. Getting ready If you are using the version control system, commit your changes from the last recipe or if you prefer just copy your application. How to do it… Open the previous application recipe and create a new Swift file called ApplicationDelegate.swift, then you can create a class with the same name. As in our previous class we didn't have any code on the application delegate, so we can differentiate it by printing on the log console. So add this traditional application delegate on your Swift file: class ApplicationDelegate: UIResponder, UIApplicationDelegate { var window: UIWindow? func application(application: UIApplication, didFinishLaunchingWithOptions launchOptions: [NSObject: AnyObject]?) 
-> Bool { print("didFinishLaunchingWithOptions") return true } func applicationWillResignActive(application: UIApplication) { print("applicationWillResignActive") } func applicationDidEnterBackground(application: UIApplication) { print("applicationDidEnterBackground") } func applicationWillEnterForeground(application: UIApplication) { print("applicationWillEnterForeground") } func applicationDidBecomeActive(application: UIApplication) { print("applicationDidBecomeActive") } func applicationWillTerminate(application: UIApplication) { print("applicationWillTerminate") } } Now go to your project navigator and expand the Supporting Files group, after that click on the main.m file. In this file we are going to import the magic file, the Swift header file: #import "Chapter_8_Vehicles-Swift.h" After that we have to specify that the application delegate is the new class we have, so replace the AppDelegate class on the UIApplicationMain call with ApplicationDelegate. Your main function should be like this: int main(int argc, char * argv[]) { @autoreleasepool { return UIApplicationMain(argc, argv, nil, NSStringFromClass([ApplicationDelegate class])); } } It's time to press play and check whether the application is working or not. Press the home button or the combination shift + command + H if you are using the simulator and again open your application. Have a look that you have some messages on your log console. Now that you are sure that your Swift code is working, remove the original app delegate and its importation on the main.m. Test your app just in case. You could consider that we had finished this part, but actually we still have another step to do: removing the main.m file. Now it is very easy, just click on the ApplicationDelegate.swift file and before the class declaration add the attribute @UIApplicationMain, then right click on the main.h and choose to delete it. Test it and your application is done. How it works… The application delegate class has always been specified at the starting of an application. In Objective-C, it follows the C start point, which is a function called main. In iOS, you can specify the class that you want to use as an application delegate. If you program for OS X the procedure is different, you have to go to your nib file and change its class name to the new one. Why did we have to change the main function and then eliminate it? The reason is that you should avoid massive changes, if something goes wrong you won't know the step where you failed, so probably you will have to rollback everything again. If you do your migration step by step ensuring that it is still working, it means that in case of finding an error, it will be easier to solve it. Avoid doing massive changes on your project, changing step by step will be easier to solve issues. There's more… In this recipe, we learned the last steps of how to migrate an app from Objective-C to Swift code, however we have to remember that programming is not only about applications, you can also have a framework. In the next recipe, we are going to learn how to create your own framework compatible with Swift and Objective-C. Summary This article shows you how Swift and Objective-C can live together and give you a step-by-step guide on how to migrate your Objective-C app to Swift. Resources for Article: Further resources on this subject: Concurrency and Parallelism with Swift 2 [article] Swift for Open Source Developers [article] Your First Swift 2 Project [article]

Testing and Debugging Distributed Applications

Packt
01 Apr 2016
21 min read
In this article by Francesco Pierfederici, author of the book Distributed Computing with Python, the author states that "distributed systems, both large and small, can be extremely challenging to test and debug, as they are spread over a network, run on computers that can be quite different from each other, and might even be physically located in different continents altogether". Moreover, the computers we use could have different user accounts, different disks with different software packages, different hardware resources, and very uneven performance. Some can even be in a different time zone. Developers of distributed systems need to consider all these pieces of information when trying to foresee failure conditions. Operators have to work around all of these challenges when debugging errors. (For more resources related to this topic, see here.)

The big picture

Testing and debugging monolithic applications is not simple, as every developer knows. However, there are a number of tools that dramatically make the task easier, including the pdb debugger, various profilers (notable mentions include cProfile and line_profile), linters, static code analysis tools, and a host of test frameworks, a number of which have been included in the standard library of Python 3.3 and higher. The challenge with distributed applications is that most of the tools and packages that we can use for single-process applications lose much of their power when dealing with multiple processes, especially when these processes run on different computers. Debugging and profiling distributed applications written in C, C++, and Fortran can be done with tools such as Intel VTune, Allinea MAP, and DDT. Unfortunately, Python developers are left with very few or no options for the time being. Writing small- or medium-sized distributed systems is not terribly hard, as we saw in the pages so far. The main difference between writing monolithic programs and distributed applications is the large number of interdependent components running on remote hardware. This is what makes monitoring and debugging distributed code harder and less convenient. Fortunately, we can still use all familiar debuggers and code analysis tools on our Python distributed applications. Unfortunately, these tools will only go so far, to the point that we will have to rely on old-fashioned logging and print statements to get the full picture of what went wrong.

Common problems – clocks and time

Time is a handy variable to use. For instance, using timestamps is very natural when we want to join different streams of data, sort database records, and, in general, reconstruct the timeline for a series of events, which we oftentimes observe out of order. In addition, some tools (for example, GNU make) rely solely on file modification time and are easily confused by a clock skew between machines. For these reasons, clock synchronization among all the computers and systems we use is very important. If our computers are in different time zones, we might want to not only synchronize their clocks but also set them to Coordinated Universal Time (UTC) for simplicity. In all cases where changing clocks to UTC is not possible, good advice is to always process time in UTC within our code and to only convert to local time for display purposes. In general, clock synchronization in distributed systems is a fascinating and complex topic, and it is out of the scope of this article.
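As a small illustration of the "work in UTC, convert only for display" advice above, a sketch along these lines keeps all internal timestamps unambiguous:

from datetime import datetime, timezone

# Store and compare timestamps in UTC throughout the application...
event_time = datetime.now(timezone.utc)
print('stored:', event_time.isoformat())

# ...and convert to the local time zone only when showing them to a user.
print('displayed:', event_time.astimezone().strftime('%Y-%m-%d %H:%M:%S %Z'))

# Naive local timestamps are ambiguous; comparing them across hosts in
# different time zones (or across DST changes) silently gives wrong answers.
naive = datetime.now()
print('ambiguous:', naive.isoformat())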
Most readers, luckily, are likely to be well served by the Network Time Protocol (NTP), which is a perfectly fine synchronization solution for most situations. Most modern operating systems, including Windows, Mac OS X, and Linux, have great support for NTP. Another thing to consider when talking about time is the timing of periodic actions, such as polling loops or cronjobs. Many applications need to spawn processes or perform actions (for example, sending a confirmation e-mail or checking whether new data is available) at regular intervals. A common pattern is to set up timers (either in our code or via the tools provided by the OS) and have all these timers go off at some time, usually at a specific hour and at regular intervals after that. The risk of this approach is that we can overload the system the very moment all these processes wake up and start their work. A surprisingly common example would be starting a significant number of processes that all need to read some configuration or data file from a shared disk. In these cases, everything goes fine until the number of processes becomes so large that the shared disk cannot handle the data transfer, thus slowing our application to a crawl. A common solution is the staggering of these timers in order to spread them out over a longer time interval. In general, since we do not always control all code that we use, it is good practice to start our timers at some random number of minutes past the hour, just to be safe. Another example of this situation would be an image-processing service that periodically polls a set of directories looking for new data. When new images are found, they are copied to a staging area, renamed, scaled, and potentially converted to a common format before being archived for later use. If we're not careful, it would be easy to overload the system if many images were to be uploaded at once. A better approach would be to throttle our application (maybe using a queue-based architecture) so that it would only start an appropriately small number of image processors so as to not flood the system. Common problems – software environments Another common challenge is making sure that the software installed on all the various machines we are ever going to use is consistent and consistently upgraded. Unfortunately, it is frustratingly common to spend hours debugging a distributed application only to discover that for some unknown and seemingly impossible reason, some computers had an old version of the code and/or its dependencies. Sometimes, we might even find the code to have disappeared completely. The reasons for these discrepancies can be many: from a mount point that failed to a bug in our deployment procedures to a simple human mistake. A common approach, especially in the HPC world, is to always create a self-contained environment for our code before launching the application itself. Some projects go as far as preferring static linking of all dependencies to avoid having the runtime pick up the wrong version of a dynamic library. This approach works well if the application runtime is long compared to the time it takes to build its full environment, all of its software dependencies, and the application itself. It is not that practical otherwise. Python, fortunately, has the ability to create self-contained virtual environments. There are two related tools that we can use: pyvenv (available as part of the Python 3.5 standard library) and virtualenv (available in PyPI). 
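For instance, a deployment script can build such a self-contained environment programmatically with the standard library's venv module. The following is a rough sketch; the environment directory and the requirements file name are placeholders:

import os
import subprocess
import venv

ENV_DIR = '/opt/myapp/env'          # hypothetical location for the environment

# Create an isolated environment with its own interpreter and pip.
venv.create(ENV_DIR, with_pip=True)

# Install the exact, pinned versions listed in requirements.txt into it.
env_python = os.path.join(ENV_DIR, 'bin', 'python')
subprocess.check_call([env_python, '-m', 'pip', 'install',
                       '-r', 'requirements.txt'])

# Record what actually got installed, for later debugging.
subprocess.check_call([env_python, '-m', 'pip', 'freeze'])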
Additionally, pip, the Python package management system, allows us to specify the exact version of each package we want to install in a requirements file. These tools, when used together, permit reasonable control over the execution environment. However, the devil, as it is often said, is in the details, and different computer nodes might use the exact same Python virtual environment but incompatible versions of some external library. In this respect, container technologies such as Docker (https://www.docker.com) and, in general, version-controlled virtual machines are promising ways out of the software runtime environment maelstrom in those environments where they can be used. In all other cases (HPC clusters come to mind), the best approach will probably be to not rely on the system software and to manage our own environments and the full software stack.

Common problems – permissions and environments

Different computers might run our code under different user accounts, and our application might expect to be able to read a file or write data into a specific directory and hit an unexpected permission error. Even in cases where the user accounts used by our code are all the same (down to the same user ID and group ID), their environment may be different on different hosts. Therefore, an environment variable we assumed to be defined might not be or, even worse, might be set to an incompatible value. These problems are common when our code runs as a special, unprivileged user such as nobody. Defensive coding, especially when accessing the environment, and making sure to always fall back to sensible defaults when variables are undefined (that is, value = os.environ.get('SOME_VAR', fallback_value) instead of simply value = os.environ['SOME_VAR']) is often necessary. A common approach, when this is possible, is to only run our applications under a specific user account that we control and to specify the full set of environment variables our code needs in the deployment and application startup scripts (which will have to be version controlled as well). Some systems, however, not only execute jobs under extremely limited user accounts, but they also restrict code execution to temporary sandboxes. In many cases, access to the outside network is also blocked. In these situations, one might have no other choice but to set up the full environment locally and copy it to a shared disk partition. Other data can be served from custom-built servers running as ancillary jobs just for this purpose. In general, permission problems and user environment mismatches are very similar to problems with the software environment and should be tackled in concert. Often times, developers find themselves wanting to isolate their code from the system as much as possible and create a small, but self-contained, environment with all the code and all the environment variables they need.

Common problems – the availability of hardware resources

The hardware resources that our application needs might or might not be available at any given point in time. Moreover, even if some resources were to be available at some point in time, nothing guarantees that they will stay available for much longer. One problem we can face related to this is network glitches, which are quite common in many environments (especially for mobile apps) and which, for most practical purposes, are indistinguishable from machine or application crashes.
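Since transient network failures look just like crashes, one common defensive pattern (shown here as a rough sketch using only the standard library; the URL and retry counts are made up) is to wrap remote calls with a timeout and retry them with exponential backoff:

import time
import urllib.request
import urllib.error

def fetch_with_retries(url, retries=3, timeout=5, backoff=2.0):
    """Fetch a URL, retrying on transient network errors with exponential backoff."""
    delay = 1.0
    for attempt in range(1, retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.read()
        except (urllib.error.URLError, OSError) as exc:
            # Log enough context to tell a glitch from a real outage later.
            print('attempt {} failed: {}'.format(attempt, exc))
            if attempt == retries:
                raise
            time.sleep(delay)
            delay *= backoff

data = fetch_with_retries('http://example.com/status')   # hypothetical endpoint
print(len(data))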
Applications using a distributed computing framework or job scheduler can often rely on the framework itself to handle at least some common failure scenarios. Some job schedulers will even resubmit our jobs in case of errors or sudden machine unavailability. Complex applications, however, might need to implement their own strategies to deal with hardware failures. In some cases, the best strategy is to simply restart the application when the necessary resources are available again. Other times, restarting from scratch would be cost prohibitive. In these cases, a common approach is to implement application checkpointing. What this means is that the application both writes its state to a disk periodically and is able to bootstrap itself from a previously saved state.

In implementing a checkpointing strategy, you need to balance the convenience of being able to restart an application midway with the performance hit of writing its state to a disk. Another consideration is the increase in code complexity, especially when many processes or threads are involved in reading and writing state information. A good rule of thumb is that data or results that can be recreated easily and quickly do not warrant the implementation of application checkpointing. If, on the other hand, some processing requires a significant amount of time and one cannot afford to waste it, then application checkpointing might be in order. For example, climate simulations can easily run for several weeks or months at a time. In these cases, it is important to checkpoint them every hour or so, as restarting from the beginning after a crash would be expensive. On the other hand, a process that takes an uploaded image and creates a thumbnail for, say, a web gallery runs quickly and is not normally worth checkpointing. To be safe, a state should always be written and updated atomically (for example, by writing to a temporary file and replacing the original only after the write completes successfully). The last thing we want is to restart from a corrupted state!

Very familiar to HPC users, as well as to users of AWS spot instances, is the situation where a fraction, or the entirety, of the processes of our application are evicted from the machines that they are running on. When this happens, a warning is typically sent to our processes (usually, a SIGQUIT signal) and, after a few seconds, they are unceremoniously killed (via a SIGKILL signal). For AWS spot instances, the time of termination is available through a web service in the instance metadata. In either case, our applications are given some time to save their state and quit in an orderly fashion. Python has powerful facilities to catch and handle signals (refer to the signal module). For example, the following simple script shows how we can implement a bare-bones checkpointing strategy in our application:

#!/usr/bin/env python3.5
"""
Simple example showing how to catch signals in Python
"""
import json
import os
import signal
import sys


# Path to the file we use to store state. Note that we assume
# $HOME to be defined, which is far from being an obvious
# assumption!
STATE_FILE = os.path.join(os.environ['HOME'],
                          '.checkpoint.json')


class Checkpointer:
    def __init__(self, state_path=STATE_FILE):
        """
        Read the state file, if present, and initialize from that.
        """
        self.state = {}
        self.state_path = state_path
        if os.path.exists(self.state_path):
            with open(self.state_path) as f:
                self.state.update(json.load(f))
        return

    def save(self):
        print('Saving state: {}'.format(self.state))
        with open(self.state_path, 'w') as f:
            json.dump(self.state, f)
        return

    def eviction_handler(self, signum, frame):
        """
        This is the function that gets called when a signal is trapped.
        """
        self.save()

        # Of course, using sys.exit is a bit brutal. We can do better.
        print('Quitting')
        sys.exit(0)
        return


if __name__ == '__main__':
    import time

    print('This is process {}'.format(os.getpid()))

    ckp = Checkpointer()
    print('Initial state: {}'.format(ckp.state))

    # Catch SIGQUIT
    signal.signal(signal.SIGQUIT, ckp.eviction_handler)

    # Get a value from the state.
    i = ckp.state.get('i', 0)
    try:
        while True:
            i += 1
            ckp.state['i'] = i
            print('Updated in-memory state: {}'.format(ckp.state))
            time.sleep(1)
    except KeyboardInterrupt:
        ckp.save()

If we run the preceding script in a terminal window and then, from another terminal window, send it a SIGQUIT signal (for example, via kill -s SIGQUIT <process id>), we see the checkpointing in action, as the following screenshot illustrates:

A common situation in distributed applications is that of being forced to run code in potentially heterogeneous environments: machines (real or virtual) of different performance, with different hardware resources (for example, with or without GPUs), and potentially different software environments (as we mentioned already). Even in the presence of a job scheduler to help us choose the right software and hardware environment, we should always log the full environment as well as the performance of each execution machine. In advanced architectures, these performance metrics can be used to improve the efficiency of job scheduling. PBS Pro, for instance, takes into consideration the historical performance figures of each job being submitted to decide where to execute it next. HTCondor continuously benchmarks each machine and makes those figures available for node selection and ranking.

Perhaps the most frustrating cases are those where, either due to the network itself or due to servers being overloaded, network requests take so long that our code hits its internal timeouts. This might lead us to believe that the counterpart service is not available. These bugs, especially when transient, can be quite hard to debug.

Challenges – the development environment

Another common challenge in distributed systems is the setup of a representative development and testing environment, especially for individuals or small teams. Ideally, in fact, the development environment should be identical to the worst-case scenario deployment environment. It should allow developers to test common failure scenarios, such as a disk filling up, varying network latencies, intermittent network connections, hardware and software failures, and so on—all things that are bound to happen in a real deployment, sooner or later. Large teams have the resources to set up development and test clusters, and they almost always have dedicated software quality teams stress testing our code.
Small teams, unfortunately, often find themselves forced to write code on their laptops and use a very simplified (and best-case scenario!) environment made up by two or three virtual machines running on the laptops themselves to emulate the real system. This pragmatic solution works and is definitely better than nothing. However, we should remember that virtual machines running on the same host exhibit unrealistically high-availability and low-network latencies. In addition, nobody will accidentally upgrade them without us knowing or image them with the wrong operating system. The environment is simply too controlled and stable to be realistic. A step closer to a realistic setup would be to create a small development cluster on, say, AWS using the same VM images, with the same software stack and user accounts that we are going to use in production. All things said, there is simply no replacement for the real thing. For cloud-based applications, it is worth our while to at least test our code on a smaller version of the deployment setup. For HPC applications, we should be using either a test cluster, a partition of the operational cluster, or a test queue for development and testing. Ideally, we would develop on an exact clone of the operational system. Cost consideration and ease of development will constantly push us to the multiple-VMs-on-a-laptop solution; it is simple, essentially free, and it works without an Internet connection, which is an important point. We should, however, keep in mind that distributed applications are not impossibly hard to write; they just have more failure modes than their monolithic counterparts do. Some of these failure modes (especially those related to data access patterns) typically require a careful choice of architecture. Correcting architectural choices dictated by false assumptions later on in the development stage can be costly. Convincing managers to give us the hardware resources that we need early on is usually difficult. In the end, this is a delicate balancing act. A useful strategy – logging everything Often times, logging is like taking backups or eating vegetables—we all know we should do it, but most of us forget. In distributed applications, we simply have no other choice—logging is essential. Not only that, logging everything is essential. With many different processes running on potentially ephemeral remote resources at difficult-to-predict times, the only way to understand what happens is to have logging information and have it readily available and in an easily searchable format/system. At the bare minimum, we should log process startup and exit time, exit code and exceptions (if any), all input arguments, all outputs, the full execution environment, the name and IP of the execution host, the current working directory, the user account as well as the full application configuration, and all software versions. The idea is that if something goes wrong, we should be able to use this information to log onto the same machine (if still available), go to the same directory, and reproduce exactly what our code was doing. Of course, being able to exactly reproduce the execution environment might simply not be possible (often times, because it requires administrator privileges). However, we should always aim to be able to recreate a good approximation of that environment. This is where job schedulers really shine; they allow us to choose a specific machine and specify the full job environment, which makes replicating failures easier. 
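A minimal sketch of this "log everything at startup" advice, using only the standard library (the logger name and the exact fields are just an example, not a prescribed format), might look like this:

import getpass
import logging
import os
import socket
import sys

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s %(levelname)s %(name)s %(message)s')
log = logging.getLogger('myapp')     # hypothetical application name

def log_startup_environment():
    """Record where, how, and by whom this process was started."""
    log.info('host=%s pid=%d user=%s', socket.gethostname(), os.getpid(),
             getpass.getuser())
    log.info('cwd=%s argv=%r', os.getcwd(), sys.argv)
    log.info('python=%s %s', sys.executable, sys.version.split()[0])
    log.info('environment=%r', dict(os.environ))

log_startup_environment()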
Logging software versions (not only the version of the Python interpreter, but also the version of all the packages used) helps diagnose outdated software stacks on remote machines. The Python package manager, pip, makes getting the list of installed packages easy: import pip; pip.main(['list']). Whereas, import sys; print(sys.executable, sys.version_info) displays the location and version of the interpreter. It is also useful to create a system whereby all our classes and function calls emit logging messages with the same level detail and at the same points in the object life cycle. Common approaches involve the use of decorators and, maybe a bit too esoteric for some, metaclasses. This is exactly what the autologging Python module (available on PyPI) does for us. Once logging is in place, we face the questions where to store all these logging messages and whose traffic could be substantial for high verbosity levels in large applications. Simple installations will probably want to write log messages to text files on a disk. More complex applications might want to store these messages in a database (which can be done by creating a custom handler for the Python logging module) or in specialized log aggregators such as Sentry (https://getsentry.com). Closely related to logging is the issue of monitoring. Distributed applications can have many moving parts, and it is often essential to know which machines are up, which are busy, as well as which processes or jobs are currently running, waiting, or in an error state. Knowing which processes are taking longer than usual to complete their work is often an important warning sign that something might be wrong. Several monitoring solutions for Python (often times, integrated with our logging system) exist. The Celery project, for instance, recommends flower (http://flower.readthedocs.org) as a monitoring and control web application. HPC job schedulers, on the other hand, tend to lack common, general-purpose, monitoring solutions that go beyond simple command-line clients. Monitoring comes in handy in discovering potential problems before they become serious. It is in fact useful to monitor resources such as available disk space and trigger actions or even simple warning e-mails when they fall under a given threshold. Many centers monitor hardware performance and hard drive SMART data to detect early signs of potential problems. These issues are more likely to be of interest to operations personnel rather than developers, but they are useful to keep in mind. They can also be integrated in our applications to implement strategies in order to handle performance degradations gracefully. A useful strategy – simulating components A good, although possibly expensive in terms of time and effort, test strategy is to simulate some or all of the components of our system. The reasons are multiple; on one hand, simulating or mocking software components allows us to test our interfaces to them more directly. In this respect, mock testing libraries, such as unittest.mock (part of the Python 3.5 standard library), are truly useful. Another reason to simulate software components is to make them fail or misbehave on demand and see how our application responds. For instance, we could increase the response time of services such as REST APIs or databases to worst-case scenario levels and see what happens. Sometimes, we might exceed timeout values in some network calls leading our application to incorrectly assume that the sever has crashed. 
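The following rough sketch (the service interface is invented purely for illustration) uses unittest.mock to simulate exactly this kind of misbehaving dependency, so that we can test how our code reacts to a slow or failing remote service:

import time
from unittest import mock

def check_server(client):
    """Return True if the remote service answers quickly enough."""
    try:
        client.ping(timeout=2)           # hypothetical client API
        return True
    except TimeoutError:
        return False

# Simulate a healthy service...
fast_client = mock.Mock()
fast_client.ping.return_value = 'pong'
assert check_server(fast_client) is True

# ...and one that is so overloaded it times out.
slow_client = mock.Mock()
slow_client.ping.side_effect = TimeoutError('server too slow')
assert check_server(slow_client) is False

# We can also emulate high latency rather than outright failure.
laggy_client = mock.Mock()
laggy_client.ping.side_effect = lambda timeout: time.sleep(timeout + 1)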
Especially early on in the design and development of a complex distributed application, one can make overly optimistic assumptions about things such as network availability and performance or response time of services such as databases or web servers. For this reason, having the ability to either completely bring a service offline or, more subtly, modify its behavior can tell us a lot about which of the assumptions in our code might be overly optimistic. The Netflix Chaos Monkey (https://github.com/Netflix/SimianArmy) approach of disabling random components of our system to see how our application copes with failures can be quite useful. Summary Writing and running small- or medium-sized distributed applications in Python is not hard. There are many high-quality frameworks that we can leverage among others, for example, Celery, Pyro, various job schedulers, Twisted, MPI bindings, or the multiprocessing module in the standard library. The real difficulty, however, lies in monitoring and debugging our applications, especially because a large fraction of our code runs concurrently on many different, often remote, computers. The most insidious bugs are those that end up producing incorrect results (for example, because of data becoming corrupted along the way) rather than raising an exception, which most frameworks are able to catch and bubble up. The monitoring and debugging tools that we can use with Python code are, sadly, not as sophisticated as the frameworks and libraries we use to develop that same code. The consequence is that large teams end up developing their own, often times, very specialized distributed debugging systems from scratch and small teams mostly rely on log messages and print statements. More work is needed in the area of debuggers for distributed applications in general and for dynamic languages such as Python in particular. Resources for Article: Further resources on this subject: Python Data Structures [article] Python LDAP applications - extra LDAP operations and the LDAP URL library [article] Machine Learning Tasks [article]

Launching a Spark Cluster

Packt
31 Mar 2016
7 min read
 In this article by Omar Khedher, author of OpenStack Sahara Essentials we will use Sahara to create and launch a Spark cluster. Sahara provides several plugins to provision Hadoop clusters on top of OpenStack. We will be using Spark plugins to provision Apache Spark clusters using Horizon. (For more resources related to this topic, see here.) General settings The following diagram illustrates our Spark cluster topology, which includes: One Spark master node: This runs the Spark Master and the HDFS NameNode Three Spark slave nodes: These run a Spark Slave and an HDFS DataNode each Preparing the Spark image The following link provides several Sahara images available for download for different plugins: http://sahara-files.mirantis.com/images/upstream/liberty. Note that the upstream Sahara image files are destined for the OpenStack Liberty release. From Horizon, click on Compute and select Images, click on Create Image and add the new image, as shown here: We will need to upload the downloaded image to Glance so that it can be registered in the Sahara image registry catalog. Make sure that the new image is active. Click on the Data Processing tab and select Image Registry. Click on Register Image to register the new uploaded Glance image to Sahara, as shown here: Click on Done and the new Spark image is ready to start launching the Spark cluster. Creating the Spark master group template Node group templates in Sahara facilitate the configuration of a set of instances that have same properties, such as RAM and CPU. We will start by creating the first node group template for the Spark master. From the Data Processing tab, select Node Group Templates and click on Create Template. Our first node group template will be based on Apache Spark with Version 1.3.1, as shown here: The next wizard will guide to specifying the name of the template, the instance flavor, the storage location, and which floating IP pool will be assigned to the cluster instance: The next tab in same wizard will guide you to selecting which kind of process the nodes in the cluster will run. In our case, the Spark master node group template will include Spark master and HDFS namenode processes, as shown here: The next tab in the wizard exposes more choices regarding the security groups that will be applied for the template cluster nodes: Auto security group: This will automatically create a set of security groups that will be directly applied to the instances of the node group template Default security group: Any existing security groups in the OpenStack environment configured as default will be applied the instances of the node group template The last tab in the wizard exposes more specific HDFS configuration that depend on the available resources of the cluster, such as disk space, CPU and memory: dfs.datanode.handler.count: How many server threads there are for the datanode dfs.datanode.du.reserved: How much of the available disk space will not be taken into account for HDFS use dfs.namenode.handler.count: How many server threads there are for the namenode dfs.datanode.failed.volumes.tolerated: How many volumes are allowed to fail before a datanode instance stops dfs.datanode.max.xcievers: What is the maximum number of threads to be used in order to transfer data to/from the DataNode instance. 
Name Node Heap Size: How much memory will be assigned to the heap size to handle workload per NameNode instance Data Node Heap Size: How much memory will be assigned to the heap size to handle workload per DataNode instance Creating the Spark slave group template Creating the Spark slave group template will be performed in the same way as the Spark master group template except the assignment of the node processes. The Spark slave nodes will be running Spark slave and HDFS datanode processes, as shown here: Security groups and HDFS parameters can be configured the same as the Spark master node group template. Creating the Spark cluster template Now that we have defined the basic templates for the Spark cluster, we will need to compile both entities into one cluster template. In the Sahara dashboard, select Cluster Templates and click on Create Template. Select Apache Spark as the Plugin name, with version 1.3.1, as follows: Give the cluster template a name and small description. It is also possible to mention which process in the Spark cluster will run in a different compute node for high-availability purposes. This is only valid when you have more than one compute node in the OpenStack environment. The next tab in the same wizard allows you to add the necessary number of Spark instances based on the node group templates created previously. In our case, we will use one master Spark instance and three slave Spark instances, as shown here: The next tab, General Parameters, provides more advanced cluster configuration, including the following: Timeout for disk preparing: The cluster will fail when the duration of formatting and mounting the disk per node exceeds the timeout value. Enable NTP service: This option will enable all the instances of the cluster to synchronized time. An NTP file can be found under /tmp when cluster nodes are active. URL of NTP server: If mentioned, the Spark cluster will use the URL of the NTP server for time synchronization. Heat Wait Condition timeout: Heat will throw an error message to Sahara and the cluster will fail when a node is not able to boot up after a certain amount of time. This will prevent Sahara spawning instances indefinitely. Enable XFS: Allows XFS disk formatting. Decommissioning Timeout: This will throw an error when scaling data nodes in the Spark cluster takes more than the time mentioned. Enable Swift: Allows using Swift object storage to pull and push data during job execution. The Spark Parameters tab allows you to specify the following: Master webui port: Which port will access the Spark master web user interface. Work webui port: Which port will access the Spark slave web user interface. Worker memory: How much memory will be reserved for Spark applications. By default, if all is selected, Spark will use all the available RAM is the instance minus 1 GB. Spark will not run properly when using a flavor having RAM less than 1 GB. Launching the Spark cluster Based on the cluster template, the last step will require you to only push the button Launch Cluster from the Clusters tab in the Sahara dashboard. You will need only to select the plugin name, Apache Spark, with version 1.3.1. Next, you will need to name the new cluster, select the right cluster template created previously, and the base image registered in Sahara. Additionally, if you intend to access the cluster instances via SSH, select an existing SSH keypair. 
It is also possible to select from which network segment you will be able to manage the cluster instances; in our case, an existing private network, Private_Net10, will be used for this purpose. Launch the cluster; it will take a while to finish spawning the four instances forming the Spark cluster. The Spark cluster instances can be listed in the Compute Instances tab, as shown here:

Summary

In this article, we created a Spark cluster using Sahara in OpenStack by means of the Apache Spark plugin. The provisioned cluster includes one Spark master node and three Spark slave nodes. When the cluster status changes to the active state, it is possible to start executing jobs.

Resources for Article:

Further resources on this subject:
- Introducing OpenStack Trove [article]
- OpenStack Performance, Availability [article]
- Monitoring Physical Network Bandwidth Using OpenStack Ceilometer [article]

Why Mesos?

Packt
31 Mar 2016
8 min read
In this article by Dipa Dubhasi and Akhil Das authors of the book Mastering Mesos, delves into understanding the importance of Mesos. Apache Mesos is an open source, distributed cluster management software that came out of AMPLab, UC Berkeley in 2011. It abstracts CPU, memory, storage, and other computer resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively. It is referred to as a metascheduler (scheduler of schedulers) and a "distributed systems kernel/distributed datacenter OS". It improves resource utilization, simplifies system administration, and supports a wide variety of distributed applications that can be deployed by leveraging its pluggable architecture. It is scalable and efficient and provides a host of features, such as resource isolation and high availability, which, along with a strong and vibrant open source community, makes this one of the most exciting projects. (For more resources related to this topic, see here.) Introduction to the datacenter OS and architecture of Mesos Over the past decade, datacenters have graduated from packing multiple applications into a single server box to having large datacenters that aggregate thousands of servers to serve as a massively distributed computing infrastructure. With the advent of virtualization, microservices, cluster computing, and hyper-scale infrastructure, the need of the hour is the creation of an application-centric enterprise that follows a software-defined datacenter strategy. Currently, server clusters are predominantly managed individually, which can be likened to having multiple operating systems on the PC, one each for processor, disk drive, and so on. With an abstraction model that treats these machines as individual entities being managed in isolation, the ability of the datacenter to effectively build and run distributed applications is greatly reduced. Another way of looking at the situation is comparing running applications in a datacenter to running them on a laptop. One major difference is that while launching a text editor or web browser, we are not required to check which memory modules are free and choose ones that suit our need. Herein lies the significance of a platform that acts like a host operating system and allows multiple users to run multiple applications simultaneously by utilizing a shared set of resources. Datacenters now run varied distributed application workloads, such as Spark, Hadoop, and so on, and need the capability to intelligently match resources and applications. The datacenter ecosystem today has to be equipped to manage and monitor resources and efficiently distribute workloads across a unified pool of resources with the agility and ease to cater to a diverse user base (noninfrastructure teams included). A datacenter OS brings to the table a comprehensive and sustainable approach to resource management and monitoring. This not only reduces the cost of ownership but also allows a flexible handling of resource requirements in a manner that isolated datacenter infrastructure cannot support. The idea behind a datacenter OS is that of an intelligent software that sits above all the hardware in a datacenter and ensures efficient and dynamic resource sharing. Added to this is the capability to constantly monitor resource usage and improve workload and infrastructure management in a seamless way that is not tied to specific application requirements. 
In its absence, we have a scenario with silos in a datacenter that force developers to build software catering to machine-specific characteristics and make the moving and resizing of applications a highly cumbersome procedure. The datacenter OS acts as a software layer that aggregates all servers in a datacenter into one giant supercomputer to deliver the benefits of multitenancy, isolation, and resource control across all microservice applications. Another major advantage is the elimination of human-induced error during the continual assigning and reassigning of virtual resources.

From a developer's perspective, this will allow them to easily and safely build distributed applications without restricting them to a bunch of specialized tools, each catering to a specific set of requirements. For instance, consider the case of data science teams who develop analytic applications that are highly resource intensive. An operating system that can simplify how resources are accessed, shared, and distributed successfully alleviates their concern about reallocating hardware every time the workloads change.

Of key importance is the relevance of the datacenter OS to DevOps, primarily a software development approach that emphasizes automation, integration, collaboration, and communication between traditional software developers and other IT professionals. With a datacenter OS that effectively transforms individual servers into a pool of resources, DevOps teams can focus on accelerating development and not continuously worry about infrastructure issues.

In a world where distributed computing becomes the norm, the datacenter OS is a boon. With freedom from manually configuring and maintaining individual machines and applications, system engineers need not configure specific machines for specific applications, as all applications would be capable of running on any available resources from any machine, even if other applications are already running on them. Using a datacenter OS results in centralized control and smart utilization of resources that eliminate hardware and software silos to ensure greater accessibility and usability even for noninfrastructure professionals. Examples of organizations administering their hyperscale datacenters via a datacenter OS include Google, with its Borg (and next-generation Omega) systems.

The merits of the datacenter OS are undeniable, with benefits ranging from the scalability of computing resources and flexibility to support data sharing across applications to saving team effort, time, and money while launching and managing interoperable cluster applications. It is this vision of transforming the datacenter into a single supercomputer that Apache Mesos seeks to achieve. Born out of a Berkeley AMPLab research paper in 2011, it has since come a long way, with a number of leading companies, such as Apple, Twitter, Netflix, and Airbnb among others, using it in production. Mesosphere is a start-up that is developing a distributed OS product with Mesos at its core.

The architecture of Mesos

Mesos is an open source platform for sharing clusters of commodity servers between different distributed applications (or frameworks), such as Hadoop, Spark, and Kafka among others. The idea is to act as a centralized cluster manager by pooling together all the physical resources of the cluster and making them available as a single reservoir of highly available resources for all the different frameworks to utilize.
For example, if an organization has one 10-node cluster (16 CPUs and 64 GB RAM per node) and another 5-node cluster (4 CPUs and 16 GB RAM per node), then Mesos can be leveraged to pool them into one virtual cluster of 720 GB RAM and 180 CPUs, where multiple distributed applications can be run. Sharing resources in this fashion greatly improves cluster utilization and eliminates the need for an expensive data replication process per framework.

Some of the important features of Mesos are:

- Scalability: It can elastically scale to over 50,000 nodes
- Resource isolation: This is achieved through Linux/Docker containers
- Efficiency: This is achieved through CPU and memory-aware resource scheduling across multiple frameworks
- High availability: This is achieved through Apache ZooKeeper
- Interface: A web UI for monitoring the cluster state

Mesos is based on the same principles as the Linux kernel and aims to provide a highly available, scalable, and fault-tolerant base for enabling various frameworks to share cluster resources effectively and in isolation. Distributed applications are varied and continuously evolving, a fact that leads Mesos' design philosophy towards a thin interface that allows an efficient resource allocation between different frameworks and delegates the task of scheduling and job execution to the frameworks themselves. The two advantages of doing so are:

- Different frameworks can independently devise methods to address their data locality, fault-tolerance, and other such needs
- It simplifies the Mesos codebase and allows it to be scalable, flexible, robust, and agile

Mesos' architecture hands over the responsibility of scheduling tasks to the respective frameworks by employing a resource offer abstraction that packages a set of resources and makes offers to each framework. The Mesos master node decides the quantity of resources to offer each framework, while each framework decides which resource offers to accept and which tasks to execute on these accepted resources. This method of resource allocation is shown to achieve a good degree of data locality for each framework sharing the same cluster.

An alternative architecture would implement a global scheduler that takes framework requirements, organizational priorities, and resource availability as inputs and provides a task schedule breakdown by framework and resource as output, essentially acting as a matchmaker for jobs and resources with priorities acting as constraints. The challenges with this architecture, such as developing a robust API that could capture all the varied requirements of different frameworks, anticipating new frameworks, and solving a complex scheduling problem for millions of jobs, made the former approach a much more attractive option for the creators.

Summary

Thus, in this article, we introduced Mesos and then dived deep into its architecture to understand its importance.

Resources for Article:

Further resources on this subject:
- Understanding Mesos Internals [article]
- Leveraging Python in the World of Big Data [article]
- Self-service Business Intelligence, Creating Value from Data [article]


ALM – Developers and QA

Packt
30 Mar 2016
15 min read
This article by Can Bilgin, the author of Mastering Cross-Platform Development with Xamarin, provides an introduction to Application Lifecycle Management (ALM) and continuous integration methodologies for Xamarin cross-platform applications. As the part of the ALM process that is most relevant for developers, unit test strategies will be discussed and demonstrated, together with automated UI testing. This article is divided into the following sections:

- Development pipeline
- Troubleshooting
- Unit testing
- UI testing

(For more resources related to this topic, see here.)

Development pipeline

The development pipeline can be described as the virtual production line that steers a project from a mere bundle of business requirements to the consumers. Stakeholders that are part of this pipeline include, but are not limited to, business proxies, developers, the QA team, the release and configuration team, and finally the consumers themselves. Each stakeholder in this production line assumes different responsibilities, and they should all function in harmony. Hence, having an efficient, healthy, and preferably automated pipeline that is going to provide the communication and transfer of deliverables between units is vital for the success of a project.

In the Agile project management framework, the development pipeline is cyclical rather than a linear delivery queue. In the application life cycle, requirements are inserted continuously into a backlog. The backlog leads to a planning and development phase, which is followed by testing and QA. Once the production-ready application is released, consumers can be made part of this cycle using live application telemetry instrumentation.

Figure 1: Application life cycle management

In Xamarin cross-platform application projects, development teams are blessed with various tools and frameworks that can ease the execution of ALM strategies. From sketching and mock-up tools available for early prototyping and design to the source control and project management tools that make up the backbone of ALM, Xamarin projects can utilize various tools to automate and systematically analyze the project timeline.

The following sections of this article concentrate mainly on the lines of defense that protect the health and stability of a Xamarin cross-platform project in the timeline between the assignment of tasks to developers and the point at which the task or bug is completed/resolved and checked into a source control repository.

Troubleshooting and diagnostics

SDKs associated with Xamarin target platforms and development IDEs are equipped with comprehensive analytic tools. Utilizing these tools, developers can identify issues causing app freezes, crashes, slow response times, and other resource-related problems (for example, excessive battery usage).

Xamarin.iOS applications are analyzed using the XCode Instruments toolset. In this toolset, there are a number of profiling templates, each used to analyze a certain perspective of application execution. Instrument templates can be executed on an application running on the iOS simulator or on an actual device.

Figure 2: XCode Instruments

Similarly, Android applications can be analyzed using the device monitor provided by the Android SDK. Using Android Monitor, the memory profile, CPU/GPU utilization, and network usage can be analyzed, and application-provided diagnostic information can be gathered. Android Debug Bridge (ADB) is a command-line tool that allows various manual or automated device-related operations.
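To give a concrete feel for how ADB fits into day-to-day troubleshooting, the following shell commands show a minimal, typical workflow; the package name com.example.fibonacci and the APK path are placeholders for your own application, not values from the article's sample project:

adb devices                                        # list connected devices and emulators
adb install -r bin/Release/com.example.fibonacci.apk   # (re)install a freshly built APK
adb logcat                                         # stream the device log, including Mono/Xamarin runtime messages
adb shell dumpsys meminfo com.example.fibonacci    # dump current memory usage for the app process

Each of these commands can also be scripted as part of an automated diagnostics or CI step.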
For Windows Phone applications, Visual Studio provides a number of analysis tools for profiling CPU usage, energy consumption, memory usage, and XAML UI responsiveness. XAML diagnostic sessions in particular can provide valuable information on problematic sections of view implementation and pinpoint possible visual and performance issues:

Figure 3: Visual Studio XAML analyses

Finally, Xamarin Profiler, as a maturing application (currently in preview release), can help analyze memory allocations and execution time. Xamarin Profiler can be used with iOS and Android applications.

Unit testing

The test-driven development (TDD) pattern dictates that the business requirements and the granular use-cases defined by these requirements should be initially reflected in unit test fixtures. This allows a mobile application to grow/evolve within the defined borders of these assertive unit test models. Whether following a TDD strategy or implementing tests to ensure the stability of the development pipeline, unit tests are fundamental components of a development project.

Figure 4: Unit test project templates

Xamarin Studio and Visual Studio both provide a number of test project templates targeting different areas of a cross-platform project. In Xamarin cross-platform projects, unit tests can be categorized into two groups: platform-agnostic and platform-specific testing.

Platform-agnostic unit tests

Platform-agnostic components, such as portable class libraries containing shared logic for Xamarin applications, can be tested using the common unit test projects targeting the .NET framework. Visual Studio Test Tools or the NUnit test framework can be used according to the development environment of choice. It is also important to note that shared projects used to create shared logic containers for Xamarin projects cannot be tested with .NET unit test fixtures. For shared projects and the referencing platform-specific projects, platform-specific unit test fixtures should be prepared.

When following an MVVM pattern, view models are the focus of unit test fixtures since, as previously explained, a view model can be perceived as a finite state machine where the bindable properties are used to create a certain state on which the commands are executed, simulating a specific use-case to be tested. This approach is the most convenient way to test the UI behavior of a Xamarin application without having to implement and configure automated UI tests.

While implementing unit tests for such projects, a mocking framework is generally used to replace the platform-dependent sections of the business logic. Loosely coupling these dependent components makes it easier for developers to inject mocked interface implementations and increases the testability of these modules. The most popular mocking frameworks for unit testing are Moq and RhinoMocks. Both Moq and RhinoMocks utilize reflection and, more specifically, the Reflection.Emit namespace, which is used to generate types, methods, events, and other artifacts at runtime. The aforementioned iOS restrictions on code generation make these libraries inapplicable for platform-specific testing, but they can still be included in unit test fixtures targeting the .NET framework. For platform-specific implementations, the True Fakes library provides compile-time code generation and mocking features.
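As a brief illustration of this approach, the following NUnit test sketches how a platform-dependent service could be mocked with Moq while exercising a view model. The FibonacciViewModel and IDialogService types here are hypothetical stand-ins used only for the example; they are not part of the article's sample project:

using Moq;
using NUnit.Framework;

[TestFixture]
public class FibonacciViewModelTests
{
    [Test]
    public void Calculate_WithNegativeOrdinal_ShowsErrorDialog()
    {
        // Mock the platform-dependent dialog service (hypothetical interface)
        var dialogService = new Mock<IDialogService>();

        // Arrange the view model state through its bindable properties
        var viewModel = new FibonacciViewModel(dialogService.Object)
        {
            Ordinal = -2
        };

        // Act by executing the bound command
        viewModel.CalculateCommand.Execute(null);

        // Assert that the expected platform interaction took place
        dialogService.Verify(
            d => d.ShowMessage("Ordinal cannot be a negative number."),
            Times.Once);
    }
}

Because the mocked IDialogService keeps the test free of any platform-specific UI code, the same fixture can run in a plain .NET unit test project.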
Depending on the implementation specifics (such as namespaces used, network communication, multithreading, and so on), in some scenarios it is imperative to test the common logic implementation on specific platforms as well. For instance, some multithreading and parallel task implementations give different results on Windows Runtime, Xamarin.Android, and Xamarin.iOS. These variations generally occur because of the underlying platform's mechanism or slight differences between the .NET and Mono implementation logic. In order to ensure the integrity of these components, common unit test fixtures can be added as linked/referenced files to platform-specific test projects and executed on the test harness.

Platform-specific unit tests

In a Xamarin project, platform-dependent features cannot be unit tested using the conventional unit test runners available in the Visual Studio Test Suite and NUnit frameworks. Platform-dependent tests are executed on empty platform-specific projects that serve as a harness for unit tests for that specific platform. Windows Runtime application projects can be tested using the Visual Studio Test Suite. However, for Android and iOS, the NUnit testing framework should be used, since Visual Studio Test Tools are not available for the Xamarin.Android and Xamarin.iOS platforms.

Figure 5: Test harnesses

The unit test runner for Windows Phone (Silverlight) and Windows Phone 8.1 applications uses a test harness integrated with the Visual Studio test explorer. The unit tests can be executed and debugged from within Visual Studio.

Xamarin.Android and Xamarin.iOS test project templates use the NUnitLite implementation for the respective platforms. In order to run these tests, the test application should be deployed on the simulator (or the testing device) and the application has to be manually executed. It is possible to automate the unit tests on the Android and iOS platforms through instrumentation. In each Xamarin target platform, the initial application lifetime event is used to add the necessary unit tests:

[Activity(Label = "Xamarin.Master.Fibonacci.Android.Tests", MainLauncher = true, Icon = "@drawable/icon")]
public class MainActivity : TestSuiteActivity
{
    protected override void OnCreate(Bundle bundle)
    {
        // tests can be inside the main assembly
        //AddTest(Assembly.GetExecutingAssembly());
        // or in any reference assemblies
        AddTest(typeof(Fibonacci.Android.Tests.TestsSample).Assembly);

        // Once you called base.OnCreate(), you cannot add more assemblies.
        base.OnCreate(bundle);
    }
}

In the Xamarin.Android implementation, the MainActivity class derives from TestSuiteActivity, which implements the necessary infrastructure to run the unit tests and the UI elements to visualize the test results. On the Xamarin.iOS platform, the test application uses the default UIApplicationDelegate, and generally, the FinishedLaunching event delegate is used to create the ViewController for the unit test run fixture:

public override bool FinishedLaunching(UIApplication application, NSDictionary launchOptions)
{
    // Override point for customization after application launch.
    // If not required for your application you can safely delete this method
    var window = new UIWindow(UIScreen.MainScreen.Bounds);
    var touchRunner = new TouchRunner(window);

    touchRunner.Add(System.Reflection.Assembly.GetExecutingAssembly());

    window.RootViewController = new UINavigationController(touchRunner.GetViewController());
    window.MakeKeyAndVisible();

    return true;
}

The main shortcoming of executing unit tests this way is the fact that it is not easy to generate a code coverage report and archive the test results. Neither of these testing methods provides the ability to test the UI layer. They are simply used to test platform-dependent implementations. In order to test the interactive layer, platform-specific or cross-platform (Xamarin.Forms) coded UI tests need to be implemented.

UI testing

In general terms, the code coverage of the unit tests directly correlates with the amount of shared code, which amounts to, at the very least, 70-80 percent of the code base in a mundane Xamarin project. One of the main driving factors of architectural patterns was to decrease the amount of logic and code in the view layer so that the testability of the project utilizing conventional unit tests reaches a satisfactory level. Coded UI (or automated UI acceptance) tests are used to test the uppermost layer of the cross-platform solution: the views.

Xamarin.UITests and Xamarin Test Cloud

The main UI testing framework used for Xamarin projects is the Xamarin.UITests testing framework. This testing component can be used on various platform-specific projects, varying from native mobile applications to Xamarin.Forms implementations, except for the Windows Phone platform and applications. Xamarin.UITests is an implementation based on the Calabash framework, which is an automated UI acceptance testing framework targeting mobile applications.

Xamarin.UITests is introduced to Xamarin.iOS or Xamarin.Android applications using the publicly available NuGet packages. The included framework components are used to provide an entry point to the native applications. The entry point is the Xamarin Test Cloud Agent, which is embedded into the native application during compilation. The cloud agent is similar to a local server that allows either the Xamarin Test Cloud or the test runner to communicate with the app infrastructure and simulate user interaction with the application.

Xamarin Test Cloud is a subscription-based service allowing Xamarin applications to be tested on real mobile devices using UI tests implemented via Xamarin.UITests. Xamarin Test Cloud not only provides a powerful testing infrastructure for Xamarin.iOS and Xamarin.Android applications with an abundant number of mobile devices but can also be integrated into continuous integration workflows.

After installing the appropriate NuGet package, the UI tests can be initialized for a specific application on a specific device. In order to initialize the interaction adapter for the application, the app package and the device should be configured.
On Android, the APK package path and the device serial can be used for the initialization:

IApp app = ConfigureApp.Android
    .ApkFile("<APK Path>/MyApplication.apk")
    .DeviceSerial("<DeviceID>")
    .StartApp();

For an iOS application, the procedure is similar:

IApp app = ConfigureApp.iOS
    .AppBundle("<App Bundle Path>/MyApplication.app")
    .DeviceIdentifier("<DeviceID of Simulator>")
    .StartApp();

Once the app handle has been created, each test written using NUnit should first create the pre-conditions for the tests, simulate the interaction, and finally test the outcome. The IApp interface provides a set of methods to select elements on the visual tree and simulate certain interactions, such as text entry and tapping. On top of the main testing functionality, screenshots can be taken to document test steps and possible bugs. Both Visual Studio and Xamarin Studio provide project templates for Xamarin.UITests.

Xamarin Test Recorder

Xamarin Test Recorder is an application that can ease the creation of automated UI tests. It is currently in its preview version and is only available for the Mac OS platform.

Figure 6: Xamarin Test Recorder

Using this application, developers can select the application in need of testing and the device/simulator that is going to run the application. Once the recording session starts, each interaction on the screen is recorded as execution steps on a separate screen, and these steps can be used to generate the preparation or testing steps for the Xamarin.UITests implementation.

Coded UI tests (Windows Phone)

Coded UI tests are used for automated UI testing on the Windows Phone platform. Coded UI tests for Windows Phone and Windows Store applications are not any different than their counterparts for other .NET platforms such as Windows Forms, WPF, or ASP.NET. It is also important to note that only XAML applications support Coded UI tests.

Coded UI tests are generated on a simulator and written on an Automation ID premise. The Automation ID property is an automatically generated or manually configured identifier for Windows Phone applications (only in XAML) and the UI controls used in the application. Coded UI tests depend on the UIMap created for each control on a specific screen using the Automation IDs. While creating the UIMap, a crosshair tool can be used to select the application and the controls on the simulator screen to define the interactive elements:

Figure 7: Generating coded UI accessors and tests

Once the UIMap has been created and the designer files have been generated, gestures and the generated XAML accessors can be used to create testing pre-conditions and assertions.

For Coded UI tests, multiple scenario-specific input values can be used and tested on a single assertion. Using the DataRow attribute, unit tests can be expanded to test multiple data-driven scenarios. The code snippet below uses multiple input values to test different incorrect input values:

[DataRow(0, "Zero Value")]
[DataRow(-2, "Negative Value")]
[TestMethod]
public void FibonnaciCalculateTest_IncorrectOrdinal(int ordinalInput, string description)
{
    // TODO: Check if bad values are handled correctly
}

Automated tests can run on available simulators and/or a real device. They can also be included in CI build workflows and made part of the automated development pipeline.

Calabash

Calabash is an automated UI acceptance testing framework used to execute Cucumber tests. Cucumber tests provide an assertion strategy similar to coded UI tests, only broader and behavior oriented.
The Cucumber test framework supports tests written in the Gherkin language (a human-readable programming grammar description for behavior definitions). Calabash makes up the necessary infrastructure to execute these tests on various platforms and application runtimes.

A simple declaration of the feature and the scenario that was previously tested with Coded UI using the data-driven model would look similar to the excerpt below. Only two of the possible test scenarios are declared in this feature for demonstration; the feature can be extended:

Feature: Calculate Single Fibonacci number.
  Ordinal entry should be greater than 0.

Scenario: Ordinal is lower than 0.
  Given I use the native keyboard to enter "-2" into text field Ordinal
  And I touch the "Calculate" button
  Then I see the text "Ordinal cannot be a negative number."

Scenario: Ordinal is 0.
  Given I use the native keyboard to enter "0" into text field Ordinal
  And I touch the "Calculate" button
  Then I see the text "Cannot calculate the number for the 0th ordinal."

Calabash test execution is possible on Xamarin target platforms since the Ruby API exposed by the Calabash framework has a bidirectional communication line with the Xamarin Test Cloud Agent embedded in Xamarin applications with NuGet packages. Calabash/Cucumber tests can be executed on Xamarin Test Cloud on real devices, since the communication between the application runtime and the Calabash framework is maintained by the Xamarin Test Cloud Agent, the same as for Xamarin.UITests.

Summary

Xamarin projects can benefit from a properly established development pipeline and the use of ALM principles. This type of approach makes it easier for teams to share responsibilities and work out business requirements in an iterative manner. In the ALM timeline, the development phase is the main domain in which most of the concrete implementation takes place. In order for the development team to provide quality code that can survive the ALM cycle, it is highly advised to analyze and test native applications using the available tooling in Xamarin development IDEs. While the common codebase for a target platform in a Xamarin project can be treated and tested as a .NET implementation using conventional unit tests, platform-specific implementations require more particular handling. Platform-specific parts of the application need to be tested on empty shell applications, called test harnesses, on the respective platform simulators or devices. To test views, available frameworks such as Coded UI tests (for Windows Phone) and Xamarin.UITests (for Xamarin.Android and Xamarin.iOS) can be utilized to increase the test code coverage and create a stable foundation for the delivery pipeline. Most tests and analysis tools discussed in this article can be integrated into automated continuous integration processes.

Resources for Article:

Further resources on this subject:
- A cross-platform solution with Xamarin.Forms and MVVM architecture [article]
- Working with Xamarin.Android [article]
- Application Development Workflow [article]


Golang Decorators: Logging & Time Profiling

Nicholas Maccharoli
30 Mar 2016
6 min read
Golang's imperative world

Golang is not, by any means, a functional language; its design remains true to its jingle, which says that it is "C for the 21st Century". One task I tried to do early on in learning the language was search for the map, filter, and reduce functions in the standard library, but to no avail. Next, I tried rolling my own versions, but I felt as though I hit a bit of a road block when I discovered that there is no support for generics in the language at the time of writing this. There is, however, support for Higher Order Functions or, more simply put, functions that take other functions as arguments and return functions.

If you have spent some time in Python, you may have come to love a design pattern called "Decorator". In fact, decorators make life in Python so great that support for applying them is built right into the language with a nifty @ operator! Python frameworks such as Flask extensively use decorators. If you have little or no experience in Python, fear not, for the concept is a design pattern independent of any language.

Decorators

An alternative name for the decorator pattern is "wrapper", which pretty much sums it all up in one word! A decorator's job is only to wrap a function so that additional code can be executed when the original function is called. This is accomplished by writing a function that takes a function as its argument and returns a function of the same type (Higher Order Functions in action!). While this still calls the original function and passes through its return value, it does something extra along the way.

Decorators for logging

We can easily log which specific method is passed with a little help from our decorator friends. Say, we wanted to log which user liked a blog post and what the ID of the post was, all without touching any code in the original likePost function. Here is our original function:

func likePost(userId int, postId int) bool {
    fmt.Printf("Update Complete!\n")
    return true
}

Our decorator might look something similar to this:

type LikeFunc func(int, int) bool

func decoratedLike(f LikeFunc) LikeFunc {
    return func(userId int, postId int) bool {
        fmt.Printf("likePost Log: User %v liked post# %v\n", userId, postId)
        return f(userId, postId)
    }
}

Note the use of the type definition here. I encourage you to use it for the sake of readability when defining functions with long signatures, such as those of decorators, as you need to type the function signature twice.

Now, we can apply the decorator and allow the logging to begin:

r := decoratedLike(likePost)
r(1414, 324)
r(5454, 324)
r(4322, 250)

This produces the following output:

likePost Log: User 1414 liked post# 324
Update Complete!
likePost Log: User 5454 liked post# 324
Update Complete!
likePost Log: User 4322 liked post# 250
Update Complete!

Our original likePost function still gets called and runs as expected, but now we get an additional log detailing the user and post IDs that were passed to the function each time it was called. Hopefully, this will help speed up debugging our likePost function if and when we encounter strange behavior!

Decorators for performance!

Say, we run a "Top 10" site and previously, our main sorting routine to find the top 10 cat photos of this week on the Internet was written with Golang's func Sort(data Interface) function from the sort package of the Golang standard library. Everything is fine until we are informed that Fluffy the cat is infuriated that she is coming in at number six on the list and not number five.
The cats at ranks five and six on the list both had 5000 likes each, but Fluffy reached 5000 likes a day earlier than Bozo the cat, who is currently higher ranked. We like to give credit where it's due, so we apologize to Fluffy and go on to use the stable version of the sort, func Stable(data Interface), which preserves the order of elements equal in value during the sort. We can improve our code and tests so that this does not happen again (We promised Fluffy!).

The tests pass, everything looks great, and we deploy gracefully... or so we think. Over the course of the day, other developers also deploy their changes, and then, after checking our performance reports, we notice a slowdown somewhere. Is it from our switch to the stable sort? Well, let's use decorators to measure the performance of both sort functions and check whether there is a noticeable dip in performance. Here's our testing function:

type SortFunc func(sort.Interface)

func timedSortFunc(f SortFunc) SortFunc {
    return func(data sort.Interface) {
        defer func(t time.Time) {
            fmt.Printf("--- Time Elapsed: %v ---\n", time.Since(t))
        }(time.Now())
        f(data)
    }
}

In case you are unfamiliar with defer, all it does is call the function it is passed right after its calling function returns. The arguments passed to defer are evaluated right away, so the value we get from time.Now() is really the start time of the function! Let's go ahead and give this test a go:

stable := timedSortFunc(sort.Stable)
unStable := timedSortFunc(sort.Sort)

// 10000 Elements with values ranging
// between 0 and 5000
randomCatList1 := randomCatScoreSlice(10000, 5000)
randomCatList2 := randomCatScoreSlice(10000, 5000)

fmt.Printf("Unstable Sorting Function:\n")
unStable(randomCatList1)

fmt.Printf("Stable Sorting Function:\n")
stable(randomCatList2)

The following output is yielded:

Unstable Sorting Function:
--- Time Elapsed: 282.889µs ---
Stable Sorting Function:
--- Time Elapsed: 93.947µs ---

Wow! Fluffy's complaint not only made our top 10 list more accurate, but now the list sorts about three times as fast with the stable version of sort as well! (However, we still need to be careful; sort.Stable most likely uses way more memory than the standard sort.Sort function.)

Final thoughts

Figuring out when and where to apply the decorator pattern is really up to you and your team. There are no hard rules, and you can completely live without it. However, when it comes to things like extra logging or profiling a pesky area of your code, this technique may prove to be a valuable tool.

Where is the rest of the code?

In order to get this example up and running, there is some setup code that was not shown here in order to keep the post from becoming too bloated. I encourage you to take a look at this code here if you are interested!

About the author

Nick Maccharoli is an iOS/backend developer and open source enthusiast working at a start-up in Tokyo and enjoying the current development scene. You can see what he is up to at @din0sr or github.com/nirma.

Boosting up the Performance of a Database

Packt
29 Mar 2016
10 min read
In this article by Altaf Hussain, author of the book Learning PHP 7 High Performance, we will see how databases play a key role in dynamic websites. All incoming and outgoing data is stored in databases. So if the database for a PHP application is not well-designed and optimized, it will affect the application performance tremendously. In this article, we will be looking into the ways to optimize our PHP application database.

(For more resources related to this topic, see here.)

MySQL

MySQL is the most used Relational Database Management System (RDBMS) for the web. It is open source and has a free community version. It provides all those features which can be provided by an enterprise-level database. The default settings provided with the MySQL installation may not be so good for performance, and there are always ways to fine-tune settings to get increased performance. Also, remember that your database design plays a role in performance as well. A poorly designed database will have an effect on overall performance.

In this article, we will discuss how to improve the MySQL database performance. We will be modifying the MySQL configuration file, my.cnf. This file is located in different places in different OSes. Also, if you are using XAMPP, WAMP, and so on, on Windows, this file will be located in those respective folders. Whenever my.cnf is mentioned, it is assumed that the file is open, no matter which OS is used.

Query Caching

Query Caching is an important performance feature of MySQL. It caches SELECT queries along with the resulting dataset. When an identical SELECT query occurs, MySQL fetches the data from memory; hence, the query is executed faster. This reduces the load on the database. To check whether query cache is enabled on a MySQL server or not, issue the following command in your MySQL command line:

SHOW VARIABLES LIKE 'have_query_cache';

This command will display an output, as follows:

This result set shows that query cache is enabled. If query cache is disabled, the value will be NO.

To enable query caching, open up the my.cnf file and add the following lines. If these lines are present, just uncomment them if they are commented:

query_cache_type = 1
query_cache_size = 128M
query_cache_limit = 1M

Save the my.cnf file and restart the MySQL server. Let's discuss what these three configurations mean.

query_cache_size

The query_cache_size parameter defines how much memory will be allocated. Some will think that the more memory used, the better, but this is a misunderstanding. It all depends on the size of the database, the types of queries, the ratio between reads and writes, hardware, database traffic, and so on. A good value for query_cache_size is between 100 MB and 200 MB. Then, monitor the performance and the other previously mentioned variables on which the query cache depends, and adjust the size. We have used 128 MB for a medium-traffic Magento website, and it is working perfectly. Set this value to 0 to disable the query cache.

query_cache_limit

This defines the maximum size of a query dataset to be cached. If the size of a query dataset is larger than this value, it won't be cached. The value of this configuration can be guessed by finding out the largest SELECT query and the size of its returned dataset.

query_cache_type

The query_cache_type parameter plays a weird role.
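You can check the value currently in effect, and keep an eye on how well the cache is being used, with a couple of standard MySQL commands; the status counters below (such as Qcache_hits and Qcache_lowmem_prunes) are maintained by MySQL whenever the query cache is compiled in:

SHOW VARIABLES LIKE 'query_cache_type';
SHOW STATUS LIKE 'Qcache%';

A high ratio of Qcache_hits to Com_select generally indicates that the cache is paying off, while a steadily growing Qcache_lowmem_prunes suggests query_cache_size is too small.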
If query_cache_type is set to 1, then the following may occur:

- If query_cache_size is 0, then no memory is allocated and query cache is disabled
- If query_cache_size is greater than 0, then query cache is enabled, memory is allocated, and all queries that do not exceed query_cache_limit and do not use the SQL_NO_CACHE option will be cached

If the query_cache_type value is 0, then the following occurs:

- If query_cache_size is 0, then no memory is allocated and the cache is disabled
- If query_cache_size is greater than 0, then the memory is allocated, but nothing is cached, that is, the cache is disabled

Storage Engines

Storage Engines (or Table Types) are a part of core MySQL and are responsible for handling operations on tables. MySQL provides several storage engines, and the two most widely used are MyISAM and InnoDB. Both storage engines have their own pros and cons, but InnoDB is usually prioritized. MySQL started to use InnoDB as its default storage engine from version 5.5 onwards.

MySQL provides some other storage engines, which have their own purposes. During the database design process, it can be decided which table should use which storage engine. A complete list of storage engines for MySQL 5.6 can be found at http://dev.mysql.com/doc/refman/5.6/en/storage-engines.html.

The storage engine can be set at the database level, which will then be used as the default storage engine for each newly created table. Note that the storage engine is table-based and different tables can have different storage engines in a single database. What if we have a table already created and we want to change its storage engine? This is easy. Let's say our table name is pkt_users, its storage engine is MyISAM, and we want to change it to InnoDB; then we will use the following MySQL command:

ALTER TABLE pkt_users ENGINE=INNODB;

This will change the storage engine of the table to InnoDB. Now, let's discuss the difference between the two most widely used storage engines, MyISAM and InnoDB.

MyISAM

A brief list of features that are or are not supported by MyISAM is as follows:

- MyISAM is designed for speed, which plays best with SELECT statements. If a table is more static, that is, the data in that table is less frequently updated or deleted and mostly only fetched, then MyISAM is best for this table.
- MyISAM supports table-level locking. If a specific operation needs to be performed on data in a table, then the complete table can be locked. During this lock, no operation can be performed on this table. This can cause performance degradation if the table is more dynamic, that is, the data in the table changes frequently.
- MyISAM does not have support for Foreign Keys (FK).
- MyISAM supports fulltext search.
- MyISAM does not support transactions, so there is no support for commit and rollback. If a query on a table is executed, it is executed, and there is no coming back.
- Data compression, replication, query cache, and data encryption are supported.
- Cluster databases are not supported.

InnoDB

A brief list of features that are or are not supported by InnoDB is as follows:

- InnoDB is designed for high reliability and high performance when processing a high volume of data.
- InnoDB supports row-level locking. It is a good feature and is great for performance. Instead of locking the complete table like MyISAM, it locks only the specific rows for SELECT, DELETE, or UPDATE operations; and during these operations, other data in the table can be manipulated.
- InnoDB supports Foreign Keys and enforcing Foreign Key constraints.
- Transactions are supported. Commits and rollbacks are possible; hence, data can be recovered from a specific transaction.
- Data compression, replication, query cache, and data encryption are supported.
- InnoDB can be used in a cluster environment, but it does not have full support. However, InnoDB tables can be converted to the NDB storage engine, which is used in a MySQL cluster, by changing the table engine to NDB.

In the following sections, we will discuss some more performance features that are related to InnoDB. Values for the following configurations are set in the my.cnf file.

innodb_buffer_pool_size

This setting defines how much memory should be used for InnoDB data and indexes loaded into memory. For a dedicated MySQL server, the recommended value is 50-80% of the installed memory on the server. If this value is set too high, then there will be no memory left for the operating system and other subsystems of MySQL, such as transaction logs. So, let's open our my.cnf file, search for innodb_buffer_pool_size, and set the value within the recommended range (50-80%) of our RAM.

innodb_buffer_pool_instances

This feature is not that widely used. It enables multiple buffer pool instances to work together to reduce the chances of memory contention on 64-bit systems with a large value for innodb_buffer_pool_size. There are different ways to calculate the value of innodb_buffer_pool_instances. One way is to use one instance per GB of innodb_buffer_pool_size. So, if the value of innodb_buffer_pool_size is 16 GB, we would set innodb_buffer_pool_instances to 16.

innodb_log_file_size

innodb_log_file_size is the size of the log file that stores information for every query that has been executed. For a dedicated server, a value of up to 4 GB is safe, but the time for crash recovery may increase if the log file size is too big. So, as a best practice, it should be kept between 1 GB and 4 GB.

Percona Server

According to the Percona website, "Percona Server is a free, fully compatible, enhanced, open source drop-in replacement for MySQL that provides superior performance, scalability, and instrumentation." Percona is a fork of MySQL with enhanced features for performance. All the features available in MySQL are available in Percona. Percona uses an enhanced storage engine, which is called XtraDB. According to the Percona website: "Percona XtraDB is an enhanced version of the InnoDB storage engine for MySQL, which has more features, faster performance, and better scalability on modern hardware. Percona XtraDB uses memory more efficiently in high-load environments." As mentioned previously, XtraDB is a fork of InnoDB, so all features available in InnoDB are available in XtraDB.

Installation

Percona is only available for Linux systems. It is not available for Windows as of now. In this book, we will install the Percona server on Debian 8. The process is the same for both Ubuntu and Debian. To install the Percona server on other Linux flavors, check out the Percona installation manual at https://www.percona.com/doc/percona-server/5.5/installation.html. As of now, they provide instructions for Debian, Ubuntu, CentOS, and RHEL. They also provide instructions to install the Percona server from sources and Git.

Now, let's install the Percona server using the following steps:

1. Open your sources list file using the following command in your terminal:

   sudo nano /etc/apt/sources.list

   If prompted for a password, enter your Debian password. The file will be opened.
2. Now, place the following repository information at the end of the sources.list file:

   deb http://repo.percona.com/apt jessie main
   deb-src http://repo.percona.com/apt jessie main

3. Save the file by pressing CTRL + O and close it by pressing CTRL + X.

4. Update your system using the following command in the terminal:

   sudo apt-get update

5. Start the installation by issuing the following command in the terminal:

   sudo apt-get install percona-server-server-5.5

The installation will start. The process is the same as the MySQL server installation. During installation, the root password for the Percona server will be asked for; you just need to enter it. When the installation is completed, you are ready to use the Percona server in the same way as you would use MySQL. Configure the Percona server and optimize it as discussed in the previous sections.

Summary

In this article, we studied the MySQL and Percona servers with Query Caching and other MySQL configuration options for performance. We also compared different storage engines and Percona XtraDB, and saw MySQL Workbench performance monitoring tools as well.

Resources for Article:

Further resources on this subject:
- Building a Web Application with PHP and MariaDB – Introduction to caching [article]
- PHP Magic Features [article]
- Understanding PHP basics [article]


Building a Product Recommendation System

Packt
29 Mar 2016
25 min read
In this article by Raghav Bali and Dipanjan Sarkar, authors of the book R Machine Learning By Example, we will see that collaborative filtering is a simple yet very effective approach for predicting and recommending items to users. If we look closely, the algorithms work on input data which is nothing but a matrix representation of the user ratings for different products. Bringing a mathematical perspective into the picture, matrix factorization is a technique to manipulate matrices and identify latent or hidden features from the data represented in the matrix. Building upon the same concept, let us use matrix factorization as the basis for predicting ratings for items which the user has not yet rated.

(For more resources related to this topic, see here.)

Matrix factorization

Matrix factorization refers to the identification of two or more matrices such that, when these matrices are multiplied, we get the original matrix. Matrix factorization, as mentioned earlier, can be used to discover latent features between two different kinds of entities. We will understand and use the concepts of matrix factorization as we go along preparing our recommender engine for our e-commerce platform.

As our aim for the current project is to personalize the shopping experience and recommend product ratings for an e-commerce platform, our input data contains user ratings for various products on the website. We process the input data and transform it into matrix representation for analyzing it using matrix factorization. The input data looks like this:

User ratings matrix

As you can see, the input data is a matrix with each row representing a particular user's ratings for the different items represented in the columns. For the current case, the columns representing items are different mobile phones such as iPhone 4, iPhone 5s, Nexus 5, and so on. Each row contains ratings for each of these mobile phones as given by eight different users. The ratings range from 1 to 5, with 1 being the lowest and 5 being the highest. A rating of 0 represents unrated items or missing ratings. The task of our recommender engine will be to predict the correct ratings for the missing ones in the input matrix. We could then use the predicted ratings to recommend items most desired by the users.

The premise here is that two users would rate a product similarly if they like similar features of the product or item. Since our current data is related to user ratings for different mobile phones, people might rate the phones based on their hardware configuration, price, OS, and so on. Hence, matrix factorization tries to identify these latent features to predict ratings for a certain user and a certain product. While trying to identify these latent features, we proceed with the basic assumption that the number of such features is less than the total number of items in consideration. This assumption makes sense because otherwise each user would have a specific feature associated with him/her (and similarly for each product). This would in turn make recommendations futile, as none of the users would be interested in items rated by the other users (which is not the case usually).

Now let us get into the mathematical details of matrix factorization and our recommender engine. Since we are dealing with user ratings for different products, let us assume U to be a matrix representing user preferences and, similarly, a matrix P representing the products for which we have the ratings.
Then the ratings matrix R will be defined as R = U x P^T (we take the transpose of P as P^T for matrix multiplication), where |R| = |U| x |P|. Assuming the process helps us identify K latent features, our aim is to find two matrices X and Y such that their product (matrix multiplication) approximates R:

X: a |U| x K matrix
Y: a |P| x K matrix

Here, X is a user-related matrix which represents the associations between the users and the latent features. Y, on the other hand, is the product-related matrix which represents the associations between the products and the latent features. The task of predicting the rating of a product p_j by a user u_i is done by calculating the dot product of the vectors corresponding to u_i (a row of X, that is, the user) and p_j (a row of Y, that is, the product).

Now, to find the matrices X and Y, we utilize a technique called gradient descent. Gradient descent, in simple terms, tries to find the local minimum of a function; it is an optimization technique. We use gradient descent in the current context to iteratively minimize the difference between the predicted ratings and the actual ratings. To begin with, we randomly initialize the matrices X and Y and then calculate how different their product is from the actual ratings matrix R. The difference between the predicted and the actual values is what is termed as the error. For our problem, we will consider the squared error, which is calculated as:

e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = \left( r_{ij} - \sum_{k=1}^{K} x_{ik} y_{kj} \right)^2

Here, r_{ij} is the actual rating by user i for product j and \hat{r}_{ij} is the predicted value of the same. To minimize the error, we need to find the correct direction or gradient to change our values to. To obtain the gradient for each of the variables x and y, we differentiate the squared error with respect to them separately:

\frac{\partial e_{ij}^2}{\partial x_{ik}} = -2 e_{ij} y_{kj} \qquad \frac{\partial e_{ij}^2}{\partial y_{kj}} = -2 e_{ij} x_{ik}

Hence, the equations to update x_{ik} and y_{kj} can be given as:

x'_{ik} = x_{ik} + 2 \alpha e_{ij} y_{kj} \qquad y'_{kj} = y_{kj} + 2 \alpha e_{ij} x_{ik}

Here, α is the constant that denotes the rate of descent or the rate of approaching the minima (also known as the learning rate). The value of α defines the size of the steps we take in either direction to reach the minima. Large values may lead to oscillations as we may overshoot the minima every time. The usual practice is to select very small values for α, of the order of 10^-4. x'_{ik} and y'_{kj} are the updated values of x_{ik} and y_{kj} after each iteration of gradient descent.

To avoid overfitting, along with controlling extreme or large values in the matrices X and Y, we introduce the concept of regularization. Formally, regularization refers to the process of introducing additional information in order to prevent overfitting. Regularization penalizes models with extreme values. To prevent overfitting in our case, we introduce the regularization constant called β. With the introduction of β, the equations are updated as follows:

e_{ij}^2 = \left( r_{ij} - \sum_{k=1}^{K} x_{ik} y_{kj} \right)^2 + \frac{\beta}{2} \sum_{k=1}^{K} \left( x_{ik}^2 + y_{kj}^2 \right)

Also,

x'_{ik} = x_{ik} + \alpha \left( 2 e_{ij} y_{kj} - \beta x_{ik} \right) \qquad y'_{kj} = y_{kj} + \alpha \left( 2 e_{ij} x_{ik} - \beta y_{kj} \right)

As we already have the ratings matrix R and we use it to determine how far our predicted values are from the actual ones, matrix factorization turns into a supervised learning problem. We use some of the rows as our training samples. Let S be our training set, with elements being tuples of the form (u_i, p_j, r_{ij}). Thus, our task is to minimize the error e_{ij} for every tuple (u_i, p_j, r_{ij}) ∈ S. The overall error (say E) can be calculated as:

E = \sum_{(u_i, p_j, r_{ij}) \in S} e_{ij}^2 = \sum_{(u_i, p_j, r_{ij}) \in S} \left( r_{ij} - \sum_{k=1}^{K} x_{ik} y_{kj} \right)^2

Implementation

Now that we have looked into the mathematics of matrix factorization, let us convert the algorithm into code and prepare a recommender engine for the mobile phone ratings input dataset discussed earlier.
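Before moving to the code, a single worked update step may help make the rule concrete; the numbers below are made up purely for illustration and do not come from the ratings data used later. Suppose K = 2, the current factors for a user and a product are x_i = (0.5, 1.0) and y_j = (1.0, 0.5), the true rating is r_{ij} = 4, α = 0.0002, and β = 0.02. Then:

\hat{r}_{ij} = 0.5 \times 1.0 + 1.0 \times 0.5 = 1.0 \qquad e_{ij} = 4 - 1.0 = 3.0

x'_{i1} = 0.5 + 0.0002 \left( 2 \times 3.0 \times 1.0 - 0.02 \times 0.5 \right) = 0.5 + 0.0002 \times 5.99 \approx 0.5012

Each pass over the observed ratings nudges every x_{ik} and y_{kj} by such small amounts, which is why many iterations (the epoch parameter used below) are needed before the predictions settle.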
As shown in the Matrix factorization section, the input dataset is a matrix with each row representing a user's ratings for the products mentioned as columns. The ratings range from 1 to 5, with 0 representing missing values. To transform our algorithm into working code, we need to complete the following tasks:

- Load the input data and transform it into a ratings matrix representation
- Prepare a matrix factorization based recommendation model
- Predict and recommend products to the users
- Interpret and evaluate the model

Loading and transforming input data into matrix representation is simple. As seen earlier, R provides us with easy-to-use utility functions for the same:

# load raw ratings from csv
raw_ratings <- read.csv(<file_name>)

# convert columnar data to sparse ratings matrix
ratings_matrix <- data.matrix(raw_ratings)

Now that we have our data loaded into an R matrix, we proceed to prepare the user-latent features matrix X and the item-latent features matrix Y. We initialize both from uniform distributions using the runif function:

# number of rows in ratings
rows <- nrow(ratings_matrix)

# number of columns in ratings matrix
columns <- ncol(ratings_matrix)

# latent features
K <- 2

# User-Feature Matrix
X <- matrix(runif(rows*K), nrow=rows, byrow=TRUE)

# Item-Feature Matrix
Y <- matrix(runif(columns*K), nrow=columns, byrow=TRUE)

The major component is the matrix factorization function itself. Let us split the task into two parts: calculation of the gradient and, subsequently, the overall error. The calculation of the gradient involves the ratings matrix R and the two factor matrices X and Y, along with the constants α and β. Since we are dealing with matrix manipulations (specifically, multiplication), we transpose Y before we begin with any further calculations. The following lines of code convert the algorithm discussed previously into R syntax. All variables follow a naming convention similar to the algorithm for ease of understanding:

for (i in seq(nrow(ratings_matrix))){
  for (j in seq(length(ratings_matrix[i, ]))){
    if (ratings_matrix[i, j] > 0){

      # error
      eij = ratings_matrix[i, j] - as.numeric(X[i, ] %*% Y[, j])

      # gradient calculation
      for (k in seq(K)){
        X[i, k] = X[i, k] + alpha * (2 * eij * Y[k, j] - beta * X[i, k])
        Y[k, j] = Y[k, j] + alpha * (2 * eij * X[i, k] - beta * Y[k, j])
      }
    }
  }
}

The next part of the algorithm is to calculate the overall error; we again use similar variable names for consistency:

# Overall Squared Error Calculation
e = 0

for (i in seq(nrow(ratings_matrix))){
  for (j in seq(length(ratings_matrix[i, ]))){
    if (ratings_matrix[i, j] > 0){

      e = e + (ratings_matrix[i, j] - as.numeric(X[i, ] %*% Y[, j]))^2

      for (k in seq(K)){
        e = e + (beta/2) * (X[i, k]^2 + Y[k, j]^2)
      }
    }
  }
}

As a final piece, we iterate over these calculations multiple times to mitigate the risks of cold start and sparsity. We term the variable controlling these multiple passes epoch. We also terminate the calculations once the overall error drops below a certain threshold. Moreover, as we initialized X and Y from uniform distributions, the predicted values will be real numbers. We round the final output before returning the predicted matrix. Note that this is a very simplistic implementation and a lot of complexity has been kept out for ease of understanding.
Hence, this may result in the predicted matrix containing values greater than 5. For the current scenario, it is safe to treat values above the maximum scale of 5 as equivalent to 5 (and similarly for values less than 0). We encourage the reader to fine-tune the code to handle such cases.

Setting α to 0.0002, β to 0.02, K (that is, latent features) to 2, and epoch to 1000, let us see a sample run of our code with the overall error threshold set to 0.001:

# load raw ratings from csv
raw_ratings <- read.csv("product_ratings.csv")

# convert columnar data to sparse ratings matrix
ratings_matrix <- data.matrix(raw_ratings)

# number of rows in ratings
rows <- nrow(ratings_matrix)

# number of columns in ratings matrix
columns <- ncol(ratings_matrix)

# latent features
K <- 2

# User-Feature Matrix
X <- matrix(runif(rows*K), nrow=rows, byrow=TRUE)

# Item-Feature Matrix
Y <- matrix(runif(columns*K), nrow=columns, byrow=TRUE)

# iterations
epoch <- 10000

# rate of descent
alpha <- 0.0002

# regularization constant
beta <- 0.02

pred.matrix <- mf_based_ucf(ratings_matrix, X, Y, K, epoch = epoch)

# setting column names
colnames(pred.matrix) <- c("iPhone.4", "iPhone.5s", "Nexus.5", "Moto.X", "Moto.G", "Nexus.6", "One.Plus.One")

The preceding lines of code utilize the functions explained earlier to prepare the recommendation model. The predicted ratings, or output matrix, look like the following:

Predicted ratings matrix

Result interpretation

Let us do a quick visual inspection to see how good or bad our predictions have been. Consider users 1 and 3 as our training samples. From the input dataset, we can clearly see that user 1 has given high ratings to iPhones while user 3 has done the same for Android-based phones. The following side-by-side comparison shows that our algorithm has predicted values close enough to the actual values:

Ratings by user 1

Let us see the ratings of user 3 in the following screenshot:

Ratings by user 3

Now that we have our ratings matrix with updated values, we are ready to recommend products to users. It is common sense to show only the products which the user hasn't rated yet. The right set of recommendations will also enable the seller to pitch the products which have a high probability of being purchased by the user. The usual practice is to return a list of the top N items from the unrated list of products for each user. The user in consideration is usually termed the active user. Let us consider user 6 as our active user. This user has only rated Nexus 6, One Plus One, Nexus 5, and iPhone 4, in that order of rating, that is, Nexus 6 was rated highest and iPhone 4 was rated the least. Getting a list of the top 2 recommended phones for such a customer using our algorithm would result in Moto X and Moto G (very rightly indeed, do you see why?). Thus, we built a recommender engine smart enough to recommend the right mobile phones to an Android fanboy and saved the world from yet another catastrophe! Data to the rescue!

This simple implementation of a recommender engine using matrix factorization gave us a flavor of how such a system actually works. Next, let us get into some real-world action using recommender engines.

Production ready recommender engines

In this article so far, we have learnt about recommender engines in detail and even developed one from scratch (using matrix factorization). Through all this, it is clearly evident how widespread the application of such systems is.
Thus, we built a recommender engine smart enough to recommend the right mobile phones to an Android fanboy and saved the world from yet another catastrophe! Data to the rescue! This simple implementation of a recommender engine using matrix factorization gave us a flavor of how such a system actually works. Next, let us get into some real-world action using recommender engines.

Production ready recommender engines

In this article so far, we have learnt about recommender engines in detail and even developed one from scratch (using matrix factorization). Through all this, it is clearly evident how widespread the application of such systems is.

E-commerce websites (or, for that matter, any popular technology platform) out there today have tonnes of content to offer. Not only that, but the number of users is also huge. In such a scenario, where thousands of users are browsing/buying stuff simultaneously across the globe, providing recommendations to them is a task in itself. To complicate things even further, a good user experience (response times, for example) can create a big difference between two competitors. These are live examples of production systems handling millions of customers day in and day out.

Fun Fact
Amazon.com is one of the biggest names in the e-commerce space with 244 million active customers. Imagine the amount of data being processed to provide recommendations to such a huge customer base browsing through millions of products!
Source: http://www.amazon.com/b?ie=UTF8&node=8445211011

In order to provide a seamless capability for use in such platforms, we need highly optimized libraries and hardware. For a recommender engine to handle thousands of users simultaneously every second, R has a robust and reliable framework called recommenderlab. Recommenderlab is a widely used R extension designed to provide a robust foundation for recommender engines. The focus of this library is to provide efficient handling of data, availability of standard algorithms, and evaluation capabilities.

In this section, we will be using recommenderlab to handle a considerably large dataset for recommending items to users. We will also use the evaluation functions from recommenderlab to see how good or bad our recommendation system is. These capabilities will help us build a production ready recommender system similar (or at least closer) to what many online applications such as Amazon or Netflix use.

The dataset used in this section contains ratings for 100 items as rated by 5,000 users. The data has been anonymised and the product names have been replaced by product IDs. The rating scale used is 1 to 5, with 1 being the worst, 5 being the best, and 0 representing unrated items or missing ratings.

To build a recommender engine using recommenderlab for a production ready system, the following steps are to be performed:

Extract, transform, and analyze the data.
Prepare a recommendation model and generate recommendations.
Evaluate the recommendation model.

We will look at all these steps in the following subsections.

Extract, transform, and analyze

As in the case of any data-intensive (particularly machine learning) application, the first and foremost step is to get the data, understand/explore it, and then transform it into the format required by the algorithm deemed fit for the current application. For our recommender engine using the recommenderlab package, we will first load the data from the csv file described in the previous section and then explore it using various R functions.

# Load recommenderlab library
library("recommenderlab")

# Read dataset from csv file
raw_data <- read.csv("product_ratings_data.csv")

# Create rating matrix from data
ratings_matrix <- as(raw_data, "realRatingMatrix")

# view transformed data
image(ratings_matrix[1:6, 1:10])

The preceding section of code loads the recommenderlab package and then uses the standard utility function to read the product_ratings_data.csv file. For the exploratory as well as further steps, we need the data to be transformed into the user-item ratings matrix format (as described in the Core concepts and definitions section).
The as(<data>,<type>) utility converts the csv data into the required ratings matrix format. The csv file contains data in the format shown in the following screenshot. Each row contains a user's rating for a specific product. The column headers are self-explanatory.

Product ratings data

The realRatingMatrix conversion transforms the data into a matrix as shown in the following image. The users are depicted as rows while the columns represent the products. Ratings are represented using a gradient scale, where white represents a missing/unrated rating while black denotes a rating of 5/best.

Ratings matrix representation of our data

Now that we have the data in our environment, let us explore some of its characteristics and see if we can decipher some key patterns. First of all, we extract a representative sample from our main dataset (refer to the screenshot Product ratings data) and analyse it for:

Average rating score for our user population
Spread/distribution of item ratings across the user population
Number of items rated per user

The following lines of code help us explore our dataset sample and analyse the points mentioned previously:

# Extract a sample from ratings matrix
sample_ratings <- sample(ratings_matrix, 1000)

# Get the mean product ratings as given by the first user
rowMeans(sample_ratings[1, ])

# Get distribution of item ratings
hist(getRatings(sample_ratings), breaks = 100,
     xlab = "Product Ratings", main = "Histogram of Product Ratings")

# Get distribution of normalized item ratings
hist(getRatings(normalize(sample_ratings)), breaks = 100,
     xlab = "Normalized Product Ratings", main = "Histogram of Normalized Product Ratings")

# Number of items rated per user
hist(rowCounts(sample_ratings), breaks = 50,
     xlab = "Number of Products", main = "Histogram of Product Count Distribution")

We extract a sample of 1,000 users from our dataset for exploration purposes. The mean of the product ratings given by the first user in our sample is 2.055. This tells us that this user either hasn't seen/rated many products or usually rates products pretty low. To get a better idea of how the users rate products, we generate a histogram of the item rating distribution. This distribution peaks around the middle, that is, 3. The histogram is shown next:

Histogram for ratings distribution

The histogram shows that the ratings are normally distributed around the mean, with low counts for products with very high or very low ratings. Finally, we check the spread of the number of products rated by the users. We prepare a histogram which shows this spread:

Histogram of number of rated products

The preceding histogram shows that there are many users who have rated 70 or more products, as well as many users who have rated all 100 products. The exploration step helps us get an idea of what our data looks like, the way the users generally rate the products, and how many products are being rated.

Model preparation and prediction

We have the data in our R environment, transformed into the ratings matrix format. In this section, we are interested in preparing a recommender engine based upon user-based collaborative filtering. We will be using similar terminology as described in the previous sections. Recommenderlab provides straightforward utilities to learn and prepare a model for building recommender engines. We prepare our model based upon a sample of just 1,000 users.
This way, we can use this model to predict the missing ratings for the rest of the users in our ratings matrix. The following lines of code utilize the first thousand rows for learning the model:

# Create 'User Based collaborative filtering' model
ubcf_recommender <- Recommender(ratings_matrix[1:1000], "UBCF")

"UBCF" in the preceding code signifies user-based collaborative filtering. Recommenderlab also provides other algorithms, such as IBCF (Item-Based Collaborative Filtering), PCA (Principal Component Analysis), and others as well. After preparing the model, we use it to predict the ratings for the 1,010th and 1,011th users in the system. Recommenderlab also requires us to mention the number of items to be recommended to the users (in order of preference, of course). For the current case, we mention 5 as the number of items to be recommended.

# Predict list of products which can be recommended to given users
recommendations <- predict(ubcf_recommender,
                   ratings_matrix[1010:1011], n=5)

# show recommendations in the form of a list
as(recommendations, "list")

The preceding lines of code generate two lists, one for each of the users. Each element in these lists is a product for recommendation. The model predicted that for user 1,010, product prod_93 should be recommended as the top-most product, followed by prod_79, and so on.

# output generated by the model
[[1]]
[1] "prod_93" "prod_79" "prod_80" "prod_83" "prod_89"

[[2]]
[1] "prod_80" "prod_85" "prod_87" "prod_75" "prod_79"

Recommenderlab is a robust platform which is optimized to handle large datasets. With a few lines of code, we were able to load the data, learn a model, and even recommend products to the users in virtually no time. Compare this with the basic recommender engine we developed using matrix factorization, which involved many more lines of code, apart from the obvious difference in performance.

Model evaluation

We have successfully prepared a model and used it for predicting and recommending products to the users in our system. But what do we know about the accuracy of our model? To evaluate the prepared model, recommenderlab has handy and easy-to-use utilities. Since we need to evaluate our model, we need to split our data into training and test datasets. Also, recommenderlab requires us to mention the number of items to be given to the recommender for each test user (it withholds the rest and uses them for computing the error). For the current case, we will use 500 users to prepare an evaluation model. The model will be based upon a 90-10 training-testing split, with 15 items given per test user.

# Evaluation scheme
eval_scheme <- evaluationScheme(ratings_matrix[1:500],
                      method="split", train=0.9, given=15)

# View the evaluation scheme
eval_scheme

# Training model
training_recommender <- Recommender(getData(eval_scheme,
                       "train"), "UBCF")

# Predictions on the test dataset
test_rating <- predict(training_recommender,
               getData(eval_scheme, "known"), type="ratings")

# Error
error <- calcPredictionAccuracy(test_rating,
                   getData(eval_scheme, "unknown"))

error

We use the evaluation scheme to train our model based upon the UBCF algorithm. The prepared model from the training dataset is used to predict ratings for the given items. We finally use the method calcPredictionAccuracy to calculate the error in predicting the ratings between the known and unknown components of the test set.
For our case, we get an output as follows: the generated output mentions the values for RMSE (root mean squared error), MSE (mean squared error), and MAE (mean absolute error). For RMSE in particular, the predicted ratings deviate from the actual values by 1.162 (note that the values might deviate slightly across runs due to various factors such as sampling, iterations, and so on). This evaluation will make more sense when the outcomes from different CF algorithms are compared. For evaluating UBCF, we use IBCF as a comparator. The following few lines of code help us prepare an IBCF based model and test the ratings, which can then be compared using the calcPredictionAccuracy utility:

# Training model using IBCF
training_recommender_2 <- Recommender(getData(eval_scheme,
                                     "train"), "IBCF")

# Predictions on the test dataset
test_rating_2 <- predict(training_recommender_2,
                  getData(eval_scheme, "known"),
                type="ratings")

error_compare <- rbind(calcPredictionAccuracy(test_rating,
                getData(eval_scheme, "unknown")),
                       calcPredictionAccuracy(test_rating_2,
                getData(eval_scheme, "unknown")))

rownames(error_compare) <- c("User Based CF","Item Based CF")

The comparative output shows that UBCF outperforms IBCF with lower values of RMSE, MSE, and MAE. Similarly, we can use the other algorithms available in recommenderlab to test/evaluate our models. We encourage the user to try out a few more and see which algorithm has the least error in predicted ratings.

Summary

In this article, we continued our pursuit of using machine learning in the field of e-commerce to enhance sales and overall user experience. In this article, we accounted for the human factor and looked into recommendation engines based upon user behavior. We started off by understanding what recommendation systems are and how they are classified into user-based, content-based, and hybrid recommender systems. We touched upon the problems associated with recommender engines in general. Then we dived deep into the specifics of Collaborative Filters and discussed the math around prediction and similarity measures. After getting our basics straight, we moved on to building a recommender engine of our own from scratch. We utilized matrix factorization to build a recommender engine step by step using a small dummy dataset. We then moved on to building a production ready recommender engine using R's popular library called recommenderlab. We used user-based CF as our core algorithm to build a recommendation model upon a bigger dataset containing ratings for 100 products by 5,000 users. We closed our discussion by evaluating our recommendation model using recommenderlab's utility methods.

Resources for Article:

Further resources on this subject:
Machine Learning with R [article]
Introduction to Machine Learning with R [article]
Training and Visualizing a neural network with R [article]

article-image-making-app-react-and-material-design
Soham Kamani
21 Mar 2016
7 min read
Save for later

Making an App with React and Material Design

Soham Kamani
21 Mar 2016
7 min read
There has been much progression in the hybrid app development space, and also in React.js. Currently, almost all hybrid apps use cordova to build and run web applications on their platform of choice. Although learning React can be a bit of a steep curve, the benefit you get is that you are forced to make your code more modular, and this leads to huge long-term gains. This is great for developing applications for the browser, but when it comes to developing mobile apps, most web apps fall short because they fail to create the "native" experience that so many users know and love. Implementing these features on your own (through playing around with CSS and JavaScript) may work, but it's a huge pain for even something as simple as a material-design-oriented button.

Fortunately, there is a library of react components to help us out with getting the look and feel of material design in our web application, which can then be ported to a mobile to get a native look and feel. This post will take you through all the steps required to build a mobile app with react and then port it to your phone using cordova.

Prerequisites and dependencies

Globally, you will require cordova, which can be installed by executing this line:

npm install -g cordova

Now that this is done, you should make a new directory for your project and set up a build environment to use es6 and jsx. Currently, webpack is the most popular build system for react, but if that's not according to your taste, there are many more build systems out there. Once you have your project folder set up, install react as well as all the other libraries you would be needing:

npm init
npm install --save react react-dom material-ui react-tap-event-plugin

Making your app

Once we're done, the app should look something like this:

If you just want to get your hands dirty, you can find the source files here.

Like all web applications, your app will start with an index.html file:

<html>
<head>
  <title>My Mobile App</title>
</head>
<body>
  <div id="app-node">
  </div>
  <script src="bundle.js" ></script>
</body>
</html>

Yup, that's it. If you are using webpack, your CSS will be included in the bundle.js file itself, so there's no need to put "style" tags either. This is the only HTML you will need for your application.

Next, let's take a look at index.js, the entry point to the application code:

//index.js
import React from 'react';
import ReactDOM from 'react-dom';
import App from './app.jsx';

const node = document.getElementById('app-node');

ReactDOM.render(
  <App/>,
  node
);

What this does is grab the main App component and attach it to the app-node DOM node. Drilling down further, let's look at the app.jsx file:

//app.jsx
'use strict';

import React from 'react';
import AppBar from 'material-ui/lib/app-bar';
import MyTabs from './my-tabs.jsx';

let App = React.createClass({
  render : function(){
    return (
      <div>
        <AppBar title="My App" />
        <MyTabs />
      </div>
    );
  }
});

module.exports = App;

Following react's philosophy of structuring our code, we can roughly break our app down into two parts:

The title bar
The tabs below

The title bar is more straightforward and directly fetched from the material-ui library. All we have to do is supply a "title" property to the AppBar component.
MyTabs is another component that we have made, put in a different file because of the complexity:

//my-tabs.jsx
'use strict';

import React from 'react';
import Tabs from 'material-ui/lib/tabs/tabs';
import Tab from 'material-ui/lib/tabs/tab';
import Slider from 'material-ui/lib/slider';
import Checkbox from 'material-ui/lib/checkbox';
import DatePicker from 'material-ui/lib/date-picker/date-picker';
import injectTapEventPlugin from 'react-tap-event-plugin';

injectTapEventPlugin();

const styles = {
  headline: {
    fontSize: 24,
    paddingTop: 16,
    marginBottom: 12,
    fontWeight: 400
  }
};

const TabsSimple = React.createClass({
  render: () => (
    <Tabs>
      <Tab label="Item One">
        <div>
          <h2 style={styles.headline}>Tab One Template Example</h2>
          <p> This is the first tab. </p>
          <p> This is to demonstrate how easy it is to build mobile apps with react </p>
          <Slider name="slider0" defaultValue={0.5}/>
        </div>
      </Tab>
      <Tab label="Item 2">
        <div>
          <h2 style={styles.headline}>Tab Two Template Example</h2>
          <p> This is the second tab </p>
          <Checkbox name="checkboxName1" value="checkboxValue1" label="Installed Cordova"/>
          <Checkbox name="checkboxName2" value="checkboxValue2" label="Installed React"/>
          <Checkbox name="checkboxName3" value="checkboxValue3" label="Built the app"/>
        </div>
      </Tab>
      <Tab label="Item 3">
        <div>
          <h2 style={styles.headline}>Tab Three Template Example</h2>
          <p> Choose a Date:</p>
          <DatePicker hintText="Select date"/>
        </div>
      </Tab>
    </Tabs>
  )
});

module.exports = TabsSimple;

This file has quite a lot going on, so let's break it down step by step:

We import all the components that we're going to use in our app. This includes tabs, sliders, checkboxes, and datepickers.
injectTapEventPlugin is a plugin that we need in order to get tab switching to work.
We decide the style used for our tabs.
Next, we make our Tabs react component, which consists of three tabs:
The first tab has some text along with a slider.
The second tab has a group of checkboxes.
The third tab has a pop-up datepicker.

Each component has a few keys, which are specific to it (such as the initial value of the slider, the value reference of the checkbox, or the placeholder for the datepicker). There are a lot more properties you can assign, which are specific to each component.

Building your App

For building on Android, you will first need to install the Android SDK. Now that we have all the code in place, all that is left is building the app. For this, make a new directory, start a new cordova project, and add the Android platform, by running the following on your terminal:

mkdir my-cordova-project
cd my-cordova-project
cordova create .
cordova platform add android

Once the installation is complete, build the code we just wrote previously. If you are using the same build system as the source code, you will have only two files, that is, index.html and bundle.min.js. Delete all the files that are currently present in the www folder of your cordova project and copy those two files there instead. You can check whether your app is working on your computer by running cordova serve and going to the appropriate address on your browser. If all is well, you can build and deploy your app:

cordova build android
cordova run android

This will build and install the app on your Android device (provided it is in debug mode and connected to your computer). Similarly, you can build and install the same app for iOS or windows (you may need additional tools such as XCode or .NET for iOS or Windows). You can also use any other framework to build your mobile app.
The Angular framework also comes with its own set of material design components.

About the Author

Soham Kamani is a full-stack web developer and electronics hobbyist. He is especially interested in JavaScript, Python, and IoT.
article-image-delegate-pattern-limitations-swift
Anthony Miller
18 Mar 2016
5 min read
Save for later

Delegate Pattern Limitations in Swift

Anthony Miller
18 Mar 2016
5 min read
If you've ever built anything using UIKit, then you are probably familiar with the delegate pattern. The delegate pattern is used frequently throughout Apple's frameworks and many open source libraries you may come in contact with. But many times, it is treated as a one-size-fits-all solution for problems that it is just not suited for. This post will describe the major shortcomings of the delegate pattern.

Note: This article assumes that you have a working knowledge of the delegate pattern. If you would like to learn more about the delegate pattern, see The Swift Programming Language - Delegation.

1. Too Many Lines!

Implementation of the delegate pattern can be cumbersome. Most experienced developers will tell you that less code is better code, and the delegate pattern does not really allow for this. To demonstrate, let's try implementing a new view controller that has a delegate using the least amount of lines possible.

First, we have to create a view controller and give it a property for its delegate:

class MyViewController: UIViewController {
    var delegate: MyViewControllerDelegate?
}

Then, we define the delegate protocol.

protocol MyViewControllerDelegate {
    func foo()
}

Now we have to implement the delegate. Let's make another view controller that presents a MyViewController:

class DelegateViewController: UIViewController {

    func presentMyViewController() {
        let myViewController = MyViewController()
        presentViewController(myViewController, animated: false, completion: nil)
    }
}

Next, our DelegateViewController needs to conform to the delegate protocol:

class DelegateViewController: UIViewController, MyViewControllerDelegate {

    func presentMyViewController() {
        let myViewController = MyViewController()
        presentViewController(myViewController, animated: false, completion: nil)
    }

    func foo() {
        /// Respond to the delegate method.
    }
}

Finally, we can make our DelegateViewController the delegate of MyViewController:

class DelegateViewController: UIViewController, MyViewControllerDelegate {

    func presentMyViewController() {
        let myViewController = MyViewController()
        myViewController.delegate = self
        presentViewController(myViewController, animated: false, completion: nil)
    }

    func foo() {
        /// Respond to the delegate method.
    }
}

That's a lot of boilerplate code that is repeated every time you want to create a new delegate. This opens you up to a lot of room for errors. In fact, the above code has a pretty big error already that we are going to fix now.

2. No Non-Class Type Delegates

Whenever you create a delegate property on an object, you should use the weak keyword. Otherwise, you are likely to create a retain cycle. Retain cycles are one of the most common ways to create memory leaks and can be difficult to track down. Let's fix this by making our delegate weak:

class MyViewController: UIViewController {
    weak var delegate: MyViewControllerDelegate?
}

This causes another problem though. Now we are getting a build error from Xcode!

'weak' cannot be applied to non-class type 'MyViewControllerDelegate'; consider adding a class bound.

This is because you can't make a weak reference to a value type, such as a struct or an enum, so in order to use the weak keyword here, we have to guarantee that our delegate is going to be a class. Let's take Xcode's advice here and add a class bound to our protocol:

protocol MyViewControllerDelegate: class {
    func foo()
}

Well, now everything builds just fine, but we have another issue. Now your delegate must be an object (sorry structs and enums!).
You are now creating more constraints on what can conform to your delegate. The whole point of the delegate pattern is to allow an unknown "something" to respond to the delegate events. We should be putting as few constraints as possible on our delegate object, which brings us to the next issue with the delegate pattern.

3. Optional Delegate Methods

In pure Swift, protocols don't have optional functions. This means your delegate must implement every method in the delegate protocol, even if it is irrelevant in your case. For example, you may not always need to be notified when a user taps a cell in a UITableView. There are ways to get around this though.

In Swift 2.0+, you can make a protocol extension on your delegate protocol that contains a default implementation for protocol methods that you want to make optional. Let's make a new optional method on our delegate protocol using this method:

protocol MyViewControllerDelegate: class {
    func foo()
    func optionalFunction()
}

extension MyViewControllerDelegate {
    func optionalFunction() { }
}

This adds even more unnecessary code. It isn't really clear what the intention of this extension is unless you understand what's going on already, and there is no way to explicitly show that this method is optional.

Alternatively, if you mark your protocol as @objc, you can use the optional keyword in your function declaration. The problem here is that now your delegate must be an Objective-C object. Just like our last example, this is creating additional constraints on your delegate, and this time they are even more restrictive.

4. There Can Be Only One

The delegate pattern only allows for one delegate to respond to events. This may be just fine for some situations, but if you need multiple objects to be notified of an event, the delegate pattern may not work for you. Another common scenario you may come across is when you need different objects to be notified of different delegate events.

The delegate pattern can be a very useful tool, which is why it is so widely used, but recognizing the limitations that it creates is important when you are deciding whether it is the right solution for any given problem.

About the author

Anthony Miller is the lead iOS developer at App-Order in Las Vegas, Nevada, USA. He has written and released numerous apps on the App Store and is an avid open source contributor. When he's not developing, Anthony loves board games, line-dancing, and frequent trips to Disneyland.

article-image-neutron-api-basics
Packt
18 Mar 2016
13 min read
Save for later

Neutron API Basics

Packt
18 Mar 2016
13 min read
In this article by James Denton, the author of the book OpenStack Networking Essentials, you can see that Neutron is a virtual networking service that allows users to define network connectivity and IP addressing for instances and other cloud resources using an application programming interface (API). The Neutron API is made up of core elements that define basic network architectures and extensions that extend base functionality. Neutron accomplishes this by virtue of its data model, which consists of networks, subnets, and ports. These objects help define characteristics of the network in an easily storable format.

(For more resources related to this topic, see here.)

These core elements are used to build a logical network data model using information that corresponds to layers 1 through 3 of the OSI model, shown in the following screenshot:

For more information on the OSI model, check out the Wikipedia article at https://en.wikipedia.org/wiki/OSI_model.

Neutron uses plugins and drivers to identify network features and construct the virtual network infrastructure based on information stored in the database. A core plugin, such as the Modular Layer 2 (ML2) plugin included with Neutron, implements the core Neutron API and is responsible for adapting the logical network described by networks, ports, and subnets into something that can be implemented by the L2 agent and IP address management system running on the hosts. The extension API, provided by service plugins, allows users to manage the following resources, among others:

Security groups
Quotas
Routers
Firewalls
Load balancers
Virtual private networks

Neutron's extensibility means that new features can be implemented in the form of extensions and plugins that extend the API without requiring major changes. This allows vendors to introduce features and functionality that would otherwise not be available with the base API. The following diagram demonstrates at a high level how the Neutron API server interacts with the various plugins and agents responsible for constructing the virtual and physical network across the cloud:

The previous diagram demonstrates the interaction between the Neutron API service, Neutron plugins and drivers, and services such as the L2 and L3 agents. As network actions are performed by users via the API, the Neutron server publishes messages to the message queue that are consumed by agents. L2 agents build and maintain the virtual network infrastructure, while L3 agents are responsible for building and maintaining Neutron routers and associated functionality.

The Neutron API specifications can be found on the OpenStack wiki at https://wiki.openstack.org/wiki/Neutron/APIv2-specification. In the next few sections, we will look at some of the core elements of the API and the data models used to represent those elements.

Networks

A network is the central object of the Neutron v2.0 API data model and describes an isolated L2 segment. In a traditional infrastructure, machines are connected to switch ports that are often grouped together into virtual local area networks (VLANs) identified by unique IDs. Machines in the same network or VLAN can communicate with one another but cannot communicate with machines in other VLANs without the use of a router. The following diagram demonstrates how networks are isolated from one another in a traditional infrastructure:

Neutron network objects have attributes that describe the network type and the physical interface used for traffic.
The attributes also describe the segmentation ID used to differentiate traffic between other networks connected to virtual switches on the underlying host. The following diagram shows how a Neutron network describes various Layer 1 and Layer 2 attributes: Traffic between instances on different hosts requires underlying connectivity between the hosts. This means that the hosts must reside on the same physical switching infrastructure so that VLAN-tagged traffic can pass between them. Traffic between hosts can also be encapsulated using L2-in-L3 technologies such as GRE or VXLAN. Neutron supports multiple L2 methods of segmenting traffic, including using 802.1q VLANs, VXLANs, GRE, and more, depending on the plugin and configured drivers and agents. Devices in the same network are in the same broadcast domain, even though they may reside on different hosts and attach to different virtual switches. Neutron network attributes are very important in defining how traffic between virtual machine instances should be forwarded between hosts. Network attributes The following table describes base attributes associated with network objects, and more details can be found at the Neutron API specifications wiki referenced earlier in this article: Attribute Type Required Default Notes id uuid-str N/A Auto generated The UUID for the network name string no None The human-readable name for the network admin_state_up boolean no True The administrative state of the network status string N/A Null Indicates whether the network is currently operational subnets list no Empty list The subnets associated with the network shared boolean no False Specifies whether the network can be accessed by any tenant tenant_id uuid-str no N/A The owner of the network Networks are typically associated with tenants or projects and are usable by any user that is a member of the same tenant or project. Networks can also be shared with all other projects or a subnet of projects using Neutron's role-based access control (RBAC) functionality. Neutron RBAC first became available in the Liberty release of OpenStack. For more information on using the RBAC features, check out my blog at the following URL: https://developer.rackspace.com/blog/A-First-Look-at-RBAC-in-the-Liberty-Release-of-Neutron/. Provider attributes One of the earliest extensions to the Neutron API is known as the provider extension. The provider network extension maps virtual networks to physical networks by adding additional network attributes that describe the network type, segmentation ID, and physical interface. The following table shows various provider attributes and their associated values: Attribute Type Required Options Default Notes provider:network_type string yes vlan,flat,local, vxlan,gre Based on the configuration   provider:segmentation_id int optional Depends on the network type Based on the configuration The segmentation ID range varies among L2 technologies provider:physical_network string optional Provider label Based on the configuration This specifies the physical interface used for traffic (flat or VLAN-only) All networks have provider attributes. However, because provider attributes specify particular network configuration settings and mappings, only users with the admin role can specify them when creating networks. Users without the admin role can still create networks, but the Neutron server, not the user, will determine the type of network created and any corresponding interface or segmentation ID. 
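To make the attribute tables more concrete, the following is a small, hedged example of creating a network directly against the Neutron v2.0 API using Python's requests library. The endpoint URL, token, and provider values are placeholders you would replace with values from your own environment, and only a user with the admin role could include the provider attributes shown:

import json
import requests

# Placeholders - substitute your own Neutron endpoint and Keystone token
NEUTRON_URL = "http://controller:9696/v2.0"
TOKEN = "<auth-token>"

headers = {"X-Auth-Token": TOKEN, "Content-Type": "application/json"}

# The request body mirrors the network attributes described above; the
# provider attributes are optional and admin-only, so non-admin users
# would simply omit them
body = {
    "network": {
        "name": "web-net",
        "admin_state_up": True,
        "shared": False,
        "provider:network_type": "vlan",          # assumed ML2/VLAN deployment
        "provider:physical_network": "physnet1",  # provider label from config
        "provider:segmentation_id": 101           # VLAN ID
    }
}

response = requests.post(NEUTRON_URL + "/networks",
                         headers=headers, data=json.dumps(body))
network = response.json()["network"]
print(network["id"], network["status"])

The same operation is usually performed through the neutron or openstack command-line clients or the Horizon dashboard, both of which issue equivalent API calls under the hood.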
Additional attributes The external-net extension adds an attribute to networks that is used to determine whether or not the network can be used as the external, or gateway, network for a Neutron router. When set to true, the network becomes eligible for use as a floating IP pool when attached to routers. Using the Neutron router-gateway-set command, routers can be attached to external networks. The following table shows the external network attribute and its associated values: Attribute Type Required Default Notes router:external Boolean no false When true, the network is eligible for use as a floating IP pool when attached to a router Subnets In the Neutron data model, a subnet is an IPv4 or IPv6 address block from which IP addresses can be assigned to virtual machine instances and other network resources. Each subnet must have a subnet mask represented by a classless inter-domain routing (CIDR) address and must be associated with a network, as shown here: In the preceding diagram, three isolated VLAN networks each have a corresponding subnet. Instances and other devices cannot be attached to networks without an associated subnet. Instances connected to a network can communicate among one another but are unable to connect to other networks or subnets without the use of a router. The following diagram shows how a Neutron subnet describes various Layer 3 attributes in the OSI model: When creating subnets, users can specify IP allocation pools that limit which addresses in the subnet are available for allocation. Users can also define a custom gateway address, a list of DNS servers, and individual host routes that can be pushed to virtual machine instances using DHCP. The following table describes attributes associated with subnet objects: Attribute Type Required Default Notes id uuid-str n/a Auto Generated The UUID for the subnet network_id uuid-str Yes N/A The UUID of the associated network name string no None The human-readable name for the subnet ip_version int Yes 4 IP version 4 or 6 cidr string Yes N/A The CIDR address representing the IP address range for the subnet gateway_ip string or null no First address in CIDR The default gateway used by devices in the subnet dns_nameservers list(str) no None The DNS name servers used by hosts in the subnet allocation_pools list(dict) no Every address in the CIDR (excluding the gateway) The subranges of the CIDR available for dynamic allocation. tenant_id uuid-str no N/A The owner of the subnet enable_dhcp boolean no True This indicates whether or not DHCP is enabled for the subnet host_routes list(dict) no N/A Additional static routes Ports In the Neutron data model, a port represents a switch port on a logical switch that spans the entire cloud and contains information about the connected device. Virtual machine interfaces (VMIFs) and other network objects, such as router and DHCP server interfaces, are mapped to Neutron ports. The ports define both the MAC address and the IP address to be assigned to the device associated with them. Each port must be associated with a Neutron network. 
The following diagram shows how a port describes various Layer 2 attributes in the OSI model: The following table describes attributes associated with port objects: Attribute Type Required Default Notes id uuid-str n/a Auto generated The UUID for the subnet network_id uuid-str Yes N/A The UUID of the associated network name string no None The human-readable name for the subnet admin_state_up Boolean no True The administrative state of the port status string N/A N/A The current status of the port (for example, ACTIVE, BUILD, or DOWN) mac_address string no Auto generated The MAC address of the port fixed_ips list(dict) no Auto allocated The IP address(es) associated with the port device_id string no None The instance ID or other resource associated with the port device_owner string no None   tenant_id uuid-str no ID of tenant adding resource The owner of the port When Neutron is first installed, no ports exist in the database. As networks and subnets are created, ports may be created for each of the DHCP servers reflected by the logical switch model, seen here: As instances are created, a single port is created for each network interface attached to the instance, as shown here: A port can only be associated with a single network. Therefore, if an instance is connected to multiple networks, it will be associated with multiple ports. As instances and other cloud resources are created, the logical switch may scale to hundreds or thousands of ports over time, as shown in the following diagram: There is no limit to the number of ports that can be created in Neutron. However, quotas exist that limit tenants to a small number of ports that can be created. As the number of Neutron ports scale out, the performance of the Neutron API server and the implementation of networking across the cloud may degrade over time. It's a good idea to keep quotas in place to ensure a high-performing cloud, but the defaults and subsequent quota increases should be kept reasonable. The Neutron workflow In the standard Neutron workflow, networks must be created first, followed by subnets and then ports. The following sections describe the workflows involved with booting and deleting instances. Booting an instance Before an instance can be created, it must be associated with a network that has a corresponding subnet or a precreated port that is associated with a network. The following process documents the steps involved in booting an instance and attaching it to a network: The user creates a network. The user creates a subnet and associates it with the network. The user boots a virtual machine instance and specifies the network. Nova interfaces with Neutron to create a port on the network. Neutron assigns a MAC address and IP address to the newly created port using attributes defined by the subnet. Nova builds the instance's libvirt XML file containing local network bridge and MAC address information and starts the instance. The instance sends a DHCP request during boot, at which point the DHCP server responds with the IP address corresponding to the MAC address of the instance. If multiple network interfaces are attached to an instance, each network interface will be associated with a unique Neutron port and may send out DHCP requests to retrieve their respective network information. 
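The boot workflow can also be driven programmatically. The following sketch uses the openstacksdk Python library to perform steps 1 through 3 of the sequence above; the cloud name, image, and flavor identifiers are assumptions standing in for values from your own clouds.yaml and Glance/Nova inventory:

import openstack

# Connect using credentials defined for the "mycloud" entry in clouds.yaml (assumed name)
conn = openstack.connect(cloud="mycloud")

# Step 1: the user creates a network
network = conn.network.create_network(name="app-net")

# Step 2: the user creates a subnet and associates it with the network
subnet = conn.network.create_subnet(
    network_id=network.id,
    ip_version=4,
    cidr="192.168.10.0/24",
    name="app-subnet")

# Step 3: the user boots an instance and specifies the network; Nova and
# Neutron then handle port creation and address assignment (steps 4 to 7)
server = conn.compute.create_server(
    name="web01",
    image_id="<image-uuid>",     # placeholder
    flavor_id="<flavor-uuid>",   # placeholder
    networks=[{"uuid": network.id}])
server = conn.compute.wait_for_server(server)

# The port created on behalf of the instance is now visible via the API
for port in conn.network.ports(device_id=server.id):
    print(port.mac_address, port.fixed_ips)

Listing the ports by device_id at the end shows the MAC and fixed IP information that Neutron assigned on behalf of the instance, which is the same data the DHCP server hands back during boot.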
How the logical model is implemented Neutron agents are services that run on network and compute nodes and are responsible for taking information described by networks, subnets, and ports and using it to implement the virtual and physical network infrastructure. In the Neutron database, the relationship between networks, subnets, and ports can be seen in the following diagram: This information is then implemented on the compute node by way of virtual network interfaces, virtual switches or bridges, and IP addresses, as shown in the following diagram: In the preceding example, the instance was connected to a network bridge on a compute node that provides connectivity from the instance to the physical network. For now, it's only necessary to know how the data model is implemented into something that is usable. Deleting an instance The following process documents the steps involved in deleting an instance: The user destroys virtual machine instance. Nova interfaces with Neutron to destroy the ports associated with the instances. Nova deletes local instance data. The allocated IP and MAC addresses are returned to the pool. When instances are deleted, Neutron removes all virtual network connections from the respective compute node and removes corresponding port information from the database. Summary In this article, we looked at the basics of the Neutron API and its data model made up of networks, subnets, and ports. These objects were used to describe in a logical way how the virtual network is architected and implemented across the cloud. Resources for Article: Further resources on this subject: Introducing OpenStack Trove[article] Concepts for OpenStack[article] Monitoring OpenStack Networks[article]