
How-To Tutorials

7013 Articles

Mark Zuckerberg's Congressional testimony: 5 things we learned

Richard Gall
11 Apr 2018
8 min read
Mark Zuckerberg testified in front of Congress yesterday (April 10, 2018). That's a pretty big deal. Congress has been waiting some time for the chance to grill the Facebook chief, with "Zuck" resisting, so the fact that he finally had his day in D.C. indicates the level of pressure currently on him. Some have lamented the fact that senators were given so little time to respond to Zuckerberg - there was no time to really get deep into the issues at hand. However, although much about the event was superficial, if you looked closely there was plenty to take away from it. Here are 5 of the most important things we learned from Mark Zuckerberg's testimony in front of Congress.

Policy makers don't really understand that much about tech

The most shocking thing to come out of Zuckerberg's testimony was also the least surprising: some of the most powerful people in the U.S. don't really understand the technology being discussed. More importantly, this is technology they are going to have to make decisions on. One senator brought printouts of Facebook pages and asked Zuckerberg if these were examples of Russian propaganda groups. Another was confused about Facebook's business model - how could it run a free service and still make money? Those are just two pretty funny examples, and the senators' lack of understanding could perhaps be forgiven due to their age. However, there surely isn't any excuse for 45-year-old Senator Brian Schatz to misunderstand the relationship between WhatsApp and Facebook. https://twitter.com/pdmcleod/status/983809717116993537

Chris Cillizza argued on CNN that "the senate's tech illiteracy saved Zuckerberg". He explained: The problem was that once Zuckerberg responded - and he largely stuck to a very strict script in doing so - the lack of tech knowledge among those asking him questions was exposed. The result? Zuckerberg was rarely pressed, rarely forced off his talking points, almost never made to answer for the very real questions his platform faces.

This lack of knowledge made the proceedings less than satisfactory for onlookers. Until the knowledge gap is tackled, it's always going to be a challenge for political institutions to keep up with technological innovators. Ultimately, that's what makes regulation hard.

Zuckerberg is still held up as the gatekeeper of tech in 2018

Zuckerberg is still held up as a gatekeeper or oracle of modern technology. That is probably a consequence of the point above. Because there's such a knowledge gap within the institutions that govern and regulate, it's more manageable for them to look to a figurehead. That, of course, cuts both ways: on the one hand, Zuckerberg is treated as a fountain of knowledge, someone who can solve these problems; on the other, as part of a Silicon Valley axis of evil, nefariously plotting the downfall of democracy and how to read your WhatsApp messages. Most people know that neither is true. The key point, though, is that however you feel about Zuckerberg, he is not the man you need to ask about regulation. This is something that Zephyr Teachout argues in the Guardian. "We shouldn't be begging for Facebook's endorsement of laws, or for Mark Zuckerberg's promises of self-regulation," she writes. In fact, one of the interesting subplots of the hearing was the fact that Zuckerberg didn't actually know that much. For example, a lot has been made of how extensive his notes were.
And yes, you certainly would expect someone facing a panel of senators in Washington to be well briefed. But it nevertheless underlines an important point: Facebook is a complex and multi-faceted organization that far exceeds the knowledge of its founder and CEO. In turn, this tells you something about technology that's often lost within the discourse: it's hard to consider what's happening at a superficial or abstract level without completely missing the point.

There's a lot you could say about Zuckerberg's notes. One of the most interesting points concerns GDPR. The note is very prescriptive: it says "Don't say we already do what GDPR requires." Many have noted that this throws up a lot of issues, not least how Facebook plans to tackle GDPR in just over a month if it hasn't moved on it already. But it's the suggestion that Zuckerberg was completely unaware of the situation that is most remarkable here. He doesn't even know where his company stands on one of the most important pieces of data legislation in decades.

Facebook is incredibly naive

If senators were often naive - or plain ignorant - on matters of technology during the hearing, there was plenty of evidence to indicate that Zuckerberg is just as naive. The GDPR issue mentioned above is just one example, but there are other problems too. You can't, for example, get much more naive than thinking that Cambridge Analytica had deleted the data that Facebook had passed to it. Zuckerberg's initial explanation was that he didn't realize that Cambridge Analytica was "not an app developer or advertiser", but he corrected this, saying that his team told him they were an advertiser back in 2015, which meant they did have reason to act on it but chose not to. Zuckerberg apologized for this mistake, but it's really difficult to see how this could happen. There almost appears to be a culture of naivety within Facebook, whereby the organization generally, and Zuckerberg specifically, don't fully understand the nature of the platform it has built and what it could be used for. It's only now, with Zuckerberg talking about an "arms race" with Russia, that this naivety is disappearing. But it's clear there was an organizational blind spot that has got us to where we are today.

Facebook still thinks AI can solve all of its problems

The fact that Facebook believes AI is the solution to so many of its problems is indicative of this ingrained naivety. When talking to Congress about the 'arms race' with Russian intelligence, and the wider problem of hate speech, Zuckerberg signaled that the solution lies in the continued development of better AI systems. However, he conceded that building systems actually capable of detecting such speech could be 5 to 10 years away. This is a problem. It's proving a real challenge for Facebook to keep up with the 'misuse' of its platform. Foreign Policy reports that: "...just last week, the company took down another 70 Facebook accounts, 138 Facebook pages, and 65 Instagram accounts controlled by Russia's Internet Research Agency, a baker's dozen of whose executives and operatives have been indicted by Special Counsel Robert Mueller for their role in Russia's campaign to propel Trump into the White House." However, the more AI comes to be deployed on Facebook, the more the company is going to have to rethink how it describes itself. By using algorithms to regulate the way the platform is used, there comes to be an implicit editorializing of content.
That's not necessarily a bad thing, but it does mean we again return to this final problem...

There's still confusion about the difference between a platform and a publisher

Central to every issue raised in Zuckerberg's testimony was the fact that Facebook remains confused about whether it is a platform or a publisher - or, more specifically, about the extent to which it is responsible for the content on the platform. It's hard to single out Zuckerberg here because everyone seems to be confused on this point. But it's interesting that he seems to have never really thought about the problem. That does seem to be changing, however. In his testimony, Zuckerberg said that "Facebook was responsible" for the content on its platforms. This statement marks a big change from the typical line used by every social media platform - that platforms are just platforms, and they bear no responsibility for what is published on them. However, just when you think Zuckerberg is making a definitive statement, he steps back. He went on to say that "I agree that we are responsible for the content, but we don't produce the content." This statement hints that he still wants to keep the distinction between platform and publisher. Unfortunately for Zuckerberg, that might be too late.

Read Next
OpenAI charter puts safety, standards, and transparency first
'If tech is building the future, let's make that future inclusive and representative of all of society' – An interview with Charlotte Jee
What your organization needs to know about GDPR
20 lessons on bias in machine learning systems by Kate Crawford at NIPS 2017


5 things to consider when developing an eCommerce website

Johti Vashisht
11 Apr 2018
7 min read
Online businesses are booming, and rightly so – it is expected that 18% of all UK retail purchases will occur online this year. That's partly because eCommerce website development has become easy: almost anyone can do it. But hubris might be your downfall; there are a number of important things to consider before you start building your eCommerce website. This is especially true if you want customers to keep coming back to your site. We've compiled a list of things to keep in mind for when you are ready to build an eCommerce store.

eCommerce website development begins with the right platform and brilliant design

Platform

Before creating your eCommerce website, you need to decide which platform to create the website on. There are a variety of content management systems, including WordPress, Joomla, and Magento. WordPress is a versatile and easy-to-use platform which also supports a large number of plugins, so it may be suitable if you are offering services or only a few products. Platforms such as Magento have been created specifically for eCommerce use. If you are thinking of opening an online store with many products, then Magento is the better option, as it makes managing your products easier.

Design

When designing your website, use a clean, simple design rather than one with too many graphics, and incorporate clear calls to action. Another thing to take into account is whether you want to create your own custom theme or choose a pre-made theme and build upon it. Although it can be pricier, a custom theme allows you to add custom functionality to your website that a standard pre-made theme may not have. In contrast, pre-made themes will be much cheaper, or in most cases free. If you are choosing a pre-made theme, be sure to check that it is regularly updated and that its developers provide support contact details in case of any queries. Your website design should also be responsive, so that your website displays correctly across different devices and platforms.

Your eCommerce website needs to be secure

A secure website is beneficial for both you and your customers. With a growing number of websites being hacked and data being stolen, security is the one part of a website you cannot skimp on. An SSL (Secure Sockets Layer) certificate is essential for your website: not only does it allow a secure connection over which personal data can be transmitted, it also provides authentication, so that customers know it's safe to make purchases on your website. SSL certificates are mandatory if you collect private information from customers via forms. HTTPS (Hypertext Transfer Protocol Secure) is an encrypted standard for website-client communications. For HTTP to become HTTPS, data is wrapped in secure SSL packets before being sent and after being received. As well as securing data, HTTPS may also help with search ranking: if you use HTTPS, you will have a slight boost in ranking over competitor websites that do not.

eCommerce plugins make adding features to your site easier

If you have decided to use WordPress to create your eCommerce website, then there are a number of eCommerce plugins available to help you create your online store. Top eCommerce plugins include WooCommerce, Shopify, Shopp, and Easy Digital Downloads.
SEO attracts organic traffic to your eCommerce site

If you want potential customers to see your products before those of your competitors, then optimising your website pages will help you reach the first page of search results. Undertake keyword research to find the words that potential customers most commonly use to find the products you offer. Google's Keyword Planner is quite helpful in managing your keyword research. You can then add relevant words to your product names and descriptions. Revisit these keywords occasionally to update them and experiment with which keywords work better. You can improve your rankings with good page titles that include relevant keywords. Although meta descriptions do not improve ranking, it's good to add useful meta descriptions, as a better description may draw more clicks. Also ensure that product URLs mirror what the product is and aren't unnecessarily long.

Other things to consider when building an eCommerce website

You may wish to consider additional features in order to increase your chance of returning visitors:

Site speed: If it takes too long for a product to load, it's likely that customers won't return for a repeat purchase; they'll simply visit a competitor website that loads much faster. There are a few things you can do to speed up your website, including caching and using in-memory technology for certain operations rather than constantly accessing the database. You could also use fast hosting servers to meet traffic requirements. Site speed is also an important SEO consideration.

Guest checkout: 23% of shoppers will abandon their shopping basket if they are forced to register an account. Some customers may not wish to create an account because they are short of time, so make it easier for them to purchase items by adding the option of a guest checkout and keeping the transaction process smooth and quick. Once they have completed their checkout, you can ask them if they would like to create an account.

Site search: Provide search functionality so users can search for products, with the ability to filter results through a variety of options (if applicable).

Pain points: Address potential concerns customers may have before purchasing by displaying the information they are most likely to ask about, such as delivery options and whether free returns are offered.

Mobile optimization: In 2017, almost 59% of eCommerce sales occurred via mobile. An increasing number of users now shop online using their smartphones, and this trend will most likely grow. That's why optimising your website for mobile is a must.

User-generated reviews and testimonials: Use social proof on your website with user reviews and testimonials. If a potential customer reads customer reviews, they are more likely to purchase a product. However, user-generated reviews can go both ways – a user may also post a negative review, which may not be good for your website or online store.

Related items: Showing related items under a product is useful for customers who are looking for an item but may not have decided what type of that particular product they want. This is also useful when the main product is out of stock.

FAQs section: Creating an FAQ section with common questions is very useful and saves both the customer and the company time, as basic queries can be answered by looking at the FAQ page.

If you're starting out, good luck!
Yes, in 2018 eCommerce website development is pretty easy thanks to the likes of Shopify, WooCommerce, and Magento, among others. But as you can see, there's plenty you need to consider. By incorporating most of these points, you will be able to create an eCommerce website that users can navigate easily to find the products or services they are looking for.


What is LSTM?

Richard Gall
11 Apr 2018
3 min read
What does LSTM stand for?

LSTM stands for long short-term memory. It is a model, or architecture, that extends the memory of recurrent neural networks. Typically, recurrent neural networks have 'short-term memory' in that they use persistent previous information in the current task; however, they do not keep a list of all of the previous information available to the neural node. Find out how LSTM works alongside recurrent neural networks: watch this short video tutorial.

How does LSTM work?

LSTM introduces long-term memory into recurrent neural networks. It mitigates the vanishing gradient problem, which is where the neural network stops learning because the updates to the various weights within a given neural network become smaller and smaller. It does this by using a series of 'gates', contained in memory blocks which are connected through layers. There are three types of gates within a unit:

Input gate: scales input to the cell (write)
Output gate: scales output from the cell (read)
Forget gate: scales the old cell value (reset)

Each gate is like a switch that controls the read/write, thus incorporating the long-term memory function into the model. (The standard gate equations are sketched at the end of this piece.)

Applications of LSTM

There is a huge range of ways that LSTM can be used, including:

Handwriting recognition
Time series anomaly detection
Speech recognition
Learning grammar
Composing music

The difference between LSTM and GRU

There are many similarities between LSTM and GRU (Gated Recurrent Units). However, there are some important differences that are worth remembering:

A GRU has two gates, whereas an LSTM has three gates.
GRUs don't possess any internal memory that is different from the exposed hidden state. They don't have the output gate, which is present in LSTMs.
There is no second nonlinearity applied when computing the output in a GRU.

GRU, as a concept, is a little newer than LSTM. It is generally more efficient - it trains models at a quicker rate than LSTM. It is also easier to use: any modifications you need to make to a model can be done fairly easily. However, LSTM should perform better than GRU where longer-term memory is required. Ultimately, comparing performance is going to depend on the dataset you are using.

Read next:
4 ways to enable Continual learning into Neural Networks
Build a generative chatbot using recurrent neural networks (LSTM RNNs)
How Deep Neural Networks can improve Speech Recognition and generation
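To make the gate description above concrete, here is the standard formulation of the LSTM update equations, written in LaTeX. This is a sketch using the most common notation; the weight matrices $W$, $U$, bias vectors $b$, input $x_t$, hidden state $h_t$, and cell state $c_t$ are not defined in the article itself, $\sigma$ denotes the logistic sigmoid, and $\odot$ is element-wise multiplication.

\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate: scales the old cell value)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate: scales what is written)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate: scales what is read out)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state / output)}
\end{aligned}

Because the forget gate can keep $c_{t-1}$ almost unchanged from step to step, gradients can flow back through long sequences without vanishing, which is exactly the long-term memory behaviour described above. A GRU collapses this scheme into two gates (update and reset) with no separate cell state, which is the structural difference listed in the comparison.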


Applying Spring Security using JSON Web Token (JWT)

Vijin Boricha
10 Apr 2018
9 min read
Today, we will learn about Spring Security and how it can be applied in various forms using powerful libraries like JSON Web Token (JWT). Spring Security is a powerful authentication and authorization framework, which will help us provide a secure application. By using Spring Security, we can keep all of our REST APIs secured and accessible only by authenticated and authorized calls.

Authentication and authorization

Let's look at an example to explain this. Assume you have a library with many books. Authentication will provide a key to enter the library; however, authorization will give you permission to take a book. Without a key, you can't even enter the library. Even though you have a key to the library, you will be allowed to take only a few books.

JSON Web Token (JWT)

Spring Security can be applied in many forms, including XML configurations, and by using powerful libraries such as JSON Web Token. As most companies use JWT in their security, we will focus more on JWT-based security than on simple Spring Security, which can be configured in XML. JWTs are URL-safe and web browser-compatible, which makes them especially suitable for Single Sign-On (SSO) contexts. A JWT has three parts:

Header
Payload
Signature

The header part decides which algorithm should be used to generate the token. While authenticating, the client has to save the JWT, which is returned by the server. Unlike traditional session creation approaches, this process doesn't need to store any cookies on the client side. JWT authentication is stateless, as the client state is never saved on the server.

JWT dependency

To use JWT in our application, we need a Maven dependency. The following dependency should be added in the pom.xml file. You can get the Maven dependency from https://mvnrepository.com/artifact/javax.xml.Bind. We have used version 2.3.0 of the dependency in our application:

<dependency>
  <groupId>javax.xml.bind</groupId>
  <artifactId>jaxb-api</artifactId>
  <version>2.3.0</version>
</dependency>

Note: As Java 9 doesn't include DatatypeConverter in its bundle, we need to add the preceding configuration to work with DatatypeConverter. We will cover DatatypeConverter in the following section.

Creating a JSON Web Token

To create a token, we have added an abstract method called createToken in our SecurityService interface. This interface tells the implementing class that it has to provide a complete implementation of createToken. In the createToken method, we will use only the subject and expiry time, as these two options are the important ones when creating a token. First, we will create the abstract method in the SecurityService interface; the concrete class (whichever class implements the SecurityService interface) has to implement the method:

public interface SecurityService {
  String createToken(String subject, long ttlMillis);
  // other methods
}

In the preceding code, we defined the method for token creation in the interface. SecurityServiceImpl is the concrete class that implements the abstract method of the SecurityService interface by applying the business logic.
The following code shows how the JWT will be created by using the subject and expiry time:

private static final String secretKey = "4C8kum4LxyKWYLM78sKdXrzbBjDCFyfX";

@Override
public String createToken(String subject, long ttlMillis) {
  if (ttlMillis <= 0) {
    throw new RuntimeException("Expiry time must be greater than Zero :["+ttlMillis+"] ");
  }
  // The JWT signature algorithm we will be using to sign the token
  SignatureAlgorithm signatureAlgorithm = SignatureAlgorithm.HS256;
  byte[] apiKeySecretBytes = DatatypeConverter.parseBase64Binary(secretKey);
  Key signingKey = new SecretKeySpec(apiKeySecretBytes, signatureAlgorithm.getJcaName());
  JwtBuilder builder = Jwts.builder()
      .setSubject(subject)
      .signWith(signatureAlgorithm, signingKey);
  long nowMillis = System.currentTimeMillis();
  builder.setExpiration(new Date(nowMillis + ttlMillis));
  return builder.compact();
}

The preceding code creates the token for the subject. Here, we have hardcoded the secret key "4C8kum4LxyKWYLM78sKdXrzbBjDCFyfX" to simplify the token creation process. If needed, we can keep the secret key inside the properties file to avoid hardcoding it in the Java code. First, we verify whether the time is greater than zero. If not, we throw an exception right away. We are using the SHA-256 algorithm, as it is used in most applications.

Note: Secure Hash Algorithm (SHA) is a cryptographic hash function. The cryptographic hash is in the text form of a data file. The SHA-256 algorithm generates an almost-unique, fixed-size 256-bit hash. SHA-256 is one of the more reliable hash functions.

We have hardcoded the secret key in this class. We could also store the key in the application.properties file; however, to simplify the process, we have hardcoded it:

private static final String secretKey = "4C8kum4LxyKWYLM78sKdXrzbBjDCFyfX";

We convert the string key to a byte array and then pass it to a Java class, SecretKeySpec, to get a signingKey. This key will be used in the token builder. Also, while creating the signing key, we use the JCA name of our signature algorithm.

Note: Java Cryptography Architecture (JCA) was introduced by Java to support modern cryptography techniques.

We use the JwtBuilder class to create the token and set the expiration time for it. The following code defines the token creation and the expiry time setting option:

JwtBuilder builder = Jwts.builder()
    .setSubject(subject)
    .signWith(signatureAlgorithm, signingKey);
long nowMillis = System.currentTimeMillis();
builder.setExpiration(new Date(nowMillis + ttlMillis));

We will have to pass the time in milliseconds when calling this method, as the expiry is calculated in milliseconds. Finally, we have to call the createToken method in our HomeController. Before calling the method, we will have to autowire the SecurityService as follows:

@Autowired
SecurityService securityService;

The createToken call is coded as follows. We take the subject as the parameter. To simplify the process, we have hardcoded the expiry time as 2 * 1000 * 60 (two minutes). HomeController.java:

@Autowired
SecurityService securityService;

@ResponseBody
@RequestMapping("/security/generate/token")
public Map<String, Object> generateToken(@RequestParam(value="subject") String subject){
  String token = securityService.createToken(subject, (2 * 1000 * 60));
  Map<String, Object> map = new LinkedHashMap<>();
  map.put("result", token);
  return map;
}

Generating a token

We can test the token by calling the API in a browser or any REST client. By calling this API, we can create a token.
This token can be used for purposes such as user authentication. A sample API call for creating a token is as follows:

http://localhost:8080/security/generate/token?subject=one

Here we have used one as the subject. We can see the token in the following result. This is how the token will be generated for all the subjects we pass to the API:

{ result: "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJvbmUiLCJleHAiOjE1MDk5MzY2ODF9.GknKcywiIG4-R2bRmBOsjomujP0MxZqdawrB8TO3P4" }

Note: A JWT is a string that has three parts, each separated by a dot (.). Each section is base-64 encoded. The first section is the header, which gives a clue about the algorithm used to sign the JWT. The second section is the body, and the final section is the signature. (A small decoding sketch follows at the end of this article.)

Getting a subject from a JSON Web Token

So far, we have created a JWT. Now we are going to decode the token and get the subject from it. As usual, we have to define a method to get the subject: we will create an abstract method called getSubject in the SecurityService interface and implement it later in our concrete class:

String getSubject(String token);

In our concrete class, we will implement the getSubject method and add our code to the SecurityServiceImpl class. We can use the following code to get the subject from the token:

@Override
public String getSubject(String token) {
  Claims claims = Jwts.parser()
      .setSigningKey(DatatypeConverter.parseBase64Binary(secretKey))
      .parseClaimsJws(token).getBody();
  return claims.getSubject();
}

In the preceding method, we use Jwts.parser to get the claims. We set a signing key by converting the secret key to binary and then passing it to the parser. Once we get the Claims, we can simply get the subject by calling getSubject. Finally, we can call the method in our controller and pass the generated token to get the subject. You can check the following code, where the controller calls the getSubject method and returns the subject, in the HomeController.java file:

@ResponseBody
@RequestMapping("/security/get/subject")
public Map<String, Object> getSubject(@RequestParam(value="token") String token){
  String subject = securityService.getSubject(token);
  Map<String, Object> map = new LinkedHashMap<>();
  map.put("result", subject);
  return map;
}

Getting a subject from a token

Previously, we created the code to generate the token. Here we will test the getSubject method by calling the get subject API. By calling this REST API, we will get back the subject that we passed earlier. Sample API:

http://localhost:8080/security/get/subject?token=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJvbmUiLCJleHAiOjE1MDk5MzY2ODF9.GknKcywiI-G4-R2bRmBOsjomujP0MxZqdawrB8TO3P4

Since we used one as the subject when creating the token by calling the generateToken method, we will get "one" in the getSubject result:

{ result: "one" }

Note: Usually, we attach the token in the headers; however, to avoid complexity here, we have passed the token as a request parameter to get the subject. You may not need to do it the same way in a real application; this is only for demo purposes.

This article is an excerpt from the book Building RESTful Web Services with Spring 5 - Second Edition, written by Raja CSP Raman. The book covers techniques for dealing with security in Spring and shows how to implement unit test and integration test strategies.
You may also like How to develop RESTful web services in Spring, another tutorial from this book. Check out other posts on Spring Security: Spring Security 3: Tips and Tricks Opening up to OpenID with Spring Security Migration to Spring Security 3  
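As a small illustration of the note above about the three dot-separated, base64-encoded sections, here is a self-contained Java sketch that splits a token and prints its decoded header and payload using java.util.Base64. It is not taken from the book, and the class name is purely illustrative:

import java.util.Base64;

// Illustrative helper: splits a JWT and prints the decoded header and payload.
// The third part (the signature) is binary rather than JSON, so it is left encoded.
public class JwtInspector {

    public static void printParts(String token) {
        String[] parts = token.split("\\.");
        Base64.Decoder decoder = Base64.getUrlDecoder(); // JWTs use base64url encoding
        System.out.println("Header : " + new String(decoder.decode(parts[0])));
        System.out.println("Payload: " + new String(decoder.decode(parts[1])));
    }

    public static void main(String[] args) {
        // Pass a token generated by the /security/generate/token endpoint described above
        printParts(args[0]);
    }
}

Decoding the payload of a token from the sample endpoint would reveal a small JSON body containing the subject and expiry claims, which is a useful reminder that a JWT is signed, not encrypted - it should never carry data you are not happy for the client to read.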


Get SQL Server user management right

Vijin Boricha
10 Apr 2018
8 min read
The question who are you sounds pretty simple, right? Well, possibly not where philosophy is concerned, and neither is it where databases are concerned either. But user management is essential for anyone managing databases. In this tutorial, learn how SQL server user management works - and how to configure it in the right way. SQL Server user management: the authentication process During the setup procedure, you have to select a password which actually uses the SQL Server authentication process. This database engine comes from Windows and it is tightly connected with Active Directory and internal Windows authentication. In this phase of development, SQL Server on Linux only supports SQL authentication. SQL Server has a very secure entry point. This means no access without the correct credentials. Every information system has some way of checking a user's identity, but SQL Server has three different ways of verifying identity, and the ability to select the most appropriate method, based on individual or business needs. When using SQL Server authentication, logins are created on SQL Server. Both the user name and the password are created by using SQL Server and stored in SQL Server. Users connecting through SQL Server authentication must provide their credentials every time that they connect (user name and password are transmitted through the network). Note: When using SQL Server authentication, it is highly recommended to set strong passwords for all SQL Server accounts.  As you'll have noticed, so far you have not had any problems accessing SQL Server resources. The reason for this is very simple. You are working under the sa login. This login has unlimited SQL Server access. In some real-life scenarios, sa is not something to play with. It is good practice to create a login under a different name with the same level of access. Now let's see how to create a new SQL Server login. But, first, we'll check the list of current SQL Server logins. To do this, access the sys.sql_logins system catalog view and three attributes: name, is_policy_checked, and is_expiration_checked. The attribute name is clear; the second one will show the login enforcement password policy; and the third one is for enforcing account expiration. Both attributes have a Boolean type of value: TRUE or FALSE (1 or 0). Type the following command to list all SQL logins: 1> SELECT name, is_policy_checked, is_expiration_checked 2> FROM sys.sql_logins 3> WHERE name = 'sa' 4> GO name is_policy_checked is_expiration_checked -------------- ----------------- --------------------- sa 1 0 (1 rows affected) 2. If you want to see what your password for the sa login looks like, just type this version of the same statement. This is the result of the hash function: 1> SELECT password_hash 2> FROM sys.sql_logins 3> WHERE name = 'sa' 4> GO password_hash ------------------------------------------------------------- 0x0200110F90F4F4057F1DF84B2CCB42861AE469B2D43E27B3541628 B72F72588D36B8E0DDF879B5C0A87FD2CA6ABCB7284CDD0871 B07C58D0884DFAB11831AB896B9EEE8E7896 (1 rows affected) 3. Now let's create the login dba, which will require a strong password and will not expire: 1> USE master 2> GO Changed database context to 'master'. 1> CREATE LOGIN dba 2> WITH PASSWORD ='S0m3c00lPa$$', 3> CHECK_EXPIRATION = OFF, 4> CHECK_POLICY = ON 5> GO 4. 
Re-check the dba on the login list: 1> SELECT name, is_policy_checked, is_expiration_checked 2> FROM sys.sql_logins 3> WHERE name = 'dba' 4> GO name is_policy_checked is_expiration_checked ----------------- ----------------- --------------------- dba 1 0 (1 rows affected) Notice that the dba login does not have any kind of privilege yet. Let's check that part. First, close your current sqlcmd session by typing exit. Now, connect again, but instead of using sa, you will connect with the dba login. After the connection has been successfully created, try to change the active database to AdventureWorks. This process, based on the login name, should look like this: # dba@tumbleweed:~> sqlcmd -S suse -U dba Password: 1> USE AdventureWorks 2> GO Msg 916, Level 14, State 1, Server tumbleweed, Line 1 The server principal "dba" is not able to access the database "AdventureWorks" under the current security context As you can see, the authentication process on its own will not grant you anything. Put simply, you can enter the building, but you can't open any door. You will need to pass through the authorization process first.

Authorization process

After authenticating a user, SQL Server will then determine whether the user has permission to view and/or update data, view metadata, or perform administrative tasks (at the server level, the database level, or both). If the user, or a group of which the user is a member, has some type of permission within the instance and/or specific databases, SQL Server will let the user connect. In a nutshell, authorization is the process of checking user access rights to specific securables. In this phase, SQL Server will check the login policy to determine whether there are any access rights at the server and/or database level. A login can have successful authentication but no access to the securables. This means that authentication is just one step before a login can proceed with any action on SQL Server. SQL Server will check authorization on every T-SQL statement. In other words, if a user has SELECT permissions on some database, SQL Server will not check once and then forget until the next authentication/authorization process; every statement will be verified against the policy to determine whether anything has changed. Permissions are the set of rules that govern the level of access that principals have to securables. Permissions in an SQL Server system can be granted, revoked, or denied. Each of the SQL Server securables has associated permissions that can be granted to each principal. The only way a principal can access a resource in an SQL Server system is if it is granted permission to do so. At this point, it is important to note that authentication and authorization are two different processes, but they work in conjunction with one another. Furthermore, the terms login and user are to be used very carefully, as they are not the same: login is the authentication part, and user is the authorization part. Prior to accessing any database on SQL Server, the login needs to be mapped as a user. Each login can have one or many user instances in different databases. For example, one login can have read permission in AdventureWorks and write permission in WideWorldImporters. This type of granular security is a great SQL Server security feature. A login name can be the same as or different from a user name in different databases. In the following lines, we will create a database user dba based on the login dba. The process will be based on the AdventureWorks database.
After that we will try to enter the database and execute a SELECT statement on the Person.Person table: dba@tumbleweed:~> sqlcmd -S suse -U sa Password: 1> USE AdventureWorks 2> GO Changed database context to 'AdventureWorks'. 1> CREATE USER dba 2> FOR LOGIN dba 3> GO 1> exit dba@tumbleweed:~> sqlcmd -S suse -U dba Password: 1> USE AdventureWorks 2> GO Changed database context to 'AdventureWorks'. 1> SELECT * 2> FROM Person.Person 3> GO Msg 229, Level 14, State 5, Server tumbleweed, Line 1 The SELECT permission was denied on the object 'Person', database 'AdventureWorks', schema 'Person' We are making progress. Now we can enter the database, but we still can't execute SELECT or any other SQL statement. The reason is very simple. Our dba user still is not authorized to access any types of resources. Schema separation In Microsoft SQL Server, a schema is a collection of database objects that are owned by a single principal and form a single namespace. All objects within a schema must be uniquely named and a schema itself must be uniquely named in the database catalog. SQL Server (since version 2005) breaks the link between users and schemas. In other words, users do not own objects; schemas own objects, and principals own schemas. Users can now have a default schema assigned using the DEFAULT_SCHEMA option from the CREATE USER and ALTER USER commands. If a default schema is not supplied for a user, then the dbo will be used as the default schema. If a user from a different default schema needs to access objects in another schema, then the user will need to type a full name. For example, Denis needs to query the Contact tables in the Person schema, but he is in Sales. To resolve this, he would type: SELECT * FROM Person.Contact Keep in mind that the default schema is dbo. When database objects are created and not explicitly put in schemas, SQL Server will assign them to the dbo default database schema. Therefore, there is no need to type dbo because it is the default schema. You read a book excerpt from SQL Server on Linux written by Jasmin Azemović.  From this book, you will be able to recognize and utilize the full potential of setting up an efficient SQL Server database solution in the Linux environment. Check out other posts on SQL Server: How SQL Server handles data under the hood How to implement In-Memory OLTP on SQL Server in Linux How to integrate SharePoint with SQL Server Reporting Services


How greedy algorithms work

Richard Gall
10 Apr 2018
2 min read
What is a greedy algorithm?

Greedy algorithms are useful for optimization problems. They make the locally optimal choice at each immediate step, with the aim that those choices will lead you to the overall optimal solution. It's important to note that they don't always find you the best solution for the data science problem you're trying to solve - so apply them wisely. In the video tutorial below, from Fundamental Algorithms in Scala, you'll learn when and how to apply a simple greedy algorithm, and see examples of both an iterative and a recursive implementation in action.

The advantages and disadvantages of greedy algorithms

Greedy algorithms have a number of advantages and disadvantages. While on the one hand it's relatively easy to come up with them, it is actually pretty challenging to identify the issues around the 'correctness' of your algorithm. That means that ultimately the optimization problem you're trying to solve by using greedy algorithms isn't really a technical issue as such. Instead, it's more of an issue with the scope and definition of your data analysis project. It's a human problem, not a mechanical one. (A short worked example of this correctness caveat follows the reading list below.)

Different ways to apply greedy algorithms

There are a number of areas where greedy algorithms can most successfully be applied. In fact, it's worth exploring some of these problems if you want to get to know greedy algorithms in more detail. They should give you a clearer indication of how they work, what makes them useful, and their potential drawbacks. Some of the best examples are:

Huffman coding
Dijkstra's algorithm
Continuous knapsack problem

To learn more about other algorithms, check out these articles:

4 popular algorithms for Distance-based outlier detection
10 machine learning algorithms every engineer needs to know
4 Clustering Algorithms every Data Scientist should know
Backpropagation Algorithm

To learn to implement specific algorithms, use these tutorials:

Creating a reference generator for a job portal using Breadth First Search (BFS) algorithm
Implementing gradient descent algorithm to solve optimization problems
Implementing face detection using the Haar Cascades and AdaBoost algorithm
Getting started with big data analysis using Google's PageRank algorithm
Implementing the k-nearest neighbors' algorithm in Python
Machine Learning Algorithms: Implementing Naive Bayes with Spark MLlib
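To make the correctness caveat above concrete, here is a short, self-contained Java sketch of the classic greedy change-making strategy. The course referenced above uses Scala; this example is illustrative only and not taken from it. The greedy choice - always take the largest coin that still fits - is optimal for canonical coin systems such as 1, 5, 10, 25, but can fail for arbitrary denominations, which is exactly the kind of 'correctness' issue that is easy to miss.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GreedyChange {

    // Greedy strategy: repeatedly take the largest denomination that still fits.
    static List<Integer> makeChange(int amount, int[] denominations) {
        int[] coins = denominations.clone();
        Arrays.sort(coins);
        List<Integer> change = new ArrayList<>();
        for (int i = coins.length - 1; i >= 0; i--) {
            while (amount >= coins[i]) {
                amount -= coins[i];
                change.add(coins[i]);
            }
        }
        // If amount is still positive here, the greedy choice failed to make exact change.
        return change;
    }

    public static void main(String[] args) {
        // Canonical system: greedy is optimal -> [25, 25, 10, 1, 1, 1]
        System.out.println(makeChange(63, new int[]{1, 5, 10, 25}));
        // Non-canonical system: greedy returns [25, 1, 1, 1, 1, 1] (6 coins),
        // even though [10, 10, 10] (3 coins) would be optimal.
        System.out.println(makeChange(30, new int[]{1, 10, 25}));
    }
}

The algorithm itself is only a few lines; the hard part, as the article argues, is deciding whether the greedy choice is actually valid for the data and the problem definition you are working with.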

Testing RESTful Web Services with Postman

Vijin Boricha
10 Apr 2018
3 min read
In today's tutorial, we are going to leverage the Postman framework to test RESTful web services. We will also discuss a simple JUnit test case, which calls the getAllUsers method in UserService. We can check the following code:

@RunWith(SpringRunner.class)
@SpringBootTest
public class UserTests {
  @Autowired
  UserService userService;
  @Test
  public void testAllUsers(){
    List<User> users = userService.getAllUsers();
    assertEquals(3, users.size());
  }
}

In the preceding code, we have called getAllUsers and verified the total count. Let's test the single-user method in another test case:

// other methods
@Test
public void testSingleUser(){
  User user = userService.getUser(100);
  assertTrue(user.getUsername().contains("David"));
}

In the preceding code snippets, we just tested our service layer and verified the business logic. However, we can also test the controller directly by using mocking methods (a sketch of such a controller-level test appears at the end of this article).

Postman

First, we shall start with a simple API for getting all the users:

http://localhost:8080/user

This call will get all the users. The Postman screenshot for getting all the users is as follows: In the preceding screenshot, we can see that we get all the users that we added before. We have used the GET method to call this API.

Adding a user – Postman

Let's try to use the POST method on user to add a new user:

http://localhost:8080/user

Add the user, as shown in the following screenshot: In the result, we can see the JSON output:

{ "result" : "added" }

Generating a JWT – Postman

Let's try generating the token (JWT) by calling the generate token API in Postman:

http://localhost:8080/security/generate/token

We can clearly see that we use subject in the Body to generate the token. Once we call the API, we will get the token. We can check the token in the following screenshot:

Getting the subject from the token

By using the existing token that we created before, we will get the subject by calling the get subject API:

http://localhost:8080/security/get/subject

The result will be as shown in the following screenshot: In the preceding API call, we sent the token in the request to get the subject. We can see the subject in the resulting JSON.

You read an excerpt from Building RESTful Web Services with Spring 5 - Second Edition, written by Raja CSP Raman. From this book, you will learn to build resilient software in Java with the help of the Spring 5.0 framework.

Check out the other tutorials from this book:
How to develop RESTful web services in Spring
Applying Spring Security using JSON Web Token (JWT)

More Spring 5 tutorials:
Introduction to Spring Framework
Preparing the Spring Web Development Environment
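As a rough sketch of the controller-level mocking approach mentioned above, the test below uses Spring's MockMvc. This is an illustrative example only: it assumes the Spring Boot test starter is on the classpath, keeps the JUnit 4 style used in the book's snippets, and reuses the /user endpoint shown above; the test class name itself is hypothetical.

import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.test.context.junit4.SpringRunner;
import org.springframework.test.web.servlet.MockMvc;

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

@RunWith(SpringRunner.class)
@SpringBootTest
@AutoConfigureMockMvc
public class UserControllerMockTests {

    @Autowired
    private MockMvc mockMvc;

    @Test
    public void getAllUsersReturnsOk() throws Exception {
        // Exercises GET /user through the Spring MVC layer without starting a real
        // server - unlike the Postman calls above, which hit a running instance.
        mockMvc.perform(get("/user"))
               .andExpect(status().isOk());
    }
}

This sits one layer above the service tests at the start of the article: the service tests verify business logic, while a MockMvc test verifies request mapping, serialization, and status codes.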


OpenCV Primer: What can you do with Computer Vision and how to get started?

Vijin Boricha
10 Apr 2018
11 min read
Computer vision applications have become quite ubiquitous in our lives. The applications are varied, ranging from apps that play Virtual Reality (VR) or Augmented Reality (AR) games to applications for scanning documents using smartphone cameras. On our smartphones, we have QR code scanning and face detection, and now we even have facial recognition techniques. Online, we can now search using images and find similar-looking images. Photo sharing applications can identify people and make an album based on the friends or family found in the photos. Due to improvements in image stabilization techniques, even with shaky hands, we can create stable videos. In this article, we will learn about basic computer vision: reading an image and converting between image color spaces.

With the recent advancements in deep learning techniques, applications like image classification, object detection, tracking, and so on have become more accurate, and this has led to the development of more complex autonomous systems, such as drones, self-driving cars, and humanoids. Using deep learning, images can be transformed in more complex ways; for example, images can be converted into Van Gogh-style paintings. Such progress in several domains makes a non-expert wonder how computer vision is capable of inferring this information from images. The motivation lies in human perception and the way we can perform complex analyses of the environment around us. We can estimate the closeness, structure, and shape of objects, and estimate the textures of a surface too. Even under different lights, we can identify objects and even recognize something if we have seen it before. Considering these advancements and motivations, one of the basic questions that arises is: what is computer vision? In this article, we will begin by answering this question and then provide a broader overview of the various sub-domains and applications within computer vision. Later in the article, we will start with basic image operations.

What is computer vision?

In order to begin the discussion on computer vision, observe the following image: Even if we have never done this activity before, we can clearly tell that the image is of people skiing in the snowy mountains on a cloudy day. This information that we perceive is quite complex and can be subdivided into more basic inferences for a computer vision system. The most basic observation that we can get from an image is of the things or objects in it. In the previous image, the various things that we can see are trees, mountains, snow, sky, people, and so on. Extracting this information is often referred to as image classification, where we would like to label an image with a predefined set of categories. In this case, the labels are the things that we see in the image. A wider observation that we can get from the previous image is the landscape. We can tell that the image consists of snow, mountains, and sky, as shown in the following image: Although it is difficult to create exact boundaries for where the snow, mountain, and sky are in the image, we can still identify approximate regions of the image for each of them. This is often termed segmentation of an image, where we break it up into regions according to object occupancy.
Making our observation more concrete, we can further identify the exact boundaries of objects in the image, as shown in the following figure: In the image, we see that people are doing different activities and as such have different shapes; some are sitting, some are standing, some are skiing. Even with this many variations, we can detect objects and can create bounding boxes around them. Only a few bounding boxes are shown in the image for understanding - we can observe many more than these. While we show rectangular bounding boxes around some objects in the image, we are not categorizing what object is in each box. The next step would be to say that the box contains a person. This combined observation of detecting and categorizing the box is often referred to as object detection. Extending our observation of people and surroundings, we can say that different people in the image have different heights, even though some are nearer and others are farther from the camera. This is due to our intuitive understanding of image formation and the relations of objects. We know that a tree is usually much taller than a person, even if the trees in the image are shorter than the people nearer to the camera. Extracting information about geometry in the image is another sub-field of computer vision, often referred to as image reconstruction.

Computer vision is everywhere

In the previous section, we developed an initial understanding of computer vision. With this understanding, there are several algorithms that have been developed and are used in industrial applications. Studying these not only improves our understanding of the system but can also seed newer ideas to improve overall systems. In this section, we will extend our understanding of computer vision by looking at various applications and their problem formulations:

Image classification: In the past few years, categorizing images based on the objects within them has gained popularity. This is due to advances in algorithms as well as the availability of large datasets. Deep learning algorithms for image classification have significantly improved accuracy while being trained on datasets like ImageNet. The trained model is often further used to improve other recognition algorithms like object detection, as well as image categorization in online applications. In this book, we will see how to create a simple algorithm to classify images using deep learning models.

Note: Here is a simple tutorial on how to perform image classification in OpenCV.

Object detection: Not just self-driving cars, but robotics, automated retail stores, traffic detection, smartphone camera apps, image filters, and many more applications use object detection. These also benefit from deep learning and vision techniques, as well as the availability of large, annotated datasets. We saw an introduction to object detection in the previous section; it produces bounding boxes around objects and also categorizes what object is inside each box.

Note: Check out this tutorial on fingerprint detection in OpenCV to see object detection in action.

Object tracking: Following robots, surveillance cameras, and people interaction are a few of the many applications of object tracking. It consists of locating an object and keeping track of corresponding objects across a sequence of images.
Image geometry: This is often referred to as computing the depth of objects from the camera. There are several applications in this domain too. Smartphone apps are now capable of computing three-dimensional structures from the video created onboard. Using the three-dimensional reconstructed digital models, further extensions like AR or VR applications are developed to interface the image world with the real world.

Image segmentation: This is creating cluster regions in images, such that each cluster has similar properties. The usual approach is to cluster image pixels belonging to the same object. Recent applications have grown in self-driving cars and healthcare analysis using image regions.

Image generation: These have a greater impact in the artistic domain, merging different image styles or generating completely new ones. Now, we can mix and merge Van Gogh's painting style with smartphone camera images to create images that appear as if they were painted in a similar style to Van Gogh's.

The field is quickly evolving, not only through newer methods of image analysis but also through newer applications where computer vision can be used. Therefore, applications are not limited to those explained above.

Note: Check out this post on image filtering techniques in OpenCV.

Getting started with image operations

In this section, we will see basic image operations for reading and writing images. We will also see how images are represented digitally. Before we proceed further with image IO, let's see what an image is made up of in the digital world. An image is simply a two-dimensional array, with each cell of the array containing intensity values. A simple image is a black and white image, with 0s representing white and 1s representing black. This is also referred to as a binary image. A further extension of this is dividing black and white into a broader grayscale, with a range of 0 to 255. An image of this type, in the three-dimensional view, is as follows, where x and y are pixel locations and z is the intensity value: This is a top view, but on viewing it sideways we can see the variation in the intensities that make up the image: We can see that there are several peaks and that the image intensities are not smooth. Let's apply a smoothing algorithm: as we can see, pixel intensities then form more continuous formations, even though there is no significant change in the object representation.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import cv2

# loads and read an image from path to file
img = cv2.imread('../figures/building_sm.png')

# convert the color to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# resize the image (optional)
gray = cv2.resize(gray, (160, 120))

# apply smoothing operation
gray = cv2.blur(gray, (3,3))

# create grid to plot using numpy
xx, yy = np.mgrid[0:gray.shape[0], 0:gray.shape[1]]

# create the figure
fig = plt.figure()
ax = fig.gca(projection='3d')
ax.plot_surface(xx, yy, gray, rstride=1, cstride=1, cmap=plt.cm.gray, linewidth=1)

# show it
plt.show()

This code uses the following libraries: NumPy, OpenCV, and matplotlib. In the further sections of this article, we will see operations on images using their color properties. Please download the relevant images from the website to view them clearly.

Reading an image

We can use the OpenCV library to read an image, as follows.
Here, change the path to the image file as needed:

import cv2

# loads and read an image from path to file
img = cv2.imread('../figures/flower.png')

# displays previous image
cv2.imshow("Image", img)

# keeps the window open until a key is pressed
cv2.waitKey(0)

# clears all window buffers
cv2.destroyAllWindows()

The resulting image is shown in the following screenshot: Here, we read the image in BGR color format, where B is blue, G is green, and R is red. Each pixel in the output is collectively represented using the values of each of the colors. An example of a pixel location and its color values is shown at the bottom of the previous figure.

Image color conversions

An image is made up of pixels and is usually visualized according to the values stored. There is also an additional property that makes different kinds of images. Each value stored in a pixel is linked to a fixed representation. For example, a pixel value of 10 can represent a gray intensity of 10, a blue color intensity of 10, and so on. It is therefore important to understand different color types and their conversion. In this section, we will see color types and conversions using OpenCV.

Note: Did you know OpenCV 4 is on schedule for a July release? Check out this news piece to know about it in detail.

Grayscale: This is a simple one-channel image with values ranging from 0 to 255 that represent the intensity of pixels (the luma weights behind this conversion are noted after this excerpt). The previous image can be converted to grayscale as follows:

import cv2

# loads and read an image from path to file
img = cv2.imread('../figures/flower.png')

# convert the color to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# displays previous image
cv2.imshow("Image", gray)

# keeps the window open until a key is pressed
cv2.waitKey(0)

# clears all window buffers
cv2.destroyAllWindows()

The resulting image is as shown in the following screenshot:

HSV and HLS: These are alternative color representations, where H is hue, S is saturation, V is value, and L is lightness. They are motivated by the human perception system. An example of image conversion for these is as follows:

# convert the color to hsv
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# convert the color to hls
hls = cv2.cvtColor(img, cv2.COLOR_BGR2HLS)

This conversion is shown in the following figure, where an input image read in BGR format is converted to each of the HLS (on the left) and HSV (on the right) color types:

LAB color space: Denoted L for lightness, A for green-red colors, and B for blue-yellow colors, this consists of all perceivable colors. It is used to convert from one type of color space (for example, RGB) to others (such as CMYK) because of its device-independence properties. On devices where the format is different from that of the image that is sent, the incoming image color space is first converted to LAB and then to the corresponding space available on the device. The output of converting an RGB image is as follows:

This article is an excerpt from the book Practical Computer Vision written by Abhinav Dadhich. The book will teach you different computer vision techniques and show how to apply them in practical applications.
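As a footnote to the grayscale conversion used above: OpenCV's COLOR_BGR2GRAY applies the standard luma weighting of the three channels (stated here from memory of OpenCV's color conversion documentation, so worth verifying against your version):

$$Y = 0.299\,R + 0.587\,G + 0.114\,B$$

In other words, the green channel contributes the most to the resulting gray intensity and blue contributes the least, which is why a bright green region usually looks lighter than an equally bright blue region after conversion.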

26 new Java 9 enhancements you will love

Aarthi Kumaraswamy
09 Apr 2018
11 min read
Java 9 represents a major release and consists of a large number of internal changes to the Java platform. Collectively, these internal changes represent a tremendous set of new possibilities for Java developers, some stemming from developer requests, others from Oracle-inspired enhancements. In this post, we will review 26 of the most important changes. Each change is related to a JDK Enhancement Proposal (JEP). JEPs are indexed and housed at openjdk.java.net/jeps/0. You can visit this site for additional information on each JEP. [box type="note" align="" class="" width=""]The JEP program is part of Oracle's support for open source, open innovation, and open standards. While other open source Java projects can be found, OpenJDK is the only one supported by Oracle. [/box] These changes have several impressive implications, including: Heap space efficiencies Memory allocation Compilation process improvements Type testing Annotations Automated runtime compiler tests Improved garbage collection 26 Java 9 enhancements you should know Improved Contended Locking [JEP 143] The general goal of JEP 143 was to increase the overall performance of how the JVM manages contention over locked Java object monitors. The improvements to contended locking were all internal to the JVM and do not require any developer actions to benefit from them. The overall improvement goals were related to faster operations. These include faster monitor enter, faster monitor exit, and faster notifications. 2. Segmented code cache [JEP 197] The segmented code cache JEP (197) upgrade was completed and results in faster, more efficient execution time. At the core of this change was the segmentation of the code cache into three distinct segments--non-method, profiled, and non-profiled code. 3. Smart Java compilation, phase two [JEP 199] The JDK Enhancement Proposal 199 is aimed at improving the code compilation process. All Java developers will be familiar with the javac tool for compiling source code to bytecode, which is used by the JVM to run Java programs. Smart Java Compilation, also referred to as Smart Javac and sjavac, adds a smart wrapper around the javac process. Perhaps the core improvement sjavac adds is that only the necessary code is recompiled. [box type="shadow" align="" class="" width=""]Check out this tutorial to know how you can recognize patterns with neural networks in Java.[/box] 4. Resolving Lint and Doclint warnings [JEP 212] Both Lint and Doclint report errors and warnings during the compile process. Resolution of these warnings was the focus of JEP 212. When using core libraries, there should not be any warnings. This mindset led to JEP 212, which has been resolved and implemented in Java 9. 5. Tiered attribution for javac [JEP 215] JEP 215 represents an impressive undertaking to streamline javac's type checking schema. In Java 8, type checking of poly expressions is handled by a speculative attribution tool. The goal with JEP 215 was to change the type checking schema to create faster results. The new approach, released with Java 9, uses a tiered attribution tool. This tool implements a tiered approach for type checking argument expressions for all method calls. Permissions are also made for method overriding. 6. Annotations pipeline 2.0 [JEP 217] Java 8 related changes impacted Java annotations but did not usher in a change to how javac processed them. There were some hardcoded solutions that allowed javac to handle the new annotations, but they were not efficient. 
Moreover, this type of coding (hardcoding workarounds) is difficult to maintain. So, JEP 217 focused on refactoring the javac annotation pipeline. This refactoring was all internal to javac, so it should not be evident to developers. 7. New version-string scheme [JEP 223] Prior to Java 9, the release numbers did not follow industry standard versioning--semantic versioning. Oracle has embraced semantic versioning for Java 9 and beyond. For Java, a major-minor-security schema will be used for the first three elements of Java version numbers: Major: A major release consisting of a significant new set of features Minor: Revisions and bug fixes that are backward compatible Security: Fixes deemed critical to improving security 8. Generating run-time compiler tests automatically [JEP 233] The purpose of JEP 233 was to create a tool that could automate the runtime compiler tests. The tool that was created starts by generating a random set of Java source code and/or byte code. The generated code will have three key characteristics: Be syntactically correct Be semantically correct Use a random seed that permits reusing the same randomly-generated code 9. Testing class-file attributes generated by Javac [JEP 235] Prior to Java 9, there was no method of testing a class-file's attributes. Running a class and testing the code for anticipated or expected results was the most commonly used method of testing javac generated class-files. This technique falls short of testing to validate the file's attributes. The lack of, or insufficient, capability to create tests for class-file attributes was the impetus behind JEP 235. The goal is to ensure javac creates a class-file's attributes completely and correctly. 10. Storing interned strings in CDS archives [JEP 250] CDS archives now allocate specific space on the heap for strings: The string space is mapped using a shared-string table, hash tables, and deduplication. 11. Preparing JavaFX UI controls and CSS APIs for modularization [JEP 253] Prior to Java 9, JavaFX controls as well as CSS functionality were only available to developers by interfacing with internal APIs. Java 9's modularization has made the internal APIs inaccessible. Therefore, JEP 253 was created to define public, instead of internal, APIs. This was a larger undertaking than it might seem. Here are a few actions that were taken as part of this JEP: Moving javaFX control skins from the internal to public API (javafx.scene.skin) Ensuring API consistencies Generation of a thorough javadoc 12. Compact strings [JEP 254] The string data type is an important part of nearly every Java app. While JEP 254's aim was to make strings more space-efficient, it was approached with caution so that existing performance and compatibilities would not be negatively impacted. Starting with Java 9, strings are now internally represented using a byte array along with a flag field for encoding references. 13. Merging selected Xerces 2.11.0 updates into JAXP [JEP 255] Xerces is a library used for parsing XML in Java. It was updated to 2.11.0 in late 2010, so JEP 255's aim was to update JAXP to incorporate changes in Xerces 2.11.0. 14. Updating JavaFX/Media to a newer version of GStreamer [JEP 257] The purpose of JEP 257 was to ensure JavaFX/Media was updated to include the latest release of GStreamer for stability, performance, and security assurances. 
GStreamer is a multimedia processing framework that can be used to build systems that take in media from several different formats and, after processing, export them in selected formats. 15. HarfBuzz Font-Layout Engine [JEP 258] Prior to Java 9, the layout engine used to handle font complexities; specifically, fonts that have rendering behaviors beyond what the common Latin fonts have. Java used the uniform client interface, also referred to as ICU, as the defacto text rendering tool. The ICU layout engine has been depreciated and, in Java 9, has been replaced with the HarfBuzz font layout engine. HarfBuzz is an OpenType text rendering engine. This type of layout engine has the characteristic of providing script-aware code to help ensure text is laid out as desired. 16. HiDPI graphics on Windows and Linux [JEP 263] JEP 263 was focused on ensuring the crispness of on-screen components, relative to the pixel density of the display. The following terms are relevant to this JEP and are provided along with the below listed descriptive information: DPI-aware application: An application that is able to detect and scale images for the display's specific pixel density DPI-unaware application: An application that makes no attempt to detect and scale images for the display's specific pixel density HiDPI graphics: High dots-per-inch graphics Retina display: This term was created by Apple to refer to displays with a pixel density of at least 300 pixels per inch Prior to Java 9, automatic scaling and sizing were already implemented in Java for the Mac OS X operating system. This capability was added in Java 9 for Windows and Linux operating systems. 17. Marlin graphics renderer [JEP 265] JEP 265 replaced the Pisces graphics rasterizer with the Marlin graphics renderer in the Java 2D API. This API is used to draw 2D graphics and animations. The goal was to replace Pisces with a rasterizer/renderer that was much more efficient and without any quality loss. This goal was realized in Java 9. An intended collateral benefit was to include a developer-accessible API. Previously, the means of interfacing with the AWT and Java 2D was internal. 18. Unicode 8.0.0 [JEP 267] Unicode 8.0.0 was released on June 17, 2015. JEP 267 focused on updating the relevant APIs to support Unicode 8.0.0. In order to fully comply with the new Unicode standard, several Java classes were updated. The following listed classes were updated for Java 9 to comply with the new Unicode standard: java.awt.font.NumericShaper java.lang.Character java.lang.String java.text.Bidi java.text.BreakIterator java.text.Normalizer 19. Reserved stack areas for critical sections [JEP 270] The goal of JEP 270 was to mitigate problems stemming from stack overflows during the execution of critical sections. This mitigation took the form of reserving additional thread stack space. [box type="shadow" align="" class="" width=""]Are you looking out for running parallel data operations using Java streams, check out this post for more details.[/box] 20. Dynamic linking of language-defined object models [JEP 276] Java interoperability was enhanced with JEP 276. The necessary JDK changes were made to permit runtime linkers from multiple languages to coexist in a single JVM instance. This change applies to high-level operations, as you would expect. An example of a relevant high-level operation is the reading or writing of a property with elements such as accessors and mutators. The high-level operations apply to objects of unknown types. 
They can be invoked with INVOKEDYNAMIC instructions. Here is an example of calling an object's property when the object's type is unknown at compile time:   INVOKEDYNAMIC "dyn:getProp:age" 21. Additional tests for humongous objects in G1 [JEP 278] One of the long-favored features of the Java platform is the behind the scenes garbage collection. JEP 278's focus was to create additional WhiteBox tests for humongous objects as a feature of the G1 garbage collector. 22. Improving test-failure troubleshooting [JEP 279] For developers that do a lot of testing, JEP 279 is worth reading about. Additional functionality has been added in Java 9 to automatically collect information to support troubleshooting test failures as well as timeouts. Collecting readily available diagnostic information during tests stands to provide developers and engineers with greater fidelity in their logs and other output. 23. Optimizing string concatenation [JEP 280] JEP 280 is an interesting enhancement for the Java platform. Prior to Java 9, string concatenation was translated by javac into StringBuilder : : append chains. This was a sub-optimal translation methodology often requiring StringBuilder presizing. The enhancement changed the string concatenation bytecode sequence, generated by javac, so that it uses INVOKEDYNAMIC calls. The purpose of the enhancement was to increase optimization and to support future optimizations without the need to reformat the javac's bytecode. 24. HotSpot C++ unit-test framework [JEP 281] HotSpot is the name of the JVM. This Java enhancement was intended to support the development of C++ unit tests for the JVM. Here is a partial, non-prioritized, list of goals for this enhancement: Command-line testing Create appropriate documentation Debug compile targets Framework elasticity IDE support Individual and isolated unit testing Individualized test results Integrate with existing infrastructure Internal test support Positive and negative testing Short execution time testing Support all JDK 9 build platforms Test compile targets Test exclusion Test grouping Testing that requires the JVM to be initialized Tests co-located with source code Tests for platform-dependent code Write and execute unit testing (for classes and methods) This enhancement is evidence of the increasing extensibility. 25. Enabling GTK 3 on Linux [JEP 283] GTK+, formally known as the GIMP toolbox, is a cross-platform tool used for creating Graphical User Interfaces (GUI). The tool consists of widgets accessible through its API. JEP 283's focus was to ensure GTK 2 and GTK 3 were supported on Linux when developing Java applications with graphical components. The implementation supports Java apps that employ JavaFX, AWT, and Swing. 26. New HotSpot build system [JEP 284] The Java platform used, prior to Java 9, was a build system riddled with duplicate code, redundancies, and other inefficiencies. The build system has been reworked for Java 9 based on the build-infra framework. In this context, infra is short for infrastructure. The overarching goal for JEP 284 was to upgrade the build system to one that was simplified. Specific goals included: Leverage existing build system Maintainable code Minimize duplicate code Simplification Support future enhancements Summary We explored some impressive new features of the Java platform, with a specific focus on javac, JDK libraries, and various test suites. 
Memory management improvements, including heap space efficiencies, memory allocation, and improved garbage collection, represent a powerful new set of Java platform enhancements. Changes to the compilation process that result in greater efficiencies were part of this discussion, along with important improvements to type testing, annotations, and automated runtime compiler tests. You just enjoyed an excerpt from the book Mastering Java 9, written by Dr. Edward Lavieri and Peter Verhas.
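Before moving on, here is a quick, concrete illustration of the new version-string scheme described under JEP 223 earlier: Java 9 exposes the running version through the Runtime API. The following standalone class is a sketch of ours (not from the book) that prints the major-minor-security elements reported by the JVM:

public class VersionCheck {
    public static void main(String[] args) {
        // Runtime.Version was introduced in Java 9 alongside the new scheme
        Runtime.Version version = Runtime.version();

        // major-minor-security, as described for JEP 223
        System.out.println("Major:    " + version.major());
        System.out.println("Minor:    " + version.minor());
        System.out.println("Security: " + version.security());
    }
}

Compile and run it with a Java 9 JDK (javac VersionCheck.java, then java VersionCheck) to see the three elements for your installation.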

This Week on Packt Hub - 6 April  2018

Aarthi Kumaraswamy
06 Apr 2018
2 min read
Here is what you missed this week on Packt Hub - Tech news, tutorials and insights. Tutorials Implementing GANs, OpenCV, BFS algorithms and more tutorials for machine learning folks this week. Web developers can learn to implement Dockers with Microservices, JS micro-optimizations and Selenium tutorials from this week’s tutorials. Data Tutorials Predicting Bitcoin price from historical and live data Creating a reference generator for a job portal using Breadth First Search (BFS) algorithm Datasets and deep learning methodologies to extend image-based applications to videos Generative Models in action: How to create a Van Gogh with Neural Artistic Style... [Editor's Pick] 3 ways to deploy a QT and OpenCV application Web Development Tutorials How to build Dockers with microservices 6 JavaScript micro-optimizations you need to know How to work with the Selenium IntelliJ IDEA plugin How to handle exceptions and synchronization methods with Selenium WebDriver API [Editor's Pick] Other Tutorials Creating a custom layout implementation for your Android app Insights & Opinions See why Jupyter notebooks are hot and Oracle databases are not on Data insights this week. Learn why DDD (domain driven design) is trending and why programmers dread threads on insights for web developers this week. Data Insights & Opinions Why Oracle is losing the Database Race Top 10 Tools for Computer Vision Paper in Two minutes: Attention Is All You Need 10 reasons why data scientists love Jupyter notebooks [Editor's Pick] Top 5 programming languages for crunching Big Data effectively Why DeepMind made Sonnet open source Top 5 free Business Intelligence tools Paper in Two minutes: i-RevNet, a deep invertible convolutional network Web Development Insights & Opinions Concurrency programming 101: Why do programmers hang by a thread? What is domain driven design? [Editor's Pick] Other Insights and Opinions The key differences between Kubernetes and Docker Swarm News Data News Polaris GPS: Rubrik’s new SaaS platform for data management applications Google Employees Protest against the use of Artificial Intelligence in Military CockroachDB 2.0 is out! Huawei launches HiAI Apple steals AI chief from Google Emoji Scavenger Hunt showcases TensorFlow.js D3 5.0 is out! The 5 biggest announcements from TensorFlow Developer Summit 2018 [Editor's Pick] Web Development News SurveyJS leaves beta Sails.js 1.0 has arrived on the shores Other News Microsoft commits $5 billion to IoT projects Netflix releases FlameScope Introducing MapD Cloud, the first Analytics Platform with GPU Acceleration on Cloud Coinbase Commerce API launches AWS Sydney Summit 2018 is all about IoT Kali Linux 2018.1 released [Editor's Pick]

How to build Dockers with microservices

Pravin Dhandre
06 Apr 2018
9 min read
Today, we will demonstrate in detail how to create and build dockers with microservices. We will also explore commands used to manage the building process with microservices. First, we will create a simple microservice that we will use for this tutorial. Then we will get familiar with the Docker building process, and finally, we will create and run our microservice within a Docker. Creating an example microservice In order to create our microservice, we will use Spring Initializr. We can start by visiting the URL: https:/​/​start.​spring.​io/​: We have chosen to create a Maven Project using Kotlin and Spring Boot 2.0.0 M7, and we've chosen the Group to be com.microservices and Artifact chapter07. For Dependencies, we have set Web. Now we can click on Generate Project to download it as a ZIP file. After we unzip it, we can open it with IntelliJ IDEA to start working on our project. After some minutes, our project will be ready and we can open the Maven window to see the different lifecycle phases, Maven plugins, and their goals. Now we will modify our application to create a simple microservice. Open the Chapter07Application.kt file from the project window, and modify it by adding a @RestController: package com.microservices.chapter07 import org.springframework.boot.autoconfigure.SpringBootApplication import org.springframework.boot.runApplication import org.springframework.web.bind.annotation.GetMapping import org.springframework.web.bind.annotation.RestController @SpringBootApplication class Chapter07Application @RestController class GreetingsController { @GetMapping("/greetings") fun greetings() = "hello from a Docker" } fun main(args: Array<String>) { runApplication<Chapter07Application>(*args) } Let's run to see our microservice start somehow. In the Maven window, just double-click on the spring-boot plugin, or just run goal from the command line in the microservice folder: mvnw spring-boot:run After some seconds, we will see several log lines, including something like the following: INFO 11960 --- [ main] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port(s): 8080 (http) INFO 11960 --- [ main] c.m.chapter07.Chapter07ApplicationKt : Started Chapter07ApplicationKt in 1.997 seconds (JVM running for 8.154) Our service is ready, and we can just navigate to the http://localhost:8080/greetings URL, but it's still not running in a Docker; let's stop with Ctrl + C, and continue. Creating a Dockerfile In order to create a Docker image, we need to first create a Dockerfile, a file that will include the instructions that we will give to Docker in order to build our image. To create this file, on the top of the Project window, right-click on chapter07 and then select in the drop-down menu New | File, and type Dockerfile. In the next window, click OK, and the file will be created. IntelliJ will recognize that file and offer a plugin to handle it. At the top of the editing window, a message will appear as Plugins supporting Dockerfile files found. On the right of this message, we will see Install Plugins Ignore extension. Let's click on Install Plugins to allow IntelliJ to handle this file. This will require the IDE to restart, and after some seconds it should start again. Now we can add this to our Dockerfile: FROM openjdk:8-jdk-alpine ENTRYPOINT ["java","-version"] Here, we are telling Docker that our image will be based on Java OpenJDK 8 in Alpine Linux. 
Then, we configure the entry point of our Docker and the command that will be executed when our Docker runs to be just the java command with a parameter, -version. Each of the lines on the Dockerfile will be a step, one of those layers that our Docker is completed with. Now, we should open a command line in our chapter07 directory and run this command to build our image: docker build . -t chapter07 This will create output that will look something like this: Sending build context to Docker daemon 2.302MB Step 1/2 : FROM openjdk:8-jdk-alpine 8-jdk-alpine: Pulling from library/openjdk b56ae66c2937: Pull complete 81cebc5bcaf8: Pull complete 9f7678525069: Pull complete Digest: sha256:219d9c2e4c27b8d1cfc6daeaf339e3eb7ceb82e67ce85857bdc55254822802bc Status: Downloaded newer image for openjdk:8-jdk-alpine ---> a2a00e606b82 Step 2/2 : ENTRYPOINT java --version ---> Running in 661d47cd0bbd ---> 3a1d8bea31e7 Removing intermediate container 661d47cd0bbd Successfully built 3a1d8bea31e7 Successfully tagged chapter07:latest What has happened now is that Docker has built an image for us, and the image has been tagged as chapter07, since we used the -t option. Let's now run it with: docker run chapter07 This output should look something like this: openjdk version "1.8.0_131" OpenJDK Runtime Environment (IcedTea 3.4.0) (Alpine 8.131.11-r2) OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode) This has run our Docker image that simply displays the Java version, but we need to add our microservice to it. Before that, let's understand clearly what a Docker is. A Dockerfile produces a binary image of a set of commands creating layers for each of them. Those commands are executed at build time to output the desired image. An image will have an entry point, a command that will be executed when we run the image itself. A Docker is a containerized instance of a particular image. We usually refer to them as containers. When we run them, a copy of the original image is containerized and run through the defined entry point, outputting the results of their execution. We have just briefly discussed creating Dockerfiles, but it is a technique that we should eventually master. We strongly recommend reviewing the Docker file reference on the Docker page https:/​/​docs.​Docker.​com/engine/​reference/​builder/​, also see Dockerfile best practices at: https://​docs.​Docker.​com/​engine/​userguide/​eng-​image/​dockerfile_​bestpractices/​. Dockerize our microservice In order to create a Docker with our microservice, we first need to package it into a JAR. So let's use Maven to do it, using the package lifecycle: mvnw package With the package created, now we need to modify our Dockerfile to actually use it: FROM openjdk:8-jdk-alpine ADD target/*.jar microservice.jar ENTRYPOINT ["java","-jar", "microservice.jar"] We use the ADD command to include our microservice JAR from the target folder. We get it from our target directory, and we add it to the Docker as microservices.jar. Then, we change our entry point to actually execute our JAR. Now we can build our image again, repeating the build command: docker build . 
-t chapter07

This should now give the following output:

Sending build context to Docker daemon 21.58MB
Step 1/3 : FROM openjdk:8-jdk-alpine
---> a2a00e606b82
Step 2/3 : ADD target/*.jar microservice.jar
---> 5c385fee6516
Step 3/3 : ENTRYPOINT java -jar microservice.jar
---> Running in 11071fdd0eb2
---> a43186cc4ea0
Removing intermediate container 11071fdd0eb2
Successfully built a43186cc4ea0
Successfully tagged chapter07:latest

However, this build is quicker than before, since the Docker command is an intelligent command; the things that have not changed since our FROM command are cached and will not be built again. Now we can run our microservice again by using:

docker run chapter07

We can now see our Spring Boot application running; however, if we try to navigate to it in our browser, we will not be able to reach it, so let's stop it with Ctrl + C.

Sometimes, pressing Ctrl + C will not stop our Docker; it will just return us to the terminal. If we really want to stop it completely, we can follow these steps. First, we should list our Dockers with:

docker ps

This should list our Docker's status and, in fact, tell us that the Docker is still up:

CONTAINER ID IMAGE     COMMAND                  STATUS
d6bd15780353 chapter07 "java -jar microse..."   Up About a minute

We can just stop it with the kill command:

docker kill d6bd15780353

Now, if we repeat our docker ps command, the Docker should not be shown, but it will be if we do a docker ps -a:

CONTAINER ID IMAGE     COMMAND                  STATUS
d6bd15780353 chapter07 "java -jar microse..."   Exited (137) 2 minutes ago

The status of our Docker has changed from Up to Exited, as we'd expect.

Running the microservice

The reason we can't access the microservice when we run our previous example is that we need to expose the port that it is running on inside the container to the outside. So, we need to modify our Docker run command to:

docker run -p8080:8080 chapter07

Now we can just navigate to the URL http://localhost:8080/greetings, and we should get the following output:

hello from a Docker

We have just exposed our Docker's internal port 8080, but the -p option allows us to expose a different port too. So, inside the container the microservice can run on port 8080, but externally we can publish it on another port.

When we run our microservice via the command line this way, we actually wait until we press Ctrl + C to terminate it. We can instead just run it as a daemon. A daemon is a process that runs in the background of our system, so we can continue executing other commands while our process keeps running behind the scenes. To run a Docker as a daemon, we could use the following command:

docker run -d -p8080:8080 chapter07

This will run the Docker as a daemon in the background, but it is still accessible. It should be listed when we do the following:

docker ps

Here, we can get the CONTAINER ID from our running Docker:

CONTAINER ID IMAGE     COMMAND                  STATUS
741bf50a0bfc chapter07 "java -jar microse..."   Up About a minute

To see the logs, we can now run the following command:

docker logs 741bf50a0bfc

This will display the log of a running Docker; however, it will just exit after displaying the current logs. If we want to keep waiting for more output, as the Unix command tail -f does, we can instead do the following:

docker logs 741bf50a0bfc -f

With this, we have quickly learned the Docker building process, along with the various commands used to build and manage Dockers for our microservices. Do check out the book Hands-On Microservices with Kotlin to start creating Docker containers for your microservices and scale them in your production environment.
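As a quick end-to-end check of the daemonized container, the following shell session is a sketch (the container ID is illustrative; adjust it to what docker ps reports on your machine):

# call the exposed endpoint; it should print: hello from a Docker
curl http://localhost:8080/greetings

# find the container running our chapter07 image
docker ps --filter "ancestor=chapter07"

# stop it gracefully instead of killing it
docker stop <container-id>

# remove the stopped container once you are done with it
docker rm <container-id>

Using docker stop sends a SIGTERM first, giving Spring Boot a chance to shut down cleanly, which is usually preferable to docker kill in day-to-day development.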

Predicting Bitcoin price from historical and live data

Sunith Shetty
06 Apr 2018
17 min read
Today, you will learn how to collect Bitcoin historical and live-price data. You will also learn to transform data into time series and train your model to make insightful predictions. Historical and live-price data collection We will be using the Bitcoin historical price data from Kaggle. For the real-time data, Cryptocompare API will be used. Historical data collection For training the ML algorithm, there is a Bitcoin  Historical  Price Data dataset available to the public on Kaggle (version 10). The dataset can be downloaded here. It has 1 minute OHLC data for BTC-USD pairs from several exchanges. At the beginning of the project, for most of them, data was available from January 1, 2012 to May 31, 2017; but for the Bitstamp exchange, it's available until October 20, 2017 (as well as for Coinbase, but that dataset became available later): Figure 1: The Bitcoin  historical dataset on Kaggle Note that you need to be a registered user and be logged in in order to download the file. The file that we are using is bitstampUSD_1-min_data_2012-01-01_to_2017-10-20.csv. Now, let us get the data we have. It has eight columns: Timestamp: The time elapsed in seconds since January 1, 1970. It is 1,325,317,920 for the first row and 1,325,317,920 for the second 1. (Sanity check! The difference is 60 seconds). Open: The price at the opening of the time interval. It is 4.39 dollars. Therefore it is the price of the first trade that happened after Timestamp (1,325,317,920 in the first row's case). Close: The price at the closing of the time interval. High: The highest price from all orders executed during the interval. Low: The same as High but it is the lowest price. Volume_(BTC): The sum of all Bitcoins that were transferred during the time interval. So, take all transactions that happened during the selected interval and sum up the BTC values of each of them. Volume_(Currency): The sum of all dollars transferred. Weighted_Price: This is derived from the volumes of BTC and USD. By dividing all dollars traded by all bitcoins, we can get the weighted average price of BTC during this minute. So Weighted_Price=Volume_(Currency)/Volume_(BTC). One of the most important parts of the data-science pipeline after data collection (which is in a sense outsourced; we use data collected by others) is data preprocessing—clearing a dataset and transforming it to suit our needs. Transformation of historical data into a time series Stemming from our goal—predict the direction of price change—we might ask ourselves, does having an actual price in dollars help to achieve this? Historically, the price of Bitcoin was usually rising, so if we try to fit a linear regression, it will show further exponential growth (whether in the long run this will be true is yet to be seen). Assumptions and design choices One of the assumptions of this project is as follows: whether we are thinking about Bitcoin trading in November 2016 with a price of about $700, or trading in November 2017 with a price in the $6500-7000 range, patterns in how people trade are similar. Now, we have several other assumptions, as described in the following points: Assumption one: From what has been said previously, we can ignore the actual price and rather look at its change. As a measure of this, we can take the delta between opening and closing prices. If it is positive, it means the price grew during that minute; the price went down if it is negative and stayed the same if delta = 0. 
In the following figure, we can see that Delta was -1.25 for the first minute observed, -12.83 for the second one, and -0.23 for the third one. Sometimes, the open price can differ significantly from the close price of the previous minute (although Delta is negative during all three of the observed minutes, for the third minute the shown price was actually higher than close for a second). But such things are not very common, and usually the open price doesn't change significantly compared to the close price of the previous minute. Assumption two: The next need to consider...  is predicting the price change in a black box environment. We do not use other sources of knowledge such as news, Twitter feeds, and others to predict how the market would react to them. This is a more advanced topic. The only data we use is price and volume. For simplicity of the prototype, we can focus on price only and construct time series data. Time series prediction is a prediction of a parameter based on the values of this parameter in the past. One of the most common examples is temperature prediction. Although there are many supercomputers using satellite and sensor data to predict the weather, a simple time series analysis can lead to some valuable results. We predict the price at T+60 seconds, for instance, based on the price at T, T-60s, T-120s and so on. Assumption three: Not all data in the dataset is valuable. The first 600,000 records are not informative, as price changes are rare and trading volumes are small. This can affect the model we are training and thus make end results worse. That is why the first 600,000 of rows are eliminated from the dataset. Assumption four: We need to Label  our data so that we can use a supervised ML algorithm. This is the easiest measure, without concerns about transaction fees. Data preprocessing Taking into account the goals of data preparation, Scala was chosen as an easy and interactive way to manipulate data: val priceDataFileName: String = "bitstampUSD_1- min_data_2012-01-01_to_2017-10-20.csv" val spark = SparkSession .builder() .master("local[*]") .config("spark.sql.warehouse.dir", "E:/Exp/") .appName("Bitcoin Preprocessing") .getOrCreate() val data = spark.read.format("com.databricks.spark.csv").option("header", "true").load(priceDataFileName) data.show(10) >>> println((data.count(),  data.columns.size)) >>> (3045857,  8) In the preceding code, we load data from the file downloaded from Kaggle and look at what is inside. There are 3045857 rows in the dataset and 8 columns, described before. Then we create the Delta column, containing the difference between closing and opening prices (that is, to consider only that data where meaningful trading has started to occur): val dataWithDelta  = data.withColumn("Delta",  data("Close") - data("Open")) The following code labels our data by assigning 1 to the rows the Delta value of which was positive; it assigns 0 otherwise: import org.apache.spark.sql.functions._  import spark.sqlContext.implicits._  val dataWithLabels  = dataWithDelta.withColumn("label",    when($"Close"  -  $"Open"  > 0, 1).otherwise(0))  rollingWindow(dataWithLabels,  22, outputDataFilePath,  outputLabelFilePath) This code transforms the original dataset into time series data. It takes the Delta values of WINDOW_SIZE rows (22 in this experiment) and makes a new row out of them. In this way, the first row has Delta values from t0 to t21, and the second one has values from t1 to t22. Then we create the corresponding array with labels (1 or 0). 
Finally, we save X and Y into files where 612000 rows were cut off from the original dataset; 22 means rolling window size and 2 classes represents that labels are binary 0 and 1: val dropFirstCount: Int = 612000 def rollingWindow(data: DataFrame, window: Int, xFilename: String, yFilename: String): Unit = { var i = 0 val xWriter = new BufferedWriter(new FileWriter(new File(xFilename))) val yWriter = new BufferedWriter(new FileWriter(new File(yFilename))) val zippedData = data.rdd.zipWithIndex().collect() System.gc() val dataStratified = zippedData.drop(dropFirstCount)//slice 612K while (i < (dataStratified.length - window)) { val x = dataStratified .slice(i, i + window) .map(r => r._1.getAs[Double]("Delta")).toList val y = dataStratified.apply(i + window)._1.getAs[Integer]("label") val stringToWrite = x.mkString(",") xWriter.write(stringToWrite + "n") yWriter.write(y + "n") i += 1 if (i % 10 == 0) { xWriter.flush() yWriter.flush() } } xWriter.close() yWriter.close() } In the preceding code segment: val outputDataFilePath:  String = "output/scala_test_x.csv" val outputLabelFilePath:  String = "output/scala_test_y.csv" Real-time data through the Cryptocompare API For real-time data, the Cryptocompare API is used, more specifically HistoMinute, which gives us access to OHLC data for the past seven days at most. The details of the API will be discussed in a section devoted to implementation, but the API response is very similar to our historical dataset, and this data is retrieved using a regular HTTP request. For example, a simple JSON response has the following structure: {  "Response":"Success", "Type":100, "Aggregated":false, "Data":  [{"time":1510774800,"close":7205,"high":7205,"low":7192.67,"open":7198,  "volumefrom":81.73,"volumeto":588726.94},  {"time":1510774860,"close":7209.05,"high":7219.91,"low":7205,"open":7205, "volumefrom":16.39,"volumeto":118136.61},    ... (other  price data)    ], "TimeTo":1510776180,    "TimeFrom":1510774800,    "FirstValueInArray":true,    "ConversionType":{"type":"force_direct","conversionSymbol":""} } Through Cryptocompare HistoMinute, we can get open, high, low, close, volumefrom, and volumeto from each minute of historical data. This data is stored for 7 days only; if you need more, use the hourly or daily path. It uses BTC conversion if data is not available because the coin is not being traded in the specified currency: Now, the following method fetches the correctly formed URL of the Cryptocompare API, which is a fully formed URL with all parameters, such as currency, limit, and aggregation specified. 
It finally returns the future that will have a response body parsed into the data model, with the price list to be processed at an upper level: import javax.inject.Inject import play.api.libs.json.{JsResult, Json} import scala.concurrent.Future import play.api.mvc._ import play.api.libs.ws._ import processing.model.CryptoCompareResponse class RestClient @Inject() (ws: WSClient) { def getPayload(url : String): Future[JsResult[CryptoCompareResponse]] = { val request: WSRequest = ws.url(url) val future = request.get() implicit val context = play.api.libs.concurrent.Execution.Implicits.defaultContext future.map { response => response.json.validate[CryptoCompareResponse] } } } In the preceding code segment, the CryptoCompareResponse class is the model of API, which takes the following parameters: Response Type Aggregated\ Data FirstValueInArray TimeTo TimeFrom Now, it has the following signature: case class CryptoCompareResponse(Response  : String, Type : Int,  Aggregated  : Boolean, Data  :  List[OHLC], FirstValueInArray  :  Boolean, TimeTo  : Long,  TimeFrom:  Long) object CryptoCompareResponse  {  implicit  val cryptoCompareResponseReads  = Json.reads[CryptoCompareResponse] } Again, the preceding two code segments the open-high-low-close (also known as OHLC), are a model class for mapping with CryptoAPI response data array internals. It takes these parameters: Time: Timestamp in seconds, 1508818680, for instance. Open: Open price at a given minute interval. High: Highest price. Low: Lowest price. Close: Price at the closing of the interval. Volumefrom: Trading volume in the from currency. It's BTC in our case. Volumeto: The trading volume in the to currency, USD in our case. Dividing Volumeto by Volumefrom gives us the weighted price of BTC. Now, it has the following signature: case class OHLC(time:  Long,  open:  Double,  high:  Double,  low:  Double,  close:  Double,  volumefrom:  Double,  volumeto:  Double)  object OHLC  {    implicit  val implicitOHLCReads  = Json.reads[OHLC]  } Model training for prediction Inside the project, in the package folder prediction.training, there is a Scala object called TrainGBT.scala. Before launching, you have to specify/change four things: In the code, you need to set up spark.sql.warehouse.dir in some actual place on your computer that has several gigabytes of free space: set("spark.sql.warehouse.dir",  "/home/user/spark") The RootDir is the main folder, where all files and train models will be stored:rootDir  = "/home/user/projects/btc-prediction/" Make sure that the x filename matches the one produced by the Scala script in the preceding step: x  = spark.read.format("com.databricks.spark.csv").schema(xSchema).load(rootDir  + "scala_test_x.csv") Make sure that the y filename matches the one produced by Scala script: y_tmp=spark.read.format("com.databricks.spark.csv").schema (ySchema).load(rootDir + "scala_test_y.csv") The code for training uses the Apache Spark ML library (and libraries required for it) to train the classifier, which means they have to be present in your class path to be able to run it. The easiest way to do that (since the whole project uses SBT) is to run it from the project root folder by typing sbtrun-main  prediction.training.TrainGBT, which will resolve all dependencies and launch training. Depending on the number of iterations and depth, it can take several hours to train the model. Now let us see how training is performed on the example of the gradient-boosted trees model. 
First, we need to create a SparkSession object: val xSchema = StructType(Array( StructField("t0", DoubleType, true), StructField("t1", DoubleType, true), StructField("t2", DoubleType, true), StructField("t3", DoubleType, true), StructField("t4", DoubleType, true), StructField("t5", DoubleType, true), StructField("t6", DoubleType, true), StructField("t7", DoubleType, true), StructField("t8", DoubleType, true), StructField("t9", DoubleType, true), StructField("t10", DoubleType, true), StructField("t11", DoubleType, true), StructField("t12", DoubleType, true), StructField("t13", DoubleType, true), StructField("t14", DoubleType, true), StructField("t15", DoubleType, true), StructField("t16", DoubleType, true), StructField("t17", DoubleType, true), StructField("t18", DoubleType, true), StructField("t19", DoubleType, true), StructField("t20", DoubleType, true), StructField("t21", DoubleType, true)) ) Then we read the files we defined for the schema. It was more convenient to generate two separate files in Scala for data and labels, so here we have to join them into a single DataFrame: Then we read the files we defined for the schema. It was more convenient to generate two separate files in Scala for data and labels, so here we have to join them into a single DataFrame: import spark.implicits._ val y = y_tmp.withColumn("y", 'y.cast(IntegerType)) import org.apache.spark.sql.functions._ val x_id = x.withColumn("id", monotonically_increasing_id()) val y_id = y.withColumn("id", monotonically_increasing_id()) val data = x_id.join(y_id, "id") The next step is required by Spark—we need to vectorize the features: val featureAssembler = new VectorAssembler() .setInputCols(Array("t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17", "t18", "t19", "t20", "t21")) .setOutputCol("features") We split the data into train and test sets randomly in the proportion of 75% to 25%. We set the seed so that the splits would be equal among all times we run the training: val Array(trainingData,testData) = dataWithLabels.randomSplit(Array(0.75, 0.25), 123) We then define the model. It tells which columns are features and which are labels. It also sets parameters: val gbt = new GBTClassifier() .setLabelCol("label") .setFeaturesCol("features") .setMaxIter(10) .setSeed(123) Create a pipeline of steps—vector assembling of features and running GBT: val pipeline = new Pipeline() .setStages(Array(featureAssembler, gbt)) Defining evaluator function—how the model knows whether it is doing well or not. 
As we have only two classes that are imbalanced, accuracy is a bad measurement; area under the ROC curve is better: val rocEvaluator = new BinaryClassificationEvaluator() .setLabelCol("label") .setRawPredictionCol("rawPrediction") .setMetricName("areaUnderROC") K-fold cross-validation is used to avoid overfitting; it takes out one-fifth of the data at each iteration, trains the model on the rest, and then tests on this one-fifth: val cv = new CrossValidator() .setEstimator(pipeline) .setEvaluator(rocEvaluator) .setEstimatorParamMaps(paramGrid) .setNumFolds(numFolds) .setSeed(123) val cvModel = cv.fit(trainingData) After we get the trained model (which can take an hour or more depending on the number of iterations and parameters we want to iterate on, specified in paramGrid), we then compute the predictions on the test data: val predictions = cvModel.transform(testData) In addition, evaluate quality of predictions: val roc = rocEvaluator.evaluate(predictions) The trained model is saved for later usage by the prediction service: val gbtModel = cvModel.bestModel.asInstanceOf[PipelineModel] gbtModel.save(rootDir + "__cv__gbt_22_binary_classes_" + System.nanoTime()/ 1000000 + ".model") In summary, the code for model training is given as follows: import org.apache.spark.{ SparkConf, SparkContext } import org.apache.spark.ml.{ Pipeline, PipelineModel } import org.apache.spark.ml.classification.{ GBTClassificationModel, GBTClassifier, RandomForestClassificationModel, RandomForestClassifier} import org.apache.spark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator} import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorAssembler, VectorIndexer} import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder} import org.apache.spark.sql.types.{DoubleType, IntegerType, StructField, StructType} import org.apache.spark.sql.SparkSession object TrainGradientBoostedTree { def main(args: Array[String]): Unit = { val maxBins = Seq(5, 7, 9) val numFolds = 10 val maxIter: Seq[Int] = Seq(10) val maxDepth: Seq[Int] = Seq(20) val rootDir = "output/" val spark = SparkSession .builder() .master("local[*]") .config("spark.sql.warehouse.dir", ""/home/user/spark/") .appName("Bitcoin Preprocessing") .getOrCreate() val xSchema = StructType(Array( StructField("t0", DoubleType, true), StructField("t1", DoubleType, true), StructField("t2", DoubleType, true), StructField("t3", DoubleType, true), StructField("t4", DoubleType, true), StructField("t5", DoubleType, true), StructField("t6", DoubleType, true), StructField("t7", DoubleType, true), StructField("t8", DoubleType, true), StructField("t9", DoubleType, true), StructField("t10", DoubleType, true), StructField("t11", DoubleType, true), StructField("t12", DoubleType, true), StructField("t13", DoubleType, true), StructField("t14", DoubleType, true), StructField("t15", DoubleType, true), StructField("t16", DoubleType, true), StructField("t17", DoubleType, true), StructField("t18", DoubleType, true), StructField("t19", DoubleType, true), StructField("t20", DoubleType, true), StructField("t21", DoubleType, true))) val ySchema = StructType(Array(StructField("y", DoubleType, true))) val x = spark.read.format("csv").schema(xSchema).load(rootDir + "scala_test_x.csv") val y_tmp = spark.read.format("csv").schema(ySchema).load(rootDir + "scala_test_y.csv") import spark.implicits._ val y = y_tmp.withColumn("y", 'y.cast(IntegerType)) import org.apache.spark.sql.functions._ //joining 2 separate datasets in single Spark dataframe val 
x_id = x.withColumn("id", monotonically_increasing_id()) val y_id = y.withColumn("id", monotonically_increasing_id()) val data = x_id.join(y_id, "id") val featureAssembler = new VectorAssembler() .setInputCols(Array("t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10", "t11", "t12", "t13", "t14", "t15", "t16", "t17","t18", "t19", "t20", "t21")) .setOutputCol("features") val encodeLabel = udf[Double, String] { case "1" => 1.0 case "0" => 0.0 } val dataWithLabels = data.withColumn("label", encodeLabel(data("y"))) //123 is seed number to get same datasplit so we can tune params val Array(trainingData, testData) = dataWithLabels.randomSplit(Array(0.75, 0.25), 123) val gbt = new GBTClassifier() .setLabelCol("label") .setFeaturesCol("features") .setMaxIter(10) .setSeed(123) val pipeline = new Pipeline() .setStages(Array(featureAssembler, gbt)) // *********************************************************** println("Preparing K-fold Cross Validation and Grid Search") // *********************************************************** val paramGrid = new ParamGridBuilder() .addGrid(gbt.maxIter, maxIter) .addGrid(gbt.maxDepth, maxDepth) .addGrid(gbt.maxBins, maxBins) .build() val cv = new CrossValidator() .setEstimator(pipeline) .setEvaluator(new BinaryClassificationEvaluator()) .setEstimatorParamMaps(paramGrid) .setNumFolds(numFolds) .setSeed(123) // ************************************************************ println("Training model with GradientBoostedTrees algorithm") // ************************************************************ // Train model. This also runs the indexers. val cvModel = cv.fit(trainingData) cvModel.save(rootDir + "cvGBT_22_binary_classes_" + System.nanoTime() / 1000000 + ".model") println("Evaluating model on train and test data and calculating RMSE") // ********************************************************************** // Make a sample prediction val predictions = cvModel.transform(testData) // Select (prediction, true label) and compute test error. val rocEvaluator = new BinaryClassificationEvaluator() .setLabelCol("label") .setRawPredictionCol("rawPrediction") .setMetricName("areaUnderROC") val roc = rocEvaluator.evaluate(predictions) val prEvaluator = new BinaryClassificationEvaluator() .setLabelCol("label") .setRawPredictionCol("rawPrediction") .setMetricName("areaUnderPR") val pr = prEvaluator.evaluate(predictions) val gbtModel = cvModel.bestModel.asInstanceOf[PipelineModel] gbtModel.save(rootDir + "__cv__gbt_22_binary_classes_" + System.nanoTime()/1000000 +".model") println("Area under ROC curve = " + roc) println("Area under PR curve= " + pr) println(predictions.select().show(1)) spark.stop() } } Now let us see how the training went: >>> Area under ROC curve = 0.6045355104779828 Area under PR curve= 0.3823834607704922 Therefore, we have not received very high accuracy, as the ROC is only 60.50% out of the best GBT model. Nevertheless, if we tune the hyperparameters, we will get better accuracy. We learned how a complete ML pipeline can be implemented, from collecting historical data, to transforming it into a format suitable for testing hypotheses. We also performed machine learning model training to carry out predictions. You read an excerpt from a book written by Md. Rezaul Karim, titled Scala Machine Learning Projects. In this book, you will learn to build powerful machine learning applications for performing advanced numerical computing and functional programming.  
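As a follow-on, once a model has been saved as shown above, reusing it for a one-off prediction mostly comes down to loading the PipelineModel and handing it a single row with the same t0 to t21 schema used during training. The following Scala sketch of ours assumes the column names from this article; the model path and the delta values are placeholders that you would replace with your own:

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object PredictOnce {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("BitcoinSinglePrediction")
      .getOrCreate()

    // path to a model saved by the training step; adjust to your own run
    val model = PipelineModel.load("output/__cv__gbt_22_binary_classes_example.model")

    // the 22 most recent Delta values, oldest first (dummy values here)
    val deltas: Seq[Double] = Seq.fill(22)(0.0)

    // the schema must match training: columns t0..t21 of type Double
    val schema = StructType((0 until 22).map(i => StructField(s"t$i", DoubleType, true)))
    val row = spark.createDataFrame(
      spark.sparkContext.parallelize(Seq(Row.fromSeq(deltas))),
      schema)

    // the pipeline re-assembles the feature vector and applies the trained GBT model
    val prediction = model.transform(row)
    prediction.select("prediction").show()

    spark.stop()
  }
}

A prediction of 1.0 corresponds to the label we assigned to a positive Delta, that is, the model expects the price to go up over the next window.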

Creating a custom layout implementation for your Android app

Aarthi Kumaraswamy
06 Apr 2018
5 min read
In most applications, you'll find that a combination of the ConstraintLayout, CoordinatorLayout, and some of the more primitive layout classes (such as LinearLayout and FrameLayout) are more than enough to achieve any layout requirements you can dream up for your user interface. Every now and again though, you'll find yourself needing a custom layout manager to achieve an effect required for the application. Layout classes extend from the ViewGroup class, and their job is to tell their child widgets where to position themselves, and how large they should be. They do this in two phases: the measurement phase and the layout phase. All View implementations are expected to provide measurements for their actual size according to specifications. These measurements are then used by the View widget's parent ViewGroup to allocate the amount of space the widget will consume on the screen. For example, a View might be told to consume, at most, the screen width. The View must then determine how much of that space it actually requires, and records that size in its measured dimensions. The measured dimensions are then used by the parent ViewGroup during the layout process. The second phase is the layout phase, and it is conducted by the ViewGroup parent of each View widget. This phase positions the View on the screen, relative to its parent ViewGroup location, and specifies the actual size that the widget will consume on the screen (typically based on the measured size calculated in the measurement phase). When you implement your own ViewGroup, you'll need to ensure that all of your child View widgets are given a chance to measure themselves before you perform the actual layout operations. Let's build a layout class to arrange its children in a circle. To keep the implementation simple, we'll assume that all the child widgets are the same size (for example, if they were all icons): Right-click on the widget package in the travel claim example app, and select New|Java Class. Name the new class CircleLayout. Change the Superclass to android.view.ViewGroup. Click OK to create the new class. Declare the standard ViewGroup constructors: public CircleLayout(final Context context) {  super(context); } public CircleLayout(    final Context context,    final AttributeSet attrs) {  super(context, attrs); } public CircleLayout(      final Context context,      final AttributeSet attrs,      final int defStyleAttr) {  super(context, attrs, defStyleAttr); } Override the onMeasure method to calculate the size of the CircleLayout and all of its child Viewwidgets. The measurement specifications are passed in as int values, which are interpreted using the staticmethods in the MeaureSpec class. Measurement specifications come in two flavors: at most and exactly, and each has a size value attached. In this particular layout, we always measure the CircleLayout as the size given in the specification. This means that the CircleLayout will always consume the maximum amount of space available. 
It also expects all of its children to be able to specify sizes without the match_parent attribute (as this will cause each child to take up all the available space): @Override protected void onMeasure(    final int widthMeasureSpec,    final int heightMeasureSpec) {  super.onMeasure(widthMeasureSpec, heightMeasureSpec);  measureChildren(widthMeasureSpec, heightMeasureSpec);  setMeasuredDimension(        MeasureSpec.getSize(widthMeasureSpec),        MeasureSpec.getSize(heightMeasureSpec)); } The next method to implement is the onLayout method. This performs the actual arrangement of the child View widget within the CircleLayout, by invoking their layout method. The layout method should never be overridden, because it's closely tied to the platform and performs several other important actions (such as notifying layout listeners). Instead, you should override onLayout, but invoking layout.CircleLayoutassumes that all the child View widgets are of the same size (and forces this as part of the onLayoutimplementation). This onLayout method simply calculates the available space, and then positions the child View widgets in a circle around the outside edge: protected void onLayout( final boolean changed, final int left, final int top, final int right, final int bottom) { final int childCount = getChildCount(); if (childCount == 0) { return; } final int width = right - left; final int height = bottom - top; // if we have children, we assume they're all the same size final int childrenWidth = getChildAt(0).getMeasuredWidth(); final int childrenHeight = getChildAt(0).getMeasuredHeight(); final int boxSize = Math.min( width - childrenWidth, height - childrenHeight); for (int i = 0; i < childCount; i++) { final View child = getChildAt(i); final int childWidth = child.getMeasuredWidth(); final int childHeight = child.getMeasuredHeight(); final double x = Math.sin((Math.PI * 2.0) * ((double) i / (double) childCount)); final double y = -Math.cos((Math.PI * 2.0) * ((double) i / (double) childCount)); final int childLeft = (int) (x * (boxSize / 2)) + (width / 2) - (childWidth / 2); final int childTop = (int) (y * (boxSize / 2)) + (height / 2) - (childHeight / 2); final int childRight = childLeft + childWidth; final int childBottom = childTop + childHeight; child.layout(childLeft, childTop, childRight, childBottom); } } Although the implementation of the onLayout method is quite long, it's also relatively simple. Most of the code is concerned with determining the desired position of the child View widgets. Layout code needs to execute as quickly as possible, and should avoid allocating any objects during the onMeasure and onLayout methods (similar to the rules of onDraw). Layout is a critical part of building the screen from a performance standpoint, because no rendering can actually occur without the layout being completed. The layout will also be rerun every time the layout changes its structure. For example, if you add or remove any child View widgets, or change the size or position of the ViewGroup. Changing the size of a ViewGroup might happen on every frame if you use a CoordinatorLayout, where the ViewGroup is being collapsed (or if you change its size as part of a property-animation). You read an excerpt from the book, Hands-On Android UI Development by Jason Morris. For more recipes on cutting edge Android UI tasks such as creating themes, animations, custom widgets and more, give this book a try.  
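To see CircleLayout in action, it can be used like any other ViewGroup. The following Activity is a small sketch of ours (the drawable resource name is hypothetical; point it at any equally sized icon in your project) that adds six icons and lets the layout arrange them in a circle:

import android.app.Activity;
import android.os.Bundle;
import android.view.ViewGroup;
import android.widget.ImageView;
// import your app's widget package for CircleLayout as appropriate

public class CircleDemoActivity extends Activity {
    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);

        final CircleLayout circleLayout = new CircleLayout(this);

        // add six equally sized icons; CircleLayout spaces them evenly around the circle
        for (int i = 0; i < 6; i++) {
            final ImageView icon = new ImageView(this);
            icon.setImageResource(R.drawable.ic_sample_icon); // hypothetical drawable
            circleLayout.addView(icon, new ViewGroup.LayoutParams(96, 96));
        }

        setContentView(circleLayout);
    }
}

Because the children use fixed sizes rather than match_parent, each one measures itself to the same dimensions, which is exactly the assumption the onLayout implementation above relies on.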

6 JavaScript micro optimizations you need to know

Savia Lobo
05 Apr 2018
18 min read
JavaScript micro optimizations can improve the performance of your JavaScript code. This means you can get it to do more, which is essential when thinking about the scale of modern web applications, as greater efficiency in code leads to much stronger overall performance. Let us have a look at micro optimizations in detail.

Truthy/falsy comparisons

We have all, at some point, written if conditions or assigned default values by relying on the truthy or falsy nature of JavaScript variables. As helpful as this is most of the time, we need to consider the impact that such an operation has on our application. However, before we jump into the details, let's discuss how a condition is evaluated in JavaScript, specifically an if condition in this case. As developers, we tend to do the following:

if(objOrNumber) {
  // do something
}

This works in most cases, unless the number is 0, in which case it gets evaluated to false. That is a very common edge case, and most of us catch it anyway. However, what does the JavaScript engine have to do to evaluate this condition? How does it know whether objOrNumber evaluates to true or false? Let's return to the ECMA-262 specs and pull out the if condition spec (https://www.ecma-international.org/ecma-262/5.1/#sec-12.5). The following is an excerpt of the same:

Semantics
The production IfStatement : if ( Expression ) Statement else Statement is evaluated as follows:
1. Let exprRef be the result of evaluating Expression.
2. If ToBoolean(GetValue(exprRef)) is true, then return the result of evaluating the first Statement.
3. Else, return the result of evaluating the second Statement.

Now, we note that whatever expression we pass goes through the following three steps:

1. Getting the exprRef from Expression.
2. GetValue is called on exprRef.
3. ToBoolean is called on the result of step 2.

Step 1 does not concern us much at this stage; think of it this way: an expression can be something like a == b or something like the shouldIEvaluateTheIFCondition() method call, that is, something that evaluates your condition. Step 2 extracts the value of the exprRef, that is, 10, true, undefined. In this step, we differentiate how the value is extracted based on the type of the exprRef. You can refer to the details of GetValue here. Step 3 then converts the value extracted in step 2 into a Boolean value based on the ToBoolean conversion table in the specification (https://www.ecma-international.org/ecma-262/5.1/#sec-9.2): undefined and null convert to false, numbers convert to false only for +0, -0, and NaN, strings convert to false only when empty, Boolean values pass through unchanged, and objects always convert to true. At each step, you can see that it is always beneficial if we are able to provide a direct boolean value instead of a truthy or falsy value.

Looping optimizations

We can take a deep dive into the for loop, similar to what we did with the if condition earlier (https://www.ecma-international.org/ecma-262/5.1/#sec-12.6.3), but there are easier and more obvious optimizations that can be applied when it comes to loops. Simple changes can drastically affect the quality and performance of the code; consider this, for example:

for(var i = 0; i < arr.length; i++) {
  // logic
}

The preceding code can be changed as follows:

var len = arr.length;
for(var i = 0; i < len; i++) {
  // logic
}

What is even better is to run the loop in reverse, which can be even faster than what we have seen previously (note that the counter needs to start at len - 1 to stay within the array bounds):

var len = arr.length;
for(var i = len - 1; i >= 0; i--) {
  // logic
}
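As a small variation on the same idea, the length lookup can also be cached directly in the loop header rather than in a separate statement; this is a minimal sketch of the same optimization discussed above, not an additional technique from the article:

// cache arr.length in the initializer so it is only read once
for (var i = 0, len = arr.length; i < len; i++) {
  // logic
}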
The conditional function call

Some of the features that we have within our applications are conditional. For example, logging or analytics fall into this category. Some applications may have logging turned off for some time and then turned back on. The most obvious way of achieving this is to wrap the method for logging within an if condition. However, since the method could be triggered a lot of times, there is another way in which we can make the optimization in this case:

function someUserAction() {
  // logic

  if (analyticsEnabled) {
    trackUserAnalytics();
  }
}

// in some other class
function trackUserAnalytics() {
  // save analytics
}

Instead of the preceding approach, we can try something that is only slightly different but allows V8-based engines to optimize the way the code is executed:

function someUserAction() {
  // logic

  trackUserAnalytics();
}

// in some other class
function toggleUserAnalytics() {
  if(enabled) {
    trackUserAnalytics = userAnalyticsMethod;
  } else {
    trackUserAnalytics = noOp;
  }
}

function userAnalyticsMethod() {
  // save analytics
}

// empty function
function noOp() {}

Now, the preceding implementation is a double-edged sword. The reason for that is very simple. JavaScript engines employ a technique called inline caching (IC), which means that any previous lookup for a certain method performed by the JS engine will be cached and reused when triggered the next time; for example, if we have an object that has a nested method, a.b.c, the method a.b.c will only be looked up once and stored in the cache (IC); if a.b.c is called the next time, it will be picked up from the IC, and the JS engine will not parse the whole chain again. If there are any changes to the a.b.c chain, then the IC gets invalidated and a new dynamic lookup is performed the next time instead of being retrieved from the IC.

So, from our previous example, when we have noOp assigned to the trackUserAnalytics() method, the method path gets tracked and saved within the IC, but it internally removes this function call as it is a call to an empty method. However, when it is applied to an actual function with some logic in it, the IC points directly to this new method. So, if we keep calling our toggleUserAnalytics() method multiple times, it keeps invalidating our IC, and our dynamic method lookup has to happen every time until the application state stabilizes (that is, toggleUserAnalytics() is no longer called).

Image and font optimizations

When it comes to image and font optimizations, there are no limits to the types and the scale of optimization that we can perform. However, we need to keep in mind our target audience, and we need to tailor our approach based on the problem at hand.

With both images and fonts, the first and foremost important thing is that we do not overserve, that is, we request and send only the data that is necessary, by determining the dimensions of the device that our application is running on. The simplest way to do this is by adding a cookie for your device size and sending it to the server along with each request. Once the server receives a request for an image, it can then retrieve the image based on the dimensions that were sent in the cookie. Most of the time these images are something like a user avatar or a list of people who commented on a certain post. We can agree that thumbnail images do not need to be the same size as the one on the profile page, so we can save some bandwidth by transmitting a smaller image in that context. Since screens these days have very high Dots Per Inch (DPI), the media that we serve to them needs to be worthy of it.
Otherwise, the application looks bad and the images look pixelated. This can be avoided by using vector images or SVGs, which can be GZipped over the wire, thus reducing the payload size.

Another not-so-obvious optimization is changing the image compression type. Have you ever loaded a page in which the image loads from top to bottom in small, incremental rectangles? By default, images are compressed using a baseline technique, which is a default method of compressing the image from top to bottom. We can change this to progressive compression using libraries such as imagemin. This would load the entire image first as blurred, then semi-blurred, and so on until the entire image is uncompressed and displayed on the screen. Uncompressing a progressive JPEG might take a little longer than a baseline one, so it is important to measure before making such optimizations.

Another extension based on this concept is a Chrome-only image format called WebP. This is a highly effective way of serving images, which is used by a lot of companies in production and has saved them almost 30% in bandwidth. Using WebP is almost as simple as the progressive compression discussed previously. We can use the imagemin-webp node module, which can convert a JPEG image into a WebP image, thus reducing the image size to a great extent.

Web fonts are a little different from images. Images get downloaded and rendered onto the UI on demand, that is, when the browser encounters the image either in the HTML or in the CSS files. Fonts, on the other hand, are only requested when the Render Tree is completely constructed. That means that the CSSOM and DOM have to be ready by the time the request is dispatched for the fonts. Also, if the font files are being served from the server and not locally, then there are chances that we may first see the text without the font applied (or no text at all) and then see the font applied, which may cause a flashing effect of the text. There are multiple simple techniques to avoid this problem:

Download, serve, and preload the font files locally:

<link rel="preload" href="fonts/my-font.woff2" as="font">

Specify the unicode-range in the font-face so that browsers can adapt and download only the character sets and glyphs that are actually needed by the page:

@font-face {
  ...
  unicode-range: U+000-5FF; /* latin */
  ...
}

So far, we have seen that we can let the unstyled text load onto the UI first and then get styled as we expect it to be; this behavior can be controlled using the Font Loading API, which allows us to load and render the font using JavaScript:

var font = new FontFace("myFont", "url(/my-fonts/my-font.woff2)", {
  unicodeRange: 'U+000-5FF'
});

// initiate a fetch without Render Tree
font.load().then(function() {
  // apply the font
  document.fonts.add(font);
  document.body.style.fontFamily = "myFont";
});

Garbage collection in JavaScript

Let's take a quick look at what garbage collection (GC) is and how we can handle it in JavaScript. A lot of low-level languages give developers explicit capabilities to allocate and free memory in their code. However, unlike those languages, JavaScript automatically handles memory management, which is both a good and a bad thing. Good because we no longer have to worry about how much memory we need to allocate, when we need to do so, and how to free the assigned memory.
The bad part about the whole process is that, to an uninformed developer, it can be a recipe for disaster, and they can end up with an application that hangs or crashes. Luckily for us, understanding the process of GC is quite easy and can be incorporated into our coding style to make sure that we are writing optimal code when it comes to memory management. Memory management has three very obvious steps:

1. Assign the memory to variables:

var a = 10; // we assign a number to a memory location referenced by variable a

2. Use the variables to read or write from the memory:

a += 3; // we read the memory location referenced by a and write a new value to it

3. Free the memory when it's no longer needed.

Now, this last part is the one that is not explicit. How does the browser know when we are done with the variable a and it is ready to be garbage collected? Let's wrap this inside a function before we continue this discussion:

function test() {
  var a = 10;
  a += 3;
  return a;
}

We have a very simple function, which just adds to our variable a, returns the result, and finishes its execution. However, there is actually one more step that happens after the execution of this method, called mark and sweep (not immediately after; sometimes it happens after a batch of operations is completed on the main thread). When the browser performs mark and sweep depends on the total memory the application consumes and the speed at which that memory is being consumed.

Mark and sweep algorithm

Since there is no accurate way to determine whether the data at a particular memory location is going to be used in the future, we need to depend on alternatives that can help us make this decision. In JavaScript, we use the concept of a reference to determine whether a variable is still being used; if not, it can be garbage collected. The concept of mark and sweep is very straightforward: which memory locations are reachable from the known active memory locations? If something is not reachable, collect it, that is, free the memory. That's it, but what are the known active memory locations? The process still needs a starting point, right? In most browsers, the GC algorithm keeps a list of the roots from which the mark and sweep process can be started. All the roots and their children are marked as active, and any variable that can be reached from these roots is also marked as active. Anything that cannot be reached is marked as unreachable and thus collected. In most cases, the roots consist of the window object.

So, we will go back to our previous example:

function test() {
  var a = 10;
  a += 3;
  return a;
}

Our variable a is local to the test() method. As soon as the method has executed, there is no way to access that variable anymore, that is, no one holds any reference to it, and that is when it can be marked for garbage collection, so that the next time the GC runs, the variable a will be swept and the memory allocated to it can be freed.

Garbage collection and V8

When it comes to V8, the process of garbage collection is extremely complex (as it should be). So, let's briefly discuss how V8 handles it. In V8, the memory (heap) is divided into two main generations: the new-space and the old-space. Both new-space and old-space are assigned some memory (between 1 MB and 20 MB). Most programs and their variables, when created, are assigned within the new-space.
Whenever we create a new variable or perform an operation that consumes memory, it is by default allocated from the new-space, which is optimized for memory allocation. Once the total memory allocated to the new-space is almost completely consumed, the browser triggers a Minor GC, which basically removes the variables that are no longer being referenced and marks the variables that are still being referenced and cannot be removed yet. Once a variable survives two or more Minor GCs, it becomes a candidate for the old-space, where the GC cycle is not run as frequently as in the new-space. A Major GC is triggered when the old-space reaches a certain size; all of this is driven by the heuristics of the application, which are very important to the whole process. So, well-written programs move fewer objects into the old-space and thus have fewer Major GC events being triggered. Needless to say, this is a very high-level overview of what V8 does for garbage collection, and since this process keeps changing over time, we will switch gears and move on to the next topic.

Avoiding memory leaks

Now that we know, at a high level, what garbage collection is in JavaScript and how it works, let's take a look at some common pitfalls that prevent our variables from being marked for GC by the browser.

Assigning variables to global scope

This should be pretty obvious by now; we discussed how the GC mechanism determines a root (which is the window object) and treats everything on the root and its children as active and never marks them for garbage collection. So, the next time you forget to add a var to your variable declaration, remember that the global variable you are creating will live forever and never get garbage collected:

function test() {
  a = 10; // created on window object
  a += 3;
  return a;
}

Removing DOM elements and references

It's imperative that we keep our DOM references to a minimum, so a well-known step that we like to perform is caching the DOM elements in our JavaScript so that we do not have to query them over and over. However, once the DOM elements are removed, we need to make sure that they are removed from our cache as well, otherwise they will never get GC'd:

var cache = {
  row: document.getElementById('row')
};

function removeTable() {
  document.body.removeChild(document.getElementById('row'));
}

The code shown previously removes the row from the DOM, but the variable cache still refers to the DOM element, hence preventing it from being garbage collected. Another interesting thing to note here is that even when we remove the table that contained the row, the entire table would remain in memory and not get GC'd, because the row held in cache internally refers to the table.
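A minimal sketch of how this could be cleaned up (this snippet is illustrative, not part of the original article; it reuses the hypothetical cache object and element id from above) is to drop the cached reference at the same time the element is removed:

var cache = {
  row: document.getElementById('row')
};

function removeTable() {
  document.body.removeChild(document.getElementById('row'));
  // drop our own reference too, so the detached row (and its table) can be GC'd
  cache.row = null;
}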
Closures edge case

Closures are amazing; they help us deal with a lot of problematic scenarios and also provide us with ways in which we can simulate the concept of private variables. Well, all that is good, but sometimes we tend to overlook the potential downsides that are associated with closures. Here is what we do know and use:

function myGoodFunc() {
  var a = new Array(10000000).join('*');
  // something big enough to cause a spike in memory usage

  function myGoodClosure() {
    return a + ' added from closure';
  }

  myGoodClosure();
}

setInterval(myGoodFunc, 1000);

When we run this script in the browser and then profile it, we see, as expected, that the method consumes a constant amount of memory and is then GC'd, restoring the baseline memory consumed by the script. Zooming into one of these spikes and looking at the call tree to determine which events are triggered around the time of the spikes, we can see that everything happens as per our expectation: first, our setInterval() is triggered, which calls myGoodFunc(), and once the execution is done, there is a GC that collects the data, hence the spike.

Now, this was the expected flow, or the happy path, when dealing with closures. However, sometimes our code is not as simple, and we end up performing multiple things within one closure, and sometimes even end up nesting closures:

function myComplexFunc() {
  var a = new Array(1000000).join('*');
  // something big enough to cause a spike in memory usage

  function closure1() {
    return a + ' added from closure';
  }

  closure1();

  function closure2() {
    console.log('closure2 called');
  }

  setInterval(closure2, 100);
}

setInterval(myComplexFunc, 1000);

We can note in the preceding code that we have extended our method to contain two closures: closure1 and closure2. Although closure1 still performs the same operation as before, closure2 will run forever because we register it on a 100 ms interval, ten times more frequently than the parent function. Also, since both closure methods share the parent closure scope, in this case the variable a, it will never get GC'd and thus causes a huge memory leak, which shows up clearly in the memory profile. On a closer look, we can see that the GC is being triggered, but because of the frequency at which the methods are being called, the memory slowly leaks (less memory is collected than is being created).

Well, that was an extreme edge case, right? It's way more theoretical than practical; why would anyone have two nested setInterval() methods with closures? Let's take a look at another example in which we no longer nest multiple setInterval() calls, but which is driven by the same logic. Let's assume that we have a method that creates closures:

var something = null;

function replaceValue() {
  var previousValue = something;

  // `unused` method loads the `previousValue` into closure scope
  function unused() {
    if (previousValue) console.log("hi");
  }

  // update something
  something = {
    str: new Array(1000000).join('*'),
    // all closures within replaceValue share the same
    // closure scope, hence someMethod would have access
    // to previousValue, which is nothing but its parent
    // object (`something`)
    // since `someMethod` has access to its parent
    // object, even when it is replaced by a new (identical)
    // object in the next setInterval iteration, the previous
    // value does not get garbage collected because someMethod
    // on the previous value still maintains a reference to
    // previousValue, and so on.
    someMethod: function () {}
  };
}

setInterval(replaceValue, 1000);

A simple fix to solve this problem is obvious: as we have said ourselves, the previous value of the object something doesn't get garbage collected because it refers to the previousValue from the previous iteration.
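Based on that reasoning, here is a minimal sketch of the fix (illustrative, reusing the same variable names as the example above rather than code taken from the article): clear the captured reference at the end of each call so that nothing keeps the old object alive:

var something = null;

function replaceValue() {
  var previousValue = something;

  function unused() {
    if (previousValue) console.log("hi");
  }

  something = {
    str: new Array(1000000).join('*'),
    someMethod: function () {}
  };

  // clear the captured reference so the previous `something`
  // (and its large string) can be garbage collected
  previousValue = null;
}

setInterval(replaceValue, 1000);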
So, the solution, as sketched above, is to clear out previousValue at the end of each iteration, leaving nothing for the old something object to be kept alive by; with that change, the memory profile stops growing and returns to its baseline behavior.

To summarize, we introduced JavaScript micro optimizations and memory optimizations that ultimately lead to high-performance JavaScript. If you have found this post useful, do check out the book Hands-On Data Structures and Algorithms with JavaScript for solutions to implement complex data structures and algorithms in a practical way.

Creating a reference generator for a job portal using Breadth First Search (BFS) algorithm

Savia Lobo
05 Apr 2018
11 min read
In this tutorial, we will create a reference generator for a job portal with the help of the Breadth First Search (BFS) algorithm. For instance, given a few users who are friends with each other, we will create a node for each user and associate each node with data, such as their name and the company they work for. Once we create all the nodes, we will join them based on some predefined relationships between the nodes. Then, we will use these predefined relationships to determine who a user would have to talk to in order to get referred for a job interview at a company of their choice. For example, A, who works at company X, is friends with B, who works at company Y; B, in turn, is friends with C, who works at company Z. So, if A wants to get referred to company Z, then A talks to B, who can introduce them to C for a referral to company Z.

In most production-level apps, you will not be creating graphs in such a fashion. You can simply use a graph database, which provides a lot of these features out of the box. Returning to our example, in more technical terms, we have an undirected graph (think of users as nodes and friendships as edges between them), and we want to determine the shortest path from one node to another.

To perform what we have described so far, we will be using a technique known as Breadth First Search (BFS). BFS is a graph traversal mechanism in which the neighboring nodes are examined or evaluated first, before moving on to the next level. This helps to ensure that the number of links found in the resulting chain is always the minimum, hence we always get the shortest possible path from node A to node B. Although there are other algorithms, such as Dijkstra, that achieve similar results, we will go with BFS because Dijkstra is a more complex algorithm that is well suited to cases where each edge has an associated cost. For example, in our case, we would go with Dijkstra if our users' friendships had weights associated with them, such as acquaintance, friend, and close friend, which would help us associate a cost with each of those paths. A good use case for Dijkstra would be something such as a Maps application, which gives you directions from point A to B based on the traffic (that is, the weight or cost associated with each edge) in between.

Creating a bidirectional graph

We can start with the logic for our graph by creating a new file under utils/graph.js, which will hold the edges and then provide a simple shortestPath method to access the Graph and apply the BFS algorithm on the graph that is generated, as shown in the following code:

var _ = require('lodash');

class Graph {
  constructor(users) {
    // initialize edges
    this.edges = {};

    // save users for later access
    this.users = users;

    // add users and edges of each
    _.forEach(users, (user) => {
      this.edges[user.id] = user.friends;
    });
  }
}

module.exports = Graph;

Once we add the edges to our graph, it has nodes (user IDs), and edges are defined as the relationship between each user ID and the friends array that is available for each user. Forming the graph was an easy task, thanks to the way our data is structured.
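To make that structure concrete, here is a rough sketch of what this.edges ends up holding after the constructor runs (this snippet is illustrative, not part of the original code; it uses a tiny two-user subset of the sample dataset listed next):

var Graph = require('./utils/graph');

// a tiny subset of the sample dataset shown below, just to illustrate the shape
var users = [
  { id: 1, name: 'Adam', company: 'Facebook', friends: [2] },
  { id: 2, name: 'John', company: 'Google', friends: [1] }
];

var graph = new Graph(users);
console.log(graph.edges); // { '1': [ 2 ], '2': [ 1 ] }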
In our example dataset, each user has a set of friends list, which is listed in the following code: [ { id: 1, name: 'Adam', company: 'Facebook', friends: [2, 3, 4, 5, 7] }, { id: 2, name: 'John', company: 'Google', friends: [1, 6, 8] }, { id: 3, name: 'Bill', company: 'Twitter', friends: [1, 4, 5, 8] }, { id: 4, name: 'Jose', company: 'Apple', friends: [1, 3, 6, 8] }, { id: 5, name: 'Jack', company: 'Samsung', friends: [1, 3, 7] }, { id: 6, name: 'Rita', company: 'Toyota', friends: [2, 4, 7, 8] }, { id: 7, name: 'Smith', company: 'Matlab', friends: [1, 5, 6, 8] }, { id: 8, name: 'Jane', company: 'Ford', friends: [2, 3, 4, 6, 7] } ] As you can note in the preceding code, we did not really have to establish a bidirectional edge exclusively here because if user 1 is a friend of user 2 then user 2 is also a friend of user 1. Generating a pseudocode  for the shortest path generation Before its implementation, let's quickly jot down what we are about to do so that the actual implementation becomes a lot easier: INITIALIZE tail to 0 for subsequent iterations MARK source node as visited WHILE result not found GET neighbors of latest visited node (extracted using tail) FOR each of the node IF node already visited RETURN Mark node as visited IF node is our expected result INITIALIZE result with current neighbor node WHILE not source node BACKTRACK steps by popping users from previously visited path until the source user ADD source user to the result CREATE and format result variable IF result found return control NO result found, add user to previously visited path ADD friend to queue for BFS in next iteration INCREMENT tail for next loop RETURN NO_RESULT Implementing the shortest path generation Let's now create our customized BFS algorithm to parse the graph and generate the shortest possible path for our user to get referred to company A: var _ = require('lodash'); class Graph { constructor(users) { // initialize edges this.edges = {}; // save users for later access this.users = users; // add users and edges of each _.forEach(users, (user) => { this.edges[user.id] = user.friends; }); } shortestPath(sourceUser, targetCompany) { // final shortestPath var shortestPath; // for iterating along the breadth var tail = 0; // queue of users being visited var queue = [ sourceUser ]; // mark visited users var visitedNodes = []; // previous path to backtrack steps when shortestPath is found var prevPath = {}; // request is same as response if (_.isEqual(sourceUser.company, targetCompany)) { return; } // mark source user as visited so // next time we skip the processing visitedNodes.push(sourceUser.id); // loop queue until match is found // OR until the end of queue i.e no match while (!shortestPath && tail < queue.length) { // take user breadth first var user = queue[tail]; // take nodes forming edges with user var friendsIds = this.edges[user.id]; // loop over each node _.forEach(friendsIds, (friendId) => { // result found in previous iteration, so we can stop if (shortestPath) return; // get all details of node var friend = _.find(this.users, ['id', friendId]); // if visited already, // nothing to recheck so return if (_.includes(visitedNodes, friendId)) { return; } // mark as visited visitedNodes.push(friendId); // if company matched if (_.isEqual(friend.company, targetCompany)) { // create result path with the matched node var path = [ friend ]; // keep backtracking until source user and add to path while (user.id !== sourceUser.id) { // add user to shortest path path.unshift(user); // prepare for next 
iteration user = prevPath[user.id]; } // add source user to the path path.unshift(user); // format and return shortestPath shortestPath = _.map(path, 'name').join(' -> '); } // break loop if shortestPath found if (shortestPath) return; // no match found at current user, // add it to previous path to help backtracking later prevPath[friend.id] = user; // add to queue in the order of visit // i.e. breadth wise for next iteration queue.push(friend); }); // increment counter tail++; } return shortestPath || `No path between ${sourceUser.name} & ${targetCompany}`; } } module.exports = Graph; The most important part of the code is when the match is found, as shown in the following code block from the preceding code: // if company matched if (_.isEqual(friend.company, targetCompany)) { // create result path with the matched node var path = [ friend ]; // keep backtracking until source user and add to path while (user.id !== sourceUser.id) { // add user to shortest path path.unshift(user); // prepare for next iteration user = prevPath[user.id]; } // add source user to the path path.unshift(user); // format and return shortestPath shortestPath = _.map(path, 'name').join(' -> '); } Here, we are employing a technique called backtracking, which helps us retrace our steps when the result is found. The idea here is that we add the current state of the iteration to a map whenever the result is not found—the key as the node being visited currently, and the value as the node from which we are visiting. So, for example, if we visited node 1 from node 3, then the map would contain { 1: 3 } until we visit node 1 from some other node, and when that happens, our map will update to point to the new node from which we got to node 1, such as { 1: newNode }. Once we set up these previous paths, we can easily trace our steps back by looking at this map. By adding some log statements (available only in the GitHub code to avoid confusion), we can easily take a look at the long but simple flow of the data. 
Let us take an example of the data set that we defined earlier, so when Bill tries to look for friends who can refer him to Toyota, we see the following log statements: starting the shortest path determination added 3 to the queue marked 3 as visited shortest path not found, moving on to next node in queue: 3 extracting neighbor nodes of node 3 (1,4,5,8) accessing neighbor 1 mark 1 as visited result not found, mark our path from 3 to 1 result not found, add 1 to queue for next iteration current queue content : 3,1 accessing neighbor 4 mark 4 as visited result not found, mark our path from 3 to 4 result not found, add 4 to queue for next iteration current queue content : 3,1,4 accessing neighbor 5 mark 5 as visited result not found, mark our path from 3 to 5 result not found, add 5 to queue for next iteration current queue content : 3,1,4,5 accessing neighbor 8 mark 8 as visited result not found, mark our path from 3 to 8 result not found, add 8 to queue for next iteration current queue content : 3,1,4,5,8 increment tail to 1 shortest path not found, moving on to next node in queue: 1 extracting neighbor nodes of node 1 (2,3,4,5,7) accessing neighbor 2 mark 2 as visited result not found, mark our path from 1 to 2 result not found, add 2 to queue for next iteration current queue content : 3,1,4,5,8,2 accessing neighbor 3 neighbor 3 already visited, return control to top accessing neighbor 4 neighbor 4 already visited, return control to top accessing neighbor 5 neighbor 5 already visited, return control to top accessing neighbor 7 mark 7 as visited result not found, mark our path from 1 to 7 result not found, add 7 to queue for next iteration current queue content : 3,1,4,5,8,2,7 increment tail to 2 shortest path not found, moving on to next node in queue: 4 extracting neighbor nodes of node 4 (1,3,6,8) accessing neighbor 1 neighbor 1 already visited, return control to top accessing neighbor 3 neighbor 3 already visited, return control to top accessing neighbor 6 mark 6 as visited result found at 6, add it to result path ([6]) backtracking steps to 3 we got to 6 from 4 update path accordingly: ([4,6]) add source user 3 to result form result [3,4,6] return result increment tail to 3 return result Bill -> Jose -> Rita What we basically have here is an iterative process using BFS to traverse the tree and backtracking the result. This forms the core of our functionality. Creating a web server We can now add a route to access this graph and its corresponding shortestPath method. 
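Before creating the route, here is a minimal sketch of exercising the class directly; it assumes the sample users array defined earlier and reuses the Bill-to-Toyota lookup from the trace above (the snippet itself is illustrative and not part of the original article):

var _ = require('lodash');
var Graph = require('./utils/graph');

// `users` is the same sample dataset defined earlier in this article
var graph = new Graph(users);
var bill = _.find(users, ['id', 3]);

console.log(graph.shortestPath(bill, 'Toyota')); // Bill -> Jose -> Rita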
Let's first create the route under routes/references and add it as a middleware to the web server: var express = require('express'); var app = express(); var bodyParser = require('body-parser'); // register endpoints var references = require('./routes/references'); // middleware to parse the body of input requests app.use(bodyParser.json()); // route middleware app.use('/references', references); // start server app.listen(3000, function () { console.log('Application listening on port 3000!'); }); Then, create the route as shown in the following code: var express = require('express'); var router = express.Router(); var Graph = require('../utils/graph'); var _ = require('lodash'); var userGraph; // sample set of users with friends // same as list shown earlier var users = [...]; // middleware to create the users graph router.use(function(req) { // form graph userGraph = new Graph(users); // continue to next step req.next(); }); // create the route for generating reference path // this can also be a get request with params based // on developer preference router.route('/') .post(function(req, res) { // take user Id const userId = req.body.userId; // target company name const companyName = req.body.companyName; // extract current user info const user = _.find(users, ['id', userId]); // get shortest path const path = userGraph.shortestPath(user, companyName); // return res.send(path); }); module.exports = router; Running the reference generator To test this, simply start the web server by running the npm start command from the root of the project as shown earlier. Once the server is up and running, you can use any tool you wish to post the request to your web server, as shown in the following screenshot: As you can see in the preceding screenshot, we get the response back as expected. This can, of course, be changed in a way to return all the user objects instead of just the names. That could be a fun extension of the example for you to try on your own. We learned to create a reference generator for a job portal using the Breadth First Search (BFS) algorithm in JavaScript. If you have found this post interesting, do check out this book, Hands-On Data Structures and Algorithms with JavaScript to create and employ various data structures in a way that is demanded by your project or use case.  