How-To Tutorials

7018 Articles
Packt
21 Jan 2016
6 min read

Advanced Fetching

In this article by Ramin Rad, author of the book Mastering Hibernate, we discuss various ways of fetching data from the permanent store, with a focus on the annotations related to data fetching.

Fetching strategy

In the Java Persistence API (JPA), you can provide a hint to fetch the data lazily or eagerly using the FetchType. However, some implementations may ignore the lazy strategy and just fetch everything eagerly. For collection associations, Hibernate's default strategy is FetchType.LAZY, which reduces the memory footprint of your application.

Hibernate offers additional fetch modes on top of the commonly used JPA fetch types. Here, we discuss how they are related and explain when to use which.

JOIN fetch mode

The JOIN fetch mode forces Hibernate to create a SQL join statement that populates both the entities and the related entities using just one SQL statement. The JOIN fetch mode also implies that the fetch type is EAGER, so there is no need to specify the fetch type.
To understand this better, consider the following classes:

    @Entity
    public class Course {
        @Id
        @GeneratedValue
        private long id;
        private String title;

        @OneToMany(cascade=CascadeType.ALL, mappedBy="course")
        @Fetch(FetchMode.JOIN)
        private Set<Student> students = new HashSet<Student>();

        // getters and setters
    }

    @Entity
    public class Student {
        @Id
        @GeneratedValue
        private long id;
        private String name;
        private char gender;

        @ManyToOne
        private Course course;

        // getters and setters
    }

In this case, we are instructing Hibernate to use JOIN to fetch the course and its students in one SQL statement. This is the SQL that Hibernate composes:

    select
        course0_.id as id1_0_0_,
        course0_.title as title2_0_0_,
        students1_.course_id as course_i4_0_1_,
        students1_.id as id1_1_1_,
        students1_.gender as gender2_1_2_,
        students1_.name as name3_1_2_
    from Course course0_
    left outer join Student students1_
        on course0_.id=students1_.course_id
    where course0_.id=?

As you can see, Hibernate uses a left join to fetch all courses and any students that may have signed up for those courses.

Another important thing to note is that if you use HQL, Hibernate will ignore the JOIN fetch mode and you'll have to specify the join in the HQL. (We will discuss HQL in the next section.) In other words, if you fetch a course entity using a statement such as this:

    List<Course> courses = session
        .createQuery("from Course c where c.id = :courseId")
        .setLong("courseId", chemistryId)
        .list();

Then Hibernate will use SELECT mode. If you don't use HQL, as shown in the next example, Hibernate will pay attention to the fetch mode instructions provided by the annotation:

    Course course = (Course) session.get(Course.class, chemistryId);

SELECT fetch mode

In SELECT mode, Hibernate uses an additional SELECT statement to fetch the related entities. This mode doesn't affect the behavior of the fetch type (LAZY, EAGER), so they will work as expected.
To demonstrate this, consider the same example used in the last section and let's examine the output:

    select id, title from Course where id=?
    select course_id, id, gender, name from Student where course_id=?

Note that Hibernate first fetches and populates the Course entity and then uses the course ID to fetch the related students. Also, if your fetch type is set to LAZY and you never reference the related entities, the second SELECT is never executed.

SUBSELECT fetch mode

The SUBSELECT fetch mode is used to minimize the number of SELECT statements executed to fetch the related entities. If you first fetch the owner entities and then try to access the associated owned entities, without SUBSELECT, Hibernate will issue an additional SELECT statement for every one of the owner entities. Using SUBSELECT, you instruct Hibernate to use a SQL sub-select to fetch the owned entities for all of the owners already fetched.

To understand this better, let's explore the following entity classes:

    @Entity
    public class Owner {
        @Id
        @GeneratedValue
        private long id;
        private String name;

        @OneToMany(cascade=CascadeType.ALL, mappedBy="owner")
        @Fetch(FetchMode.SUBSELECT)
        private Set<Car> cars = new HashSet<Car>();

        // getters and setters
    }

    @Entity
    public class Car {
        @Id
        @GeneratedValue
        private long id;
        private String model;

        @ManyToOne
        private Owner owner;

        // getters and setters
    }

If you try to fetch from the Owner table, Hibernate will only issue two select statements: one to fetch the owners and another to fetch the cars for those owners by using a sub-select, as follows:

    select id, name from Owner
    select owner_id, id, model from Car where owner_id in (select id from Owner)

Without the SUBSELECT fetch mode, instead of the second select statement shown above, Hibernate will execute a select statement for every entity returned by the first statement.
This is known as the n+1 problem, where one SELECT statement is executed and then, for each returned entity, another SELECT statement is executed to fetch the associated entities.

Finally, the SUBSELECT fetch mode is not supported for ToOne associations, such as OneToOne or ManyToOne, because it was designed for relationships where the ownership of the entities is clear.

Batch fetching

Another strategy offered by Hibernate is batch fetching. The idea is very similar to SUBSELECT, except that instead of using a sub-select, the entity IDs are explicitly listed in the SQL and the list size is determined by the @BatchSize annotation. This may perform slightly better for smaller batches. (Note that all the commercial database engines also perform query optimization.)

To demonstrate this, let's consider the following entity classes:

    @Entity
    public class Owner {
        @Id
        @GeneratedValue
        private long id;
        private String name;

        @OneToMany(cascade=CascadeType.ALL, mappedBy="owner")
        @BatchSize(size=10)
        private Set<Car> cars = new HashSet<Car>();

        // getters and setters
    }

    @Entity
    public class Car {
        @Id
        @GeneratedValue
        private long id;
        private String model;

        @ManyToOne
        private Owner owner;

        // getters and setters
    }

Using @BatchSize, we are instructing Hibernate to fetch the related entities (cars) using a SQL statement with a where ... in clause that lists the IDs of the relevant owner entities, as shown:

    select id, name from Owner
    select owner_id, id, model from Car where owner_id in (?, ?)

In this case, the first select statement only returned two rows. If it returns more rows than the batch size, there will be multiple select statements to fetch the owned entities, each fetching the cars for up to 10 owners at a time.

Summary

In this article, we covered several ways of fetching datasets from the database: the JOIN, SELECT, and SUBSELECT fetch modes, and batch fetching.
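To put rough numbers on the strategies discussed above, the following sketch counts how many SELECT statements each one would issue when the cars of every fetched owner are accessed. It is an idealized model and not Hibernate code: the class and method names are hypothetical, and it assumes batches are always filled to the configured size, which real Hibernate versions may not do exactly.

```java
// Idealized count of SELECT statements per fetch strategy.
// Assumption: every fetched owner's collection is eventually accessed.
public class FetchQueryCount {

    // SELECT mode with lazy loading: one query for the owners,
    // then one query per owner (the n+1 problem).
    static int selectMode(int owners) {
        return 1 + owners;
    }

    // SUBSELECT mode: one query for the owners, plus a single
    // sub-select query that loads the collections of all of them.
    static int subselectMode(int owners) {
        return owners == 0 ? 1 : 2;
    }

    // Batch fetching: one query for the owners, then one query
    // per batch of owner IDs (ceiling division, assumed full batches).
    static int batchMode(int owners, int batchSize) {
        return 1 + (owners + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        int owners = 25;
        System.out.println("SELECT:    " + selectMode(owners));
        System.out.println("SUBSELECT: " + subselectMode(owners));
        System.out.println("BATCH(10): " + batchMode(owners, 10));
    }
}
```

With two owners and a batch size of 10, the batch model gives two statements, which matches the SQL shown in the batch fetching section.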

Savia Lobo
10 May 2018
11 min read

How to secure a private cloud using IAM

In this article, we look at securing the private cloud using IAM. For IAM, OpenStack uses the Keystone project. Keystone provides the identity, token, catalog, and policy services, which are used specifically by OpenStack services. It is organized as a group of internal services exposed on one or many endpoints. For example, an authentication call validates the user and project credentials with the identity service.

This article is an excerpt from the book 'Cloud Security Automation'. In this book, you'll learn how to work with OpenStack security modules and how private cloud security functions can be automated for better time and cost-effectiveness.

Authentication

Authentication is an integral part of an OpenStack deployment, so we must be careful about the system design. Authentication is the process of confirming a user's identity, that is, confirming that a user is actually who they claim to be; for example, by providing a username and a password when logging into a system. Keystone supports authentication using the username and password, LDAP, and external authentication methods. After successful authentication, the identity service provides the user with an authorization token, which is used for subsequent service requests.

Transport Layer Security (TLS) provides authentication between services and users using X.509 certificates. The default mode for TLS is server-side-only authentication, but we can also use certificates for client authentication.

However, an attacker may also try to access the console by guessing your username and password. If we have not enabled a policy to handle this, the result can be disastrous. For this, we can use a Failed Login Policy, which allows only a maximum number of failed login attempts; after that, the account is blocked for a certain number of hours and the user is notified.
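As a rough illustration of what such a failed login policy does, here is a minimal in-memory sketch. It is not part of Keystone or of any OpenStack API: the class name, method names, and the limits used in main (three attempts, a two-hour lock) are all hypothetical choices for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical failed-login policy: lock an account for a fixed
// duration once it reaches a maximum number of failed attempts.
public class FailedLoginPolicy {
    private final int maxAttempts;
    private final Duration lockDuration;
    private final Map<String, Integer> failures = new HashMap<>();
    private final Map<String, Instant> lockedUntil = new HashMap<>();

    public FailedLoginPolicy(int maxAttempts, Duration lockDuration) {
        this.maxAttempts = maxAttempts;
        this.lockDuration = lockDuration;
    }

    // True while the lock is active; an expired lock is cleared.
    public boolean isLocked(String user, Instant now) {
        Instant until = lockedUntil.get(user);
        if (until == null) {
            return false;
        }
        if (now.isBefore(until)) {
            return true;
        }
        lockedUntil.remove(user);
        failures.remove(user);
        return false;
    }

    // Records a failed attempt. Returns true if this attempt locked
    // the account, which is where a notification would be sent.
    public boolean recordFailure(String user, Instant now) {
        if (isLocked(user, now)) {
            return false;
        }
        int count = failures.merge(user, 1, Integer::sum);
        if (count >= maxAttempts) {
            lockedUntil.put(user, now.plus(lockDuration));
            return true;
        }
        return false;
    }

    // A successful login resets the failure counter.
    public void recordSuccess(String user) {
        failures.remove(user);
    }

    public static void main(String[] args) {
        FailedLoginPolicy policy = new FailedLoginPolicy(3, Duration.ofHours(2));
        Instant t0 = Instant.parse("2018-05-10T09:00:00Z");
        policy.recordFailure("alice", t0);
        policy.recordFailure("alice", t0.plusSeconds(10));
        boolean locked = policy.recordFailure("alice", t0.plusSeconds(20));
        System.out.println("Locked by third failure: " + locked);
        System.out.println("Locked one hour later: "
                + policy.isLocked("alice", t0.plus(Duration.ofHours(1))));
        System.out.println("Locked three hours later: "
                + policy.isLocked("alice", t0.plus(Duration.ofHours(3))));
    }
}
```

A production system would persist this state and guard it against concurrent access; the sketch only shows the counting and expiry logic.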
However, the identity service provided in Keystone does not provide a method to limit access to accounts after repeated unsuccessful login attempts. For this, we need to rely on an external authentication system that locks an account after a configured number of failed login attempts. The account might then only be unlocked with further side-channel intervention, on request, or after a certain duration.

Detection techniques are most useful when a prevention method is also available to limit the damage. In the detection process, we frequently review the access control logs to identify unauthorized attempts to access accounts. If the review reveals any hints of a brute force attack (where an attacker tries to guess the username and password to log in to the system), we can define a stronger username and password or block the source of the attack (IP) through firewall rules. Defining firewall rules on the Keystone node restricts connections, which helps to reduce the attack surface. Apart from this, reviewing access control logs also helps to examine account activity for unusual logins and suspicious actions, so that we can take corrective actions such as disabling the account.

To increase the level of security, we can also use MFA for network access to privileged user accounts. Keystone supports external authentication services through the Apache web server, which can provide this functionality. Servers can also enforce client-side authentication using certificates. This helps to mitigate brute force and phishing attacks that may compromise administrator passwords.

Authentication methods – internal and external

Keystone stores user credentials in a database or may use an LDAP-compliant directory server. The Keystone identity database can be kept separate from the databases used by other OpenStack services to reduce the risk of a compromise of the stored credentials.
When we use the username and password to authenticate, the identity service does not apply policies for password strength, expiration, or failed authentication attempts. For this, we need to implement external authentication services. To integrate an external authentication system or use an existing directory service to manage user accounts, we can use LDAP, which simplifies the integration process.

In OpenStack authentication and authorization, the policy may be delegated to another service. For example, consider an organization that is going to deploy a private cloud and already has a database of employees and users in an LDAP system. Using this LDAP system as an authentication authority, requests to the Identity service (Keystone) are transferred to the LDAP system, which allows or denies requests based on its policies. After successful authentication, the identity service generates a token for access to the authorized services.

Now, if the LDAP system has already defined attributes for the user, such as the admin, finance, and HR departments, these must be mapped into roles and groups within the identity service for use by the various OpenStack services. We need to define this mapping in the Keystone node configuration file stored at /etc/keystone/keystone.conf.

Keystone must not be allowed to write to the LDAP used for authentication outside of the OpenStack scope, as this would allow a sufficiently privileged Keystone user to make changes to the LDAP directory, which is not desirable from a security point of view. It can also lead to unauthorized access to other information and resources. So, if we have other authentication providers such as LDAP or Active Directory, user provisioning should always happen in those authentication provider systems.

For external authentication, we have the following methods:

MFA: The MFA service requires the user to provide additional layers of information for authentication, such as a one-time password token or X.509 certificate (called the MFA token).
Once MFA is implemented, the user has to enter the MFA token after entering the user ID and password for a successful login.

Password policy enforcement: Once the external authentication service is in place, we can require user passwords to conform to minimum standards for length, diversity of characters, expiration, or failed login attempts.

Keystone also supports TLS-based client authentication. TLS client authentication provides an additional authentication factor, apart from the username and password, which provides greater reliability in user identification. It reduces the risk of unauthorized access when usernames and passwords are compromised. However, TLS-based authentication is not cost-effective, as we need a certificate for each of the clients.

Authorization

Keystone also provides the option of groups and roles. Users belong to groups, and a group has a list of roles. All of the OpenStack services, such as Cinder, Glance, Nova, and Horizon, reference the roles of the user attempting to access the service. OpenStack policy enforcers always consider the policy rule associated with each resource and use the user's group or role, and their association, to allow or deny access to the service.

Before configuring roles, groups, and users, we should document the required access control policies for the OpenStack installation. The policies must follow the regulatory or legal requirements of the organization. Additional changes to the access control configuration should be made in line with the formal policies. These policies must include the conditions and processes for creating, deleting, disabling, and enabling accounts, and for assigning privileges to the accounts. They need to be reviewed from time to time to ensure that the configuration complies with the approved policies.
For user creation and administration, there must be a user created with the admin role in Keystone for each OpenStack service. This account provides the service with the authorization to authenticate users. Nova (compute) and Swift (object storage) can be configured to use the Identity service to store authentication information. For a test environment, we can use tempAuth, which records user credentials in a text file, but it is not recommended for a production environment.

The OpenStack administrator must protect sensitive configuration files from unauthorized modification with access control mechanisms such as SELinux (mandatory access control) or DAC (discretionary access control). We also need to protect the Keystone configuration file, which is stored at /etc/keystone/keystone.conf, as well as the X.509 certificates.

It is recommended that cloud admin users authenticate using the identity service (Keystone) and an external authentication service that supports two-factor authentication. Two-factor authentication reduces the risk posed by compromised passwords. It is also recommended by the NIST guideline NIST 800-53 IA-2(1), which defines MFA for network access to privileged accounts, where one factor is provided by a device separate from the system being accessed.

Policy, tokens, and domains

In OpenStack, every service defines the access policies for its resources in a policy file, where a resource can be API access, the ability to create and attach a Cinder volume, or the ability to create an instance. The policy rules are defined in JSON format in a file called policy.json. Only administrators can modify the service-based policy.json file to control access to the various resources. However, we must also ensure that any changes to the access control policies do not unintentionally breach, or create an opportunity to breach, the security of any resource. Any changes made to policy.json are applied immediately and do not require a service restart.
After a user is authenticated, a token is generated for authorization and access to an OpenStack environment. A token can have a variable lifespan, but the default value is 1 hour. It is recommended to set the token lifespan as low as possible while still allowing internal services to complete their tasks within that timeframe. If the token expires before a task completes, the system can become unresponsive. Keystone also supports token revocation. For this, it uses an API to revoke a token and to list the revoked tokens.

In the OpenStack Newton release, there are four supported token types: UUID, PKI, PKIZ, and fernet. After the OpenStack Ocata release, there are two supported token types: UUID and fernet. We'll look at all of these token types in detail here:

UUID: These tokens are persistent tokens. UUID tokens are 32 bytes in length and must be persisted in the backend. They are stored in the Keystone backend, along with the metadata for authentication. All of the clients must pass their UUID token to Keystone (the identity service) in order to validate it.

PKI and PKIZ: These are signed documents that contain the authentication content, as well as the service catalog. The difference between PKI and PKIZ is that PKIZ tokens are compressed to help mitigate the size issues of PKI (PKI tokens sometimes become very long). The length of PKI and PKIZ tokens typically exceeds 1,600 bytes. The Identity service uses public and private key pairs and certificates in order to create and validate these tokens. Both of these token types have been obsolete since the Ocata release.

Fernet: These tokens are the default supported token provider for the OpenStack Pike release. Fernet is a secure messaging format explicitly designed for use in API tokens. Fernet tokens are nonpersistent, lightweight (in the range of 180 to 240 bytes), and reduce the operational overhead.
Authentication and authorization metadata is bundled into a MessagePack-encoded payload, which is then encrypted and signed as a fernet token.

In OpenStack, a Keystone domain is a high-level container for projects, users, and groups. Domains are used to centrally manage all Keystone-based identity components. Compute, storage, and other resources can be logically grouped into multiple projects, which can further be grouped under a master account. Users of different domains can be represented in different authentication backends and have different attributes that must be mapped to a single set of roles and privileges in the policy definitions to access the various service resources. Domain-specific authentication drivers allow the identity service to be configured for multiple domains, using domain-specific configuration files enabled in keystone.conf.

Federated identity

Federated identity enables you to establish trust between identity providers and the cloud environment (OpenStack Cloud). It gives you secure access to cloud resources using your existing identity. You do not need to remember multiple credentials to access your applications.

Now, the question is, what is the reason for using federated identity? This is answered as follows:

  • It enables your security team to manage all of the users (cloud or noncloud) from a single identity application
  • It removes the need to set up a different identity provider for each application, which would otherwise create additional workload for the security team and introduce security risk
  • It makes life easier for users by providing them with a single credential for all of the apps, so that they save the time they would otherwise spend on the forgot password page

Federated identity enables you to have a single sign-on mechanism. We can implement it using SAML 2.0. To do this, you need to run the identity service provider under Apache.

We learned about securing your private cloud and the authentication process therein.
If you've enjoyed this article, do check out 'Cloud Security Automation' for a hands-on experience of automating your cloud security and governance.

Packt
24 Sep 2014
23 min read

Protecting GPG Keys in BeagleBone

In this article by Josh Datko, author of BeagleBone for Secret Agents, you will learn how to use the BeagleBone Black (BBB) to safeguard e-mail encryption keys.

After our investigation into BBB hardware security, we'll now use that technology to protect your personal encryption keys for the popular GPG software. GPG is a free implementation of the OpenPGP standard. This standard was developed based on the work of Philip Zimmermann and his Pretty Good Privacy (PGP) software. PGP has a complex socio-political backstory, which we'll briefly cover before getting into the project. For the project, we'll treat the BBB as a separate cryptographic co-processor and use the CryptoCape, with a keypad code entry device, to protect our GPG keys when they are not in use.

Specifically, we will do the following:

  • Tell you a little about the history and importance of the PGP software
  • Perform basic threat modeling to analyze your project
  • Create a strong PGP key using the free GPG software
  • Teach you to use the TPM to protect encryption keys

History of PGP

The software used in this article would have once been considered a munition by the U.S. Government. Exporting it without a license from the government would have violated the International Traffic in Arms Regulations (ITAR). As late as the early 1990s, cryptography was heavily controlled and restricted. While the early 90s are filled with numerous accounts by crypto-activists, all of which are well documented in Steven Levy's Crypto, there is one man in particular who was the driving force behind the software in this project: Philip Zimmermann.

Philip Zimmermann had a small pet project around the year 1990, which he called Pretty Good Privacy. Motivated by a strong childhood passion for codes and ciphers, combined with a sense of political activism against a government capable of strong electronic surveillance, he set out to create a strong encryption program for the people (Levy 2001).
One incident in particular helped to motivate Zimmermann to finish PGP and publish his work. This was the language that the then U.S. Senator Joseph Biden added to Senate Bill #266, which would mandate that:

"Providers of electronic communication services and manufacturers of electronic communications service equipment shall ensure that communication systems permit the government to obtain the plaintext contents of voice, data, and other communications when appropriately authorized by law."

In 1991, in a rush to release PGP 1.0 before it became illegal, Zimmermann released his software as freeware on the Internet. Subsequently, after PGP spread, the U.S. Government opened a criminal investigation of Zimmermann for violating U.S. export laws. Zimmermann, in what is best described as a legal hack, published the entire source code of PGP as a book, including instructions on how to scan it back into digital form. As Zimmermann describes:

"It would be politically difficult for the Government to prohibit the export of a book that anyone may find in a public library or a bookstore." (Zimmermann, 1995)

A book published in the public domain would no longer fall under ITAR export controls. The genie was out of the bottle; the government dropped its case against Zimmermann in 1996.

Reflecting on the Crypto Wars

Zimmermann's battle is considered a resilient victory. Many other outspoken supporters of strong cryptography, known as cypherpunks, also won battles popularizing and spreading encryption technology. But if the Crypto Wars were won in the early nineties, why hasn't cryptography become ubiquitous? Well, to a degree, it has. When you make purchases online, they should be protected by strong cryptography. Almost nobody would insist that their bank or online store not use cryptography, and most people probably feel more secure because they do.
But what about personal privacy-protecting software? For these tools, habits must change, as the normal e-mail, chat, and web browsing tools are insecure by default. This change causes tension and resistance towards adoption. Also, security tools are notoriously hard to use. In the seminal paper on security usability, researchers conclude that the then-current PGP version 5.0, complete with a Graphical User Interface (GUI), was not able to prevent users, who were inexperienced with cryptography but all of whom had at least some college education, from making catastrophic security errors (Whitten 1999). Glenn Greenwald delayed his initial contact with Edward Snowden for roughly two months because he thought GPG was too complicated to use (Greenwald, 2014). Snowden absolutely refused to share anything with Greenwald until he installed GPG.

GPG and PGP enable an individual to protect their own communications. Implicitly, you must also trust the receiving party not to forward your plaintext communication. GPG expects you to protect your private key and does not rely on a third party. While this adds some complexity and maintenance processes, trusting a third party with your private key can be disastrous. In August of 2013, Ladar Levison courageously decided to shut down his own company, Lavabit, an e-mail provider, rather than turn over his users' data to the authorities. The Lavabit service generated and stored your private key. While this key was encrypted to the user's password, it still enabled the server to have access to the raw key. Even though the Lavabit service relieved users of managing their private key themselves, it put Levison in an awkward position. To use GPG properly, you should never turn over your private key. For a complete analysis of Lavabit, see Moxie Marlinspike's blog post at http://www.thoughtcrime.org/blog/lavabit-critique/.
Given the breadth and depth of state surveillance capabilities, there is a rekindled interest in protecting one's privacy. Researchers are now designing secure protocols with these threats in mind (Borisov, 2014). Philip Zimmermann ended the chapter on Why Do You Need PGP? in the Official PGP User's Guide with the following statement, which is as true today as it was when first inked:

"PGP empowers people to take their privacy into their own hands. There's a growing social need for it."

Developing a threat model

We previously introduced the concept of a threat model. A threat model is an analysis of the security of the system that identifies assets, threats, vulnerabilities, and risks. Like any model, the depth of the analysis can vary. In the upcoming section, we'll present a cursory analysis so that you can start thinking about this process. This analysis will also help us understand the capabilities and limitations of our project.

Outlining the key protection system

The first step of our analysis is to clearly describe the system we are trying to protect. In this project, we'll build a logical GPG co-processor using the BBB and the CryptoCape. We'll store the GPG keys on the BBB and then connect to the BBB over Secure Shell (SSH) to use the keys and to run GPG. The CryptoCape will be used to encrypt your GPG key when it is not in use, which is known as at rest. We'll add a keypad to collect a numeric code, which will be provided to the TPM. This will allow the TPM to unwrap your GPG key.

The idea for this project was inspired by Peter Gutmann's work on open source cryptographic co-processors (Gutmann, 2000). The BBB, when acting as a co-processor to a host, is extremely flexible and, considering its power usage, relatively high in performance. By running sensitive code that has access to cleartext encryption keys on separate hardware, we gain an extra layer of protection (or at the minimum, a layer of indirection).
Identifying the assets we need to protect

Before we can protect anything, we must know what to protect. The most important assets are the GPG private keys. With these keys, an attacker can decrypt past encrypted messages, recover future messages, and use the keys to impersonate you. By protecting your private key, we are also protecting your reputation, which is another asset. Our decrypted messages are also an asset. An attacker may not care about your key if he/she can easily access your decrypted messages. The BBB itself is an asset that needs protecting. If the BBB is rendered inoperable, then an attacker has successfully prevented you from accessing your private keys, which is known as a Denial-of-Service (DoS).

Threat identification

To identify the threats against our system, we need to classify the capabilities of our adversaries. This is a highly personal analysis, but we can generalize our adversaries into three archetypes: a well-funded state actor, a skilled cracker, and a jealous ex-lover. The state actor has nearly limitless resources from both a financial and a personnel point of view. The cracker is a skilled operator, but lacks the funding and resources of the state actor. The jealous ex-lover is not a sophisticated computer attacker, but is very motivated to do you harm.

Unfortunately, if you are the target of directed surveillance from a state actor, you probably have much bigger problems than your GPG keys. This actor can put your entire life under monitoring, and why go through the trouble of stealing your GPG keys when the hidden video camera in the wall records everything on your screen? Also, it's reasonable to assume that everyone you are communicating with is also under surveillance, and it only takes one mistake from one person to reveal your plans for world domination. The adage by Benjamin Franklin is apropos here: "Three may keep a secret if two of them are dead."

However, properly using GPG will protect you from global passive surveillance.
When GPG is used correctly, neither your Internet Service Provider, nor your e-mail provider, nor any passive attacker would learn the contents of your messages. The passive adversary is not going to engage your system, but they could monitor a significant amount of Internet traffic in an attempt to collect it all. Even so, the confidentiality of your messages should remain protected.

We'll assume the cracker trying to harm you is remote and does not have physical access to your BBB. We'll also assume the worst case: that the cracker has compromised your host machine. In this scenario there is, unfortunately, a lot that the cracker can do. He can install a key logger and capture everything, including the password that is typed on your computer. He will not be able to get the code that we'll enter on the BBB; however, he would be able to log in to the BBB when the key is available.

The jealous ex-lover doesn't understand computers very well, but he doesn't need to, because he knows how to use a golf club. He knows that this BBB connected to your computer is somehow important to you because you've talked his ear off about this really cool project that you read about in a book. He can physically destroy the BBB and with it, your private key (and probably the relationship as well!).

Identifying the risks

How likely are the previous risks? The risk of active government surveillance in most countries is fortunately low. However, the consequences of this attack are very damaging. The risk of being caught up in passive surveillance by a state actor, as we have learned from Edward Snowden, is very high. However, by using GPG, we add protection against this threat. An active cracker seeking to harm you is probably unlikely. Contracting keystroke-capturing malware, however, is probably not an unreasonable event. A 2013 study by Microsoft concluded that 8 out of every 1,000 computers were infected with malware.
You may be tempted to play these odds, but let's rephrase this statement: in a group of 125 computers, one is infected with malware. A school or university easily has more computers than this. Lastly, only you can assess the risk of a jealous ex-lover.

For the full Microsoft report, refer to http://blogs.technet.com/b/security/archive/2014/03/31/united-states-malware-infection-rate-more-than-doubles-in-the-first-half-of-2013.aspx.

Mitigating the identified risks

If you find yourself the target of a state, this project alone is not going to help much. We can protect ourselves somewhat from the cracker with two strategies. The first is, instead of connecting the BBB to your laptop or computer, to use the BBB as a standalone machine and transfer files via a microSD card. This is known as an air gap. With a dedicated monitor and keyboard, it is much less likely for software vulnerabilities to break the gap and infect the BBB. However, this comes at a high level of personal inconvenience, depending on how often you encrypt files. If you consider the risk of running the BBB attached to your computer too high, create an air-gapped BBB for maximum protection. If you deem the risk low, because you've hardened your computer and have other protection mechanisms, then keep the BBB attached to the computer.

An air-gapped computer can still be compromised. In 2010, a highly specialized worm known as Stuxnet was able to spread to network-isolated machines through USB flash drives.

The second strategy is to somehow enter the GPG passphrase directly into the BBB without using the host's keyboard. After we complete the project, we'll suggest a mechanism to do this, but it is slightly more complicated. This would eliminate the threat of the key logger, since the PIN is entered directly on the BBB.

The mitigation against the ex-lover is to treat your BBB as you would your own wallet, and not leave it out of your sight.
It's slightly larger than you would want, but it's certainly small enough to fit in a small backpack or briefcase.

Summarizing our threat model

Our threat model, while cursory, illustrates the thought process one should go through before using or developing security technologies. The term threat model is specific to the security industry, but it's really just proper planning. The purpose of this analysis is to find logic bugs and prevent you from spending thousands of dollars on high-tech locks for your front door when you keep your back door unlocked. Now that we understand what we are trying to protect and why it is important to use GPG, let's build the project.

Generating GPG keys

First, we need to install GPG on the BBB. It is most likely already installed, but you can check and install it with the following command:

    sudo apt-get install gnupg gnupg-curl

Next, we need to add a secret key. Those who already have a secret key can import their secret key ring, secring.gpg, into the ~/.gnupg folder. Those who want to create a new key on the BBB should proceed to the upcoming section.

This project assumes some familiarity with GPG. If GPG is new to you, the Free Software Foundation maintains the Email Self-Defense guide, which is a very approachable introduction to the software and can be found at https://emailselfdefense.fsf.org/en/index.html.

Generating entropy

If you decided to create a new key on the BBB, there are a few technicalities we must consider. First of all, GPG will need a lot of random data to generate the keys. The amount of random data available in the kernel is proportional to the amount of entropy that is available. You can check the available entropy with the following command:

    cat /proc/sys/kernel/random/entropy_avail

If this command returns a relatively low number, under 200, then GPG will not have enough entropy to generate a key.
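As a rough illustration of this check, the following Java sketch reads the kernel's entropy estimate and compares it with the threshold of 200 mentioned above. The class and method names are hypothetical (they are not part of GPG or rng-tools), and the sketch assumes a Linux system that exposes /proc/sys/kernel/random/entropy_avail.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of GPG or rng-tools.
class EntropyCheck {
    // Threshold taken from the text: below this value, key generation is likely to stall.
    static final int MIN_ENTROPY = 200;

    // Parses the single integer that the kernel writes to entropy_avail.
    static int parseEntropy(String fileContents) {
        return Integer.parseInt(fileContents.trim());
    }

    // True when the estimate is at or above the threshold.
    static boolean isSufficient(int availableEntropy) {
        return availableEntropy >= MIN_ENTROPY;
    }

    public static void main(String[] args) throws IOException {
        Path source = Paths.get("/proc/sys/kernel/random/entropy_avail");
        if (!Files.isReadable(source)) {
            System.out.println("Entropy estimate is not available on this system");
            return;
        }
        String contents = new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        int available = parseEntropy(contents);
        System.out.println("entropy_avail = " + available
                + (isSufficient(available) ? " (sufficient)" : " (too low for key generation)"));
    }
}
```

Keeping the parsing and the threshold comparison in separate methods makes the check easy to reuse, for example to poll the value while an entropy daemon is starting up.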
On a PC, one can increase the amount of entropy by interacting with the computer, such as typing on the keyboard or moving the mouse. However, such sources of entropy are difficult for embedded systems, and in our current setup, we don't have the luxury of moving a mouse. Fortunately, there are a few tools to help us.

If your BBB is running kernel version 3.13 or later, we can use the hardware random number generator on the AM3358 to help us out. You'll need to install the rng-tools package. Once installed, you can edit /etc/default/rng-tools and add the following line to register the hardware random number generator for rng-tools:

    HRNGDEVICE=/dev/hwrng

After this, you should start the rng-tools daemon with:

    /etc/init.d/rng-tools start

If you don't have /dev/hwrng (and currently, the chips on the CryptoCape do not yet have character device support and aren't available through /dev/hwrng), then you can install haveged. This daemon implements the Hardware Volatile Entropy Gathering and Expansion (HAVEGE) algorithm, the details of which are available at http://www.irisa.fr/caps/projects/hipsor/. This daemon will ensure that the BBB maintains a pool of entropy, which will be sufficient for generating a GPG key on the BBB.

Creating a good gpg.conf file

Before you generate your key, we need to establish some more secure defaults for GPG. As we discussed earlier, it is still not as easy as it should be to use e-mail encryption. Riseup.net, an e-mail provider with a strong social cause, maintains an OpenPGP best practices guide at https://help.riseup.net/en/security/message-security/openpgp/best-practices. This guide details how to harden your GPG configuration and provides the motivation behind each option. It is well worth a read to understand the intricacies of GPG key management.
Jacob Appelbaum maintains an implementation of these best practices, which you should download from https://github.com/ioerror/duraconf/raw/master/configs/gnupg/gpg.conf and save as your ~/.gnupg/gpg.conf file. The configuration is well commented, and you can refer to the best practices guide available at Riseup.net for more information. There are three entries, however, that you should modify.

The first is default-key, which is the fingerprint of your primary GPG key. Later in this article, we'll show you how to retrieve that fingerprint. We can't perform this action now because we don't have a key yet.

The second is keyserver-options ca-cert-file, which is the certificate authority for the keyserver pool. Keyservers host your public keys, and a keyserver pool is a redundant collection of keyservers. The instructions on Riseup.net give the details on how to download and install that certificate.

Lastly, you can use Tor to fetch updates on your keys. The act of requesting a public key from a keyserver signals that you have a potential interest in communicating with the owner of that key. This metadata might be more interesting to a passive adversary than the contents of your message, since it reveals your social network. Tor is well suited to protecting against traffic analysis. You probably don't want to store your GPG keys on the same BBB as your bridge, so a second BBB would help here. On your GPG BBB, you only need to run Tor as a client, which is its default configuration. Then you can update keyserver-options http-proxy to point to your Tor SOCKS proxy running on localhost.

The Electronic Frontier Foundation (EFF) provides some hypothetical examples of the telling nature of metadata, for example: "They (the government) know you called the suicide prevention hotline from the Golden Gate Bridge. But the topic of the call remains a secret." Refer to the EFF blog post at https://www.eff.org/deeplinks/2013/06/why-metadata-matters for more details.
Generating the key

Now you can generate your GPG key. Follow the on-screen instructions and don't include a comment. Depending on your entropy source, this could take a while. This example took 10 minutes using haveged as the entropy collector. There are various opinions on what to set as the expiration date. If this is your first GPG key, try one year at first. You can always make a new key or extend the same one. If you set the key to never expire and you lose the key, for example by forgetting the passphrase, people will still think it's valid unless you revoke it. Also, be sure to set the user ID to a name that matches some sort of identification, which will make it easier for people to verify that the holder of the private key is the person named on a certified piece of paper.

The command to create a new key is gpg --gen-key:

    Please select what kind of key you want:
       (1) RSA and RSA (default)
       (2) DSA and Elgamal
       (3) DSA (sign only)
       (4) RSA (sign only)
    Your selection? 1
    RSA keys may be between 1024 and 4096 bits long.
    What keysize do you want? (2048) 4096
    Requested keysize is 4096 bits
    Please specify how long the key should be valid.
             0 = key does not expire
          <n>  = key expires in n days
          <n>w = key expires in n weeks
          <n>m = key expires in n months
          <n>y = key expires in n years
    Key is valid for? (0) 1y
    Key expires at Sat 06 Jun 2015 10:07:07 PM UTC
    Is this correct? (y/N) y

    You need a user ID to identify your key; the software constructs the user ID
    from the Real Name, Comment and Email Address in this form:
        "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

    Real name: Tyrone Slothrop
    Email address: tyrone.slothrop@yoyodyne.com
    Comment:
    You selected this USER-ID:
        "Tyrone Slothrop <tyrone.slothrop@yoyodyne.com>"

    Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
    You need a Passphrase to protect your secret key.

    We need to generate a lot of random bytes.
    It is a good idea to perform some other action (type on the keyboard,
    move the mouse, utilize the disks) during the prime generation; this
    gives the random number generator a better chance to gain enough entropy.
    ......+++++
    ..+++++

    gpg: key 0xABD9088171345468 marked as ultimately trusted
    public and secret key created and signed.

    gpg: checking the trustdb
    gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
    gpg: depth: 0 valid:   1 signed:   0 trust: 0-, 0q, 0n, 0m, 0f, 1u
    gpg: next trustdb check due at 2015-06-06
    pub   4096R/0xABD9088171345468 2014-06-06 [expires: 2015-06-06]
          Key fingerprint = CBF9 1404 7214 55C5 C477 B688 ABD9 0881 7134 5468
    uid                 [ultimate] Tyrone Slothrop <tyrone.slothrop@yoyodyne.com>
    sub   4096R/0x9DB8B6ACC7949DD1 2014-06-06 [expires: 2015-06-06]

    gpg --gen-key 320.62s user 0.32s system 51% cpu 10:23.26 total

From this example, we know that the ID of our secret key is 0xABD9088171345468. If you end up creating multiple keys, but use just one of them more regularly, you can edit your gpg.conf file and add the following line:

    default-key 0xABD9088171345468

Postgeneration maintenance

In order for people to send you encrypted messages, they need to know your public key. Having your public key on a keyserver can help distribute it. You can post your key as follows, replacing the ID shown with your primary key ID:

    gpg --send-keys 0xABD9088171345468

GPG does not rely on third parties and expects you to perform key management. To ease this burden, the OpenPGP standards define the Web-of-Trust as a mechanism to verify other users' keys. Details on how to participate in the Web-of-Trust can be found in the GPG Privacy Handbook at https://www.gnupg.org/gph/en/manual/x334.html.

You are also going to want to create a revocation certificate. A revocation certificate is needed when you want to revoke your key. You would do this when the key has been compromised, say if it was stolen.
Or more likely, if the BBB fails and you can no longer access your key. Generate the certificate and follow the ensuing prompts, replacing the ID with your key ID:

    gpg --output revocation-certificate.asc --gen-revoke 0xABD9088171345468

    sec  4096R/0xABD9088171345468 2014-06-06 Tyrone Slothrop <tyrone.slothrop@yoyodyne.com>

    Create a revocation certificate for this key? (y/N) y
    Please select the reason for the revocation:
      0 = No reason specified
      1 = Key has been compromised
      2 = Key is superseded
      3 = Key is no longer used
      Q = Cancel
    (Probably you want to select 1 here)
    Your decision? 0
    Enter an optional description; end it with an empty line:
    >
    Reason for revocation: No reason specified
    (No description given)
    Is this okay? (y/N) y

    You need a passphrase to unlock the secret key for
    user: "Tyrone Slothrop <tyrone.slothrop@yoyodyne.com>"
    4096-bit RSA key, ID 0xABD9088171345468, created 2014-06-06

    ASCII armored output forced.
    Revocation certificate created.

    Please move it to a medium which you can hide away; if Mallory gets
    access to this certificate he can use it to make your key unusable.
    It is smart to print this certificate and store it away, just in case
    your media become unreadable. But have some caution: The print system of
    your machine might store the data and make it available to others!

Do take the advice and move this file off the BeagleBone. Printing it out and storing it somewhere safe is a good option, or burn it to a CD.

The lifespan of a CD or DVD may not be as long as you think. The United States National Archives Frequently Asked Questions (FAQ) page on optical storage media states that: "CD/DVD experiential life expectancy is 2 to 5 years even though published life expectancies are often cited as 10 years, 25 years, or longer." Refer to their website http://www.archives.gov/records-mgmt/initiatives/temp-opmedia-faq.html for more details.
Lastly, create an encrypted backup of your encryption key and consider storing that in a safe location on durable media.

Using GPG

With your GPG private key created or imported, you can now use GPG on the BBB as you would on any other computer. You may have already installed Emacs on your host computer. If you follow the GNU/Linux instructions, you can also install Emacs on the BBB. If you do, you'll enjoy automatic GPG encryption and decryption for files that end in the .gpg extension. For example, suppose you want to send a message to your good friend, Pirate Prentice, whose GPG key you already have. Compose your message in Emacs, and then save it with a .gpg extension. Emacs will prompt you to select the public keys for encryption and will automatically encrypt the buffer. If a message is encrypted to a public key for which you hold the corresponding private key, Emacs will automatically decrypt it when the file name ends with .gpg. When using Emacs from the terminal, the prompt for encryption should look like the following screenshot:

[Screenshot: Emacs key-selection prompt in the terminal]

Summary

This article showed how GPG can protect e-mail confidentiality.

Packt
15 Sep 2015
24 min read

DynamoDB Best Practices

In this article by Tanmay Deshpande, the author of the book DynamoDB Cookbook, we will cover the following topics:

• Using a standalone cache for frequently accessed items
• Using the AWS ElastiCache for frequently accessed items
• Compressing large data before storing it in DynamoDB
• Using AWS S3 for storing large items
• Catching DynamoDB errors
• Performing auto-retries on DynamoDB errors
• Performing atomic transactions on DynamoDB tables
• Performing asynchronous requests to DynamoDB

Introduction

We are going to talk about DynamoDB implementation best practices, which will help you improve performance while reducing the operating cost. So let's get started.

Using a standalone cache for frequently accessed items

In this recipe, we will see how to use a standalone cache for frequently accessed items. A cache is a temporary data store that keeps items in memory and serves them from memory instead of making a DynamoDB call. Note that this should be used only for items that you do not expect to change frequently.

Getting ready

We will perform this recipe using Java libraries, so the prerequisite is that you have already performed recipes that use the AWS SDK for Java.

How to do it…

Here, we will be using the AWS SDK for Java, so create a Maven project with the SDK dependency. Apart from the SDK, we will also be using one of the most widely used open source caches, that is, EhCache. To know about EhCache, refer to http://ehcache.org/.
Let's use a standalone cache for frequently accessed items:

To use EhCache, we need to include the following repository in pom.xml:

    <repositories>
      <repository>
        <id>sourceforge</id>
        <name>sourceforge</name>
        <url>https://oss.sonatype.org/content/repositories/sourceforge-releases/</url>
      </repository>
    </repositories>

We will also need to add the following dependency:

    <dependency>
      <groupId>net.sf.ehcache</groupId>
      <artifactId>ehcache</artifactId>
      <version>2.9.0</version>
    </dependency>

Once the project setup is done, we will create a cache manager class, which will be used in the following code:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;

    public class ProductCacheManager {
        // Ehcache cache manager
        CacheManager cacheManager = CacheManager.getInstance();
        private Cache productCache;

        public Cache getProductCache() {
            return productCache;
        }

        // Create an instance of cache using cache manager
        public ProductCacheManager() {
            cacheManager.addCache("productCache");
            this.productCache = cacheManager.getCache("productCache");
        }

        // Release the cache manager's resources when the application is done
        public void shutdown() {
            cacheManager.shutdown();
        }
    }

Now, we will create another class where we will write the code to get the item from DynamoDB. Here, we will first instantiate the ProductCacheManager:

    static ProductCacheManager cacheManager = new ProductCacheManager();

Next, we will write a method to get the item from DynamoDB. Before we fetch the data from DynamoDB, we will first check whether the item with the given key is available in cache. If it is available in cache, we will return it from cache itself. If the item is not found in cache, we will first fetch it from DynamoDB and immediately put it into cache.
Once the item is cached, every time we need this item, we will get it from cache, unless the cached item is evicted:

    private static Item getItem(int id, String type) {
        Item product = null;
        // The cache key combines the hash key and the range key
        if (cacheManager.getProductCache().isKeyInCache(id + ":" + type)) {
            Element prod = cacheManager.getProductCache().get(id + ":" + type);
            product = (Item) prod.getObjectValue();
            System.out.println("Returning from Cache");
        } else {
            // Cache miss: read the item from DynamoDB and cache it
            AmazonDynamoDBClient client = new AmazonDynamoDBClient(
                new ProfileCredentialsProvider());
            client.setRegion(Region.getRegion(Regions.US_EAST_1));
            DynamoDB dynamoDB = new DynamoDB(client);
            Table table = dynamoDB.getTable("product");
            product = table.getItem(new PrimaryKey("id", id, "type", type));
            cacheManager.getProductCache().put(
                new Element(id + ":" + type, product));
            System.out.println("Making DynamoDB Call for getting the item");
        }
        return product;
    }

Now we can use this method whenever needed. Here is how we can test it:

    Item product = getItem(10, "book");
    System.out.println("First call :Item: " + product);
    Item product1 = getItem(10, "book");
    System.out.println("Second call :Item: " + product1);
    cacheManager.shutdown();

How it works…

EhCache is one of the most popular standalone caches used in the industry. Here, we are using EhCache to store frequently accessed items from the product table. The cache keeps all its data in memory, and each cached item is stored against its key. The product table has a composite key made up of a hash key and a range key, so we store each item against the combination of the two (hash key and range key).

Note that caching should be used only for tables that receive few updates, ideally tables that hold static data. If you use a cache for tables that are not so static, you will get stale data.

You can also go to the next level and implement a time-based cache, which holds the data for a certain time and then clears it.
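A time-based cache can be sketched in plain Java as follows. This is only a minimal illustration under stated assumptions: the ExpiringCache class and its methods are hypothetical and are not the EhCache API, and the current time is passed in explicitly so that the expiry behavior is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal time-based cache for illustration; not the EhCache API.
class ExpiringCache<K, V> {
    // A cached value together with the time at which it stops being valid.
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;

    ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Stores the value and records when it expires.
    synchronized void put(K key, V value, long nowMillis) {
        entries.put(key, new Entry<>(value, nowMillis + ttlMillis));
    }

    // Returns the cached value, or null if it is missing or has expired.
    synchronized V get(K key, long nowMillis) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            entries.remove(key); // drop the stale entry so the caller reloads it
            return null;
        }
        return entry.value;
    }

    public static void main(String[] args) {
        ExpiringCache<String, String> cache = new ExpiringCache<>(1000);
        cache.put("10:book", "cached product", 0);
        System.out.println(cache.get("10:book", 500));  // still valid
        System.out.println(cache.get("10:book", 1500)); // expired, prints null
    }
}
```

In a real application the caller would pass System.currentTimeMillis() and treat a null result as a cache miss, falling back to DynamoDB in the same way as the getItem method does.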
We can also implement eviction algorithms, such as Least Recently Used (LRU) or First In First Out (FIFO), to make the cache more efficient. Here, we make comparatively fewer calls to DynamoDB and ultimately save some cost.

Using AWS ElastiCache for frequently accessed items

In this recipe, we will do the same thing that we did in the previous recipe. The only thing we will change is that we will use a cloud-hosted distributed caching solution instead of a local standalone cache. ElastiCache is a hosted caching solution provided by Amazon Web Services. There are two caching technologies to choose from: Memcached and Redis. Depending upon your requirements, you can decide which one to use. Here are links that will help you with more information on the two options:

http://memcached.org/
http://redis.io/

Getting ready

To get started with this recipe, we will need to have an ElastiCache cluster launched. If you are not aware of how to do it, you can refer to http://aws.amazon.com/elasticache/.

How to do it…

Here, I am using the Memcached cluster. You can choose the size of the instance as you wish. We will need a Memcached client to access the cluster. Amazon has provided a compiled version of the Memcached client, which can be downloaded from https://github.com/amazonwebservices/aws-elasticache-cluster-client-memcached-for-java. Once the JAR download is complete, you can add it to your Java project class path.

To start with, we will need to get the configuration endpoint of the Memcached cluster that we launched. This configuration endpoint can be found on the AWS ElastiCache console itself.
Here is how we can save the configuration endpoint and port:

    static String configEndpoint = "my-elastic-cache.mlvymb.cfg.usw2.cache.amazonaws.com";
    static Integer clusterPort = 11211;

Similarly, we can instantiate the Memcached client:

    static MemcachedClient client;
    static {
        try {
            client = new MemcachedClient(new InetSocketAddress(configEndpoint, clusterPort));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

Now, we can write the getItem method as we did for the previous recipe. Here, we will first check whether the item is present in cache; if not, we will fetch it from DynamoDB, and put it into cache. If the same request comes the next time, we will return it from the cache itself. While putting the item into cache, we also set its expiry time. We are going to set it to 3,600 seconds; that is, after 1 hour, the key entry will be deleted automatically:

    private static Item getItem(int id, String type) {
        Item product = null;
        if (null != client.get(id + ":" + type)) {
            System.out.println("Returning from Cache");
            return (Item) client.get(id + ":" + type);
        } else {
            // This local DynamoDB client shadows the static Memcached client
            AmazonDynamoDBClient client = new AmazonDynamoDBClient(
                new ProfileCredentialsProvider());
            client.setRegion(Region.getRegion(Regions.US_EAST_1));
            DynamoDB dynamoDB = new DynamoDB(client);
            Table table = dynamoDB.getTable("product");
            product = table.getItem(new PrimaryKey("id", id, "type", type));
            System.out.println("Making DynamoDB Call for getting the item");
            // Cache the item for one hour; the Memcached client is reached
            // through the enclosing class, ElasticCache
            ElasticCache.client.add(id + ":" + type, 3600, product);
        }
        return product;
    }

How it works…

A distributed cache works in the same fashion as a local one. A standalone cache keeps the data in memory and returns it if it finds the key. In a distributed cache, we have multiple nodes, and the keys are spread across them based on the hash value of each key. So, when a request comes in, it is routed to the node responsible for that key and the value is returned from there.
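The idea of dividing keys among nodes by their hash value can be sketched as follows. This is a simplified illustration only: NodeSelector and its node names are hypothetical, and real Memcached clients typically use more elaborate schemes such as consistent hashing so that adding or removing a node does not remap most keys.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of spreading keys over cache nodes by hash value.
class NodeSelector {
    private final List<String> nodes;

    NodeSelector(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        this.nodes = nodes;
    }

    // Maps a key to a node index; floorMod keeps the result non-negative
    // even when hashCode() is negative.
    int indexFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // The same key always resolves to the same node.
    String nodeFor(String key) {
        return nodes.get(indexFor(key));
    }

    public static void main(String[] args) {
        NodeSelector selector = new NodeSelector(Arrays.asList("node-a", "node-b", "node-c"));
        for (String key : new String[] {"10:book", "250:phone", "10:mobile"}) {
            System.out.println(key + " -> " + selector.nodeFor(key));
        }
    }
}
```

Because the mapping depends only on the key and the node list, every client that shares the same node list sends a given key to the same node, which is what lets reads find what earlier writes stored.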
Note that ElastiCache gives you faster retrieval of items at the additional cost of the ElastiCache cluster. Also note that the preceding code will work only if you execute the application from an EC2 instance. If you try to execute it on a local machine, you will get connection errors.

Compressing large data before storing it in DynamoDB

We are all aware of DynamoDB's limit on item size. Suppose that we get into a situation where storing large attributes in an item is a must. In that case, it's always a good choice to compress these attributes, and then save them in DynamoDB. In this recipe, we are going to see how to compress large items before storing them.

Getting ready

To get started with this recipe, you should have your workstation ready with Eclipse or any other IDE of your choice.

How to do it…

There are numerous algorithms with which we can compress large items, for example, GZIP, LZO, BZ2, and so on. Each algorithm has a trade-off between compression time and compression ratio. So, it's your choice whether to go with a faster algorithm or with an algorithm that provides a higher compression ratio.

Consider a scenario in our e-commerce website, where we need to save the product reviews written by various users. For this, we created a ProductReviews table, where we will save the reviewer's name, the detailed product review, and the time when the review was submitted. Here, the product review messages can be large, and it would not be a good idea to store them as they are. So, it is important to understand how to compress these messages before storing them.

Let's see how to compress large data:

First of all, we will write a method that accepts the string input and returns the compressed byte buffer. Here, we are using the GZIP algorithm for compression.
Java has built-in support for GZIP, so we don't need to use any third-party library for this:

    private static ByteBuffer compressString(String input)
            throws UnsupportedEncodingException, IOException {
        // Write the input as GZIP output stream using UTF-8 encoding
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPOutputStream os = new GZIPOutputStream(baos);
        os.write(input.getBytes("UTF-8"));
        os.finish();
        byte[] compressedBytes = baos.toByteArray();
        // Writing bytes to byte buffer
        ByteBuffer buffer = ByteBuffer.allocate(compressedBytes.length);
        buffer.put(compressedBytes, 0, compressedBytes.length);
        buffer.position(0);
        return buffer;
    }

Now, we can simply use this method to compress the data before saving it in DynamoDB. Here is an example of how to use this method in our code:

    private static void putReviewItem()
            throws UnsupportedEncodingException, IOException {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient(
            new ProfileCredentialsProvider());
        client.setRegion(Region.getRegion(Regions.US_EAST_1));
        DynamoDB dynamoDB = new DynamoDB(client);
        Table table = dynamoDB.getTable("ProductReviews");
        Item product = new Item()
            .withPrimaryKey(new PrimaryKey("id", 10))
            .withString("reviewerName", "John White")
            .withString("dateTime", "20-06-2015T08:09:30")
            // The review text is stored as a compressed binary attribute
            .withBinary("reviewMessage", compressString("My Review Message"));
        PutItemOutcome outcome = table.putItem(product);
        System.out.println(outcome.getPutItemResult());
    }

In a similar way, we can write a method that decompresses the data on retrieval from DynamoDB.
Here is an example:

    private static String uncompressString(ByteBuffer input) throws IOException {
        byte[] bytes = input.array();
        ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        GZIPInputStream is = new GZIPInputStream(bais);
        int chunkSize = 1024;
        byte[] buffer = new byte[chunkSize];
        int length = 0;
        // Read the decompressed stream chunk by chunk until it is exhausted
        while ((length = is.read(buffer, 0, chunkSize)) != -1) {
            baos.write(buffer, 0, length);
        }
        return new String(baos.toByteArray(), "UTF-8");
    }

How it works…

Compressing data at the client side has numerous advantages. A smaller size means less use of network and disk resources. Compression algorithms generally maintain a dictionary of words. While compressing, if they see words being repeated, those words are replaced by their positions in the dictionary. In this way, the redundant data is eliminated and only the references are kept in the compressed string. While uncompressing the same data, the word references are replaced with the actual words, and we get our normal string back. Different compression algorithms use different techniques, so the algorithm you choose will depend on your needs.

Using AWS S3 for storing large items

Sometimes, we might get into a situation where storing data in a compressed format is not sufficient. Consider a case where we need to store large images or binaries that exceed DynamoDB's per-item storage limit. In this case, we can use AWS S3 to store such items and only save the S3 location in our DynamoDB table.

AWS S3 (Simple Storage Service) allows us to store data cheaply and efficiently. To know more about AWS S3, you can visit http://aws.amazon.com/s3/.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

Consider a case in our e-commerce website where we would like to store the product images along with the product data.
So, we will save the images on AWS S3, and only store their locations along with the product information in the product table:

First of all, we will see how to store data in AWS S3. For this, we need to go to the AWS console, and create an S3 bucket. Here, I created a bucket called e-commerce-product-images, and inside this bucket, I created folders to store the images. For example, /phone/apple/iphone6.

Now, let's write the code to upload the images to S3:

    private static void uploadFileToS3() {
        String bucketName = "e-commerce-product-images";
        String keyName = "phone/apple/iphone6/iphone.jpg";
        // Backslashes must be escaped in Java string literals
        String uploadFileName = "C:\\tmp\\iphone.jpg";
        // Create an instance of S3 client
        AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
        // Start the file uploading
        File file = new File(uploadFileName);
        s3client.putObject(new PutObjectRequest(bucketName, keyName, file));
    }

Once the file is uploaded, you can save its path in one of the attributes of the product table, as follows:

    private static void putItemWithS3Link() {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient(
            new ProfileCredentialsProvider());
        client.setRegion(Region.getRegion(Regions.US_EAST_1));
        DynamoDB dynamoDB = new DynamoDB(client);
        Table table = dynamoDB.getTable("productTable");
        Map<String, String> features = new HashMap<String, String>();
        features.put("camera", "13MP");
        features.put("intMem", "16GB");
        features.put("processor", "Dual-Core 1.4 GHz Cyclone (ARM v8-based)");
        // Only the S3 URL of each image is stored in DynamoDB
        Set<String> imagesSet = new HashSet<String>();
        imagesSet.add("https://s3-us-west-2.amazonaws.com/e-commerce-product-images/phone/apple/iphone6/iphone.jpg");
        Item product = new Item()
            .withPrimaryKey(new PrimaryKey("id", 250, "type", "phone"))
            .withString("mnfr", "Apple").withNumber("stock", 15)
            .withString("name", "iPhone 6").withNumber("price", 45)
            .withMap("features", features)
            .withStringSet("productImages", imagesSet);
        PutItemOutcome outcome = table.putItem(product);
        System.out.println(outcome.getPutItemResult());
    }

So whenever required, we can fetch the item by its key, and fetch the actual images from S3 using the URL saved in the productImages attribute.

How it works…

AWS S3 provides storage at very low rates. It's like a flat data dumping ground where we can store any type of file. So, it's always a good option to store large datasets in S3 and only keep their URL references in DynamoDB attributes. The URL reference is the connecting link between the DynamoDB item and the S3 file. If your file is too large to be sent in one S3 client call, you may want to explore the S3 multipart upload API, which allows you to send the file in chunks.

Catching DynamoDB errors

Till now, we discussed how to perform various operations in DynamoDB. We saw how to use the SDK provided by AWS and work with DynamoDB items and attributes. Amazon claims that AWS provides high availability and reliability, which is quite true considering my years of experience using their services, but we still cannot rule out the possibility that services such as DynamoDB might not perform as expected. So, it's important to have a proper error-catching mechanism in place so that we can recover when something goes wrong. In this recipe, we are going to see how to catch such errors.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

Catching errors in DynamoDB is quite easy. Whenever we perform any operation, we need to put it in a try block. Along with it, we need to put a couple of catch blocks in order to catch the errors.
Here, we will consider a simple operation to put an item into the DynamoDB table:

    try {
        AmazonDynamoDBClient client = new AmazonDynamoDBClient(
            new ProfileCredentialsProvider());
        client.setRegion(Region.getRegion(Regions.US_EAST_1));
        DynamoDB dynamoDB = new DynamoDB(client);
        Table table = dynamoDB.getTable("productTable");
        Item product = new Item()
            .withPrimaryKey(new PrimaryKey("id", 10, "type", "mobile"))
            .withString("mnfr", "Samsung").withNumber("stock", 15)
            .withBoolean("isProductionStopped", true)
            .withNumber("price", 45);
        PutItemOutcome outcome = table.putItem(product);
        System.out.println(outcome.getPutItemResult());
    } catch (AmazonServiceException ase) {
        // The request reached AWS, but the service returned an error
        System.out.println("Error Message: " + ase.getMessage());
        System.out.println("HTTP Status Code: " + ase.getStatusCode());
        System.out.println("AWS Error Code: " + ase.getErrorCode());
        System.out.println("Error Type: " + ase.getErrorType());
        System.out.println("Request ID: " + ase.getRequestId());
    } catch (AmazonClientException e) {
        // The failure happened on the client side
        System.out.println("Amazon Client Exception: " + e.getMessage());
    }

We should first catch AmazonServiceException, which is thrown when the service you are trying to access returns an error. AmazonClientException should be put last in order to catch any client-related exceptions; because AmazonServiceException is a subclass of AmazonClientException, placing the more general catch block first would not compile.

How it works…

Amazon assigns a unique request ID to each and every request that it receives. Keeping this request ID is very important: if something goes wrong and you would like to know what happened, this request ID is the only source of information. To learn more about a failed request, we need to contact Amazon and quote its request ID.

There are two types of errors in AWS:

Client errors: These errors normally occur when the request we submit is incorrect. Client errors are reported with a 4XX status code. They normally occur when there is an authentication failure, a bad request, a missing required attribute, or when the provisioned throughput is exceeded. These errors also occur when users provide invalid inputs.

Server errors: These errors occur at runtime when something goes wrong on Amazon's side. The only way to handle such errors is to retry; if the retries do not succeed, you should log the request ID and then contact Amazon support with that ID to learn more.

You can read more about DynamoDB specific errors at http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ErrorHandling.html.

Performing auto-retries on DynamoDB errors

As mentioned in the previous recipe, we can perform auto-retries on DynamoDB requests if we get errors. In this recipe, we are going to see how to perform auto-retries.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

Auto-retries are required if we get an error on the first request. We can use the Amazon client configuration to set our retry strategy. By default, the DynamoDB client auto-retries a failed request up to three times. If we think that this is not sufficient for us, then we can define our own strategy, as follows:

First of all, we need to create a custom implementation of RetryCondition. It contains a method called shouldRetry, which we need to implement as per our needs. Here is a sample CustomRetryCondition class:

    public class CustomRetryCondition implements RetryCondition {
        @Override
        public boolean shouldRetry(AmazonWebServiceRequest originalRequest,
                AmazonClientException exception, int retriesAttempted) {
            // Retry at most three times, and only for retryable errors
            return retriesAttempted < 3 && exception.isRetryable();
        }
    }

Similarly, we can implement CustomBackoffStrategy. The back-off strategy determines how long to wait before the request is retried.
You can choose either a flat back-off time or an exponential back-off time:

    public class CustomBackoffStrategy implements BackoffStrategy {
        /** Base sleep time (milliseconds) **/
        private static final int SCALE_FACTOR = 25;
        /** Maximum exponential back-off time before retrying a request */
        private static final int MAX_BACKOFF_IN_MILLISECONDS = 20 * 1000;

        @Override
        public long delayBeforeNextRetry(AmazonWebServiceRequest originalRequest,
                AmazonClientException exception, int retriesAttempted) {
            if (retriesAttempted < 0)
                return 0;
            // Shift a long and clamp the exponent so the delay cannot overflow
            long delay = (1L << Math.min(retriesAttempted, 30)) * SCALE_FACTOR;
            delay = Math.min(delay, MAX_BACKOFF_IN_MILLISECONDS);
            return delay;
        }
    }

Next, we need to create an instance of RetryPolicy and pass it instances of the RetryCondition and BackoffStrategy classes that we created. Apart from this, we can also set a maximum number of retries. The last parameter is honorMaxErrorRetryInClientConfig. It indicates whether this retry policy should honor the maximum error retry set by ClientConfiguration.setMaxErrorRetry(int):

    RetryPolicy retryPolicy = new RetryPolicy(new CustomRetryCondition(),
        new CustomBackoffStrategy(), 3, false);

Now, create the ClientConfiguration and set the RetryPolicy we created earlier:

    ClientConfiguration clientConfiguration = new ClientConfiguration();
    clientConfiguration.setRetryPolicy(retryPolicy);

Now, we need to pass this client configuration when we create the AmazonDynamoDBClient; once done, your retry policy with a custom back-off strategy will be in place:

    AmazonDynamoDBClient client = new AmazonDynamoDBClient(
        new ProfileCredentialsProvider(), clientConfiguration);

How it works…

Auto-retries are quite handy when we receive a sudden burst of DynamoDB requests. If the number of requests exceeds the provisioned throughput, then auto-retries with an exponential back-off strategy can help in handling the load.
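To make the shape of the exponential back-off concrete, here is a minimal stand-alone sketch of the same delay calculation. It deliberately does not use the AWS SDK: the class name BackoffSketch and the constant names are hypothetical, chosen only for illustration, and the base delay (25 ms) and cap (20 seconds) are taken from the recipe above.

```java
/** Stand-alone illustration of capped exponential back-off (no AWS SDK involved). */
final class BackoffSketch {
    /** Base delay in milliseconds (same value as SCALE_FACTOR in the recipe). */
    static final long SCALE_FACTOR_MS = 25;
    /** Upper bound on any single delay, in milliseconds. */
    static final long MAX_BACKOFF_MS = 20 * 1000;

    /** Returns the delay to wait before the retry with the given 0-based index. */
    static long delayBeforeNextRetry(int retriesAttempted) {
        if (retriesAttempted < 0) {
            return 0;
        }
        // Clamp the exponent so the shift and the multiplication cannot overflow a long.
        int exponent = Math.min(retriesAttempted, 30);
        long delay = (1L << exponent) * SCALE_FACTOR_MS;
        return Math.min(delay, MAX_BACKOFF_MS);
    }

    public static void main(String[] args) {
        // Print the delay schedule for the first few retries.
        for (int attempt = 0; attempt <= 10; attempt++) {
            System.out.println("retry " + attempt + " -> "
                + delayBeforeNextRetry(attempt) + " ms");
        }
    }
}
```

With these values the delay doubles on each attempt (25 ms, 50 ms, 100 ms, and so on) until it reaches the 20-second cap, so a short burst is retried quickly while a sustained overload is retried progressively less often.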
So if the client gets an exception, the request is automatically retried after some time; if the load has dropped by then, your application loses nothing. The Amazon DynamoDB client internally uses HttpClient to make the calls, which is quite a popular and reliable implementation. So if you need to handle such cases, this kind of implementation is a must. In the case of batch operations, if any failure occurs, DynamoDB does not fail the complete operation. In the case of batch write operations, if a particular operation fails, then DynamoDB returns the unprocessed items, which can be retried.

Performing atomic transactions on DynamoDB tables

I hope we are all aware that operations in DynamoDB are eventually consistent. Given this nature, DynamoDB does not support transactions the way an RDBMS does. A transaction is a group of operations that need to be performed in one go, and they should be handled atomically. (If one operation fails, the complete transaction should be rolled back.) There might be use cases where you need to perform transactions in your application. To address this need, AWS provides an open source, client-side transaction library, which helps us achieve atomic transactions in DynamoDB. In this recipe, we are going to see how to perform transactions on DynamoDB.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.

How to do it…

To get started, we first need to download the source code of the library from GitHub and build the code to generate the JAR file. You can download the code from https://github.com/awslabs/dynamodb-transactions/archive/master.zip. Next, extract the code and run the following command to generate the JAR file:

    mvn clean install -DskipTests

On a successful build, you will see a generated JAR file in the target folder.
Add this JAR to the project by choosing Configure Build Path in Eclipse.

Now, let's understand how to use transactions. For this, we need to create the DynamoDB client and use this client to create two helper tables. The first table is the Transactions table, which stores the transactions, while the second table is the TransactionImages table, which keeps snapshots of the items modified in the transaction:

    AmazonDynamoDBClient client = new AmazonDynamoDBClient(
        new ProfileCredentialsProvider());
    client.setRegion(Region.getRegion(Regions.US_EAST_1));
    // Create transaction table
    TransactionManager.verifyOrCreateTransactionTable(client,
        "Transactions", 10, 10, (long) (10 * 60));
    // Create transaction images table
    TransactionManager.verifyOrCreateTransactionImagesTable(client,
        "TransactionImages", 10, 10, (long) (60 * 10));

Next, we need to create a transaction manager by providing the names of the tables we created earlier:

    TransactionManager txManager = new TransactionManager(client,
        "Transactions", "TransactionImages");

Now, we create one transaction and perform the operations that need to happen in one go. Consider our product table, where we need to add two new products in a single transaction, and the changes should be reflected only if both operations are successful.
We can perform these using transactions, as follows:

    Transaction t1 = txManager.newTransaction();
    Map<String, AttributeValue> product = new HashMap<String, AttributeValue>();
    AttributeValue id = new AttributeValue();
    id.setN("250");
    product.put("id", id);
    product.put("type", new AttributeValue("phone"));
    product.put("name", new AttributeValue("MI4"));
    t1.putItem(new PutItemRequest("productTable", product));
    Map<String, AttributeValue> product1 = new HashMap<String, AttributeValue>();
    // Use a fresh AttributeValue so the first item's id object is not mutated
    product1.put("id", new AttributeValue().withN("350"));
    product1.put("type", new AttributeValue("phone"));
    product1.put("name", new AttributeValue("MI3"));
    t1.putItem(new PutItemRequest("productTable", product1));
    t1.commit();

Now, execute the code to see the results. If everything goes fine, you will see two new entries in the product table. In case of an error, neither entry will be in the table.

How it works…

When invoked, the transaction library first writes the changes to the Transactions table, and then to the actual table. If we perform an update item operation, it keeps the old values of that item in the TransactionImages table. It also supports multi-attribute and multi-table transactions. This way, we can use the transaction library to perform atomic writes. It also supports isolated reads. You can refer to the code and examples for more details at https://github.com/awslabs/dynamodb-transactions.

Performing asynchronous requests to DynamoDB

Till now, we have used a synchronous DynamoDB client to make requests to DynamoDB. Synchronous requests block the calling thread until the operation completes. Due to network issues, an operation can sometimes take a long time to complete. In that case, we can use asynchronous client requests so that we submit the request and do some other work in the meantime.

Getting ready

To get started with this recipe, you should have your workstation ready with the Eclipse IDE.
How to do it…

The asynchronous client is easy to use:

First, we need to instantiate the AmazonDynamoDBAsync class:

    AmazonDynamoDBAsync dynamoDBAsync = new AmazonDynamoDBAsyncClient(
        new ProfileCredentialsProvider());

Next, we need to create the request to be performed in an asynchronous manner. Let's say we need to delete a certain item from our product table. Then, we can create the DeleteItemRequest, as shown in the following code snippet:

    Map<String, AttributeValue> key = new HashMap<String, AttributeValue>();
    AttributeValue id = new AttributeValue();
    id.setN("10");
    key.put("id", id);
    key.put("type", new AttributeValue("phone"));
    DeleteItemRequest deleteItemRequest = new DeleteItemRequest(
        "productTable", key);

Next, invoke the deleteItemAsync method to delete the item. Here, we can optionally define an AsyncHandler if we want to use the result of the request we invoked. Here, I am also printing the messages with the time so that we can confirm the asynchronous nature of the call:

    dynamoDBAsync.deleteItemAsync(deleteItemRequest,
        new AsyncHandler<DeleteItemRequest, DeleteItemResult>() {
            public void onSuccess(DeleteItemRequest request, DeleteItemResult result) {
                System.out.println("Item deleted successfully: "
                    + System.currentTimeMillis());
            }
            public void onError(Exception exception) {
                System.out.println("Error deleting item in async way");
            }
        });
    // This line runs immediately, without waiting for the delete to finish
    System.out.println("Delete item initiated: " + System.currentTimeMillis());

How it works…

Asynchronous clients use AsyncHttpClient to invoke the DynamoDB APIs. This is a wrapper implementation on top of Java asynchronous APIs. Hence, they are quite easy to use and understand. The AsyncHandler is an optional configuration that you can supply in order to use the results of asynchronous calls. We can also use the Java Future object to handle the response.

Summary

We have covered various recipes on the cost-efficient and performance-efficient use of DynamoDB. Recipes such as error handling and auto-retries help readers make their applications robust.
It also highlights the use of the transaction library to implement atomic transactions on DynamoDB.

Resources for Article:

Further resources on this subject:
The EMR Architecture [article]
Amazon DynamoDB - Modelling relationships, Error handling [article]
Index, Item Sharding, and Projection in DynamoDB [article]

Guest Contributor
17 Dec 2019
8 min read

Was 2019 the year the world caught the Kubernetes fever?

In the current IT landscape, phrases such as “containerized applications” and “container deployment” are thrown around so often that their meanings and connotations get diluted and, ultimately, forgotten. In the case of Kubernetes, however, the opposite seems to be true. Although it might seem hyperbolic to describe modern software management as being shaped by an “Age of Kubernetes”, the accelerating growth of Kubernetes as one of the most widely adopted open-source projects, with over 2300 active contributors to its repository on GitHub, bears witness to the massive influence that the orchestration platform has had. Originally developed by Google and launched in 2014, Kubernetes has come a long way since its advent. Although other container orchestration platforms are available on the market, the most notable being Docker Swarm and Apache Mesos, Kubernetes has established itself as the de facto orchestration platform in use today. As a quick Google search reveals, with a whopping 26,400,000 results, Kubernetes has risen to the top of the totem pole over the course of the year. However, before we get into the reasons that drive the world’s obsession with the container orchestration platform, we’d like to provide our readers with a quick snapshot of everything Kubernetes is and everything that it is not.
Kubernetes: A Brief Overview

The transition from the traditional deployment era, in which organizations ran applications on physical servers, to the virtual deployment era, which introduced the highly popular concept of virtualization, and then to the container deployment era, which saw the adoption of containers that are significantly lighter than virtual machines (VMs), ultimately led to the creation of a container orchestration market. That market is a huge contributing factor to the growing popularity of Kubernetes and other similar platforms. As mentioned above, however, the features that Kubernetes offers give it a certain edge over its competition. Originally developed by Google in 2014, having descended from an old-school container orchestration platform called ‘Borg’, Kubernetes is an open-source container orchestration platform that reduces the workload for both large and small companies by automating the deployment, scaling and management of containerized applications. Bearing witness to the effectiveness and reliability of the platform is the fact that it is backed by large technology companies such as Google, Microsoft, Cisco, Intel, and Red Hat. Furthermore, the Kubernetes website cites testimonials from corporations such as Spotify, Nav, Capital One, and Comcast, which further demonstrate the benefits offered by the container orchestration platform.

What functions does Kubernetes perform?
Most organizations, regardless of how large or small they might be, deploy hundreds or thousands of containerized instances daily. This complexity requires platforms such as Kubernetes to step in and help organizations manage and automate containerized processes, while taking the context of the microservice architecture into account as well. Kubernetes aids development teams in deploying and managing containerized applications by performing the following functions:

Deployment: Perhaps the most significant function that Kubernetes performs is deploying a specified number of containers to a host, along with ensuring that the containers are functioning as they are supposed to, that is, without any malfunctions.

Rollouts: A rollout refers to a change in the original deployment of a container. Kubernetes allows development teams to take the management of their containerized tasks to the next level by automating the initiation of the container deployment, along with offering them the option of pausing, resuming or rolling back any rollout.

Discovery of service: Kubernetes automates the exposure of a specified container to the internet, or to other containers, by assigning the container a DNS name or an IP address. Given the increasing threat of cyber-attacks, it has become essential to protect your IP address; a VPN not only hides the IP address but also provides protection against IP spoofing.

Managing storage: A monumental advantage that Kubernetes offers organizations is the liberty to allocate persistent local or cloud storage to specified containers as needed.

Load scaling and balancing: Kubernetes allows organizations to maintain stability across the network by automatically load balancing and scaling when traffic to a certain container increases.

Self-healing: A key feature of Kubernetes, self-healing seeks to improve availability by restarting or replacing a failed container. Moreover, Kubernetes can also automate the removal of containers that appear to be damaged or that fail to meet the health-check requirements.

Are there any limitations to Kubernetes’s power?

Up till now, we’ve done nothing but present facts regarding Kubernetes. Oftentimes, however, organizations tend to overlook the limitations of an effective management tool. Despite the numerous advantages that organizations reap from integrating Kubernetes, it should always be kept in mind that Kubernetes is not traditional software and operates at the container level rather than at the hardware level. In order to make the most effective use of the container orchestration platform, it is essential that companies take into account the limitations of Kubernetes, which consist of the following:

Kubernetes does not build applications, nor does it deploy source code.

Kubernetes does not provide organizations with application-level services. Examples of these services include middleware (message buses), data-processing frameworks such as Spark, and caches, amongst many others.

Kubernetes does not offer organizations logging, monitoring, and alerting solutions; instead, it provides integrations and mechanisms that enable organizations to collect and export metrics.

In addition to these limitations, it should also be mentioned that although Kubernetes is constantly referred to as an orchestration tool, it is not just that.
Instead of simply orchestrating or managing the containerized applications by executing a defined workflow, Kubernetes eliminates the need for orchestration altogether and consists of components that continuously drive the current state of the system toward the desired state defined by the organization. Furthermore, Kubernetes gives rise to a system without any centralized control, which makes it much easier to use.

Explaining Kubernetes’s popularity

Now that we’ve hopefully jogged our readers’ memories by providing them with a rundown of everything Kubernetes, let’s get down to business. Taking into consideration the ever-increasing growth and popularity of the container orchestration platform, particularly its spike in 2019, readers might be left wondering, “Why is Kubernetes so popular?” Well, the short explanation behind Kubernetes’s popularity is simple: it’s highly effective. The longer explanation can be broken down into the following main reasons:

1. Kubernetes saves time: In the digital age, time is more crucial than ever. As more and more organizations get digitized, time plays a monumental role in routine operations, especially where development teams are concerned. The staggering popularity of Kubernetes is deeply rooted in how time-effective the platform is, since it allows organizations to handle all facets of container orchestration without having to fill out forms or send emails to request new machines to run applications.

2. Kubernetes is highly cost-effective: For most enterprises, the driving force behind their operations is the knowledge that their business goal is being fulfilled. Kubernetes can contribute to that, since it allows organizations to make better use of their resources. As we’ve already mentioned above, Kubernetes offers a more efficient alternative to VMs, since it works with containers, which are lightweight and thus require fewer CPU and memory resources.

3. Kubernetes can run on the cloud, as well as on-premise: An unprecedented, but widely welcomed, feature of Kubernetes is that it is cloud-agnostic. The term ‘cloud-agnostic’ implies that Kubernetes can run on cloud-based services as well as on-premise. This offers organizations the luxury of not having to redesign or alter their infrastructure or applications to accommodate Kubernetes. Additionally, companies also provide software that helps organizations manage Kubernetes, whether it runs on a cloud-based server or on-premise.

Final Words

We hope that we’ve made it clear what Kubernetes does, and the reasons that led to its rise in popularity. It is still equally important that organizations take into consideration the limitations of the container orchestration system and integrate it within their companies smartly, which ultimately enables them to reap greater benefits!

Author Bio

Rebecca James is an enthusiastic cybersecurity journalist. A creative team leader, editor of PrivacyCrypts.

DevOps mistakes which developers should avoid!
Chaos engineering comes to Kubernetes thanks to Gremlin
Understanding the role AIOps plays in the present-day IT environment

Natasha Mathur
10 Jul 2018
3 min read

AMD’s $293 million JV with Chinese chipmaker Hygon starts production of x86 CPUs

Chinese chip producer Hygon has begun production of China-made x86 processors named “Dhyana”. These processors use AMD’s Zen microarchitecture and are the result of the licensing deal between AMD and its Chinese partners. Hygon has started shipping the new “Dhyana” x86 CPUs. According to official statements by AMD, it does not sell final chip designs to its partners in China; instead, it encourages them to design their own processors that suit the needs of the Chinese server market. This is an effort to break China’s dependency on the foreign technology market. In 2016, AMD announced that it was working on a joint project in China to develop processors. The deal provided AMD with $293 million in cash from Tianjin Haiguang Advanced Technology Investment Co. (THATIC). THATIC includes AMD as well as the Chinese Academy of Sciences. What’s interesting is that AMD managed to establish a license allowing Chinese processor manufacturers to develop and sell x86 processors despite the fact that Intel was blocked from selling Xeon processors to China in 2015 by the Obama administration. This happened over concerns that the chips would help China’s nuclear weapon programs. Dhyana processors currently focus on embedded applications. Dhyana is a system on chip (SoC) instead of a socketed chip. But this design doesn’t prevent the Dhyana processors from being used in high-performance or data center applications, which usually leverage Intel Xeon and other server processors. Also, Linux kernel developers have stated that the x86 processors are very close in design to AMD’s EPYC. In fact, porting the Linux kernel code for EPYC processors to the Hygon chips required fewer than 200 new lines of code, according to a report from Michael Larabel of Phoronix. The only differences between the two are the vendor IDs and family series. Apart from the AMD deal, there are other chip-producing ventures in which China is heavily involved.
One such venture is Zhaoxin Semiconductor, which is working to manufacture x86 chips through a partnership with VIA. China is making continuous efforts to free the country from US interventions and to reshape its processor market in the long term. There are indications that most of the x86 processors are 14nm chips, but there isn’t much information available on the capabilities of the Dhyana family. Other details of their manufacturing characteristics are also still unknown.

Baidu releases Kunlun AI chip, China’s first cloud-to-edge AI chip
Qualcomm announces a new chipset for standalone AR/VR headsets at Augmented World Expo
Gebin George
27 Feb 2018
7 min read

How to perform regression analysis using SAS

This article is an excerpt from the book Big Data Analysis with SAS, written by David Pope. This book will help you leverage the power of SAS for data management, analysis and reporting. It contains practical use-cases and real-world examples on predictive modelling, forecasting, optimizing, and reporting your Big Data analysis using SAS.

Today, we will perform regression analysis using SAS in a step-by-step manner with a practical use-case. Regression analysis is one of the earliest predictive techniques most people learn because it can be applied across a wide variety of problems dealing with data that is related in linear and non-linear ways. Linear data is one of the easier use cases, and as such PROC REG is a well-known and often-used procedure to help predict likely outcomes before they happen.

The REG procedure provides extensive capabilities for fitting linear regression models that involve individual numeric independent variables. Many other procedures can also fit regression models, but they focus on more specialized forms of regression, such as robust regression, generalized linear regression, nonlinear regression, nonparametric regression, quantile regression, regression modeling of survey data, regression modeling of survival data, and regression modeling of transformed variables. The SAS/STAT procedures that can fit regression models include the ADAPTIVEREG, CATMOD, GAM, GENMOD, GLIMMIX, GLM, GLMSELECT, LIFEREG, LOESS, LOGISTIC, MIXED, NLIN, NLMIXED, ORTHOREG, PHREG, PLS, PROBIT, QUANTREG, QUANTSELECT, REG, ROBUSTREG, RSREG, SURVEYLOGISTIC, SURVEYPHREG, SURVEYREG, TPSPLINE, and TRANSREG procedures. Several procedures in SAS/ETS software also fit regression models.
SAS/STAT 14.2 / SAS/STAT User's Guide - Introduction to Regression Procedures - Overview: Regression Procedures (http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_introreg_sect001.htm&locale=en&showBanner=yes).

Regression analysis attempts to model the relationship between a response or output variable and a set of input variables. The response is considered the target variable, or the variable that one is trying to predict, while the input variables make up the parameters used as input into the algorithm. They are used to derive the predicted value for the response variable.

PROC REG

One of the easiest ways to determine if regression analysis is applicable to helping you answer a question is if the type of question being asked has only two answers. For example, should a bank lend an applicant money? Yes or no? This is known as a binary response, and as such, regression analysis can be applied to help determine the answer (binary responses are typically modeled with logistic regression, for example with PROC LOGISTIC, whereas PROC REG fits a continuous response such as the salary predicted below).

In the following example, the reader will use the SASHELP.BASEBALL dataset to create a regression model to predict the value of a baseball player's salary.

The SASHELP.BASEBALL dataset contains salary and performance information for Major League Baseball players who played at least one game in both the 1986 and 1987 seasons, excluding pitchers. The salaries (Sports Illustrated, April 20, 1987) are for the 1987 season and the performance measures are from 1986 (Collier Books, The 1987 Baseball Encyclopedia Update).

SAS/STAT® 14.2 / SAS/STAT User's Guide - Example 99: Modeling Salaries of Major League Baseball Players (http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_reg_examples01.htm&locale=en&showBanner=yes).
Let's first use PROC UNIVARIATE to learn something about this baseball data by submitting the following code:

    proc univariate data=sashelp.baseball;
    run;

While reviewing the results of the output, the reader will notice that the variance associated with logSalary, 0.79066, is much less than the variance associated with the actual target variable Salary, 203508. In this case, it makes better sense to attempt to predict the logSalary value of a player instead of Salary.

Write the following code in a SAS Studio program section and submit it:

    proc reg data=sashelp.baseball;
      id name team league;
      model logSalary = nAtBat nHits nHome nRuns nRBI YrMajor CrAtBat CrHits CrHome CrRuns CrRbi;
    run;
    quit;

Notice that, as specified in the first output table, there are 59 observations with a missing value for at least one of the variables; as such, those observations are not used in the development of the regression model.

The Root Mean Squared Error (RMSE) and R-square are statistics that typically inform the analyst how good the model is in predicting the target. R-square ranges from 0 to 1.0, with higher values typically indicating a better model, while RMSE is expressed in the units of the target, with lower values indicating a closer fit. A higher R-square typically indicates a better-performing model, but sometimes the conditions or the data used to train the model cause it to over-fit, and the statistic then doesn't represent the true predictive power of that particular model. Over-fitting can happen when an analyst doesn't have enough real-life data and chooses data, or a sample of data, that over-represents the target event; the resulting model will then perform poorly when real-world data is used as input.

Since several of the input variables appear to have little predictive power on the target, an analyst may decide to drop these variables, thereby reducing the amount of information needed to make a decent prediction. In this case, it appears we only need to use four input variables: YrMajor, nHits, nRuns, and nAtBat.
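For reference, the fit statistics discussed above are conventionally defined as follows. This is a sketch under stated assumptions rather than a transcription of the SAS output: the symbols are introduced here for illustration, with n the number of observations actually used in the fit, p the number of input variables (not counting the intercept), y_i the observed target value (logSalary in this example), \hat{y}_i the model's prediction, and \bar{y} the mean of the observed values.

```latex
\begin{align*}
\mathrm{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, &
\mathrm{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2, \\
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, &
R^2_{\mathrm{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}, \\
\mathrm{RMSE} &= \sqrt{\frac{\mathrm{SSE}}{n - p - 1}}.
\end{align*}
```

Adding an input variable can never lower R-square on the same observations, but it can lower the adjusted R-square, because the adjustment penalizes each additional variable. That is why the adjusted value is the more suitable statistic when comparing models that use different numbers of input variables.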
Modify the code as follows and submit it again:

proc reg data=sashelp.baseball;
  id name team league;
  model logSalary = YrMajor nHits nRuns nAtBat;
quit;

The p-value associated with each of the input variables gives the analyst an insight into which variables contribute most to predicting the target variable. In this case, the smaller the p-value, the higher the predictive value of the input variable. Both the RMSE and R-square values for this second model are slightly lower than the original. However, the adjusted R-square value is slightly higher. In this case, an analyst may choose to use the second model since it requires much less data and provides basically the same predictive power.

Prior to accepting any model, an analyst should determine whether there are a few observations that may be over-influencing the results by investigating the influence and fit diagnostics. The default output from PROC REG provides this type of visual insight:

The top-right corner plot, showing the externally studentized residuals (RStudent) by leverage values, shows that there are a few observations with high leverage that may be overly influencing the fit produced. In order to investigate this further, we will add a PLOTS= option to our PROC REG statement to produce a labeled version of this plot. Type the following code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball plots(only label)=(RStudentByLeverage);
  id name team league;
  model logSalary = YrMajor nHits nRuns nAtBat;
quit;

Sure enough, there are three to five individuals whose input variables may have excessive influence on fitting this model. Let's remove those points and see if the model improves.
Type this code in a SAS Studio program section and submit it:

proc reg data=sashelp.baseball plots=(residuals(smooth));
  where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete");
  id name team league;
  model logSalary = YrMajor nHits nRuns nAtBat;
quit;

This change, in itself, has not improved the model; it has actually made the model worse, as can be seen from the R-square of 0.5592. However, the residuals(smooth) plot option gives some insight into YrMajor: players at the beginning and the end of their careers tend to be paid less than others, as can be seen in Figure 4.12:

In order to address this lack of fit, an analyst can use a polynomial of degree two for this variable, YrMajor. Type the following code in a SAS Studio program section and submit it:

data work.baseball;
  set sashelp.baseball;
  where name NOT IN ("Mattingly, Don", "Henderson, Rickey", "Boggs, Wade", "Davis, Eric", "Rose, Pete");
  /* squared term lets the model bend with years in the majors */
  YrMajor2 = YrMajor*YrMajor;
run;

proc reg data=work.baseball;
  id name team league;
  model logSalary = YrMajor YrMajor2 nHits nRuns nAtBat;
quit;

After removing some outliers and adding the squared YrMajor term, the model's predictive power has improved significantly, as can be seen in the much improved R-square value of 0.7149.

We saw an effective way of performing regression analysis using the SAS platform. If you found our post useful, do check out the book Big Data Analysis with SAS to understand other data analysis models and perform them practically using SAS.
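The comparison between R-square and adjusted R-square made earlier can be checked by hand. The following is a minimal Python sketch, not part of the SAS workflow: it assumes a model with an intercept, the function name is hypothetical, and the observation count and R-square values are made up purely to illustrate how dropping inputs can lower R-square slightly while raising adjusted R-square.

```python
def adjusted_r_square(r_square, n_obs, n_inputs):
    """Adjusted R-square for a linear model with an intercept.

    r_square : ordinary R-square of the fitted model
    n_obs    : number of observations used in the fit
    n_inputs : number of input variables (excluding the intercept)
    """
    error_df = n_obs - n_inputs - 1
    if error_df <= 0:
        raise ValueError("need more observations than estimated parameters")
    # Penalise the unexplained share of variance by the degrees of freedom spent.
    return 1.0 - (1.0 - r_square) * (n_obs - 1) / error_df


if __name__ == "__main__":
    # Hypothetical values, NOT taken from the PROC REG output.
    n_obs = 263
    full_model = adjusted_r_square(0.565, n_obs, n_inputs=11)
    small_model = adjusted_r_square(0.562, n_obs, n_inputs=4)
    print(f"11 inputs: {full_model:.4f}")
    print(f" 4 inputs: {small_model:.4f}")
```

With these illustrative numbers the four-input model has the lower R-square but the higher adjusted R-square, which is the pattern described for the second PROC REG model.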

Prasad Ramesh
24 Jan 2019
7 min read

7 things Java programmers need to watch for in 2019

Java is one of the most popular and widely used programming languages in the world. Its dominance of the TIOBE index ranking is unmatched for the most part, holding the number 1 position for almost 20 years. Although Java’s dominance is unlikely to waver over the next 12 months, there are many important issues and announcements that will demand the attention of Java developers. So, get ready for 2019 with this list of key things in the Java world to watch out for.

#1 Commercial Java SE users will now need a license

Perhaps the most important change for Java in 2019 is that commercial users will have to pay a license fee to use Java SE from February. This move comes as Oracle changes the support model for the Java language. The change currently affects Java SE 8, which is an LTS release with premier and extended support up to March 2022 and 2025 respectively. For individual users, however, support and updates will continue until December 2020. The recently released Java SE 11 will also have long-term support, with five years of premier support and eight years of extended support from the release date.

#2 The Java 12 release in March 2019

Since Oracle changed their support model, non-LTS versions will be released twice a year and probably won’t contain many major changes. JDK 12 is non-LTS. That is not to say that the changes in it are trivial; it comes with its own set of new features. It will be generally available in March this year and supported until September, which is when Java 13 will be released. Java 12 will have a couple of new features: some are approved to ship in the March release and some are under discussion.

#3 Java 13 release slated for September 2019, with early access out now

So far, there is very little information about Java 13. All we really know at the moment is that it’s due to be released in September 2019. Like Java 12, Java 13 will be a non-LTS release.
However, if you want an early insight, there is an early access build available to test right now. Some of the JEPs (JDK Enhancement Proposals) in the next section may be set to feature in Java 13, but that’s just speculation.

https://twitter.com/OpenJDK/status/1082200155854639104

#4 A bunch of new features in Java in 2019

Even though the major long-term support version of Java, Java 11, was released last year, this year's releases also have some noteworthy new features in store. Let’s take a look at what the two releases this year might have.

Confirmed candidates for Java 12

  • A new low-pause-time garbage collector called Shenandoah is added to cause minimal interruption while a program is running. It is added to match modern computing resources. Pause times will be the same irrespective of the heap size.
  • The Microbenchmark Suite feature will make it easier for developers to run existing benchmarks or create new ones.
  • Revamped switch statements should help simplify the process of writing code. It essentially means the switch statement can also be used as an expression.
  • The JVM Constants API will, the OpenJDK website explains, “introduce a new API to model nominal descriptions of key class-file and run-time artifacts”.
  • Java 12 integrates one AArch64 port instead of two.
  • Default CDS Archives.
  • G1 mixed collections.

Other features that may not be out with Java 12

  • Raw string literals will be added to Java.
  • A Packaging Tool, designed to make it easier to install and run a self-contained Java application on a native platform.
  • Limit Speculative Execution, to help both developers and operations engineers more effectively secure applications against speculative-execution vulnerabilities.

#5 More contributions and features with OpenJDK

OpenJDK is an open source implementation of Java Standard Edition (Java SE) which has contributions from both Oracle and the open-source community.
As of now, the binaries of OpenJDK are available for the newest LTS release, Java 11. Even the life cycles of OpenJDK 7 and 8 have been extended, to June 2020 and 2023 respectively. This suggests that Oracle does seem to be interested in the idea of open source and community participation. And why would it not be? Many valuable contributions come from the open source community. Microsoft seems to have benefitted from open sourcing with the incoming submissions. Although Oracle will not support these versions after six months from initial release, Red Hat will be extending support. As Mark Reinhold, the chief architect of the Java platform, said, stewards are the true leaders who can shape what Java should be as a language. These stewards can propose new JEPs, bring new OpenJDK problems to notice (leading to more JEPs), and contribute to the language overall.

#6 Mobile and machine learning job opportunities

In the mobile ecosystem, especially Android, Java is still the most widely used language. Yes, there’s Kotlin, but it is still relatively new, and many developers are yet to adopt it. According to an estimate by Indeed, the average salary of a Java developer is about $100K in the U.S. With the Android ecosystem growing rapidly over the last decade, it’s not hard to see what’s driving Java’s value. But Java, and the broader Java ecosystem, is about much more than mobile. Although Java’s importance in enterprise application development is well known, it's also used in machine learning and artificial intelligence. Even if Python is arguably the most used language in this area, Java does have its own set of libraries and is used a lot in enterprise environments. Deeplearning4j, Neuroph, Weka, OpenNLP, RapidMiner, and RL4J are some of the popular Java libraries in artificial intelligence.
#7 Java conferences in 2019

Now that we’ve talked about the language, possible releases, and new features, let’s take a look at the conferences that are going to take place in 2019. Conferences are a good way to hear top professionals present and speak, and for programmers to socialize. Even if you can’t attend, they are important fixtures in the calendar for anyone interested in following releases and debates in Java. Here are some of the major Java conferences in 2019 worth checking out:

  • JAX is a Java architecture and software innovation conference. It will be held in Mainz, Germany, from May 6 to 10 this year, with the Expo running from May 7 to 9. Other than Java, topics like agile, cloud, Kubernetes, DevOps, microservices, and machine learning are also part of this event. They’re offering discounts on passes until February 14.
  • JBCNConf is happening in Barcelona, Spain, from May 27. It will be a three-day conference with talks from notable Java champions. The focus of the conference is on Java, the JVM, and open-source technologies.
  • Jfokus is a developer-centric conference taking place in Stockholm, Sweden. It will be a three-day event from February 4 to 6. Speakers include the Java language architect, Brian Goetz from Oracle, and many other notable experts. The conference will cover Java, of course, as well as frontend and web, cloud and DevOps, IoT and AI, and future trends.
  • JavaZone, one of the biggest conferences, attracts thousands of visitors and hundreds of speakers and will be 18 years old this year. It is usually held in Oslo, Norway, in September. Their website for 2019 is not active at the time of writing, but you can check out last year’s website.
  • Javaland will feature lectures, training, and community activities. It will be held in Bruehl, Germany, from March 19 to 21, and attendees can also exhibit at this conference.

If you’re working in or around Java this year, there’s clearly a lot to look forward to, as well as a few unanswered questions about the evolution of the language in the future.
While these changes might not impact the way you work in the immediate term, keeping on top of what’s happening and what key figures are saying will set you up nicely for the future.

Vincy Davis
24 Jun 2019
5 min read

Raspberry Pi 4 is up for sale at $35, with 64-bit ARM core, up to 4GB memory, full-throughput gigabit Ethernet and more!

Today, the Raspberry Pi 4 model is up for sale, starting at $35. It has a 1.5GHz quad-core 64-bit ARM Cortex-A72 CPU, three memory options of up to 4GB, full-throughput gigabit Ethernet, dual-band 802.11ac wireless networking, two USB 3.0 and two USB 2.0 ports, complete compatibility with earlier Raspberry Pi products, and more. Eben Upton, Chief Executive at Raspberry Pi Trading, has said that “This is a comprehensive upgrade, touching almost every element of the platform.” This is the first Raspberry Pi product to be available offline since the opening of their store in Cambridge, UK.

https://youtu.be/sajBySPeYH0

What’s new in Raspberry Pi 4?

New Raspberry Pi silicon

Previous Raspberry Pi models are based on 40nm silicon. The new Raspberry Pi 4, however, is a complete re-implementation of BCM283X on 28nm. The power saving delivered by the smaller process geometry has enabled the use of Cortex-A72 cores in a 1.5GHz quad-core 64-bit ARM CPU. The Cortex-A72 core can execute more instructions per clock, yielding as much as four times the performance of Raspberry Pi 3B+, depending on the benchmark.

New Raspbian software

The new Raspbian software provides numerous technical improvements, along with an extensively modernized user interface and updated applications, including the Chromium 74 web browser. For Raspberry Pi 4, the Raspberry team has retired the legacy graphics driver stack used on previous models and opted for the Mesa “V3D” driver. It offers benefits like OpenGL-accelerated web browsing and desktop composition, and also eliminates roughly half of the lines of closed-source code in the platform.

Raspberry Pi 4 memory options

For the first time, Raspberry Pi is offering a choice of memory capacities, as shown below:

All three variants of the new Raspberry Pi model have been launched. The entry-level Raspberry Pi 4 Model B is priced at $35, excluding sales tax, import duty, and shipping.
Additional improvements in Raspberry Pi 4

Power

Raspberry Pi 4 has USB-C as the power connector, which will support an extra 500mA of current, ensuring 1.2A for downstream USB devices, even under heavy CPU load.

Video

The previous type-A HDMI connector has been replaced with a pair of type-D HDMI connectors, so as to accommodate dual display output within the existing board footprint.

Ethernet and USB

The Gigabit Ethernet magjack has been moved to the top right of the board, simplifying the PCB routing. The 4-pin Power-over-Ethernet (PoE) connector is in the same location, so Raspberry Pi 4 remains compatible with the PoE HAT. The Ethernet controller on the main SoC is connected to an external Broadcom PHY, providing full throughput. USB is provided via an external VLI controller, connected over a single PCI Express Gen 2 lane and providing a total of 4Gbps of bandwidth, shared between the four ports.

The Raspberry Pi 4 model uses LPDDR4 memory technology, with triple the bandwidth. The video decode, 3D graphics, and display output have also been upgraded to support 4Kp60 throughput. Onboard Gigabit Ethernet and PCI Express controllers have been added to address the non-multimedia I/O limitations of the previous devices.

Image Source: Raspberry Pi blog

New Raspberry Pi 4 accessories

Due to the connector and form-factor changes, Raspberry Pi 4 requires new accessories. The Raspberry Pi 4 has its own case, priced at $5. Raspberry Pi has also developed a suitable 5V/3A power supply, which is priced at $8 and is available in UK, European, North American, and Australian plug formats. The Raspberry Pi 4 Desktop Kit is also available, priced at $120.

The earlier Raspberry Pi models will remain available in the market; Upton has mentioned that Raspberry Pi will continue to build these models as long as there's a demand for them. Users are quite ecstatic about the availability of Raspberry Pi 4, and many have already placed orders for it.
https://twitter.com/Morphy99/status/1143103131821252609
https://twitter.com/M0VGA/status/1143064771446677509

A user on Reddit comments, “Very nice. Gigabit LAN and 4GB memory is opening it up to a hell of a lot more use cases. I've been tempted by some of the Pi's higher-specced competitors like the Pine64, but didn't want to lose out on the huge community behind the Pi. This seems like the best of both worlds to me.”

A user on Hacker News says, “Oh my! This is such a crazy upgrade. I've been using the RPI2 as my HTPC/NAS at my folks, and I'm so happy with it. I was itching to get the last one for myself. USB 3.0! Gigabit Ethernet! WiFi 802.11ac, BT 5.0, 4GB RAM! 4K! $55 at most?! What the!? How the??! I know I'm not maintaining decorum at Hacker News, but I am SO mighty, MIGHTY excited! I'm setting up a VPN to hook this (when I get it) to my VPS and then do a LOT of fun stuff back and forth, remotely, and with the other RPI at my folks.”

Another comment reads, “This is absolutely great. The RPi was already exceptional for its price point, and this version seems to address the few problems it had (lack of Gigabit, USB speed and RAM capacity) and add onto it even more features. It almost seems too good to be true. Can't wait!”

Another user says, “I'm most excited about the modern A72 cores, upgraded hardware decode, and up to 4 GB RAM. They really listened and delivered what most people wanted in a next gen RPi.”

For more details, head over to the Raspberry Pi official blog.

Troy Miles
15 Mar 2016
6 min read

Unit Testing Apps with Android Studio

We will need to create an Android app, get it all set up, then add a test project to it. Let's begin.

1. Start Android Studio and select new project.
2. Change the Application name to UTest. Click Next.
3. Click Next again.
4. Click Finish.

Now that we have the project started, let’s set it up. Open the layout resource file activity_main.xml. Add an ID to the TextView. It should look as follows:

<RelativeLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:tools="http://schemas.android.com/tools"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:paddingLeft="@dimen/activity_horizontal_margin"
    android:paddingRight="@dimen/activity_horizontal_margin"
    android:paddingTop="@dimen/activity_vertical_margin"
    android:paddingBottom="@dimen/activity_vertical_margin"
    tools:context="com.tekadept.utest.app.MainActivity">

    <TextView
        android:id="@+id/greeting"
        android:text="@string/hello_world"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content" />
</RelativeLayout>

The random message

Next we modify the MainActivity class. We are going to add some code that will display a random greeting message to the user. Modify MainActivity so that it looks like the following code:

// Requires: import java.util.Random; import android.widget.TextView;
TextView txtGreeting;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    txtGreeting = (TextView) findViewById(R.id.greeting);

    // Pick an index from 0 to 3 and show the matching greeting
    Random rndGenerator = new Random();
    int rnd = rndGenerator.nextInt(4);
    String greeting = getGreeting(rnd);
    txtGreeting.setText(greeting);
}

// Maps a message number to a greeting; unknown numbers fall back to English
private String getGreeting(int msgNumber) {
    String greeting;
    switch (msgNumber) {
        case 0:
            greeting = "Holamundo";
            break;
        case 1:
            greeting = "Bonjour tout le monde";
            break;
        case 2:
            greeting = "Ciao mondo";
            break;
        case 3:
            greeting = "Hallo Welt";
            break;
        default:
            greeting = "Hello world";
            break;
    }
    return greeting;
}

At this point, if you run the app, it should display one of four random greetings each time you run it. We want to test the getGreeting method.
We need to be sure that the string it returns matches the number we sent it. Currently, however, we have no way to know that.

In order to add a test package, we need to hover over the package name. For my app, the package name is com.tekadept.utest.app. It is the line directly below the Java directory. The rest of the steps are as follows:

1. Right click on the package name and choose New -> Package.
2. Give your new package the name tests. Click OK.
3. Right click on tests and choose New -> Java Class.
4. Enter MainActivityTest as your name. Click OK.

Inside MainActivityTest, we are currently not extending from the proper base class. Let's fix that. Change the MainActivityTest class so it looks like the following code:

package com.tekadept.utest.app.tests;

import android.test.ActivityInstrumentationTestCase2;
import com.tekadept.utest.app.MainActivity;

public class MainActivityTest extends ActivityInstrumentationTestCase2<MainActivity> {
    public MainActivityTest() {
        super(MainActivity.class);
    }
}

We've done two things. First, we changed the base class to ActivityInstrumentationTestCase2. Secondly, we added a constructor method.

Before we can test the logic of the getGreeting method, we need to make it visible to outside classes by changing its modifier from private to public. Once we've done that, return to the MainActivityTest class and add a new method, testGetGreeting. This is shown in the following code:

// Requires: import junit.framework.Assert;
public void testGetGreeting() throws Exception {
    MainActivity activity = getActivity();

    // Each message number should map to its expected greeting
    int count = 0;
    String result = activity.getGreeting(count);
    Assert.assertEquals("Holamundo", result);

    count = 1;
    result = activity.getGreeting(count);
    Assert.assertEquals("Bonjour tout le monde", result);

    count = 2;
    result = activity.getGreeting(count);
    Assert.assertEquals("Ciao mondo", result);

    count = 3;
    result = activity.getGreeting(count);
    Assert.assertEquals("Hallo Welt", result);
}

Time to test

All we need to do now is create a configuration for our test package.
1. Click Run -> Edit Configurations….
2. On the Run/Debug Configurations dialog, click the plus sign in the upper left hand corner.
3. Click on Android Tests.
4. For the name, enter test.
5. Make sure the General tab is selected.
6. For Module, choose app.
7. For Test, choose All in Package.
8. For Package, browse down to the tests folder.

The Android unit test must run on a device or emulator. I prefer having the chooser dialog come up, so I've selected that option. You should select whichever option works best for you. Then click OK.

At this point, you have a working app complete with a functioning unit test. To run the unit test, choose the test configuration from the drop-down menu to the left of the run button. Then click the run button. After building your app and running it on your selected device, Android Studio will show the test results. If you don't see the results, click the run button in the lower left hand corner of Android Studio. Green is good. Red means one or more tests have failed. Currently our one test should be passing, so everything should be green.

In order to see a test fail, let's make a temporary change to the getGreeting method. Change the first greeting from "Holamundo" to "Adios mundo". Save your change and click the run button to run the tests again. This time the test should fail. You should see a message something like the following:

The test runner shows the failure message and includes a stack trace of the failure. The first line of the stack trace shows that the test failed on line 17 of MainActivityTest. Don't forget to restore the getGreeting method of the MainActivity class to fix the failing unit test.

Conclusion

That is it for this post. You now know how to add a unit test package to Android Studio. If you had any trouble with this post, be sure to check out the complete source code of the UTest project on my GitHub repo at: https://github.com/Rockncoder/UTest.
About the author

Troy Miles, also known as the Rockncoder, currently has fun writing full stack code with ASP.NET MVC or Node.js on the backend and web or mobile up front. He started coding over 30 years ago, cutting his teeth writing games for C64, Apple II, and IBM PCs. After burning out, he moved on to Windows system programming before catching Internet fever just before the dot-com bubble burst. After realizing that mobile devices were the perfect window into backend data, he added mobile programming to his repertoire. He loves competing in hackathons and randomly posting interesting code nuggets on his blog: http://therockncoder.blogspot.com/.
Kartikey Pandey
27 Nov 2017
4 min read

How to build a Scatterplot in IBM SPSS

The following excerpt is from the title Data Analysis with IBM SPSS Statistics, Chapter 5, written by Kenneth Stehlik-Barry and Anthony J. Babinec. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options for analyzing patterns in the data.

In this article, we help you learn how to build a scatterplot in SPSS using the Chart Builder feature.

One of the most valuable methods for examining the relationship between two variables containing scale-level data is a scatterplot. In the previous chapter, scatterplots were used to detect points that deviated from the typical pattern (multivariate outliers). To produce a similar scatterplot using two fields from the 2016 General Social Survey data, navigate to Graphs | Chart Builder.

An information box is displayed indicating that each field's measurement properties will be used to identify the types of graphs available, so adjusting these properties is advisable. In this example, the properties will be modified as part of the graph specification process, but you may want to alter the properties of some variables permanently so that they don't need to be changed for each use. For now, just select OK to move ahead.

In the main Chart Builder window, select Scatter/Dot from the menu at the lower left, double-click on the first graph to the right (Simple Scatter) to place it in the preview pane at the upper right, and then right-click on the first field, labeled HIGHEST YEAR OF SCHOOL. Change this variable from Nominal to Scale, as shown in the following screenshot:

After changing the respondent's education to Scale, drag this field to the X-Axis location in the preview pane and drag spouse's education to the Y-Axis location. Once both elements are in place, the OK choice will become available.
Select it to produce the scatterplot in the following screenshot:

The scatterplot produced by default provides some sense of the trend, in that the denser circles are concentrated in a band from the lower left to the upper right. This pattern, however, is rather subtle visually. With some editing, the relationship can be made more evident.

Double-click on the graph to open the Chart Editor, select the X icon at the top, and change the major increment to 4 so that there are numbers corresponding to completing high school and college. Do the same for the y-axis values. Select a point on the graph to highlight all the "dots" and right-click to display the following dialog. Click on the Marker tab and change the symbol to the star shape, increase the size to 6, increase the border to 2, and change the border color to a dark blue. Use Apply to make the changes visible on the scatterplot:

Use the Add Fit Line at Total icon above the graph to show the regression line for this data. Drag the R2 box from the upper right to the bottom, below the graph, and drag the box on the graph that displays the equation to the lower left, away from the points:

The modifications to the original scatterplot make it easier to see the pattern: the “stars” near the line are darker and denser than those farther from the line, indicating that fewer cases are associated with the more distant points.

The SPSS scatterplot capabilities covered in this article give you a foundation for creating visual representations of data, both for deeper pattern discovery and for communicating results to a broader audience. Several other graph types, such as pie charts and multiple line charts, can be built and edited using the approach shown in Chapter 5, Visually Exploring the Data, from our title Data Analysis with IBM SPSS Statistics. Go on, explore these alternative graph styles to see when they may be better suited to your needs.
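For readers who want to see what the fit line and the R2 box represent numerically, here is a minimal sketch in Python (not SPSS syntax) of an ordinary least-squares line, assuming the fit line is the default linear one. The function name and the sample points are made up for illustration and are not taken from the General Social Survey data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


if __name__ == "__main__":
    # Hypothetical years of schooling: respondent (x) and spouse (y).
    respondent = [8, 12, 12, 14, 16, 16, 18, 20]
    spouse = [10, 12, 11, 14, 14, 16, 17, 18]
    a, b, r2 = fit_line(respondent, spouse)
    print(f"y = {a:.2f} + {b:.2f} * x, R2 = {r2:.3f}")
```

The slope and intercept correspond to the equation box on the chart, and the last value corresponds to the R2 box.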

Amrata Joshi
02 Jan 2019
14 min read

Introduction to Open Shortest Path First (OSPF) [Tutorial]

The OSPF interior routing protocol is a very popular protocol in enterprise networks. OSPF does a very good job of calculating cost values to choose the shortest path to each of its destinations. OSPF operations can be separated into three categories:

  • Neighbor and adjacency initialization
  • LSA flooding
  • SPF tree calculation

This article is an excerpt taken from the book CCNA Routing and Switching 200-125 Certification Guide by Lazaro (Laz) Diaz. This book covers the understanding of networking using routers and switches, layer 2 technology and its various configurations and connections, VLANs and inter-VLAN routing, and more. In this article, we will cover the basics of OSPF, its features and configuration, and much more.

Neighbor and adjacency initialization

This is the very first part of OSPF operations. The router at this point will allocate memory for this function as well as for the maintenance of both the neighbor and topology tables. Once the router discovers which interfaces are configured with OSPF, it will begin sending hello packets out of those interfaces in the hope of finding other routers using OSPF. Let's look at a visual representation:

Remember that the link between the routers is treated as a broadcast network, so an election needs to run to choose the DR and BDR.

00:03:06: OSPF: DR/BDR election on FastEthernet0/0
00:03:06: OSPF: Elect BDR 10.1.1.5
00:03:06: OSPF: Elect DR 10.1.1.6
00:03:06: OSPF: Elect BDR 10.1.1.5
00:03:06: OSPF: Elect DR 10.1.1.6
00:03:06: DR: 10.1.1.6 (Id) BDR: 10.1.1.5 (Id)

One thing to keep in mind is that if you are using Ethernet, as we are, the hello timer defaults to 10 seconds. On non-broadcast network types such as NBMA, the hello timer defaults to 30 seconds. Why is this so important to know? Because the hello timer must be identical on adjacent routers or they will never become neighbors.
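The timer rule above can be sketched in a few lines. This is a minimal illustration in Python, not router code: the class and function names are hypothetical, and the dead interval is included as an assumption beyond what the text discusses (it is normally four times the hello interval and also has to match). Real routers compare further fields, such as the area, that are left out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OspfInterface:
    """Hypothetical, simplified view of one OSPF-enabled interface."""
    name: str
    hello_interval: int  # seconds between hello packets
    dead_interval: int   # seconds of silence before a neighbor is declared down


def timer_mismatches(local, remote):
    """Return the names of the timers that differ between two interfaces."""
    mismatched = []
    for field_name in ("hello_interval", "dead_interval"):
        if getattr(local, field_name) != getattr(remote, field_name):
            mismatched.append(field_name)
    return mismatched


def can_become_neighbors(local, remote):
    """Timers must be identical on both ends of the link."""
    return not timer_mismatches(local, remote)


if __name__ == "__main__":
    r1 = OspfInterface("R1 FastEthernet0/0", hello_interval=10, dead_interval=40)
    r2 = OspfInterface("R2 FastEthernet0/0", hello_interval=10, dead_interval=40)
    r3 = OspfInterface("R3 (30-second hello)", hello_interval=30, dead_interval=120)
    for peer in (r2, r3):
        status = "neighbors" if can_become_neighbors(r1, peer) else "no adjacency"
        print(f"{r1.name} <-> {peer.name}: {status} {timer_mismatches(r1, peer)}")
```

The point of returning the list of mismatched fields, rather than only a yes or no, is that it mirrors how you would troubleshoot: find which timer differs and correct it on one side.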
Link State Advertisements and Flooding

Before we begin with LSA flooding and how it uses LSUs to create the OSPF routing table, let's elaborate on this term. There is not just one type of LSA either. Let's have a look at the following table:

By no means are these the only LSAs that exist. There are 11 LSA types; for the CCNA, you must know the ones that I highlighted, but do not dismiss the rest.

LSA updates are sent via multicast addresses. Which multicast address is used depends on the type of network topology you have. On point-to-point networks, the multicast address is 224.0.0.5. In a broadcast environment, where a DR and BDR are elected, the other routers send their updates to 224.0.0.6, which only the DR and BDR listen on, and the DR then floods the updates to all OSPF routers on 224.0.0.5. In any case, remember that these two multicast addresses are used within OSPF.

The network topology is built from LSAs, which are carried between routers in LSUs, or link state updates. After the routers have converged, they keep exchanging hello packets to maintain their adjacencies. If any change happens, it is the job of the LSU to carry the new or changed LSAs to the other routers in order to keep routing tables current.

Configuring the basics of OSPF

You have already had a sneak peek into the configuration of OSPF, but let's take it back to the basics. The following diagram shows the topology:

Yes, this is the basic topology, but we will do a dual stack, shown as follows:

Configuration of R1:

Configuration of R2:

Configuration of R3:

So, what did we do? We put the IP addresses on each interface and, since we are using serial cables, on the DCE side of the cable we must use the clock rate command to assign the clock rate for synchronization. Then we configured OSPF with a basic configuration, which means that all we did was advertise the networks we are attached to using the process ID number, which is local to the router.
Alongside the network ID in each network statement we use a wildcard mask, and since this is the first area, we must use area 0. We can verify the configuration in several ways: with the ping command, or with sh ip protocols or sh ip route. Let's see how this would look. Verifying from R1, you will get the following: There are three simple commands that we could use to verify that our configuration of OSPF is correct.

One thing you need to know very well is wildcard masking, so let me show you a couple of examples. Before we begin, let me present a very simple way of doing wildcard masking. All you must do is use the constant number 255.255.255.255 and subtract your subnet mask from it: So, as you can plainly see, your mask will determine the wildcard mask. The network ID may look the same but you will have three different wildcard masks. That would be a lot of different hosts pointing to a specific interface. Finally, let's look at another example, which is a subnetted Class A address: It's extremely simple, with no physics needed.

So, that was a basic configuration of OSPF, but you can configure OSPF in many ways. I just explained wildcard masking, but remember that zeros need to match exactly, so what can you tell me about the following configuration, using a different topology?

R1(config)#router ospf 1
R1(config-router)#net 0.0.0.0 0.0.0.0 area 0

R2(config)#router ospf 2
R2(config-router)#net 10.1.1.6 0.0.0.0 area 0
R2(config-router)#net 10.1.1.9 0.0.0.0 area 0
R2(config-router)#net 2.2.2.2 0.0.0.0 area 0

R3(config)#router ospf 3
R3(config-router)#net 10.1.1.0 0.0.0.255 area 0
R3(config-router)#net 3.3.3.0 0.0.0.255 area 0

We configured OSPF in three different ways, so let's explain each one. In this new topology, we are playing around with the wildcard mask. You can see in the first configuration that when we create the network statement, we use all zeros, 0.0.0.0 0.0.0.0, and then we put in the area number.
Using all zeros means matching all interfaces, so any IP address that exists on the router will be matched by OSPF, placed in area 0, and advertised to the neighbor routers. In the second example, when we create our network statement, we put the actual IP address of the interface and then use a wildcard mask of all zeros, for example 10.1.1.6 0.0.0.0. In this case, OSPF will know exactly which interface is going to participate in the OSPF process, because we are matching each octet exactly. In the last example, the network statement uses the network ID; we only match the first three octets and use 255 on the last octet, which means any number is accepted there.

So, OSPF has tremendous flexibility in its configurations, to meet your needs on the network. You just need to know what those needs are. By the way, I hope you spotted that I used a different process ID number on each router. Keep in mind for the CCNA and even most "real-world" networks that the process ID number is only locally significant. The other routers do not care, so this number can be whatever you want it to be. To further prove that the three new ways of configuring OSPF work, here is the routers' output:

R1#sh ip route
Gateway of last resort is not set
1.0.0.0/32 is subnetted, 1 subnets
C 1.1.1.1 is directly connected, Loopback1
2.0.0.0/32 is subnetted, 1 subnets
O 2.2.2.2 [110/2] via 10.1.1.6, 18:41:09, FastEthernet0/0
3.0.0.0/32 is subnetted, 1 subnets
O 3.3.3.3 [110/3] via 10.1.1.6, 18:41:09, FastEthernet0/0
10.0.0.0/30 is subnetted, 2 subnets
O 10.1.1.8 [110/2] via 10.1.1.6, 18:41:09, FastEthernet0/0
C 10.1.1.4 is directly connected, FastEthernet0/0
R1#sh ip protocols
Routing Protocol is "ospf 1"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
Router ID 1.1.1.1
Number of areas in this router is 1.
1 normal 0 stub 0 nssa Maximum path: 4 Routing for Networks: 0.0.0.0 255.255.255.255 area 0 Reference bandwidth unit is 100 mbps Routing Information Sources: Gateway Distance Last Update 3.3.3.3 110 18:41:42 2.2.2.2 110 18:41:42 Distance: (default is 110) R1#ping 2.2.2.2 Type escape sequence to abort. Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds: !!!!! Success rate is 100 percent (5/5), round-trip min/avg/max = 16/20/24 ms R1#ping 3.3.3.3 Type escape sequence to abort. Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds: !!!!! Success rate is 100 percent (5/5), round-trip min/avg/max = 36/52/72 ms As you can see, I have full connectivity and by looking at my routing table, I am learning about all the routes. But I want to show the differences in the configuration of the network statements for the three routers using the sh ip protocols command: R2#sh ip protocols Routing Protocol is "ospf 2" Outgoing update filter list for all interfaces is not set Incoming update filter list for all interfaces is not set Router ID 2.2.2.2 Number of areas in this router is 1. 1 normal 0 stub 0 nssa Maximum path: 4 Routing for Networks: 2.2.2.2 0.0.0.0 area 0 10.1.1.6 0.0.0.0 area 0 10.1.1.9 0.0.0.0 area 0 Reference bandwidth unit is 100 mbps Routing Information Sources: Gateway Distance Last Update 3.3.3.3 110 18:31:18 1.1.1.1 110 18:31:18 Distance: (default is 110) R3#sh ip protocols Routing Protocol is "ospf 3" Outgoing update filter list for all interfaces is not set Incoming update filter list for all interfaces is not set Router ID 3.3.3.3 Number of areas in this router is 1. 1 normal 0 stub 0 nssa Maximum path: 4 Routing for Networks: 3.3.3.0 0.0.0.255 area 0 10.1.1.0 0.0.0.255 area 0 Reference bandwidth unit is 100 mbps Routing Information Sources: Gateway Distance Last Update 2.2.2.2 110 18:47:13 1.1.1.1 110 18:47:13 Distance: (default is 110) To look at other features that OSPF uses, we are going to explore the passive-interface command. 
This is very useful in preventing updates being sent out. But be warned, this command does not work the same way with every routing protocol. With RIP, for example, a passive interface stops sending updates but still listens for them. If you were to configure it on EIGRP, it will not send or receive updates. In OSPF, it stops hellos from being sent out of the interface, so the neighbor relationship on that interface is lost and no updates are exchanged over it. The router will not learn any routes through it, so as far as OSPF is concerned that interface is essentially down. Let's look from the perspective of R2:

R2(config-router)#passive-interface f1/0
*Oct 3 04:47:01.763: %OSPF-5-ADJCHG: Process 2, Nbr 1.1.1.1 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Interface down or detached

Almost immediately, the adjacency with R1 on the F1/0 interface went down. What's happening is that the router is not sending any hellos on that interface. Let's further investigate by using the debug ip ospf hello command:

R2#debug ip ospf hello
OSPF hello events debugging is on
R2#
*Oct 3 04:49:40.319: OSPF: Rcv hello from 3.3.3.3 area 0 from FastEthernet1/1 10.1.1.10
*Oct 3 04:49:40.319: OSPF: End of hello processing
R2#
*Oct 3 04:49:43.723: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet1/1 from 10.1.1.9
R2#
*Oct 3 04:49:50.319: OSPF: Rcv hello from 3.3.3.3 area 0 from FastEthernet1/1 10.1.1.10
*Oct 3 04:49:50.323: OSPF: End of hello processing
R2#
*Oct 3 04:49:53.723: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet1/1 from 10.1.1.9
R2#
*Oct 3 04:50:00.327: OSPF: Rcv hello from 3.3.3.3 area 0 from FastEthernet1/1 10.1.1.10
*Oct 3 04:50:00.331: OSPF: End of hello processing

It is no longer sending hellos out of the F1/0 interface, so let's look at the routing table now and see what networks we know about:

R2#sh ip route
Gateway of last resort is not set
2.0.0.0/32 is subnetted, 1 subnets
C 2.2.2.2 is directly connected, Loopback2
3.0.0.0/32 is subnetted, 1 subnets
O 3.3.3.3 [110/2] via 10.1.1.10, 00:05:12, FastEthernet1/1
10.0.0.0/30 is subnetted, 2 subnets
C 10.1.1.8 is directly connected, FastEthernet1/1
C 10.1.1.4 is directly connected, FastEthernet1/0
R2#ping 2.2.2.2 Type
escape sequence to abort. Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds: !!!!! Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/4 ms R2#ping 3.3.3.3 Type escape sequence to abort. Sending 5, 100-byte ICMP Echos to 3.3.3.3, timeout is 2 seconds: !!!!! Success rate is 100 percent (5/5), round-trip min/avg/max = 20/24/40 ms So, what are we looking at? We are only learning about the 3.3.3.3 network, which is the loopback address on R3. We have stopped learning about the 1.1.1.1 network, and we do not have connectivity to it. We can ping our own loopback, obviously, and we can ping the loopback on R3. Okay, let's remove the passive interface command and compare the difference: R2(config)#router ospf 2 R2(config-router)#no passive-interface f1/0 R2(config-router)# *Oct 3 04:57:34.343: %OSPF-5-ADJCHG: Process 2, Nbr 1.1.1.1 on FastEthernet1/0 from LOADING to FULL, Loading Done We have now recreated our neighbor relationship with R1 once more. Let's debug again: R2#debug ip ospf hello OSPF hello events debugging is on R2# *Oct 3 05:03:48.527: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet1/0 from 10.1.1.6 R2# *Oct 3 05:03:50.303: OSPF: Rcv hello from 3.3.3.3 area 0 from FastEthernet1/1 10.1.1.10 *Oct 3 05:03:50.303: OSPF: End of hello processing R2# *Oct 3 05:03:52.143: OSPF: Rcv hello from 1.1.1.1 area 0 from FastEthernet1/0 10.1.1.5 *Oct 3 05:03:52.143: OSPF: End of hello processing R2# *Oct 3 05:03:53.723: OSPF: Send hello to 224.0.0.5 area 0 on FastEthernet1/1 from 10.1.1.9 Once again, we are sending and receiving hellos from R1, so let's ping the loopback on R1, but also look at the routing table: R2#sh ip route Gateway of last resort is not set 1.0.0.0/32 is subnetted, 1 subnets O 1.1.1.1 [110/2] via 10.1.1.5, 00:06:50, FastEthernet1/0 2.0.0.0/32 is subnetted, 1 subnets C 2.2.2.2 is directly connected, Loopback2 3.0.0.0/32 is subnetted, 1 subnets O 3.3.3.3 [110/2] via 10.1.1.10, 00:06:50, FastEthernet1/1 10.0.0.0/30 is 
subnetted, 2 subnets C 10.1.1.8 is directly connected, FastEthernet1/1 C 10.1.1.4 is directly connected, FastEthernet1/0 R2#ping 1.1.1.1 Type escape sequence to abort. Sending 5, 100-byte ICMP Echos to 1.1.1.1, timeout is 2 seconds: !!!!! Once more, we have connectivity, so with the passive-interface be very careful how you are going to use it and which protocol you are going to use it with. Now let's explore another feature, which is the default-information originate. This is used in conjunction with a static-default route to create an OSPF default static route. It is like advertising a static default route. To let all the routers know if you want to get to a destination network, this is the way to go. So, how would you configure something like that? Let's take a look. Use the following topology: R1(config)# ip route 0.0.0.0 0.0.0.0 GigabitEthernet2/0 R1(config)#router ospf 1 R1(config-router)#default-information originate Now that we have created a static route to an external network and we did the default-information originate command, what would the routing tables of the other routers look like? 
R2#sh ip route Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2 E1 - OSPF external type 1, E2 - OSPF external type 2 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2 ia - IS-IS inter area, * - candidate default, U - per-user static route o - ODR, P - periodic downloaded static route Gateway of last resort is 10.1.1.5 to network 0.0.0.0 1.0.0.0/32 is subnetted, 1 subnets O 1.1.1.1 [110/2] via 10.1.1.5, 00:16:35, FastEthernet1/0 2.0.0.0/32 is subnetted, 1 subnets C 2.2.2.2 is directly connected, Loopback2 3.0.0.0/32 is subnetted, 1 subnets O 3.3.3.3 [110/2] via 10.1.1.10, 00:16:35, FastEthernet1/1 10.0.0.0/30 is subnetted, 2 subnets C 10.1.1.8 is directly connected, FastEthernet1/1 C 10.1.1.4 is directly connected, FastEthernet1/0 O 192.168.1.0/24 [110/2] via 10.1.1.5, 00:16:35, FastEthernet1/0 O*E2 0.0.0.0/0 [110/1] via 10.1.1.5, 00:16:35, FastEthernet1/0 R3#sh ip route Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2 E1 - OSPF external type 1, E2 - OSPF external type 2 i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2 ia - IS-IS inter area, * - candidate default, U - per-user static route o - ODR, P - periodic downloaded static route Gateway of last resort is 10.1.1.9 to network 0.0.0.0 1.0.0.0/32 is subnetted, 1 subnets O 1.1.1.1 [110/3] via 10.1.1.9, 00:17:17, FastEthernet0/0 2.0.0.0/32 is subnetted, 1 subnets O 2.2.2.2 [110/2] via 10.1.1.9, 00:17:17, FastEthernet0/0 3.0.0.0/32 is subnetted, 1 subnets C 3.3.3.3 is directly connected, Loopback3 10.0.0.0/30 is subnetted, 2 subnets C 10.1.1.8 is directly connected, FastEthernet0/0 O 10.1.1.4 [110/2] via 10.1.1.9, 00:17:17, FastEthernet0/0 O 192.168.1.0/24 [110/3] via 10.1.1.9, 00:17:17, 
FastEthernet0/0
O*E2 0.0.0.0/0 [110/1] via 10.1.1.9, 00:17:17, FastEthernet0/0

R4#sh ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
E1 - OSPF external type 1, E2 - OSPF external type 2
i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
ia - IS-IS inter area, * - candidate default, U - per-user static route
o - ODR, P - periodic downloaded static route
Gateway of last resort is 192.168.1.1 to network 0.0.0.0
1.0.0.0/32 is subnetted, 1 subnets
D EX 1.1.1.1 [170/5376] via 192.168.1.1, 00:12:38, GigabitEthernet2/0
2.0.0.0/32 is subnetted, 1 subnets
D EX 2.2.2.2 [170/5376] via 192.168.1.1, 00:12:38, GigabitEthernet2/0
3.0.0.0/32 is subnetted, 1 subnets
D EX 3.3.3.3 [170/5376] via 192.168.1.1, 00:12:38, GigabitEthernet2/0
10.0.0.0/30 is subnetted, 2 subnets
D EX 10.1.1.8 [170/5376] via 192.168.1.1, 00:12:38, GigabitEthernet2/0
D EX 10.1.1.4 [170/5376] via 192.168.1.1, 00:12:38, GigabitEthernet2/0
C 192.168.1.0/24 is directly connected, GigabitEthernet2/0
D*EX 0.0.0.0/0 [170/5376] via 192.168.1.1, 00:12:38, GigabitEthernet2/0

So, this is how you can advertise a default route to an external network using OSPF. Obviously, you must configure EIGRP on R1 and R4 and do some redistribution. That is why all the routes on R4 are external, but you are advertising a way out using a static default route.

To summarize, this article covered OSPF configurations, features of OSPF, and different ways of advertising the networks. To know more about Multi-area OSPF configuration, check out the book CCNA Routing and Switching 200-125 Certification Guide.
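As a footnote to the wildcard-mask discussion earlier in this article, here is a minimal C++ sketch of the two rules it relies on: subtracting the subnet mask from 255.255.255.255, and matching exactly wherever the wildcard has zeros. The Ipv4 alias and both function names are hypothetical and exist only for this illustration; this is not how IOS is implemented.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Ipv4 = std::array<std::uint8_t, 4>;  // four octets, e.g. {10, 1, 1, 6}

// Wildcard mask = 255.255.255.255 minus the subnet mask, octet by octet.
Ipv4 wildcardFromSubnetMask(const Ipv4& mask) {
    Ipv4 wildcard{};
    for (std::size_t i = 0; i < 4; ++i) {
        wildcard[i] = static_cast<std::uint8_t>(255 - mask[i]);
    }
    return wildcard;
}

// A 0 bit in the wildcard must match exactly; a 1 bit accepts any value.
bool matchesNetworkStatement(const Ipv4& iface, const Ipv4& network,
                             const Ipv4& wildcard) {
    for (std::size_t i = 0; i < 4; ++i) {
        const std::uint8_t care = static_cast<std::uint8_t>(~wildcard[i]);
        if ((iface[i] & care) != (network[i] & care)) {
            return false;
        }
    }
    return true;
}
```

With these helpers, a /30 mask of 255.255.255.252 gives the wildcard 0.0.0.3, the statement 10.1.1.6 0.0.0.0 matches only the interface 10.1.1.6, and 10.1.1.0 0.0.0.255 matches any interface whose first three octets are 10.1.1. The sh ip protocols output for R1 lists its statement as 0.0.0.0 255.255.255.255, and in this model that form matches every address.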

Richard Gall
18 Nov 2019
2 min read

Chaos engineering comes to Kubernetes thanks to Gremlin

Kubernetes causes problems. Just last week Cindy Sridharan wrote on Twitter that while Docker "succeeded... because it was a great developer tool," Kubernetes "decided to be all things tech and not much by way of UX. It was and remains a hostile piece of software to learn, run, operate, maintain."

https://twitter.com/copyconstruct/status/1194701905248673792?s=20

That's just a drop in the ocean: you don't have to look hard to find more hot takes, jokes, and memes about how complicated working with Kubernetes can feel. Despite all this, it's certainly here to stay. That makes chaos engineering platform Gremlin's announcement that the platform will offer native support for Kubernetes particularly welcome.

In its press release, Gremlin cites container orchestration research done by Datadog, which indicates the rapid rate of Kubernetes adoption, and hopes that it can provide some additional support for users who might be concerned about the platform's complexity.

Gremlin CTO Matt Fornaciari said "our goal is to provide SRE and DevOps teams that are building and deploying modern applications with the tools and processes necessary to understand how their systems handle failure, before that failure has the chance to impact customers and business."

The new feature is designed to help engineers do exactly that by allowing them "to automate the process of identifying Kubernetes primitives such as nodes and pods," and to select and attack traffic from different services within Kubernetes.

The other important element to all this is that Gremlin wants to make things as straightforward as possible for engineering teams. With a neat and easy-to-use UI, it would seem that, to return to Sridharan's words, the team is eager to make sure their product is "a great developer tool."

The tool has already been tried and tested in the wild.
Simon Govier, Expedia's Director of Program Management, described how performing chaos experiments on Kubernetes with Gremlin "significantly reduces the amount of time it takes to do fault injection and increases our systems' resilience to failure." Learn more on the Gremlin website.
Natasha Mathur
27 Jul 2018
6 min read

Why Wall Street unfriended Facebook: Stocks fell $120 billion in market value after Q2 2018 earnings call

After being found guilty of providing discriminatory advertisements on its platform earlier this week, Facebook hit yet another wall yesterday as its stock fell 18.96% on Thursday, with shares closing at $176.26. This means that the company lost around $120 billion in market value overnight, making it the largest one-day loss of value for a US-traded company since Intel Corp’s two-decade-old crash. Intel had lost a little over $18 billion in one day, 18 years back. Despite the 41.24% revenue growth compared to last year, this was Facebook’s biggest stock market drop ever. Here’s the stock chart from NASDAQ showing the figures:

Facebook’s market capitalization was worth $629.6 billion on Wednesday. Once Facebook’s earnings call had concluded and market trading ended on Thursday, its worth had dropped to $510 billion. Also, as Facebook’s shares continued to drop during Thursday’s trading, it left its CEO, Mark Zuckerberg, with less than $70 billion, wiping out nearly $17 billion of his personal stake, according to Bloomberg. He also fell from third to sixth position on Bloomberg’s Billionaires Index.

Active user growth starting to stagnate in mature markets

According to David Wehner, CFO at Facebook, “the Daily active users count on Facebook reached 1.47 billion, up 11% compared to last year, led by growth in India, Indonesia, and the Philippines. This number represents approximately 66% of the 2.23 billion monthly active users in Q2”.

[Figure: Facebook’s daily active users]

He also mentioned that “MAUs (monthly active users) were up 228M or 11% compared to last year. It is worth noting that MAU and DAU in Europe were both down slightly quarter-over-quarter due to the GDPR rollout, consistent with the outlook we gave on the Q1 call”.

[Figure: Facebook’s monthly active users]

In fact, Facebook has implemented several privacy policy changes in the last few months.
This is due to the European Union's General Data Protection Regulation (GDPR), and the company's earnings report revealed the effects of the GDPR rules.

Revenue Growth Rate is falling too

Speaking of revenue expectations, Wehner gave investors a heads up that revenue growth rates will decline in the third and fourth quarters. Wehner stated that the company’s “total revenue growth rate decelerated approximately 7 percentage points in Q2 compared to Q1. Our total revenue growth rates will continue to decelerate in the second half of 2018, and we expect our revenue growth rates to decline by high single-digit percentages from prior quarters sequentially in both Q3 and Q4.” Facebook reiterated further that these numbers won’t get better anytime soon.

[Figure: Facebook’s Q2 2018 revenue]

Wehner further explained the reasons for the decline in revenue: “There are several factors contributing to that deceleration..we expect the currency to be a slight headwind in the second half ...we plan to grow and promote certain engaging experiences like Stories that currently have lower levels of monetization. We are also giving people who use our services more choices around data privacy which may have an impact on our revenue growth”.

Let’s look at other performance indicators

Other financial highlights of Q2 2018 are as follows:

• Mobile advertising revenue represented 91% of advertising revenue for Q2 2018, up from approximately 87% of advertising revenue in Q2 2017.
• Capital expenditures for Q2 2018 were $3.46 billion, up from $1.4 billion in Q2 2017.
• Headcount was 30,275 as of June 30, an increase of 47% year-over-year.
• Cash, cash equivalents, and marketable securities were $42.3 billion at the end of Q2 2018, an increase from $35.45 billion at the end of Q2 2017.
Wehner also mentioned that the company will “continue to expect that full-year 2018 total expenses will grow in the range of 50-60% compared to last year. In addition to increases in core product development and infrastructure -- growth is driven by increasing investments -- safety & security, AR/VR, marketing, and content acquisition”.

Another reason for the overall loss is that Facebook has been dealing with criticism for quite some time now over its content policies, its handling of users’ private data, and its changing rules for advertisers. In fact, it is currently investigating data analytics firm Crimson Hexagon over misuse of data.

Mark Zuckerberg also said on a conference call with financial analysts that Facebook has been investing heavily in “safety, security, and privacy” and described how they’re “investing - in security that it will start to impact our profitability, we’re starting to see that this quarter - we run this company for the long term, not for the next quarter”.

Here’s what the public feels about the recent wipe-out:

https://twitter.com/TeaPainUSA/status/1022586648155054081
https://twitter.com/alistairmilne/status/1022550933014753280

So, why did Facebook’s stocks crash?

As we can see, Facebook’s performance in Q2 2018 has been better than its performance in the same quarter last year as far as revenue goes. Ironically, scandals and lawsuits have had little impact on Facebook’s growth. For example, Facebook recovered fully from the Cambridge Analytica scandal within two months as far as share prices are concerned. The Mueller indictment report released earlier this month managed to arrest growth for merely a couple of days before the company bounced back. The discriminatory advertising verdict against Facebook had no impact on its bullish growth earlier this week.

This brings us to conclude that the public sentiments and market reactions against Facebook have very different underlying reasons.
The market’s strong reactions are mainly due to concerns over the active user growth slowdown, the lack of monetization opportunities on the more popular Instagram platform, and Facebook’s perceived inability to adapt successfully to new political and regulatory policies such as the GDPR. Wall Street has been indifferent to Facebook’s long list of scandals, in some ways enabling the company’s ‘move fast and break things’ approach.

In his earnings call on Thursday, Zuckerberg hinted that Facebook may not be keen on ‘growth at all costs’ by saying things like “we’re investing so much in security that it will significantly impact our profitability”, with Wehner adding, “Looking beyond 2018, we anticipate that total expense growth will exceed revenue growth in 2019.” And that has got Wall Street unfriending Facebook with just a click of the button!

Pavan Ramchandani
24 Jul 2018
16 min read

Implementing C++ libraries in Delphi for HPC [Tutorial]

Using C object files in Delphi is hard but possible. Linking to C++ object files is, however, nearly impossible. The problem does not lie within the object files themselves but in C++. While C is hardly more than an assembler with improved syntax, C++ represents a sophisticated high-level language with runtime support for strings, objects, exceptions, and more. All these features are part of almost any C++ program and are as such compiled into (almost) any object file produced by C++.

In this tutorial, we will look at how to use C++ libraries from Delphi in high-performance applications. It starts with memory management, which is an important topic for any high-performance application. The article is an excerpt from a book written by Primož Gabrijelčič, titled Delphi High Performance.

The problem here is that Delphi has no idea how to deal with any of that. A C++ object is not the same as a Delphi object. Delphi has no idea how to call functions of a C++ object, how to deal with its inheritance chain, how to create and destroy such objects, and so on. The same holds for strings, exceptions, streams, and other C++ concepts.

If you can compile the C++ source with C++Builder then you can create a package (.bpl) that can be used from a Delphi program. Most of the time, however, you will not be dealing with a source project. Instead, you'll want to use a commercial library that only gives you a bunch of C++ header files (.h) and one or more static libraries (.lib). Most of the time, the only Windows version of that library will be compiled with Microsoft's Visual Studio.

A more general approach to this problem is to introduce a proxy DLL created in C++. You will have to create it in the same development environment as was used to create the library you are trying to link into the project. On Windows, that will in most cases be Visual Studio. That will enable us to include the library without any problems.
To allow Delphi to use this DLL (and as such use the library), the DLL should expose a simple interface in the Windows API style. Instead of exposing C++ objects, the API must expose methods implemented by the objects as normal (non-object) functions and procedures. As the objects cannot cross the API boundary, we must find some other way to represent them on the Delphi side.

Instead of showing how to write a DLL wrapper for an existing (and probably quite complicated) C++ library, I have decided to write a very simple C++ library that exposes a single class, implementing only two methods. As compiling this library requires Microsoft's Visual Studio, which not all of you have installed, I have also included the compiled version (DllLib1.dll) in the code archive.

The Visual Studio solution is stored in the StaticLib1 folder and contains two projects. StaticLib1 is the project used to create the library while the Dll1 project implements the proxy DLL. The static library implements the CppClass class, which is defined in the header file, CppClass.h. Whenever you are dealing with a C++ library, the distribution will also contain one or more header files. They are needed if you want to use a library in a C++ project, such as in the proxy DLL Dll1.

The header file for the demo library StaticLib1 is shown in the following. We can see that the code declares a single CppClass class, which has a constructor (CppClass()), a destructor (~CppClass()), a method accepting an integer parameter (void setData(int)), and a function returning an integer (int getSquare()). The class also contains one integer private field, data:

    #pragma once

    class CppClass
    {
        int data;
    public:
        CppClass();
        ~CppClass();
        void setData(int);
        int getSquare();
    };

The implementation of the CppClass class is stored in the CppClass.cpp file. You don't need this file when implementing the proxy DLL.
When we are using a C++ library, we are strictly coding to the interface, and the interface is stored in the header file. In our case, we have the full source so we can look inside the implementation too. The constructor and destructor don't do anything and so I'm not showing them here. The other two methods are as follows. The setData method stores its parameter in the internal field and the getSquare function returns the squared value of the internal field:

    void CppClass::setData(int value)
    {
        data = value;
    }

    int CppClass::getSquare()
    {
        return data * data;
    }

This code doesn't contain anything that we couldn't write in 60 seconds in Delphi. It does, however, serve as a perfect simple example for writing a proxy DLL.

Creating such a DLL in Visual Studio is easy. You just have to select File | New | Project, and select the Dynamic-Link Library (DLL) project type from the Visual C++ | Windows Desktop branch.

The Dll1 project from the code archive has only two source files. The file dllmain.cpp was created automatically by Visual Studio and contains the standard DllMain method. You can change this file if you have to run project-specific code when a program and/or a thread attaches to, or detaches from, the DLL. In my example, this file was left just as Visual Studio created it.

The second file, StaticLibWrapper.cpp, fully implements the proxy DLL. It starts with two include lines (shown in the following) which bring in the required RTL header stdafx.h and the header definition for our C++ class, CppClass.h:

    #include "stdafx.h"
    #include "CppClass.h"

The proxy has to be able to find our header file. There are two ways to do that. We could simply copy it to the folder containing the source files for the DLL project, or we can add it to the project's search path. The second approach can be configured in Project | Properties | Configuration Properties | C/C++ | General | Additional Include Directories. This is also the approach used by the demonstration program.
The DLL project must be able to find the static library that implements the CppClass object. The path to the library file should be set in project options, in the Configuration Properties | Linker | General | Additional Library Directories settings. You should put the name of the library (StaticLib1.lib) in the Linker | Input | Additional Dependencies settings. The next line in the source file defines a macro called EXPORT, which will be used later in the program to mark a function as exported. We have to do that for every DLL function that we want to use from the Delphi code. Later, we'll see how this macro is used: #define EXPORT comment(linker, "/EXPORT:" __FUNCTION__ "=" __FUNCDNAME__) The next part of the StaticLibWrapper.cpp file implements an IndexAllocator class, which is used internally to cache C++ objects. It associates C++ objects with simple integer identifiers, which are then used outside the DLL to represent the object. I will not show this class in the book as the implementation is not that important. You only have to know how to use it. This class is implemented as a simple static array of pointers and contains at most MAXOBJECTS objects. The constant MAXOBJECTS is set to 100 in the current code, which limits the number of C++ objects created by the Delphi code to 100. Feel free to modify the code if you need to create more objects. The following code fragment shows three public functions implemented by the IndexAllocator class. The Allocate function takes a pointer obj, stores it in the cache, and returns its index in the deviceIndex parameter. The result of the function is FALSE if the cache is full and TRUE otherwise. The Release function accepts an index (which was previously returned from Allocate) and marks the cache slot at that index as empty. This function returns FALSE if the index is invalid (does not represent a value returned from Allocate) or if the cache slot for that index is already empty. 
The last function, Get, also accepts an index and returns the pointer associated with that index. It returns NULL if the index is invalid or if the cache slot for that index is empty:

    bool Allocate(int& deviceIndex, void* obj)
    bool Release(int deviceIndex)
    void* Get(int deviceIndex)

Let's move now to the functions that are exported from the DLL. The first two, Initialize and Finalize, are used to initialize internal structures (namely the GAllocator object of type IndexAllocator) and to clean up before the DLL is unloaded. Instead of looking into them, I'd rather show you the more interesting stuff, namely the functions that deal with CppClass.

The CreateCppClass function creates an instance of CppClass, stores it in the cache, and returns its index. The three important parts of the declaration are extern "C", WINAPI, and #pragma EXPORT.

extern "C" is there to guarantee that the CreateCppClass name will not be changed when it is stored in the library. The C++ compiler tends to mangle (change) function names to support function overloading (the same thing happens in Delphi), and this declaration prevents that.

WINAPI changes the calling convention from cdecl, which is standard for C programs, to stdcall, which is commonly used in DLLs. Later, we'll see that we also have to specify the correct calling convention on the Delphi side.

The last important part, #pragma EXPORT, uses the previously defined EXPORT macro to mark this function as exported.

The CreateCppClass function returns 0 if the operation was successful and -1 if it failed. The same approach is used in all functions exported from the demo DLL:

    extern "C" int WINAPI CreateCppClass(int& index)
    {
    #pragma EXPORT

        CppClass* instance = new CppClass;
        if (!GAllocator->Allocate(index, (void*)instance)) {
            delete instance;
            return -1;
        }
        else
            return 0;
    }

Similarly, the DestroyCppClass function (not shown here) accepts an index parameter, fetches the object from the cache, and destroys it.
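Because the article leaves out IndexAllocator, Initialize, Finalize, and DestroyCppClass, the following is a minimal sketch of how they could fit together, based only on the behavior described above. It is a hypothetical reconstruction, not the code from the book's archive: the slot array layout, the first-free-slot search, and the bodies of Initialize, Finalize, and DestroyCppClass are all assumptions. WINAPI, extern "C", and #pragma EXPORT are left out so that the sketch compiles with any C++ compiler, and a stand-in CppClass is declared inline for the same reason.

```cpp
// Stand-in for the article's CppClass so that the sketch compiles on its own.
class CppClass {
public:
    void setData(int value) { data = value; }
    int getSquare() { return data * data; }
private:
    int data = 0;
};

const int MAXOBJECTS = 100;  // the limit mentioned in the article

// Assumed layout: a fixed array of pointers where nullptr marks an empty slot.
class IndexAllocator {
public:
    IndexAllocator() {
        for (int i = 0; i < MAXOBJECTS; i++) slots[i] = nullptr;
    }

    // Stores obj in the first empty slot and reports that slot in deviceIndex.
    // Returns false if the cache is full.
    bool Allocate(int& deviceIndex, void* obj) {
        for (int i = 0; i < MAXOBJECTS; i++) {
            if (slots[i] == nullptr) {
                slots[i] = obj;
                deviceIndex = i;
                return true;
            }
        }
        return false;
    }

    // Marks a slot as empty. Returns false for an out-of-range index
    // or for a slot that is already empty.
    bool Release(int deviceIndex) {
        if (!InRange(deviceIndex) || slots[deviceIndex] == nullptr) return false;
        slots[deviceIndex] = nullptr;
        return true;
    }

    // Returns the stored pointer, or nullptr for an invalid or empty slot.
    void* Get(int deviceIndex) {
        return InRange(deviceIndex) ? slots[deviceIndex] : nullptr;
    }

private:
    bool InRange(int i) const { return i >= 0 && i < MAXOBJECTS; }
    void* slots[MAXOBJECTS];
};

IndexAllocator* GAllocator = nullptr;

// Assumed behavior: create the allocator once; repeated calls are harmless.
int Initialize() {
    if (GAllocator == nullptr) GAllocator = new IndexAllocator;
    return 0;
}

// Assumed behavior: drop the allocator. Objects still in the cache are not
// destroyed here, because the cache only holds untyped pointers.
int Finalize() {
    delete GAllocator;
    GAllocator = nullptr;
    return 0;
}

// Same logic as the article's CreateCppClass, plus a guard for a missing allocator.
int CreateCppClass(int& index) {
    if (GAllocator == nullptr) return -1;
    CppClass* instance = new CppClass;
    if (!GAllocator->Allocate(index, (void*)instance)) {
        delete instance;
        return -1;
    }
    return 0;
}

// Fetches the object, destroys it, and frees its slot.
// Returns -1 for an index that does not refer to a live object.
int DestroyCppClass(int index) {
    if (GAllocator == nullptr) return -1;
    CppClass* instance = (CppClass*)GAllocator->Get(index);
    if (instance == nullptr) return -1;
    delete instance;
    GAllocator->Release(index);
    return 0;
}
```

With this layout, calling DestroyCppClass twice with the same index returns 0 the first time and -1 the second time, because the slot is already empty. That is the kind of error check the index table makes possible.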
The DLL also exports two functions that allow the DLL user to operate on an object. The first one, CppClass_setValue, accepts an index of the object and a value. It fetches the CppClass instance from the cache (given the index) and calls its setData method, passing it the value:

    extern "C" int WINAPI CppClass_setValue(int index, int value)
    {
    #pragma EXPORT

        CppClass* instance = (CppClass*)GAllocator->Get(index);
        if (instance == NULL)
            return -1;
        else {
            instance->setData(value);
            return 0;
        }
    }

The second function, CppClass_getSquare, also accepts an object index and uses it to access the CppClass object. After that, it calls the object's getSquare method and stores the result in the output parameter, value:

    extern "C" int WINAPI CppClass_getSquare(int index, int& value)
    {
    #pragma EXPORT

        CppClass* instance = (CppClass*)GAllocator->Get(index);
        if (instance == NULL)
            return -1;
        else {
            value = instance->getSquare();
            return 0;
        }
    }

A proxy DLL that uses a mapping table is a bit complicated and requires some work. We could also approach the problem in a much simpler manner, by treating the address of an object as its external identifier. In other words, the CreateCppClass function would create an object and then return its address as an untyped pointer. CppClass_getSquare, for example, would accept this pointer, cast it to a CppClass instance, and execute an operation on it. An alternative version of these two functions is shown in the following:

    extern "C" int WINAPI CreateCppClass2(void*& ptr)
    {
    #pragma EXPORT

        ptr = new CppClass;
        return 0;
    }

    extern "C" int WINAPI CppClass_getSquare2(void* index, int& value)
    {
    #pragma EXPORT

        value = ((CppClass*)index)->getSquare();
        return 0;
    }

This approach is simpler but offers far less safety, because it cannot check for errors. The table-based approach can check whether the index represents a valid value, while the pointer-based version cannot know whether the pointer parameter is valid or not.
If we make a mistake on the Delphi side and pass in an invalid pointer, the code would treat it as an instance of a class, do some operations on it, possibly corrupt some memory, and maybe crash. Finding the source of such errors is very hard. That's why I prefer to write more verbose code that implements some safety checks rather than code that returns raw pointers.

Using a proxy DLL in Delphi

To use any DLL from a Delphi program, we must first import functions from the DLL. There are different ways to do this: static linking, dynamic linking, and static linking with delayed loading. There's plenty of information on the internet about the art of DLL writing in Delphi, so I won't dig into this topic. I'll just stick with the most modern approach, delay loading.

The code archive for this book includes two demo programs, which demonstrate how to use the DllLib1.dll library. The simpler one, CppClassImportDemo, uses the DLL functions directly, while CppClassWrapperDemo wraps them in an easy-to-use class.

Both projects use the CppClassImport unit to import the DLL functions into the Delphi program. The following code fragment shows the interface part of that unit, which tells the Delphi compiler which functions from the DLL should be imported and what parameters they have.

As with the C++ part, there are three important parts to each declaration. Firstly, stdcall specifies that the function call should use the stdcall (or what is known in C as WINAPI) calling convention. Secondly, the name after the name specifier should match the exported function name from the C++ source. And thirdly, the delayed keyword specifies that the program should not try to find this function in the DLL when it is started, but only when the code calls the function.
This allows us to check whether the DLL is present at all before we call any of the functions:

    const
      CPP_CLASS_LIB = 'DllLib1.dll';

    function Initialize: integer; stdcall;
      external CPP_CLASS_LIB name 'Initialize' delayed;
    function Finalize: integer; stdcall;
      external CPP_CLASS_LIB name 'Finalize' delayed;
    function CreateCppClass(var index: integer): integer; stdcall;
      external CPP_CLASS_LIB name 'CreateCppClass' delayed;
    function DestroyCppClass(index: integer): integer; stdcall;
      external CPP_CLASS_LIB name 'DestroyCppClass' delayed;
    function CppClass_setValue(index: integer; value: integer): integer; stdcall;
      external CPP_CLASS_LIB name 'CppClass_setValue' delayed;
    function CppClass_getSquare(index: integer; var value: integer): integer; stdcall;
      external CPP_CLASS_LIB name 'CppClass_getSquare' delayed;

The implementation part of this unit (not shown here) shows how to catch errors that occur during delayed loading, that is, when the code that calls any of the imported functions tries to find that function in the DLL.

If you get an "External exception C06D007F" error when you try to call a delay-loaded function, you have probably mistyped a name, either in C++ or in Delphi. You can use the tdump utility that comes with Delphi to check which names are exported from the DLL. The syntax is tdump -d <dll_name.dll>.

If the code crashes when you call a DLL function, check whether both sides define the calling convention correctly. Also check that all the parameters have the correct types on both sides and that the var parameters are marked as such on both sides.

To use the DLL, the code in the CppClassMain unit first calls the exported Initialize function from the form's OnCreate handler to initialize the DLL. The cleanup function, Finalize, is called from the OnDestroy handler to clean up the DLL.
All parts of the code check whether the DLL functions return the OK status (value 0):

    procedure TfrmCppClassDemo.FormCreate(Sender: TObject);
    begin
      if Initialize <> 0 then
        ListBox1.Items.Add('Initialize failed');
    end;

    procedure TfrmCppClassDemo.FormDestroy(Sender: TObject);
    begin
      if Finalize <> 0 then
        ListBox1.Items.Add('Finalize failed');
    end;

When you click on the Use import library button, the following code executes. It uses the DLL to create a CppClass object by calling the CreateCppClass function. This function puts an integer value into the idxClass variable. This value identifies the CppClass object when calling other functions.

The code then calls CppClass_setValue to set the internal field of the CppClass object and CppClass_getSquare to call the getSquare method and return the calculated value. At the end, DestroyCppClass destroys the CppClass object:

    procedure TfrmCppClassDemo.btnImportLibClick(Sender: TObject);
    var
      idxClass: Integer;
      value: Integer;
    begin
      if CreateCppClass(idxClass) <> 0 then
        ListBox1.Items.Add('CreateCppClass failed')
      else if CppClass_setValue(idxClass, SpinEdit1.Value) <> 0 then
        ListBox1.Items.Add('CppClass_setValue failed')
      else if CppClass_getSquare(idxClass, value) <> 0 then
        ListBox1.Items.Add('CppClass_getSquare failed')
      else begin
        ListBox1.Items.Add(Format('square(%d) = %d', [SpinEdit1.Value, value]));
        if DestroyCppClass(idxClass) <> 0 then
          ListBox1.Items.Add('DestroyCppClass failed')
      end;
    end;

This approach is relatively simple but long-winded and error-prone. A better way is to write a wrapper Delphi class that implements the same public interface as the corresponding C++ class. The second demo, CppClassWrapperDemo, contains a unit, CppClassWrapper, which does just that.

This unit implements a TCppClass class, which maps to its C++ counterpart.
It only has one internal field, which stores the index of the C++ object as returned from the CreateCppClass function:

    type
      TCppClass = class
      strict private
        FIndex: integer;
      public
        class procedure InitializeWrapper;
        class procedure FinalizeWrapper;
        constructor Create;
        destructor Destroy; override;
        procedure SetValue(value: integer);
        function GetSquare: integer;
      end;

I won't show all of the functions here, as they are all equally simple. One, or maybe two, will suffice.

The constructor just calls the CreateCppClass function, checks the result, and stores the resulting index in the internal field:

    constructor TCppClass.Create;
    begin
      inherited Create;
      if CreateCppClass(FIndex) <> 0 then
        raise Exception.Create('CreateCppClass failed');
    end;

Similarly, GetSquare just forwards its job to the CppClass_getSquare function:

    function TCppClass.GetSquare: integer;
    begin
      if CppClass_getSquare(FIndex, Result) <> 0 then
        raise Exception.Create('CppClass_getSquare failed');
    end;

When we have this wrapper, the code in the main unit becomes very simple and very Delphi-like. Once the initialization in the OnCreate event handler is done, we can just create an instance of TCppClass and work with it:

    procedure TfrmCppClassDemo.FormCreate(Sender: TObject);
    begin
      TCppClass.InitializeWrapper;
    end;

    procedure TfrmCppClassDemo.FormDestroy(Sender: TObject);
    begin
      TCppClass.FinalizeWrapper;
    end;

    procedure TfrmCppClassDemo.btnWrapClick(Sender: TObject);
    var
      cpp: TCppClass;
    begin
      cpp := TCppClass.Create;
      try
        cpp.SetValue(SpinEdit1.Value);
        ListBox1.Items.Add(Format('square(%d) = %d',
          [SpinEdit1.Value, cpp.GetSquare]));
      finally
        FreeAndNil(cpp);
      end;
    end;

To summarize, we learned how to use a C/C++ library from a program that keeps Delphi as its primary language, by wrapping the library in a proxy DLL.

If you found this post useful, do check out the book Delphi High Performance to learn more about the intricacies of high-performance programming with Delphi.