How-To Tutorials


23andMe shares 5mn client genetic data with GSK for drug target discovery, a machine learning application in genetics research

Sugandha Lahoti
28 Jul 2018
3 min read
Genetics company 23andMe, which uses machine learning algorithms for human genome analysis, has entered into a four-year collaboration with pharmaceutical giant GlaxoSmithKline. It will share the genetic data of its 5 million customers with GSK to advance research into treatments of diseases. The collaboration will be used to identify novel drug targets, tackle new subsets of disease and enable rapid progression of clinical programs. The 12-year-old firm has already published more than 100 scientific papers based on its customers' data. All activities within the collaboration will initially be co-funded, with either company having certain rights to reduce its funding share.

"The goal of the collaboration is to gather insights and discover novel drug targets driving disease progression and develop therapies," GlaxoSmithKline said in a press release. GSK is also reported to have invested $300 million in 23andMe.

During the four-year collaboration, GSK will use 23andMe's database and statistical analytics for drug target discovery. The collaboration will also be used in the design of GSK's LRRK2 inhibitor, which is in development as a potential treatment for Parkinson's disease. 23andMe's database of consented customers who have a known LRRK2 variant status will be used to accelerate the progress of this programme. Together, GSK and 23andMe will target and recruit patients with defined LRRK2 mutations in order to reach clinical proof of concept.

23andMe has made it quite clear that participating in this program is voluntary and requires customers to affirmatively consent to participate. However, not everyone is clear on how this will work. First, the company has specified that any research involving customer data that has already been performed or published prior to receipt of a withdrawal request will not be reversed. This may have a negative effect, as people are generally not aware of all the privacy policies and generally don't read the Terms of Service. Moreover, as Peter Pitts, president of the Center for Medicine in the Public Interest, notes, "If a person's DNA is used in research, that person should be compensated. Customers shouldn't be paying for the privilege of 23andMe working with a for-profit company in a for-profit research project."

Both companies have pledged to provide maximum data protection for their customers. In a blog post, they note, "The continued protection of customers' data and privacy is the highest priority for both GSK and 23andMe. Both companies have stringent security protections in place when it comes to collecting, storing and transferring information about research participants."

You can read more about the news in a blog post by 23andMe founder Anne Wojcicki.

6 use cases of Machine Learning in Healthcare
Healthcare Analytics: Logistic Regression to Reduce Patient Readmissions
NIPS 2017 Special: How machine learning for genomics is bridging the gap between research and clinical trial success by Brendan Frey


Setting up Apache Druid in Hadoop for Data visualizations [Tutorial]

Sunith Shetty
27 Jul 2018
9 min read
Apache Druid is a distributed, high-performance columnar store. Druid allows us to store both real-time and historical data that is time series in nature. It also provides fast data aggregation and flexible data exploration, and its architecture supports storing trillions of data points at petabyte scale. In this tutorial, we will explore the Apache Druid components and see how Druid can be used to visualize data in order to build the analytics that drives business decisions. Specifically, we will understand how to set up Apache Druid in Hadoop to visualize data. To understand more about the Druid architecture, you may refer to this white paper. This article is an excerpt from a book written by Naresh Kumar and Prashant Shindgikar titled Modern Big Data Processing with Hadoop.

Apache Druid components

Let's take a quick look at the different components of the Druid cluster:

Druid Broker: These are the nodes that are aware of where the data lies in the cluster. They are contacted by applications/clients to get the data within Druid.
Druid Coordinator: These nodes manage the data (they load, drop, and load-balance it) on the historical nodes.
Druid Overlord: This component is responsible for accepting tasks and returning the statuses of the tasks.
Druid Router: These nodes are needed when the data volume is in the terabyte or higher range. They route the requests to the brokers.
Druid Historical: These nodes store immutable segments and are the backbone of the Druid cluster. They load segments, drop segments, and serve queries on segments.

Other required components

Zookeeper: Apache Zookeeper is a highly reliable distributed coordination service.
Metadata Storage: MySQL and PostgreSQL are the popular RDBMSes used to keep track of all segments, supervisors, tasks, and configurations.

Apache Druid installation

Apache Druid can be installed either in standalone mode or as part of a Hadoop cluster. In this section, we will see how to install Druid via Apache Ambari.

Add service: First, we invoke the Actions drop-down below the list of services in the Hadoop cluster.

Select Druid and Superset: In this setup, we will install both Druid and Superset at the same time. Superset is the visualization application that we will learn about in the next step. Click on Next when both services are selected.

Service placement on servers: In this step, we are given a choice to select the servers on which the applications have to be installed. I have selected node 3 for this purpose; you can select any node you wish. Click on Next when the changes are done.

Choose Slaves and Clients: Here, we are given a choice to select the nodes on which we need the Slaves and Clients for the installed components. I have left the options that are already selected for me.

Service configurations: In this step, we need to select the databases, usernames, and passwords for the metadata store used by the Druid and Superset applications. Feel free to choose the default ones; I have given MySQL as the backend store for both of them. Once the changes look good, click on the Next button at the bottom of the screen.

Service installation: In this step, the applications will be installed automatically and the status will be shown at the end of the plan. Click on Next once the installation is complete.

Installation summary: Once everything is successfully completed, we are shown a summary of what has been done. Click on Complete when done.
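Once Ambari reports the installation as complete, it is worth confirming that the Druid services actually came up before moving on to data ingestion. The short Python sketch below is not from the book; it assumes Druid's default service ports (coordinator 8081, overlord 8090, broker 8082) and that the services were placed on node-3, so adjust the hosts and ports to match your own service placement.

```python
import requests

# Assumed host/port layout -- change these to wherever Ambari placed the services.
services = {
    "coordinator": "http://node-3:8081/status",
    "overlord": "http://node-3:8090/status",
    "broker": "http://node-3:8082/status",
}

for name, url in services.items():
    try:
        resp = requests.get(url, timeout=5)
        resp.raise_for_status()
        info = resp.json()  # the /status endpoint returns basic build information
        print("%s: up (Druid %s)" % (name, info.get("version", "unknown")))
    except requests.RequestException as exc:
        print("%s: not reachable (%s)" % (name, exc))
```

If any service is not reachable, check its status in the Ambari dashboard before proceeding.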
Sample data ingestion into Druid

Once we have all the Druid-related applications running in our Hadoop cluster, we need a sample dataset that we must load in order to run some analytics tasks. Let's see how to load sample data.

Download the Druid archive from the internet:

[druid@node-3 ~]$ curl -O http://static.druid.io/artifacts/releases/druid-0.12.0-bin.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  222M  100  222M    0     0  1500k      0  0:02:32  0:02:32 --:--:--  594k

Extract the archive:

[druid@node-3 ~]$ tar -xzf druid-0.12.0-bin.tar.gz

Copy the sample Wikipedia data to Hadoop:

[druid@node-3 ~]$ cd druid-0.12.0
[druid@node-3 ~/druid-0.12.0]$ hadoop fs -mkdir /user/druid/quickstart
[druid@node-3 ~/druid-0.12.0]$ hadoop fs -put quickstart/wikiticker-2015-09-12-sampled.json.gz /user/druid/quickstart/

Submit the import request:

[druid@node-3 druid-0.12.0]$ curl -X 'POST' -H 'Content-Type:application/json' -d @quickstart/wikiticker-index.json localhost:8090/druid/indexer/v1/task;echo
{"task":"index_hadoop_wikiticker_2018-03-16T04:54:38.979Z"}

After this step, Druid will automatically import the data into the Druid cluster, and the progress can be seen in the overlord console. The interface is accessible via http://<overlord-ip>:8090/console.html. Once the ingestion is complete, we will see the status of the job as SUCCESS. In case of FAILED imports, please make sure that the backend that is configured to store the metadata for the Druid cluster is up and running. Even though Druid works well with the OpenJDK installation, I have faced a problem with a few classes not being available at runtime. In order to overcome this, I have had to use Oracle Java version 1.8 to run all Druid applications. Now we are ready to start using Druid for our visualization tasks.
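The overlord console shows the ingestion progress, but you can also poll the task status over HTTP using the task ID returned by the submission above. The sketch below is an illustrative addition, not part of the original tutorial; it uses Druid's /druid/indexer/v1/task/<id>/status endpoint and assumes the overlord is reachable on node-3 at port 8090 (the status field name varies slightly between Druid versions, hence the defensive lookup).

```python
import time
import requests

overlord = "http://node-3:8090"  # assumption: overlord host and default port
task_id = "index_hadoop_wikiticker_2018-03-16T04:54:38.979Z"  # ID returned by the curl submission

while True:
    resp = requests.get("%s/druid/indexer/v1/task/%s/status" % (overlord, task_id), timeout=10)
    resp.raise_for_status()
    status_obj = resp.json().get("status", {})
    # Older Druid releases report "status", newer ones "statusCode"
    state = status_obj.get("statusCode") or status_obj.get("status")
    print("task state:", state)
    if state in ("SUCCESS", "FAILED"):
        break
    time.sleep(30)  # poll every 30 seconds while the task is still running
```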
MySQL database with Apache Druid

We will use a MySQL database to store the data. Apache Druid allows us to read the data present in an RDBMS system such as MySQL.

Sample database: The employees database is a standard dataset that has a sample organization and its employee, salary, and department data. We will see how to set it up for our tasks. This section assumes that the MySQL database is already configured and running.

Download the sample dataset: Download the sample dataset from GitHub with the following commands on any server that has access to the MySQL database:

[user@master ~]$ sudo yum install git -y
[user@master ~]$ git clone https://github.com/datacharmer/test_db
Cloning into 'test_db'...
remote: Counting objects: 98, done.
remote: Total 98 (delta 0), reused 0 (delta 0), pack-reused 98
Unpacking objects: 100% (98/98), done.

Copy the data to MySQL: In this step, we will import the contents of the data files into the MySQL database:

[user@master test_db]$ mysql -u root < employees.sql
INFO
CREATING DATABASE STRUCTURE
INFO
storage engine: InnoDB
INFO
LOADING departments
INFO
LOADING employees
INFO
LOADING dept_emp
INFO
LOADING dept_manager
INFO
LOADING titles
INFO
LOADING salaries
data_load_time_diff
NULL

Verify integrity of the tables: This is an important step, just to make sure that all of the data we have imported is correctly stored in the database. The summary of the integrity check is shown as the verification happens:

[user@master test_db]$ mysql -u root -t < test_employees_sha.sql

| INFO                 |
| TESTING INSTALLATION |

| table_name   | expected_records | expected_crc                             |
| employees    | 300024           | 4d4aa689914d8fd41db7e45c2168e7dcb9697359 |
| departments  | 9                | 4b315afa0e35ca6649df897b958345bcb3d2b764 |
| dept_manager | 24               | 9687a7d6f93ca8847388a42a6d8d93982a841c6c |
| dept_emp     | 331603           | d95ab9fe07df0865f592574b3b33b9c741d9fd1b |
| titles       | 443308           | d12d5f746b88f07e69b9e36675b6067abb01b60e |
| salaries     | 2844047          | b5a1785c27d75e33a4173aaa22ccf41ebd7d4a9f |

| table_name   | found_records    | found_crc                                |
| employees    | 300024           | 4d4aa689914d8fd41db7e45c2168e7dcb9697359 |
| departments  | 9                | 4b315afa0e35ca6649df897b958345bcb3d2b764 |
| dept_manager | 24               | 9687a7d6f93ca8847388a42a6d8d93982a841c6c |
| dept_emp     | 331603           | d95ab9fe07df0865f592574b3b33b9c741d9fd1b |
| titles       | 443308           | d12d5f746b88f07e69b9e36675b6067abb01b60e |
| salaries     | 2844047          | b5a1785c27d75e33a4173aaa22ccf41ebd7d4a9f |

| table_name   | records_match | crc_match |
| employees    | OK            | ok        |
| departments  | OK            | ok        |
| dept_manager | OK            | ok        |
| dept_emp     | OK            | ok        |
| titles       | OK            | ok        |
| salaries     | OK            | ok        |

| computation_time |
| 00:00:11         |

| summary | result |
| CRC     | OK     |
| count   | OK     |

Now the data is correctly loaded in the MySQL database called employees.

Single normalized table: In data warehouses, it's a standard practice to have normalized tables when compared to many small related tables. Let's create a single normalized table that contains details of employees, salaries, and departments:

MariaDB [employees]> create table employee_norm as
    select e.emp_no, e.birth_date, CONCAT_WS(' ', e.first_name, e.last_name) full_name,
           e.gender, e.hire_date, s.salary, s.from_date, s.to_date, d.dept_name, t.title
    from employees e, salaries s, departments d, dept_emp de, titles t
    where e.emp_no = t.emp_no
      and e.emp_no = s.emp_no
      and d.dept_no = de.dept_no
      and e.emp_no = de.emp_no
      and s.to_date < de.to_date
      and s.to_date < t.to_date
    order by emp_no, s.from_date;
Query OK, 3721923 rows affected (1 min 7.14 sec)
Records: 3721923  Duplicates: 0  Warnings: 0

MariaDB [employees]> select * from employee_norm limit 1\G
*************************** 1. row ***************************
    emp_no: 10001
birth_date: 1953-09-02
 full_name: Georgi Facello
    gender: M
 hire_date: 1986-06-26
    salary: 60117
 from_date: 1986-06-26
   to_date: 1987-06-26
 dept_name: Development
     title: Senior Engineer
1 row in set (0.00 sec)

Once we have the normalized data, we will see how to use the data from this table to generate rich visualisations.
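With the wikiticker data already ingested into Druid, the broker can answer analytical queries directly over its JSON API, which is a useful sanity check before building any dashboards. The following Python sketch is an illustrative addition rather than part of the book's text; it assumes the broker runs on node-3 at its default port 8082 and that the datasource created by the earlier ingestion is named wikiticker with a count metric.

```python
import json
import requests

broker = "http://node-3:8082/druid/v2"  # assumption: broker host and default port

# Native Druid timeseries query: total edit events per hour for the sample day.
query = {
    "queryType": "timeseries",
    "dataSource": "wikiticker",
    "granularity": "hour",
    "intervals": ["2015-09-12/2015-09-13"],
    "aggregations": [{"type": "longSum", "name": "edits", "fieldName": "count"}],
}

resp = requests.post(broker, data=json.dumps(query),
                     headers={"Content-Type": "application/json"}, timeout=30)
resp.raise_for_status()
for row in resp.json():
    print(row["timestamp"], row["result"]["edits"])
```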
To summarize, we walked through a Hadoop-ecosystem application, Apache Druid, that is used to visualize data, and we learned how to use it with RDBMSes such as MySQL. We also set up a sample database to help us understand the application better. To know more about how to visualize data using Apache Superset, and to learn how to use it with data in RDBMSes such as MySQL, do check out the book Modern Big Data Processing with Hadoop.

What makes Hadoop so revolutionary?
Top 8 ways to improve your data visualizations
What is Seaborn and why should you use it for data visualization?


Use AutoML for building simple to complex machine learning pipelines [Tutorial]

Sunith Shetty
27 Jul 2018
15 min read
Many moving parts have to be tied together for an ML model to execute and produce results successfully. This process of tying together different pieces of the ML process is known as pipelines. A pipeline is a generalized concept but a very important concept for a Data Scientist. In software engineering, people build pipelines to develop software that is exercised from source code to deployment. Similarly, in ML, a pipeline is created to allow data flow from its raw format to some useful information. It provides a mechanism to construct a multi-ML parallel pipeline system in order to compare the results of several ML methods. In this tutorial, we see how to create our own AutoML pipelines. You will understand how to build pipelines in order to handle the model building process. Each stage of a pipeline is fed processed data from its preceding stage; that is, the output of a processing unit is supplied as an input to its next step. The data flows through the pipeline just as water flows in a pipe. Mastering the pipeline concept is a powerful way to create error-free ML models, and pipelines form a crucial element for building an AutoML system. The code files for this article are available on Github. This article is an excerpt from a book written by Sibanjan Das, Umit Mert Cakmak titled Hands-On Automated Machine Learning. Getting to know machine learning pipelines Usually, an ML algorithm needs clean data to detect some patterns in the data and make predictions over a new dataset. However, in real-world applications, the data is often not ready to be directly fed into an ML algorithm. Similarly, the output from an ML model is just numbers or characters that need to be processed for performing some actions in the real world. To accomplish that, the ML model has to be deployed in a production environment. This entire framework of converting raw data to usable information is performed using a ML pipeline. The following is a high-level illustration of an ML pipeline: We will break down the blocks illustrated in the preceding figure as follows: Data Ingestion: It is the process of obtaining data and importing data for use. Data can be sourced from multiple systems, such as Enterprise Resource Planning (ERP) software, Customer Relationship Management (CRM) software, and web applications. The data extraction can be in the real time or batches. Sometimes, acquiring the data is a tricky part and is one of the most challenging steps as we need to have a good business and data understanding abilities. Data Preparation: There are several methods to preprocess the data to a suitable form for building models. Real-world data is often skewed—there is missing data, which is sometimes noisy. It is, therefore, necessary to preprocess the data to make it clean and transformed, so it's ready to be run through the ML algorithms. ML model training: It involves the use of various ML techniques to understand essential features in the data, make predictions, or derive insights out of it. Often, the ML algorithms are already coded and available as API or programming interfaces. The most important responsibility we need to take is to tune the hyperparameters. The use of hyperparameters and optimizing them to create a best-fitting model are the most critical and complicated parts of the model training phase. Model Evaluation: There are various criteria using which a model can be evaluated. It is a combination of statistical methods and business rules. 
In an AutoML pipeline, the evaluation is mostly based on various statistical and mathematical measures. If an AutoML system is developed for some specific business domain or use cases, then the business rules can also be embedded into the system to evaluate the correctness of a model. Retraining: The first model that we create for a use case is not often the best model. It is considered as a baseline model, and we try to improve the model's accuracy by training it repetitively. Deployment: The final step is to deploy the model that involves applying and migrating the model to business operations for their use. The deployment stage is highly dependent on the IT infrastructure and software capabilities an organization has. As we see, there are several stages that we will need to perform to get results out of an ML model. The scikit-learn provides us a pipeline functionality that can be used to create several complex pipelines. While building an AutoML system, pipelines are going to be very complex, as many different scenarios have to be captured. However, if we know how to preprocess the data, utilizing an ML algorithm and applying various evaluation metrics, a pipeline is a matter of giving a shape to those pieces. Let's design a very simple pipeline using scikit-learn. Simple ML pipeline We will first import a dataset known as Iris, which is already available in scikit-learn's sample dataset library (http://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html). The dataset consists of four features and has 150 rows. We will be developing the following steps in a pipeline to train our model using the Iris dataset. The problem statement is to predict the species of an Iris data using four different features: In this pipeline, we will use a MinMaxScaler method to scale the input data and logistic regression to predict the species of the Iris. The model will then be evaluated based on the accuracy measure: The first step is to import various libraries from scikit-learn that will provide methods to accomplish our task. The only addition is the Pipeline method from sklearn.pipeline. This will provide us with necessary methods needed to create an ML pipeline: from sklearn.datasets import load_iris from sklearn.preprocessing import MinMaxScaler from sklearn.linear_model import LogisticRegression from sklearn.model_selection import train_test_split from sklearn.pipeline import Pipeline The next step is to load the iris data and split it into training and test dataset. In this example, we will use 80% of the dataset to train the model and the remaining 20% to test the accuracy of the model. We can use the shape function to view the dimension of the dataset: # Load and split the data iris = load_iris() X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42) X_train.shape The following result shows the training dataset having 4 columns and 120 rows, which equates to 80% of the Iris dataset and is as expected: Next, we print the dataset to take a glance at the data: print(X_train) The preceding code provides the following output: The next step is to create a pipeline. The pipeline object is in the form of (key, value) pairs. Key is a string that has the name for a particular step, and value is the name of the function or actual method. 
In the following code snippet, we have named the MinMaxScaler() method minmax and LogisticRegression(random_state=42) lr:

pipe_lr = Pipeline([('minmax', MinMaxScaler()),
                    ('lr', LogisticRegression(random_state=42))])

Then, we fit the pipeline object, pipe_lr, to the training dataset:

pipe_lr.fit(X_train, y_train)

When we execute the preceding code, the output shows the final structure of the fitted model that was built. The last step is to score the model on the test dataset using the score method:

score = pipe_lr.score(X_test, y_test)
print('Logistic Regression pipeline test accuracy: %.3f' % score)

As we can note from the results, the accuracy of the model was 0.900, which is 90%.

In the preceding example, we created a pipeline that consisted of two steps, that is, minmax scaling and LogisticRegression. When we executed the fit method on the pipe_lr pipeline, the MinMaxScaler performed fit and transform on the input data, and the result was passed on to the estimator, which is a logistic regression model. These intermediate steps in a pipeline are known as transformers, and the last step is an estimator.

Transformers are used for data preprocessing and have two methods, fit and transform. The fit method is used to find parameters from the training data, and the transform method is used to apply the data preprocessing techniques to the dataset. Estimators are used for creating machine learning models and have two methods, fit and predict. The fit method is used to train an ML model, and the predict method is used to apply the trained model to a test or new dataset.

We have to call only the pipeline's fit method to train a model and call the predict method to create predictions. The rest of the functions, that is, fit and transform, are encapsulated in the pipeline's functionality and executed as described above. Sometimes, we will need to write some custom functions to perform custom transformations. The following section is about the FunctionTransformer, which can assist us in implementing this custom functionality.
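Before moving on to the FunctionTransformer, here is the simple pipeline from this section pulled together into a single runnable script. It follows the code shown above (MinMaxScaler followed by LogisticRegression on the Iris data); the exact accuracy you see may differ slightly depending on your scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Load the Iris data and hold out 20% of it for testing
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# A transformer (MinMaxScaler) followed by an estimator (LogisticRegression)
pipe_lr = Pipeline([('minmax', MinMaxScaler()),
                    ('lr', LogisticRegression(random_state=42))])

# fit() runs fit/transform on the scaler, then fits the classifier
pipe_lr.fit(X_train, y_train)

# score() applies the same scaling to the test data before predicting
score = pipe_lr.score(X_test, y_test)
print('Logistic Regression pipeline test accuracy: %.3f' % score)
```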
FunctionTransformer

A FunctionTransformer is used to define a user-defined function that consumes the data from the pipeline and returns the result of this function to the next stage of the pipeline. This is used for stateless transformations, such as taking the square or log of numbers, defining custom scaling functions, and so on. In the following example, we will build a pipeline using the CustomLog function and the predefined preprocessing method StandardScaler.

We import all the required libraries as we did in our previous examples. The only addition here is the FunctionTransformer method from the sklearn.preprocessing library. This method is used to execute a custom transformer function and stitch it together with other stages in a pipeline:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import StandardScaler

In the following code snippet, we define a custom function, which returns the log of a number X:

def CustomLog(X):
    return np.log(X)

Next, we define a data preprocessing function named PreprocData, which accepts the input data (X) and target (Y) of a dataset. For this example, the Y is not necessary, as we are not going to build a supervised model and just want to demonstrate a data preprocessing pipeline. However, in the real world, we could directly use this function to create a supervised ML model. Here, we use the make_pipeline function to create a pipeline. We used the Pipeline function in our earlier example, where we had to define names for the data preprocessing or ML functions. The advantage of using the make_pipeline function is that it generates the names or keys of the functions automatically:

def PreprocData(X, Y):
    pipe = make_pipeline(FunctionTransformer(CustomLog), StandardScaler())
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
    pipe.fit(X_train, Y_train)
    return pipe.transform(X_test), Y_test

Now that the pipeline is ready, we can load the Iris dataset. We print the input data X to take a look at the data:

iris = load_iris()
X, Y = iris.data, iris.target
print(X)

Next, we call the PreprocData function by passing it the Iris data. The result returned is a transformed dataset, which has been processed first using our CustomLog function and then using the StandardScaler data preprocessing method:

X_transformed, Y_transformed = PreprocData(X, Y)
print(X_transformed)

The preceding data transformation task yields the transformed data results.
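The FunctionTransformer example above can be condensed into one runnable script, shown below. It mirrors the code in this section; note that np.log works here only because all of the Iris feature values are positive.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

def CustomLog(X):
    # Stateless transformation: element-wise natural log of the features
    return np.log(X)

def PreprocData(X, Y):
    # make_pipeline names the steps automatically (no explicit keys needed)
    pipe = make_pipeline(FunctionTransformer(CustomLog), StandardScaler())
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
    pipe.fit(X_train, Y_train)              # learn scaling parameters on the training split
    return pipe.transform(X_test), Y_test   # apply log + scaling to the held-out split

iris = load_iris()
X, Y = iris.data, iris.target
X_transformed, Y_transformed = PreprocData(X, Y)
print(X_transformed[:5])  # first few rows of the log-transformed, standardized features
```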
We will now need to build various complex pipelines for an AutoML system. In the following section, we will create a sophisticated pipeline using several data preprocessing steps and ML algorithms.

Complex ML pipeline

In this section, we will determine the best classifier to predict the species of an Iris flower using its four different features. We will use a combination of four different data preprocessing techniques along with four different ML algorithms for the task. We will proceed as follows.

We start with importing the various libraries and functions that are required for the task:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn import tree
from sklearn.pipeline import Pipeline

Next, we load the Iris dataset and split it into train and test datasets. The X_train and y_train datasets will be used for training the different models, and X_test and y_test will be used for testing the trained models:

# Load and split the data
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

Next, we create four different pipelines, one for each model. In the pipeline for the SVM model, pipe_svm, we first scale the numeric inputs using StandardScaler and then create the principal components using Principal Component Analysis (PCA). Finally, a Support Vector Machine (SVM) model is built using this preprocessed dataset. Similarly, we construct a pipeline to create the KNN model, named pipe_knn. Only StandardScaler is used to preprocess the data before executing the KNeighborsClassifier to create the KNN model. Then, we create a pipeline for building a decision tree model. We use the StandardScaler and MinMaxScaler methods to preprocess the data to be used by the DecisionTreeClassifier method. The last model created using a pipeline is the random forest model, where StandardScaler and PCA are used to preprocess the data to be used by the RandomForestClassifier method. The following is the code snippet for creating these four different pipelines used to create four different models:

# Construct svm pipeline
pipe_svm = Pipeline([('ss1', StandardScaler()),
                     ('pca', PCA(n_components=2)),
                     ('svm', svm.SVC(random_state=42))])

# Construct knn pipeline
pipe_knn = Pipeline([('ss2', StandardScaler()),
                     ('knn', KNeighborsClassifier(n_neighbors=6, metric='euclidean'))])

# Construct DT pipeline
pipe_dt = Pipeline([('ss3', StandardScaler()),
                    ('minmax', MinMaxScaler()),
                    ('dt', tree.DecisionTreeClassifier(random_state=42))])

# Construct Random Forest pipeline
num_trees = 100
max_features = 1
pipe_rf = Pipeline([('ss4', StandardScaler()),
                    ('pca', PCA(n_components=2)),
                    ('rf', RandomForestClassifier(n_estimators=num_trees, max_features=max_features))])

Next, we store the names of the pipelines in a dictionary, which will be used to display results:

pipe_dic = {0: 'K Nearest Neighbours', 1: 'Decision Tree', 2: 'Random Forest', 3: 'Support Vector Machines'}

Then, we list the four pipelines so we can execute them iteratively:

pipelines = [pipe_knn, pipe_dt, pipe_rf, pipe_svm]

Now, we are ready with the complex structure of the whole pipeline. The only things that remain are to fit the data to the pipelines, evaluate the results, and select the best model. In the following code snippet, we fit each of the four pipelines iteratively to the training dataset:

# Fit the pipelines
for pipe in pipelines:
    pipe.fit(X_train, y_train)

Once the model fitting is executed successfully, we examine the accuracy of the four models using the following code snippet:

# Compare accuracies
for idx, val in enumerate(pipelines):
    print('%s pipeline test accuracy: %.3f' % (pipe_dic[idx], val.score(X_test, y_test)))

We can note from the results that the k-nearest neighbors and decision tree models lead the pack with a perfect accuracy of 100%. This is too good to believe and might be a result of using a small dataset and/or overfitting. We can use either of the two winning models, k-nearest neighbors (KNN) or the decision tree, for deployment. We can accomplish this using the following code snippet:

best_accuracy = 0
best_classifier = 0
best_pipeline = ''
for idx, val in enumerate(pipelines):
    if val.score(X_test, y_test) > best_accuracy:
        best_accuracy = val.score(X_test, y_test)
        best_pipeline = val
        best_classifier = idx
print('%s Classifier has the best accuracy of %.2f' % (pipe_dic[best_classifier], best_accuracy))

As the accuracies were similar for k-nearest neighbors and the decision tree, KNN was chosen as the best model, as it was the first model in the pipelines list. However, at this stage, we can also use some business rules or assess the execution cost to decide on the best model.

To summarize, we learned about building pipelines for ML systems. The concepts that we described in this article give you a foundation for creating pipelines. To have a clearer understanding of the different aspects of Automated Machine Learning, and how to incorporate automation tasks using practical datasets, do check out the book Hands-On Automated Machine Learning.

Read more
What is Automated Machine Learning (AutoML)?
5 ways Machine Learning is transforming digital marketing
How to improve interpretability of machine learning systems


Why Wall Street unfriended Facebook: Stocks fell $120 billion in market value after Q2 2018 earnings call

Natasha Mathur
27 Jul 2018
6 min read
After being found guilty of providing discriminatory advertisements on its platform earlier this week, Facebook hit yet another wall yesterday as its stock closed down 18.96% on Thursday, with shares selling at $176.26. This means that the company lost around $120 billion in market value overnight, the largest one-day loss of value ever for a US-traded company since Intel Corp's two-decade-old crash. Intel had lost a little over $18 billion in one day, 18 years back. Despite 41.24% revenue growth compared to last year, this was Facebook's biggest stock market drop ever. Here's the stock chart from NASDAQ showing the figures.

Facebook's market capitalization was worth $629.6 billion on Wednesday. As soon as Facebook's earnings call concluded by the end of market trading on Thursday, its worth dropped to $510 billion after the close. Also, as Facebook's shares continued to drop during Thursday's trading, its CEO, Mark Zuckerberg, was left with less than $70 billion, wiping out nearly $17 billion of his personal stake, according to Bloomberg. He was also demoted from the third to the sixth position on Bloomberg's Billionaires Index.

Active user growth starting to stagnate in mature markets

According to David Wehner, CFO at Facebook, "the Daily active users count on Facebook reached 1.47 billion, up 11% compared to last year, led by growth in India, Indonesia, and the Philippines. This number represents approximately 66% of the 2.23 billion monthly active users in Q2".

Chart: Facebook's daily active users

He also mentioned that "MAUs (monthly active users) were up 228M or 11% compared to last year. It is worth noting that MAU and DAU in Europe were both down slightly quarter-over-quarter due to the GDPR rollout, consistent with the outlook we gave on the Q1 call".

Chart: Facebook's monthly active users

In fact, Facebook has implemented several privacy policy changes in the last few months due to the European Union's General Data Protection Regulation (GDPR), and the company's earnings report revealed the effects of the GDPR rules.

Revenue growth rate is falling too

Speaking of revenue expectations, Wehner gave investors a heads-up that revenue growth rates will decline in the third and fourth quarters. Wehner stated that the company's "total revenue growth rate decelerated approximately 7 percentage points in Q2 compared to Q1. Our total revenue growth rates will continue to decelerate in the second half of 2018, and we expect our revenue growth rates to decline by high single-digit percentages from prior quarters sequentially in both Q3 and Q4." Facebook reiterated further that these numbers won't get better anytime soon.

Chart: Facebook's Q2 2018 revenue

Wehner further explained the reasons for the decline in revenue: "There are several factors contributing to that deceleration... we expect the currency to be a slight headwind in the second half... we plan to grow and promote certain engaging experiences like Stories that currently have lower levels of monetization. We are also giving people who use our services more choices around data privacy which may have an impact on our revenue growth".

Let's look at other performance indicators

Other financial highlights of Q2 2018 are as follows: Mobile advertising revenue represented 91% of advertising revenue for Q2 2018, up from approximately 87% of advertising revenue in Q2 2017.
Capital expenditures for Q2 2018 were $3.46 billion which is up from $1.4 billion in Q2 2017. Headcount was 30,275 around June 30, which is an increase of 47% year-over-year. Cash, Cash equivalents, and marketable securities were $42.3 billion at the end of Q2 2018, an increase from $35.45 billion at the end of the Q2 2017. Wehner also mentioned that the company “continue to expect that full-year 2018 total expenses will grow in the range of 50-60% compared to last year. In addition to increases in core product development and infrastructure -- growth is driven by increasing investments -- safety & security, AR/VR, marketing, and content acquisition”. Another reason for the overall loss is that Facebook has been dealing with criticism for quite some time now over its content policies, its issues regarding user’s private data and its changing rules for advertisers. In fact, it is currently investigating data analytics firm Crimson Hexagon over misuse of data. Mark Zuckerberg also said over a conference call with financial analysts that Facebook has been investing heavily in “safety, security, and privacy” and that how they’re “investing - in security that it will start to impact our profitability, we’re starting to see that this quarter - we run this company for the long term, not for the next quarter”. Here’s what the public feels about the recent wipe-out: https://twitter.com/TeaPainUSA/status/1022586648155054081 https://twitter.com/alistairmilne/status/1022550933014753280 So, why did Facebook’s stocks crash? As we can see, Facebook’s performance itself in Q2 2018 has been better than its performance last year for the same quarter as far as revenue goes. Ironically, scandals and lawsuits have had little impact on Facebook’s growth. For example, Facebook recovered from the Cambridge Analytica scandal fully within two months as far share prices are concerned. The Mueller indictment report released earlier this month managed to arrest growth for merely a couple of days before the company bounced back. The discriminatory advertising verdict against Facebook, had no impact on its bullish growth earlier this week. This brings us to conclude that the public sentiments and market reactions against Facebook have very different underlying reasons. The market’s strong reactions are mainly due to concerns over the active user growth slowdown, the lack of monetization opportunities on the more popular Instagram platform, and Facebook’s perceived lack of ability to evolve successfully to new political and regulatory policies such as the GDPR. Wall Street has been indifferent to Facebook’s long list of scandals, in some ways, enabling the company’s ‘move fast and break things’ approach. In his earnings call on Thursday, Zuckerberg hinted that Facebook may not be keen on ‘growth at all costs’ by saying things like “we’re investing so much in security that it will significantly impact our profitability” and then Wehner adding, “Looking beyond 2018, we anticipate that total expense growth will exceed revenue growth in 2019.” And that has got Wall street unfriending Facebook with just a click of the button! Is Facebook planning to spy on you through your mobile’s microphones? Facebook to launch AR ads on its news feed to let you try on products virtually Decoding the reasons behind Alphabet’s record high earnings in Q2 2018  


Adding PostGIS layers using QGIS [Tutorial]

Pravin Dhandre
26 Jul 2018
5 min read
Viewing tables as layers is great for creating maps or for simply working on a copy of the database outside the database. In this tutorial, we will establish a connection to our PostGIS database in order to add a table as a layer in QGIS (formerly known as Quantum GIS). Please navigate to the following site to install the latest version LTR of QGIS. On this page, click on Download Now and you will be able to choose a suitable operating system and the relevant settings. QGIS is available for Android, Linux, macOS X, and Windows. You might also be inclined to click on Discover QGIS to get an overview of basic information about the program along with features, screenshots, and case studies. This QGIS tutorial is an excerpt from a book written by Mayra Zurbaran,Pedro Wightman, Paolo Corti, Stephen Mather, Thomas Kraft and Bborie Park, titled PostGIS Cookbook - Second Edition. Loading Database... To begin, create the schema for this tutorial then, download data from the U.S. Census Bureau's FTP site: The shapefile is All Lines for Cuyahoga county in Ohio, which consist of roads and streams among other line features. Extract the ZIP file to your working directory and then load it into your database using shp2pgsql. Be sure to specify the spatial reference system, EPSG/SRID: 4269. When in doubt about using projections, use the service provided by the folks at OpenGeo at the following website: http://prj2epsg.org/search Use the following command to generate the SQL to load the shapefile: shp2pgsql -s 4269 -W LATIN1 -g the_geom -I tl_2012_39035_edges.shp chp11.tl_2012_39035_edges > tl_2012_39035_edges.sql How to do it... Now it's time to give the data we downloaded a look using QGIS. We must first create a connection to the database in order to access the table. Get connected and add the table as a layer by following the ensuing steps: Click on the Add PostGIS Layers icon: Click on the New button below the Connections drop-down menu. Create a new PostGIS connection. After the Add PostGIS Table(s) window opens, create a name for the connection and fill in a few parameters for your database, including Host, Port, Database, Username, and Password: Once you have entered all of the pertinent information for your database, click on the Test Connection button to verify that the connection is successful. If the connection is not successful, double-check for typos and errors. Additionally, make sure you are attempting to connect to a PostGIS-enabled database. If the connection is successful, go ahead and check the Save Username and Save Password checkboxes. This will prevent you from having to enter your login information multiple times throughout the exercise. Click on OK at the bottom of the menu to apply the connection settings. Now you can connect! Make sure the name of your PostGIS connection appears in the drop-down menu and then click on the Connect button. If you choose not to store your username and password, you will be asked to submit this information every time you try to access the database. Once connected, all schemas within the database will be shown and the tables will be made visible by expanding the target schema. Select the table(s) to be added as a layer by simply clicking on the table name or anywhere along its row. Selection(s) will be highlighted in blue. To deselect a table, click on it a second time and it will no longer be highlighted. 
Select the tl_2012_39035_edges table that was downloaded at the beginning of the tutorial and click on the Add button, as shown in the following screenshot: A subset of the table can also be added as a layer. This is accomplished by double-clicking on the desired table name. The Query Builder window will open, which aids in creating simple SQL WHERE clause statements. Add the roads by selecting the records where roadflg = Y. This can be done by typing a query or using the buttons within Query Builder: Click on the OK button followed by the Add button. A subset of the table is now loaded into QGIS as a layer. The layer is strictly a static, temporary copy of your database. You can make whatever changes you like to the layer and not affect the database table. The same holds true the other way around. Changes to the table in the database will have no effect on the layer in QGIS. If needed, you can save the temporary layer in a variety of formats, such as DXF, GeoJSON, KML, or SHP. Simply right-click on the layer name in the Layers panel and click on Save As. This will then create a file, which you can recall at a later time or share with others. The following screenshot shows the Cuyahoga county road network: You may also use the QGIS Browser Panel to navigate through the now connected PostGIS database and list the schemas and tables. This panel allows you to double-click to add spatial layers to the current project, providing a better user experience not only of connected databases, but on any directory of your machine: How it works... You have added a PostGIS layer into QGIS using the built-in Add PostGIS Table GUI. This was achieved by creating a new connection and entering your database parameters. Any number of database connections can be set up simultaneously. If working with multiple databases is more common for your workflows, saving all of the connections into one XML file and would save time and energy when returning to these projects in QGIS. To explore more on 3D capabilities of PostGIS, including LiDAR point clouds, read PostGIS Cookbook - Second Edition. Using R to implement Kriging – A Spatial Interpolation technique for Geostatistics data Top 7 libraries for geospatial analysis Learning R for Geospatial Analysis
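The GUI steps in this tutorial can also be scripted from the QGIS Python console. The sketch below is an illustrative addition, not part of the book's recipe; the connection parameters are placeholders you would replace with your own, and it adds the same roads subset (roadflg = 'Y') of the chp11.tl_2012_39035_edges table as a layer.

```python
from qgis.core import QgsDataSourceUri, QgsVectorLayer, QgsProject

# Placeholder connection details -- substitute your own host, database and credentials
uri = QgsDataSourceUri()
uri.setConnection("localhost", "5432", "postgis_cookbook", "my_user", "my_password")

# Schema, table, geometry column and an optional WHERE-clause filter,
# matching the Query Builder step above (roads only)
uri.setDataSource("chp11", "tl_2012_39035_edges", "the_geom", "roadflg = 'Y'")

layer = QgsVectorLayer(uri.uri(False), "cuyahoga_roads", "postgres")
if layer.isValid():
    QgsProject.instance().addMapLayer(layer)  # the layer appears in the Layers panel
else:
    print("Layer failed to load: check the connection parameters and table name")
```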


Quantum Computing is poised to take a quantum leap with industries and governments on its side

Natasha Mathur
26 Jul 2018
9 min read
“We’re really just, I would say, six months out from having quantum computers that will outperform classical computers in some capacity". -- Joseph Emerson, Quantum Benchmark CEO There are very few people who would say that their life hasn’t been changed by the computers. But, there are certain tasks that even a computer isn’t capable of solving efficiently and that’s where Quantum Computers come into the picture. They are incredibly powerful machines that process information at a subatomic level. Quantum Computing leverages the logic and principles of Quantum Mechanics. Quantum computers operate one million times faster than any other device that you’re currently using. They are a whole different breed of machines that promise to revolutionize computing. How Quantum computers differ from traditional computers? To start with, let’s look at how these computers look like. Unlike your mobile or desktop computing devices, Quantum Computers will never be able to fit in your pocket and you cannot station these at desks. At least, for the next few years. Also, these are fragile computers that look like vacuum cells or tubes with a bunch of lasers shining into them, and the whole apparatus must be kept in temperatures near to absolute zero at all times. IBM Research Secondly, we look at the underlying mechanics of how they each compute. Quantum Computing is different from classic digital computing in the sense that classical computing requires data to be encoded into binary digits (bits), each of which is always present in one of the two definite states (0 or 1). Whereas, in Quantum Computing, data units called Quantum bits or qubits can be present in more than one state at a time, called, “superpositions” of states. Two particles can also exhibit “entanglement,” where changing of one state may instantaneously affect the other. Thirdly, let’s look at what make up these machines i.e., their hardware and software components. A regular computer’s hardware includes components such as a CPU, monitor, and routers for communicating across the computer network. The software includes systems programs and different protocols that run on top of the seven layers namely physical layer, a data-link layer, network layer, transport layer, session layer, presentation layer and the application layer. Now, Quantum computer also consists of hardware and software components. There are Ion traps, semiconductors, vacuum cells, Josephson junction and wire loops on the hardware side. The software and protocol that lies within a Quantum Computer are divided into five different layers namely physical layer, virtual, Quantum Error Correction ( QEC) layer, logical layer, and application layer. Layered Quantum Computer architecture Quantum Computing: A rapidly growing field Quantum Computing is special and is advancing on a large scale. According to a new market research report by marketsandmarkets, the Quantum Computing Market will be worth 495.3 Million USD by 2023. Also, as per ABI research, total revenue generated from quantum computing services will exceed $15 billion by 2028. Also, Matthew Brisse, research VP at Gartner states, “Quantum computing is heavily hyped and evolving at different rates, but it should not be ignored”. 
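To make the earlier description of qubits, superposition, and entanglement a little more concrete, here is a small illustrative sketch, not from the original article, using Google's open-source Cirq framework (which comes up again later in this piece). It builds a two-qubit Bell state and samples measurements; in recent versions of Cirq the API looks roughly like this.

```python
import cirq

# Two qubits
q0, q1 = cirq.LineQubit.range(2)

# H puts q0 into a superposition; CNOT entangles q0 with q1 (a Bell state)
circuit = cirq.Circuit([
    cirq.H(q0),
    cirq.CNOT(q0, q1),
    cirq.measure(q0, q1, key='result'),
])
print(circuit)

# Sample 100 shots on the built-in simulator: outcomes cluster on 00 and 11,
# which is the correlated behavior the article describes as entanglement.
result = cirq.Simulator().run(circuit, repetitions=100)
print(result.histogram(key='result'))
```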
Now, there isn’t any certainty from the Quantum Computing world about when it will make the shift from scientific discoveries to applications for the average user, but, there are different areas, like pharmaceuticals, Machine Learning, Security, etc, that are leveraging the potential of Quantum Computing. Apart from these industries, even countries are getting their feet dirty in the field of Quantum Computing, lately, by joining the “Quantum arms race” ( the race for excelling in Quantum Technology). How is the tech industry driving Quantum Computing forward? Both big tech giants like IBM, Google, Microsoft and startups like Rigetti, D-wave and Quantum Benchmark are slowly, but steadily, bringing the Quantum Computing Era a step closer to us. Big Tech’s big bets on Quantum computing IBM established a landmark in the quantum computing world back in November 2017 when it announced its first powerful quantum computer that can handle 50 qubits. The company is also working on making its 20-qubit system available through its cloud computing platform. It is collaborating with startups such as Quantum Benchmark, Zapata Computing, 1Qbit, etc to further accelerate Quantum Computing. Google also announced Bristlecone, the largest 72 qubit quantum computer, earlier this year, which promises to provide a testbed for research into scalability and error rates of qubit technology. Google along with IBM plans to commercialize its quantum computing technologies in the next few years. Microsoft is also working on Quantum Computing with plans to build a topological quantum computer. This is an effort to bring a practical quantum computer quicker for commercial use. It also introduced a language called Q#  ( Q Sharp ), last year, along with tools to help coders develop software for quantum computers "We want to solve today's unsolvable problems and we have an opportunity with a unique, differentiated technology to do that," mentioned Todd HolmDahl, Corporate VP of Microsoft Quantum. Startups leading the quantum computing revolution - Quantum Valley, anyone? Apart from these major tech giants, startups are also stepping up their Quantum Computing game. D-Wave Systems Inc., a Canadian company, was the first one to sell a quantum computer named D-wave one, back in 2011, even though the usefulness of this computer is limited.  These quantum computers are based on adiabatic quantum computation and have been sold to government agencies, aerospace, and cybersecurity vendors. Earlier this year, D-Wave Systems announced that it has completed testing a working prototype of its next-generation processor, as well as the installation of a D-Wave 2000Q system for a customer. Another startup worth mentioning is San Francisco based Rigetti computing, which is racing against Google, Microsoft, IBM, and Intel to build Quantum Computing projects. Rigetti has raised nearly $70 million for development of quantum computers. According to Chad Rigetti, CEO at Rigetti, “This is going to be a very large industry—every major organization in the world will have to have a strategy for how to use this technology”. Lastly, Quantum Benchmark, another Canadian startup, is collaborating with Google to support the development of quantum computing. Quantum Benchmark’s True-Q technology has been integrated with Cirq, Google’s latest open-source quantum computing framework. Looking at the efforts these companies are putting in to boost the growth of Quantum computing, it’s hard to say who will win the quantum race. 
But one thing is clear: just as Silicon Valley drove product innovation and software adoption during the early days of computing, having many businesses work together and compete with each other may not be a bad idea for quantum computing.

How are countries preparing for the Quantum arms race?

Though Quantum Computing hasn't yet made its way to the consumer market, its remarkable progress in research and its seemingly infinite potential have now caught the attention of nations worldwide. China, the U.S. and other great powers are now joining the Quantum arms race. But why are these governments so serious about investing in the research and development of Quantum Computing? The answer is simple: Quantum computing will be a huge competitive advantage to the government that finds breakthroughs with its quantum technology-based innovation, as it will be able to cripple militaries, topple the global economy, and yield spectacular, quick solutions to complex problems. It further has the power to revolutionize everything in today's world, from cellular communications and navigation to sensors and imaging.

China continues to leap ahead in the quantum race

China has been consistently investing billions of dollars in the field of Quantum Computing. From building its first form of quantum computer back in 2017 to coming out with the world's first unhackable "quantum satellite" in 2016, China is clearly killing it when it comes to Quantum Computing. Last year, the Chinese government also funded $10 billion for the construction of the world's biggest quantum research facility, planned to open in 2020. The country plans to develop quantum computers and perform quantum metrology to support the military and national defense efforts.

The U.S. is pushing hard to win the race

Americans are also trying to up their game and win the Quantum arms race. For instance, the Pentagon is working on applying quantum computing to the U.S. military in the future. It plans to establish highly secure and encrypted communications for satellites, along with ensuring accurate navigation that does not need GPS signals. In fact, Congress has proposed $800 million in funding for the Pentagon's quantum projects over the next five years. Similarly, the Defense Advanced Research Projects Agency (DARPA) is interested in exploring Quantum Computing to improve general computing performance as well as artificial intelligence and machine learning systems. Other than that, the U.S. government spends about $200 million per year on quantum research.

The UK is stepping up its game

The United Kingdom is also not far behind in the Quantum arms race. It has a $400 million program underway for quantum-based sensing and timing. The European Union is also planning to invest $1 billion, spread over 10 years, in projects involving scientific research and development of devices for sensing, communication, simulation, and computing. Other great powers such as Canada, Australia, and Israel are also acquainting themselves with the exciting world of Quantum Computing. These governments are confident that whoever makes it first gets an upper hand for life.

The Quantum Computing revolution has begun

Quantum Computing has the power to lead to revolutionary breakthroughs, even though quantum computers that can outperform regular computers are not there yet. Also, this is quite a complex technology that a lot of people find hard to understand, as the rules of the quantum world differ drastically from those of the physical world.
While all of the progress made is constrained to research labs at the moment, quantum computing has a lot of potential, given the intense development and investment going into it from governments and large corporations across the globe. It's likely that we'll see working prototypes of quantum computers emerge soon. With the Quantum revolution here, the possibilities that lie ahead are limitless. Perhaps we could finally crack the big bang theory!? Or maybe quantum computing powered by AI will just speed up the arrival of the singularity.

Google AI releases Cirq and Open Fermion-Cirq to boost Quantum computation
PyCon US 2018 Highlights: Quantum computing, blockchains, and serverless rule!
Will Rust Replace C++?

Aaron Lazar
26 Jul 2018
6 min read
This question has been asked several times, showing that developers like yourself want to know whether Rust will replace the good old, painfully difficult to program, C++. Let’s find out, shall we? Going with the trends If I compare Rust and C++ on Google Trends, this is what I get. C++ beats Rust to death. Each one of C++’s troughs is like a dagger piercing through Rust, pinning it down to the floor! C++ seems to have its own ups and downs, but it’s maintaining a pretty steady trend over the past 5 years. Now if I knock C++ out of the way, this is what I get. That’s a pretty interesting trend there! I’d guess it’s about a 25 degree slope. Never once has Rust seen a major dip in its gradual rise to fame. But what’s making it grow that well? What Developers Love and Why Okay, if you’re in a mood for funsies, try this out at your workplace: assemble your team members in a room and then tell them there’s a huge project coming up. Tell them that the requirements state that it’s to be developed in Rust. You might find 78.9% of them beaming! Give it a few moments, then say you’re sorry and that you actually meant C++. Watch those smiles go right out the window! ;) You might wonder why I used the very odd percentage, 78.9%. Well, that’s just the percentage of developers who love Rust, as per the 2018 StackOverflow survey. This isn’t something that happened overnight, as Rust topped the charts even in 2017, with 73.1% of respondents loving the language. You want me to talk about C++ too? Okay, if you insist, where is it? Ahhhhh… there it is!!! C++ coming up at 4th place…. from the bottom! So why this great love for Rust and this not so great love for C++? C++ is a great language: you get awesome performance, and you can build super fast applications with its rich function library. You can build a wide variety of applications, from GUI apps to 3D graphics, games, and desktop apps, as well as hardcore computer vision applications. On the other hand, Rust is pretty fast too. It can be used just about anywhere C++ can be used. It has a superb community and, most of all, it’s memory safe! Rust’s concurrency capabilities have often been hailed as superior to C++’s, and developers all around are eager to get their hands on Rust for this feature! Wondering how I know? I have access to a dashboard that puts a smile on my face every time I check the sales of Hands-On Concurrency with Rust! ;) You should get the book too, you know. Coming back to our discussion, Rust’s build tool and dependency manager, Cargo, is a breeze to work with. Why Rust is a winner When compared with C++, the main advantage of using Rust is safety. C++ doesn’t protect its own abstractions, and so doesn’t allow programmers to protect theirs either. Rust, on the other hand, does both. If you make a mistake in C++, your program will technically have no meaning, which can result in arbitrary behavior. Unlike C++, Rust protects you from such dangers, so you can instead concentrate on solving problems. If you’re already a C++ programmer, Rust will allow you to be more effective, while allowing those with little to no low-level programming experience to create things they might not have been capable of building before. Mozilla was very wise in creating Rust, and the reason behind it was that they wanted web developers to have a practical and efficient language at hand, should they need to write low level code. Kudos to Mozilla! Now back to the question - Will Rust replace C++?
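Before tackling that, here’s a quick way to make the safety point above concrete. The snippet below is a minimal sketch in plain standard C++ (added for illustration, not taken from the original article) of the kind of mistake C++ compiles without complaint but Rust’s borrow checker rejects at compile time:

#include <iostream>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3};
    int& first = v[0];    // hold a reference into the vector's internal buffer
    v.push_back(4);       // push_back may reallocate, leaving 'first' dangling
    std::cout << first;   // undefined behavior: reads through a dangling reference
    return 0;
}

The equivalent Rust program, which holds a shared reference to an element while mutating the vector, simply does not compile, which is exactly the “protecting your abstractions” argument made above.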
Should C++ really worry about Rust replacing it someday? Honestly speaking, I think it has a pretty good shot at replacing C++. Rust is much better in several aspects, like memory safety and concurrency, and it makes you think more carefully about memory usage and pointers. Rust will make you a better and more efficient programmer. The transition is already happening in various fields. In game development, for example, the AAA game studio Ready At Dawn Studios is switching entirely to Rust after more than a decade of using C++. That’s a pretty huge step, considering there may be a lot of trade-offs and workarounds to figure out. But if you look at the conversations on Twitter, the Rust team is delighted at this move and is willing to offer any kind of support if need be. Don’t you just want to give the Rust team a massive bear hug? IoT is another booming field where Rust is finding rapid adoption. Hardware makers like Tessel already provide support for Rust. In terms of security, Microsoft created an open source repo on GitHub for an IoT Edge Security Daemon, written entirely in Rust. Rust seems to be doing pretty well in the GUI department too, with tools like Piston. In fact, you might also find Rust being used along with the popular GUI framework Qt. All this shows that Rust is seriously growing in adoption. While I say it might eventually be the next C++, it’s probably going to take years for that to happen. This is mainly because entire ecosystems are built on C++ and they will continue to be. Today there are many dead programming languages whose applications still live on and breed newer generations of developers. (I’m looking at you, COBOL!) In this world of Polyglotism, if that’s even a word, the bigger question we should be asking is how much we will benefit if C++ and Rust are used together. There is definitely a strong case for C++ developers to learn Rust. The question then really is: do you want to be a programmer working in mature industries and projects, or do you want to be a developer working at the cutting edge of technological progress? I’ll flip the original question and pose it to you: Will you replace C++ with Rust? Perform Advanced Programming with Rust Learn a Framework; forget the language! Firefox 61 builds on Firefox Quantum, adds Tab Warming, WebExtensions, and TLS 1.3
Cryptocurrency-based firm Tron acquires BitTorrent

Savia Lobo
26 Jul 2018
3 min read
Justin Sun, founder of the decentralized Internet platform Tron, announced the acquisition of BitTorrent, the popular file-sharing network. As reported by TechCrunch, the blockchain-based platform is said to have acquired BitTorrent for a sum of about $126 million. The TRON Foundation is a decentralized platform for sharing entertainment content, including music and games. It uses blockchain and peer-to-peer (p2p) network technology to eliminate the need for middlemen, such as Google and Amazon, between content producers and consumers. BitTorrent, the company founded in 2004 around the popular peer-to-peer file-sharing protocol of the same name, has around 100 million users. It also owns the popular uTorrent client software. BitTorrent is known for streaming movies and music with great ease and is also fast and reliable. Moreover, it has changed how and why we watch things online. With the BitTorrent acquisition, Justin wants to make Tron the largest decentralized ecosystem in the world. While that’s an exciting prospect for tech users, users had questions about whether Tron would charge them in cryptocurrency for the services offered. BitTorrent, in its blog, stated that it “has no plans to change what we do or charge for the services we provide. We have no plans to enable mining of cryptocurrency now or in the future." However, Tron’s plans for BitTorrent are still under wraps. ‘TRON + BitTorrent: The world’s largest decentralized ecosystem’ In an official letter, the Tron Foundation stated that the firm would continue BitTorrent’s protocol legacy after integrating it into the Tron ecosystem. https://twitter.com/BitTorrent/status/1021629735258841088 The letter also states, “With the integration of BitTorrent, TRON aims to liberate the Internet from the stranglehold of large corporations, give data rights back to the individual, and reignite the early 21st century vision of a free, transparent, decentralized network to connect the world, because the internet belongs to the people.” Sun, in his letter, also described BitTorrent as the genesis of the decentralization movement. Tron’s developers, entrepreneurs, and community consider BitTorrent the original pioneer of decentralization technology. Sun stated, "We believe that joining the TRON network will further enhance BitTorrent and accelerate our mission of creating an Internet of options, not rules." Due to this acquisition, BitTorrent may lose its primarily illegal user base. However, it continues to demonstrate its legal uses and will further evolve within TRON’s ecosystem. TRON will also take control of the two popular torrent applications, the BitTorrent and μTorrent clients, which will remain free to download and supported by ads. This merger is a happy turning point for BitTorrent. BitTorrent was in a total mess some years back and had not raised any money since 2008, after which it fired its dual CEOs. Given its commitment to the notion of a decentralized internet, BitTorrent still attempted to function as a business with its apps and services, but these strategies did not work out well. However, TRON’s acquisition has recently turned the tables for BitTorrent. It could be the story of Cinderella meeting Prince Charming of this decade. Read more about BitTorrent’s acquisition on TechCrunch. Top 15 Cryptocurrency Trading Bots Crypto-ML, a machine learning powered cryptocurrency platform Can Cryptocurrency establish a new economic world order?
Google's event-driven serverless platform, Cloud Functions, is now generally available

Vijin Boricha
26 Jul 2018
2 min read
Early this week, Google announced the general availability of its much-awaited service, Cloud Functions, at the Google Cloud Next ‘18 conference in San Francisco. Google has finally boarded the serverless bus it missed two years ago, a delay that allowed AWS and Azure to reach their current milestones. This move takes the current cloud platform war to a new level. Google’s Cloud Functions now directly competes with Amazon’s Lambda and Microsoft’s Azure Functions. Of late, application development has changed massively, with developers now focusing on application logic instead of infrastructure management, thanks to serverless computing. Developers can now prioritize agility, application quality, and faster deployment with zero server management, auto-scaling traffic management, and integrated security.   Source: Google Cloud website Google’s event-driven serverless platform offers automatic scaling, code that runs in response to events, pricing that applies only while your code runs, and zero server management. Cloud Functions can be used to build: Serverless application backends Developers can quickly build highly available, secure, and cost-effective applications, as Cloud Functions provides a connective layer of logic that helps integrate and extend GCP and third-party services. In other words, you can directly call your code from any web, mobile, or backend application, or trigger it from GCP services. Real-time data processing Cloud Functions can power a variety of real-time data processing systems, as it responds to events from GCP services such as Stackdriver Logging, Cloud Storage, and more. This helps developers execute their code in response to any change in data. Intelligent applications Developers can leverage Cloud Functions to build intelligent applications with Google Cloud AI integration. One can easily introduce pre-trained machine learning models into an application to analyze videos, classify images, convert speech to text, perform NLP (natural language processing), and more. Developers can start making the most of Google Cloud Functions right away, unless they are deploying functions written in Node.js 8 or Python, as these runtimes still remain in beta. In addition to Cloud Functions, Google also announced a preview of serverless containers, a refreshed way of running containers in a fully managed environment. You can read more about this release from the Google Cloud release notes. Related Links A serverless online store on AWS could save you money. Build one Learn Azure serverless computing for free – Download a free eBook from Microsoft AWS SAM (AWS Serverless Application Model) is now open source!
What’s new in IntelliJ IDEA 2018.2

Sugandha Lahoti
26 Jul 2018
4 min read
JetBrains has released the second version of their popular IDE for this year. IntelliJ IDEA 2018.2 is full of changes, including support for Java 11 and updates to the editor, user interface, version control system, JVM debugger, Gradle, and more. Let’s have a quick look at all the different features and updates. Updates to Java IntelliJ IDEA 2018.2 brings support for the upcoming Java 11. The IDE now supports local-variable syntax for lambda parameters according to JEP 323. Dataflow information can now be viewed in the editor. Quick Documentation can now be configured to pop up together with autocompletion. Extract Method has a new preview panel to check the results of refactoring before actual changes are made. The @Contract annotation supports new return values: new, this, and paramX. The IDE has also updated its inspections and intention actions, including a smarter Join Lines action and Stream API support, among many others. Improvements to the Editor IntelliJ IDEA now underlines reassigned local variables and reassigned parameters by default. While typing, users can use Tab to navigate outside the closing brackets or closing quotes. For or while keywords are highlighted when the caret is placed on the corresponding break or continue keyword. Changes to the User Interface IntelliJ IDEA 2018.2 comes with support for the MacBook Touch Bar. Users can now use dark window headers in macOS. There are new, cleaner, and simpler icons on the IDE toolbar and tool windows for better readability. The IntelliJ theme on Linux has been updated to look more modern. Updates to the Version Control System The updated Files Merged with Conflicts dialog displays Git branch names and adds a new Group files by directory option. Users can now open several Log tabs in the Version Control tool window. The IDE now displays the favorite branches in the Branch filter on the Log tab. While using the Commit and Push action, users can either skip the Push dialog completely or only show this dialog when pushing to protected branches. The IDE also adds support for configuring multiple GitHub accounts. Improvements in the JVM debugger IntelliJ IDEA 2018.2 includes several new breakpoint intention actions for debugging Java projects. Users now have the ability to filter a breakpoint hit by the caller method. Changes to Gradle The included buildSrc Gradle projects are now discovered automatically. Users can now debug Gradle DSL blocks. Updates to Kotlin The Kotlin plugin bundled with the IDE has been updated to v1.2.51. Users can now run Kotlin Script scratch files and see the results right inside the editor. An intention to convert end-of-line comments into block comments and vice versa has been added. New coroutine inspections and intentions have been added. Improvements in the Scala plugin The Scala plugin can show implicits right in the editor and can even show places where implicits are not found. The Scalafmt formatter has been integrated. Semantic highlighting has been updated and auto-completion for pattern matching improved. JavaScript & TypeScript changes The new Extract React Component refactoring can be used to break a component into two. A new intention to convert React class components into functional components has been added. New features can be added to an Angular app using the integration with ng add. New JavaScript and TypeScript intentions: Implement interface, Create derived class, Implement members of an interface or abstract class, Generate cases for ‘switch’, and Iterate with ‘for..of’. 
A new Code Coverage feature helps find unused code in client-side apps. These are just a select few updates from the IntelliJ IDEA 2018.2 release. A complete list of all the changes can be found in the release notes. You can also read the JetBrains blog for a concise version. How to set up the Scala Plugin in IntelliJ IDE [Tutorial] Eclipse IDE’s Photon release will support Rust GitLab open sources its Web IDE in GitLab 10.7
4 powerful custom visuals in Power BI: Why, When, and How to add [Tutorial]

Pravin Dhandre
25 Jul 2018
17 min read
Power BI report authors and BI teams are well-served to remain conscious of both the advantages and limitations of custom visuals. For example, when several measures or dimension columns need to be displayed within the same visual, custom visuals such as the Impact Bubble Chart and the Dot Plot by Maq Software may exclusively address this need. In many other scenarios, a trade-off or compromise must be made between the incremental features provided by a custom visual and the rich controls built into a standard Power BI visual. In this tutorial, we show how to add a custom visual to Power BI and explore 4 powerful custom visuals, and the distinct scenarios and features they support. The Power BI tutorial is taken from Mastering Microsoft Power BI. Learn more - read the book here. Custom visuals available in AppSource and within the integrated custom visuals store for Power BI Desktop are all approved for running in browsers and on mobile devices via the Power BI mobile apps. A subset of these visuals has been certified by Microsoft and supports additional Power BI features such as email subscriptions and export to PowerPoint. Additionally, certified custom visuals have met a set of code requirements and have passed strict security tests. The list of certified custom visuals and additional details on the certification process are available here. Adding a custom visual Custom visuals can be added to Power BI reports by either downloading .pbiviz files from Microsoft AppSource or via the integrated Office Store of custom visuals in Power BI Desktop. Utilizing AppSource requires the additional step of downloading the file; however, it can be more difficult to find the appropriate visual as the visuals are not categorized. However, AppSource provides a link to download a sample Power BI report (.pbix file) to learn how the visual is used, such as how it uses field inputs and formatting options. Additionally, AppSource includes a short video tutorial on building report visualizations with the custom visual. The following image reflects Microsoft AppSource filtered by the Power BI visuals Add-ins category: The following link filters AppSource to the Power BI custom visuals per the preceding image: http://bit.ly/2BIZZbZ. The search bar at the top and the vertical scrollbar on the right can be used to browse and identify custom visuals to download. Each custom visual tile in AppSource includes a Get it now link which, if clicked, presents the option to download either the custom visual itself (.pbiviz file) or the sample report for the custom visual (.pbix file). Clicking anywhere else in the tile other than Get it now prompts a window with a detailed overview of the visual, a video tutorial, and customer reviews. To add custom visuals directly to Power BI reports, click the Import from store option via the ellipsis of the Visualizations pane, as per the following image: If a custom visual (.pbiviz file) has been downloaded from AppSource, the Import from file option can be used to import this custom visual to the report. Additionally, both the Import from store and Import from file options are available as icons on the Home tab of the Report view in Power BI Desktop. Selecting Import from store launches an MS Office Store window of Power BI Custom Visuals. Unlike AppSource, the visuals are assigned to categories such as KPIs, Maps, and Advanced Analytics, making it easy to browse and compare related visuals. 
More importantly, utilizing the integrated Custom Visuals store avoids the need to manage .pbiviz files and allows report authors to remain focused on report development. As an alternative to the VISUALIZATIONS pane, the From Marketplace and From File icons on the Home tab of the Report view can also be used to add a custom visual. Clicking the From Marketplace icon in the follow image launches the same MS Office Store window of Power BI Custom visuals as selecting Import from store via the VISUALIZATIONS pane: In the following image, the KPIs category of Custom visuals is selected from within the MS Office store: The Add button will directly add the custom visual as a new icon in the Visualizations pane. Selecting the custom visual icon will provide a description of the custom visual and any customer reviews. The Power BI team regularly features new custom visuals in the blog post and video associated with the monthly update to Power BI Desktop. The visual categories, customer reviews, and supporting documentation and sample reports all assist report authors in choosing the appropriate visual and using it correctly. Organizations can also upload custom visuals to the Power BI service via the organization visuals page of the Power BI Admin portal. Once uploaded, these visuals are exposed to report authors in the MY ORGANIZATION tab of the custom visuals MARKETPLACE as per the following example: This feature can help both organizations and report authors simplify their use of custom visuals by defining and exposing a particular set of approved custom visuals. For example, a policy could define that new Power BI reports must only utilize standard and organizational custom visuals. The list of organizational custom visuals could potentially only include a subset of the visuals which have been certified by Microsoft. Alternatively, an approval process could be implemented so that the use case for a custom visual would have to be proven or validated prior to adding this visual to the list of organizational custom visuals. Power KPI visual Key Performance Indicators (KPIs) are often prominently featured in Power BI dashboards and in the top left area of Power BI report pages, given their ability to quickly convey important insights. Unlike card and gauge visuals which only display a single metric or a single metric relative to a target respectively, KPI visuals support trend, variance, and conditional formatting logic. For example, without analyzing any other visuals, a user could be drawn to a red KPI indicator symbol and immediately understand the significance of a variance to a target value as well as the recent performance of the KPI metric. For some users, particularly executives and senior managers, a few KPI visuals may represent their only exposure to an overall Power BI solution, and this experience will largely define their impression of Power BI's capabilities and the Power BI project. Given their power and important use cases, report authors should become familiar with both the standard KPI visual and the most robust custom KPI visuals such as the Power KPI Matrix, the Dual KPI, and the Power KPI. Each of these three visuals have been developed by Microsoft and provide additional options for displaying more data and customizing the formatting and layout. The Power KPI Matrix supports scorecard layouts in which many metrics can be displayed as rows or columns against a set of dimension categories such as Operational and Financial. 
The Dual KPI, which was featured in the Microsoft Power BI Cookbook (https://www.packtpub.com/big-data-and-business-intelligence/microsoft-power-bi-cookbook), is a good choice for displaying two closely related metrics such as the volume of customer service calls and the average waiting time for customer service calls. One significant limitation of custom KPI visuals is that data alerts cannot be configured on the dashboard tiles reflecting these visuals in the Power BI service. Data alerts are currently exclusive to the standard card, gauge, and KPI visuals. In the following Power KPI visual, Internet Net Sales is compared to Plan, and the prior year Internet Net Sales and Year-over-Year Growth percent metrics are included to support the context: The Internet Net Sales measure is formatted as a solid, green line whereas the Internet Sales Plan and Internet Net Sales (PY) measures are formatted with Dotted and Dot-dashed line styles respectively. To avoid clutter, the Y-Axis has been removed and the Label Density property of the Data labels formatting card has been set to 50 percent. This level of detail (three measures with variances) and formatting makes the Power KPI one of the richest visuals in Power BI. The Power KPI provides many options for report authors to include additional data and to customize the formatting logic and layout. Perhaps its best feature, however, is the Auto Scale property, which is enabled by default under the Layout formatting card. For example, in the following image, the Power KPI visual has been pinned to a Power BI dashboard and resized to the smallest tile size possible: As per the preceding dashboard tile, the less critical data elements such as July through August and the year-over- year % metric were removed. This auto scaling preserved space for the KPI symbol, the axis value (2017-Nov), and the actual value ($296K). With Auto Scale, a large Power KPI custom visual can be used to provide granular details in a report and then re-used in a more compact format as a tile in a Power BI dashboard. Another advantage of the Power KPI is that minimal customization of the data model is required. The following image displays the dimension column and measures of the data model mapped to the field inputs of the aforementioned Power KPI visual: The Sales and Margin Plan data is available at the monthly grain and thus the Calendar Yr-Mo column is used as the Axis input. In other scenarios, a Date column would be used for the Axis input provided that the actual and target measures both support this grain. The order of the measures used in the Values field input is interpreted by the visual as the actual value, the target value, and the secondary value. In this example, Internet Net Sales is the first or top measure in the Values field and thus is used as the actual value (for example, $296K for November). A secondary value as the third measure in the Values input (Internet Net Sales (PY)) is not required if the intent is to only display the actual value versus its target. The KPI Indicator Value and Second KPI Indicator Value fields are also optional. If left blank, the Power KPI visual will automatically calculate these two values as the percentage difference between the actual value and the target value, and the actual value and the secondary value respectively. 
In this example, these two calculations are already included as measures in the data model and thus applying the Internet Net Sales Var to Plan % and Internet Net Sales (YOY %) measures to these fields further clarifies how the visual is being used. If the metric being used as the actual value is truly a critical measure (for example, revenue or count of customers) to the organization or the primary user, it's almost certainly appropriate that related target and variance measures are built into the Power BI dataset. In many cases, these additional measures will be used independently in their own visuals and reports. Additionally, if a target value is not readily available, such as the preceding example with the Internet Net Sales Plan, BI teams can work with stakeholders on the proper logic to apply to a target measure, for example, 10 percent greater than the previous year. The only customization required is the KPI Indicator Index field. The result of the expression used for this field must correspond to one of five whole numbers (1-5) and thus one of the five available KPI Indicators. In the following example, the KPI Indicators KPI 1 and KPI 2 have been customized to display a green caret up icon and a red caret down icon respectively: Many different KPI Indicator symbols are available including up and down arrows, flags, stars, and exclamation marks. These different symbols can be formatted and then displayed dynamically based on the KPI Indicator Index field expression. In this example, a KPI index measure was created to return the value 1 or 2 based on the positive or negative value of the Internet Net Sales Var to Plan % measure respectively: Internet Net Sales vs Plan Index = IF([Internet Net Sales Var to Plan %] > 0,1,2) Given the positive 4.6 percent variance for November of 2017, the value 1 is returned by the index expression and the green caret up symbol for KPI 1 is displayed. With five available KPI Indicators and their associated symbols, it's possible to embed much more elaborate logic such as five index conditions (for example, poor, below average, average, above average, good) and five corresponding KPI indicators. Four different layouts (Top, Left, Bottom, and Right) are available to display the values relative to the line chart. In the preceding example, the Top layout is chosen as this results in the last value of the Axis input (2017-Nov) to be displayed in the top left corner of the visual. Like the standard line chart visual in Power BI Desktop, the line style (for example, Dotted, Solid, Dashed), color, and thickness can all be customized to help distinguish the different series. Chiclet Slicer The standard slicer visual can display the items of a source column as a list or as a dropdown. Additionally, if presented as a list, the slicer can optionally be displayed horizontally rather than vertically. The custom Chiclet Slicer, developed by Microsoft, allows report authors to take even greater control over the format of slicers to further improve the self-service experience in Power BI reports. In the following example, a Chiclet Slicer has been formatted to display calendar months horizontally as three columns: Additionally, a dark green color is defined as the Selected Color property under the Chiclets formatting card to clearly identify the current selections (May and June). The Padding and Outline Style properties, also available under the Chiclets card, are set to 1 and Square respectively, to obtain a simple and compact layout. 
Like the slicer controls in Microsoft Excel, Chiclet Slicers also support cross highlighting. To enable cross highlighting, specify a measure which references a fact table as the Values input field to the Chiclet Slicer. For example, with the Internet Net Sales measure set as the Values input of the Chiclet Slicer, a user selection on a bar representing a product in a separate visual would update the Chiclet Slicer to indicate the calendar months without Internet Sales for the given product. The Disabled Color property can be set to control the formatting of these unrelated items. Chiclet Slicers also support images. In the following example, one row is used to display four countries via their national flags: For this visual, the Padding and Outline Style properties under the Chiclets formatting card are set to 2 and Cut respectively. Like the Calendar Month slicer, a dark green color is configured as the Selected Color property helping to identify the country or countries selected—Canada, in this example. The Chiclet Slicer contains three input field wells—Category, Values, and Image. All three input field wells must have a value to display the images. The Category input contains the names of the items to be displayed within the Chiclets. The Image input takes a column with URL links corresponding to images for the given category values. In this example, the Sales Territory Country column is used as the Category input and the Internet Net Sales measure is used as the Values input to support cross highlighting. The Sales Territory URL column, which is set as an Image URL data category, is used as the Image input. For example, the following Sales Territory URL value is associated with the United States: http://www.crwflags.com/fotw/images/u/us.gif. A standard slicer visual can also display images when the data category of the field used is set as Image URL. However, the standard slicer is limited to only one input field and thus cannot also display a text column associated with the image. Additionally, the standard slicer lacks the richer cross-highlighting and formatting controls of the Chiclet Slicer. Impact Bubble Chart One of the limitations with standard Power BI visuals is the number of distinct measures that can be represented graphically. For example, the standard scatter chart visual is limited to three primary measures (X-AXIS, Y-AXIS, and SIZE), and a fourth measure can be used for color saturation. The Impact Bubble Chart custom visual, released in August of 2017, supports five measures by including a left and right bar input for each bubble. In the following visual, the left and right bars of the Impact Bubble Chart are used to visually indicate the distribution of AdWorks Net Sales between Online and Reseller Sales channels: The Impact Bubble Chart supports five input field wells: X-AXIS, Y-AXIS, SIZE, LEFT BAR, and RIGHT BAR. In this example, the following five measures are used for each of these fields respectively: AdWorks Net Sales, AdWorks Net Margin %, AdWorks Net Sales (YTD), Internet Net Sales, and Reseller Net Sales. The length of the left bar indicates that Australia's sales are almost exclusively derived from online sales. Likewise, the length of the right bar illustrates that Canada's sales are almost wholly obtained via Reseller Sales. These graphical insights per item would not be possible for the standard Power BI scatter chart. 
Specifically, the Internet Net Sales and Reseller Net Sales measures could only be added as Tooltips, thus requiring the user to hover over each individual bubble. In its current release, the Impact Bubble Chart does not support the formatting of data labels, a legend, or the axis titles. Therefore, a supporting text box can be created to advise the user of the additional measures represented. In the top right corner of this visual, a text box is set against the background to associate measures to the two bars and the size of the bubbles. Dot Plot by Maq Software Just as the Impact Bubble Chart supports additional measures, the Dot Plot by Maq Software allows for the visualization of up to four distinct dimension columns. With three Axis fields and a Legend field, a measure can be plotted to a more granular level than any other standard or custom visual currently available to Power BI. Additionally, a rich set of formatting controls are available to customize the Dot Plot's appearance, such as orientation (horizontal or vertical), and whether the Axis categories should be split or stacked. In the following visual, each bubble represents the internet sales for a specific grouping of the following dimension columns: Sales Territory Country, Product Subcategory, Promotion Type, and Customer History Segment: For example, one bubble represents the Internet Sales for the Road Bikes Product Subcategory within the United States Sales Territory Country, which is associated with the volume discount promotion type and the first year Customer History Segment. In this visual, the Customer History Segment column is used as the legend and thus the color of each bubble is automatically formatted to one of the three customer history segments. In the preceding example, the Orientation property is set to Horizontal and the Split labels property under the Axis category formatting card is enabled. The Split labels formatting causes the Sales Territory Country column to be displayed on the opposite axis of the Product Subcategory column. Disabling this property results in the two columns being displayed as a hierarchy on the same axis with the child column (Product Subcategory) positioned inside the parent column (Sales Territory Country). Despite its power in visualizing many dimension columns and its extensive formatting features, data labels are currently not supported. Therefore, when the maximum of four dimension columns are used, such as in the previous example, it's necessary to hover over the individual bubbles to determine which specific grouping the bubble represents, such as in the following example: With this, you can easily extend solutions beyond the capabilities of Power BI's standard visuals and support specific and unique, complex use-cases. If you found this tutorial useful, do check out the book Mastering Microsoft Power BI and develop visually rich, immersive, and interactive Power BI reports and dashboards. Building a Microsoft Power BI Data Model How to build a live interactive visual dashboard in Power BI with Azure Stream How to use M functions within Microsoft Power BI for querying data “Tableau is the most powerful and secure end-to-end analytics platform”: An interview with Joshua Milligan
Decoding the reasons behind Alphabet’s record high earnings in Q2 2018

Sugandha Lahoti
25 Jul 2018
7 min read
Alphabet, Google’s parent company, saw its stock price rise quickly after it announced its Q2 2018 earning results, shocking analysts (in a good way) all over the world. Shares of Alphabet have jumped more than 5% in after-hours trading Monday, hitting a new record high. Source: NASDAQ It would seem that the EU’s fine of €4.34 billion on Google for breaching EU antitrust laws had little effect on its progress in terms of Q2 earnings. According to Ruth Porat, Google's CFO, Alphabet generated revenue of $32.66 billion during Q2 2018, compared to $26.01 billion during the same quarter last year. Excluding the fine, Alphabet still booked a net income of $3.2 billion, which equals earnings of $4.54 per share. Had the EU decision gone the other way, Alphabet would have had $32.6 billion in revenue and a profit of $8.2 billion. “We want Google to be the source you think of when you run into a problem.” - Sundar Pichai, Google CEO, in the Q2 2018 Earnings Call In Monday afternoon’s earnings call, CEO Sundar Pichai focused on three major domains that have helped Alphabet achieve its Q2 earnings. First, he claimed that machine learning and AI was becoming a crucial unifying component across all of Google's products and offerings helping to cement and consolidate its position in the market. Second, Pichai suggested that investments in computing, video, cloud and advertising platforms have helped push Google into new valuable markets. And third, the company's investment in new businesses and emerging markets was proving to be a real growth driver which should secure Google's future success. Let us look at the various facets of Google’s growth strategy that have proven to be successful this quarter. Investing in AI With the world spinning around the axis of AI, Alphabet is empowering all of its product and service offerings with AI and machine learning. At its annual developer conference earlier this year, Google I/O, Google announced new updates to their products that rely on machine learning. For example, the revamped Google news app uses machine learning to provide relevant news stories for users, and improvements to Google assistant also helped the organization strengthen its position in that particular market. (By the end of 2018, it will be available in more than 30 languages in 80 countries.) This is another smart move by Alphabet in its plan to make information accessible to all while generating more revenue-generating options for themselves and expanding their partnerships to new vendors and enterprise clients. Google Translate also saw a huge bump in volume especially during the World Cup, as fans all over the world traveled to Russia to witness the football gala. Another smart decision was adding updates to Google Maps. This has achieved a 50% year-on-year growth in Indonesia, India, and Nigeria, three very big and expanding markets. Defending its Android ecosystem and business model The first Android Phone arrived in 2008. The project was built on the simple idea of a mobile platform that was free and open to everyone. Today, there are more than 24,000 Android-powered devices from over 1400 phone manufacturers. Google’s decision to build a business model that encourages this open ecosystem to thrive has been a clever strategy. It not only generates significant revenue for the company but it also brings a world of developers and businesses into its ecosystem. It's vendor lock-in with a friendly face. Of course, with the EU watching closely, Google has to be careful to follow regulation. 
Failure to comply could mean the company would face penalty payments of up to 5% of the average daily worldwide turnover of Alphabet. According to Brian Wieser, an analyst at Pivotal Research Group, however, “There do not appear to be any signs that should cause a meaningful slow down anytime soon, as fines from the EU are not likely to hamper Alphabet’s growth rate. Conversely, regulatory changes such as GDPR in Europe (and similar laws implemented elsewhere) could have the effect of reinforcing Alphabet’s growth.” Forming new partnerships Google has always been very keen to form new partnerships and strategic alliances with a wide variety of companies and startups. It has been very smart in systematically looking for partners that complement its strengths and bring the end product to market. Partnering also provides flexibility; instead of developing new solutions and tools in-house, Google can bring interesting innovations into its ecosystem simply thanks to its financial clout. For example, Google has partnered with many electronics companies to expand the number of devices compatible with Google Assistant. Furthermore, its investment in computing platforms and AI has also helped the organization generate considerable momentum in its Made by Google hardware business across Pixel, Home, Nest, and Chromecast. Interestingly, we also saw an acceleration in business adoption of Chromebooks. Chromebooks are the most cost-efficient and secure way for businesses to enable their employees to work in the cloud. The unit sales of managed Chromebooks in Q2 grew by more than 175% year-on-year. “Advertising on Youtube has always been an incredibly strong and growing source of income for its creators. Now Google is also building new ways for creators to source income such as paid channel memberships, merchandise shelves on Youtube channels, and endorsements opportunities through Famebit,” said Pichai. Famebit is a startup Google acquired in 2016 that uses data analytics to build tools to connect brands with the right creators. This acquisition proved to be quite successful, as almost half of the creators that used Famebit in 2018 doubled their revenue in the first 3 months. Google has also made significant strides in developing new shopping and commerce partnerships, such as with the leading global retailer Carrefour, designed to give people the power to shop wherever and however they want. Such collaborations are great for Google as they bring its shopping, ads, and cloud products under one roof. The success of Google Cloud’s vertical strategy and customer-centric approach was illustrated by key wins including Domino's Pizza, SoundCloud, and PwC moving to GCP this quarter. Target, the chain of department store retailers in the US, is also migrating key areas of its business to GCP. AirAsia has also expanded its relationship with Google to use ML and data analytics. This shows that the cloud business is only going to grow further. Moreover, Google Cloud Platform catering to clients from across very different industries and domains signals a robust way to expand its cloud empire. Supporting future customers Google is not just thinking about its current customer base but also working on specialized products to support the next wave of people who are coming online for the first time, enabled by the rising accessibility of mobile devices. 
Google has established high-speed public WiFi at 400 train stations in India in collaboration with Indian Railways and has proposed similar systems in Indonesia and Mexico as well. It has also announced a Google AI research center in Ghana, Africa, to spur AI innovation with researchers and engineers from Africa, and has expanded the Google IT Support Professional Certificate program to more than 25 community colleges in the US. This massive surge by Alphabet, even in the midst of the EU antitrust case, was the most talked-about news among Wall Street analysts, most of whom consider the stock a buy. For the next quarter, Google wants to continue fueling its growing cloud business. “We are investing for the long run,” Pichai said. Google also doesn’t plan to dramatically alter its Android strategy and will continue to give the OS away for free. Pichai said, “I’m confident that we will find a way to make sure Android is available at scale to users everywhere.” A quick look at E.U.’s antitrust case against Google’s Android Is Google planning to replace Android with Project Fuchsia? Google Cloud Launches Blockchain Toolkit to help developers build apps easily
Build a C++ Binary search tree [Tutorial]

Pavan Ramchandani
25 Jul 2018
19 min read
A binary tree is a hierarchical data structure whose behavior is similar to a tree, as it contains root and leaves (a node that has no child). The root of a binary tree is the topmost node. Each node can have at most two children, which are referred to as the left child and the right child. A node that has at least one child becomes a parent of its child. A node that has no child is a leaf. In this tutorial, you will be learning about the Binary tree data structures, its principles, and strategies in applying this data structures to various applications. This C++ tutorial has been taken from C++ Data Structures and Algorithms. Read more here.  Take a look at the following binary tree: From the preceding binary tree diagram, we can conclude the following: The root of the tree is the node of element 1 since it's the topmost node The children of element 1 are element 2 and element 3 The parent of elements 2 and 3 is 1 There are four leaves in the tree, and they are element 4, element 5, element 6, and element 7 since they have no child This hierarchical data structure is usually used to store information that forms a hierarchy, such as a file system of a computer. Building a binary search tree ADT A binary search tree (BST) is a sorted binary tree, where we can easily search for any key using the binary search algorithm. To sort the BST, it has to have the following properties: The node's left subtree contains only a key that's smaller than the node's key The node's right subtree contains only a key that's greater than the node's key You cannot duplicate the node's key value By having the preceding properties, we can easily search for a key value as well as find the maximum or minimum key value. Suppose we have the following BST: As we can see in the preceding tree diagram, it has been sorted since all of the keys in the root's left subtree are smaller than the root's key, and all of the keys in the root's right subtree are greater than the root's key. The preceding BST is a balanced BST since it has a balanced left and right subtree. We also can define the preceding BST as a balanced BST since both the left and right subtrees have an equal height (we are going to discuss this further in the upcoming section). However, since we have to put the greater new key in the right subtree and the smaller new key in the left subtree, we might find an unbalanced BST, called a skewed left or a skewed right BST. Please see the following diagram:   The preceding image is a sample of a skewed left BST, since there's no right subtree. Also, we can find a BST that has no left subtree, which is called a skewed right BST, as shown in the following diagram: As we can see in the two skewed BST diagrams, the height of the BST becomes taller since the height equals to N - 1 (where N is the total keys in the BST), which is five. Comparing this with the balanced BST, the root's height is only three. To create a BST in C++, we need to modify our TreeNode class in the preceding binary tree discussion, Building a binary tree ADT. We need to add the Parent properties so that we can track the parent of each node. It will make things easier for us when we traverse the tree. The class should be as follows: class BSTNode { public: int Key; BSTNode * Left; BSTNode * Right; BSTNode * Parent; }; There are several basic operations which BST usually has, and they are as follows: Insert() is used to add a new node to the current BST. If it's the first time we have added a node, the node we inserted will be a root node. 
PrintTreeInOrder() is used to print all of the keys in the BST, sorted from the smallest key to the greatest key. Search() is used to find a given key in the BST. If the key exists it returns TRUE, otherwise it returns FALSE. FindMin() and FindMax() are used to find the minimum key and the maximum key that exist in the BST. Successor() and Predecessor() are used to find the successor and predecessor of a given key. We are going to discuss these later in the upcoming section. Remove() is used to remove a given key from BST. Now, let's discuss these BST operations further. Inserting a new key into a BST Inserting a key into the BST is actually adding a new node based on the behavior of the BST. Each time we want to insert a key, we have to compare it with the root node (if there's no root beforehand, the inserted key becomes a root) and check whether it's smaller or greater than the root's key. If the given key is greater than the currently selected node's key, then go to the right subtree. Otherwise, go to the left subtree if the given key is smaller than the currently selected node's key. Keep checking this until there's a node with no child so that we can add a new node there. The following is the implementation of the Insert() operation in C++: BSTNode * BST::Insert(BSTNode * node, int key) { // If BST doesn't exist // create a new node as root // or it's reached when // there's no any child node // so we can insert a new node here if(node == NULL) { node = new BSTNode; node->Key = key; node->Left = NULL; node->Right = NULL; node->Parent = NULL; } // If the given key is greater than // node's key then go to right subtree else if(node->Key < key) { node->Right = Insert(node->Right, key); node->Right->Parent = node; } // If the given key is smaller than // node's key then go to left subtree else { node->Left = Insert(node->Left, key); node->Left->Parent = node; } return node; } As we can see in the preceding code, we need to pass the selected node and a new key to the function. However, we will always pass the root node as the selected node when performing the Insert() operation, so we can invoke the preceding code with the following Insert() function: void BST::Insert(int key) { // Invoking Insert() function // and passing root node and given key root = Insert(root, key); } Based on the implementation of the Insert() operation, we can see that the time complexity to insert a new key into the BST is O(h), where h is the height of the BST. However, if we insert a new key into a non-existing BST, the time complexity will be O(1), which is the best case scenario. And, if we insert a new key into a skewed tree, the time complexity will be O(N), where N is the total number of keys in the BST, which is the worst case scenario. Traversing a BST in order We have successfully created a new BST and can insert a new key into it. Now, we need to implement the PrintTreeInOrder() operation, which will traverse the BST in order from the smallest key to the greatest key. To achieve this, we will go to the leftmost node and then to the rightmost node. 
The code should be as follows: void BST::PrintTreeInOrder(BSTNode * node) { // Stop printing if no node found if(node == NULL) return; // Get the smallest key first // which is in the left subtree PrintTreeInOrder(node->Left); // Print the key std::cout << node->Key << " "; // Continue to the greatest key // which is in the right subtree PrintTreeInOrder(node->Right); } Since we will always traverse from the root node, we can invoke the preceding code as follows: void BST::PrintTreeInOrder() { // Traverse the BST // from root node // then print all keys PrintTreeInOrder(root); std::cout << std::endl; } The time complexity of the PrintTreeInOrder() function will be O(N), where N is the total number of keys for both the best and the worst cases since it will always traverse to all keys. Finding out whether a key exists in a BST Suppose we have a BST and need to find out if a key exists in the BST. It's quite easy to check whether a given key exists in a BST, since we just need to compare the given key with the current node. If the key is smaller than the current node's key, we go to the left subtree, otherwise we go to the right subtree. We will do this until we find the key or when there are no more nodes to find. The implementation of the Search() operation should be as follows: BSTNode * BST::Search(BSTNode * node, int key) { // The given key is // not found in BST if (node == NULL) return NULL; // The given key is found else if(node->Key == key) return node; // The given is greater than // current node's key else if(node->Key < key) return Search(node->Right, key); // The given is smaller than // current node's key else return Search(node->Left, key); } Since we will always search for a key from the root node, we can create another Search() function as follows: bool BST::Search(int key) { // Invoking Search() operation // and passing root node BSTNode * result = Search(root, key); // If key is found, returns TRUE // otherwise returns FALSE return result == NULL ? false : true; } The time complexity to find out a key in the BST is O(h), where h is the height of the BST. If we find a key which lies in the root node, the time complexity will be O(1), which is the best case. If we search for a key in a skewed tree, the time complexity will be O(N), where N is the total number of keys in the BST, which is the worst case. Retrieving the minimum and maximum key values Finding out the minimum and maximum key values in a BST is also quite simple. To get a minimum key value, we just need to go to the leftmost node and get the key value. On the contrary, we just need to go to the rightmost node and we will find the maximum key value. The following is the implementation of the FindMin() operation to retrieve the minimum key value, and the FindMax() operation to retrieve the maximum key value: int BST::FindMin(BSTNode * node) { if(node == NULL) return -1; else if(node->Left == NULL) return node->Key; else return FindMin(node->Left); } int BST::FindMax(BSTNode * node) { if(node == NULL) return -1; else if(node->Right == NULL) return node->Key; else return FindMax(node->Right); } We return -1 if we cannot find the minimum or maximum value in the tree, since we assume that the tree can only have a positive integer. If we intend to store the negative integer as well, we need to modify the function's implementation, for instance, by returning NULL if no minimum or maximum values are found. 
As usual, we will always find the minimum and maximum key values from the root node, so we can invoke the preceding operations as follows: int BST::FindMin() { return FindMin(root); } int BST::FindMax() { return FindMax(root); } Similar to the Search() operation, the time complexity of the FindMin() and FindMax() operations is O(h), where h is the height of the BST. However, if we find the maximum key value in a skewed left BST, the time complexity will be O(1), which is the best case, since it doesn't have any right subtree. This also happens if we find the minimum key value in a skewed right BST. The worst case will appear if we try to find the minimum key value in a skewed left BST or try to find the maximum key value in a skewed right BST, since the time complexity will be O(N). Finding out the successor of a key in a BST Other properties that we can find from a BST are the successor and the predecessor. We are going to create two functions named Successor() and Predecessor() in C++. But before we create the code, let's discuss how to find out the successor and the predecessor of a key of a BST. In this section, we are going to learn about the successor first, and then we will discuss the predecessor in the upcoming section. There are three rules to find out the successor of a key of a BST. Suppose we have a key, k, that we have searched for using the previous Search() function. We will also use our preceding BST to find out the successor of a specific key. The successor of k can be found as follows: If k has a right subtree, the successor of k will be the minimum integer in the right subtree of k. From our preceding BST, if k = 31, Successor(31) will give us 53 since it's the minimum integer in the right subtree of 31. Please take a look at the following diagram: If k does not have a right subtree, we have to traverse the ancestors of k until we find the first node, n, which is greater than node k. After we find node n, we will see that node k is the maximum element in the left subtree of n. From our preceding BST, if k = 15, Successor(15) will give us 23 since it's the first greater ancestor compared with 15, which is 23. Please take a look at the following diagram: If k is the maximum integer in the BST, there's no successor of k. From the preceding BST, if we run Successor(88), we will get -1, which means no successor has been found, since 88 is the maximum key of the BST. Based on our preceding discussion about how to find out the successor of a given key in a BST, we can create a Successor() function in C++ with the following implementation: int BST::Successor(BSTNode * node) { // The successor is the minimum key value // of right subtree if (node->Right != NULL) { return FindMin(node->Right); } // If no any right subtree else { BSTNode * parentNode = node->Parent; BSTNode * currentNode = node; // If currentNode is not root and // currentNode is its right children // continue moving up while ((parentNode != NULL) && (currentNode == parentNode->Right)) { currentNode = parentNode; parentNode = currentNode->Parent; } // If parentNode is not NULL // then the key of parentNode is // the successor of node return parentNode == NULL ? -1 : parentNode->Key; } } However, since we have to find a given key's node first, we have to run Search() prior to invoking the preceding Successor() function. 
The complete code for finding the successor of a given key in a BST is as follows:

int BST::Successor(int key)
{
    // Search for the key's node first
    BSTNode * keyNode = Search(root, key);

    // Return the successor's key.
    // If the key is not found or
    // the successor is not found,
    // return -1
    return keyNode == NULL ? -1 : Successor(keyNode);
}

From our preceding Successor() operation, we can say that the average time complexity of running the operation is O(h), where h is the height of the BST. However, if we try to find the successor of the maximum key in a right-skewed BST, the time complexity of the operation is O(N), which is the worst-case scenario.

Finding out the predecessor of a key in a BST

The rules for finding the predecessor of a key, k, mirror the ones we used for the successor:

  1. If k has a left subtree, the predecessor of k is the maximum key in the left subtree of k. From our preceding BST, if k = 12, Predecessor(12) will be 7, since it's the maximum key in the left subtree of 12. Please take a look at the following diagram:
  2. If k does not have a left subtree, we have to traverse the ancestors of k until we find the first node, n, that is lower than node k. After we find node n, we will see that node k is the minimum element in the right subtree of n. From our preceding BST, if k = 29, Predecessor(29) gives us 23, since 23 is the first ancestor lower than 29. Please take a look at the following diagram:
  3. If k is the minimum key in the BST, there's no predecessor of k. From the preceding BST, if we run Predecessor(3), we get -1, which means no predecessor has been found, since 3 is the minimum key of the BST.

Now, we can implement the Predecessor() operation in C++ as follows:

int BST::Predecessor(BSTNode * node)
{
    // The predecessor is the maximum key value
    // of the left subtree
    if (node->Left != NULL)
    {
        return FindMax(node->Left);
    }
    // If there is no left subtree
    else
    {
        BSTNode * parentNode = node->Parent;
        BSTNode * currentNode = node;

        // While currentNode is not the root and
        // currentNode is its parent's left child,
        // continue moving up
        while ((parentNode != NULL) &&
               (currentNode == parentNode->Left))
        {
            currentNode = parentNode;
            parentNode = currentNode->Parent;
        }

        // If parentNode is not NULL
        // then the key of parentNode is
        // the predecessor of node
        return parentNode == NULL ? -1 : parentNode->Key;
    }
}

And, similar to the Successor() operation, we have to search for the node of the given key prior to invoking the preceding Predecessor() function. The complete code for finding the predecessor of a given key in a BST is as follows:

int BST::Predecessor(int key)
{
    // Search for the key's node first
    BSTNode * keyNode = Search(root, key);

    // Return the predecessor's key.
    // If the key is not found or
    // the predecessor is not found,
    // return -1
    return keyNode == NULL ? -1 : Predecessor(keyNode);
}

Similar to our preceding Successor() operation, the time complexity of running the Predecessor() operation is O(h), where h is the height of the BST. However, if we try to find the predecessor of the minimum key in a left-skewed BST, the time complexity of the operation is O(N), which is the worst-case scenario.

Removing a node based on a given key

The last operation in the BST that we are going to discuss is removing a node based on a given key. We will create a Remove() operation in C++. There are three possible cases for removing a node from a BST, and they are as follows:

  1. Removing a leaf (a node that doesn't have any children). In this case, we just need to remove the node. From our preceding BST, we can remove keys 7, 15, 29, and 53, since they are leaves with no children.
  2. Removing a node that has only one child (either a left or a right child). In this case, we have to connect the child to the parent of the node. After that, we can remove the target node safely. As an example, if we want to remove node 3, we have to point the Parent pointer of node 7 to node 12 and make the Left pointer of node 12 point to node 7. Then, we can safely remove node 3.
  3. Removing a node that has two children (left and right children). In this case, we have to find the successor (or predecessor) of the node's key. After that, we can replace the target node with the successor (or predecessor) node. Suppose we want to remove node 31, whose successor is 53. Then, we can remove node 31 and replace it with node 53. Now, node 53 will have two children: node 29 on the left and node 88 on the right.

Also, similar to the Search() operation, if the target node doesn't exist, we just need to return NULL. The implementation of the Remove() operation in C++ is as follows:

BSTNode * BST::Remove(BSTNode * node, int key)
{
    // The given node is
    // not found in BST
    if (node == NULL)
        return NULL;

    // Target node is found
    if (node->Key == key)
    {
        // If the node is a leaf node
        // it can be safely removed
        if (node->Left == NULL && node->Right == NULL)
            node = NULL;
        // The node has only one child, at the right
        else if (node->Left == NULL && node->Right != NULL)
        {
            // The only child will be connected
            // directly to the node's parent
            node->Right->Parent = node->Parent;

            // Bypass node
            node = node->Right;
        }
        // The node has only one child, at the left
        else if (node->Left != NULL && node->Right == NULL)
        {
            // The only child will be connected
            // directly to the node's parent
            node->Left->Parent = node->Parent;

            // Bypass node
            node = node->Left;
        }
        // The node has two children (left and right)
        else
        {
            // Find the successor (the predecessor
            // could be used instead)
            int successorKey = Successor(key);

            // Replace node's key with successor's key
            node->Key = successorKey;

            // Delete the old successor's key
            node->Right = Remove(node->Right, successorKey);
        }
    }
    // Current node's key is smaller than
    // the given key, so search to the right
    else if (node->Key < key)
        node->Right = Remove(node->Right, key);
    // Current node's key is greater than
    // the given key, so search to the left
    else
        node->Left = Remove(node->Left, key);

    // Return the updated BST
    return node;
}

Since we will always remove a node starting from the root node, we can simplify the preceding Remove() operation by creating the following one:

void BST::Remove(int key)
{
    root = Remove(root, key);
}

As shown in the preceding Remove() code, once the target node has been located, the time complexity of the operation is O(1) for both case 1 (the node has no children) and case 2 (the node has only one child). For case 3 (the node has two children), the time complexity is O(h), where h is the height of the BST, since we have to find the successor or predecessor of the node's key.

If you found this tutorial useful, do check out the book C++ Data Structures and Algorithms for more useful material on data structures and algorithms with real-world implementations in C++.

Working with shaders in C++ to create 3D games
Getting Inside a C++ Multithreaded Application
Understanding the Dependencies of a C++ Application
Top 5 Deep Learning Architectures

Amey Varangaonkar
24 Jul 2018
9 min read
If you are a deep learning practitioner or someone who wants to get into the world of deep learning, you might be well acquainted with neural networks already. Neural networks, inspired by biological neural networks, are pretty useful when it comes to solving complex, multi-layered computational problems. Deep learning has stood out pretty well in several high-profile research fields, including facial and speech recognition, natural language processing, machine translation, and more. In this article, we look at the top 5 popular and widely used deep learning architectures you should know in order to advance your knowledge or your deep learning research.

Convolutional Neural Networks

Convolutional Neural Networks, or CNNs in short, are the popular choice of neural networks for different Computer Vision tasks such as image recognition. The name 'convolution' is derived from a mathematical operation involving the convolution of different functions. There are 4 primary steps or stages in designing a CNN:

  • Convolution: The input signal is received at this stage
  • Subsampling: Inputs received from the convolution layer are smoothened to reduce the sensitivity of the filters to noise or any other variation
  • Activation: This layer controls how the signal flows from one layer to the other, similar to the neurons in our brain
  • Fully connected: In this stage, all the layers of the network are connected with every neuron from a preceding layer to the neurons from the subsequent layer

Here is an in-depth look at the CNN architecture and its working, as explained by the popular AI researcher Giancarlo Zaccone.

A sample CNN in action

Advantages of CNN

  • Very good for visual recognition
  • Once a segment within a particular sector of an image is learned, the CNN can recognize that segment present anywhere else in the image

Disadvantages of CNN

  • CNN is highly dependent on the size and quality of the training data
  • Highly susceptible to noise

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) have been very popular in areas where the sequence in which the information is presented is crucial. As a result, they find a lot of applications in real-world domains such as natural language processing, speech synthesis, and machine translation. RNNs are called 'recurrent' mainly because a uniform task is performed for every single element of a sequence, with the output dependent on the previous computations as well. Think of these networks as having a memory, where every piece of calculated information is captured, stored, and utilized to calculate the final outcome. Over the years, quite a few varieties of RNNs have been researched and developed:

  • Bidirectional RNN: The output in this type of RNN depends not only on the past but also on future outcomes
  • Deep RNN: In this type of RNN, there are multiple layers present per step, allowing for a greater rate of learning and more accuracy

RNNs can be used to build industry-standard chatbots that can interact with customers on websites. Given a sequence of signals from an audio wave, RNNs can also be used to predict a correct sequence of phonetic segments with a given probability.

Advantages of RNN

  • Unlike a traditional neural network, an RNN shares the same parameters across all steps. This greatly reduces the number of parameters that we need to learn
  • RNNs can be used along with CNNs to generate accurate descriptions for unlabeled images

Disadvantages of RNN

  • RNNs find it difficult to track long-term dependencies.
This is especially true in the case of long sentences and paragraphs that have too many words between the noun and the verb.

  • RNNs cannot be stacked into very deep models. This is due to the activation function used in RNN models, which makes the gradient decay over multiple layers.

Autoencoders

Autoencoders apply the principle of backpropagation in an unsupervised environment. Autoencoders, interestingly, have a close resemblance to PCA (Principal Component Analysis), except that they are more flexible. One of the popular applications of autoencoders is anomaly detection, for example, detecting fraud in financial transactions in banks. Basically, the core task of autoencoders is to identify and determine what constitutes regular, normal data and then identify the outliers or anomalies. Autoencoders usually represent data through multiple hidden layers such that the output signal is as close to the input signal as possible. There are 4 major types of autoencoders being used today:

  • Vanilla autoencoder: The simplest form of autoencoder there is, i.e. a neural net with one hidden layer
  • Multilayer autoencoder: When one hidden layer is not enough, an autoencoder can be extended to include more hidden layers
  • Convolutional autoencoder: In this type, convolutions are used in the autoencoder instead of fully-connected layers
  • Regularized autoencoder: This type of autoencoder uses a special loss function that enables the model to have properties beyond the basic ability to copy a given input to the output

This article demonstrates training an autoencoder using H2O, a popular machine learning and AI platform.

A basic representation of Autoencoder

Advantages of Autoencoders

  • Autoencoders give a resultant model which is primarily based on the data rather than predefined filters
  • Very little complexity means it's easier to train them

Disadvantages of Autoencoders

  • Training time can be very high sometimes
  • If the training data is not representative of the testing data, then the information that comes out of the model can be obscured and unclear
  • Some autoencoders, especially of the variational type, cause a deterministic bias to be introduced in the model

Generative Adversarial Networks

The basic premise of Generative Adversarial Networks (GANs) is the training of two deep learning models simultaneously. These deep learning networks basically compete with each other: the model that tries to generate new instances or examples is called the generator, while the model that tries to classify whether a particular instance originates from the training data or from the generator is called the discriminator. GANs, a recent breakthrough in the field of deep learning, were a concept put forth by the popular deep learning expert Ian Goodfellow in 2014. They find large and important applications in Computer Vision, especially image generation. Read more about the structure and the functionality of GANs in the official paper submitted by Ian Goodfellow.

General architecture of GAN (Source: deeplearning4j)

Advantages of GAN

  • Per Goodfellow, GANs allow for efficient training of classifiers in a semi-supervised manner
  • Because of the improved accuracy of the model, the generated data is almost indistinguishable from the original data
  • GANs do not introduce any deterministic bias, unlike variational autoencoders

Disadvantages of GAN

  • Generator and discriminator working efficiently is crucial to the success of GAN.
The whole system fails even if one of them fails.

  • Both the generator and the discriminator are separate systems trained with different loss functions. Hence, the time required to train the entire system can get quite high.

Interested to know more about GANs? Here's what you need to know about them.

ResNets

Ever since they gained popularity in 2015, ResNets, or Deep Residual Networks, have been widely adopted and used by many data scientists and AI researchers. As you already know, CNNs are highly useful when it comes to solving image classification and visual recognition problems. As these tasks become more complex, training of the neural network starts to get a lot more difficult, as additional deep layers are required to compute and enhance the accuracy of the model. Residual learning is a concept designed to tackle this very problem, and the resultant architecture is popularly known as a ResNet. A ResNet consists of a number of residual modules, where each module represents a layer. Each layer consists of a set of functions to be performed on the input. The depth of a ResNet can vary greatly; the one developed by Microsoft researchers for an image classification problem had 152 layers!

A basic building block of ResNet (Source: Quora)

Advantages of ResNets

  • ResNets are more accurate and require fewer weights than LSTMs and RNNs in some cases
  • They are highly modular. Hundreds and thousands of residual layers can be added to create a network and then trained
  • ResNets can be designed to determine how deep a particular network needs to be

Disadvantages of ResNets

  • If the layers in a ResNet are too deep, errors can be hard to detect and cannot be propagated back quickly and correctly. At the same time, if the layers are too narrow, the learning might not be very efficient.

Apart from the ones above, a few more deep learning models are being increasingly adopted and preferred by data scientists. These definitely deserve an honorable mention:

  • LSTM: LSTMs are a special kind of Recurrent Neural Network that include a special memory cell that can hold information for long periods of time. A set of gates is used to determine when particular information enters the memory and when it is forgotten.
  • SqueezeNet: One of the newer but very powerful deep learning architectures, which is extremely efficient for low-bandwidth platforms such as mobile.
  • CapsNet: CapsNet, or Capsule Networks, is a recent breakthrough in the field of deep learning and neural network modeling. Mainly used for accurate image recognition tasks, it is an advanced variation of the CNN.
  • SegNet: A popular deep learning architecture especially used to solve the image segmentation problem.
  • Seq2Seq: An upcoming deep learning architecture being increasingly used for machine translation and building efficient chatbots.

So there you have it! Thanks to the intense research efforts in deep learning and AI, we now have a variety of deep learning models at our disposal to solve a variety of problems, both functional and computational. What's even better is that we have the liberty to choose the most appropriate deep learning architecture based on the problem at hand.

[box type="shadow" align="" class="" width=""]Editor's Tip: It is very important to know the best deep learning frameworks you can use to train your models.
Here are the top 10 deep learning frameworks for you.[/box]

In contrast to the traditional programming approach, where we tell the computer what to do, deep learning models figure out the problem and devise the most appropriate solution on their own, however complex the problem may be. No wonder these deep learning architectures are being researched and deployed on a large scale by major market players such as Google, Facebook, Microsoft, and many others.

Packt Explains… Deep Learning in 90 seconds
Behind the scenes: Deep learning evolution and core concepts
Facelifting NLP with Deep Learning
Implementing C++ libraries in Delphi for HPC [Tutorial]

Pavan Ramchandani
24 Jul 2018
16 min read
Using C object files in Delphi is hard but possible. Linking to C++ object files is, however, nearly impossible. The problem does not lie within the object files themselves but in C++. While C is hardly more than an assembler with improved syntax, C++ represents a sophisticated high-level language with runtime support for strings, objects, exceptions, and more. All these features are part of almost any C++ program and are as such compiled into (almost) any object file produced by C++.

In this tutorial, we will leverage various C++ libraries that enable high performance with Delphi. It starts with memory management, which is important for any high-performance application. The article is an excerpt from a book written by Primož Gabrijelčič, titled Delphi High Performance.

The problem here is that Delphi has no idea how to deal with any of that. A C++ object is not equal to a Delphi object. Delphi has no idea how to call functions of a C++ object, how to deal with its inheritance chain, how to create and destroy such objects, and so on. The same holds for strings, exceptions, streams, and other C++ concepts.

If you can compile the C++ source with C++Builder, then you can create a package (.bpl) that can be used from a Delphi program. Most of the time, however, you will not be dealing with a source project. Instead, you'll want to use a commercial library that only gives you a bunch of C++ header files (.h) and one or more static libraries (.lib). Most of the time, the only Windows version of that library will be compiled with Microsoft's Visual Studio.

A more general approach to this problem is to introduce a proxy DLL created in C++. You will have to create it in the same development environment as was used to create the library you are trying to link into the project. On Windows, that will in most cases be Visual Studio. That will enable us to include the library without any problems. To allow Delphi to use this DLL (and as such use the library), the DLL should expose a simple interface in the Windows API style. Instead of exposing C++ objects, the API must expose methods implemented by the objects as normal (non-object) functions and procedures. As the objects cannot cross the API boundary, we must find some other way to represent them on the Delphi side.

Instead of showing how to write a DLL wrapper for an existing (and probably quite complicated) C++ library, I have decided to write a very simple C++ library that exposes a single class, implementing only two methods. As compiling this library requires Microsoft's Visual Studio, which not all of you have installed, I have also included the compiled version (DllLib1.dll) in the code archive.

The Visual Studio solution is stored in the StaticLib1 folder and contains two projects. StaticLib1 is the project used to create the library, while the Dll1 project implements the proxy DLL. The static library implements the CppClass class, which is defined in the header file, CppClass.h. Whenever you are dealing with a C++ library, the distribution will also contain one or more header files. They are needed if you want to use the library in a C++ project—such as in the proxy DLL Dll1. The header file for the demo library StaticLib1 is shown in the following. We can see that the code implements a single CppClass class, which implements a constructor (CppClass()), a destructor (~CppClass()), a method accepting an integer parameter (void setData(int)), and a function returning an integer (int getSquare()).
The class also contains one integer private field, data:

#pragma once

class CppClass {
    int data;
public:
    CppClass();
    ~CppClass();
    void setData(int);
    int getSquare();
};

The implementation of the CppClass class is stored in the CppClass.cpp file. You don't need this file when implementing the proxy DLL. When we are using a C++ library, we are strictly coding to the interface—and the interface is stored in the header file. In our case, we have the full source, so we can look inside the implementation too. The constructor and destructor don't do anything, so I'm not showing them here. The other two methods are as follows. The setData method stores its parameter in the internal field, and the getSquare function returns the squared value of the internal field:

void CppClass::setData(int value)
{
    data = value;
}

int CppClass::getSquare()
{
    return data * data;
}

This code doesn't contain anything that we couldn't write in 60 seconds in Delphi. It does, however, serve as a perfect simple example for writing a proxy DLL. Creating such a DLL in Visual Studio is easy. You just have to select File | New | Project, and select the Dynamic-Link Library (DLL) project type from the Visual C++ | Windows Desktop branch.

The Dll1 project from the code archive has only two source files. The file dllmain.cpp was created automatically by Visual Studio and contains the standard DllMain method. You can change this file if you have to run project-specific code when a program and/or a thread attaches to, or detaches from, the DLL. In my example, this file was left just as Visual Studio created it. The second file, StaticLibWrapper.cpp, fully implements the proxy DLL. It starts with two include lines (shown in the following), which bring in the required RTL header stdafx.h and the header definition for our C++ class, CppClass.h:

#include "stdafx.h"
#include "CppClass.h"

The proxy has to be able to find our header file. There are two ways to do that. We could simply copy it to the folder containing the source files for the DLL project, or we can add it to the project's search path. The second approach can be configured in Project | Properties | Configuration Properties | C/C++ | General | Additional Include Directories. This is also the approach used by the demonstration program.

The DLL project must also be able to find the static library that implements the CppClass object. The path to the library file should be set in the project options, in the Configuration Properties | Linker | General | Additional Library Directories settings. You should put the name of the library (StaticLib1.lib) in the Linker | Input | Additional Dependencies settings.

The next line in the source file defines a macro called EXPORT, which will be used later in the program to mark a function as exported. We have to do that for every DLL function that we want to use from the Delphi code. Later, we'll see how this macro is used:

#define EXPORT comment(linker, "/EXPORT:" __FUNCTION__ "=" __FUNCDNAME__)

The next part of the StaticLibWrapper.cpp file implements an IndexAllocator class, which is used internally to cache C++ objects. It associates C++ objects with simple integer identifiers, which are then used outside the DLL to represent the object. I will not show this class in the book as the implementation is not that important. You only have to know how to use it. This class is implemented as a simple static array of pointers and contains at most MAXOBJECTS objects.
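Purely for illustration, a minimal allocator with the behaviour described in this article might look like the following sketch. This is an assumption reconstructed from the description (a static array of pointers, a MAXOBJECTS limit, and the Allocate/Release/Get semantics explained below), not the actual code from StaticLibWrapper.cpp:

#define MAXOBJECTS 100    // matches the limit described in the article

// Hypothetical sketch of an index-based object cache.
class IndexAllocator {
    void* objects[MAXOBJECTS];            // one slot per cached C++ object
public:
    IndexAllocator() {
        for (int i = 0; i < MAXOBJECTS; i++)
            objects[i] = NULL;            // all slots start empty
    }
    // Store obj in the first free slot and return its index in deviceIndex.
    bool Allocate(int& deviceIndex, void* obj) {
        for (int i = 0; i < MAXOBJECTS; i++)
            if (objects[i] == NULL) {
                objects[i] = obj;
                deviceIndex = i;
                return true;
            }
        return false;                     // cache is full
    }
    // Mark the slot as empty; fail on an invalid or already empty index.
    bool Release(int deviceIndex) {
        if (deviceIndex < 0 || deviceIndex >= MAXOBJECTS ||
            objects[deviceIndex] == NULL)
            return false;
        objects[deviceIndex] = NULL;
        return true;
    }
    // Return the stored pointer, or NULL for an invalid or empty index.
    void* Get(int deviceIndex) {
        if (deviceIndex < 0 || deviceIndex >= MAXOBJECTS)
            return NULL;
        return objects[deviceIndex];
    }
};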
The constant MAXOBJECTS is set to 100 in the current code, which limits the number of C++ objects created by the Delphi code to 100. Feel free to modify the code if you need to create more objects.

The following code fragment shows the three public functions implemented by the IndexAllocator class. The Allocate function takes a pointer, obj, stores it in the cache, and returns its index in the deviceIndex parameter. The result of the function is FALSE if the cache is full and TRUE otherwise. The Release function accepts an index (which was previously returned from Allocate) and marks the cache slot at that index as empty. This function returns FALSE if the index is invalid (does not represent a value returned from Allocate) or if the cache slot for that index is already empty. The last function, Get, also accepts an index and returns the pointer associated with that index. It returns NULL if the index is invalid or if the cache slot for that index is empty:

bool Allocate(int& deviceIndex, void* obj)
bool Release(int deviceIndex)
void* Get(int deviceIndex)

Let's move on now to the functions that are exported from the DLL. The first two—Initialize and Finalize—are used to initialize internal structures, namely the GAllocator of type IndexAllocator, and to clean up before the DLL is unloaded. Instead of looking into them, I'd rather show you the more interesting stuff, namely the functions that deal with CppClass.

The CreateCppClass function creates an instance of CppClass, stores it in the cache, and returns its index. The three important parts of the declaration are extern "C", WINAPI, and #pragma EXPORT. extern "C" is there to guarantee that the CreateCppClass name will not be changed when it is stored in the library. The C++ compiler tends to mangle (change) function names to support method overloading (the same thing happens in Delphi), and this declaration prevents that. WINAPI changes the calling convention from cdecl, which is standard for C programs, to stdcall, which is commonly used in DLLs. Later, we'll see that we also have to specify the correct calling convention on the Delphi side. The last important part, #pragma EXPORT, uses the previously defined EXPORT macro to mark this function as exported.

CreateCppClass returns 0 if the operation was successful and -1 if it failed. The same approach is used in all functions exported from the demo DLL:

extern "C" int WINAPI CreateCppClass (int& index)
{
    #pragma EXPORT
    CppClass* instance = new CppClass;
    if (!GAllocator->Allocate(index, (void*)instance)) {
        delete instance;
        return -1;
    }
    else
        return 0;
}

Similarly, the DestroyCppClass function (not shown here) accepts an index parameter, fetches the object from the cache, and destroys it.

The DLL also exports two functions that allow the DLL user to operate on an object. The first one, CppClass_setValue, accepts an index of the object and a value. It fetches the CppClass instance from the cache (given the index) and calls its setData method, passing it the value:

extern "C" int WINAPI CppClass_setValue(int index, int value)
{
    #pragma EXPORT
    CppClass* instance = (CppClass*)GAllocator->Get(index);
    if (instance == NULL)
        return -1;
    else {
        instance->setData(value);
        return 0;
    }
}

The second function, CppClass_getSquare, also accepts an object index and uses it to access the CppClass object.
After that, it calls the object's getSquare function and stores the result in the output parameter, value:

extern "C" int WINAPI CppClass_getSquare(int index, int& value)
{
    #pragma EXPORT
    CppClass* instance = (CppClass*)GAllocator->Get(index);
    if (instance == NULL)
        return -1;
    else {
        value = instance->getSquare();
        return 0;
    }
}

A proxy DLL that uses a mapping table is a bit complicated and requires some work. We could also approach the problem in a much simpler manner—by treating the address of an object as its external identifier. In other words, the CreateCppClass function would create an object and then return its address as an untyped pointer type. A CppClass_getSquare, for example, would accept this pointer, cast it to a CppClass instance, and execute an operation on it. An alternative version of these two methods is shown in the following:

extern "C" int WINAPI CreateCppClass2(void*& ptr)
{
    #pragma EXPORT
    ptr = new CppClass;
    return 0;
}

extern "C" int WINAPI CppClass_getSquare2(void* index, int& value)
{
    #pragma EXPORT
    value = ((CppClass*)index)->getSquare();
    return 0;
}

This approach is simpler but offers far less security in the form of error checking. The table-based approach can check whether the index represents a valid value, while the latter version cannot know if the pointer parameter is valid or not. If we make a mistake on the Delphi side and pass in an invalid pointer, the code would treat it as an instance of a class, do some operations on it, possibly corrupt some memory, and maybe crash. Finding the source of such errors is very hard. That's why I prefer to write more verbose code that implements some safety checks on the code that returns pointers.

Using a proxy DLL in Delphi

To use any DLL from a Delphi program, we must first import functions from the DLL. There are different ways to do this—we could use static linking, dynamic linking, or static linking with delayed loading. There's plenty of information on the internet about the art of DLL writing in Delphi, so I won't dig into this topic. I'll just stick with the most modern approach—delay loading.

The code archive for this book includes two demo programs, which demonstrate how to use the DllLib1.dll library. The simpler one, CppClassImportDemo, uses the DLL functions directly, while CppClassWrapperDemo wraps them in an easy-to-use class. Both projects use the CppClassImport unit to import the DLL functions into the Delphi program. The following code fragment shows the interface part of that unit, which tells the Delphi compiler which functions from the DLL should be imported and what parameters they have.

As with the C++ part, there are three important parts to each declaration. Firstly, stdcall specifies that the function call should use the stdcall (or what is known in C as WINAPI) calling convention. Secondly, the name after the name specifier should match the exported function name from the C++ source. And thirdly, the delayed keyword specifies that the program should not try to find this function in the DLL when it is started, but only when the code calls the function.
This allows us to check whether the DLL is present at all before we call any of the functions:

const
  CPP_CLASS_LIB = 'DllLib1.dll';

function Initialize: integer; stdcall;
  external CPP_CLASS_LIB name 'Initialize' delayed;

function Finalize: integer; stdcall;
  external CPP_CLASS_LIB name 'Finalize' delayed;

function CreateCppClass(var index: integer): integer; stdcall;
  external CPP_CLASS_LIB name 'CreateCppClass' delayed;

function DestroyCppClass(index: integer): integer; stdcall;
  external CPP_CLASS_LIB name 'DestroyCppClass' delayed;

function CppClass_setValue(index: integer; value: integer): integer; stdcall;
  external CPP_CLASS_LIB name 'CppClass_setValue' delayed;

function CppClass_getSquare(index: integer; var value: integer): integer; stdcall;
  external CPP_CLASS_LIB name 'CppClass_getSquare' delayed;

The implementation part of this unit (not shown here) shows how to catch errors that occur during delayed loading—that is, when the code that calls any of the imported functions tries to find that function in the DLL. If you get an External exception C06D007F when you try to call a delay-loaded function, you have probably mistyped a name—either in C++ or in Delphi. You can use the tdump utility that comes with Delphi to check which names are exported from the DLL. The syntax is tdump -d <dll_name.dll>. If the code crashes when you call a DLL function, check whether both sides correctly define the calling convention. Also check whether all the parameters have correct types on both sides and whether the var parameters are marked as such on both sides.

To use the DLL, the code in the CppClassMain unit first calls the exported Initialize function from the form's OnCreate handler to initialize the DLL. The cleanup function, Finalize, is called from the OnDestroy handler to clean up the DLL. All parts of the code check whether the DLL functions return the OK status (value 0):

procedure TfrmCppClassDemo.FormCreate(Sender: TObject);
begin
  if Initialize <> 0 then
    ListBox1.Items.Add('Initialize failed')
end;

procedure TfrmCppClassDemo.FormDestroy(Sender: TObject);
begin
  if Finalize <> 0 then
    ListBox1.Items.Add('Finalize failed');
end;

When you click on the Use import library button, the following code executes. It uses the DLL to create a CppClass object by calling the CreateCppClass function. This function puts an integer value into the idxClass variable. This value is used as an identifier for the CppClass object when calling other functions. The code then calls CppClass_setValue to set the internal field of the CppClass object and CppClass_getSquare to call the getSquare method and return the calculated value. At the end, DestroyCppClass destroys the CppClass object:

procedure TfrmCppClassDemo.btnImportLibClick(Sender: TObject);
var
  idxClass: Integer;
  value: Integer;
begin
  if CreateCppClass(idxClass) <> 0 then
    ListBox1.Items.Add('CreateCppClass failed')
  else if CppClass_setValue(idxClass, SpinEdit1.Value) <> 0 then
    ListBox1.Items.Add('CppClass_setValue failed')
  else if CppClass_getSquare(idxClass, value) <> 0 then
    ListBox1.Items.Add('CppClass_getSquare failed')
  else begin
    ListBox1.Items.Add(Format('square(%d) = %d',
      [SpinEdit1.Value, value]));
    if DestroyCppClass(idxClass) <> 0 then
      ListBox1.Items.Add('DestroyCppClass failed')
  end;
end;

This approach is relatively simple but long-winded and error-prone. A better way is to write a wrapper Delphi class that implements the same public interface as the corresponding C++ class.
The second demo, CppClassWrapperDemo, contains a unit, CppClassWrapper, which does just that. This unit implements a TCppClass class, which maps to its C++ counterpart. It only has one internal field, which stores the index of the C++ object as returned from the CreateCppClass function:

type
  TCppClass = class
  strict private
    FIndex: integer;
  public
    class procedure InitializeWrapper;
    class procedure FinalizeWrapper;
    constructor Create;
    destructor Destroy; override;
    procedure SetValue(value: integer);
    function GetSquare: integer;
  end;

I won't show all of the functions here as they are all equally simple. One—or maybe two—will suffice. The constructor just calls the CreateCppClass function, checks the result, and stores the resulting index in the internal field:

constructor TCppClass.Create;
begin
  inherited Create;
  if CreateCppClass(FIndex) <> 0 then
    raise Exception.Create('CreateCppClass failed');
end;

Similarly, GetSquare just forwards its job to the CppClass_getSquare function:

function TCppClass.GetSquare: integer;
begin
  if CppClass_getSquare(FIndex, Result) <> 0 then
    raise Exception.Create('CppClass_getSquare failed');
end;

When we have this wrapper, the code in the main unit becomes very simple—and very Delphi-like. Once the initialization in the OnCreate event handler is done, we can just create an instance of TCppClass and work with it:

procedure TfrmCppClassDemo.FormCreate(Sender: TObject);
begin
  TCppClass.InitializeWrapper;
end;

procedure TfrmCppClassDemo.FormDestroy(Sender: TObject);
begin
  TCppClass.FinalizeWrapper;
end;

procedure TfrmCppClassDemo.btnWrapClick(Sender: TObject);
var
  cpp: TCppClass;
begin
  cpp := TCppClass.Create;
  try
    cpp.SetValue(SpinEdit1.Value);
    ListBox1.Items.Add(Format('square(%d) = %d',
      [SpinEdit1.Value, cpp.GetSquare]));
  finally
    FreeAndNil(cpp);
  end;
end;

To summarize, we learned how a C/C++ library can be used from Delphi, which provides a path to high-performance computing with Delphi as the primary language. If you found this post useful, do check out the book Delphi High Performance to learn more about the intricacies of high-performance programming with Delphi.

Exploring the Usages of Delphi
Delphi: memory management techniques for parallel programming
Delphi Cookbook