Data | 0 articles | Tech News, Tutorials & Expert Insights

article-image-perform-predictive-forecasting-sap-analytics-cloud

17 Feb 2018

7 min read

How to perform predictive forecasting in SAP Analytics Cloud

17 Feb 2018

[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Riaz Ahmed titled Learning SAP Analytics Cloud. This book involves features of the SAP Analytics Cloud which will help you collaborate, predict and solve business intelligence problems with cloud computing.[/box] In this article we will learn how to use predictive forecasting with the help of a trend time series chart to see revenue trends in a range of a year. Time series forecasting is only supported for planning models in SAP Analytics Cloud. So, you need planning rights and a planning license to run a predictive time-series forecast. However, you can add predictive forecast by creating a trend time series chart based on an analytical model to estimate future values. In this article, you will use a trend time series chart to view net revenue trends throughout the range of a year. A predictive time-series forecast runs an algorithm on historical data to predict future values for specific measures. For this type of chart, you can forecast a maximum of three different measures, and you have to specify the time for the prediction and the past time periods to use as historical data. Add a blank chart from the Insert toolbar. Set Data Source to the BestRun_Demo model. Select the Time Series chart from the Trend category. In the Measures section, click on the Add Measure link, and select Net Revenue. Finally, click on the Add Dimension link in the Time section, and select Date as the chart’s dimension: The output of your selections is depicted in the first view in the following screenshot. Every chart you create on your story page has its own unique elements that let you navigate and drill into details. The trend time series chart also allows you to zoom in to different time periods and scroll across the entire timeline. For example, the first figure in the following illustration provides a one-year view (A) of net revenue trends, that is from January to December 2015. Click on the six months link (B) to see the corresponding output, as illustrated in the second view. Drag the rectangle box (C) to the left or right to scroll across the entire timeline: Adding a forecast Click on the last data point representing December 2015, and select Add Forecast from the More Actions menu (D) to add a forecast: You see the Predictive Forecast panel on the right side, which displays the maximum number of forecast periods. Using the slider (E) in this section, you can reduce the number of forecast periods. By default, you see the maximum number (in the current scenario, it is seven) in the slider, which is determined by the amount of historical data you have. In the Forecast On section, you see the measure (F) you selected for the chart. If required, you can forecast a maximum of three different measures in this type of chart that you can add in the Builder panel. For the time being, click on OK to accept the default values for the forecast, as illustrated in the following screenshot: The forecast will be added to the chart. It is indicated by a highlighted area (G) and a dotted line (H). Click on the 1 year link (I) to see an output similar to the one illustrated in the following screenshot under the Modifying forecast section. As you can see, there are several data points that represent forecast. The top and bottom of the highlighted area indicate the upper and lower bounds of the prediction range, and the data points fall in the middle (on the dotted line) of the forecast range for each time period. Select a data point to see the Upper Confidence Bound (J) and Lower Confidence Bound (K) values. Modifying forecast You can modify a forecast using the link provided in the Forecast section at the bottom of the Builder panel. Select the chart, and scroll to the bottom of the Builder panel. Click on the Edit icon (L) to see the Predictive Forecast panel again. Review your settings, and make the required changes in this panel. For example, drag the slider toward the left to set the Forecast Periods value to 3 (M). Click on OK to save your settings. The chart should now display the forecast for three months--January, February, and March 2016 (N): Adding a time calculation If you want to display values such as year-over-year sales trends or year-to-date totals in your chart, then you can utilize the time calculation feature of SAP Analytics Cloud. The time calculation feature provides you with several calculation options. In order to use this feature, your chart must contain a time dimension with the appropriate level of granularity. For example, if you want to see quarter-over-quarter results, the time dimension must include quarterly or even monthly results. The space constraint prevents us from going through all these options. However, we will utilize the year-over-year option to compare yearly results in this article to get an idea about this feature. Execute the following instructions to first create a bar chart that shows the sold quantities of the four product categories. Then, add a time calculation to the chart to reveal the year-over-year changes in quantity sold for each category. As usual, add a blank chart to the page using the chart option on the Insert toolbar. Select the Best Run model as Data Source for the chart. Select the Bar/Column chart from the Comparison category. In the Measures section, click on the Add Measure link, and select Quantity Sold. Click on the Add Dimension link in the Dimensions section, and select Product as the chart’s dimension, as shown here: The chart appears on the page. At this stage, if you click on the More icon representing Quantity sold, you will see that the Add Time Calculation option (A) is grayed out. This is because time calculations require a time dimension to the chart, which we will add next. Click on the Add Dimension link in the Dimensions section, and select Date to add this time dimension to the chart. The chart transforms, as illustrated in the following screenshot: To display the results in the chart at the year level, you need to apply a filter as follows: Click on the filter icon in the Date dimension, and select Filter by Member. In the Set Members for Date dialog box, expand the all node, and select 2014, 2015, and 2016, individually. Once again, the chart changes to reflect the application of filter, as illustrated in the following screenshot: Now that a time dimension has been added to the chart, we can add a time calculation to it as follows: Click on the More icon in the Quantity sold measure. Select Add Time Calculation from the menu. Choose Year Over Year. New bars (A) and a corresponding legend (B) will be added to the chart, which help you compare yearly results, as shown in the following screenshot: To summarize, we provided hands-on exposure on predictive forecasting in SAP Analytics Cloud, where you learned about how to use a trend time series chart to view net revenue trends throughout the range of a year. If you enjoyed this excerpt, check out the book Learning SAP Analytics Cloud, to get an understanding of SAP Analytics Cloud platform and how to create better BI solutions.

0
0
17375

article-image-customizing-graphics-and-creating-bar-chart-and-scatterplot-r

Packt

28 Oct 2010

4 min read

Customizing Graphics and Creating a Bar Chart and Scatterplot in R

Packt

28 Oct 2010

4 min read

Statistical Analysis with R Take control of your data and produce superior statistical analysis with R. An easy introduction for people who are new to R, with plenty of strong examples for you to work through This book will take you on a journey to learn R as the strategist for an ancient Chinese kingdom! A step by step guide to understand R, its benefits, and how to use it to maximize the impact of your data analysis A practical guide to conduct and communicate your data analysis with R in the most effective manner Charts, graphs, and plots in R R features several options for creating charts, graphs, and plots. In this article, we will explore the generation and customization of these visuals, as well as methods for saving and exporting them for use outside of R. The following visuals will be covered in this article: Bar graphs Scatterplots Line charts Box plots Histograms Pie charts Time for action — creating a bar chart A bar chart or bar graph is a common visual that uses rectangles to depict the values of different items. Bar graphs are especially useful when comparing data over time or between diverse groups. Let us create a bar chart in R: Open R and set your working directory: > #set the R working directory > #replace the sample location with one that is relevant to you > setwd("/Users/johnmquick/rBeginnersGuide/") Use the barplot(...) function to create a bar chart: > #create a bar chart that compares the mean durations of the battle methods > #calculate the mean duration of each battle method > meanDurationFire <- mean(subsetFire$DurationInDays) > meanDurationAmbush <- mean(subsetAmbush$DurationInDays) > meanDurationHeadToHead <- mean(subsetHeadToHead$DurationInDays) > meanDurationSurround <- mean(subsetSurround$DurationInDays) > #use a vector to define the chart's bar values > barAllMethodsDurationBars <- c(meanDurationFire, meanDurationAmbush, meanDurationHeadToHead, meanDurationSurround) > #use barplot(...) to create and display the bar chart > barplot(height = barAllMethodsDurationBars) Your chart will be displayed in the graphic window, similar to the following: What just happened? You created your first graphic in R. Let us examine the barplot(...) function that we used to generate our bar chart, along with the new R components that we encountered. barplot(...) We created a bar chart that compared the mean durations of battles between the different combat methods. As it turns out, there is only one required argument in the barplot(...) function. This height argument receives a series of values that specify the length of each bar. Therefore, the barplot(...) function, at its simplest, takes on the following form: barplot(height = heightValues) Accordingly, our bar chart function reflected this same format: > barplot(height = barAllMethodsDurationBars) Vectors We stored the heights of our chart's bars in a vector variable. In R, a vector is a series of data. R's c(...) function can be used to create a vector from one or more data points. For example, the numbers 1, 2, 3, 4, and 5 can be arranged into a vector like so: > #arrange the numbers 1, 2, 3, 4, and 5 into a vector > numberVector <- c(1, 2, 3, 4, 5) Similarly, text data can also be placed into vector form, so long as the values are contained within quotation marks: > #arrange the letters a, b, c, d, and e into a vector > textVector <- c("a", "b", "c", "d", "e") Our vector defined the values for our bars: > #use a vector to define the chart's bar values > barAllMethodsDurationBars <- c(meanDurationFire, meanDurationAmbush, meanDurationHeadToHead, meanDurationSurround) Many function arguments in R require vector input. Hence, it is very common to use and encounter the c(...) function when working in R. Graphic window When you executed your barplot(...) function in the R console, the graphic window opened to display it. The graphic window will have different names across different operating systems, but its purpose and function remain the same. For example, in Mac OS X, the graphic window is named Quartz. For the remainder of this article, all R graphics will be displayed without the graphics window frame, which will allow us to focus on the visuals themselves. Pop quiz When entering text into a vector using the c(...) function, what characters must surround each text value? Quotation marks Parenthesis Asterisks Percent signs What is the purpose of the R graphic window? To debug graphics functions To execute graphics functions To edit graphics To display graphics

0
0
17320

article-image-debugging-mechanisms-oracle-sql-developer-21-using-plsql

Packt

08 Sep 2010

7 min read

Debugging mechanisms in Oracle SQL Developer 2.1 using Pl/SQL

Packt

08 Sep 2010

7 min read

Once your PL/SQL code has successfully compiled, it is important to review it to make sure it does what is required and that it performs well. You can consider a number of approaches when tuning and testing code. These approaches include: Debugging—run the code and add break points to stop and inspect areas of concern. SQL performance—use Explain Plan results to review the performance. PL/SQL performance—use the PL/SQL Hierarchical Profiler to identify bottlenecks. Unit testing—review edge cases and general function testing. Does the code do what you intended it to do? In this article by Sue Harper, author of Oracle SQL Developer 2.1, we'll review the debugger. We will see how to debug PL/SQL packages, procedures, and functions. Debugging PL/SQL code SQL and PL/SQL code may execute cleanly, and even produce an output. PL/SQL code may compile and produce results, but this is part of the task. Does it do what you are expecting it to do? Are the results accurate? Does it behave as expected for high and low values, odd dates or names? Does it behave the same way when it's called from within a program as it does when tested in isolation? Does it perform as well for massive sets of data as it does for a small test case? All of these are aspects to consider when testing code, and many can been tracked by debugging the code. Using the debugging mechanism in SQL Developer You will need a piece of compiled, working code. For this exercise, we will use the following piece of code: PROCEDURE EMP_DEPTS(P_MAXROWS VARCHAR2)ASCURSOR EMPDEPT_CURSOR ISSELECT D.DEPARTMENT_NAME, E.LAST_NAME, J.JOB_TITLEFROM DEPARTMENTS D, EMPLOYEES E, JOBS JWHERE D.DEPARTMENT_ID = E.DEPARTMENT_IDAND E.JOB_ID = J.JOB_ID;EMP_RECORD EMPDEPT_CURSOR % ROWTYPE;TYPE EMP_TAB_TYPE IS TABLE OF EMPDEPT_CURSOR % ROWTYPE INDEX BYBINARY_INTEGER;EMP_TAB EMP_TAB_TYPE;I NUMBER := 1;BEGINOPEN EMPDEPT_CURSOR;FETCH EMPDEPT_CURSORINTO EMP_RECORD;EMP_TAB(I) := EMP_RECORD;WHILE((EMPDEPT_CURSOR % FOUND) AND(I <= P_MAXROWS))LOOP I := I + 1;FETCH EMPDEPT_CURSORINTO EMP_RECORD;EMP_TAB(I) := EMP_RECORD;END LOOP;CLOSE EMPDEPT_CURSOR; FOR J IN REVERSE 1 .. ILOOP DBMS_OUTPUT.PUT_LINE('THE EMPLOYEE '|| EMP_TAB(J).LAST_NAME ||' WORKS IN DEPARTMENT '|| EMP_TAB(J).DEPARTMENT_NAME);END LOOP;END; Before you can debug code, you need to have the following privileges: EXECUTE and DEBUG—you need to be able to execute the required procedure DEBUG CONNECT SESSION—to be able to debug procedures you execute in the session Note, when granting the system privilege DEBUG ANY PROCEDURE, you are granting access to debug any procedure that you have execute privilege for and has been compiled for debug. Using the Oracle debugging packages Oracle provides two packages for debugging PL/SQL code. The first, DBMS_DEBUG, was introduced in Oracle 8i and is not used by newer IDEs. The second, DBMS_DEBUG_JWP, was introduced in Oracle 9i Release 2, and is used in SQL Developer when debugging sub-programs. Debugging When preparing to debug any code, you need to set at least one break point, and then you should select Compile for Debug. In the following screenshot, the breakpoint is set at the opening of the cursor, and the Compile for Debug option is shown in the drop-down list: Instead of using the drop-down list to select the Compile or Compile for Debug options, just click the Compile button. This compiles the PL/SQL code using the optimization level set in the Preferences. Select Database PL/SQL Compiler|. By setting the Optimization Level preference to 0 or 1 the PL/SQL is compiled with debugging information. Any PL/SQL code that has been compiled for debugging will show the little green bug overlaying the regular icon in the Connections navigator. The next screenshot shows the EMP_DEPTS procedure and the GET_SALARY function have both been compiled for debug: Compile for debug Once you have completed a debugging session, be sure to compile again afterwards to remove any debug compiler directives. While negligible, omitting this step can have a performance impact on the PL/SQL program. You are now ready to debug. To debug, click the Debug button in the toolbar. SQL Developer then sets the sessions to a debug session and issues the command DBMS_DEBUG_JDWP.CONNECT_TCP (hostname, port); and sets up the debug windows as shown in the following screenshot: This connects you to a debugger session in the database. In some instances, the port selected is not open, due to firewall or other restrictions. In this case, you can have SQL Developer prompt you for the port. To set this option, open the Preferences dialog, and select the Debugger node. You can also specify the port range available for SQL Developer to use. These options mean that you can have more control over the ports used. Navigating through the code The PL/SQL debugger provides a selection of buttons (or menu items) to step through individual lines of code, or to step over blocks of code. You can step through or over procedures, navigating to the point of contention or the area you wish to inspect. Once you start stepping into the code, you can track the data as it changes. The data is displayed in a second set of tabbed dialogs. In this example, we are looping through a set of records in order for you to see how each of the windows behaves. As you start stepping into the code, the Data tab starts to display the values: This Data tab continues to collect all of the variables as you continue to step through the code. Even if you step over and skip blocks of code, all of the code is executed and the results are gathered here. The Smart Data tab keeps track of the same detail, but only the values immediately related to the area you are working in. This is more useful in a large procedure than in a small one like the example shown. The context menu provides you with a set of options while debugging. These include: Run to Cursor—allows you to start debugging and then to quickly move to another part of the code. The code in-between is quickly executed and you can continue debugging. Watch—allows you to watch an expression or code while you are debugging. Inspect—allows you to watch values as you debug. In the following screenshot, the current execution point is at the start of the WHILE loop. If the loop is required to loop multiple times, you can skip that and have the code execute to a point further down in the code, in this case after the cursor has been completed and closed: The Watch and Inspect options remain set up if you stop and restart the debug session. This allows you to stop, change the input values, and start debugging and these will change according to the new parameters. You do not need to set up watch or inspector values each time you debug the procedure. The values appear in dockable windows, so you can dock or float them near the code as required: You can modify values that you are watching. In the following example, 'i' is the counter that we're using in the loop. You can modify this value to skip over chunks of the loop, and then continue from a particular point. Modifying values in the middle of the code can be useful, as you might want to test how the program reacts in certain circumstances. For example, before the millennium, testers may have wanted to see how code behaved, or output changed once the date switched over to the year 2000.

0
1
17274

article-image-language-modeling-with-deep-learning

Mohammad Pezeshki

23 Sep 2016

5 min read

Language Modeling with Deep Learning

Mohammad Pezeshki

23 Sep 2016

5 min read

Language modeling is defining a joint probability distribution over a sequence of tokens (words or characters). Considering a sequence of tokens fx1; :::; xT g. A language model defines P (x1; : : : ; xT ), which can be used in many areas of natural language processing. Language modelings define a joint probability distribution over a sequence of tokens (words or characters). Consider a sequence of tokens x1; : : : ; xT. For example, a language model can significantly improve the accuracy of a speech recognition system. As an example, in the case of two words that have the same sound but different meanings, a language model can fix the problem of recognizing the right word. In Figure 1, the speech recognizer (aka acoustic model) has assigned the same high probabilities to the words meet" and meat". It is even possible that the speech recognizer assigns a higher probability to meet" rather than meat". However, by conditioning the language model on the three rst tokens (I-cooked-some"), the next word could be sh", pasta", or meat" with a reasonable probability higher than the probability of meet". To get the final answer, we can simply multiply two tables of probabilities and normalize them. Now the word meat" has a very high relative probability! One family of deep learning models that are capable of modeling sequential data (such as language) is Recurrent Neural Networks (RNNs). RNNs have recently achieved impressive results on different problems such as the language modeling. In this article, we briefly describe RNNs and demonstrate how to code them using the Blocks library on top of Theano. Consider a sequence of T input elements x1; : : : ; xT . RNN models the sequence by applying the same operation in a recursive way. Formally, ht = f(ht 1; xt); (1) yt = g(h t); (2) Where ht is the internal hidden representation of the RNN and yt is the output at tth time-step. For the very first time-step, we also have an initial state h0. f and g are two functions, which are shared across the time axis. In the simplest case, f and g can be a linear transformation followed by a non-linearity. There are more complicated forms of f and g such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU). Here we skip the exact formulations of f and g to use LSTM as a black box. Consequently, suppose we have B sequences, each with a length of T, such that each time-step is presented in a vector of size F . So the input can be seen as a 3D tensor with size T xBxF, the hidden representation with size T xBxF 0, and the output with size T xBxF 00. Let's build a character-level language model that can model the joint probability P (x1; : : : ; xT ) using the chain rule: P (x1; : : : ; xT ) = P (x1)P (x2jx1)P (x3jx1; x2):::P (xT jx1::T1) (3) We can model P (xtjx1::t1) using an RNN by predicting xt given xt1::1. In other words, given a sequence fx1; : : : ; xT g, the input sequence is fx1; : : : ; xT1g and the target sequence is fx2; : : : ; xT g. To define input and target, we can write: Now to define the model, we need a linear transformation from the input to the LSTM, and from the LSTM to the output. To train the model, we use the cross entropy between the model output and the true target: Now assuming that data is provided to us, using data stream, we can start training by initializing the model, and tuning parameters: After the model is trained, we can condition the model on an initial sequence and start generating the next token. We can repeatedly feed the predicted token into the model and get the next token. We can even just start from the initial state and ask the model to hallucinate! Here is a sample generated text from a model trained on a 96 MB text data of wikipedia (figure adapted from here): Here is a visualization of the model's output. The first line is the real data and the next six lines are the candidate with the highest output probability of for each character. The more red a cell is, the higher probability the model assigns to that character. For example, as soon as the model sees ttp://ww, it is confident that the next character is also a w" and the next one is a .". Butat this point, there is no more clue about the next character. So the model assigns almost the same probability to all the characters (figure adapted from here): In this post we learned about language modeling and one of its applications in speech recognition. We also learned how to code a recurrent neural network in order to train such a model. You can find the complete code and experiment on a bunch of datasets such as wikipedia at Github. The code is written by my close friend Eloi Zablocki and me. About the author Mohammad Pezeshk is a master’s student in the LISA lab of Universite de Montreal working under the supervision of Yoshua Bengio and Aaron Courville. He obtained his bachelor's in computer engineering from Amirkabir University of Technology (Tehran Polytechnic) in July 2014 and then started his master’s in September 2014. His research interests lie in the fields of artifitial intelligence, machine learning, probabilistic models and specifically deep learning.

0
0
17231

article-image-implementing-fault-tolerance-in-spark-streaming-data-processing-applications-with-apache-kafka

Pravin Dhandre

01 Feb 2018

16 min read

Implementing fault-tolerance in Spark Streaming data processing applications with Apache Kafka

Pravin Dhandre

01 Feb 2018

16 min read

0
0
17218

article-image-introduction-titanic-datasets

Packt

09 May 2017

11 min read

Introduction to Titanic Datasets

Packt

09 May 2017

11 min read

In this article by Alexis Perrier, author of the book Effective Amazon Machine Learning says artificial intelligence and big data have become a ubiquitous part of our everyday lives; cloud-based machine learning services are part of a rising billion-dollar industry. Among the several such services currently available on the market, Amazon Machine Learning stands out for its simplicity. Amazon Machine Learning was launched in April 2015 with a clear goal of lowering the barrier to predictive analytics by offering a service accessible to companies without the need for highly skilled technical resources. (For more resources related to this topic, see here.) Working with datasets You cannot do predictive analytics without a dataset. Although we are surrounded by data, finding datasets that are adapted to predictive analytics is not always straightforward. In this section, we present some resources that are freely available. The Titanic datasetis a classic introductory datasets for predictive analytics. Finding open datasets There is a multitude of dataset repositories available online, from local to global public institutions to non-profit and data-focused start-ups. Here’s a small list of open dataset resources that are well suited forpredictive analytics. This, by far, is not an exhaustive list. This thread on Quora points to many other interesting data sources:https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public.You can also ask for specific datasets on Reddit at https://www.reddit.com/r/datasets/. The UCI Machine Learning Repository is a collection of datasets maintained by UC Irvine since 1987, hosting over 300 datasets related to classification, clustering, regression, and other ML tasks Mldata.org from the University of Berlinor the Stanford Large Network Dataset Collection and other major universities alsooffer great collections of open datasets Kdnuggets.com has an extensive list of open datasets at http://www.kdnuggets.com/datasets Data.gov and other US government agencies;data.UN.org and other UN agencies AWS offers open datasets via partners at https://aws.amazon.com/government-education/open-data/. The following startups are data centered and give open access to rich data repositories: Quandl and quantopian for financial datasets Datahub.io, Enigma.com, and Data.world are dataset-sharing sites Datamarket.com is great for time series datasets Kaggle.com, the data science competition website, hosts over 100 very interesting datasets AWS public datasets:AWS hosts a variety of public datasets,such as the Million Song Dataset, the mapping of the Human Genome, the US Census data as well as many others in Astrology, Biology, Math, Economics, and so on. These datasets are mostly available via EBS snapshots although some are directly accessible on S3. The datasets are large, from a few gigabytes to several terabytes, and are not meant to be downloaded on your local machine; they are only to be accessible via an EC2 instance (take a look at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-public-data-sets.htmlfor further details).AWS public datasets are accessible at https://aws.amazon.com/public-datasets/. Introducing the Titanic dataset We will use the classic Titanic dataset. The dataconsists of demographic and traveling information for1,309 of the Titanic passengers, and the goal isto predict the survival of these passengers. The full Titanic dataset is available from the Department of Biostatistics at the Vanderbilt University School of Medicine (http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv)in several formats. The Encyclopedia Titanica website (https://www.encyclopedia-titanica.org/) is the website of reference regarding the Titanic. It contains all the facts, history, and data surrounding the Titanic, including a full list of passengers and crew members. The Titanic datasetis also the subject of the introductory competition on Kaggle.com (https://www.kaggle.com/c/titanic, requires opening an account with Kaggle). You can also find a csv version in GitHub repository at https://github.com/alexperrier/packt-aml/blob/master/ch4. The Titanic data containsa mix of textual, Boolean, continuous, and categorical variables. It exhibits interesting characteristics such as missing values, outliers, and text variables ripe for text mining--a rich database that will allow us to demonstrate data transformations. Here’s a brief summary of the 14attributes: pclass: Passenger class (1 = 1st; 2 = 2nd; 3 = 3rd) survival: A Boolean indicating whether the passenger survived or not (0 = No; 1 = Yes); this is our target name: A field rich in information as it contains title and family names sex: male/female age: Age, asignificant portion of values aremissing sibsp: Number of siblings/spouses aboard parch: Number of parents/children aboard ticket: Ticket number. fare: Passenger fare (British Pound). cabin: Doesthe location of the cabin influence chances of survival? embarked: Port of embarkation (C = Cherbourg; Q = Queenstown; S = Southampton) boat: Lifeboat, many missing values body: Body Identification Number home.dest: Home/destination Take a look at http://campus.lakeforest.edu/frank/FILES/MLFfiles/Bio150/Titanic/TitanicMETA.pdf for more details on these variables. We have 1,309 records and 14 attributes, three of which we will discard. The home.dest attribute hastoo few existing values, the boat attribute is only present for passengers who have survived, and thebody attributeis only for passengers who have not survived. We will discard these three columnslater on while using the data schema. Preparing the data Now that we have the initial raw dataset, we are going to shuffle it, split it into a training and a held-out subset, and load it to an S3 bucket. Splitting the data In order to build and select the best model, we need to split the dataset into three parts: training, validation, and test, with the usual ratios being 60%, 20%, and 20%. The training and validation sets are used to build several models and select the best one while the test or held-out set, is used for the final performance evaluation on previously unseen data. Since Amazon ML does the job of splitting the dataset used for model training and model evaluation into a training and a validation subsets, we only need to split our initial dataset into two parts: the global training/evaluation subset (80%) for model building and selection, and the held-out subset (20%) for predictions and final model performance evaluation. Shuffle before you split:If you download the original data from the Vanderbilt University website,you will notice that it is ordered by pclass, the class of the passenger and by alphabetical order of the name column. The first 323 rows correspond to the 1st class followed by 2nd (277) and 3rd (709) class passengers. It is important to shuffle the data before you split it so that all the different variables have have similar distributions in each training and held-out subsets. You can shuffle the data directly in the spreadsheet by creating a new column, generating a random number for each row and then ordering by that column. On GitHub: You will find an already shuffledtitanic.csv file at https://github.com/alexperrier/packt-aml/blob/master/ch4/titanic.csv. In addition to shuffling the data, we have removed punctuation in the name column: commas, quotes, and parenthesis, which can add confusion when parsing a csv file. We end up with two files:titanic_train.csv with 1047 rows and titanic_heldout.csv with 263rows. These files are also available in the GitHub repo (https://github.com/alexperrier/packt-aml/blob/master/ch4). The next step is to upload these files on S3 so that Amazon ML can access them. Loading data on S3 AWS S3 is one of the main AWS services dedicated to hosting files and managing their access. Files in S3 can be public and open to the internet or have access restricted to specific users, roles, or services.S3 is also used extensively by AWS for operations such as storing log files or results (predictions, scripts, queries, and so on). Files in S3 are organized around the notion of buckets. Buckets are placeholders with unique names similar to domain names for websites. A file in S3 will have a unique locator URI: s3://bucket_name/{path_of_folders}/filename. The bucket name is unique across S3. In this section, we will create a bucket for our data, upload the titanic training file, and open its access to Amazon ML. Go to https://console.aws.amazon.com/s3/home, and open an S3 account if you don’t have one yet. S3 pricing:S3 charges for the total volume of files you host and the volume of file transfers depends on the region where the files are hosted. At time of writing, for less than 1TB, AWS S3 charges $0.03/GB per month in the US east region. All S3 prices are available at https://aws.amazon.com/s3/pricing/. See also http://calculator.s3.amazonaws.com/index.htmlfor the AWS cost calculator. Creating a bucket Once you have created your S3 account, the next step is to create a bucket for your files.Click on the Create bucket button: Choose a name and a region, since bucket names are unique across S3, you must choose a name for your bucket that has not been already taken. We chose the name aml.packt for our bucket, and we will use this bucket throughout. Regarding the region, you should always select a region that is the closest to the person or application accessing the files in order to reduce latency and prices. Set Versioning, Logging, and Tags, versioning will keep a copy of every version of your files, which prevents from accidental deletions. Since versioning and logging induce extra costs, we chose to disable them. Set permissions. Review and save. Loading the data To upload the data, simply click on the upload button and select the titanic_train.csv file we created earlier on. You should, at this point, have the training dataset uploaded to your AWS S3 bucket. We added a/data folder in our aml.packt bucket to compartmentalize our objects. It will be useful later on when the bucket will also contain folders created by S3. At this point, only the owner of the bucket (you) is able to access and modify its contents. We need to grant the Amazon ML service permissions to read the data and add other files to the bucket. When creating the Amazon ML datasource, we will be prompted to grant these permissions inthe Amazon ML console. We can also modify the bucket’s policy upfront. Granting permissions We need to edit the policy of the aml.packt bucket. To do so, we have to perform the following steps: Click into your bucket. Select the Permissions tab. In the drop down, select Bucket Policy as shown in the following screenshot. This will open an editor: Paste in the following JSON. Make sure to replace {YOUR_BUCKET_NAME} with the name of your bucket and save: { “Version”: “2012-10-17”, “Statement”: [ { “Sid”: “AmazonML_s3:ListBucket”, “Effect”: “Allow”, “Principal”: { “Service”: “machinelearning.amazonaws.com” }, “Action”: “s3:ListBucket”, “Resource”: “arn:aws:s3:::{YOUR_BUCKET_NAME}”, “Condition”: { “StringLike”: { “s3:prefix”: “*” } } }, { “Sid”: “AmazonML_s3:GetObject”, “Effect”: “Allow”, “Principal”: { “Service”: “machinelearning.amazonaws.com” }, “Action”: “s3:GetObject”, “Resource”: “arn:aws:s3:::{YOUR_BUCKET_NAME}/*” }, { “Sid”: “AmazonML_s3:PutObject”, “Effect”: “Allow”, “Principal”: { “Service”: “machinelearning.amazonaws.com” }, “Action”: “s3:PutObject”, “Resource”: “arn:aws:s3:::{YOUR_BUCKET_NAME}/*” } ] } Further details on this policy are available at http://docs.aws.amazon.com/machine-learning/latest/dg/granting-amazon-ml-permissions-to-read-your-data-from-amazon-s3.html. Once again, this step is optional since Amazon ML will prompt you for access to the bucket when you create the datasource. Formatting the data Amazon ML works on comma separated values files (.csv)--a very simple format where each rowis an observation and each column is a variable or attribute. There are, however, a few conditionsthat shouldbe met: The data must be encoded in plain text using a character set, such asASCII, Unicode, or EBCDIC All values must be separated by commas; if a value contains a comma, it should be enclosed by double quotes Each observation (row) must be smaller than 100k There are also conditions regarding end of line characters that separate rows. Special care must be taken when using Excel on OS X (Mac) as explained on this page: http://docs.aws.amazon.com/machine-learning/latest/dg/understanding-the-data-format-for-amazon-ml.html What about other data file formats? Unfortunately, Amazon ML datasource are only compatible with csv files and Redshift databases and does not accept formats such as JSON, TSV, or XML. However, other services such as Athena, a serverless database service, do accept a wider range of formats. Summary In this article we learnt about how to use and work around with datasets using Amazon web services and Titanic datasets. We also learnt how prepare data and Amazon S3 services. Resources for Article: Further resources on this subject: Processing Massive Datasets with Parallel Streams – the MapReduce Model [article] Processing Next-generation Sequencing Datasets Using Python [article] Combining Vector and Raster Datasets [article]

0
1
17111

Packt

12 Apr 2016

17 min read

Market Basket Analysis

Packt

12 Apr 2016

17 min read

In this article by Boštjan Kaluža, author of the book Machine Learning in Java, we will discuss affinity analysis which is the heart of Market Basket Analysis (MBA). It can discover co-occurrence relationships among activities performed by specific users or groups. In retail, affinity analysis can help you understand the purchasing behavior of customers. These insights can drive revenue through smart cross-selling and upselling strategies and can assist you in developing loyalty programs, sales promotions, and discount plans. In this article, we will look into the following topics: Market basket analysis Association rule learning Other applications in various domains First, we will revise the core association rule learning concepts and algorithms, such as support, lift, Apriori algorithm, and FP-growth algorithm. Next, we will use Weka to perform our first affinity analysis on supermarket dataset and study how to interpret the resulting rules. We will conclude the article by analyzing how association rule learning can be applied in other domains, such as IT Operations Analytics, medicine, and others. (For more resources related to this topic, see here.) Market basket analysis Since the introduction of electronic point of sale, retailers have been collecting an incredible amount of data. To leverage this data in order to produce business value, they first developed a way to consolidate and aggregate the data to understand the basics of the business. What are they selling? How many units are moving? What is the sales amount? Recently, the focus shifted to the lowest level of granularity—the market basket transaction. At this level of detail, the retailers have direct visibility into the market basket of each customer who shopped at their store, understanding not only the quantity of the purchased items in that particular basket, but also how these items were bought in conjunction with each other. This can be used to drive decisions about how to differentiate store assortment and merchandise, as well as effectively combine offers of multiple products, within and across categories, to drive higher sales and profits. These decisions can be implemented across an entire retail chain, by channel, at the local store level, and even for the specific customer with the so-called personalized marketing, where a unique product offering is made for each customer. MBA covers a wide variety of analysis: Item affinity: This defines the likelihood of two (or more) items being purchased together Identification of driver items: This enables the identification of the items that drive people to the store and always need to be in stock Trip classification: This analyzes the content of the basket and classifies the shopping trip into a category: weekly grocery trip, special occasion, and so on Store-to-store comparison: Understanding the number of baskets allows any metric to be divided by the total number of baskets, effectively creating a convenient and easy way to compare the stores with different characteristics (units sold per customer, revenue per transaction, number of items per basket, and so on) Revenue optimization: This helps in determining the magic price points for this store, increasing the size and value of the market basket Marketing: This helps in identifying more profitable advertising and promotions, targeting offers more precisely in order to improve ROI, generating better loyalty card promotions with longitudinal analysis, and attracting more traffic to the store Operations optimization: This helps in matching the inventory to the requirement by customizing the store and assortment to trade area demographics, optimizing store layout Predictive models help retailers to direct the right offer to the right customer segments/profiles, as well as gain understanding of what is valid for which customer, predict the probability score of customers responding to this offer, and understand the customer value gain from the offer acceptance. Affinity analysis Affinity analysis is used to determine the likelihood that a set of items will be bought together. In retail, there are natural product affinities, for example, it is very typical for people who buy hamburger patties to buy hamburger rolls, along with ketchup, mustard, tomatoes, and other items that make up the burger experience. While there are some product affinities that might seem trivial, there are some affinities that are not very obvious. A classic example is toothpaste and tuna. It seems that people who eat tuna are more prone to brush their teeth right after finishing their meal. So, why it is important for retailers to get a good grasp of the product affinities? This information is critical to appropriately plan promotions as reducing the price for some items may cause a spike on related high-affinity items without the need to further promote these related items. In the following section, we'll look into the algorithms for association rule learning: Apriori and FP-growth. Association rule learning Association rule learning has been a popular approach for discovering interesting relations between items in large databases. It is most commonly applied in retail for discovering regularities between products. Association rule learning approaches find patterns as interesting strong rules in the database using different measures of interestingness. For example, the following rule would indicate that if a customer buys onions and potatoes together, they are likely to also buy hamburger meat:{onions, potatoes} à {burger} Another classic story probably told in every machine learning class is the beer and diaper story. An analysis of supermarket shoppers' behavior showed that customers, presumably young men, who buy diapers tend also to buy beer. It immediately became a popular example of how an unexpected association rule might be found from everyday data; however, there are varying opinions as to how much of the story is true. Daniel Powers says (DSS News, 2002): In 1992, Thomas Blischok, manager of a retail consulting group at Teradata, and his staff prepared an analysis of 1.2 million market baskets from about 25 Osco Drug stores. Database queries were developed to identify affinities. The analysis "did discover that between 5:00 and 7:00 p.m. that consumers bought beer and diapers". Osco managers did NOT exploit the beer and diapers relationship by moving the products closer together on the shelves. In addition to the preceding example from MBA, association rules are today employed in many application areas, including web usage mining, intrusion detection, continuous production, and bioinformatics. We'll take a closer look these areas later in this article. Basic concepts Before we dive into algorithms, let's first review the basic concepts. Database of transactions First, there is no class value, as this is not required for learning association rules. Next, the dataset is presented as a transactional table, where each supermarket item corresponds to a binary attribute. Hence, the feature vector could be extremely large. Consider the following example. Suppose we have five receipts as shown in the following image. Each receipt corresponds a purchasing transaction: To write these receipts in the form of transactional database, we first identify all the possible items that appear in the receipts. These items are onions, potatoes, burger, beer, and dippers. Each purchase, that is, transaction, is presented in a row, and there is 1 if an item was purchased within the transaction and 0 otherwise, as shown in the following table: Transaction ID Onions Potatoes Burger Beer Dippers 1 0 1 1 0 0 2 1 1 1 1 0 3 0 0 0 1 1 4 1 0 1 1 0 This example is really small. In practical applications, the dataset often contains thousands or millions of transactions, which allow learning algorithm discovery of statistically significant patterns. Itemset and rule Itemset is simply a set of items, for example, {onions, potatoes, burger}. A rule consists of two itemsets, X and Y, in the following format X -> Y. This indicates a pattern that when the X itemset is observed, Y is also observed. To select interesting rules, various measures of significance can be used. Support Support, for an itemset, is defined as the proportion of transactions that contain the itemset. The {potatoes, burger} itemset in the previous table has the following support as it occurs in 50% of transactions (2 out of 4 transactions) supp({potatoes, burger }) = 2/4 = 0.5. Intuitively, it indicates the share of transactions that support the pattern. Confidence Confidence of a rule indicates its accuracy. It is defined as Conf(X -> Y) = supp(X U Y) / supp(X). For example, the {onions, burger} -> {beer} rule has the confidence 0.5/0.5 = 1.0 in the previous table, which means that 100% of the times when onions and burger are bought together, beer is bought as well. Apriori algorithm Apriori algorithm is a classic algorithm used for frequent pattern mining and association rule learning over transactional. By identifying the frequent individual items in a database and extending them to larger itemsets, Apriori can determine the association rules, which highlight general trends about a database. Apriori algorithm constructs a set of itemsets, for example, itemset1= {Item A, Item B}, and calculates support, which counts the number of occurrences in the database. Apriori then uses a bottom up approach, where frequent itemsets are extended, one item at a time, and it works by eliminating the largest sets as candidates by first looking at the smaller sets and recognizing that a large set cannot be frequent unless all its subsets are. The algorithm terminates when no further successful extensions are found. Although, Apriori algorithm is an important milestone in machine learning, it suffers from a number of inefficiencies and tradeoffs. In the following section, we'll look into a more recent FP-growth technique. FP-growth algorithm FP-growth, where frequent pattern (FP), represents the transaction database as a prefix tree. First, the algorithm counts the occurrence of items in the dataset. In the second pass, it builds a prefix tree, an ordered tree data structure commonly used to store a string. An example of prefix tree based on the previous example is shown in the following diagram: If many transactions share most frequent items, prefix tree provides high compression close to the tree root. Large itemsets are grown directly, instead of generating candidate items and testing them against the entire database. Growth starts at the bottom of the tree, by finding all the itemsets matching minimal support and confidence. Once the recursive process has completed, all large itemsets with minimum coverage have been found and association rule creation begins. FP-growth algorithms have several advantages. First, it constructs an FP-tree, which encodes the original dataset in a substantially compact presentation. Second, it efficiently builds frequent itemsets, leveraging the FP-tree structure and divide-and-conquer strategy. The supermarket dataset The supermarket dataset, located in datasets/chap5/supermarket.arff, describes the shopping habits of supermarket customers. Most of the attributes stand for a particular item group, for example, diary foods, beef, potatoes; or department, for example, department 79, department 81, and so on. The value is t if the customer had bought an item and missing otherwise. There is one instance per customer. The dataset contains no class attribute, as this is not required to learn association rules. A sample of data is shown in the following table: Discover patterns To discover shopping patterns, we will use the two algorithms that we have looked into before, Apriori and FP-growth. Apriori We will use the Apriori algorithm as implemented in Weka. It iteratively reduces the minimum support until it finds the required number of rules with the given minimum confidence: import java.io.BufferedReader; import java.io.FileReader; import weka.core.Instances; import weka.associations.Apriori; First, we will load the supermarket dataset: Instances data = new Instances( new BufferedReader( new FileReader("datasets/chap5/supermarket.arff"))); Next, we will initialize an Apriori instance and call the buildAssociations(Instances) function to start frequent pattern mining, as follows: Apriori model = new Apriori(); model.buildAssociations(data); Finally, we can output the discovered itemsets and rules, as shown in the following code: System.out.println(model); The output is as follows: Apriori ======= Minimum support: 0.15 (694 instances) Minimum metric <confidence>: 0.9 Number of cycles performed: 17 Generated sets of large itemsets: Size of set of large itemsets L(1): 44 Size of set of large itemsets L(2): 380 Size of set of large itemsets L(3): 910 Size of set of large itemsets L(4): 633 Size of set of large itemsets L(5): 105 Size of set of large itemsets L(6): 1 Best rules found: 1. biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35) 2. baking needs=t biscuits=t fruit=t total=high 760 ==> bread and cake=t 696 <conf:(0.92)> lift:(1.27) lev:(0.03) [149] conv:(3.28) 3. baking needs=t frozen foods=t fruit=t total=high 770 ==> bread and cake=t 705 <conf:(0.92)> lift:(1.27) lev:(0.03) [150] conv:(3.27) ... The algorithm outputs ten best rules according to confidence. Let's look the first rule and interpret the output, as follows: biscuits=t frozen foods=t fruit=t total=high 788 ==> bread and cake=t 723 <conf:(0.92)> lift:(1.27) lev:(0.03) [155] conv:(3.35) It says that when biscuits, frozen foods, and fruits are bought together and the total purchase price is high, it is also very likely that bread and cake are purchased as well. The {biscuits, frozen foods, fruit, total high} itemset appears in 778 transactions, while the {bread, cake} itemset appears in 723 transactions. The confidence of this rule is 0.92, meaning that the rule holds true in 92% of transactions where the {biscuits, frozen foods, fruit, total high} itemset is present. The output also reports additional measures such as lift, leverage, and conviction, which estimate the accuracy against our initial assumptions, for example, the 3.35 conviction value indicates that the rule would be incorrect 3.35 times as often if the association was purely a random chance. Lift measures the number of times X and Y occur together than expected if they where statistically independent (lift=1). The 2.16 lift in the X -> Y rule means that the probability of X is 2.16 times greater than the probability of Y. FP-growth Now, let's try to get the same results with more efficient FP-growth algorithm. FP-growth is also implemented in the weka.associations package: import weka.associations.FPGrowth; The FP-growth is initialized similarly as we did earlier: FPGrowth fpgModel = new FPGrowth(); fpgModel.buildAssociations(data); System.out.println(fpgModel); The output reveals that FP-growth discovered 16 rules: FPGrowth found 16 rules (displaying top 10) 1. [fruit=t, frozen foods=t, biscuits=t, total=high]: 788 ==> [bread and cake=t]: 723 <conf:(0.92)> lift:(1.27) lev:(0.03) conv:(3.35) 2. [fruit=t, baking needs=t, biscuits=t, total=high]: 760 ==> [bread and cake=t]: 696 <conf:(0.92)> lift:(1.27) lev:(0.03) conv:(3.28) ... We can observe that FP-growth found the same set of rules as Apriori; however, the time required to process larger datasets can be significantly shorter. Other applications in various areas We looked into affinity analysis to demystify shopping behavior patterns in supermarkets. Although, the roots of association rule learning are in analyzing point-of-sale transactions, they can be applied outside the retail industry to find relationships among other types of baskets. The notion of a basket can easily be extended to services and products, for example, to analyze items purchased using a credit card, such as rental cars and hotel rooms, and to analyze information on value-added services purchased by telecom customers (call waiting, call forwarding, DSL, speed call, and so on), which can help the operators determine the ways to improve their bundling of service packages. Additionally, we will look into the following examples of potential cross-industry applications: Medical diagnosis Protein sequences Census data Customer relationship management IT Operations Analytics Medical diagnosis Applying association rules in medical diagnosis can be used to assist physicians while curing patients. The general problem of the induction of reliable diagnostic rules is hard as, theoretically, no induction process can guarantee the correctness of induced hypotheses by itself. Practically, diagnosis is not an easy process as it involves unreliable diagnosis tests and the presence of noise in training examples. Nevertheless, association rules can be used to identify likely symptoms appearing together. A transaction, in this case, corresponds to a medical case, while symptoms correspond to items. When a patient is treated, a list of symptoms is recorded as one transaction. Protein sequences A lot of research has gone into understanding the composition and nature of proteins; yet many things remain to be understood satisfactorily. It is now generally believed that amino-acid sequences of proteins are not random. With association rules, it is possible to identify associations between different amino acids that are present in a protein. A protein is a sequences made up of 20 types of amino acids. Each protein has a unique three-dimensional structure, which depends on amino-acid sequence; slight change in the sequence may change the functioning of protein. To apply association rules, a protein corresponds to a transaction, while amino acids, their two grams and structure correspond to the items. Such association rules are desirable for enhancing our understanding of protein composition and hold the potential to give clues regarding the global interactions amongst some particular sets of amino acids occurring in the proteins. Knowledge of these association rules or constraints is highly desirable for synthesis of artificial proteins. Census data Censuses make a huge variety of general statistical information about the society available to both researchers and general public. The information related to population and economic census can be forecasted in planning public services (education, health, transport, and funds) as well as in public business(for setting up new factories, shopping malls, or banks and even marketing particular products). To discover frequent patterns, each statistical area (for example, municipality, city, and neighborhood) corresponds to a transaction, and the collected indicators correspond to the items. Customer relationship management Association rules can reinforce the knowledge management process and allow the marketing personnel to know their customers well in order to provide better quality services. For example, association rules can be applied to detect a change of customer behavior at different time snapshots from customer profiles and sales data. The basic idea is to discover changes from two datasets and generate rules from each dataset to carry out rule matching. IT Operations Analytics Based on records of a large number of transactions, association rule learning is well-suited to be applied to the data that is routinely collected in day-to-day IT operations, enabling IT Operations Analytics tools to detect frequent patterns and identify critical changes. IT specialists need to see the big picture and understand, for example, how a problem on a database could impact an application server. For a specific day, IT operations may take in a variety of alerts, presenting them in a transactional database. Using an association rule learning algorithm, IT Operations Analytics tools can correlate and detect the frequent patterns of alerts appearing together. This can lead to a better understanding about how a component impacts another. With identified alert patterns, it is possible to apply predictive analytics. For example, a particular database server hosts a web application and suddenly an alert about a database is triggered. By looking into frequent patterns identified by an association rule learning algorithm, this means that the IT staff needs to take action before the web application is impacted. Association rule learning can also discover alert events originating from the same IT event. For example, every time a new user is added, six changes in the Windows operating systems are detected. Next, in the Application Portfolio Management (APM), IT may face multiple alerts, showing that the transactional time in a database as high. If all these issues originate from the same source (such as getting hundreds of alerts about changes that are all due to a Windows update), this frequent pattern mining can help to quickly cut through a number of alerts, allowing the IT operators to focus on truly critical changes. Summary In this article, you learned how to leverage association rules learning on transactional datasets to gain insight about frequent patterns We performed an affinity analysis in Weka and learned that the hard work lies in the analysis of results—careful attention is required when interpreting rules, as association (that is, correlation) is not the same as causation. Resources for Article: Further resources on this subject: Debugging Java Programs using JDB [article] Functional Testing with JMeter [article] Implementing AJAX Grid using jQuery data grid plugin jqGrid [article]

0
0
17050

article-image-getting-started-with-python-web-scraping

Amarabha Banerjee

20 Mar 2018

13 min read

Getting started with Python Web Scraping

Amarabha Banerjee

20 Mar 2018

13 min read

0
0
17002

article-image-working-with-pandas-dataframes

Sugandha Lahoti

23 Feb 2018

15 min read

Working with pandas DataFrames

Sugandha Lahoti

23 Feb 2018

15 min read

[box type="note" align="" class="" width=""]This article is an excerpt from the book Python Data Analysis - Second Edition written by Armando Fandango. From this book, you will learn how to process and manipulate data with Python for complex data analysis and modeling. Code bundle for this article is hosted on GitHub.[/box] The popular open source Python library, pandas is named after panel data (an econometric term) and Python data analysis. We shall learn about basic panda functionalities, data structures, and operations in this article. The official pandas documentation insists on naming the project pandas in all lowercase letters. The other convention the pandas project insists on, is the import pandas as pd import statement. We will follow these conventions in this text. In this tutorial, we will install and explore pandas. We will also acquaint ourselves with the a central pandas data structure–DataFrame. Installing and exploring pandas The minimal dependency set requirements for pandas is given as follows: NumPy: This is the fundamental numerical array package that we installed and covered extensively in the preceding chapters python-dateutil: This is a date handling library pytz: This handles time zone definitions This list is the bare minimum; a longer list of optional dependencies can be located at http://pandas.pydata.org/pandas-docs/stable/install.html. We can install pandas via PyPI with pip or easy_install, using a binary installer, with the aid of our operating system package manager, or from the source by checking out the code. The binary installers can be downloaded from http://pandas.pydata.org/getpandas.html. The command to install pandas with pip is as follows: $ pip3 install pandas rpy2 rpy2 is an interface to R and is required because rpy is being deprecated. You may have to prepend the preceding command with sudo if your user account doesn't have sufficient rights. The pandas DataFrames A pandas DataFrame is a labeled two-dimensional data structure and is similar in spirit to a worksheet in Google Sheets or Microsoft Excel, or a relational database table. The columns in pandas DataFrame can be of different types. A similar concept, by the way, was invented originally in the R programming language. (For more information, refer to http://www.r-tutor.com/r-introduction/data-frame). A DataFrame can be created in the following ways: Using another DataFrame. Using a NumPy array or a composite of arrays that has a two-dimensional shape. Likewise, we can create a DataFrame out of another pandas data structure called Series. We will learn about Series in the following section. A DataFrame can also be produced from a file, such as a CSV file. From a dictionary of one-dimensional structures, such as one-dimensional NumPy arrays, lists, dicts, or pandas Series. As an example, we will use data that can be retrieved from http://www.exploredata.net/Downloads/WHO-Data-Set. The original data file is quite large and has many columns, so we will use an edited file instead, which only contains the first nine columns and is called WHO_first9cols.csv; the file is in the code bundle of this book. These are the first two lines, including the header: Country,CountryID,Continent,Adolescent fertility rate (%),Adult literacy rate (%),Gross national income per capita (PPP international $),Net primary school enrolment ratio female (%),Net primary school enrolment ratio male (%),Population (in thousands) totalAfghanistan,1,1,151,28,,,,26088 In the next steps, we will take a look at pandas DataFrames and its attributes: To kick off, load the data file into a DataFrame and print it on the screen: from pandas.io.parsers import read_csv df = read_csv("WHO_first9cols.csv") print("Dataframe", df) The printout is a summary of the DataFrame. It is too long to be displayed entirely, so we will just grab the last few lines: 199 21732.0 200 11696.0 201 13228.0 [202 rows x 9 columns] The DataFrame has an attribute that holds its shape as a tuple, similar to ndarray. Query the number of rows of a DataFrame as follows: print("Shape", df.shape) print("Length", len(df)) The values we obtain comply with the printout of the preceding step: Shape (202, 9) Length 202 Check the column header and data types with the other attributes: print("Column Headers", df.columns) print("Data types", df.dtypes) We receive the column headers in a special data structure: Column Headers Index([u'Country', u'CountryID', u'Continent', u'Adolescent fertility rate (%)', u'Adult literacy rate (%)', u'Gross national income per capita (PPP international $)', u'Net primary school enrolment ratio female (%)', u'Net primary school enrolment ratio male (%)', u'Population (in thousands) total'], dtype='object') The data types are printed as follows: 4. The pandas DataFrame has an index, which is like the primary key of relational database tables. We can either specify the index or have pandas create it automatically. The index can be accessed with a corresponding property, as follows: Print("Index", df.index) An index helps us search for items quickly, just like the index in this book. In our case, the index is a wrapper around an array starting at 0, with an increment of one for each row: Sometimes, we wish to iterate over the underlying data of a DataFrame. Iterating over column values can be inefficient if we utilize the pandas iterators. It's much better to extract the underlying NumPy arrays and work with those. The pandas DataFrame has an attribute that can aid with this as well: print("Values", df.values) Please note that some values are designated nan in the output, for 'not a number'. These values come from empty fields in the input datafile: The preceding code is available in Python Notebook ch-03.ipynb, available in the code bundle of this book. Querying data in pandas Since a pandas DataFrame is structured in a similar way to a relational database, we can view operations that read data from a DataFrame as a query. In this example, we will retrieve the annual sunspot data from Quandl. We can either use the Quandl API or download the data manually as a CSV file from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual. If you want to install the API, you can do so by downloading installers from https://pypi.python.org/pypi/Quandl or by running the following command: $ pip3 install Quandl Using the API is free, but is limited to 50 API calls per day. If you require more API calls, you will have to request an authentication key. The code in this tutorial is not using a key. It should be simple to change the code to either use a key or read a downloaded CSV file. If you have difficulties, search through the Python docs at https://docs.python.org/2/. Without further preamble, let's take a look at how to query data in a pandas DataFrame: As a first step, we obviously have to download the data. After importing the Quandl API, get the data as follows: import quandl # Data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual # PyPi url https://pypi.python.org/pypi/Quandl sunspots = quandl.get("SIDC/SUNSPOTS_A") The head() and tail() methods have a purpose similar to that of the Unix commands with the same name. Select the first n and last n records of a DataFrame, where n is an integer parameter: print("Head 2", sunspots.head(2) ) print("Tail 2", sunspots.tail(2)) This gives us the first two and last two rows of the sunspot data (for the sake of brevity we have not shown all the columns here; your output will have all the columns from the dataset): Head 2 Number Year 1700-12-31 5 1701-12-31 11 [2 rows x 1 columns] Tail 2 Number Year 2012-12-31 57.7 2013-12-31 64.9 [2 rows x 1 columns] Please note that we only have one column holding the number of sunspots per year. The dates are a part of the DataFrame index. The following is the query for the last value using the last date: last_date = sunspots.index[-1] print("Last value", sunspots.loc[last_date]) You can check the following output with the result from the previous step: Last value Number 64.9 Name: 2013-12-31 00:00:00, dtype: float64 Query the date with date strings in the YYYYMMDD format as follows: print("Values slice by date:n", sunspots["20020101": "20131231"]) This gives the records from 2002 through to 2013: Values slice by date Number Year 2002-12-31 104.0 [TRUNCATED] 2013-12-31 64.9 [12 rows x 1 columns] A list of indices can be used to query as well: print("Slice from a list of indices:n", sunspots.iloc[[2, 4, -4, -2]]) The preceding code selects the following rows: Slice from a list of indices Number Year 1702-12-31 16.0 1704-12-31 36.0 2010-12-31 16.0 2012-12-31 57.7 [4 rows x 1 columns] To select scalar values, we have two options. The second option given here should be faster. Two integers are required, the first for the row and the second for the column: print("Scalar with Iloc:", sunspots.iloc[0, 0]) print("Scalar with iat", sunspots.iat[1, 0]) This gives us the first and second values of the dataset as scalars: Scalar with Iloc 5.0 Scalar with iat 11.0 Querying with Booleans works much like the Where clause of SQL. The following code queries for values larger than the arithmetic mean. Note that there is a difference between when we perform the query on the whole DataFrame and when we perform it on a single column: print("Boolean selection", sunspots[sunspots > sunspots.mean()]) print("Boolean selection with column label:n", sunspots[sunspots['Number of Observations'] > sunspots['Number of Observations'].mean()]) The notable difference is that the first query yields all the rows, with some rows not conforming to the condition that has a value of NaN. The second query returns only the rows where the value is larger than the mean: Boolean selection Number Year 1700-12-31 NaN [TRUNCATED] 1759-12-31 54.0 ... [314 rows x 1 columns] Boolean selection with column label Number Year 1705-12-31 58.0 [TRUNCATED] 1870-12-31 139.1 ... [127 rows x 1 columns] The preceding example code is in the ch_03.ipynb file of this book's code bundle. Data aggregation with pandas DataFrames Data aggregation is a term used in the field of relational databases. In a database query, we can group data by the value in a column or columns. We can then perform various operations on each of these groups. The pandas DataFrame has similar capabilities. We will generate data held in a Python dict and then use this data to create a pandas DataFrame. We will then practice the pandas aggregation features: Seed the NumPy random generator to make sure that the generated data will not differ between repeated program runs. The data will have four columns: Weather (a string) Food (also a string) Price (a random float) Number (a random integer between one and nine) The use case is that we have the results of some sort of consumer-purchase research, combined with weather and market pricing, where we calculate the average of prices and keep a track of the sample size and parameters: import pandas as pd from numpy.random import seed from numpy.random import rand from numpy.random import rand_int import numpy as np seed(42) df = pd.DataFrame({'Weather' : ['cold', 'hot', 'cold','hot', 'cold', 'hot', 'cold'], 'Food' : ['soup', 'soup', 'icecream', 'chocolate', 'icecream', 'icecream', 'soup'], 'Price' : 10 * rand(7), 'Number' : rand_int(1, 9,)}) print(df) You should get an output similar to the following: Please note that the column labels come from the lexically ordered keys of the Python dict. Lexical or lexicographical order is based on the alphabetic order of characters in a string. Group the data by the Weather column and then iterate through the groups as follows: weather_group = df.groupby('Weather') i = 0 for name, group in weather_group: i = i + 1 print("Group", i, name) print(group) We have two types of weather, hot and cold, so we get two groups: The weather_group variable is a special pandas object that we get as a result of the groupby() method. This object has aggregation methods, which are demonstrated as follows: print("Weather group firstn", weather_group.first()) print("Weather group lastn", weather_group.last()) print("Weather group meann", weather_group.mean()) The preceding code snippet prints the first row, last row, and mean of each group: Just as in a database query, we are allowed to group on multiple columns. The groups attribute will then tell us the groups that are formed, as well as the rows in each group: wf_group = df.groupby(['Weather', 'Food']) print("WF Groups", wf_group.groups) For each possible combination of weather and food values, a new group is created. The membership of each row is indicated by their index values as follows: WF Groups {('hot', 'chocolate'): [3], ('cold', 'icecream'): [2, 4], ('hot', 'icecream'): [5], ('hot', 'soup'): [1], ('cold', 'soup'): [0, 6] 5. Apply a list of NumPy functions on groups with the agg() method: print("WF Aggregatedn", wf_group.agg([np.mean, np.median])) Obviously, we could apply even more functions, but it would look messier than the following output: Concatenating and appending DataFrames The pandas DataFrame allows operations that are similar to the inner and outer joins of database tables. We can append and concatenate rows as well. To practice appending and concatenating of rows, we will reuse the DataFrame from the previous section. Let's select the first three rows: print("df :3n", df[:3]) Check that these are indeed the first three rows: df :3 Food Number Price Weather 0 soup 8 3.745401 cold 1 soup 5 9.507143 hot 2 icecream 4 7.319939 cold The concat() function concatenates DataFrames. For example, we can concatenate a DataFrame that consists of three rows to the rest of the rows, in order to recreate the original DataFrame: print("Concat Back togethern", pd.concat([df[:3], df[3:]])) The concatenation output appears as follows: Concat Back together Food Number Price Weather 0 soup 8 3.745401 cold 1 soup 5 9.507143 hot 2 icecream 4 7.319939 cold 3 chocolate 8 5.986585 hot 4 icecream 8 1.560186 cold 5 icecream 3 1.559945 hot 6 soup 6 0.580836 cold [7 rows x 4 columns] To append rows, use the append() function: print("Appending rowsn", df[:3].append(df[5:])) The result is a DataFrame with the first three rows of the original DataFrame and the last two rows appended to it: Appending rows Food Number Price Weather 0 soup 8 3.745401 cold 1 soup 5 9.507143 hot 2 icecream 4 7.319939 cold 5 icecream 3 1.559945 hot 6 soup 6 0.580836 cold [5 rows x 4 columns] Joining DataFrames To demonstrate joining, we will use two CSV files-dest.csv and tips.csv. The use case behind it is that we are running a taxi company. Every time a passenger is dropped off at his or her destination, we add a row to the dest.csv file with the employee number of the driver and the destination: EmpNr,Dest5,The Hague3,Amsterdam9,Rotterdam Sometimes drivers get a tip, so we want that registered in the tips.csv file (if this doesn't seem realistic, please feel free to come up with your own story): EmpNr,Amount5,109,57,2.5 Database-like joins in pandas can be done with either the merge() function or the join() DataFrame method. The join() method joins onto indices by default, which might not be what you want. In SQL a relational database query language we have the inner join, left outer join, right outer join, and full outer join. An inner join selects rows from two tables, if and only if values match, for columns specified in the join condition. Outer joins do not require a match, and can potentially return more rows. More information on joins can be found at http://en.wikipedia.org/wiki/Join_%28SQL%29. All these join types are supported by pandas, but we will only take a look at inner joins and full outer joins: A join on the employee number with the merge() function is performed as follows: print("Merge() on keyn", pd.merge(dests, tips, on='EmpNr')) This gives an inner join as the outcome: Merge() on key EmpNr Dest Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns] Joining with the join() method requires providing suffixes for the left and right operands: print("Dests join() tipsn", dests.join(tips, lsuffix='Dest', rsuffix='Tips')) This method call joins index values so that the result is different from an SQL inner join: Dests join() tips EmpNrDest Dest EmpNrTips Amount 0 5 The Hague 5 10.0 1 3 Amsterdam 9 5.0 2 9 Rotterdam 7 2.5 [3 rows x 4 columns] An even more explicit way to execute an inner join with merge() is as follows: print("Inner join with merge()n", pd.merge(dests, tips, how='inner')) The output is as follows: Inner join with merge() EmpNr Dest Amount 0 5 The Hague 10 1 9 Rotterdam 5 [2 rows x 3 columns] To make this a full outer join requires only a small change: print("Outer joinn", pd.merge(dests, tips, how='outer')) The outer join adds rows with NaN values: Outer join EmpNr Dest Amount 0 5 The Hague 10.0 1 3 Amsterdam NaN 2 9 Rotterdam 5.0 3 7 NaN 2.5 [4 rows x 3 columns] In a relational database query, these values would have been set to NULL. The demo code is in the ch-03.ipynb file of this book's code bundle. We learnt how to perform various data manipulation techniques such as aggregating, concatenating, appending, cleaning, and handling missing values, with pandas. If you found this post useful, check out the book Python Data Analysis - Second Edition to learn advanced topics such as signal processing, textual data analysis, machine learning, and more.

0
0
16966

article-image-using-firebase-real-time-database

Oliver Blumanski

18 Jan 2017

5 min read

Using the Firebase Real-Time Database

Oliver Blumanski

18 Jan 2017

5 min read

In this post, we are going to look at how to use the Firebase real-time database, along with an example. Here we are writing and reading data from the database using multiple platforms. To do this, we first need a server script that is adding data, and secondly we need a component that pulls the data from the Firebase database. Step 1 - Server Script to collect data Digest an XML feed and transfer the data into the Firebase real-time database. The script runs as cronjob frequently to refresh the data. Step 2 - App Component Subscribe to the data from a JavaScript component, in this case, React-Native. About Firebase Now that those two steps are complete, let's take a step back and talk about Google Firebase. Firebase offers a range of services such as a real-time database, authentication, cloud notifications, storage, and much more. You can find the full feature list here. Firebase covers three platforms: iOS, Android, and Web. The server script uses the Firebases JavaScript Web API. Having data in this real-time database allows us to query the data from all three platforms (iOS, Android, Web), and in addition, the real-time database allows us to subscribe (listen) to a database path (query), or to query a path once. Step 1 - Digest XML feed and transfer into Firebase Firebase Set UpThe first thing you need to do is to set up a Google Firebase project here In the app, click on "Add another App" and choose Web, a pop-up will show you the configuration. You can copy paste your config into the example script. Now you need to set the rules for your Firebase database. You should make yourself familiar with the database access rules. In my example, the path latestMarkets/ is open for write and read. In a real-world production app, you would have to secure this, having authentication for the write permissions. Here are the database rules to get started: { "rules": { "users": { "$uid": { ".read": "$uid === auth.uid", ".write": "$uid === auth.uid" } }, "latestMarkets": { ".read": true, ".write": true } } } The Server Script Code The XML feed contains stock market data and is frequently changing, except on the weekend. To build the server script, some NPM packages are needed: Firebase Request xml2json babel-preset-es2015 Require modules and configure Firebase web api: const Firebase = require('firebase'); const request = require('request'); const parser = require('xml2json'); // firebase access config const config = { apiKey: "apikey", authDomain: "authdomain", databaseURL: "dburl", storageBucket: "optional", messagingSenderId: "optional" } // init firebase Firebase.initializeApp(config) [/Code] I write JavaScript code in ES6. It is much more fun. It is a simple script, so let's have a look at the code that is relevant to Firebase. The code below is inserting or overwriting data in the database. For this script, I am happy to overwrite data: Firebase.database().ref('latestMarkets/'+value.Symbol).set({ Symbol: value.Symbol, Bid: value.Bid, Ask: value.Ask, High: value.High, Low: value.Low, Direction: value.Direction, Last: value.Last }) .then((response) => { // callback callback(true) }) .catch((error) => { // callback callback(error) }) Firebase Db first references the path: Firebase.database().ref('latestMarkets/'+value.Symbol) And then the action you want to do: // insert/overwrite (promise) Firebase.database().ref('latestMarkets/'+value.Symbol).set({}).then((result)) // get data once (promise) Firebase.database().ref('latestMarkets/'+value.Symbol).once('value').then((snapshot)) // listen to db path, get data on change (callback) Firebase.database().ref('latestMarkets/'+value.Symbol).on('value', ((snapshot) => {}) // ...... Here is the Github repository: Displaying the data in a React-Native app This code below will listen to a database path, on data change, all connected devices will synchronise the data: Firebase.database().ref('latestMarkets/').on('value', snapshot => { // do something with snapshot.val() }) To close the listener, or unsubscribe the path, one can use "off": Firebase.database().ref('latestMarkets/').off() I’ve created an example react-native app to display the data: The Github repository Conclusion In mobile app development, one big question is: "What database and cache solution can I use to provide online and offline capabilities?" One way to look at this question is like you are starting a project from scratch. If so, you can fit your data into Firebase, and then this would be a great solution for you. Additionally, you can use it for both web and mobile apps. The great thing is that you don't need to write a particular API, and you can access data straight from JavaScript. On the other hand, if you have a project that uses MySQL for example, the Firebase real-time database won't help you much. You would need to have a remote API to connect to your database in this case. But even if using the Firebase database isn't a good fit for your project, there are still other features, such as Firebase Storage or Cloud Messaging, which are very easy to use, and even though they are beyond the scope of this post, they are worth checking out. About the author Oliver Blumanski is a developer based out of Townsville, Australia. He has been a software developer since 2000, and can be found on GitHub at @blumanski.

0
0
16942

article-image-sequence-generator-transformation-informatica-powercenter

Savia Lobo

12 Dec 2017

7 min read

How to integrate Sequence Generator transformation in Informatica PowerCenter 10.x

Savia Lobo

12 Dec 2017

7 min read

0
0
16929

article-image-facebook-fails-to-block-ecj-data-security-case-from-proceeding

Guest Contributor

19 Jun 2019

6 min read

Facebook fails to block ECJ data security case from proceeding

Guest Contributor

19 Jun 2019

6 min read

This July, the European Court of Justice (ECJ) in Luxembourg will now hear a case to answer questions on whether the American government's surveillance, Privacy Shield and Standard Contract Clauses, during EU-US data transfers, provides adequate protection of EU citizen's personal information. The ECJ set the case hearing after the supreme court of Ireland — where Facebook's international headquarters is located — decided, on Friday, May 31st, 2019, to dismiss an appeal by Facebook to block the data security case from progressing to the ECJ. The Austrian Supreme Court has also recently rejected Facebook’s bid to stop a similar case. If Europe's Court of Justice makes a ruling against the current legal arrangements, this would majorly impact thousands of companies, which make millions of data transfers every day. Companies potentially affected, include human resources databases, storage of internet browsing histories and credit card companies. Background on this case The case started with the Austrian privacy lawyer and campaigner, Max Schrems. In 2013, Schrems made a complaint regarding concerns that US surveillance programs like the PRISM system were accessing the data of European Facebook users, as whistleblower Edward Snowden described. His concerns also dealt with Facebook’s use of a separate data transfer mechanism — Standard Contractual Clauses (SCCs). Around the time Snowden disclosed about the US government's mass surveillance programs, Schrems also challenged the legality of the prior EU-US data transfer arrangement, Safe Harbor, eventually bringing it down. After Schrems stated that the transfer of his data by Facebook to the US infringed upon his rights as an EU citizen, Ireland's High Court ruled, in 2017, that the US government partook in "mass indiscriminate processing of data" and deferred concerns to the European Court of Justice. Then, in October of last year, the High Court referred this case to the ECJ based on the Data Protection Commissioner's "well-founded" concerns about whether or not US law provided adequate protection for EU citizens' data privacy rights. Within all of this, there also exist people questioning the compatibility between US law which focuses on national security and EU law which aims for personal privacy. Whistleblowers like Edward Snowden played a role in what has lead up to this case, and whistleblower attorneys and paraprofessionals continue working to expose fraud against the government through means of the False Claims Acts (FCA). Why Facebook appealed the case Although Irish law doesn't require an appeal against CJEU referrals, Facebook chose to stay and appeal the decision anyway, aiming to keep it from progressing to court. The court denied them the stay but granted them leave to appeal last year. Keep in mind that Facebook was already under a lot of scrutiny after playing a part in the Cambridge Analytica data scandal, which showed that up to 87 million users faced having their data compromised by Cambridge Analytica. One of the reasons Facebook said it wanted to block this case from progressing was that the High Court failed to regard the 'Privacy Shield' decision. Under the Privacy Shield decision, the European Commission had approved the use of certain EU-US data transfer channels. Another main issue here was whether Facebook actually had the legal rights to appeal a referral to the ECJ. Privacy Shield is also in question by French digital rights groups who claim it disrupts fundamental EU rights and will be heard by the General Court of the EU in July. Why the appeal was dismissed The five-judge High Court, headed by the Chief Justice Frank Clarke, decided they cannot entertain an appeal over the referral decision itself. In addition, he said Facebook’s criticisms related to the “proper characterization” of underlying facts rather than the facts themselves. If there had been any actual finding of facts not sustainable on the evidence before the High Court per Irish procedural law, he would have overturned it, but no such matter had been established on this appeal, he ruled. "Joint Control" and its possible impact on the case In June 2018, after a Facebook fan page was found to have been allowing visitor data to be collected by Facebook via a cookie on the fan page, without informing visitors, The Federal Administrative Court of Germany referred the case to ECJ. This resulted in the ECJ deciding to deem joint responsibility between social media networks and administrators in the processing of visitor data. The ECJ´s ruling, in this case, has consequences not only for Facebook sites but for other situations where more than one company or administrator plays an active role in the data processing. The concept of “joint control” is now on the table, and further decisions of authorities and courts in this area are likely. What's next for data security Currently, Facebook also faces questioning by Ireland's Data Protection Commission over numerous potential infringements of strict European privacy laws that the new General Data Protection Regulation (GDPR) outlines. Facebook, however, already stated it will take the necessary steps to ensure the site operators can comply with the GDPR. There have even been pleas for Global Data Laws. A common misconception exists that only big organizations, governments and businesses are at risk for data security breaches, but this is simply not true. Data security is important for everyone — now more than ever. Your computer, tablet and mobile devices could be affected by attackers for their sensitive information, such as credit card details, banking details and passwords, by way of phishing attacks, malware attacks, ransomware attacks, man-in-the-middle attacks and more. Therefore, bringing continual awareness to these US and global data security issues will enable stricter laws to be put in place. Kayla Matthews writes about big data, cybersecurity and technology. You can find her work on The Week, Information Age, KDnuggets and CloudTweaks, or over at ProductivityBytes.com. Facebook releases Pythia, a deep learning framework for vision and language multimodal research Zuckberg just became the target of the world’s first high profile white hat deepfake op. Can Facebook come out unscathed? US regulators plan to probe Google on anti-trust issues; Facebook, Amazon & Apple also under legal scrutiny

0
0
16864

article-image-25-startups-machine-learning-differently-2018

Fatema Patrawala

29 Dec 2017

14 min read

25 Startups using machine learning differently in 2018: From farming to brewing beer to elder care

Fatema Patrawala

29 Dec 2017

14 min read

What really excites me about data science and by extension machine learning is the sheer number of possibilities! You can think of so many applications off the top of your head: robo-advisors, computerized lawyers, digital medicine, even automating VC decisions when they invest in startups. You can even venture into automation of art and music, algorithms writing papers which are indistinguishable from human-written papers. It's like solving a puzzle, but a puzzle that's meaningful and that has real world implications. The things that we can do today weren’t possible 5 years ago, and this is largely thanks to growth in computational power, data availability, and the adoption of the cloud that made accessing these resources economical for everyone, all key enabling factors for the advancement of Machine learning and AI. Having witnessed the growth of data science as discipline, industries like finance, health-care, education, media & entertainment, insurance, retail as well as energy has left no stone unturned to harness this opportunity. Data science has the capability to offer even more; and we will see the wide range of applications in the future in places haven’t even been explored. In the years to come, we will increasingly see data powered/AI enabled products and services take on roles traditionally handled by humans as they required innately human qualities to successfully perform. In this article we have covered some use cases of Data Science being used differently and start-ups who have practically implemented it: The Nurturer: For elder care The world is aging rather rapidly. According to the World Health Organization, nearly two billion people across the world are expected to be over 60 years old by 2050, a figure that’s more than triple what it was in 2000. In order to adapt to their increasingly aging population, many countries have raised the retirement age, reducing pension benefits, and have started spending more on elderly care. Research institutions in countries like Japan, home to a large elderly population, are focusing their R&D efforts on robots that can perform tasks like lifting and moving chronically ill patients, many startups are working on automating hospital logistics and bringing in virtual assistance. They also offer AI-based virtual assistants to serve as middlemen between nurses and patients, reducing the need for frequent in-hospital visits. Dr Ben Maruthappu, a practising doctor, has brought a change to the world of geriatric care with an AI based app Cera. It is an on-demand platform to aid the elderly in need. The Cera app firmly puts itself in the category of Uber & Amazon, whereby it connects elderly people in need of care with a caregiver in a matter of few hours. The team behind this innovation also plans to use AI to track patients’ health conditions and reduce the number of emergency patients admitted in hospitals. A social companion technology - Elliq created by Intuition Robotics helps older adults stay active and engaged with a proactive social robot that overcomes the digital divide. AliveCor, a leading FDA-cleared mobile heart solution helps save lives, money, and has brought modern healthcare alive into the 21st century. The Teacher: Personalized education platform for lifelong learning With children increasingly using smartphones and tablets and coding becoming a part of national curricula around the world, technology has become an integral part of classrooms. We have already witnessed the rise and impact of education technology especially through a multitude of adaptive learning platforms that allow learners to strengthen their skills and knowledge - CBTs, LMSes, MOOCs and more. And now virtual reality (VR) and artificial intelligence (AI) are gaining traction to provide us with lifelong learning companion that can accompany and support individuals throughout their studies - in and beyond school . An AI based educational platform learns the amount of potential held by each particular student. Based on this data, tailored guidance is provided to fix mistakes and improvise on the weaker areas. A detailed report can be generated by the teachers to help them customise lesson plans to best suit the needs of the student. Take Gruff Davies’ Kwiziq for example. Gruff with his team leverage AI to provide a personalised learning experience for students based on their individual needs. Students registered on the platform get an advantage of an AI based language coach which asks them to solve various micro quizzes. Quiz solutions provided by students are then turned into detailed “brain maps”. These brain maps are further used to provide tailored instructions and feedback for improvement. Other startup firms like Blippar specialize in Augmented reality for visual and experiential learning. Unelma Platforms, a software platform development company provides state-of-the-art software for higher-education, healthcare and business markets. The Provider: Farming to be more productive, sustainable and advanced Though farming is considered the backbone of many national economies especially in the developing world, there is often an outdated view of it involving a small, family-owned lands where crops are hand harvested. The reality of modern-day farms have had to overhaul operations to meet demand and remain competitively priced while adapting to the ever-changing ways technology is infiltrating all parts of life. Climate change is a serious environmental threat farmers must deal with every season: Strong storms and severe droughts have made farming even more challenging. Additionally lack of agricultural input, water scarcity, over-chemicalization in fertilizers, water & soil pollution or shortage of storage systems has made survival for farmers all the more difficult. To overcome these challenges, smart farming techniques are the need of an hour for farmers in order to manage resources and sustain in the market. For instance, in a paper published by arXiv, the team explains how they used a technique known as transfer learning to teach the AI how to recognize crop diseases and pest damage.They utilized TensorFlow, to build and train a neural network of their own, which involved showing the AI 2,756 images of cassava leaves from plants in Tanzania. Their efforts were a success, as the AI was able to correctly identify brown leaf spot disease with 98 percent accuracy. WeFarm, SaaS based agritech firm, headquartered in London, aims to bridge the connectivity gap amongst the farmer community. It allows them to send queries related to farming via text message which is then shared online into several languages. The farmer then receives a crowdsourced response from other farmers around the world. In this way, a particular farmer in Kenya can get a solution from someone sitting in Uganda, without having to leave his farm, spend additional money or without accessing the internet. Benson Hill Bio-systems, by Matthew B. Crisp, former President of Agricultural Biotechnology Division, has differentiated itself by bringing the power of Cloud Biology™ to agriculture. It combines cloud computing, big data analytics, and plant biology to inspire innovation in agriculture. At the heart of Benson Hill is CropOS™, a cognitive engine that integrates crop data and analytics with the biological expertise and experience of the Benson Hill scientists. CropOS™ continuously advances and improves with every new dataset, resulting in the strengthening of the system’s predictive power. Firms like Plenty Inc and Bowery Farming Inc are nowhere behind in offering smart farming solutions. Plenty Inc is an agriculture technology company that develops plant sciences for crops to flourish in a pesticide and GMO-free environment. While Bowery Farming uses high-tech approaches such as robotics, LED lighting and data analytics to grow leafy greens indoors. The Saviour: For sustainability and waste management The global energy landscape continues to evolve, sometimes by the nanosecond, sometimes by the day. The sector finds itself pulled to economize and pushed to innovate due to a surge in demand for new power and utilities offerings. Innovations in power-sector technology, such as new storage battery options and smartphone-based thermostat apps, AI enabled sensors etc; are advancing at a pace that has surprised developers and adopters alike. Consumer’s demands for such products have increased. To meet this, industry leaders are integrating those innovations into their operations and infrastructure as rapidly as they can. On the other hand, companies pursuing energy efficiency have two long-standing goals — gaining a competitive advantage and boosting the bottom line — and a relatively new one: environmental sustainability. Realising the importance of such impending situations in the industry, we have startups like SmartTrace offering an innovative cloud-based platform to quickly manage waste at multiple levels. This includes bridging rough data from waste contractors, extrapolating to volume, EWC, finance and Co2 statistics. Data extracted acts as a guide to improve methodology, educate, strengthen oversight and direct improvements to the bottom line, as well as environmental outcomes. One Concern provides damage estimates using artificial intelligence on natural phenomena sciences. Autogrid organizes energy data and employs big data analytics to generate real-time predictions to create actionable data. The Dreamer: For lifestyle and creative product development and design Consumers in our modern world continually make multiple decisions with regard to product choice due to many competing products in the market.Often those choices boil down to whether it provides better value than others either in terms of product quality, price or by aligning with their personal beliefs and values.Lifestyle products and brands operate off ideologies, hoping to attract a relatively high number of people and ultimately becoming a recognized social phenomenon. While ecommerce has leveraged data science to master the price dimension, here are some examples of startups trying to deconstruct the other two dimensions: product development and branding. I wonder if you have ever imagined your beer to be brewed by AI? Well, now you can with IntelligentX. The Intelligent X team claim to have invented the world's first beer brewed by Artificial intelligence. They also plan to craft a premium beer using complex machine learning algorithms which can improve itself from the feedback given by its customers. Customers are given to try one of their four bottled conditioned beers, after the trial they are asked by their AI what they think of the beer, via an online feedback messenger. The data then collected is used by an algorithm to brew the next batch. Because their AI is constantly reacting to user feedback, they can brew beer that matches what customers want, more quickly than anyone else can. What this actually means that the company gets more data and customers get a customized fresh beer! In the lifestyle domain, we have Stitch Fix which has brought a personal touch to the online shopping journey. They are no regular other apparel e-commerce company. They have created a perfect formula for blending human expertise with the right amount of Data Science to serve their customers. According to Katrina Lake, Founder, and CEO, "You can look at every product on the planet, but trying to figure out which one is best for you is really the challenge” and that’s where Stitch Fix has come into the picture. The company is disrupting traditional retail by bridging the gap of personalized shopping, that the former could not achieve. To know how StitchFix uses Full Stack Data Science read our detailed article. The Writer: From content creation to curation to promotion In the publishing industry, we have seen a digital revolution coming in too. Echobox are one of the pioneers in building AI for the publishing industry. Antoine Amann, founder of Echobox, wrote in a blog post that they have "developed an AI platform that takes large quantity of variables into account and analyses them in real time to determine optimum post performance". Echobox pride itself to currently work with Facebook and Twitter for optimizing social media content, perform advanced analytics with A/B testing and also curate content for desired CTRs. With global client base like The Le Monde, The Telegraph, The Guardian etc. they have conveniently ripped social media editors. New York-based startup Agolo uses AI to create real-time summaries of information. It initially use to curate Twitter feeds in order to focus on conversations, tweets and hashtags that were most relevant to its user's preferences. Using natural language processing, Agolo scans content, identifies relationships among the information sources, and picks out the most relevant information, all leading to a comprehensive summary of the original piece of information. Other websites like Grammarly, offers AI-powered solutions to help people write, edit and formulate mistake-free content. Textio came up with augmented writing which means every time you wrote something and you would come to know ahead of time exactly who is going to respond. It basically means writing which is supported by outcomes in real time. Automated Insights, Creator of Wordsmith, the natural language generation platform enables you to produce human-sounding narratives from data. The Matchmaker: Connecting people, skills and other entities AI will make networking at B2B events more fun and highly productive for business professionals. Grip, a London based startup, formerly known as Network, rebranded itself in the month of April, 2016. Grip is using AI as a platform to make networking at events more constructive and fruitful. It acts as a B2B matchmaking engine that accumulates data from social accounts (LinkedIn, Twitter) and smartly matches the event registration data. Synonymous to Tinder for networking, Grip uses advanced algorithms to recommend the right people and presents them with an easy to use swiping interface feature. It also delivers a detailed report to the event organizer on the success of the event for every user or a social Segment. We are well aware of the data scientist being the sexiest job of the 21st century. JamieAi harnessing this fact connects technical talent with data-oriented jobs organizations of all types and sizes. The start-up firm has combined AI insights and human oversight to reduce hiring costs and eliminate bias. Also, third party recruitment agencies are removed from the process to boost transparency and efficiency in the path to employment. Another example is Woo.io, a marketplace for matching tech professionals and companies. The Manager: Virtual assistants of a different kind Artificial Intelligence can also predict how much your household appliance will cost on your electricity bill. Verv, a producer of clever home energy assistance provides intelligent information on your household appliances. It helps its customers with a significant reduction on their electricity bills and carbon footprints. The technology uses machine learning algorithms to provide real-time information by learning how much power and money each device is using. Not only this, it can also suggest eco-friendly alternatives, alert homeowners of appliances in use for a longer duration and warn them of any dangerous activity when they aren’t present at home. Other examples include firms like Maana which manages machines and improves operational efficiencies in order to make fast data driven decisions. Gong.io, acts as a sales representative’s assistant to understand sales conversations resulting into actionable insights. ObEN, creates complete virtual identities for consumers and celebrities in the emerging digital world. The Motivator: For personal and business productivity and growth A super cross-functional company Perkbox, came up with an employee engagement platform. Saurav Chopra founder of Perkbox believes teams perform their best when they are happy and engaged! Hence, Perkbox helps companies boost employee motivation and create a more inspirational atmosphere to work. The platform offers gym services, dental discounts and rewards for top achievers in the team to firms in UK. Perkbox offers a wide range of perks, discounts and tools to help organizations retain and motivate their employees. Technologies like AWS and Kubernetes allow to closely knit themselves with their development team. In order to build, scale and support Perkbox application for the growing number of user base. So, these are some use cases where we found startups using data science and machine learning differently. Do you know of others? Please share them in the comments below.

0
0
16848

article-image-write-first-blockchain-learning-solidity-programming-15-minutes

Aaron Lazar

03 Jan 2018

15 min read

Write your first Blockchain: Learning Solidity Programming in 15 minutes

Aaron Lazar

03 Jan 2018

15 min read

[box type="note" align="" class="" width=""]This post is a book extract from the title Mastering Blockchain, authored by Imran Bashir. The book begins with the technical foundations of blockchain, teaching you the fundamentals of cryptography and how it keeps data secure.[/box] Our article aims to quickly get you up to speed with Blockchain development using the Solidity Programming language. Introducing solidity Solidity is a domain-specific language of choice for programming contracts in Ethereum. There are, however, other languages, such as serpent, Mutan, and LLL but solidity is the most popular at the time of writing this. Its syntax is closer to JavaScript and C. Solidity has evolved into a mature language over the last few years and is quite easy to use, but it still has a long way to go before it can become advanced and feature-rich like other well established languages. Nevertheless, this is the most widely used language available for programming contracts currently. It is a statically typed language, which means that variable type checking in solidity is carried out at compile time. Each variable, either state or local, must be specified with a type at compile time. This is beneficial in the sense that any validation and checking is completed at compile time and certain types of bugs, such as interpretation of data types, can be caught earlier in the development cycle instead of at run time, which could be costly, especially in the case of the blockchain/smart contracts paradigm. Other features of the language include inheritance, libraries, and the ability to define composite data types. Solidity is also a called contract-oriented language. In solidity, contracts are equivalent to the concept of classes in other object-oriented programming languages. Types Solidity has two categories of data types: value types and reference types. Value types These are explained in detail here. Boolean This data type has two possible values, true or false, for example: bool v = true; This statement assigns the value true to v. Integers This data type represents integers. A table is shown here, which shows various keywords used to declare integer data types. For example, in this code, note that uint is an alias for uint256: uint256 x; uint y; int256 z; These types can also be declared with the constant keyword, which means that no storage slot will be reserved by the compiler for these variables. In this case, each occurrence will be replaced with the actual value: uint constant z=10+10; State variables are declared outside the body of a function, and they remain available throughout the contract depending on the accessibility assigned to them and as long as the contract persists. Address This data type holds a 160-bit long (20 byte) value. This type has several members that can be used to interact with and query the contracts. These members are described here: Balance The balance member returns the balance of the address in Wei. Send This member is used to send an amount of ether to an address (Ethereum's 160-bit address) and returns true or false depending on the result of the transaction, for example, the following: address to = 0x6414cc08d148dce9ebf5a2d0b7c220ed2d3203da; address from = this; if (to.balance < 10 && from.balance > 50) to.send(20); Call functions The call, callcode, and delegatecall are provided in order to interact with functions that do not have Application Binary Interface (ABI). These functions should be used with caution as they are not safe to use due to the impact on type safety and security of the contracts. Array value types (fixed size and dynamically sized byte arrays) Solidity has fixed size and dynamically sized byte arrays. Fixed size keywords range from bytes1 to bytes32, whereas dynamically sized keywords include bytes and strings. bytes are used for raw byte data and string is used for strings encoded in UTF-8. As these arrays are returned by the value, calling them will incur gas cost. length is a member of array value types and returns the length of the byte array. An example of a static (fixed size) array is as follows: bytes32[10] bankAccounts; An example of a dynamically sized array is as follows: bytes32[] trades; Get length of trades: trades.length; Literals These are used to represent a fixed value. Integer literals Integer literals are a sequence of decimal numbers in the range of 0-9. An example is shown as follows: uint8 x = 2; String literals String literals specify a set of characters written with double or single quotes. An example is shown as follows: 'packt' "packt” Hexadecimal literals Hexadecimal literals are prefixed with the keyword hex and specified within double or single quotation marks. An example is shown as follows: (hex'AABBCC'); Enums This allows the creation of user-defined types. An example is shown as follows: enum Order{Filled, Placed, Expired }; Order private ord; ord=Order.Filled; Explicit conversion to and from all integer types is allowed with enums. Function types There are two function types: internal and external functions. Internal functions These can be used only within the context of the current contract. External functions External functions can be called via external function calls. A function in solidity can be marked as a constant. Constant functions cannot change anything in the contract; they only return values when they are invoked and do not cost any gas. This is the practical implementation of the concept of call as discussed in the previous chapter. The syntax to declare a function is shown as follows: function <nameofthefunction> (<parameter types> <name of the variable>) {internal|external} [constant] [payable] [returns (<return types> <name of the variable>)] Reference types As the name suggests, these types are passed by reference and are discussed in the following section. Arrays Arrays represent a contiguous set of elements of the same size and type laid out at a memory location. The concept is the same as any other programming language. Arrays have two members named length and push: uint[] OrderIds; Structs These constructs can be used to group a set of dissimilar data types under a logical group. These can be used to define new types, as shown in the following example: Struct Trade { uint tradeid; uint quantity; uint price; string trader; } Data location Data location specifies where a particular complex data type will be stored. Depending on the default or annotation specified, the location can be storage or memory. This is applicable to arrays and structs and can be specified using the storage or memory keywords. As copying between memory and storage can be quite expensive, specifying a location can be helpful to control the gas expenditure at times. Calldata is another memory location that is used to store function arguments. Parameters of external functions use calldata memory. By default, parameters of functions are stored in memory, whereas all other local variables make use of storage. State variables, on the other hand, are required to use storage. Mappings Mappings are used for a key to value mapping. This is a way to associate a value with a key. All values in this map are already initialized with all zeroes, for example, the following: mapping (address => uint) offers; This example shows that offers is declared as a mapping. Another example makes this clearer: mapping (string => uint) bids; bids["packt"] = 10; This is basically a dictionary or a hash table where string values are mapped to integer values. The mapping named bids has a packt string value mapped to value 10. Global variables Solidity provides a number of global variables that are always available in the global namespace. These variables provide information about blocks and transactions. Additionally, cryptographic functions and address-related variables are available as well. A subset of available functions and variables is shown as follows: keccak256(...) returns (bytes32) This function is used to compute the keccak256 hash of the argument provided to the Function: ecrecover(bytes32 hash, uint8 v, bytes32 r, bytes32 s) returns (address) This function returns the associated address of the public key from the elliptic curve signature: block.number This returns the current block number. Control structures Control structures available in solidity are if - else, do, while, for, break, continue, return. They work in a manner similar to how they work in C-language or JavaScript. Events Events in solidity can be used to log certain events in EVM logs. These are quite useful when external interfaces are required to be notified of any change or event in the contract. These logs are stored on the blockchain in transaction logs. Logs cannot be accessed from the contracts but are used as a mechanism to notify change of state or the occurrence of an event (meeting a condition) in the contract. In a simple example here, the valueEvent event will return true if the x parameter passed to function Matcher is equal to or greater than 10: contract valueChecker { uint8 price=10; event valueEvent(bool returnValue); function Matcher(uint8 x) returns (bool) { if (x>=price) { valueEvent(true); return true; } } } Inheritance Inheritance is supported in solidity. The is keyword is used to derive a contract from another contract. In the following example, valueChecker2 is derived from the valueChecker contract. The derived contract has access to all nonprivate members of the parent contract: contract valueChecker { uint8 price=10; event valueEvent(bool returnValue); function Matcher(uint8 x) returns (bool) { if (x>=price) { valueEvent(true); return true; } } } contract valueChecker2 is valueChecker { function Matcher2() returns (uint) { return price + 10; } } In the preceding example, if uint8 price = 10 is changed to uint8 private price = 10, then it will not be accessible by the valuechecker2 contract. This is because now the member is declared as private, it is not allowed to be accessed by any other contract. Libraries Libraries are deployed only once at a specific address and their code is called via CALLCODE/DELEGATECALL Opcode of the EVM. The key idea behind libraries is code reusability. They are similar to contracts and act as base contracts to the calling contracts. A library can be declared as shown in the following example: library Addition { function Add(uint x,uint y) returns (uint z) { return x + y; } } This library can then be called in the contract, as shown here. First, it needs to be imported and it can be used anywhere in the code. A simple example is shown as follows: Import "Addition.sol" function Addtwovalues() returns(uint) { return Addition.Add(100,100); } There are a few limitations with libraries; for example, they cannot have state variables and cannot inherit or be inherited. Moreover, they cannot receive Ether either; this is in contrast to contracts that can receive Ether. Functions Functions in solidity are modules of code that are associated with a contract. Functions are declared with a name, optional parameters, access modifier, optional constant keyword, and optional return type. This is shown in the following example: function orderMatcher(uint x) private constant returns(bool returnvalue) In the preceding example, function is the keyword used to declare the function. orderMatcher is the function name, uint x is an optional parameter, private is the access modifier/specifier that controls access to the function from external contracts, constant is an optional keyword used to specify that this function does not change anything in the contract but is used only to retrieve values from the contract instead, and returns (bool returnvalue) is the optional return type of the function. How to define a function: The syntax of defining a function is shown as follows: function <name of the function>(<parameters>) <visibility specifier> returns (<return data type> <name of the variable>) { <function body> } Function signature: Functions in solidity are identified by its signature, which is the first four bytes of the keccak-256 hash of its full signature string. This is also visible in browser solidity, as shown in the following screenshot. D99c89cb is the first four bytes of 32 byte keccak-256 hash of the function named Matcher. In this example function, Matcher has the signature hash of d99c89cb. This information is useful in order to build interfaces. Input parameters of a function: Input parameters of a function are declared in the form of <data type> <parameter name>. This example clarifies the concept where uint x and uint y are input parameters of the checkValues function: contract myContract { function checkValues(uint x, uint y) { } } Output parameters of a function: Output parameters of a function are declared in the form of <data type> <parameter name>. This example shows a simple function returning a uint value: contract myContract { Function getValue() returns (uint z) { z=x+y; } } A function can return multiple values. In the preceding example function, getValue only returns one value, but a function can return up to 14 values of different data types. The names of the unused return parameters can be omitted optionally. Internal function calls: Functions within the context of the current contract can be called internally in a direct manner. These calls are made to call the functions that exist within the same contract. These calls result in simple JUMP calls at the EVM byte code level. External function calls: External function calls are made via message calls from a contract to another contract. In this case, all function parameters are copied to the memory. If a call to an internal function is made using the this keyword, it is also considered an external call. The this variable is a pointer that refers to the current contract. It is explicitly convertible to an address and all members for a contract are inherited from the address. Fall back functions: This is an unnamed function in a contract with no arguments and return data. This function executes every time ether is received. It is required to be implemented within a contract if the contract is intended to receive ether; otherwise, an exception will be thrown and ether will be returned. This function also executes if no other function signatures match in the contract. If the contract is expected to receive ether, then the fall back function should be declared with the payable modifier. The payable is required; otherwise, this function will not be able to receive any ether. This function can be called using the address.call() method as, for example, in the following: function () { throw; } In this case, if the fallback function is called according to the conditions described earlier; it will call throw, which will roll back the state to what it was before making the call. It can also be some other construct than throw; for example, it can log an event that can be used as an alert to feed back the outcome of the call to the calling application. Modifier functions: These functions are used to change the behavior of a function and can be called before other functions. Usually, they are used to check some conditions or verification before executing the function. _(underscore) is used in the modifier functions that will be replaced with the actual body of the function when the modifier is called. Basically, it symbolizes the function that needs to be guarded. This concept is similar to guard functions in other languages. Constructor function: This is an optional function that has the same name as the contract and is executed once a contract is created. Constructor functions cannot be called later on by users, and there is only one constructor allowed in a contract. This implies that no overloading functionality is available. Function visibility specifiers (access modifiers): Functions can be defined with four access specifiers as follows: External: These functions are accessible from other contracts and transactions. They cannot be called internally unless the this keyword is used. Public: By default, functions are public. They can be called either internally or using messages. Internal: Internal functions are visible to other derived contracts from the parent contract. Private: Private functions are only visible to the same contract they are declared in. Other important keywords/functions throw: throw is used to stop execution. As a result, all state changes are reverted. In this case, no gas is returned to the transaction originator because all the remaining gas is consumed. Layout of a solidity source code file Version pragma In order to address compatibility issues that may arise from future versions of the solidity compiler version, pragma can be used to specify the version of the compatible compiler as, for example, in the following: pragma solidity ^0.5.0 This will ensure that the source file does not compile with versions smaller than 0.5.0 and versions starting from 0.6.0. Import Import in solidity allows the importing of symbols from the existing solidity files into the current global scope. This is similar to import statements available in JavaScript, as for example, in the following: Import "module-name"; Comments Comments can be added in the solidity source code file in a manner similar to C-language. Multiple line comments are enclosed in /* and */, whereas single line comments start with //. An example solidity program is as follows, showing the use of pragma, import, and comments: To summarize, we went through a brief introduction to the solidity language. Detailed documentation and coding guidelines are available online. If you found this article useful, and would like to learn more about building blockchains, go ahead and grab the book Mastering Blockchain, authored by Imran Bashir.

0
0
16842

article-image-get-ready-for-open-data-science-conference-2019-in-europe-and-california

Sugandha Lahoti

10 Oct 2019

3 min read

Get Ready for Open Data Science Conference 2019 in Europe and California

Sugandha Lahoti

10 Oct 2019

3 min read

Get ready to learn and experience the very latest in data science and AI with expert-led trainings, workshops, and talks at ODSC West 2019 in San Francisco and ODSC Europe 2019 in London. ODSC events are built for the community and feature the most comprehensive breadth and depth of training opportunities available in data science, machine learning, and deep learning. They also provide numerous opportunities to connect, network, and exchange ideas with data science peers and experts from across the country and the world. What to expect at ODSC West 2019 ODSC West 2019 is scheduled to take place in San Francisco, California on Tuesday, Oct 29, 2019, 9:00 AM – Friday, Nov 1, 2019, 6:00 PM PDT. This year, ODSC West will host several networking events, including ODSC Networking Reception, Dinner and Drinks with Data Scientists, Meet the Speakers, Meet the Experts, and Book Signings Hallway Track. Core areas of focus include Open Data Science, Machine Learning & Deep Learning, Research frontiers, Data Science Kick-Start, AI for engineers, Data Visualization, Data Science for Good, and management & DataOps. Here are just a few of the experts who will be presenting at ODSC: Anna Veronika Dorogush, CatBoost Team Lead, Yandex Sarah Aerni, Ph.D., Director of Data Science and Engineering, Salesforce Brianna Schuyler, Ph.D., Data Scientist, Fenix International Katie Bauer, Senior Data Scientist, Reddit, Inc Jennifer Redmon, Chief Data Evangelist, Cisco Systems, Inc Sanjana Ramprasad, Machine Learning Engineer, Mya Systems Cassie Kozyrkov, Ph.D., Chief Decision Scientist, Google Rachel Thomas, Ph.D., Co-Founder, fast.ai Check out the conference’s more industry-leading speakers here. ODSC also conducts the Accelerate AI Business Summit, which brings together leading experts in AI and business to discuss three core topics: AI Innovation, Expertise, and Management. Don’t miss out on the event You can also use code ODSC_PACKT right now to exclusively save 30% before Friday on your ticket to ODSC West 2019. What to expect at ODSC Europe 2019 ODSC Europe 2019 is scheduled to take place in London, the UK on Tuesday, Nov 19, 2019 – Friday, Nov 22, 2019. Europe Talks/Workshops schedule includes Thursday, Nov 21st and Friday, Nov 22nd. It is available to Silver, Gold, Platinum, and Diamond pass holders. Europe Trainings schedule includes Tuesday, November 19th and Wednesday, November 20th. It is available to Training, Gold ( Wed Nov 20th only), Platinum, and Diamond pass holders. Some talks scheduled to take place include ML for Social Good: Success Stories and Challenges, Machine Learning Interpretability Toolkit, Tools for High-Performance Python, The Soul of a New AI, Machine Learning for Continuous Integration, Practical, Rigorous Explainability in AI, and more. ODSC has released a preliminary schedule with information on attending speakers and their training, workshop, and talk topics. The full schedule is going to be available soon. They’ve also recently added several excellent speakers, including Manuela Veloso, Ph.D. | Head of AI Research, JP Morgan Dr. Wojciech Samek | Head of Machine Learning, Fraunhofer Heinrich Hertz Institute Samik Chandanara | Head of Analytics and Data Science, JP Morgan Tom Cronin | Head of Data Science & Data Engineering, Lloyds Banking Group Gideon Mann, Ph.D. | Head of Data Science, Bloomberg, LP There are more chances to learn, connect, and share ideas at this year’s event than ever before. Don’t miss out. Use code ODSC_PACKT right now to save 30% on your ticket to ODSC Europe 2019.

0
0
16796

How-To Tutorials - Data

How to perform predictive forecasting in SAP Analytics Cloud

Customizing Graphics and Creating a Bar Chart and Scatterplot in R

Debugging mechanisms in Oracle SQL Developer 2.1 using Pl/SQL

Language Modeling with Deep Learning

Implementing fault-tolerance in Spark Streaming data processing applications with Apache Kafka

Introduction to Titanic Datasets

Market Basket Analysis

Getting started with Python Web Scraping

Working with pandas DataFrames

Using the Firebase Real-Time Database

Trending Topics

How to integrate Sequence Generator transformation in Informatica PowerCenter 10.x

Facebook fails to block ECJ data security case from proceeding

25 Startups using machine learning differently in 2018: From farming to brewing beer to elder care

Write your first Blockchain: Learning Solidity Programming in 15 minutes

Get Ready for Open Data Science Conference 2019 in Europe and California

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access