How-To Tutorials


19 Jan 2016

14 min read

Building Surveys using Xcode


Sugandha Lahoti

23 Feb 2018

15 min read

Working with pandas DataFrames


Note: This article is an excerpt from the book Python Data Analysis - Second Edition, written by Armando Fandango. From this book, you will learn how to process and manipulate data with Python for complex data analysis and modeling. The code bundle for this article is hosted on GitHub.

The popular open source Python library pandas is named after panel data (an econometric term) and Python data analysis. In this article, we shall learn about basic pandas functionality, data structures, and operations.

The official pandas documentation insists on naming the project pandas in all lowercase letters. The other convention the pandas project insists on is the import pandas as pd import statement. We will follow these conventions in this text.

In this tutorial, we will install and explore pandas. We will also acquaint ourselves with a central pandas data structure: the DataFrame.

Installing and exploring pandas

The minimal set of dependencies for pandas is as follows:

- NumPy: This is the fundamental numerical array package that we installed and covered extensively in the preceding chapters
- python-dateutil: This is a date handling library
- pytz: This handles time zone definitions

This list is the bare minimum; a longer list of optional dependencies can be found at http://pandas.pydata.org/pandas-docs/stable/install.html. We can install pandas via PyPI with pip or easy_install, using a binary installer, with the aid of our operating system package manager, or from the source by checking out the code. The binary installers can be downloaded from http://pandas.pydata.org/getpandas.html.

The command to install pandas with pip is as follows:

    $ pip3 install pandas rpy2

rpy2 is an interface to R and is required because rpy is being deprecated. You may have to prepend the preceding command with sudo if your user account doesn't have sufficient rights.
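After installing, it can be useful to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library (Python 3.8 or later); the helper name installed_version is made up for this illustration and is not part of pandas.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# pandas itself plus the three minimal dependencies named above.
for name in ("pandas", "numpy", "python-dateutil", "pytz"):
    print(name, installed_version(name) or "not installed")
```

Because the lookup reads package metadata rather than importing the modules, it also works when one of the packages is missing or broken.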
The pandas DataFrames

A pandas DataFrame is a labeled two-dimensional data structure and is similar in spirit to a worksheet in Google Sheets or Microsoft Excel, or a relational database table. The columns in a pandas DataFrame can be of different types. A similar concept, by the way, was invented originally in the R programming language. (For more information, refer to http://www.r-tutor.com/r-introduction/data-frame).

A DataFrame can be created in the following ways:

- Using another DataFrame.
- Using a NumPy array or a composite of arrays that has a two-dimensional shape.
- Likewise, we can create a DataFrame out of another pandas data structure called Series. We will learn about Series in the following section.
- A DataFrame can also be produced from a file, such as a CSV file.
- From a dictionary of one-dimensional structures, such as one-dimensional NumPy arrays, lists, dicts, or pandas Series.

As an example, we will use data that can be retrieved from http://www.exploredata.net/Downloads/WHO-Data-Set. The original data file is quite large and has many columns, so we will use an edited file instead, which only contains the first nine columns and is called WHO_first9cols.csv; the file is in the code bundle of this book. These are the first two lines, including the header:

    Country,CountryID,Continent,Adolescent fertility rate (%),Adult literacy rate (%),Gross national income per capita (PPP international $),Net primary school enrolment ratio female (%),Net primary school enrolment ratio male (%),Population (in thousands) total
    Afghanistan,1,1,151,28,,,,26088

In the next steps, we will take a look at pandas DataFrames and their attributes:

1. To kick off, load the data file into a DataFrame and print it on the screen:

    from pandas.io.parsers import read_csv

    df = read_csv("WHO_first9cols.csv")
    print("Dataframe", df)

The printout is a summary of the DataFrame.
It is too long to be displayed entirely, so we will just grab the last few lines:

    199 21732.0
    200 11696.0
    201 13228.0

    [202 rows x 9 columns]

2. The DataFrame has an attribute that holds its shape as a tuple, similar to ndarray. Query the number of rows of a DataFrame as follows:

    print("Shape", df.shape)
    print("Length", len(df))

The values we obtain comply with the printout of the preceding step:

    Shape (202, 9)
    Length 202

3. Check the column header and data types with the other attributes:

    print("Column Headers", df.columns)
    print("Data types", df.dtypes)

We receive the column headers in a special data structure:

    Column Headers Index([u'Country', u'CountryID', u'Continent', u'Adolescent fertility rate (%)', u'Adult literacy rate (%)', u'Gross national income per capita (PPP international $)', u'Net primary school enrolment ratio female (%)', u'Net primary school enrolment ratio male (%)', u'Population (in thousands) total'], dtype='object')

The data types are printed as follows:

    [Output not shown]

4. The pandas DataFrame has an index, which is like the primary key of relational database tables. We can either specify the index or have pandas create it automatically. The index can be accessed with a corresponding property, as follows:

    print("Index", df.index)

An index helps us search for items quickly, just like the index in this book. In our case, the index is a wrapper around an array starting at 0, with an increment of one for each row:

    [Output not shown]

5. Sometimes, we wish to iterate over the underlying data of a DataFrame. Iterating over column values can be inefficient if we utilize the pandas iterators. It's much better to extract the underlying NumPy arrays and work with those. The pandas DataFrame has an attribute that can aid with this as well:

    print("Values", df.values)

Please note that some values are designated nan in the output, for 'not a number'.
These values come from empty fields in the input data file. The preceding code is available in the Python notebook ch-03.ipynb, in the code bundle of this book.

Querying data in pandas

Since a pandas DataFrame is structured in a similar way to a relational database table, we can view operations that read data from a DataFrame as a query. In this example, we will retrieve the annual sunspot data from Quandl. We can either use the Quandl API or download the data manually as a CSV file from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual. If you want to install the API, you can do so by downloading installers from https://pypi.python.org/pypi/Quandl or by running the following command:

    $ pip3 install Quandl

Using the API is free, but is limited to 50 API calls per day. If you require more API calls, you will have to request an authentication key. The code in this tutorial does not use a key. It should be simple to change the code to either use a key or read a downloaded CSV file. If you have difficulties, search through the Python docs at https://docs.python.org/2/.

Without further preamble, let's take a look at how to query data in a pandas DataFrame:

1. As a first step, we have to download the data. After importing the Quandl API, get the data as follows:

    import quandl

    # Data from http://www.quandl.com/SIDC/SUNSPOTS_A-Sunspot-Numbers-Annual
    # PyPi url https://pypi.python.org/pypi/Quandl
    sunspots = quandl.get("SIDC/SUNSPOTS_A")

2. The head() and tail() methods have a purpose similar to that of the Unix commands with the same name.
Select the first n and last n records of a DataFrame, where n is an integer parameter:

    print("Head 2", sunspots.head(2))
    print("Tail 2", sunspots.tail(2))

This gives us the first two and last two rows of the sunspot data (for the sake of brevity we have not shown all the columns here; your output will have all the columns from the dataset):

    Head 2
                Number
    Year
    1700-12-31       5
    1701-12-31      11

    [2 rows x 1 columns]

    Tail 2
                Number
    Year
    2012-12-31    57.7
    2013-12-31    64.9

    [2 rows x 1 columns]

Please note that we only have one column holding the number of sunspots per year. The dates are a part of the DataFrame index.

3. The following is the query for the last value using the last date:

    last_date = sunspots.index[-1]
    print("Last value", sunspots.loc[last_date])

You can check the following output with the result from the previous step:

    Last value Number    64.9
    Name: 2013-12-31 00:00:00, dtype: float64

4. Query the date with date strings in the YYYYMMDD format as follows:

    print("Values slice by date:\n", sunspots["20020101": "20131231"])

This gives the records from 2002 through to 2013:

    Values slice by date
                Number
    Year
    2002-12-31   104.0
    [TRUNCATED]
    2013-12-31    64.9

    [12 rows x 1 columns]

5. A list of indices can be used to query as well:

    print("Slice from a list of indices:\n", sunspots.iloc[[2, 4, -4, -2]])

The preceding code selects the following rows:

    Slice from a list of indices
                Number
    Year
    1702-12-31    16.0
    1704-12-31    36.0
    2010-12-31    16.0
    2012-12-31    57.7

    [4 rows x 1 columns]

6. To select scalar values, we have two options. The second option given here should be faster. Two integers are required, the first for the row and the second for the column:

    print("Scalar with Iloc:", sunspots.iloc[0, 0])
    print("Scalar with iat", sunspots.iat[1, 0])

This gives us the first and second values of the dataset as scalars:

    Scalar with Iloc 5.0
    Scalar with iat 11.0

7. Querying with Booleans works much like the WHERE clause of SQL. The following code queries for values larger than the arithmetic mean.
Note that there is a difference between when we perform the query on the whole DataFrame and when we perform it on a single column:

    print("Boolean selection", sunspots[sunspots > sunspots.mean()])
    print("Boolean selection with column label:\n", sunspots[sunspots['Number of Observations'] > sunspots['Number of Observations'].mean()])

The notable difference is that the first query yields all the rows, with the value NaN in the rows that do not satisfy the condition. The second query returns only the rows where the value is larger than the mean:

    Boolean selection
                Number
    Year
    1700-12-31     NaN
    [TRUNCATED]
    1759-12-31    54.0
    ...

    [314 rows x 1 columns]

    Boolean selection with column label
                Number
    Year
    1705-12-31    58.0
    [TRUNCATED]
    1870-12-31   139.1
    ...

    [127 rows x 1 columns]

The preceding example code is in the ch_03.ipynb file of this book's code bundle.

Data aggregation with pandas DataFrames

Data aggregation is a term used in the field of relational databases. In a database query, we can group data by the value in a column or columns. We can then perform various operations on each of these groups. The pandas DataFrame has similar capabilities. We will generate data held in a Python dict and then use this data to create a pandas DataFrame. We will then practice the pandas aggregation features:

1. Seed the NumPy random generator to make sure that the generated data will not differ between repeated program runs.
The data will have four columns:

- Weather (a string)
- Food (also a string)
- Price (a random float)
- Number (a random integer between one and nine)

The use case is that we have the results of some sort of consumer-purchase research, combined with weather and market pricing, where we calculate the average of prices and keep track of the sample size and parameters:

    import pandas as pd
    from numpy.random import seed
    from numpy.random import rand
    from numpy.random import randint
    import numpy as np

    seed(42)

    # randint excludes the upper bound, so 10 yields integers from 1 to 9
    df = pd.DataFrame({'Weather': ['cold', 'hot', 'cold', 'hot', 'cold', 'hot', 'cold'],
                       'Food': ['soup', 'soup', 'icecream', 'chocolate', 'icecream', 'icecream', 'soup'],
                       'Price': 10 * rand(7),
                       'Number': randint(1, 10, size=7)})
    print(df)

You should get an output similar to the following:

    [Output not shown]

Please note that, in the pandas version used here, the column labels come from the lexically ordered keys of the Python dict. Lexical or lexicographical order is based on the alphabetic order of characters in a string. Recent pandas releases running on Python 3.7 or later keep the insertion order of the dict instead.

2. Group the data by the Weather column and then iterate through the groups as follows:

    weather_group = df.groupby('Weather')

    i = 0
    for name, group in weather_group:
        i = i + 1
        print("Group", i, name)
        print(group)

We have two types of weather, hot and cold, so we get two groups:

    [Output not shown]

3. The weather_group variable is a special pandas object that we get as a result of the groupby() method. This object has aggregation methods, which are demonstrated as follows:

    print("Weather group first\n", weather_group.first())
    print("Weather group last\n", weather_group.last())
    print("Weather group mean\n", weather_group.mean(numeric_only=True))

The preceding code snippet prints the first row, last row, and mean of each group. The numeric_only=True argument restricts the mean to the numeric columns; from pandas 2.0 onwards, omitting it raises an error here because the Food column holds strings.

    [Output not shown]

4. Just as in a database query, we are allowed to group on multiple columns. The groups attribute will then tell us the groups that are formed, as well as the rows in each group:

    wf_group = df.groupby(['Weather', 'Food'])
    print("WF Groups", wf_group.groups)

For each possible combination of weather and food values, a new group is created.
The membership of each row is indicated by its index value, as follows:

    WF Groups {('hot', 'chocolate'): [3], ('cold', 'icecream'): [2, 4], ('hot', 'icecream'): [5], ('hot', 'soup'): [1], ('cold', 'soup'): [0, 6]}

5. Apply a list of NumPy functions on groups with the agg() method:

    print("WF Aggregated\n", wf_group.agg([np.mean, np.median]))

Obviously, we could apply even more functions, but it would look messier than the following output:

    [Output not shown]

Concatenating and appending DataFrames

The pandas DataFrame allows operations that are similar to the inner and outer joins of database tables. We can append and concatenate rows as well. To practice appending and concatenating of rows, we will reuse the DataFrame from the previous section. Let's select the first three rows:

    print("df :3\n", df[:3])

Check that these are indeed the first three rows:

    df :3
           Food  Number     Price Weather
    0      soup       8  3.745401    cold
    1      soup       5  9.507143     hot
    2  icecream       4  7.319939    cold

The concat() function concatenates DataFrames. For example, we can concatenate a DataFrame that consists of three rows to the rest of the rows, in order to recreate the original DataFrame:

    print("Concat Back together\n", pd.concat([df[:3], df[3:]]))

The concatenation output appears as follows:

    Concat Back together
            Food  Number     Price Weather
    0       soup       8  3.745401    cold
    1       soup       5  9.507143     hot
    2   icecream       4  7.319939    cold
    3  chocolate       8  5.986585     hot
    4   icecream       8  1.560186    cold
    5   icecream       3  1.559945     hot
    6       soup       6  0.580836    cold

    [7 rows x 4 columns]

To append rows, use the append() method. Note that DataFrame.append() was removed in pandas 2.0; on newer versions, pd.concat([df[:3], df[5:]]) gives the same result:

    print("Appending rows\n", df[:3].append(df[5:]))

The result is a DataFrame with the first three rows of the original DataFrame and the last two rows appended to it:

    Appending rows
           Food  Number     Price Weather
    0      soup       8  3.745401    cold
    1      soup       5  9.507143     hot
    2  icecream       4  7.319939    cold
    5  icecream       3  1.559945     hot
    6      soup       6  0.580836    cold

    [5 rows x 4 columns]

Joining DataFrames

To demonstrate joining, we will use two CSV files: dest.csv and tips.csv.
The use case behind it is that we are running a taxi company. Every time a passenger is dropped off at his or her destination, we add a row to the dest.csv file with the employee number of the driver and the destination:

    EmpNr,Dest
    5,The Hague
    3,Amsterdam
    9,Rotterdam

Sometimes drivers get a tip, so we want that registered in the tips.csv file (if this doesn't seem realistic, please feel free to come up with your own story):

    EmpNr,Amount
    5,10
    9,5
    7,2.5

Database-like joins in pandas can be done with either the merge() function or the join() DataFrame method. The join() method joins onto indices by default, which might not be what you want.

In SQL, a relational database query language, we have the inner join, left outer join, right outer join, and full outer join. An inner join selects rows from two tables if and only if the values match for the columns specified in the join condition. Outer joins do not require a match, and can potentially return more rows. More information on joins can be found at http://en.wikipedia.org/wiki/Join_%28SQL%29.
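The join examples assume that the two files have already been loaded into DataFrames. The following is a minimal sketch that builds them from inline copies of the CSV text shown above, so that no files on disk are needed; the names dests and tips are an assumption chosen to match the join calls, and the set arithmetic is only an illustration of which employee numbers an inner and a full outer join can return.

```python
import io

import pandas as pd

# Inline copies of dest.csv and tips.csv as listed in the text.
DEST_CSV = "EmpNr,Dest\n5,The Hague\n3,Amsterdam\n9,Rotterdam\n"
TIPS_CSV = "EmpNr,Amount\n5,10\n9,5\n7,2.5\n"

dests = pd.read_csv(io.StringIO(DEST_CSV))
tips = pd.read_csv(io.StringIO(TIPS_CSV))

# Keys present in both tables are what an inner join keeps;
# keys present in either table are what a full outer join keeps.
dest_keys = set(dests["EmpNr"].tolist())
tip_keys = set(tips["EmpNr"].tolist())
in_both = dest_keys & tip_keys
in_either = dest_keys | tip_keys

print("Inner join keys:", sorted(in_both))
print("Outer join keys:", sorted(in_either))
```

Employee 3 never received a tip and employee 7 has no recorded destination, which is why these two only show up when the join does not require a match.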
All these join types are supported by pandas, but we will only take a look at inner joins and full outer joins (with dest.csv and tips.csv loaded into the DataFrames dests and tips):

1. A join on the employee number with the merge() function is performed as follows:

    print("Merge() on key\n", pd.merge(dests, tips, on='EmpNr'))

This gives an inner join as the outcome:

    Merge() on key
       EmpNr       Dest  Amount
    0      5  The Hague      10
    1      9  Rotterdam       5

    [2 rows x 3 columns]

2. Joining with the join() method requires providing suffixes for the left and right operands:

    print("Dests join() tips\n", dests.join(tips, lsuffix='Dest', rsuffix='Tips'))

This method call joins on index values, so the result is different from an SQL inner join:

    Dests join() tips
       EmpNrDest       Dest  EmpNrTips  Amount
    0          5  The Hague          5    10.0
    1          3  Amsterdam          9     5.0
    2          9  Rotterdam          7     2.5

    [3 rows x 4 columns]

3. An even more explicit way to execute an inner join with merge() is as follows:

    print("Inner join with merge()\n", pd.merge(dests, tips, how='inner'))

The output is as follows:

    Inner join with merge()
       EmpNr       Dest  Amount
    0      5  The Hague      10
    1      9  Rotterdam       5

    [2 rows x 3 columns]

4. To make this a full outer join requires only a small change:

    print("Outer join\n", pd.merge(dests, tips, how='outer'))

The outer join adds rows with NaN values:

    Outer join
       EmpNr       Dest  Amount
    0      5  The Hague    10.0
    1      3  Amsterdam     NaN
    2      9  Rotterdam     5.0
    3      7        NaN     2.5

    [4 rows x 3 columns]

In a relational database query, these values would have been set to NULL. The demo code is in the ch-03.ipynb file of this book's code bundle.

We learnt how to perform various data manipulation techniques such as aggregating, concatenating, appending, cleaning, and handling missing values, with pandas. If you found this post useful, check out the book Python Data Analysis - Second Edition to learn advanced topics such as signal processing, textual data analysis, machine learning, and more.
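The querying section relies on data downloaded from Quandl. The difference between the two Boolean queries described there can also be reproduced offline; the following is a minimal sketch on a tiny hand-made frame. The six values are illustrative only and are not the real sunspot series.

```python
import pandas as pd

# Illustrative values only; not the actual Quandl sunspot data.
sunspots = pd.DataFrame(
    {"Number": [5.0, 11.0, 16.0, 23.0, 36.0, 58.0]},
    index=pd.to_datetime([f"{year}-12-31" for year in range(1700, 1706)]),
)
sunspots.index.name = "Year"

# Query on the whole DataFrame: every row is kept, and cells that
# fail the condition are replaced by NaN.
masked = sunspots[sunspots > sunspots.mean()]

# Query on a single column: only the rows that satisfy the
# condition are returned.
filtered = sunspots[sunspots["Number"] > sunspots["Number"].mean()]

print(masked)
print(filtered)
```

The first form is handy when the shape of the result must match the input; the second behaves like a SQL WHERE clause and is usually what a filter is meant to do.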


Janu Verma

17 Nov 2016

8 min read

The R Statistical Package Interfacing with Python


One of my coding hobbies is to explore different Python packages and libraries. In this post, I'll talk about the package rpy2, which is used to call R inside Python. Being an avid user of R and a huge supporter of R graphical packages, I had always desired to call R inside my Python code to be able to produce beautiful visualizations. The R framework offers machinery for a variety of statistical and data mining tasks. Let's review the basics of R before we delve into R-Python interfacing.

R is a statistical language which is free, is open source, and has comprehensive support for various statistical, data mining, and visualization tasks. Quick-R describes it as: "R is an elegant and comprehensive statistical and graphical programming language." R is one of the fastest growing languages, mainly due to the surge in interest in statistical learning and data science. The Data Science Specialization on Coursera has all courses taught in R. There are R packages for machine learning, graphics, text mining, bioinformatics, topic modeling, interactive visualizations, markdown, and many others. In this post, I'll give a quick introduction to R. The motivation is to acquire some knowledge of R to be able to follow the discussion on R-Python interfacing.

Installing R

R can be downloaded from one of the Comprehensive R Archive Network (CRAN) mirror sites.

Running R

To run R interactively on the command line, type R. Alternatively, launch the standard GUI (which should have been included in the download) and type R code in it. RStudio is the most popular IDE for R. It is recommended, though not required, to install RStudio and run R on it. To write a file with R code, create a file with the .r extension (for example, myFirstCode.r), and run the code by typing the following on the terminal:

    Rscript myFirstCode.r

Basics of R

The most fundamental data structure in R is a vector; actually everything in R is a vector (even numbers are 1-dimensional vectors). This is one of the strangest things about R.
Vectors contain elements of the same type. A vector is created by using the c() function.

    a = c(1,2,5,9,11)
    a
    [1] 1 2 5 9 11

    strings = c("aa", "apple", "beta", "down")
    strings
    [1] "aa" "apple" "beta" "down"

The elements in a vector are indexed, but the indexing starts at 1 instead of 0, as in most major languages (for example, Python).

    strings[1]
    [1] "aa"

The fact that everything in R is a vector and that the indexing starts at 1 are the main reasons for people's initial frustration with R (I forget this all the time).

Data Frames

A lot of R packages expect data as a data frame, which is essentially a matrix whose columns can be accessed by name. The columns can be of different types. Data frames are useful outside of R also. The Python package pandas was written primarily to implement data frames and to do analysis on them. In R, data frames are created (from vectors) as follows:

    students = c("Anne", "Bret", "Carl", "Daron", "Emily")
    scores = c(7,3,4,9,8)
    grades = c('B', 'D', 'C', 'A', 'A')
    results = data.frame(students, scores, grades)
    results
      students scores grades
    1     Anne      7      B
    2     Bret      3      D
    3     Carl      4      C
    4    Daron      9      A
    5    Emily      8      A

The elements of a data frame can be accessed as:

    results$students
    [1] Anne Bret Carl Daron Emily
    Levels: Anne Bret Carl Daron Emily

This gives a vector, the elements of which can be called by indexing.

    results$students[1]
    [1] Anne
    Levels: Anne Bret Carl Daron Emily

Reading Files

Most of the time, the data is given as a comma-separated values (csv) file or a tab-separated values (tsv) file. We will see how to read a csv/tsv file in R and create a data frame from it. (Aside: The datasets in most Kaggle competitions are given as csv files and we are required to do machine learning on them. In Python, one creates a pandas data frame or a numpy array from this csv file.)
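The aside above can be sketched in Python as follows. This is a minimal illustration, not code from any particular competition: the CSV text and its column names are a hypothetical stand-in for a file such as train.csv, read from a string so that the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a competition csv file.
CSV_TEXT = "Survived,Age,Fare\n0,22,7.25\n1,38,71.28\n1,26,7.92\n"

frame = pd.read_csv(io.StringIO(CSV_TEXT))  # pandas data frame
array = frame.to_numpy()                    # NumPy array of the same values

# Python indexing starts at 0, so the first row is position 0,
# whereas R would address it as row 1.
print(frame.iloc[0])
print(array.shape)
```

For a real file, the io.StringIO wrapper is simply replaced by the path to the file.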
In R, we use the read.csv or read.table command to load a csv file into memory, for example, for the Titanic competition on Kaggle:

    train_all <- read.csv("train.csv", header=TRUE)
    train <- data.frame(survived=train_all$Survived, age=train_all$Age, fare=train_all$Fare, pclass=train_all$Pclass)

Similarly, a tsv file can be loaded as:

    data <- read.csv("file.tsv", header=TRUE, sep="\t")

Thus given a csv/tsv file with or without headers, we can read it using the read.csv function and create a data frame using data.frame(vector_1, vector_2, ... vector_n). This should be enough to start exploring R packages. Another command that is very useful in R is head(), which shows the first few rows of a data frame, much like the head command on Unix.

rpy2

First things first, we need to have both Python and R installed. Then install rpy2 from the Python Package Index (PyPI). To do this, simply type the following on the command line:

    pip install rpy2

We will use the high-level interface to R, the robjects subpackage of rpy2.

    import rpy2.robjects as ro

We can pass commands to the R session by putting the R commands in the ro.r() method as strings. Recall that everything in R is a vector. Let's create a vector using robjects:

    ro.r('x=c(2,4,6,8)')
    print(ro.r('x'))
    [1] 2 4 6 8

Keep in mind that though x is an R object (vector), ro.r('x') is a Python object (rpy2 object). This can be checked as follows:

    type(ro.r('x'))
    <class 'rpy2.robjects.vectors.FloatVector'>

The most important data types in R are data frames, which are essentially matrices. We can create a data frame using rpy2:

    ro.r('x=c(2,4,6,8)')
    ro.r('y=c(4,8,12,16)')
    ro.r('rdf=data.frame(x,y)')

This created an R data frame, rdf. If we want to manipulate this data frame using Python, we need to convert it to a Python object. We will convert the R data frame to a pandas data frame. The Python package pandas contains efficient implementations of data frame objects in Python.
Note that the pandas.rpy module used below was deprecated and then removed in pandas 0.20, so this conversion code only runs on older pandas releases. On current installations, the conversion between R and pandas data frames is provided by rpy2 itself, in the rpy2.robjects.pandas2ri module.

    import pandas.rpy.common as com

    df = com.load_data('rdf')
    print(type(df))
    <class 'pandas.core.frame.DataFrame'>

    df.x = 2*df.x

Here we have doubled each of the elements of the x vector in the data frame df. But df is a Python object, which we can convert back to an R data frame using pandas as:

    rdf = com.convert_to_r_dataframe(df)
    print(type(rdf))
    <class 'rpy2.robjects.vectors.DataFrame'>

Let's use the plotting machinery of R, which is the main purpose of studying rpy2:

    ro.r('plot(x,y)')

rpy2 lets us import not only R data types but also R packages (given that these packages are installed in R) and use them for analysis. Here we will build a linear model on x and y using the R package stats:

    from rpy2.robjects.packages import importr

    stats = importr('stats')
    base = importr('base')
    fit = stats.lm('y ~ x', data=rdf)
    print(base.summary(fit))

We get the following results:

    Residuals:
    1 2 3 4
    0 0 0 0

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)        0          0      NA       NA
    x                  2          0     Inf   <2e-16 ***
    ---
    Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 0 on 2 degrees of freedom
    Multiple R-squared: 1, Adjusted R-squared: 1
    F-statistic: Inf on 1 and 2 DF, p-value: < 2.2e-16

R programmers will immediately recognize the output as coming from applying the linear model function lm() to the data.

I'll end this discussion with an example using my favorite R package, ggplot2. I have written a lot of posts on data visualization using ggplot2. The following example is borrowed from the official documentation of rpy2.
    import math, datetime
    import rpy2.robjects.lib.ggplot2 as ggplot2
    import rpy2.robjects as ro
    from rpy2.robjects.packages import importr

    base = importr('base')
    datasets = importr('datasets')
    mtcars = datasets.data.fetch('mtcars')['mtcars']

    pp = (ggplot2.ggplot(mtcars) +
          ggplot2.aes_string(x='wt', y='mpg', col='factor(cyl)') +
          ggplot2.geom_point() +
          ggplot2.geom_smooth(ggplot2.aes_string(group = 'cyl'), method = 'lm'))
    pp.plot()

Author: Janu Verma is a researcher in the IBM T.J. Watson Research Center, New York. His research interests are in mathematics, machine learning, information visualization, computational biology, and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science, and the Indian Statistical Institute. He has written papers for IEEE Vis, KDD, International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals and so on. His current focus is on the development of visual analytics systems for prediction and understanding. He advises start-ups and other companies on data science and machine learning in the Delhi-NCR area.


Packt

22 Jul 2014

6 min read

Tuning Solr JVM and Container


Some of these JVMs are commercially optimized for production usage; you may find comparison studies at http://dior.ics.muni.cz/~makub/java/speed.html. Some of the JVM implementations provide server versions, which would be more appropriate than normal ones.

Since Solr runs in a JVM, all the standard optimizations for Java applications are applicable to it. It starts with choosing the right heap size for your JVM. The heap size depends upon the following aspects:

- Use of facets and sorting options
- Size of the Solr index
- Update frequencies on Solr
- Solr cache

Heap size for the JVM can be controlled by the following parameters:

    Parameter   Description
    -Xms        The minimum heap size, allocated when the JVM (that is, the container) is initialized
    -Xmx        The maximum heap size that the JVM or J2EE container can consume

Deciding heap size

The JVM heap is a major factor when optimizing the performance of any system. The JVM uses the heap to store its objects, as well as its own content. Poor allocation of the JVM heap results in a Java heap space OutOfMemoryError being thrown at runtime, crashing the application. When the heap is allocated too little memory, the application takes longer to initialize, and the execution speed of the Java process slows down at runtime. Similarly, a larger heap may underutilize expensive memory, which otherwise could have been used by other applications. The JVM starts with the initial heap size, and as the demand grows, it tries to resize the heap to accommodate new space requirements. If the demand for memory crosses the maximum limit, the JVM throws an OutOfMemoryError. Objects that have expired or are unused consume memory in the JVM unnecessarily. This memory can be reclaimed by releasing these objects through a process called garbage collection. Although it's tricky to find out whether you should increase or reduce the heap size, there are simple ways that can help you out.
In a memory graph, typically, when you start the Solr server and run your first query, the memory usage increases, and based on subsequent queries and memory size, the memory graph may increase or remain constant. When garbage collection is run automatically by the JVM container, it sharply brings down the usage. If it's difficult to trace GC execution from the memory graph, you can run Solr with the following additional parameters:

    -Xloggc:<some file> -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails

If you are monitoring the heap usage continuously, you will find a graph that increases and decreases (sawtooth); the increase is due to ongoing queries that consistently demand more memory for your Solr cache, and the decrease is due to GC execution. In a running environment, the average heap size should not grow over time, and the number of GC runs should be less than the number of queries executed on Solr. If that's not the case, you will need more memory.

Features such as Solr faceting and sorting require more memory on top of traditional search. If memory is unavailable, the operating system needs to swap to the storage media, thereby increasing the response time; thus, users experience high latency while searching on large indexes. Many operating systems allow users to control the swapping of programs.

How can we optimize JVM?

Whenever a facet query is run in Solr, memory is used to store each unique element in the index for each field. So, for example, a search over a small set of facet values (a year from 1980 to 2014) will consume less memory than a search over a larger set of facet values, such as people's names (which can vary from person to person).
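The rule of thumb given earlier, that the heap level in the sawtooth graph should not drift upward over time, can be checked on a series of heap readings. The following is a minimal sketch under stated assumptions: the readings are made-up numbers in megabytes, any drop between two consecutive samples is treated as a GC run, and the 10 percent tolerance is an arbitrary illustrative threshold rather than a Solr or JVM recommendation.

```python
def post_gc_floors(samples):
    """Return the heap readings taken right after each drop (assumed GC run)."""
    return [curr for prev, curr in zip(samples, samples[1:]) if curr < prev]


def floor_is_growing(samples, tolerance=0.10):
    """True if the last post-GC floor exceeds the first by more than tolerance."""
    floors = post_gc_floors(samples)
    if len(floors) < 2:
        return False  # not enough GC runs observed to judge a trend
    return floors[-1] > floors[0] * (1 + tolerance)


# Hypothetical heap usage samples in MB.
healthy = [200, 400, 600, 210, 420, 610, 205, 415, 600, 208]
leaking = [200, 400, 600, 300, 500, 700, 400, 600, 800, 500]

print("healthy heap growing:", floor_is_growing(healthy))
print("leaking heap growing:", floor_is_growing(leaking))
```

In practice the samples would come from the GC log produced by the parameters shown above or from a monitoring tool; parsing those formats is left out here because it depends on the JVM version.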
To reduce the memory usage, you may set the term index divisor to 2 (default is 4) by setting the following in solrconfig.xml:

    <indexReaderFactory name="IndexReaderFactory" class="solr.StandardIndexReaderFactory">
      <int name="setTermIndexDivisor">2</int>
    </indexReaderFactory>

This will reduce the memory usage for storing all the terms to half; however, it will double the seek time for terms and will have a small impact on your search runtime. From Solr 4.x onwards, the ability to set the min and max block size (term index divisor) is not available.

One of the causes of a large heap is the size of the index, so one solution is to introduce SolrCloud and distribute the large index across multiple shards. This will not reduce your memory requirement, but will spread it across the cluster.

You can look at some of the optimized GC parameters described on the http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning page. Similarly, Oracle provides a GC tuning guide for advanced development stages, and it can be seen at http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html. Additionally, you can look at the Solr performance problems at http://wiki.apache.org/solr/SolrPerformanceProblems.

Optimizing JVM container

JVM containers allow users to have their requests served in threads. This in turn enables the JVM to support concurrent sessions created for different users connecting at the same time. The concurrency can, however, be controlled to reduce the load on the search server.

If you are using Apache Tomcat, you can modify the following entries in server.xml for changing the number of concurrent connections:

    [Configuration snippet not shown]

Similarly, in Jetty, you can control the number of connections held by modifying jetty.xml:

    [Configuration snippet not shown]

Similarly, for other containers, these files can change appropriately. Many containers provide a cache on top of the application to avoid server hits. This cache can be utilized for static pages such as the search page.
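For the Tomcat case mentioned above, the concurrency limits live on the Connector element of server.xml. The following is a minimal sketch that reads them back with the standard library, which can help when auditing a server. It rests on assumptions: maxThreads and acceptCount are attributes of Tomcat's HTTP connector, but the text does not say which entries the author had in mind, and the XML fragment and its values are hypothetical examples rather than recommended settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical server.xml fragment; values are examples, not recommendations.
SERVER_XML = """
<Server>
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1"
               maxThreads="150" acceptCount="100" connectionTimeout="20000"/>
  </Service>
</Server>
"""


def connector_limits(xml_text):
    """Map each Connector port to its maxThreads and acceptCount settings."""
    root = ET.fromstring(xml_text)
    return {
        conn.get("port"): {
            "maxThreads": conn.get("maxThreads"),
            "acceptCount": conn.get("acceptCount"),
        }
        for conn in root.iter("Connector")
    }


limits = connector_limits(SERVER_XML)
print(limits)
```

An attribute that is absent from the file comes back as None, which signals that the container's built-in default applies.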
Containers such as WebLogic provide a development versus production mode. Typically, a development mode runs with 15 threads and a limited JDBC pool size by default, whereas, for a production mode, this can be increased. For tuning containers, besides standard optimization, specific performance-tuning guidelines should be followed, as shown in the following table:

Container: Performance tuning guide
Jetty: http://wiki.eclipse.org/Jetty/Howto/High_Load
Tomcat: http://www.mulesoft.com/tcat/tomcat-performance and http://javamaster.wordpress.com/2013/03/13/apache-tomcat-tuning-guide/
JBoss: https://access.redhat.com/site/documentation/en-US/JBoss_Enterprise_Application_Platform/5/pdf/Performance_Tuning_Guide/JBoss_Enterprise_Application_Platform-5-Performance_Tuning_Guide-en-US.pdf
WebLogic: http://docs.oracle.com/cd/E13222_01/wls/docs92/perform/WLSTuning.html
WebSphere: http://www.ibm.com/developerworks/websphere/techjournal/0909_blythe/0909_blythe.html

Apache Solr works better with the default container it ships with, Jetty, since it offers a small footprint compared to other containers such as JBoss and Tomcat, for which the memory required is a little higher.

Summary

In this article, we have learned how Apache Solr runs on the underlying JVM in a J2EE container, and how to tune the JVM and the container.

Resources for Article: Further resources on this subject: Apache Solr: Spellchecker, Statistics, and Grouping Mechanism [Article] Getting Started with Apache Solr [Article] Apache Solr PHP Integration [Article]


How-To Tutorials


Savia Lobo

17 Sep 2018

15 min read

Bias-Variance tradeoff: How to choose between bias and variance for your machine learning model [Tutorial]


How-To Tutorials


Oliver Blumanski

18 Jan 2017

5 min read

Using the Firebase Real-Time Database


In this post, we are going to look at how to use the Firebase real-time database, along with an example. We will write and read data from the database using multiple platforms. To do this, we first need a server script that adds data, and secondly a component that pulls the data from the Firebase database.

Step 1 - Server script to collect data
Digest an XML feed and transfer the data into the Firebase real-time database. The script runs frequently as a cron job to refresh the data.

Step 2 - App component
Subscribe to the data from a JavaScript component, in this case, React-Native.

About Firebase
Now that those two steps are outlined, let's take a step back and talk about Google Firebase. Firebase offers a range of services such as a real-time database, authentication, cloud notifications, storage, and much more. You can find the full feature list here. Firebase covers three platforms: iOS, Android, and Web. The server script uses the Firebase JavaScript Web API. Having data in this real-time database allows us to query the data from all three platforms (iOS, Android, Web). In addition, the real-time database allows us to subscribe (listen) to a database path (query), or to query a path once.

Step 1 - Digest XML feed and transfer into Firebase

Firebase setup
The first thing you need to do is to set up a Google Firebase project here. In the app, click on "Add another App" and choose Web; a pop-up will show you the configuration. You can copy and paste your config into the example script. Now you need to set the rules for your Firebase database. You should make yourself familiar with the database access rules. In my example, the path latestMarkets/ is open for write and read. In a real-world production app, you would have to secure this by requiring authentication for the write permissions.
Here are the database rules to get started:

    {
      "rules": {
        "users": {
          "$uid": {
            ".read": "$uid === auth.uid",
            ".write": "$uid === auth.uid"
          }
        },
        "latestMarkets": {
          ".read": true,
          ".write": true
        }
      }
    }

The Server Script Code
The XML feed contains stock market data and changes frequently, except on the weekend. To build the server script, some NPM packages are needed: firebase, request, xml2json, and babel-preset-es2015.

Require the modules and configure the Firebase web API:

    const Firebase = require('firebase');
    const request = require('request');
    const parser = require('xml2json');

    // firebase access config
    const config = {
      apiKey: "apikey",
      authDomain: "authdomain",
      databaseURL: "dburl",
      storageBucket: "optional",
      messagingSenderId: "optional"
    }

    // init firebase
    Firebase.initializeApp(config)

I write JavaScript code in ES6. It is much more fun. It is a simple script, so let's have a look at the code that is relevant to Firebase. The code below inserts or overwrites data in the database. For this script, I am happy to overwrite data:

    Firebase.database().ref('latestMarkets/' + value.Symbol).set({
      Symbol: value.Symbol,
      Bid: value.Bid,
      Ask: value.Ask,
      High: value.High,
      Low: value.Low,
      Direction: value.Direction,
      Last: value.Last
    })
    .then((response) => {
      // report success to the caller
      callback(true)
    })
    .catch((error) => {
      // report the error to the caller
      callback(error)
    })

The Firebase database call first references the path:

    Firebase.database().ref('latestMarkets/' + value.Symbol)

And then names the action you want to perform:

    // insert/overwrite (promise)
    Firebase.database().ref('latestMarkets/' + value.Symbol).set({}).then((result) => {})
    // get data once (promise)
    Firebase.database().ref('latestMarkets/' + value.Symbol).once('value').then((snapshot) => {})
    // listen to db path, get data on change (callback)
    Firebase.database().ref('latestMarkets/' + value.Symbol).on('value', (snapshot) => {})
    // ......
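The write step described above can be sketched as two small pure functions, which makes the record shape easy to check before anything is sent to Firebase. This is only a sketch: marketPath and toMarketRecord are hypothetical helper names that are not part of the original script, and the choices to replace characters that are not allowed in Realtime Database keys and to store missing fields as null are assumptions.

```javascript
// Fields copied from each feed item into the database record,
// matching the object passed to set() in the server script.
const MARKET_FIELDS = ["Symbol", "Bid", "Ask", "High", "Low", "Direction", "Last"];

// Hypothetical helper: build the database path for one symbol.
// Assumption: characters that are not allowed in Realtime Database keys
// (. # $ [ ] /) are replaced with an underscore.
function marketPath(symbol) {
  return "latestMarkets/" + String(symbol).replace(/[.#$\[\]\/]/g, "_");
}

// Hypothetical helper: keep only the known fields of a parsed feed item.
// Assumption: a field missing from the feed is stored as null.
function toMarketRecord(value) {
  const record = {};
  for (const field of MARKET_FIELDS) {
    record[field] = value[field] !== undefined ? value[field] : null;
  }
  return record;
}

// Example with a made-up feed item (not real market data).
const exampleItem = { Symbol: "EUR/USD", Bid: "1.0612", Ask: "1.0615" };
console.log(marketPath(exampleItem.Symbol));
console.log(toMarketRecord(exampleItem));

// In the server script these would feed the existing call, for example:
// Firebase.database().ref(marketPath(value.Symbol)).set(toMarketRecord(value))
```

Keeping the record construction separate from the database call means the feed parsing can be tested without a Firebase project or network access.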
Here is the GitHub repository:

Displaying the data in a React-Native app
The code below listens to a database path; when the data changes, all connected devices synchronise the data:

    Firebase.database().ref('latestMarkets/').on('value', snapshot => {
      // do something with snapshot.val()
    })

To close the listener, or unsubscribe from the path, one can use "off":

    Firebase.database().ref('latestMarkets/').off()

I've created an example react-native app to display the data: The GitHub repository

Conclusion
In mobile app development, one big question is: "What database and cache solution can I use to provide online and offline capabilities?" One way to look at this question is to imagine you are starting a project from scratch. If so, and you can fit your data into Firebase, then this would be a great solution for you. Additionally, you can use it for both web and mobile apps. The great thing is that you don't need to write a particular API, and you can access data straight from JavaScript. On the other hand, if you have a project that uses MySQL, for example, the Firebase real-time database won't help you much. You would need a remote API to connect to your database in this case. But even if the Firebase database isn't a good fit for your project, there are still other features, such as Firebase Storage or Cloud Messaging, which are very easy to use, and even though they are beyond the scope of this post, they are worth checking out.

About the author
Oliver Blumanski is a developer based out of Townsville, Australia. He has been a software developer since 2000, and can be found on GitHub at @blumanski.



Savia Lobo

12 Dec 2017

7 min read

How to integrate Sequence Generator transformation in Informatica PowerCenter 10.x



Savia Lobo

08 Sep 2018

12 min read

Building Recommendation System with Scala and Apache Spark [Tutorial]


Recommendation systems can be defined as software applications that draw out and learn from data such as user preferences, actions (clicks, for example), and browsing history, and that generate recommendations: products that the system determines will appeal to the user in the immediate future. In this tutorial, we will learn to build a recommendation system with Scala and Apache Spark. This article is an excerpt taken from Modern Scala Projects written by Ilango Gurusamy.

What does a recommendation system look like

The following diagram is representative of a typical recommendation system:

(Figure: Recommendation system)

The preceding diagram can be thought of as a recommendation ecosystem, with the recommendation system at its heart. This system needs three entities:

Users
Products
Transactions between users and products, where transactions contain feedback from users about products

Implementation and deployment

Implementation is documented in the following subsections. All code is developed in the IntelliJ code editor. The very first step is to create an empty Scala project called Chapter7.

Step 1 – creating the Scala project

Let's create a Scala project called Chapter7 with the following artifacts:

RecommendationSystem.scala
RecommendationWrapper.scala

Let's break down the project's structure:

.idea: Generated IntelliJ configuration files.
project: Contains build.properties and plugins.sbt.
project/assembly.sbt: This file specifies the sbt-assembly plugin needed to build a fat JAR for deployment.
src/main/scala: This is a folder that houses Scala source files in the com.packt.modern.chapter7 package.
target: This is where artifacts of the compile process are stored. The generated assembly JAR file goes here.
build.sbt: This is the main SBT configuration file. Spark and its dependencies are specified here.

At this point, we will start developing code in the IntelliJ code editor.
We will start with the RecWrapper Scala file and end with the deployment of the final application JAR into Spark with spark-submit.

Step 2 – creating the RecWrapper definition

Let's create the trait definition. The trait will hold the SparkSession variable, schema definitions for the datasets, and methods to build a dataframe:

    trait RecWrapper { }

Next, let's create a schema for past weapon sales orders.

Step 3 – creating a weapon sales orders schema

Let's create a schema for the past sales order dataset:

    val salesOrderSchema: StructType = StructType(Array(
      StructField("sCustomerId", IntegerType, false),
      StructField("sCustomerName", StringType, false),
      StructField("sItemId", IntegerType, true),
      StructField("sItemName", StringType, true),
      StructField("sItemUnitPrice", DoubleType, true),
      StructField("sOrderSize", DoubleType, true),
      StructField("sAmountPaid", DoubleType, true)
    ))

Next, let's create a schema for weapon sales leads.

Step 4 – creating a weapon sales leads schema

Here is a schema definition for the weapon sales lead dataset:

    val salesLeadSchema: StructType = StructType(Array(
      StructField("sCustomerId", IntegerType, false),
      StructField("sCustomerName", StringType, false),
      StructField("sItemId", IntegerType, true),
      StructField("sItemName", StringType, true)
    ))

Next, let's build a weapon sales order dataframe.

Step 5 – building a weapon sales order dataframe

Let's invoke the read method on our SparkSession instance and cache the result.
We will call this method later from the RecSystem object: def buildSalesOrders(dataSet: String): DataFrame = { session.read .format("com.databricks.spark.csv") .option("header", true).schema(salesOrderSchema).option("nullValue", "") .option("treatEmptyValuesAsNulls", "true") .load(dataSet).cache() } Next up, let's build a sales leads dataframe: def buildSalesLeads(dataSet: String): DataFrame = { session.read .format("com.databricks.spark.csv") .option("header", true).schema(salesLeadSchema).option("nullValue", "") .option("treatEmptyValuesAsNulls", "true") .load(dataSet).cache() } This completes the trait. Overall, it looks like this: trait RecWrapper { 1) Create a lazy SparkSession instance and call it session. 2) Create a schema for the past sales orders dataset 3) Create a schema for sales lead dataset 4) Write a method to create a dataframe that holds past sales order data. This method takes in sales order dataset and returns a dataframe 5) Write a method to create a dataframe that holds lead sales data } Bring in the following imports: import org.apache.spark.mllib.recommendation.{ALS, Rating} import org.apache.spark.rdd.RDD import org.apache.spark.sql.{DataFrame, Dataset, SparkSession} Create a Scala object called RecSystem: object RecSystem extends App with RecWrapper { } Before going any further, bring in the following imports: import org.apache.spark.rdd.RDD import org.apache.spark.sql.DataFrame Inside this object, start by loading the past sales order data. This will be our training data. Load the sales order dataset, as follows: val salesOrdersDf = buildSalesOrders("sales\\PastWeaponSalesOrders.csv") Verify the schema. 
This is what the schema looks like:

    salesOrdersDf.printSchema()
    root
     |-- sCustomerId: integer (nullable = true)
     |-- sCustomerName: string (nullable = true)
     |-- sItemId: integer (nullable = true)
     |-- sItemName: string (nullable = true)
     |-- sItemUnitPrice: double (nullable = true)
     |-- sOrderSize: double (nullable = true)
     |-- sAmountPaid: double (nullable = true)

Here is a partial view of a dataframe displaying past weapon sales order data:

(Figure: Partial view of dataframe displaying past weapon sales order data)

Now, we have what we need to create a dataframe of ratings:

    val ratingsDf: DataFrame = salesOrdersDf.map( salesOrder =>
      Rating( salesOrder.getInt(0), salesOrder.getInt(2), salesOrder.getDouble(6) )
    ).toDF("user", "item", "rating")

Save all and compile the project at the command line:

    C:\Path\To\Your\Project\Chapter7>sbt compile

You are likely to run into the following error:

    [error] C:\Path\To\Your\Project\Chapter7\src\main\scala\com\packt\modern\chapter7\RecSystem.scala:50:50: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
    [error] val ratingsDf: DataFrame = salesOrdersDf.map( salesOrder =>
    [error] ^
    [error] two errors found
    [error] (compile:compileIncremental) Compilation failed

To fix this, place the following import statement immediately before the declaration of the ratings dataframe. It should look like this:

    import session.implicits._
    val ratingsDf: DataFrame = salesOrdersDf.map( salesOrder =>
      Rating( salesOrder.getInt(0), salesOrder.getInt(2), salesOrder.getDouble(6) )
    ).toDF("user", "item", "rating")

Save and recompile the project. This time, it compiles just fine. Next, import the Rating class from the org.apache.spark.mllib.recommendation package.
This transforms the rating dataframe that we obtained previously to its RDD equivalent: val ratings: RDD[Rating] = ratingsDf.rdd.map( row => Rating( row.getInt(0), row.getInt(1), row.getDouble(2) ) ) println("Ratings RDD is: " + ratings.take(10).mkString(" ") ) The following few lines of code are very important. We will be using the ALS algorithm from Spark MLlib to create and train a MatrixFactorizationModel, which takes an RDD[Rating] object as input. The ALS train method may require a combination of the following training hyperparameters: numBlocks: Preset to -1 in an auto-configuration setting. This parameter is meant to parallelize computation. custRank: The number of features, otherwise known as latent factors. iterations: This parameter represents the number of iterations for ALS to execute. For a reasonable solution to converge on, this algorithm needs roughly 20 iterations or less. regParam: The regularization parameter. implicitPrefs: This hyperparameter is a specifier. It lets us use either of the following: Explicit feedback Implicit feedback alpha: This is a hyperparameter connected to an implicit feedback variant of the ALS algorithm. Its role is to govern the baseline confidence in preference observations. We just explained the role played by each parameter needed by the ALS algorithm's train method. Let's get started by bringing in the following imports: import org.apache.spark.mllib.recommendation.MatrixFactorizationModel Now, let's get down to training the matrix factorization model using the ALS algorithm. Let's train a matrix factorization model given an RDD of ratings by customers (users) for certain items (products). Our train method on the ALS algorithm will take the following four parameters: Ratings. A rank. A number of iterations. 
A Lambda value or regularization parameter:

    val ratingsModel: MatrixFactorizationModel = ALS.train(ratings,
      6,    /* THE RANK */
      10,   /* Number of iterations */
      15.0  /* Lambda, or regularization parameter */
    )

Next, we load the sales lead file and convert it into a tuple format:

    val weaponSalesLeadDf = buildSalesLeads("sales\\ItemSalesLeads.csv")

In the next section, we will display the new weapon sales lead dataframe.

Step 6 – displaying the weapons sales dataframe

First, we must invoke the show method:

    println("Weapons Sales Lead dataframe is: ")
    weaponSalesLeadDf.show

Here is a view of the weapon sales lead dataframe:

(Figure: View of weapon sales lead dataframe)

Next, create a version of the sales lead dataframe structured as (customer, item) tuples:

    val customerWeaponsSystemPairDf: DataFrame = weaponSalesLeadDf.map(salesLead =>
      ( salesLead.getInt(0), salesLead.getInt(2) )
    ).toDF("user", "item")

In the next section, let's display the dataframe that we just created.

Step 7 – displaying the customer-weapons-system dataframe

Let's invoke the show method, as follows:

    println("The Customer-Weapons System dataframe as tuple pairs looks like: ")
    customerWeaponsSystemPairDf.show

Here is a screenshot of the new customer-weapons-system dataframe as tuple pairs:

(Figure: New customer-weapons-system dataframe as tuple pairs)

Next, we will convert the preceding dataframe into an RDD:

    val customerWeaponsSystemPairRDD: RDD[(Int, Int)] =
      customerWeaponsSystemPairDf.rdd.map(row => (row.getInt(0), row.getInt(1)))
    /* Notes: As far as the algorithm is concerned, customer corresponds to "user"
       and "product" or item corresponds to a "weapons system" */

We previously created a MatrixFactorizationModel that we trained with the weapons system sales orders dataset. We are now in a position to predict how each customer country may rate a weapon system in the future. In the next section, we will generate predictions.

Step 8 – generating predictions

Here is how we will generate predictions.
The predict method of our model is designed to do just that. It will generate a predictions RDD that we call weaponRecs. It represents the ratings of weapons systems that were not rated previously by customer nations (listed in the past sales order data):

    val weaponRecs: RDD[Rating] = ratingsModel.predict(customerWeaponsSystemPairRDD).distinct()

Next up, we will display the final predictions.

Step 9 – displaying predictions

Here is how to display the predictions, one per line:

    println("Future ratings are: ")
    // foreach returns Unit, so print each rating inside the loop
    weaponRecs.foreach(rating =>
      println("Customer: " + rating.user + " Product: " + rating.product + " Rating: " + rating.rating)
    )

The following table displays how each nation is expected to rate a certain system in the future, that is, a weapon system that they did not rate earlier:

(Figure: System rating by each nation)

Our recommendation system proved itself capable of generating future predictions. Up until now, we did not say how all of the preceding code is compiled and deployed. We will look at this in the next section.

Compilation and deployment

Compiling the project

Invoke the sbt compile task at the root folder of your Chapter7 project. You should get the following output:

(Figure: Output on compiling the project)

Besides loading build.sbt, the compile task also loads settings from assembly.sbt, which we will create below.

What is an assembly.sbt file?

We have not yet talked about the assembly.sbt file. Our Scala-based Spark application is a Spark job that will be submitted to a (local) Spark cluster as a JAR file. This file, apart from Spark libraries, also needs other dependencies in it for our recommendation system job to complete successfully. The name fat JAR comes from all dependencies being bundled in one JAR. To build such a fat JAR, we need the sbt-assembly plugin. This explains the need for creating a new assembly.sbt and the assembly plugin.
Creating assembly.sbt Create a new assembly.sbt in your IntelliJ project view and save it under your project folder, as follows: Creating assembly.sbt Contents of assembly.sbt Paste the following contents into the newly created assembly.sbt (under the project folder). The output should look like this: Output on placing contents of assembly.sbt The sbt-assembly plugin, version 0.14.7, gives us the ability to run an sbt-assembly task. With that, we are one step closer to building a fat or Uber JAR. This action is documented in the next step. Running the sbt assembly task Issue the sbt assembly command, as follows: Running the sbt assembly command This time, the assembly task loads the assembly-plugin in assembly.sbt. However, further assembly halts because of a common duplicate error. This error arises due to several duplicates, multiple copies of dependency files that need removal before the assembly task can successfully complete. To address this situation, build.sbt needs an upgrade. Upgrading the build.sbt file The following lines of code need to be added in, as follows: Code lines for upgrading the build.sbt file To test the effect of your changes, save this and go to the command line to reissue the sbt assembly task. Rerunning the assembly command Run the assembly task, as follows: Rerunning the assembly task This time, the settings in the assembly.sbt file are loaded. The task completes successfully. To verify, drill down to the target folder. If everything went well, you should see a fat JAR, as follows: Output as a JAR file Our JAR file under the target folder is the recommendation system application's JAR file that needs to be deployed into Spark. This is documented in the next step. Deploying the recommendation application The spark-submit command is how we will deploy the application into Spark. Here are two formats for the spark-submit command. 
The first one is a long one which sets more parameters than the second one:

    spark-submit --class "com.packt.modern.chapter7.RecSystem" --master local[2] --deploy-mode client --driver-memory 16g --num-executors 2 --executor-memory 2g --executor-cores 2 <path-to-jar>

Leaning on the preceding format, let's submit our Spark job, supplying various parameters to it:

(Figure: Parameters for Spark)

The different parameters are explained as follows:

(Figure: Tabular explanation of parameters for Spark Job)

We used Spark's support for recommendations to build a prediction model that generated recommendations, and leveraged Spark's alternating least squares algorithm to implement our collaborative filtering recommendation system. If you've enjoyed reading this post, do check out the book Modern Scala Projects to gain insights into data that will help organizations have a strategic and competitive advantage.

How to Build a music recommendation system with PageRank Algorithm
Recommendation Systems
Building A Recommendation System with Azure


How-To Tutorials


Vijin Boricha

23 Apr 2018

7 min read

Data bindings with Knockout.js


Today, we will learn about three data binding abilities of Knockout.js. Data bindings are attributes added by the framework for the purpose of data access between elements and view scope. While observable arrays are efficient for accessing a list of objects and performing operations on top of the list displayed with the foreach binding, Knockout.js provides three additional data binding abilities:

Control-flow bindings
Appearance bindings
Interactive bindings

Let us review these data bindings in detail in the following sections.

Control-flow bindings

As the name suggests, control-flow bindings help us access the data elements based on a certain condition. The if, ifnot, and with bindings are the control-flow bindings available in Knockout.js. In the following example, we will be using the if and with control-flow bindings. We have added a new attribute to the Employee object called age; we display the age value in green only if it is greater than 20. Similarly, we have added another object, markedEmployee. Using the with control-flow binding, we can limit the scope of access to that specific employee object in the following paragraph.
Add the following code snippet to index.html and run the program to see the if and with control-flow bindings working: <!DOCTYPE html> <html> <head> <title>Knockout JS</title> </head> <body> <h1>Welcome to Knockout JS programming</h1> <table border="1" > <tr > <th colspan="2" style="padding:10px;"> <b>Employee Data - Organization : <span style="color:red" data-bind='text: organizationName'> </span> </b> </th> </tr> <tr> <td style="padding:10px;">Employee First Name:</td> <td style="padding:10px;"> <span data-bind='text: empFirstName'></span> </td> </tr> <tr> <td style="padding:10px;">Employee Last Name:</td> <td style="padding:10px;"> <span data-bind='text: empLastName'></span> </td> </tr> </table> <p>Organization Full Name : <span style="color:red" data-bind='text: orgFullName'></span> </p>  <h2>Observable Array Example : </h2> <table border="1"> <thead><tr> <th style="padding:10px;">First Name</th> <th style="padding:10px;">Last Name</th> <th style="padding:10px;">Age</th> </tr></thead> <tbody data-bind='foreach: organization'> <tr> <td style="padding:10px;" data-bind='text: firstName'></td> <td style="padding:10px;" data-bind='text: lastName'></td> <td data-bind="if: age() > 20" style="color: green;padding:10px;"> <span data-bind='text:age'></span> </td> </tr> </tbody> </table>  <p data-bind='with: markedEmployee'> Employee <strong data-bind="text: firstName() + ', ' + lastName()"> </strong> is marked with the age <strong data-bind='text: age'> </strong> </p> <h2>Add New Employee to Observable Array</h2> First Name : <input data-bind="value: newFirstName" /> Last Name : <input data-bind="value: newLastName" /> Age : <input data-bind="value: newEmpAge" /> <button data-bind='click: addEmployee'>Add Employee</button>  <script type='text/javascript' src='js/knockout-3.4.2.js'></script> <script type='text/javascript'> function Employee (firstName, lastName,age) { this.firstName = ko.observable(firstName); this.lastName = ko.observable(lastName); this.age = 
ko.observable(age);
    };
    this.addEmployee = function() {
      this.organization.push(new Employee(
        employeeViewModel.newFirstName(),
        employeeViewModel.newLastName(),
        employeeViewModel.newEmpAge()));
    };
    var employeeViewModel = {
      empFirstName: "Tony",
      empLastName: "Henry",
      //Observable
      organizationName: ko.observable("Sun"),
      newFirstName: ko.observable(""),
      newLastName: ko.observable(""),
      newEmpAge: ko.observable(""),
      //With control flow object
      markedEmployee: ko.observable(new Employee("Garry", "Parks", "65")),
      //Observable Arrays
      organization: ko.observableArray([
        new Employee("John", "Kennedy", "24"),
        new Employee("Peter", "Hennes", "18"),
        new Employee("Richmond", "Smith", "54")
      ])
    };
    //Computed Observable
    employeeViewModel.orgFullName = ko.computed(function() {
      return employeeViewModel.organizationName() + " Limited";
    });
    ko.applyBindings(employeeViewModel);
    employeeViewModel.organizationName("Oracle");
    </script>
    </body>
    </html>

Run the preceding program to see the if control-flow binding acting on the Age field, and the with control-flow binding showing a marked employee record with age 65.

Appearance bindings

Appearance bindings deal with displaying the data from binding elements on view components in formats such as text and HTML, and with applying styles, with the help of a set of six bindings, as follows:

text: <value>. Sets the value to an element. Example:
    <td data-bind='text: name'></td>

html: <value>. Sets the HTML value to an element. Example:
    //JavaScript:
    function Employee(firstname, lastname, age) {
      ...
      this.formattedName = ko.computed(function() {
        return "<strong>" + this.firstname() + "</strong>";
      }, this);
    }
    //Html:
    <span data-bind='html: markedEmployee().formattedName'></span>

visible: <condition>. An element can be shown or hidden based on the condition. Example:
    <td data-bind='visible: age() > 20' style='color: green'> <span data-bind='text:age'></span> </td>

css: <object>. An element can be associated with a CSS class.
Example:
    //CSS:
    .strongEmployee { font-weight: bold; }
    //HTML:
    <span data-bind='text: formattedName, css: {strongEmployee: true}'> </span>

style: <object>. Associates an inline style to the element. Example:
    <span data-bind='text: age, style: {color: age() > 20 ? "green" : "red"}'> </span>

attr: <object>. Defines an attribute for the element. Example:
    <p><a data-bind='attr: {href: featuredEmployee().populatelink}'> View Employee</a></p>

Interactive bindings

Interactive bindings associate form elements that the user interacts with to the corresponding viewmodel methods, or to events to be triggered in the pages. Knockout JS supports the following interactive bindings:

click: <method>. An element click invokes a ViewModel method. Example:
    <button data-bind='click: addEmployee'>Submit</button>

value: <property>. Associates the form element value to the ViewModel attribute. Example:
    <td>Age: <input data-bind='value: age' /></td>

event: <object>. With a user-initiated event, it invokes a method. Example:
    <p data-bind='event: {mouseover: showEmployee, mouseout: hideEmployee}'> Age: <input data-bind='value: age' /> </p>

submit: <method>. With a form submit event, it can invoke a method. Example:
    <form data-bind="submit: addEmployee">
      <!-- Employee form fields -->
      <button type="submit">Submit</button>
    </form>

enable: <property>. Conditionally enables the form elements. Example: the last name field is enabled only after the first name field is filled in.

disable: <property>. Conditionally disables the form elements. Example: the last name field is disabled after the first name is added:
    <p>Last Name: <input data-bind='value: lastName, disable: firstName' /> </p>

checked: <property>. Associates a checkbox or radio element to the ViewModel attribute. Example:
    <p>Gender: <input data-bind='checked: gender' type='checkbox' /></p>

options: <array>. Defines a ViewModel array for the <select> element.
Example:
    //Javascript:
    this.designations = ko.observableArray(['manager', 'administrator']);
    //Html:
    Designation: <select data-bind='options: designations'></select>

selectedOptions: <array>. Defines the active/selected element from the <select> element. Example:
    Designation: <select data-bind='options: designations, optionsCaption: "Select", selectedOptions: defaultDesignation'> </select>

hasFocus: <property>. Associates the focus attribute to the element. Example:
    First Name: <input data-bind='value: firstName, hasFocus: firstNameHasFocus' />

We learned about the data binding abilities of Knockout.js. You can learn more about external data access and Hybrid Mobile Application Development from the book Oracle JET for Developers.

Read More
Text and appearance bindings and form field bindings
Getting to know KnockoutJS Templates
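The conditions written inside data-bind attributes in this article, such as age() > 20, are ordinary JavaScript expressions, so they can be checked outside the page. The sketch below restates two of them as plain functions. The names isOverTwenty and ageColor are hypothetical and do not come from Knockout.js or from the article's view model; the sample rows reuse the employees from the observable array above.

```javascript
// Same condition as data-bind="if: age() > 20" and 'visible: age() > 20'.
// Ages in the sample view model are strings such as "24", so the value is
// converted to a number explicitly instead of relying on implicit coercion.
function isOverTwenty(age) {
  return Number(age) > 20;
}

// Same expression as style: {color: age() > 20 ? "green" : "red"}.
function ageColor(age) {
  return isOverTwenty(age) ? "green" : "red";
}

// Employees taken from the observable array in the article's example.
const sampleEmployees = [
  { firstName: "John", age: "24" },
  { firstName: "Peter", age: "18" },
  { firstName: "Richmond", age: "54" }
];

// One line per employee with the colour the style binding would apply.
for (const employee of sampleEmployees) {
  console.log(employee.firstName + ": " + ageColor(employee.age));
}
```

Moving a condition like this into a named function or a computed observable on the view model keeps the markup short and makes the rule testable on its own.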


How-To Tutorials


Packt

02 Jun 2015

9 min read

Installing and Configuring Network Monitoring Software


This article, written by Bill Pretty and Glenn Vander Veer, authors of the book Building Networks and Servers Using BeagleBone, will serve as an installation guide for the software that will be used to monitor the traffic on your local network. These utilities can help determine which devices on your network are hogging the bandwidth and slowing down the network for other devices. Here are the topics that we are going to cover:

Installing traceroute and My Trace Route (MTR or Matt's Traceroute): These utilities will give you a real-time view of the connection between one node and another
Installing Nmap: This utility is a network scanner that can list all the hosts on your network and all the services available on those hosts
Installing iptraf-ng: This utility gathers various network traffic information and statistics

Installing Traceroute

Traceroute is a tool that can show the path from one node on a network to another. This can help determine the ideal placement of a router to maximize wireless bandwidth in order to stream music and videos from the BeagleBone server to remote devices. Traceroute can be installed with the following command:

    apt-get install traceroute

Once Traceroute is installed, it can be run to find the path from the BeagleBone to any server anywhere in the world. For example, here's the route from my BeagleBone to the Canadian Google servers:

Now, it is time to decipher all the information that is presented. The first line of the output shows the parameters that traceroute is using:

    traceroute to google.ca (74.125.225.23), 30 hops max, 60 byte packets

This gives the hostname, the IP address returned by the DNS server, the maximum number of hops to be taken, and the size of the data packet to be sent. The maximum number of hops can be changed with the -m flag and can be up to 255. In the context of this book, this will not have to be changed.
After the first line, the next few lines show the trip from the BeagleBone, through the intermediate hosts (or hops), to the google.ca server. Each line has the following format:

hop_number host_name (host IP_address) packet_round_trip_times

From the command that was run previously (specifically hop number 2):

2 10.149.206.1 (10.149.206.1) 15.335 ms 17.319 ms 17.232 ms

Here's a breakdown of the output:

The hop number 2: This is a count of the number of hosts between this host and the originating host. The higher the number, the greater the number of computers that the traffic has to go through to reach its destination.
10.149.206.1: This denotes the hostname. This is the result of a reverse DNS lookup on the IP address. If no information is returned from the DNS query (as in this case), the IP address of the host is given instead.
(10.149.206.1): This is the actual host IP address.
Various numbers: These are the round-trip times for a packet to go from the BeagleBone to the host and back again. These numbers will vary depending on network traffic, and lower is better.

Sometimes, traceroute will return some asterisks (*). This indicates that the packet has not been acknowledged by the host. If there are consecutive asterisks and the final destination is not reached, then there may be a routing problem. In a local network trace, it is most likely a firewall that is blocking the data packet.

Installing My Traceroute

My Traceroute (MTR) is an extension of traceroute, which probes the routers on the path between the packet source and destination, and keeps track of the response times of the hops. It does this repeatedly so that the response times can be averaged. Now, install mtr with the following command:

sudo apt-get install mtr

When it is run, mtr will provide quite a bit more information to look at, which would look like the following:

While the output may look similar, the big advantage over traceroute is that the output is constantly updated.
This allows you to accumulate trends and averages and also see how network performance varies over time. When using traceroute, there is a possibility that the packets that were sent to each hop happened to make the trip without incident, even in a situation where the route is suffering from intermittent packet loss. The mtr utility allows you to monitor this by gathering data over a longer period of time. Here's an mtr trace from my BeagleBone to my Android smartphone:

Here's another trace, after I changed the orientation of the antennae of my router:

As you can see, the original orientation was almost 100 milliseconds faster for ping traffic.

Installing Nmap

Nmap is designed to scan networks in order to determine which hosts are up and what services they are offering. Nmap supports a large number of scanning options, which are overkill for what will be done in this book. Nmap is installed with the following command:

sudo apt-get install nmap

Answer Yes to install nmap and its dependent packages.

Using Nmap

After it is installed, run the following command to see all the hosts that are currently on the network:

nmap -T4 -F <your_local_ip_range>

The -T4 option sets the timing template to be used, and the -F option is for fast scanning. There are other options that can be used; they can be found in the nmap manpage. Here, your_local_ip_range is within the range of addresses assigned by your router. Here's a node scan of my local network. If you have a lot of devices on your local network, this command may take a long time to complete.

Now, I know that I have more nodes on my network, but they don't show up. This is because, by default, nmap first probes each address to see whether a host responds and only scans the ports of the hosts that answer, so devices that ignore those probes are skipped. Instead, use only the -Pn option in the command, which tells nmap to skip this host discovery step and scan the ports of every address in the range.
Without the -F option, this will scan more ports on each address to determine whether the host is active or not. Here, we can see that there are definitely more hosts registered in the router device table. This scan will attempt to scan a host IP address even if the device is powered off. Resetting the router and running the same scan will scan the same address range, but it will not return any device names for devices that are not powered at the time of the scan.

You will notice that after scanning, nmap reports that some IP addresses' ports are closed and some are filtered. Closed ports are usually reported on the addresses of devices that are locked down by their firewall. Filtered ports are reported on the addresses that are handled by the router because there actually isn't a node assigned to these addresses. Here's a part of the output from an nmap scan of my Windows machine:

Here's a part of the output of a scan of the BeagleBone:

Installing iptraf-ng

Iptraf-ng is a utility that monitors traffic on any of the interfaces or IP addresses on your network via custom filters. Because iptraf-ng is based on the ncurses libraries, we will have to install them first before downloading and compiling the actual iptraf-ng package. To install ncurses, run the following command:

sudo apt-get install libncurses5-dev

Here's how you will install ncurses and its dependent packages:

Once ncurses is installed, download and extract the iptraf-ng tarball so that it can be built. At the time of writing this book, iptraf-ng version 1.1.4 was available. This will change over time, and a quick search on Google will give you the latest version to download.
You can download this version with the following command:

wget https://fedorahosted.org/releases/i/p/iptraf-ng/iptraf-ng-<current_version_number>.tar.gz

The following screenshot shows how to download the iptraf-ng tarball:

After the download has completed, extract the tarball using the following command:

tar -xzf iptraf-ng-<current_version_number>.tar.gz

Navigate to the iptraf-ng directory created by the tar command and issue the following commands:

./configure
make
sudo make install

After these commands are complete, iptraf-ng is ready to run, using the following command:

sudo iptraf-ng

When the program starts, you will be presented with the following screen:

Configuring iptraf-ng

As an example, we are going to monitor all incoming traffic to the BeagleBone. In order to do this, iptraf-ng should be configured. Selecting the Configure... menu item will show you the following screen:

Here, settings can be changed by highlighting an option in the left-hand side window and pressing Enter to select a new value, which will be shown in the Current Settings window. In this case, I have enabled all the options except Logging. Exit the configuration screen and enter the Filter Status screen. This is where we will set up the filter to only monitor traffic coming to the BeagleBone and from it. Then, the following screen will be presented:

Selecting IP... will create an IP filter, and the following subscreen will pop up:

Selecting Define new filter... will allow the creation and saving of a filter that will only display traffic for the IP address and the IP protocols that are selected, as shown in the following screenshot:

Here, I have put in the BeagleBone's IP address and set the filter to match all IP protocols. Once saved, return to the main menu and select IP traffic monitor. Here, you will be able to select the network interfaces to be monitored. Because my BeagleBone is connected to my wired network, I have selected eth0.
The following screenshot shows the options:

If all went well with your filter, you should see traffic to your BeagleBone and from it. Here are the entries for my PuTTY session; 192.168.17.2 is my Windows 8 machine, and 192.168.17.15 is my BeagleBone:

Here's an image of the traffic generated by browsing the DLNA server from Windows Explorer:

Moreover, here's the traffic from my Android smartphone running a DLNA player, browsing the shared directories that were set up:

Summary

In this article, you saw how to install and configure the software that will be used to monitor the traffic on your local network. With these programs and a bit of experience, you can determine which devices on your network are hogging the bandwidth and find out whether you have any unauthorized users.


Guest Contributor

19 Jun 2019

6 min read

Facebook fails to block ECJ data security case from proceeding

This July, the European Court of Justice (ECJ) in Luxembourg will hear a case on whether US government surveillance, Privacy Shield and Standard Contractual Clauses provide adequate protection for EU citizens' personal information during EU-US data transfers. The ECJ set the case hearing after the Supreme Court of Ireland, the country where Facebook's international headquarters is located, decided on Friday, May 31st, 2019, to dismiss an appeal by Facebook to block the data security case from progressing to the ECJ. The Austrian Supreme Court has also recently rejected Facebook's bid to stop a similar case. If Europe's Court of Justice rules against the current legal arrangements, this would have a major impact on thousands of companies, which make millions of data transfers every day. Those potentially affected include operators of human resources databases, services that store internet browsing histories, and credit card companies.

Background on this case

The case started with the Austrian privacy lawyer and campaigner, Max Schrems. In 2013, Schrems made a complaint regarding concerns that US surveillance programs like the PRISM system were accessing the data of European Facebook users, as whistleblower Edward Snowden described. His concerns also dealt with Facebook's use of a separate data transfer mechanism, Standard Contractual Clauses (SCCs). Around the time Snowden disclosed the US government's mass surveillance programs, Schrems also challenged the legality of the prior EU-US data transfer arrangement, Safe Harbor, eventually bringing it down. After Schrems stated that the transfer of his data by Facebook to the US infringed upon his rights as an EU citizen, Ireland's High Court ruled, in 2017, that the US government engaged in "mass indiscriminate processing of data" and deferred concerns to the European Court of Justice.
Then, in October of last year, the High Court referred this case to the ECJ based on the Data Protection Commissioner's "well-founded" concerns about whether or not US law provided adequate protection for EU citizens' data privacy rights. Within all of this, there are also people questioning the compatibility between US law, which focuses on national security, and EU law, which aims for personal privacy. Whistleblowers like Edward Snowden played a role in what has led up to this case, and whistleblower attorneys and paraprofessionals continue working to expose fraud against the government by means of the False Claims Act (FCA).

Why Facebook appealed the case

Although Irish law doesn't require an appeal against CJEU referrals, Facebook chose to seek a stay and appeal the decision anyway, aiming to keep it from progressing to court. The court denied them the stay but granted them leave to appeal last year. Keep in mind that Facebook was already under a lot of scrutiny after playing a part in the Cambridge Analytica data scandal, in which up to 87 million users had their data compromised by Cambridge Analytica. One of the reasons Facebook gave for wanting to block this case from progressing was that the High Court failed to regard the 'Privacy Shield' decision. Under the Privacy Shield decision, the European Commission had approved the use of certain EU-US data transfer channels. Another main issue here was whether Facebook actually had the legal right to appeal a referral to the ECJ. Privacy Shield is also being challenged by French digital rights groups, who claim it violates fundamental EU rights; that challenge will be heard by the General Court of the EU in July.

Why the appeal was dismissed

The five-judge Supreme Court, headed by Chief Justice Frank Clarke, decided it could not entertain an appeal over the referral decision itself.
In addition, the Chief Justice said Facebook's criticisms related to the "proper characterization" of underlying facts rather than the facts themselves. If there had been any actual finding of facts not sustainable on the evidence before the High Court per Irish procedural law, he would have overturned it, but no such matter had been established on this appeal, he ruled.

"Joint Control" and its possible impact on the case

In June 2018, after a Facebook fan page was found to have been allowing visitor data to be collected by Facebook via a cookie on the fan page, without informing visitors, the Federal Administrative Court of Germany referred the case to the ECJ. This resulted in the ECJ ruling that social media networks and page administrators share joint responsibility for the processing of visitor data. The ECJ's ruling in this case has consequences not only for Facebook sites but for other situations where more than one company or administrator plays an active role in the data processing. The concept of "joint control" is now on the table, and further decisions of authorities and courts in this area are likely.

What's next for data security

Currently, Facebook also faces questioning by Ireland's Data Protection Commission over numerous potential infringements of the strict European privacy laws that the new General Data Protection Regulation (GDPR) outlines. Facebook, however, has already stated it will take the necessary steps to ensure that site operators can comply with the GDPR. There have even been pleas for global data laws. A common misconception exists that only big organizations, governments and businesses are at risk of data security breaches, but this is simply not true. Data security is important for everyone, now more than ever.
Your computer, tablet and mobile devices could be targeted by attackers seeking sensitive information, such as credit card details, banking details and passwords, by way of phishing attacks, malware attacks, ransomware attacks, man-in-the-middle attacks and more. Therefore, bringing continual awareness to these US and global data security issues can help stricter laws to be put in place.

Kayla Matthews writes about big data, cybersecurity and technology. You can find her work on The Week, Information Age, KDnuggets and CloudTweaks, or over at ProductivityBytes.com.

Fatema Patrawala

29 Dec 2017

14 min read

25 Startups using machine learning differently in 2018: From farming to brewing beer to elder care

What really excites me about data science, and by extension machine learning, is the sheer number of possibilities! You can think of so many applications off the top of your head: robo-advisors, computerized lawyers, digital medicine, even automating VC decisions when they invest in startups. You can even venture into the automation of art and music, or algorithms writing papers that are indistinguishable from human-written papers. It's like solving a puzzle, but a puzzle that's meaningful and that has real-world implications. The things that we can do today weren't possible 5 years ago, and this is largely thanks to growth in computational power, data availability, and the adoption of the cloud, which made accessing these resources economical for everyone. These are all key enabling factors for the advancement of machine learning and AI. Having witnessed the growth of data science as a discipline, industries like finance, health-care, education, media & entertainment, insurance, retail and energy have left no stone unturned to harness this opportunity. Data science has the capability to offer even more, and in the future we will see a wide range of applications in places that haven't even been explored yet. In the years to come, we will increasingly see data-powered, AI-enabled products and services take on roles traditionally handled by humans because they required innately human qualities to perform successfully. In this article we cover some use cases of data science being used differently and the start-ups that have implemented them in practice:

The Nurturer: For elder care

The world is aging rather rapidly. According to the World Health Organization, nearly two billion people across the world are expected to be over 60 years old by 2050, a figure that's more than triple what it was in 2000. In order to adapt to their increasingly aging population, many countries have raised the retirement age, reduced pension benefits, and started spending more on elderly care.
Research institutions in countries like Japan, home to a large elderly population, are focusing their R&D efforts on robots that can perform tasks like lifting and moving chronically ill patients, while many startups are working on automating hospital logistics and bringing in virtual assistance. They also offer AI-based virtual assistants to serve as middlemen between nurses and patients, reducing the need for frequent in-hospital visits. Dr Ben Maruthappu, a practising doctor, has brought a change to the world of geriatric care with an AI-based app, Cera. It is an on-demand platform to aid the elderly in need. The Cera app firmly puts itself in the category of Uber & Amazon, whereby it connects elderly people in need of care with a caregiver in a matter of a few hours. The team behind this innovation also plans to use AI to track patients' health conditions and reduce the number of emergency patients admitted to hospitals. Elliq, a social companion technology created by Intuition Robotics, helps older adults stay active and engaged with a proactive social robot that overcomes the digital divide. AliveCor, a leading FDA-cleared mobile heart solution, helps save lives and money, and has brought modern healthcare into the 21st century.

The Teacher: Personalized education platform for lifelong learning

With children increasingly using smartphones and tablets, and coding becoming a part of national curricula around the world, technology has become an integral part of classrooms. We have already witnessed the rise and impact of education technology, especially through a multitude of adaptive learning platforms that allow learners to strengthen their skills and knowledge: CBTs, LMSes, MOOCs and more. And now virtual reality (VR) and artificial intelligence (AI) are gaining traction to provide us with a lifelong learning companion that can accompany and support individuals throughout their studies, in and beyond school.
An AI-based educational platform learns the potential held by each particular student. Based on this data, tailored guidance is provided to fix mistakes and improve on the weaker areas. Teachers can generate a detailed report to help them customise lesson plans to best suit the needs of the student. Take Gruff Davies' Kwiziq, for example. Gruff and his team leverage AI to provide a personalised learning experience for students based on their individual needs. Students registered on the platform get the advantage of an AI-based language coach, which asks them to solve various micro quizzes. The quiz solutions provided by students are then turned into detailed "brain maps". These brain maps are further used to provide tailored instructions and feedback for improvement. Other startup firms like Blippar specialize in augmented reality for visual and experiential learning. Unelma Platforms, a software platform development company, provides state-of-the-art software for the higher-education, healthcare and business markets.

The Provider: Farming to be more productive, sustainable and advanced

Though farming is considered the backbone of many national economies, especially in the developing world, there is often an outdated view of it as involving small, family-owned plots of land where crops are hand harvested. In reality, modern-day farms have had to overhaul operations to meet demand and remain competitively priced while adapting to the ever-changing ways technology is infiltrating all parts of life. Climate change is a serious environmental threat farmers must deal with every season: strong storms and severe droughts have made farming even more challenging. Additionally, lack of agricultural input, water scarcity, overuse of chemicals in fertilizers, water and soil pollution, and a shortage of storage systems have made survival for farmers all the more difficult.
To overcome these challenges, smart farming techniques are the need of the hour for farmers in order to manage resources and sustain themselves in the market. For instance, in a paper published on arXiv, a team of researchers explains how they used a technique known as transfer learning to teach an AI to recognize crop diseases and pest damage. They utilized TensorFlow to build and train a neural network of their own, which involved showing the AI 2,756 images of cassava leaves from plants in Tanzania. Their efforts were a success, as the AI was able to correctly identify brown leaf spot disease with 98 percent accuracy. WeFarm, a SaaS-based agritech firm headquartered in London, aims to bridge the connectivity gap within the farming community. It allows farmers to send queries related to farming via text message, which are then shared online in several languages. The farmer then receives a crowdsourced response from other farmers around the world. In this way, a farmer in Kenya can get a solution from someone sitting in Uganda, without having to leave the farm, spend additional money or access the internet. Benson Hill Bio-systems, led by Matthew B. Crisp, former President of Agricultural Biotechnology Division, has differentiated itself by bringing the power of Cloud Biology™ to agriculture. It combines cloud computing, big data analytics, and plant biology to inspire innovation in agriculture. At the heart of Benson Hill is CropOS™, a cognitive engine that integrates crop data and analytics with the biological expertise and experience of the Benson Hill scientists. CropOS™ continuously advances and improves with every new dataset, strengthening the system's predictive power. Firms like Plenty Inc and Bowery Farming Inc are not far behind in offering smart farming solutions. Plenty Inc is an agriculture technology company that develops plant sciences for crops to flourish in a pesticide-free and GMO-free environment.
Bowery Farming, meanwhile, uses high-tech approaches such as robotics, LED lighting and data analytics to grow leafy greens indoors.

The Saviour: For sustainability and waste management

The global energy landscape continues to evolve, sometimes by the nanosecond, sometimes by the day. The sector finds itself pulled to economize and pushed to innovate due to a surge in demand for new power and utilities offerings. Innovations in power-sector technology, such as new storage battery options, smartphone-based thermostat apps and AI-enabled sensors, are advancing at a pace that has surprised developers and adopters alike. Consumer demand for such products has increased. To meet it, industry leaders are integrating those innovations into their operations and infrastructure as rapidly as they can. On the other hand, companies pursuing energy efficiency have two long-standing goals, gaining a competitive advantage and boosting the bottom line, and a relatively new one: environmental sustainability. Recognising the importance of these pressures on the industry, startups like SmartTrace offer an innovative cloud-based platform to quickly manage waste at multiple levels. This includes bridging rough data from waste contractors and extrapolating it to volume, EWC, finance and CO2 statistics. The extracted data acts as a guide to improve methodology, educate, strengthen oversight and direct improvements to the bottom line, as well as environmental outcomes. One Concern provides damage estimates using artificial intelligence on natural phenomena sciences. Autogrid organizes energy data and employs big data analytics to generate real-time predictions to create actionable data.
The Dreamer: For lifestyle and creative product development and design

Consumers in our modern world continually make decisions about product choice because of the many competing products in the market. Often those choices boil down to whether a product provides better value than others, in terms of product quality, price or alignment with their personal beliefs and values. Lifestyle products and brands operate off ideologies, hoping to attract a relatively high number of people and ultimately become a recognized social phenomenon. While ecommerce has leveraged data science to master the price dimension, here are some examples of startups trying to deconstruct the other two dimensions: product development and branding. Have you ever imagined your beer being brewed by AI? Well, now you can with IntelligentX. The IntelligentX team claims to have invented the world's first beer brewed by artificial intelligence. They also plan to craft a premium beer using complex machine learning algorithms that can improve the beer based on the feedback given by its customers. Customers try one of the four bottle-conditioned beers; after the trial, the AI asks them what they think of the beer via an online feedback messenger. The data collected is then used by an algorithm to brew the next batch. Because their AI is constantly reacting to user feedback, they can brew beer that matches what customers want, more quickly than anyone else can. What this actually means is that the company gets more data and customers get a customized fresh beer! In the lifestyle domain, we have Stitch Fix, which has brought a personal touch to the online shopping journey. They are no ordinary apparel e-commerce company. They have created a formula for blending human expertise with the right amount of data science to serve their customers.
According to Katrina Lake, Founder and CEO, "You can look at every product on the planet, but trying to figure out which one is best for you is really the challenge", and that's where Stitch Fix has come into the picture. The company is disrupting traditional retail by providing the personalized shopping experience that traditional retail could not achieve. To know how Stitch Fix uses Full Stack Data Science, read our detailed article.

The Writer: From content creation to curation to promotion

In the publishing industry, we have seen a digital revolution coming in too. Echobox is one of the pioneers in building AI for the publishing industry. Antoine Amann, founder of Echobox, wrote in a blog post that they have "developed an AI platform that takes large quantity of variables into account and analyses them in real time to determine optimum post performance". Echobox prides itself on working with Facebook and Twitter to optimize social media content, perform advanced analytics with A/B testing and curate content for desired CTRs. With a global client base that includes Le Monde, The Telegraph and The Guardian, they have taken over tasks once handled by social media editors. New York-based startup Agolo uses AI to create real-time summaries of information. It initially used to curate Twitter feeds in order to focus on the conversations, tweets and hashtags that were most relevant to its users' preferences. Using natural language processing, Agolo scans content, identifies relationships among the information sources, and picks out the most relevant information, all leading to a comprehensive summary of the original piece of information. Other websites, like Grammarly, offer AI-powered solutions to help people write, edit and formulate mistake-free content. Textio came up with augmented writing, which means that every time you write something, you know ahead of time who is likely to respond. It basically means writing that is supported by outcomes in real time.
Automated Insights, creator of Wordsmith, the natural language generation platform, enables you to produce human-sounding narratives from data.

The Matchmaker: Connecting people, skills and other entities

AI will make networking at B2B events more fun and highly productive for business professionals. Grip, a London-based startup formerly known as Network, rebranded itself in April 2016. Grip is using AI as a platform to make networking at events more constructive and fruitful. It acts as a B2B matchmaking engine that accumulates data from social accounts (LinkedIn, Twitter) and smartly matches it with the event registration data. Akin to a Tinder for networking, Grip uses advanced algorithms to recommend the right people and presents them with an easy-to-use swiping interface. It also delivers a detailed report to the event organizer on the success of the event for every user or social segment. We are well aware of data scientist being called the sexiest job of the 21st century. JamieAi, harnessing this fact, connects technical talent with data-oriented jobs at organizations of all types and sizes. The start-up has combined AI insights and human oversight to reduce hiring costs and eliminate bias. Also, third-party recruitment agencies are removed from the process to boost transparency and efficiency in the path to employment. Another example is Woo.io, a marketplace for matching tech professionals and companies.

The Manager: Virtual assistants of a different kind

Artificial intelligence can also predict how much your household appliances will add to your electricity bill. Verv, a producer of a clever home energy assistant, provides intelligent information on your household appliances. It helps its customers significantly reduce their electricity bills and carbon footprints. The technology uses machine learning algorithms to provide real-time information by learning how much power and money each device is using.
Not only this, it can also suggest eco-friendly alternatives, alert homeowners to appliances that have been in use for a long time, and warn them of any dangerous activity when they aren't at home. Other examples include firms like Maana, which manages machines and improves operational efficiencies in order to make fast data-driven decisions. Gong.io acts as a sales representative's assistant, analysing sales conversations to produce actionable insights. ObEN creates complete virtual identities for consumers and celebrities in the emerging digital world.

The Motivator: For personal and business productivity and growth

Perkbox, a super cross-functional company, came up with an employee engagement platform. Saurav Chopra, founder of Perkbox, believes teams perform their best when they are happy and engaged! Hence, Perkbox helps companies boost employee motivation and create a more inspirational atmosphere to work in. The platform offers gym services, dental discounts and rewards for top achievers in the team to firms in the UK. Perkbox offers a wide range of perks, discounts and tools to help organizations retain and motivate their employees. Technologies like AWS and Kubernetes allow its development team to work closely together to build, scale and support the Perkbox application for a growing user base.

So, these are some use cases where we found startups using data science and machine learning differently. Do you know of others? Please share them in the comments below.


Packt

16 Sep 2015

5 min read

Raspberry Pi LED Blueprints


Blinking LEDs is a popular application in the field of embedded development. In Raspberry Pi LED Blueprints by Agus Kurniawan, we are going to design, build, and test LED-based projects using the Raspberry Pi. To implement real LED-based projects for Raspberry Pi, we need to learn how to interface various LED modules, such as LEDs, 7-segment, 4-digit 7-segment, and dot matrix displays, to the Raspberry Pi. We will get hands-on experience by exploring real-time LEDs with this project-based book.

Why Raspberry Pi?

The Raspberry Pi was designed by the Raspberry Pi Foundation in the UK, initially to help schoolkids learn basic computer science. The Raspberry Pi typically runs a Linux-based operating system. Although the Raspberry Pi is about the size of a credit card, it works like a normal computer at a relatively low price. A Raspberry Pi can easily control an LED, which is a simple actuator device that emits light. This book will give you the ability to control LEDs using Raspberry Pi.

What this article covers

This article introduces Raspberry Pi GPIO. We will learn how to use different libraries to access Raspberry Pi GPIO. The step-by-step procedure to install each library is also provided, along with the commands to run.

Introducing Raspberry Pi GPIO

A general-purpose input/output (GPIO) pin is a generic pin on the Raspberry Pi that can be used to interact with external devices, for instance, sensor and actuator devices. In general, you can see Raspberry Pi GPIO pinouts in the following figure:

To access Raspberry Pi GPIO, we can use several GPIO libraries. If you are working with Python, Raspbian comes with the RPi.GPIO library already installed to access Raspberry Pi GPIO. You can read more about RPi.GPIO at https://pypi.python.org/pypi/RPi.GPIO.
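Before relying on RPi.GPIO in a script, it can help to check whether the module is importable at all. The following is a minimal sketch that uses only the Python standard library; the helper name module_available is hypothetical and not part of RPi.GPIO or the book, and the check only reports presence, it does not touch any GPIO pins.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the named module can be located, without importing it fully.

    Hypothetical helper for illustration; for a dotted name such as
    "RPi.GPIO" the parent package is imported during lookup, so a missing
    parent raises ModuleNotFoundError, which is treated as "not available".
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    if module_available("RPi.GPIO"):
        print("RPi.GPIO is installed")
    else:
        print("RPi.GPIO was not found; install it before running GPIO code")
```

On a machine that is not a Raspberry Pi, the script is expected to report that the library was not found, which makes it safe to run as a first diagnostic step.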
You can verify the RPi.GPIO library from a Python terminal by importing the RPi.GPIO module. If Python cannot find this library at runtime, or you get the error message ImportError: No module named RPi.GPIO, you can install it by compiling from the source code. For instance, to install RPi.GPIO 0.5.11, type the following commands:

$ wget https://pypi.python.org/packages/source/R/RPi.GPIO/RPi.GPIO-0.5.11.tar.gz
$ tar -xvzf RPi.GPIO-0.5.11.tar.gz
$ cd RPi.GPIO-0.5.11/
$ sudo python setup.py install

To install and update through the apt command, your Raspberry Pi must be connected to the Internet.

Another way to access Raspberry Pi GPIO is to use WiringPi, a library written in C for accessing GPIO pins on the Raspberry Pi. You can read more about WiringPi at http://wiringpi.com/. To install WiringPi, type the following commands:

$ sudo apt-get update
$ sudo apt-get install git-core
$ git clone git://git.drogon.net/wiringPi
$ cd wiringPi
$ sudo ./build

Please make sure that your Pi's network does not block the git protocol for git://git.drogon.net/wiringPi. You can browse the code at https://git.drogon.net/?p=wiringPi;a=summary.

The next step is to install the WiringPi interface for Python, so that you can access Raspberry Pi GPIO from a Python program. Type the following commands:

$ sudo apt-get install python-dev python-setuptools
$ git clone https://github.com/Gadgetoid/WiringPi2-Python.git
$ cd WiringPi2-Python
$ sudo python setup.py install

When finished, you can verify the installation by showing the GPIO map of the Raspberry Pi board with the gpio tool:

$ gpio readall

You should see the GPIO map of the Raspberry Pi board on the terminal. You can also see values in the wPi column, which are used in WiringPi programs as GPIO pin parameters. The book provides more information on how to use the WiringPi library.

What you need for this book

We are going to use the Raspberry Pi 2 Model B board.
To make the Raspberry Pi work, we need an OS that acts as a bridge between the hardware and the user. There are many OS options for Raspberry Pi. This book uses Raspbian as the OS platform. To deploy Raspbian on the Raspberry Pi 2 Model B, we need a microSD card of at least 4 GB.

Who this book is written for

This book is for those who want to learn how to build Raspberry Pi projects using LEDs, 7-segment, 4-digit 7-segment, and dot matrix modules. You will also learn to implement those modules in real applications, including interfacing with wireless modules and an Android mobile app. You don't need any previous experience with the Raspberry Pi or Android platforms.

Summary

In this article, we learned different ways to install libraries for accessing Raspberry Pi GPIO. Read Raspberry Pi LED Blueprints to start designing and implementing several projects based on LEDs, such as 7-segment, 4-digit 7-segment, and dot matrix displays.

Other related titles are Raspberry Pi Blueprints, Raspberry Pi Super Cluster, Learning Raspberry Pi, and Raspberry Pi Robotic Projects.

Further resources on this subject: Color and motion finding, Basic Image Processing, and Develop a Digital Clock.


Aaron Lazar

03 Jan 2018

15 min read

Write your first Blockchain: Learning Solidity Programming in 15 minutes


Note: This post is an extract from the book Mastering Blockchain, authored by Imran Bashir. The book begins with the technical foundations of blockchain, teaching you the fundamentals of cryptography and how it keeps data secure.

This article aims to quickly get you up to speed with blockchain development using the Solidity programming language.

Introducing solidity

Solidity is the domain-specific language of choice for programming contracts in Ethereum. There are other languages, such as serpent, Mutan, and LLL, but solidity is the most popular at the time of writing. Its syntax is close to that of JavaScript and C. Solidity has evolved into a mature language over the last few years and is quite easy to use, but it still has a long way to go before it becomes as advanced and feature-rich as other well-established languages. Nevertheless, it is currently the most widely used language for programming contracts.

It is a statically typed language, which means that variable type checking in solidity is carried out at compile time. Each variable, either state or local, must be specified with a type at compile time. This is beneficial because validation and checking are completed at compile time, and certain types of bugs, such as misinterpretation of data types, can be caught earlier in the development cycle instead of at run time, where they could be costly, especially in the blockchain/smart contract paradigm. Other features of the language include inheritance, libraries, and the ability to define composite data types. Solidity is also called a contract-oriented language. In solidity, contracts are equivalent to the concept of classes in object-oriented programming languages.

Types

Solidity has two categories of data types: value types and reference types.

Value types

These are explained in detail here.
Boolean

This data type has two possible values, true or false, for example:

bool v = true;

This statement assigns the value true to v.

Integers

This data type represents integers. Integer types are declared with keywords such as int and uint, optionally followed by a bit width. For example, in this code, note that uint is an alias for uint256:

uint256 x;
uint y;
int256 z;

These types can also be declared with the constant keyword, which means that no storage slot will be reserved by the compiler for these variables. In this case, each occurrence will be replaced with the actual value:

uint constant z=10+10;

State variables are declared outside the body of a function, and they remain available throughout the contract, depending on the accessibility assigned to them, for as long as the contract persists.

Address

This data type holds a 160-bit (20-byte) value. This type has several members that can be used to interact with and query contracts. These members are described here:

Balance: The balance member returns the balance of the address in Wei.

Send: This member is used to send an amount of ether to an address (Ethereum's 160-bit address) and returns true or false depending on the result of the transaction, for example:

address to = 0x6414cc08d148dce9ebf5a2d0b7c220ed2d3203da;
address from = this;
if (to.balance < 10 && from.balance > 50) to.send(20);

Call functions: The call, callcode, and delegatecall functions are provided in order to interact with functions that do not have an Application Binary Interface (ABI). These functions should be used with caution, as they affect the type safety and security of contracts.

Array value types (fixed size and dynamically sized byte arrays)

Solidity has fixed size and dynamically sized byte arrays. Fixed size keywords range from bytes1 to bytes32, whereas dynamically sized keywords include bytes and string. bytes is used for raw byte data and string is used for strings encoded in UTF-8.
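Because string data is UTF-8 encoded, the number of bytes a value occupies can differ from the number of characters it contains, which matters when deciding whether a short label fits into a fixed size type such as bytes32. The sketch below is written in Python purely as an off-chain illustration; the helper names utf8_length and pad_to_bytes32 are hypothetical, and the right-padding with zero bytes mirrors how fixed size byte values are left-aligned.

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes needed to store text as UTF-8."""
    return len(text.encode("utf-8"))


def pad_to_bytes32(text: str) -> bytes:
    """Encode text as UTF-8 and right-pad it with zero bytes to 32 bytes.

    Hypothetical helper: raises ValueError when the encoded text is longer
    than 32 bytes and therefore cannot fit into a bytes32-sized slot.
    """
    raw = text.encode("utf-8")
    if len(raw) > 32:
        raise ValueError("text does not fit into 32 bytes")
    return raw.ljust(32, b"\x00")


if __name__ == "__main__":
    # ASCII characters take one byte each; accented characters take more.
    for sample in ("packt", "café"):
        print(sample, len(sample), "characters,", utf8_length(sample), "bytes")
    print(pad_to_bytes32("packt").hex())
```

Both sample strings occupy five bytes, even though "café" has only four characters, so byte length rather than character count is the figure to check against a fixed size limit.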
As these arrays are returned by value, calling them will incur gas cost. length is a member of array value types and returns the length of the byte array. An example of a static (fixed size) array is as follows:

bytes32[10] bankAccounts;

An example of a dynamically sized array is as follows:

bytes32[] trades;

Get the length of trades:

trades.length;

Literals

These are used to represent a fixed value.

Integer literals

Integer literals are a sequence of decimal digits in the range 0-9. An example is shown as follows:

uint8 x = 2;

String literals

String literals specify a set of characters written within double or single quotes. An example is shown as follows:

'packt'
"packt"

Hexadecimal literals

Hexadecimal literals are prefixed with the keyword hex and specified within double or single quotation marks. An example is shown as follows:

(hex'AABBCC');

Enums

This allows the creation of user-defined types. An example is shown as follows:

enum Order{Filled, Placed, Expired };
Order private ord;
ord=Order.Filled;

Explicit conversion to and from all integer types is allowed with enums.

Function types

There are two function types: internal and external functions.

Internal functions: These can be used only within the context of the current contract.

External functions: External functions can be called via external function calls.

A function in solidity can be marked as constant. Constant functions cannot change anything in the contract; they only return values when they are invoked and do not cost any gas. This is the practical implementation of the concept of call as discussed in the previous chapter. The syntax to declare a function is shown as follows:

function <nameofthefunction> (<parameter types> <name of the variable>) {internal|external} [constant] [payable] [returns (<return types> <name of the variable>)]

Reference types

As the name suggests, these types are passed by reference and are discussed in the following section.
Arrays

Arrays represent a contiguous set of elements of the same size and type laid out at a memory location. The concept is the same as in any other programming language. Arrays have two members named length and push:

uint[] OrderIds;

Structs

These constructs can be used to group a set of dissimilar data types under a logical group. These can be used to define new types, as shown in the following example:

struct Trade {
    uint tradeid;
    uint quantity;
    uint price;
    string trader;
}

Data location

Data location specifies where a particular complex data type will be stored. Depending on the default or the annotation specified, the location can be storage or memory. This is applicable to arrays and structs and can be specified using the storage or memory keywords. As copying between memory and storage can be quite expensive, specifying a location can help control gas expenditure. Calldata is another data location, used to store function arguments. Parameters of external functions use calldata. By default, parameters of functions are stored in memory, whereas all other local variables make use of storage. State variables, on the other hand, are required to use storage.

Mappings

Mappings are used for key-to-value mapping. This is a way to associate a value with a key. All values in a mapping are initialized to zero by default, for example:

mapping (address => uint) offers;

This example shows that offers is declared as a mapping. Another example makes this clearer:

mapping (string => uint) bids;
bids["packt"] = 10;

This is basically a dictionary or a hash table where string values are mapped to integer values. The mapping named bids has the string packt mapped to the value 10.

Global variables

Solidity provides a number of global variables that are always available in the global namespace. These variables provide information about blocks and transactions.
Additionally, cryptographic functions and address-related variables are available as well. A subset of available functions and variables is shown as follows:

keccak256(...) returns (bytes32)

This function is used to compute the keccak256 hash of the argument provided to the function.

ecrecover(bytes32 hash, uint8 v, bytes32 r, bytes32 s) returns (address)

This function returns the address associated with the public key from the elliptic curve signature.

block.number

This returns the current block number.

Control structures

Control structures available in solidity are if - else, do, while, for, break, continue, and return. They work in a manner similar to how they work in C or JavaScript.

Events

Events in solidity can be used to log certain events in EVM logs. These are quite useful when external interfaces need to be notified of any change or event in the contract. These logs are stored on the blockchain in transaction logs. Logs cannot be accessed from contracts but are used as a mechanism to notify a change of state or the occurrence of an event (meeting a condition) in the contract. In a simple example here, the valueEvent event will log true if the x parameter passed to the function Matcher is equal to or greater than 10:

contract valueChecker {
    uint8 price=10;
    event valueEvent(bool returnValue);
    function Matcher(uint8 x) returns (bool) {
        if (x>=price) {
            valueEvent(true);
            return true;
        }
    }
}

Inheritance

Inheritance is supported in solidity. The is keyword is used to derive a contract from another contract. In the following example, valueChecker2 is derived from the valueChecker contract.
The derived contract has access to all nonprivate members of the parent contract:

contract valueChecker {
    uint8 price=10;
    event valueEvent(bool returnValue);
    function Matcher(uint8 x) returns (bool) {
        if (x>=price) {
            valueEvent(true);
            return true;
        }
    }
}

contract valueChecker2 is valueChecker {
    function Matcher2() returns (uint) {
        return price + 10;
    }
}

In the preceding example, if uint8 price = 10 is changed to uint8 private price = 10, then it will not be accessible by the valueChecker2 contract. This is because the member is now declared as private, so it cannot be accessed by any other contract.

Libraries

Libraries are deployed only once at a specific address and their code is called via the CALLCODE/DELEGATECALL opcode of the EVM. The key idea behind libraries is code reusability. They are similar to contracts and act as base contracts to the calling contracts. A library can be declared as shown in the following example:

library Addition {
    function Add(uint x,uint y) returns (uint z) {
        return x + y;
    }
}

This library can then be called in the contract, as shown here. First, it needs to be imported; then it can be used anywhere in the code. A simple example is shown as follows:

import "Addition.sol";

function Addtwovalues() returns(uint) {
    return Addition.Add(100,100);
}

There are a few limitations with libraries; for example, they cannot have state variables and cannot inherit or be inherited. Moreover, they cannot receive Ether; this is in contrast to contracts, which can receive Ether.

Functions

Functions in solidity are modules of code that are associated with a contract. Functions are declared with a name, optional parameters, access modifier, optional constant keyword, and optional return type. This is shown in the following example:

function orderMatcher(uint x) private constant returns(bool returnvalue)

In the preceding example, function is the keyword used to declare the function.
orderMatcher is the function name, uint x is an optional parameter, private is the access modifier/specifier that controls access to the function from external contracts, constant is an optional keyword used to specify that this function does not change anything in the contract but only retrieves values from it, and returns (bool returnvalue) is the optional return type of the function.

How to define a function: The syntax for defining a function is shown as follows:

function <name of the function>(<parameters>) <visibility specifier> returns (<return data type> <name of the variable>) {
    <function body>
}

Function signature: A function in solidity is identified by its signature, which is the first four bytes of the keccak-256 hash of its full signature string. This is also visible in browser solidity, as shown in the following screenshot. d99c89cb is the first four bytes of the 32-byte keccak-256 hash of the function named Matcher. In this example, the function Matcher has the signature hash d99c89cb. This information is useful in order to build interfaces.

Input parameters of a function: Input parameters of a function are declared in the form of <data type> <parameter name>. This example clarifies the concept, where uint x and uint y are input parameters of the checkValues function:

contract myContract {
    function checkValues(uint x, uint y) {
    }
}

Output parameters of a function: Output parameters of a function are declared in the form of <data type> <parameter name>. This example shows a simple function returning a uint value:

contract myContract {
    uint x;
    uint y;
    function getValue() returns (uint z) {
        z=x+y;
    }
}

A function can return multiple values. In the preceding example, getValue only returns one value, but a function can return up to 14 values of different data types. The names of unused return parameters can optionally be omitted.
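The function signature hashing described earlier in this section can be reproduced off-chain. The following is a sketch in pure Python, assuming the signature string is written in its canonical form (function name followed by the parenthesized parameter types, with no spaces or parameter names). The helper names keccak256 and function_selector are hypothetical. Note that Ethereum uses the original Keccak padding, so hashlib.sha3_256 from the standard library gives a different result and is not a substitute.

```python
MASK64 = (1 << 64) - 1
RATE_BYTES = 136  # Keccak-256 uses a 1088-bit rate and a 512-bit capacity


def _rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane matrix."""
    lfsr = 1
    for _ in range(24):
        # theta: mix each column parity into the neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR in the round constant, generated by an 8-bit LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def _permute(state: bytearray) -> bytearray:
    """Convert the 200-byte state to lanes, permute, and convert back."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y): 8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lanes = _keccak_f1600(lanes)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y): 8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def keccak256(data: bytes) -> bytes:
    """Return the 32-byte Keccak-256 digest (original padding, as used by Ethereum)."""
    state = bytearray(200)
    offset = 0
    block_size = 0
    while offset < len(data):
        block_size = min(len(data) - offset, RATE_BYTES)
        for i in range(block_size):
            state[i] ^= data[offset + i]
        offset += block_size
        if block_size == RATE_BYTES:
            state = _permute(state)
            block_size = 0
    state[block_size] ^= 0x01        # Keccak domain byte (SHA-3 would use 0x06)
    state[RATE_BYTES - 1] ^= 0x80    # final padding bit
    state = _permute(state)
    return bytes(state[:32])


def function_selector(signature: str) -> str:
    """Return the first four bytes of the signature hash as a hex string."""
    return keccak256(signature.encode("ascii"))[:4].hex()


if __name__ == "__main__":
    # The widely documented selector for this ERC-20 signature is a9059cbb.
    print(function_selector("transfer(address,uint256)"))
```

A pure-Python permutation is slow compared with a native library, but it keeps the sketch free of third-party dependencies; for real tooling, an established Ethereum library would be the more practical choice.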
Internal function calls: Functions within the context of the current contract can be called internally in a direct manner. These calls are made to functions that exist within the same contract. They result in simple JUMP calls at the EVM bytecode level.

External function calls: External function calls are made via message calls from one contract to another. In this case, all function parameters are copied to memory. If a call to an internal function is made using the this keyword, it is also considered an external call. The this variable is a pointer that refers to the current contract. It is explicitly convertible to an address, and all members for a contract are inherited from the address.

Fallback functions: This is an unnamed function in a contract with no arguments and no return data. This function executes every time ether is received. It must be implemented within a contract if the contract is intended to receive ether; otherwise, an exception will be thrown and the ether will be returned. This function also executes if no other function signature in the contract matches. If the contract is expected to receive ether, then the fallback function should be declared with the payable modifier. The payable modifier is required; otherwise, this function will not be able to receive any ether. This function can be called using the address.call() method, for example:

function () {
    throw;
}

In this case, if the fallback function is called according to the conditions described earlier, it will call throw, which will roll back the state to what it was before the call was made. It can also use some construct other than throw; for example, it can log an event that serves as an alert to feed the outcome of the call back to the calling application.

Modifier functions: These functions are used to change the behavior of a function and can be called before other functions.
Usually, they are used to check some conditions or perform verification before executing the function. The _ (underscore) placeholder in a modifier is replaced with the actual body of the function when the modifier is applied. Basically, it symbolizes the function that needs to be guarded. This concept is similar to guard functions in other languages.

Constructor function: This is an optional function that has the same name as the contract and is executed once, when the contract is created. Constructor functions cannot be called later on by users, and only one constructor is allowed in a contract. This implies that no overloading functionality is available.

Function visibility specifiers (access modifiers): Functions can be defined with four access specifiers as follows:

External: These functions are accessible from other contracts and transactions. They cannot be called internally unless the this keyword is used.

Public: By default, functions are public. They can be called either internally or using messages.

Internal: Internal functions are visible to contracts derived from the parent contract.

Private: Private functions are only visible to the contract they are declared in.

Other important keywords/functions

throw: throw is used to stop execution. As a result, all state changes are reverted. In this case, no gas is returned to the transaction originator because all the remaining gas is consumed.

Layout of a solidity source code file

Version pragma

In order to address compatibility issues that may arise from future versions of the solidity compiler, a version pragma can be used to specify the compatible compiler version, for example:

pragma solidity ^0.5.0;

This will ensure that the source file does not compile with versions earlier than 0.5.0 or with versions starting from 0.6.0.

Import

Import in solidity allows the importing of symbols from existing solidity files into the current global scope.
This is similar to the import statements available in JavaScript, for example:

import "module-name";

Comments

Comments can be added to a solidity source code file in a manner similar to C. Multiple line comments are enclosed in /* and */, whereas single line comments start with //. An example solidity program is as follows, showing the use of pragma, import, and comments:

To summarize, we went through a brief introduction to the solidity language. Detailed documentation and coding guidelines are available online. If you found this article useful and would like to learn more about building blockchains, go ahead and grab the book Mastering Blockchain, authored by Imran Bashir.


Packt

30 Dec 2014

30 min read

Open Source Intelligence

