How-To Tutorials - Data

Vincy Davis
21 Jun 2019
16 min read

Deepfakes House Committee Hearing: Risks, Vulnerabilities and Recommendations

Last week, the House Intelligence Committee held a hearing to examine the public risks posed by “deepfake” videos. A deepfake is audio or video that has been altered by technology and is then passed off as true or original content. In this hearing, experts on AI and digital policy highlighted to the committee the risks deepfakes pose to national security, upcoming elections, public trust and the mission of journalism. They also offered recommendations on what Congress could do to combat deepfakes and misinformation.

The chair of the committee, Adam B. Schiff, opened the hearing by stating that it is time to regulate deepfake technology, as it is enabling sinister forms of deception and disinformation by malicious actors. He added that “Advances in AI or machine learning have led to the emergence of advance digitally doctored type of media, the so-called deepfakes that enable malicious actors to foment chaos, division or crisis and have the capacity to disrupt entire campaigns including that for the Presidency.”

For a quick glance, here’s a TL;DR:

- Jack Clark believes that governments should be in the business of measuring and assessing deepfake threats by looking directly at the scientific literature and developing a base knowledge of it.
- David Doermann suggests that tools and processes which can identify fake content should be placed in the hands of individuals, rather than relying completely on the government or on social media platforms to police content.
- Danielle Citron warns that the impact of deepfakes will be increasingly felt by women, minorities and people from marginalized communities.
- Clint Watts provides a list of recommendations which should be implemented to prohibit U.S. officials, elected representatives and agencies from creating and distributing false and manipulated content.
- A unified standard should be followed by all social media platforms.
- Social media platforms should also be pressured to introduce a 10-15 second delay on all videos, so that they can decide whether to label a particular video.
- Regarding the 2020 Presidential election, state governments and social media companies should be ready with a response plan if a fake video surfaces to cause disruption.
- It was also recommended that the algorithms used to make deepfakes should be open sourced.
- Laws should be altered, and strict penalties imposed, to discourage deepfake videos.

Being forewarned is forearmed in case of deepfake technology

Jack Clark, OpenAI Policy Director, highlighted in his testimony that he does not think A.I. is the cause of any disruption, but is actually an “accelerant to an issue which has been with us for some time.” He added that computer software aligned with A.I. technology has become significantly cheaper, more powerful and more accessible. This has led to its use in audio and video editing, which was previously very difficult. Similar technologies are being used to produce synthetic media, and deepfakes are also being used in valuable scientific research. Clark suggests that interventions should be made to avoid their misuse. He believes that “it may be possible for large-scale technology platforms to try and develop and share tools for the detection of malicious synthetic media at both the individual account level and the platform level. We can also increase funding.” He strongly believes that governments should be in the business of measuring and assessing these threats by looking directly at the scientific literature and developing a base knowledge. Clark concludes by saying that “being forewarned is forearmed here.”

Make deepfake detector tools readily available

David Doermann, the former Project Manager at the Defense Advanced Research Projects Agency (DARPA), says that the phrase ‘seeing is believing’ is no longer true.
He states that there is nothing fundamentally wrong or evil about the technology; like basic image and video desktop editors, deepfakes are only a tool. There are a lot of positive applications of generative networks, just as there are negative ones. He adds that, as of today, there are some solutions that can identify deepfakes reliably. However, Doermann fears that it is only a matter of time before current detection capabilities are rendered less effective. He adds that “it's likely to get much worse before it gets much better.” Doermann suggests that tools and processes which can identify such fake content should be placed in the hands of individuals, rather than relying completely on the government or on social media platforms to police content. At the same time, there should also be ways to verify such content, prove it is fake, or easily report it. He also hopes that automated detection tools will be developed in the future to help with filtering and detection at the front end of the distribution pipeline. He also adds that “appropriate warning labels should be provided, which suggests that this is not real or not authentic, or not what it's purported to be. This would be independent of whether this is done and the decisions are made, by humans, machines or a combination.”

Groups most vulnerable to deepfake attacks

Women and minorities

Danielle Citron, a Law Professor at the University of Maryland, describes deepfakes as “particularly troubling when they're provocative and destructive.” She adds that we as humans tend to believe what our eyes and ears are telling us, and also tend to share information that confirms our biases. This is particularly true when that information is novel and negative: the more salacious it is, the more willing we are to pass it on. She also points out that the social media networks on which deepfakes spread are ad-driven.
When all of this is put together, the more provocative and salacious a deepfake is, the more virally it spreads. She also told the committee about an investigative journalist in India who, over a provocative article she had written, had posters of her circulated over the internet along with deepfake sex videos in which her face was morphed into pornography. Citron thus states that “the economic and the social and psychological harm is profound”. Based on her work on cyber stalking, she believes that this phenomenon is going to be increasingly felt by women, minorities and people from marginalized communities. She also shared other examples of the effect of deepfakes on trades and businesses. Citron also highlighted that “We need a combination of law, markets and really societal resilience to get through this, but the law has a modest role to play.” She mentioned that although there are laws that allow people to sue for defamation, intentional infliction of emotional distress and privacy torts, these procedures are quite expensive. She adds that criminal law offers the public very little opportunity to pursue perpetrators further.

National security

Clint Watts, a Senior Fellow at the Foreign Policy Research Institute, provided insight into how such technologies can affect national security. He says that “A.I. provides purveyors of disinformation to identify psychological vulnerabilities and to create modified content digital forgeries advancing false narratives against Americans and American interests.” Watts suspects that Russia, “being an enduring purveyor of disinformation is and will continue to pursue the acquisition of synthetic media capability, and employ the output against adversaries around the world.” He also adds that China, as a U.S. rival, will join Russia “to get vast amounts of information stolen from the U.S. The country has already shown a propensity to employ synthetic media in broadcast journalism.
They'll likely use it as part of disinformation campaigns to discredit foreign detractors, incite fear inside western-style democracy and then, distort the reality of audiences and the audiences of America's allies.” He also mentions that deepfake proliferation can present a danger to American constituents by demoralizing them. Watts suspects that U.S. diplomats and military personnel deployed overseas will be prime targets for deepfake-driven disinformation planted by adversaries.

Watts provided a list of recommendations, which should be implemented to “prohibit U.S. officials, elected representatives and agencies from creating and distributing false and manipulated content.”

- The U.S. government must be the sole purveyor of facts and truth to constituents, assuring the effective administration of democracy via productive policy debate from a shared basis of reality.
- Policy makers should work jointly with social media companies to develop standards for content and accountability.
- The U.S. government should partner with the private sector to implement digital verification that designates the date, time and physical origin of the content.
- Social media companies should start labeling videos, and forward the labels across all platforms.
- Consumers should be able to determine the source of the information and whether it is an authentic depiction of people and events.
- From a national security perspective, the U.S. government should maintain intelligence on adversaries' capabilities to conduct such operations.
- The Departments of Defense and State should immediately develop response plans for deepfake smear campaigns and mobilizations overseas, in an attempt to mitigate harm.

Lastly, he added that public awareness of deepfakes and their signatures will assist in tamping down attempts to subvert U.S. democracy and incite violence.
When Schiff asked the witnesses if it's “time to do away with the immunity that social media platforms enjoy”, Watts replied in the affirmative and listed suggestions in three particular areas:

- If social media platforms see something spiking in terms of virality, it should be put in a queue for human review and linked to fact checkers, then downranked and kept out of news feeds. The mainstream should also be helped to understand what manipulated content is.
- Anything related to outbreaks of violence and public safety should be regulated immediately.
- Anything related to elected officials or public institutions should immediately be flagged, pulled down and checked, and then given context.

The Ranking Member of the committee, Devin Nunes, asked Citron what kind of filters can be placed on these tech companies, as “it's not developed by partisan left wing like it is now, where most of the time, it's conservatives who get banned and not democrats”. Citron suggested that proactive filtering won’t be possible, and hence companies should react responsibly and in a bipartisan way. She added, “but rather, is this a misrepresentation in a defamatory way, right, that we would say it's a falsehood that is harmful to reputation, that's an impersonation, then we should take it down. This is the default I am imagining.”

How laws could be altered according to the changing times, to discourage deepfake videos

Citron says that laws could be altered, as in the case of Section 230(c) of the Communications Decency Act. It states that “No speaker or publisher -- or no online service shall be treated as a speaker or publisher of someone else's content.” This law could be altered to “No online service that engages in reasonable content moderation practices shall be treated as a speaker or publisher of somebody else's content.” Citron believes that doing away with a reasonableness standard could lead to negligence. She also adds that “I've been advising Twitter and Facebook all of the time.
There is meaningful reasonable practices that are emerging and have emerged in the last ten years. We already have a guide, it's not as if this is a new issue in 2019. So we can come up with reasonable practices.” Watts also added that if an adversary from a large country such as China, Iran or Russia makes a deepfake video to undermine the U.S., it can be traced back if aggressive laws are in place. He says that responses ranging from an “arrest and extradition, if the sanction permits, response should be individually, or in terms of cyber response” could help discourage deepfakes.

How to slow down the spread of videos

One of the reasons that these types of manipulated images gain traction is that sharing is almost instantaneous: they can be shared around the world and across platforms in a few seconds. Doermann says that social media platforms must be pressured to introduce a 10-15 second delay, so that it can be decided whether to label a particular video. He adds that “We've done it for child pornography, we've done it for human trafficking, they're serious about those things. This is another area that's a little bit more in the middle, but I think they can take the same effort in these areas to do that type of triage.” This delay would allow third parties or fact checkers to assess the authenticity of videos and label them. Citron adds that this is where labelling a particular video can help: “I think it is incredibly important and there are times in which, that's the perfect rather than second best, and we should err on the side of inclusion and label it as synthetic.” The Representative of Ohio, Brad Wenstrup, added that we can have international extradition laws, which can punish somebody when “something comes from some other country, maybe even a friendly country, that defames and hurts someone here”.
There should be an agreement among nations that “we'll extradite those people and they can be punished in your country for what they did to one of your citizens.” Terri Sewell, the Representative of Alabama, further probed the current state of fake-video detection. Doermann replied that we currently have enough solutions to detect a fake video, but with a constant delay of 15-20 minutes.

Deepfakes and 2020 Presidential elections

Watts says that he’s concerned about deepfakes appearing on the eve of Election Day 2020. Foreign adversaries may use a standard disinformation approach by “using an organic content that suits their narrative and inject it back.” This can escalate as more people make deepfakes each year. He also added that “Right now I would be very worried about someone making a fake video about electoral systems being out or broken down on election day 2020.” So state governments and social media companies should be ready with a response plan in the event of such an incident. Sewell then asked the witnesses for suggestions to give political parties and candidates so that they are prepared for the possibility of deepfake content. Watts replied that the most important thing in countering fake content would be a unified standard that all social media companies should follow. He added that “if you're a manipulator, domestic or international, and you're making deep fakes, you're going to go to whatever platform allows you to post anything from inauthentic accounts. They go to wherever the weak point is and it spreads throughout the system.” He believes that such a standard would help counter extremism, disinformation and political smear campaigns.
Watts added that any lag in responding to such videos should be avoided, as “any sort of lag in terms of response allows that conspiracy to grow.” Citron also pointed out that, first, all candidates should have a clear policy about deepfakes and should commit that they won’t use or spread them.

Should the algorithms to make deepfakes be open sourced?

Doermann answered that the algorithms behind deepfakes absolutely have to be open sourced. He says that although this might help adversaries, they are going to learn about it anyway. He believes this is significant as, “We need to get this type of stuff out there. We need to get it into the hands of users. There are companies out there that are starting to make these types of things.” He also states that people should be able to use this technology: the more people are educated and the more tools they learn, the better the choices they can make.

On Mark Zuckerberg’s deepfake video

On being asked to comment on Mark Zuckerberg’s decision not to take down a deepfake video of himself from his own platform, Facebook, Citron replied that, by not taking down the video, Mark gave a perfect example of “satire and parody”. She added that private companies can make these kinds of choices, as they have an incredible amount of power without any liability. “It seemed to be a conversation about the choices they make and what does that mean for society. So it was incredibly productive, I think.” Watts also said that he likes Facebook for its consistency in terms of enforcement, and that the company is always trying to learn and implement better practices. He added that he really likes Facebook as it is always ready to hear “from legislatures about what falls inside those parameters.
The one thing that I really like is that they're doing is identifying inauthentic account creation and inauthentic content generation, they are enforcing it, they have increased the scale, and it is very very good in terms of how they have scaled it up, it’s not perfect, but it is better.”

Read More: Zuckerberg just became the target of the world’s first high profile white hat deepfake op. Can Facebook come out unscathed?

On the Nancy Pelosi doctored video

Schiff asked the witnesses whether there was any account of how many millions of people had watched the doctored video of Nancy Pelosi, and how many of them ultimately learned that it was not a real video. He said he was asking because, according to psychologists, people never really forget a negative impression once it has been formed. Clark replied that “Fact checks and clarifications tend not to travel nearly as far as the initial news.” He added that this is a very general pattern: “If you care, you care about clarifications and fact checks. But if you're just enjoying media, you're enjoying media. You enjoy the experience of the media and the absolute minority doesn’t care whether it's true.” Schiff also recalled how in 2016, “some foreign actors, particularly Russia had mimicked black lives matter to push out content to racially divide people.” Such videos gave the impression of police violence against people of colour. They “certainly push out videos that are enormously jarring and disruptive.”

All the information revealed in the hearing was described as “scary and worrying” by one of the representatives. Schiff, the chair of the committee, ended the hearing after thanking all the witnesses for their testimonies and recommendations. For more details, head over to the full hearing on deepfake videos by the House Intelligence Committee.

- Worried about Deepfakes? Check out the new algorithm that manipulates talking-head videos by altering the transcripts
- Lawmakers introduce new Consumer privacy bill and Malicious Deep Fake Prohibition Act to support consumer privacy and battle deepfakes
- Machine generated videos like Deepfakes – Trick or Treat?

Fatema Patrawala
28 Nov 2019
5 min read

Julia Computing research team runs machine learning model on encrypted data without decrypting it

Last week, the team at Julia Computing published research based on cutting-edge cryptographic techniques that make it practical to perform computation on data without ever decrypting it. For example, the user would send encrypted data (e.g. images) to a cloud API, which would run the machine learning model and then return the encrypted answer. At no point is the user data decrypted. In particular, the cloud provider has no access to the original image and cannot decrypt the prediction it computed. The team demonstrated this by building a machine learning service for handwriting recognition of encrypted images (from the MNIST dataset).

The ability to compute on encrypted data is generally referred to as “secure computation”. It is a fairly large area of research, with many different cryptographic approaches and techniques for a plethora of application scenarios. For their research, the Julia team focused on a technique known as “homomorphic encryption”.

What is homomorphic encryption

Homomorphic encryption is a form of encryption that allows computation on ciphertexts. It generates an encrypted result which, when decrypted, matches the result of the operations as if they had been performed on the plaintext. This technique can be used for privacy-preserving outsourced storage and computation: data can be encrypted and sent to commercial cloud environments for processing, and it stays encrypted throughout. In highly regulated industries, such as health care, homomorphic encryption can be used to enable new services by removing privacy barriers that inhibit data sharing.
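The defining property of homomorphic encryption can be seen in a much simpler scheme than the one the Julia team used. The sketch below swaps in the Paillier cryptosystem purely for illustration; it is not CKKS. Paillier is only additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts.

The class name PaillierDemo and its methods are hypothetical. The 64-bit primes make it completely insecure, so treat it as a toy under those assumptions, not a usable implementation.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

/**
 * Toy Paillier cryptosystem (additively homomorphic). Illustration only:
 * this is not CKKS, and the 64-bit primes are far too small to be secure.
 */
public class PaillierDemo {
    private static final SecureRandom RNG = new SecureRandom();

    final BigInteger n, nSquared, g;   // public key (g = n + 1)
    final BigInteger lambda, mu;       // private key

    PaillierDemo(int primeBits) {
        BigInteger p = BigInteger.probablePrime(primeBits, RNG);
        BigInteger q;
        do {
            q = BigInteger.probablePrime(primeBits, RNG);
        } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        g = n.add(BigInteger.ONE);
        BigInteger pm1 = p.subtract(BigInteger.ONE);
        BigInteger qm1 = q.subtract(BigInteger.ONE);
        lambda = pm1.multiply(qm1).divide(pm1.gcd(qm1));          // lcm(p-1, q-1)
        mu = lFunction(g.modPow(lambda, nSquared)).modInverse(n); // L(g^lambda mod n^2)^-1 mod n
    }

    /** L(u) = (u - 1) / n, as defined by the Paillier scheme. */
    private BigInteger lFunction(BigInteger u) {
        return u.subtract(BigInteger.ONE).divide(n);
    }

    /** c = g^m * r^n mod n^2 for a random r coprime to n; requires 0 <= m < n. */
    BigInteger encrypt(BigInteger m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), RNG);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        return g.modPow(m, nSquared).multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    /** m = L(c^lambda mod n^2) * mu mod n. */
    BigInteger decrypt(BigInteger c) {
        return lFunction(c.modPow(lambda, nSquared)).multiply(mu).mod(n);
    }

    /** The homomorphic step: multiplying ciphertexts adds the hidden plaintexts. */
    BigInteger addEncrypted(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2).mod(nSquared);
    }

    public static void main(String[] args) {
        PaillierDemo keys = new PaillierDemo(64);
        BigInteger c1 = keys.encrypt(BigInteger.valueOf(17));
        BigInteger c2 = keys.encrypt(BigInteger.valueOf(25));
        // Whoever computes the sum sees only c1, c2 and the public key.
        BigInteger cSum = keys.addEncrypted(c1, c2);
        System.out.println(keys.decrypt(cSum)); // 42
    }
}
```

The party calling addEncrypted never learns 17, 25 or 42. CKKS extends this idea to both addition and multiplication on whole vectors of approximate numbers, which is what makes neural-network inference possible.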
In this research, the Julia Computing team used a homomorphic encryption system that involves the following operations:

```
pub_key, eval_key, priv_key = keygen()
encrypted = encrypt(pub_key, plaintext)
decrypted = decrypt(priv_key, encrypted)
encrypted′ = eval(eval_key, f, encrypted)
```

The first three are fairly straightforward and are familiar to anyone who has used asymmetric cryptography before. The last one is the important one. It evaluates some function f on the encrypted value and returns another encrypted value that corresponds to the result of evaluating f on the underlying plaintext. It is this property that gives homomorphic computation its name.

The Julia Computing team then discusses CKKS (Cheon-Kim-Kim-Song), a homomorphic encryption scheme that allows homomorphic evaluation of the following primitive operations:

- Element-wise addition of length n vectors of complex numbers
- Element-wise multiplication of length n complex vectors
- Rotation (in the circshift sense) of elements in the vector
- Complex conjugation of vector elements

They also noted that computations using CKKS are noisy, and they tested performing these operations in Julia.

Which convolutional neural network did the Julia Computing team use

As a starting point, the Julia Computing team used the convolutional neural network example given in the Flux model zoo. They kept the training loop and data preparation and tweaked the model slightly. It is essentially the same model as the one used in the paper “Secure Outsourced Matrix Computation and Application to Neural Networks”, which uses the same (CKKS) cryptographic scheme. That paper also encrypts the model, which the Julia team skipped for simplicity. The Julia team also included bias vectors after every layer (which Flux does by default). This resulted in a higher test-set accuracy for the Julia team's model (98.6% vs 98.1%). An unusual feature of this model is the x.^2 activation function.
More common choices here would have been tanh, relu or something more advanced. Those functions (relu in particular) are cheap to evaluate on plaintext values, but they would be quite expensive to evaluate on encrypted values. CKKS supports only additions, multiplications, rotations and conjugation, so the team would have ended up evaluating a polynomial approximation of them. Fortunately, x.^2 worked fine for their purpose.

How was the homomorphic operation carried out

The team implemented the homomorphic convolution and matrix multiply, assuming a batch size of 64. They precomputed each 7x7 convolution window from the original images, which gave them 64 7x7 matrices per input image. Then they collected the same position in each window into one vector. This gave them a 64-element vector per image for each of the 49 window positions (i.e. a total of 49 64x64 matrices across the batch), and they encrypted these matrices. In this way, the convolution became a scalar multiplication of each whole matrix with the appropriate mask element. By summing all 49 resulting matrices, the team got the result of the convolution.

For the matrix multiply, the team rotated elements in the vector to re-order the multiplication indices. They used a row-major ordering of matrix elements in the vector. Shifting the vector by a multiple of the row size then has the effect of rotating the columns, which is a sufficient primitive for implementing matrix multiply.

Putting it all together, the team got the whole pipeline working. You can take a look at the official blog post for the step-by-step implementation with code. They implemented the whole process in Julia because its powerful abstractions let them encapsulate the entire convolution-extraction process as a custom array type.
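The convolution packing and the rotation-based matrix multiply described above can be sketched on plain, unencrypted vectors, since CKKS applies the same slot-wise operations to ciphertexts. This is a sketch under several assumptions:

- The class and method names are hypothetical.
- It packs a single image rather than a batch of 64.
- It assumes a stride of 3, which is what yields 64 windows for a 7x7 kernel over a 28x28 MNIST image.
- For the matrix product it uses the diagonal method, a standard rotation-based technique swapped in for illustration. This is not necessarily the Julia team's row-major column-rotation scheme.

```java
import java.util.Arrays;

/**
 * Plaintext simulation of CKKS-style slot arithmetic (nothing is encrypted).
 * Each double[] stands in for the slots of one ciphertext.
 */
public class SlotOpsDemo {

    static double[] add(double[] a, double[] b) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] + b[i];
        return r;
    }

    static double[] mul(double[] a, double[] b) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] * b[i];
        return r;
    }

    static double[] scale(double[] a, double s) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] * s;
        return r;
    }

    /** Cyclic rotation (circshift) by k slots: result[i] = a[(i + k) mod n]. */
    static double[] rotate(double[] a, int k) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[Math.floorMod(i + k, a.length)];
        return r;
    }

    /**
     * Convolution of one image as a sum of scalar-times-vector products: for each
     * kernel position (r, c), gather that pixel from every window into one vector
     * and add w[r][c] times that vector to the output.
     */
    static double[] convViaPositionVectors(double[][] img, double[][] w, int stride) {
        int k = w.length, side = (img.length - k) / stride + 1, windows = side * side;
        double[] out = new double[windows];
        for (int r = 0; r < k; r++) {
            for (int c = 0; c < k; c++) {
                double[] posVec = new double[windows];   // position (r, c) of every window
                for (int wi = 0; wi < windows; wi++) {
                    posVec[wi] = img[(wi / side) * stride + r][(wi % side) * stride + c];
                }
                out = add(out, scale(posVec, w[r][c]));
            }
        }
        return out;
    }

    /** Ordinary sliding-window convolution, used only to check the packed version. */
    static double[] convDirect(double[][] img, double[][] w, int stride) {
        int k = w.length, side = (img.length - k) / stride + 1;
        double[] out = new double[side * side];
        for (int oy = 0; oy < side; oy++)
            for (int ox = 0; ox < side; ox++)
                for (int r = 0; r < k; r++)
                    for (int c = 0; c < k; c++)
                        out[oy * side + ox] += img[oy * stride + r][ox * stride + c] * w[r][c];
        return out;
    }

    /** Square matrix times vector via the diagonal method: y = sum_k diag_k .* rotate(x, k). */
    static double[] matVec(double[][] m, double[] x) {
        int n = x.length;
        double[] y = new double[n];
        for (int k = 0; k < n; k++) {
            double[] diag = new double[n];                 // diag[i] = m[i][(i + k) mod n]
            for (int i = 0; i < n; i++) diag[i] = m[i][(i + k) % n];
            y = add(y, mul(diag, rotate(x, k)));
        }
        return y;
    }

    static double maxAbsDiff(double[] a, double[] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) d = Math.max(d, Math.abs(a[i] - b[i]));
        return d;
    }

    public static void main(String[] args) {
        double[][] img = new double[28][28];               // stand-in for one 28x28 MNIST image
        for (int i = 0; i < 28; i++) for (int j = 0; j < 28; j++) img[i][j] = ((i * 28 + j) % 10) / 10.0;
        double[][] w = new double[7][7];                   // arbitrary 7x7 kernel
        for (int i = 0; i < 7; i++) for (int j = 0; j < 7; j++) w[i][j] = (i - j) / 7.0;
        double[] packed = convViaPositionVectors(img, w, 3);
        System.out.println(packed.length);                 // 64
        System.out.println(maxAbsDiff(packed, convDirect(img, w, 3)) < 1e-9); // true
        System.out.println(Arrays.toString(matVec(new double[][] {{1, 2}, {3, 4}}, new double[] {5, 6}))); // [17.0, 39.0]
    }
}
```

In an encrypted setting, each add, mul and rotate call would become the corresponding homomorphic operation on ciphertexts. The kernel weights and matrix entries could stay public plaintext values, consistent with the Julia team leaving the model unencrypted.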
The Julia Computing team states, “Achieving the dream of automatically executing arbitrary computations securely is a tall order for any system, but Julia’s metaprogramming capabilities and friendly syntax make it well suited as a development platform.”

- Julia co-creator, Jeff Bezanson, on what’s wrong with Julialang and how to tackle issues like modularity and extension
- Julia v1.3 released with new multithreading features, and much more!
- The Julia team shares its finalized release process with the community
- Julia announces the preview of multi-threaded task parallelism in alpha release v1.3.0
- How to make machine learning based recommendations using Julia [Tutorial]

Pravin Dhandre
15 Jan 2018
8 min read

Running Parallel Data Operations using Java Streams

Our article is an excerpt from a book co-authored by Richard M. Reese and Jennifer L. Reese, titled Java for Data Science. This book provides an in-depth understanding of important tools and techniques used across data science projects in a Java environment.

This article shows how Java 8 streams and lambda expressions can be used to solve complex, math-intensive problems on larger datasets. You will explore short demonstrations of matrix multiplication and map-reduce using Java 8.

The release of Java 8 came with a number of important enhancements to the language. The two enhancements of interest to us are lambda expressions and streams. A lambda expression is essentially an anonymous function that adds a functional programming dimension to Java. The concept of streams, as introduced in Java 8, does not refer to IO streams. Instead, you can think of a stream as a sequence of objects that can be generated and manipulated using a fluent style of programming. This style will be demonstrated shortly.

As with most APIs, programmers must be careful to consider the actual execution performance of their code using realistic test cases and environments. If not used properly, streams may not actually provide performance improvements. In particular, parallel streams, if not crafted carefully, can produce incorrect results.

We will start with a quick introduction to lambda expressions and streams. If you are familiar with these concepts, you may want to skip over the next section.

Understanding Java 8 lambda expressions and streams

A lambda expression can be expressed in several different forms. The following illustrates a simple lambda expression where the symbol -> is the lambda operator. It takes some value, e, and returns the value multiplied by two. There is nothing special about the name e; any valid Java variable name can be used:

```java
e -> 2 * e
```

It can also be expressed in other forms, such as the following:

```java
(int e) -> 2 * e
(double e) -> 2 * e
(int e) -> {return 2 * e;}
```

The form used depends on the intended type of e. Lambda expressions are frequently used as arguments to a method, as we will see shortly.

A stream can be created using a number of techniques. In the following example, a stream is created from an array. The IntStream interface is a type of stream that uses integers. The Arrays class' stream method converts an array into a stream:

```java
IntStream stream = Arrays.stream(numbers);
```

We can then apply various stream methods to perform an operation. In the following statement, the forEach method will simply display each integer in the stream. Here, out is System.out, brought in with a static import:

```java
stream.forEach(e -> out.printf("%d ", e));
```

There are a variety of stream methods that can be applied to a stream. In the following example, the mapToDouble method will take an integer, multiply it by 2, and then return it as a double. The forEach method will then display these values. A stream can be traversed only once, and forEach has already consumed the stream above. So a new stream is created from the array first; reusing the old one would throw an IllegalStateException:

```java
Arrays.stream(numbers)
    .mapToDouble(e -> 2 * e)
    .forEach(e -> out.printf("%.4f ", e));
```

The cascading of method invocations is referred to as fluent programming.

Using Java 8 to perform matrix multiplication

Here, we will illustrate how streams can be used to perform matrix multiplication. The definitions of the A, B, and C matrices are the same as declared in the Implementing basic matrix operations section. They are duplicated here for your convenience:

```java
double A[][] = {
    {0.1950, 0.0311},
    {0.3588, 0.2203},
    {0.1716, 0.5931},
    {0.2105, 0.3242}};
double B[][] = {
    {0.0502, 0.9823, 0.9472},
    {0.5732, 0.2694, 0.916}};
int n = A.length;      // rows of A (4)
int p = B[0].length;   // columns of B (3)
double C[][] = new double[n][p];
```

The following sequence is a stream implementation of matrix multiplication.
A detailed explanation of the code follows:

```java
C = Arrays.stream(A)
    .parallel()
    .map(AMatrixRow -> IntStream.range(0, B[0].length)
        .mapToDouble(i -> IntStream.range(0, B.length)
            .mapToDouble(j -> AMatrixRow[j] * B[j][i])
            .sum()
        ).toArray()).toArray(double[][]::new);
```

Arrays.stream(A) creates a stream whose elements are the 4 rows of the A matrix. The map method, shown as follows, transforms each row into the corresponding row of C. The range method returns an IntStream of integers from its first argument (inclusive) up to its second argument (exclusive):

```java
.map(AMatrixRow -> IntStream.range(0, B[0].length)
```

The variable i takes the values generated by the first range method, one for each column of the B matrix (3). The variable j takes the values generated by the second range method, one for each row of the B matrix (2), which equals the number of columns of A. At the heart of the statement is the matrix multiplication itself, where the sum method adds up the products of a row of A with a column of B:

```java
.mapToDouble(j -> AMatrixRow[j] * B[j][i])
.sum()
```

The last part of the expression creates the two-dimensional array for the C matrix. The expression double[][]::new is a constructor reference (a form of method reference). It is a shorter way of writing the lambda size -> new double[size][], which uses the new operator to create the array:

```java
).toArray()).toArray(double[][]::new);
```

The displayResult method is as follows:

```java
public void displayResult() {
    out.println("Result");
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < p; j++) {
            out.printf("%.4f ", C[i][j]);
        }
        out.println();
    }
}
```

The output of this sequence follows:

```
Result
0.0276 0.1999 0.2132
0.1443 0.4118 0.5417
0.3486 0.3283 0.7058
0.1964 0.2941 0.4964
```

Using Java 8 to perform map-reduce

In this section, we will use Java 8 streams to perform a map-reduce operation. In this example, we will use a Stream of Book objects. We will then demonstrate how to use the Java 8 reduce and average methods to get our total page count and average page count. Rather than begin with a text file, as we did in the Hadoop example, we have created a Book class with title, author, and page-count fields.
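The Book class itself isn't shown in this excerpt. Here is a minimal sketch of what it might look like, under some assumptions:

- Only the pgCnt field name comes from the code in this article.
- title and author are assumed names for the other two fields.
- The toString method is an illustrative addition.
- The fields are left package-private so that driver code in the same package can read b.pgCnt directly, as the stream code does.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the Book class described in the text.
 * Only pgCnt is named in the original code; title and author are assumed.
 */
public class Book {
    final String title;
    final String author;
    final int pgCnt;   // page count, read directly by the stream code (b.pgCnt)

    Book(String title, String author, int pgCnt) {
        this.title = title;
        this.author = author;
        this.pgCnt = pgCnt;
    }

    @Override
    public String toString() {
        return title + " by " + author + " (" + pgCnt + " pages)";
    }

    public static void main(String[] args) {
        List<Book> books = new ArrayList<>();
        books.add(new Book("Moby Dick", "Herman Melville", 822));
        books.add(new Book("Jane Eyre", "Charlotte Bronte", 299));
        System.out.println(books.get(0));   // Moby Dick by Herman Melville (822 pages)
        System.out.println(books.stream().mapToInt(b -> b.pgCnt).sum()); // 1121
    }
}
```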
In the main method of the driver class, we have created new instances of Book and added them to an ArrayList called books. We have also declared a double variable, average, to hold our average, and initialized our variable totalPg to zero:

    ArrayList<Book> books = new ArrayList<>();
    double average;
    int totalPg = 0;
    books.add(new Book("Moby Dick", "Herman Melville", 822));
    books.add(new Book("Charlotte's Web", "E.B. White", 189));
    books.add(new Book("The Grapes of Wrath", "John Steinbeck", 212));
    books.add(new Book("Jane Eyre", "Charlotte Bronte", 299));
    books.add(new Book("A Tale of Two Cities", "Charles Dickens", 673));
    books.add(new Book("War and Peace", "Leo Tolstoy", 1032));
    books.add(new Book("The Great Gatsby", "F. Scott Fitzgerald", 275));

Next, we perform a map and reduce operation to calculate the total number of pages in our set of books. To accomplish this in a parallel manner, we use the stream and parallel methods. We then use the map method with a lambda expression to extract the page count from each Book object. Finally, we use the reduce method to merge our page counts into one final value, which is assigned to totalPg:

    totalPg = books
        .stream()
        .parallel()
        .map((b) -> b.pgCnt)
        .reduce(totalPg, (accumulator, _item) -> {
            out.println(accumulator + " " + _item);
            return accumulator + _item;
        });

Notice that in the preceding reduce method we have chosen to print out the reduction operation's cumulative value and the individual item being added. The accumulator represents the aggregation of our page counts. The _item represents the value being folded into the accumulator at that moment: either an individual page count or, because the stream is parallel, a partial sum produced by another subtask. Because reduce may apply the identity value once for each parallel subtask, that value must be a true identity for addition (0), which is why totalPg is initialized to zero before being passed in. In the output that follows, we first see the accumulator value stay at zero as each individual book item is processed, since every subtask starts from the identity. Gradually, partial sums are combined and the accumulator value increases. The final operation is the reduction of the values 1223 and 2279.
The sum of these two numbers is 3502, the total page count for all of our books. Because the stream is parallel, the order of these lines can vary between runs:

    0 822
    0 189
    0 299
    0 673
    0 212
    299 673
    0 1032
    0 275
    1032 275
    972 1307
    189 212
    822 401
    1223 2279

Next, we will add code to calculate the average page count of our set of books. We multiply our totalPg value, determined using map-reduce, by 1.0 to prevent truncation when we divide by the integer returned by the size method. We then print out average:

    average = 1.0 * totalPg / books.size();
    out.printf("Average Page Count: %.4f\n", average);

Our output is as follows:

    Average Page Count: 500.2857

We could also have used Java 8 streams to calculate the average directly. Add the following code to the main method. We use parallelStream with our map method to get the page count for each of our books in parallel. We then use mapToDouble to convert the values to the double type needed for the average. Finally, the average method returns an OptionalDouble, and its getAsDouble method extracts our average page count:

    average = books
        .parallelStream()
        .map(b -> b.pgCnt)
        .mapToDouble(s -> s)
        .average()
        .getAsDouble();
    out.printf("Average Page Count: %.4f\n", average);

Our output, identical to our previous example, is as follows:

    Average Page Count: 500.2857

The above techniques used Java 8 capabilities to apply the map-reduce approach to numeric problems. This type of process can also be applied to other types of data, including text-based data. The real benefit appears when these processes handle extremely large datasets, where parallel execution can significantly reduce processing time. To learn other mathematical and parallel techniques in Java for building a complete data analysis application, you can read the book Java for Data Science.
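For readers who want to run the matrix multiplication from this article on its own, the sketch below wraps the same parallel stream pipeline in a static method and compares it with a conventional triple loop. The class and method names (StreamMatrixMultiply, multiplyStreams, multiplyLoops) are our own, not the book's. Note that DoubleStream.sum may use compensated summation, so the two versions can differ in the last few bits and should be compared with a small tolerance.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Self-contained sketch: the article's stream-based matrix multiplication,
// wrapped in a method and checked against a plain triple loop.
class StreamMatrixMultiply {

    // C = A x B using the same parallel stream pipeline as the article
    static double[][] multiplyStreams(double[][] A, double[][] B) {
        return Arrays.stream(A)
                .parallel()
                .map(aRow -> IntStream.range(0, B[0].length)           // i: columns of B
                        .mapToDouble(i -> IntStream.range(0, B.length) // j: rows of B
                                .mapToDouble(j -> aRow[j] * B[j][i])
                                .sum())
                        .toArray())
                .toArray(double[][]::new); // encounter order is kept, so rows stay in place
    }

    // Conventional loop version used as a reference
    static double[][] multiplyLoops(double[][] A, double[][] B) {
        int n = A.length, m = B.length, p = B[0].length;
        double[][] C = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < p; k++) {
                double sum = 0.0;
                for (int j = 0; j < m; j++) {
                    sum += A[i][j] * B[j][k];
                }
                C[i][k] = sum;
            }
        }
        return C;
    }

    public static void main(String[] args) {
        double[][] A = {
            {0.1950, 0.0311},
            {0.3588, 0.2203},
            {0.1716, 0.5931},
            {0.2105, 0.3242}};
        double[][] B = {
            {0.0502, 0.9823, 0.9472},
            {0.5732, 0.2694, 0.916}};
        double[][] C = multiplyStreams(A, B);
        System.out.println("Result");
        for (double[] row : C) {
            for (double value : row) {
                System.out.printf("%.4f ", value);
            }
            System.out.println();
        }
    }
}
```

Running main should print the same 4 x 3 result listed earlier. Even though the outer stream is parallel, toArray respects the encounter order of Arrays.stream(A), so the rows of C come out in the same order as the rows of A.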

Sugandha Lahoti
23 Feb 2018
7 min read

FAT Conference 2018 Session 4: Fair Classification

As algorithms are increasingly used to make decisions of social consequence, the social values encoded in these decision-making procedures are the subject of increasing study, with fairness being a chief concern. The Conference on Fairness, Accountability, and Transparency (FAT), scheduled for Feb 23 and 24 this year in New York, is an annual conference dedicated to bringing together the theory and practice of fair and interpretable Machine Learning, Information Retrieval, NLP, Computer Vision, Recommender systems, and other technical disciplines. This year's program includes 17 peer-reviewed papers and 6 tutorials from leading experts in the field. The conference is organized into several sessions. Session 4 of the two-day conference, on Saturday, February 24, covers fair classification. In this article, we give our readers a peek into the four papers that have been selected for presentation in Session 4. You can also check out the Session 1, Session 2, and Session 3 summaries in case you've missed them.

The cost of fairness in binary classification

What is the paper about?

This paper provides a simple approach to the fairness-aware learning problem, which involves suitably thresholding class-probability estimates. It won the Best Paper award in the Technical Contribution category. The authors study the inherent tradeoffs in learning classifiers with a fairness constraint in the form of two questions:

What is the best accuracy we can expect for a given level of fairness?
What is the nature of these optimal fairness-aware classifiers?

The authors show that for cost-sensitive approximate fairness measures, the optimal classifier is an instance-dependent thresholding of the class-probability function. They quantify the degradation in performance by a measure of alignment between the target and sensitive variables. This analysis is then used to derive a simple plug-in approach to the fairness problem.
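As a rough illustration of what "instance-dependent thresholding of the class-probability function" can mean, the toy sketch below assumes two probability estimates are already available for each instance: eta, the estimated probability that the target label is positive, and etaS, the estimated probability that the sensitive feature is positive. It predicts positive when eta - c - lambda * (etaS - cS) > 0, so the effective threshold on eta shifts with etaS. This particular form and the names c, cS, and lambda are our own assumptions for illustration, not the authors' published algorithm; with lambda set to 0 it reduces to ordinary thresholding at c.

```java
// Toy sketch (assumed form, not the paper's exact method):
// predict 1 when eta - c - lambda * (etaS - cS) > 0, i.e. threshold eta at
// c + lambda * (etaS - cS), a threshold that depends on the instance.
final class FairThreshold {
    private final double c;      // hypothetical cost parameter for the target
    private final double cS;     // hypothetical cost parameter for the sensitive feature
    private final double lambda; // hypothetical fairness/accuracy trade-off weight

    FairThreshold(double c, double cS, double lambda) {
        this.c = c;
        this.cS = cS;
        this.lambda = lambda;
    }

    /** Returns 1 (positive) or 0 (negative) for one instance. */
    int predict(double eta, double etaS) {
        double score = eta - c - lambda * (etaS - cS);
        return score > 0 ? 1 : 0;
    }

    public static void main(String[] args) {
        FairThreshold plain = new FairThreshold(0.5, 0.5, 0.0); // accuracy only
        FairThreshold fair = new FairThreshold(0.5, 0.5, 1.0);  // with fairness term
        // Same target probability; the sensitive-feature probability is high,
        // so the fairness-aware rule raises the effective threshold for this instance.
        System.out.println(plain.predict(0.6, 0.9)); // prints 1
        System.out.println(fair.predict(0.6, 0.9));  // prints 0
    }
}
```

In the demo, both rules see the same target probability (0.6), but the rule with lambda = 1.0 rejects the instance whose sensitive-feature probability is high, showing how a fairness term can override the accuracy-only decision and why some accuracy may be given up in exchange.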
Key takeaways

For fairness-aware learning, the authors design an algorithm targeting a particular measure of fairness.
They reduce two popular fairness measures (disparate impact and mean difference) to cost-sensitive risks.
They show that for cost-sensitive fairness measures, the optimal fairness-aware classifier is an instance-dependent thresholding of the class-probability function.
They quantify the intrinsic, method-independent impact of the fairness requirement on accuracy via a notion of alignment between the target and the sensitive feature.

The ability to theoretically compute the tradeoffs between fairness and utility is perhaps the most interesting aspect of their technical results. They stress that the tradeoff is intrinsic to the underlying data. That is, any fairness or unfairness is a property of the data, not of any particular technique. They theoretically compute the price one has to pay (in utility) to achieve a desired degree of fairness: in other words, they compute the cost of fairness.

Decoupled Classifiers for Group-Fair and Efficient Machine Learning

What is the paper about?

This paper considers how to use a sensitive attribute such as gender or race to maximize fairness and accuracy, assuming that doing so is legal and ethical. Simple linear classifiers may use the raw data, upweight or oversample data from minority groups, or employ advanced approaches to fitting linear classifiers that aim to be accurate and fair. However, an inherent tradeoff between accuracy on one group and accuracy on another still remains. This paper defines and explores decoupled classification systems, in which a separate classifier is trained on each group. The authors present experiments on 47 datasets. The experiments are “semi-synthetic” in the sense that the first binary feature was used as a substitute sensitive feature.
The authors found that the decoupling algorithm improves performance on many datasets and decreases it on fewer.

Key takeaways

The paper describes a simple technical approach for a practitioner using ML to incorporate sensitive attributes.
This approach avoids unnecessary accuracy tradeoffs between groups and can accommodate an application-specific objective, generalizing the standard ML notion of loss.
For a certain family of “weakly monotonic” fairness objectives, the authors provide a black-box reduction that can use any off-the-shelf classifier to efficiently optimize the objective.
This work requires the application designer to pin down a specific loss function that trades off accuracy for fairness.
Experiments demonstrate that decoupling can reduce the loss on some datasets for some potentially sensitive features.

A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions

What is the paper about?

The work is based on the use of predictive analytics in the area of child welfare. It won the Best Paper award in the Technical and Interdisciplinary Contribution category. The authors have worked on developing, validating, fairness auditing, and deploying a risk prediction model in Allegheny County, PA, USA. They describe competing models that are being developed in Allegheny County as part of an ongoing redesign process and compare them with the previous models. Next, they investigate the predictive bias properties of the current tool and of a random forest model that has emerged as one of the best-performing competing models. Their predictive bias assessment is motivated both by considerations of human bias and by recent work on fairness criteria. They then discuss some of the challenges in incorporating algorithms into human decision-making processes and reflect on the predictive bias analysis in the context of how the model is actually being used.
They also propose an “oracle test” as a tool for clarifying whether particular concerns pertain to the statistical properties of a model or are targeted at other potential deficiencies.

Key takeaways

The goal in Allegheny County is to improve both the accuracy and the equity of screening decisions by taking a fairness-aware approach to incorporating prediction models into the decision-making pipeline.
The paper reports on the lessons the authors have learned so far, their approaches to predictive bias assessment, and several outstanding challenges in the child maltreatment hotline context.
This report contributes to the ongoing conversation about the use of algorithms in supporting critical decisions in government, and about the importance of considering fairness and discrimination in data-driven decision making.
The paper's discussion and general analytic approach are also broadly applicable to other domains where predictive risk modeling may be used.

Fairness in Machine Learning: Lessons from Political Philosophy

What is the paper about?

Moral and political philosophers have expended significant effort in formalizing and defending the central concepts of discrimination, egalitarianism, and justice. It is therefore unsurprising that attempts to formalize ‘fairness’ in machine learning contain echoes of these old philosophical debates. This paper draws on existing work in moral and political philosophy in order to elucidate emerging debates about fair machine learning. It addresses the following questions:

What does it mean for a machine learning model to be ‘fair’, in terms that can be operationalized?
Should fairness consist of ensuring everyone has an equal probability of obtaining some benefit, or should we aim instead to minimize the harms to the least advantaged?
Can the relevant ideal be determined by reference to some alternative state of affairs in which a particular social pattern of discrimination does not exist?
Key takeaways

This paper aims to provide an overview of some of the relevant philosophical literature on discrimination, fairness, and egalitarianism in order to clarify and situate the emerging debate within the fair machine learning literature.
The author addresses the conceptual distinctions drawn between terms frequently used in the fair ML literature, including ‘discrimination’ and ‘fairness’, and the use of related terms in the philosophical literature.
He suggests that ‘fairness’ as used in the fair machine learning community is best understood as a placeholder term for a variety of normative egalitarian considerations.
He also provides an overview of the implications of incorporating ‘fairness’ into algorithmic decision-making systems.

We hope you liked the coverage of Session 4. Don't miss our coverage of Session 5 on FAT recommenders and more.

Amey Varangaonkar
26 Dec 2017
7 min read

10 Machine Learning Tools to watch in 2018

2017 has been a wonderful year for Machine Learning. Developing smart, intelligent models has become easier than ever, thanks to extensive research into and development of newer and more efficient tools and frameworks. While the likes of TensorFlow, Keras, PyTorch and a few others have ruled the roost in 2017 as the top machine learning and deep learning libraries, 2018 promises to be even more exciting, with a strong line-up of open source and enterprise tools ready to take over from, or at least compete with, the current lot. In this article, we take a look at 10 such tools and frameworks which are expected to make it big in 2018.

Amazon SageMaker

One of the major announcements at AWS re:Invent 2017 was the general availability of Amazon SageMaker, a new framework that eases the building and deployment of machine learning models on the cloud. This service will be of great use to developers who don't have deep exposure to machine learning, giving them a variety of pre-built development environments based on the popular Jupyter notebook format. Data scientists looking to build effective machine learning systems on AWS, and to fine-tune their performance without spending a lot of time, will also find this service useful.

DSSTNE

Yet another offering by Amazon, DSSTNE (popularly called Destiny) is an open source library for developing machine learning models. Its primary strength lies in the fact that it can be used to train and deploy recommendation models which work with sparse inputs. Models developed using DSSTNE can be trained on multiple GPUs, are scalable, and are optimized for fast performance. Boasting close to 4000 stars on GitHub, this library is yet another tool to look out for in 2018!

Azure Machine Learning Workbench

Way back in 2014, Microsoft put Machine Learning and AI capabilities on the cloud by releasing Azure Machine Learning. However, this was strictly a cloud-only service.
During the Ignite 2017 conference held in September, Microsoft announced the next generation of Machine Learning on Azure, bringing machine learning capabilities to organizations through Azure Machine Learning Workbench. Azure ML Workbench is a cross-platform client which can run on both Windows and Apple machines. It is tailor-made for data scientists and machine learning developers who want to perform data manipulation and wrangling tasks. Built for scalability, it lets users get intuitive insights from a broad range of data sources and use them for their data modeling tasks.

Neon

Back in 2016, Intel announced its intention to become a major player in the AI market with the $350 million acquisition of Nervana, an AI startup which had been developing both hardware and software for effective machine learning. With Neon, Intel now has a fast, high-performance deep learning framework designed specifically to run on top of the recently announced Nervana Neural Network Processor. Designed for ease of use and supporting integration with IPython notebooks, Neon supports training of common deep learning models such as CNNs, RNNs, LSTMs and others. The framework is showing signs of continuous improvement, and with over 3000 stars on GitHub, Neon looks set to challenge the major league of deep learning libraries in the years to come.

Microsoft DMTK

One of the major challenges with machine learning for enterprises is the need to scale out models quickly without compromising performance, while keeping resource consumption low. Microsoft's Distributed Machine Learning Toolkit (DMTK) is designed to do just that. Open sourced by Microsoft so that it can receive much wider support from the community, DMTK allows machine learning developers and data scientists to take their single-machine algorithms and scale them out to build high-performance distributed models.
DMTK mostly focuses on distributed machine learning algorithms and allows you to perform tasks such as word embedding, sampling, and gradient boosting with ease. The framework does not yet support training deep learning models; however, we can expect this capability to be added soon.

Google Cloud Machine Learning Engine

Considered to be Google's premium machine learning offering, the Cloud Machine Learning Engine allows you to build machine learning models on all kinds of data with relative ease. Leveraging the popular TensorFlow machine learning framework, this platform can be used to perform predictive analytics at scale. It also lets you fine-tune and optimize the performance of your machine learning models using the popular HyperTune feature. With a serverless architecture supporting automated monitoring, provisioning and scaling, the Machine Learning Engine ensures you only have to worry about the kind of machine learning models you want to train. This is especially useful for machine learning developers looking to build large-scale models on the go.

Apple Core ML

Developed by Apple to help iOS developers build smarter applications, the Core ML framework is what makes Siri smarter. It takes advantage of both CPU and GPU capabilities to let developers run different kinds of machine learning and deep learning models, which can be integrated seamlessly into iOS applications. Core ML supports many commonly used machine learning algorithms such as decision trees, Support Vector Machines, linear models and more. Targeting a variety of real-world use cases such as natural language processing and computer vision, Core ML makes it possible to analyze data on Apple devices on the go, without sending it off the device for processing.

Apple Turi Create

In many cases, iOS developers want to customize the machine learning models they integrate into their apps.
For this, Apple has come up with Turi Create. This library allows you to focus on the task at hand rather than deciding which algorithm to use. You can be flexible in terms of the dataset, the scale at which the model needs to operate, and the platform the models need to be deployed to. Turi Create comes in very handy for building custom models for recommendations, image processing, text classification and many more tasks. All you need is some knowledge of Python to get started!

ConvNetJS

Move over, supercomputers and clusters of machines: deep learning is well and truly here, in your web browser! You can now train advanced machine learning and deep learning models directly in your browser, without needing a dedicated GPU, using the popular JavaScript-based ConvNetJS library. Originally written by Andrej Karpathy, the current director of AI at Tesla, the library is open source and has been extended by contributions from the community. You can easily train deep neural networks and even reinforcement learning models directly in your browser with this unique and useful library. It is suited to those who do not wish to purchase serious hardware for training computationally intensive models. With close to 9000 stars on GitHub, ConvNetJS has been one of the rising stars of 2017 and is quickly becoming THE go-to library for deep learning.

BigML

BigML is a popular machine learning company that provides an easy-to-use platform for developing machine learning models. Using BigML's REST API, you can seamlessly train your machine learning models on their platform. It allows you to perform different tasks such as anomaly detection and time series forecasting, and to build apps that perform real-time predictive analytics. With BigML, you can deploy your models on-premise or on the cloud, giving you the flexibility of selecting the kind of environment you need to run your machine learning models.
True to their promise, BigML really does make ‘machine learning beautifully simple for everyone’.

So there you have it! With Microsoft, Amazon, and Google all fighting for supremacy in the AI space, 2018 could prove to be a breakthrough year for developments in Artificial Intelligence. Add to this mix the various open source libraries that aim to simplify machine learning for users, and you get a very interesting list of tools and frameworks to keep tabs on. The exciting thing about all this is that any of them could become the next TensorFlow and cause the next AI disruption.

Sugandha Lahoti
29 Jun 2019
12 min read

Experts discuss Dark Patterns and deceptive UI designs: What are they? What do they do? How do we stop them?

Dark patterns are often used online to deceive users into taking actions they would otherwise not take under effective, informed consent. They are generally used by shopping websites, social media platforms, mobile apps and services as part of their user interface design choices. Dark patterns can lead to financial loss, trick users into giving up vast amounts of personal data, or induce compulsive and addictive behavior in adults and children. Using dark patterns is unambiguously unlawful in the United States (under Section 5 of the Federal Trade Commission Act and similar state laws), the European Union (under the Unfair Commercial Practices Directive and similar member state laws), and numerous other jurisdictions.

Earlier this week, at the Russell Senate Office Building, a panel of experts met to discuss the implications of dark patterns in the session "Deceptive Design and Dark Patterns: What are they? What do they do? How do we stop them?" The session included remarks from Senators Mark Warner and Deb Fischer, sponsors of the DETOUR Act, and a panel of experts including Tristan Harris (Co-Founder and Executive Director, Center for Humane Technology). The entire panel of experts included:

Tristan Harris (Co-Founder and Executive Director, Center for Humane Technology)
Rana Foroohar (Global Business Columnist and Associate Editor, Financial Times)
Amina Fazlullah (Policy Counsel, Common Sense Media)
Paul Ohm (Professor of Law and Associate Dean, Georgetown Law School), who also moderated the panel
Katie McInnis (Policy Counsel, Consumer Reports)
Marshall Erwin (Senior Director of Trust & Security, Mozilla)
Arunesh Mathur (Dept. of Computer Science, Princeton University)

Dark patterns are growing in social media platforms, video games, and shopping websites, and are increasingly used to target children

The expert session was inaugurated by Arunesh Mathur (Dept.
of Computer Science, Princeton University), who talked about his new study by researchers from Princeton University and the University of Chicago. The study suggests that shopping websites are abundant with dark patterns that rely on consumer deception. The researchers conducted a large-scale study, analyzing almost 53K product pages from 11K shopping websites to characterize and quantify the prevalence of dark patterns. They discovered 1,841 instances of dark patterns on shopping websites, which together represent 15 types of dark patterns. One of these was Sneak into Basket, which adds additional products to users' shopping carts without their consent. For example, if you buy a bouquet on a website, the website might add a greeting card to your cart without your consent, in the hope that you will actually purchase it.

Katie McInnis agreed and added that dark patterns not only undermine the choices available to users on social media and shopping platforms but can also cost users money. User interfaces are sometimes designed to push users away from protecting their privacy, making it hard for them to evaluate their options.

Amina Fazlullah, Policy Counsel, Common Sense Media, said that dark patterns are also being used to target children. Manipulative apps use design techniques to shame or confuse children into in-app purchases, or to keep them on the app for longer. Children are mostly unable to discern these manipulative techniques. Sometimes the screen has icons or buttons that appear to be part of game play, and children click on them without realizing that they are being asked to make a purchase, being shown an ad, or being directed to another site. Some games ask for payments or microtransactions to move the game forward.
Mozilla uses product transparency to curb dark patterns

Marshall Erwin, Senior Director of Trust & Security at Mozilla, talked about the negative effects of dark patterns and how Mozilla makes its own products more transparent. Mozilla has a set of checks and principles in place to avoid dark patterns:

No surprises: If users were to figure out or start to understand exactly what is happening with the browser, it should be consistent with their expectations. If users are surprised, the browser needs to change, either by stopping the activity entirely or by creating additional transparency that helps people understand it.
Anti-tracking technology: Cross-site tracking, enabled by cookies, is one of the most pervasive and pernicious dark patterns across the web today. Browsers should act to decrease the attack surface in the browser and actively protect people from those patterns online. Mozilla and Apple have introduced anti-tracking technology that actively intervenes to protect people from the various parties that are probably not trustworthy.

DETOUR Act by Senators Warner and Fischer

In April, Warner and Fischer introduced the Deceptive Experiences To Online Users Reduction (DETOUR) Act, bipartisan legislation to prohibit large online platforms from using dark patterns to trick consumers into handing over their personal data. The act focuses on the activities of large online service providers (those with over a hundred million users visiting in a given month). Under this act, platforms cannot use practices that trick users into providing information or consent. They would face new controls on conducting ‘psychological experiments on your users’, and would no longer be able to target children under 13 with the goal of hooking them into their service. The act also extends additional rulemaking and enforcement abilities to the Federal Trade Commission.
“Protecting users' personal data and user autonomy online are truly bipartisan issues”: Senator Mark Warner

In his presentation, Warner talked about how 2019 is the year when we need to recognize dark patterns and their ongoing manipulation of American consumers. While we've all celebrated the benefits that social media has brought to communities, there is also an enormous dark underbelly, he says. It is important that Congress steps up and that senators play a role so that Americans and their private data are not misused or manipulated going forward. Protecting users' personal data and user autonomy online are truly bipartisan issues. This is not liberal versus conservative; it's much more future versus past, and about how we get this future right in a way that takes advantage of social media tools but also puts some appropriate constraints in place.

He says that the driving notion behind the DETOUR Act is that users should have choice and autonomy when it comes to their personal data. When a company like Facebook asks you to upload your phone contacts or some other highly valuable data to its platform, you ought to have a simple yes-or-no choice. Companies that run experiments on you without your consent are coercive, and the DETOUR Act aims to put appropriate protections in place that defend users' ability to make informed choices. In addition to prohibiting large online platforms from using dark patterns to trick consumers into handing over their personal data, the bill would also require informed consent for behavioral experimentation. In the process, the bill would send a clear message to the platform companies and the FTC that they are now in the business of preserving users' autonomy when it comes to the use of their personal data. The goal, Warner says, is simple: to bring some transparency to what remains a very opaque market and give consumers the tools they need to make informed choices about how and when to share their personal information.
“Curbing the use of dark patterns will be foundational to increasing trust online”: Senator Deb Fischer

Fischer argued that tech companies are increasingly tailoring users' online experiences in more granular ways. On one hand, she says, you get a more personalized user experience and platforms are more responsive; however, it's this variability that allows companies to take that design just a step too far. Companies are constantly competing for users' attention, and this increases the motivation for more intrusive and invasive user design. The ability of online platforms to guide the visual interfaces that billions of people view is an incredible influence. It forces us to assess the impact of design on user privacy and well-being. Fundamentally, the DETOUR Act would prohibit large online platforms from purposely using deceptive user interfaces, that is, dark patterns. It would provide a better accountability system for improved transparency and autonomy online. The legislation would take an important step to restore the hidden options. It would give users a tool to get out of the maze that coaxes them to just click ‘I agree’. A privacy framework that involves consent cannot function properly if it doesn't ensure the user interface presents fair and transparent options. The DETOUR Act would enable the creation of a professional standards body which can register with the Federal Trade Commission. This would serve as a self-regulatory body to develop best practices for UI design, with the FTC as a backup. She adds, “We need clarity for the enforcement of dark patterns that don't directly involve our wallets. We need policies that place value on user choice and personal data online. We need a stronger mechanism to protect the public interest when the goal for tech companies is to make people engage more and more. User consent remains weakened by the presence of dark patterns and unethical design.
Curbing the use of dark patterns will be foundational to increasing trust online. The DETOUR Act does provide a key step in getting there.”

“The DETOUR act is calling attention to asymmetry and preventing deceptive asymmetry”: Tristan Harris

Tristan says that companies are now competing not on manipulating your immediate behavior but on manipulating and predicting the future. For example, Facebook has something called loyalty prediction, which allows it to sell an advertiser the ability to predict when you're going to become disloyal to a brand. It can sell that opportunity to another advertiser, probably before you even know you're going to switch. The DETOUR Act is a huge step in the right direction because it's about calling attention to asymmetry and preventing deceptive asymmetry. We need a new relationship with this asymmetric power, based on a duty of care. It's about putting asymmetrically powerful technologies in the service of the systems they are supposed to protect. He says we need to switch to a regenerative energy economy that actually treats attention as sacred and does not tie profit directly to user extraction.

Top questions raised by the panel and online viewers

Does A/B testing result in dark patterns?

Dark patterns are often a result of A/B testing, where a designer may try things that lead to better engagement or nudge users in a way that benefits the company. However, A/B testing isn't the problem; the problem is the intention behind how it is used. Companies and other organizations should have oversight of the experiments they are conducting, to see whether A/B testing is actually leading to some kind of concrete harm. The challenge in this space is knowing where to draw the line when A/B testing features, optimizing for engagement, and decreasing friction.

Are consumers smart enough to tackle dark patterns on their own, or do we need legislation?
It's well established that children, whose brains are still developing, are unable to discern these types of deceptive techniques, so these practices should be banned, especially for kids. For vulnerable families who are juggling all sorts of concerns around income, access to jobs, transportation and health care, putting this on their plate as well is simply unreasonable. Dark patterns are deployed for an array of opaque reasons that the average user will never recognize. From a consumer perspective, expecting people to identify the dark pattern techniques that platform companies have spent hundreds of thousands of dollars developing to be as opaque and as tricky as possible is unrealistic. This is why the DETOUR Act and this type of regulation are absolutely necessary and the only way forward.

What is it about the largest online providers that makes us want to focus on them first, or only? Is it their scale, or do they have more powerful dark patterns? Is it because they're harming more people, or is it politics?

Larger companies are sometimes wary of indulging in dark patterns because they face a greater risk of getting caught and of PR backlash. However, they do engage in manipulative practices, and that warrants a lot of attention. Moreover, targeting bigger companies is just one part of a more comprehensive privacy enforcement environment. Going after companies that have a large number of users is also great for consumer engagement. Obviously there is a need to target more broadly, but this is a starting point.

If Facebook were to suddenly reclass itself and its advertising business model, would you still trust them?

No, the leadership currently in charge at Facebook cannot be trusted, especially given the organizational culture that has been building. There are change efforts going on inside Google and Facebook right now, but they are getting gridlocked.
Even if employees want to see policies change, they still have bonus structures and employee culture to keep in mind.

We recommend you go through the full hearing here. You can read more about the DETOUR Act here.

U.S. senators introduce a bipartisan bill that bans social media platforms from using ‘dark patterns’ to trick its users
How social media enabled and amplified the Christchurch terrorist attack
A new study reveals how shopping websites use ‘dark patterns’ to deceive you into buying things you may not want
Packt
12 Oct 2016
11 min read

Basics of Image Histograms in OpenCV

In this article by Samyak Datta, author of the book Learning OpenCV 3 Application Development, we are going to focus our attention on a different style of processing pixel values. The output of the techniques we study here will not be images but another form of representation for images, namely image histograms.

We have seen that a two-dimensional grid of intensity values is one of the default forms of representing images in digital systems, for processing as well as storage. However, such representations do not scale well. For an image with a reasonably low spatial resolution, say 512 x 512 pixels, working with a two-dimensional grid might not pose any serious issues. As the dimensions increase, however, the corresponding growth in the size of the grid may start to adversely affect the performance of the algorithms that work with the images. A primary advantage of an image histogram is that its size is a constant, independent of the dimensions of the image. Consequently, irrespective of the spatial resolution of the images we are dealing with, algorithms that work with image histograms always handle a constant amount of data.

An image histogram is one example of an image descriptor. Each descriptor captures some particular aspects or features of the image to construct its own form of representation. One of the common pitfalls of using histograms as an image representation, compared to the native form of the entire two-dimensional grid of values, is loss of information. A full-fledged representation that stores the pixel intensity values at all pixel locations naturally contains all the information you would need to reconstruct a digital image. The same cannot be said about histograms.
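A small preview of both properties, constant size and lost spatial information, can be written in plain C++ without OpenCV. This sketch assumes an 8-bit grayscale image stored as a flat, row-by-row std::vector<std::uint8_t>; the function name grayHistogram is hypothetical and not part of OpenCV.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: 256-bin histogram of a flat buffer of 8-bit gray pixels.
// The result always has 256 entries, whatever the number of pixels.
std::array<int, 256> grayHistogram(const std::vector<std::uint8_t>& pixels) {
    std::array<int, 256> hist{};  // value-initialized: every count starts at 0
    for (std::uint8_t v : pixels) {
        ++hist[v];  // the intensity value is used directly as the bin index
    }
    return hist;
}
```

For example, the 2 x 2 images {0, 255, 255, 0} (a diagonal pattern) and {0, 0, 255, 255} (a top/bottom split) produce exactly the same histogram, two counts in bin 0 and two in bin 255, because only the counts survive, not where the pixels were. A one-megapixel image still produces exactly 256 bins.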
When we study image histograms in detail, we'll see exactly what information we stand to lose. This loss of information is common to all forms of image descriptors.

The basics of histograms

At the outset, we will briefly explain the concept of a histogram. Most of you might already know this from your lessons in basic statistics, but we will reiterate it for the sake of completeness. A histogram is a data representation technique that relies on aggregating data points. The data is aggregated into a set of predefined bins that are represented along the x axis, and the number of data points that fall within each bin makes up the corresponding count on the y axis. For example, let's assume that our data looks like the following:

D = {2, 7, 1, 5, 6, 9, 14, 11, 8, 10, 13}

If we define three bins, namely Bin_1 (1 - 5), Bin_2 (6 - 10), and Bin_3 (11 - 15), then the histogram corresponding to our data looks like this:

Bins              Frequency
Bin_1 (1 - 5)     3
Bin_2 (6 - 10)    5
Bin_3 (11 - 15)   3

This histogram tells us that we have three values between 1 and 5, five between 6 and 10, and three again between 11 and 15. Note that it doesn't tell us what the values are, just that some n values exist in a given bin. A more familiar visual representation of this histogram is a bar plot:

[Figure: bar plot of the histogram, with the three bins along the x axis and their frequencies along the y axis]

As you can see, the bins are plotted along the x axis and their corresponding frequencies along the y axis. Now, in the context of images, how is a histogram computed? It's not difficult to deduce. Since our data comprises pixel intensity values, an image histogram is computed by building a histogram from the intensity values of all of the image's constituent pixels. In other words, the sequence of pixel intensity values in our image becomes the data.
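The bin counts in the table above can be reproduced with a short plain C++ sketch. It assumes equal-width integer bins described by a start value lo and a width (the helper name binCounts is hypothetical); integer division maps each value to its bin, so the three bins 1 - 5, 6 - 10 and 11 - 15 correspond to lo = 1 and width = 5.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: counts values in numBins equal-width bins, where bin k
// covers [lo + k*width, lo + (k+1)*width - 1]. Values outside all bins are ignored.
std::vector<int> binCounts(const std::vector<int>& data, int lo, int width,
                           int numBins) {
    assert(width > 0 && numBins > 0);
    std::vector<int> counts(numBins, 0);
    for (int v : data) {
        if (v < lo) continue;         // below the first bin
        int k = (v - lo) / width;     // integer division selects the bin
        if (k < numBins) ++counts[k]; // skip values above the last bin
    }
    return counts;
}
```

Calling binCounts(D, 1, 5, 3) on the data set D yields {3, 5, 3}, matching the table. Only the counts are returned, which is exactly why the original values cannot be recovered from the histogram.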
The histogram of raw pixel intensities is in fact the simplest kind of histogram you can compute from the information available in an image. Before you dip your hands into code, there are some basic terms (pertaining to histograms in general) that you need to be aware of:

Histogram size: The histogram size refers to the number of bins in the histogram.

Range: The range of a histogram is the range of data that we are dealing with. The range of the data and the histogram size are both important parameters that define a histogram.

Dimensions: Simply put, the dimensions are the number of distinct quantities whose values we aggregate in the histogram bins. For example, consider a grayscale image. We might want to construct a histogram using the pixel intensity values of such an image. This would be an example of a one-dimensional histogram, because we are only interested in aggregating the pixel intensity values and nothing else. The data, in this case, spans the range 0 to 255. Being one-dimensional, such histograms can be represented graphically as 2D plots: the one-dimensional data (pixel intensity values) is plotted on the x axis (in the form of bins), with the corresponding frequency counts along the y axis. We have already seen an example of this. Now, imagine a color image with three channels: red, green, and blue. Let's say that we want to plot a histogram for the intensities in the red and green channels combined. This means that our data now becomes a pair of values (r, g). A histogram plotted for such data has a dimensionality of 2. Its plot is a 3D plot, with the data bins covering the x and y axes and the frequency counts plotted along the z axis.

Now that we have discussed the theoretical aspects of image histograms in detail, let's start thinking along the lines of code.
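Before turning to the simplest one-dimensional case, the two-dimensional (r, g) histogram described above can be sketched in plain C++. The sketch assumes the red and green values are already paired per pixel, and it uses a hypothetical binsPerChannel parameter so that each axis has fewer than 256 bins (with 8 bins, each bin spans 32 intensity values). The result is a binsPerChannel x binsPerChannel grid of counts, which supplies the z values of the 3D plot.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper: 2D histogram of (r, g) pairs with binsPerChannel
// equal-width bins per axis over the intensity range [0, 256).
// hist[rb][gb] counts pixels whose red value is in bin rb and green value in bin gb.
std::vector<std::vector<int>> redGreenHistogram(
        const std::vector<std::pair<std::uint8_t, std::uint8_t>>& rg,
        int binsPerChannel) {
    assert(binsPerChannel >= 1 && binsPerChannel <= 256);
    std::vector<std::vector<int>> hist(binsPerChannel,
                                       std::vector<int>(binsPerChannel, 0));
    for (const auto& px : rg) {
        int rb = px.first * binsPerChannel / 256;   // red bin index
        int gb = px.second * binsPerChannel / 256;  // green bin index
        ++hist[rb][gb];
    }
    return hist;
}
```

With 8 bins per channel, the pixels (0, 0) and (10, 20) both land in cell [0][0], (255, 0) lands in [7][0], and (128, 255) lands in [4][7].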
We will start with the simplest (and in fact the most ubiquitous) design of image histograms. The range of our data will be from 0 to 255 (both inclusive), which means that all our data points will be integers that fall within this range. Also, the number of data points will equal the number of pixels that make up our input image. The simplicity of the design comes from fixing the size of the histogram (the number of bins) at 256. Take a moment to think about what this means. There are 256 different possible values that our data points can take, and we have a separate bin for each of those values. Such an image histogram therefore depicts the 256 possible intensity values along with the number of pixels in the image that have each of those intensities.

Before taking a peek at what OpenCV has to offer, let's try to implement such a histogram on our own! We define a function named computeHistogram() that takes a grayscale image as an input argument and returns the image histogram. From our earlier discussion, it is evident that the histogram must contain 256 entries (for the 256 bins): one for each integer between 0 and 255. The value stored in each entry is the count of image pixels that have the corresponding intensity value. So, conceptually, we can use an array for our implementation such that histogram[i] (for 0 ≤ i ≤ 255) holds the number of pixels in the image having intensity i. However, instead of using a C++ array, we will follow OpenCV's conventions and represent the histogram as a Mat object. We have already seen that a Mat object is nothing but a multidimensional array store.
The implementation is outlined in the following code snippet:

    #include <opencv2/opencv.hpp>
    #include <iostream>
    using namespace cv;
    using namespace std;

    // Counts, for every intensity 0..255, how many pixels of an 8-bit
    // grayscale image have that intensity.
    Mat computeHistogram(Mat input_image) {
        // 256 x 1 column vector of 32-bit integer counts, all starting at 0
        Mat histogram = Mat::zeros(256, 1, CV_32S);
        for (int i = 0; i < input_image.rows; ++i) {
            for (int j = 0; j < input_image.cols; ++j) {
                // the pixel's intensity is also its bin index
                int binIdx = (int) input_image.at<uchar>(i, j);
                histogram.at<int>(binIdx, 0) += 1;
            }
        }
        return histogram;
    }

As you can see, we have chosen to represent the histogram as a 256-element column-vector Mat object. We iterate over all the pixels in the input image and keep incrementing the corresponding counts in the histogram (which was initialized to 0). As per our description of the image histogram properties, the intensity value of any pixel is the same as the bin index used to access the appropriate histogram bin. With this implementation ready, let's test it on an actual image. The following main() function reads an input image, calls the computeHistogram() function that we have just defined, and displays the contents of the returned histogram:

    int main() {
        Mat input_image = imread("/home/samyak/Pictures/lena.jpg", IMREAD_GRAYSCALE);
        if (input_image.empty()) {  // imread returns an empty Mat on failure
            cerr << "Could not read the input image\n";
            return 1;
        }
        Mat histogram = computeHistogram(input_image);
        cout << "Histogram...\n";
        for (int i = 0; i < histogram.rows; ++i)
            cout << i << " : " << histogram.at<int>(i, 0) << "\n";
        return 0;
    }

We have used the fact that the histogram returned from the function is a single-column Mat object, which makes the code that displays its contents much cleaner.

Histograms in OpenCV

We have just seen the implementation of a very basic and minimalistic histogram in OpenCV, built from first principles. The image histogram was basic in the sense that all the bins were uniform in size and each comprised only a single pixel intensity.
This made our lives simple when we designed our code: there was no need to explicitly check which bin a data point (the intensity value of a pixel) belongs to. However, we know that a histogram can have bins that span more than one value. Can you think of the changes we might need to make to the code we have just written to accommodate bin sizes larger than 1? If this change seems doable, try to figure out how to incorporate non-uniform bin sizes or multidimensional histograms. By now, things might have started to get a little overwhelming. No need to worry. As always, OpenCV has you covered! The OpenCV developers provide a calcHist() function whose sole purpose is to calculate histograms for a given set of arrays. By arrays, we mean images represented as Mat objects, and we use the term set because the function can compute multidimensional histograms from the given data:

    // Same 256-bin grayscale histogram, computed with calcHist().
    // Note: calcHist() produces a CV_32F histogram, so read its bins with at<float>.
    Mat computeHistogram(Mat input_image) {
        Mat histogram;
        int channels[] = { 0 };             // use channel 0 of the image
        int histSize[] = { 256 };           // 256 bins
        float range[] = { 0, 256 };         // lower bound inclusive, upper bound exclusive
        const float* ranges[] = { range };  // one range per dimension
        calcHist(&input_image, 1, channels, Mat(), histogram, 1, histSize, ranges, true, false);
        return histogram;
    }

Before we move on to an explanation of the different parameters involved in the calcHist() function call, I want to bring your attention to the abundant use of arrays in the preceding code snippet. Even arguments as simple as histogram sizes are passed to the function as arrays rather than integer values, which at first glance seems unnecessary and counter-intuitive. The arrays are needed because calcHist() is equipped to handle multidimensional histograms as well, and when we are dealing with multidimensional histogram data, we need to pass multiple parameters, one for each dimension.
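One possible answer to the bin-size question, sketched in plain C++ rather than with OpenCV: for equal-width bins over [0, 256), an intensity v falls in bin v * histSize / 256 (integer division); for non-uniform bins, a sorted list of boundaries can be searched with std::upper_bound. Both function names are hypothetical. This mirrors, in spirit, calcHist()'s uniform flag: when uniform is false, each ranges[i] holds histSize[i] + 1 bin boundaries instead of a single lower/upper pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Uniform bins: histSize equal-width bins over [0, 256); with histSize = 8,
// each bin spans 32 intensity values.
std::vector<int> uniformHistogram(const std::vector<std::uint8_t>& pixels,
                                  int histSize) {
    assert(histSize >= 1 && histSize <= 256);
    std::vector<int> hist(histSize, 0);
    for (std::uint8_t v : pixels) {
        ++hist[v * histSize / 256];  // map the intensity to its bin
    }
    return hist;
}

// Non-uniform bins: edges holds ascending boundaries e0 < e1 < ... < en, and
// bin k covers [e_k, e_{k+1}). Values outside [e0, en) are skipped.
std::vector<int> edgeHistogram(const std::vector<std::uint8_t>& pixels,
                               const std::vector<int>& edges) {
    assert(edges.size() >= 2);
    std::vector<int> hist(edges.size() - 1, 0);
    for (std::uint8_t v : pixels) {
        // first boundary strictly greater than v; v's bin ends at that boundary
        auto it = std::upper_bound(edges.begin(), edges.end(), static_cast<int>(v));
        if (it == edges.begin() || it == edges.end()) continue;  // out of range
        auto bin = (it - edges.begin()) - 1;
        ++hist[static_cast<std::size_t>(bin)];
    }
    return hist;
}
```

With histSize = 8, intensities 0 and 31 share bin 0 while 32 starts bin 1. With edges {0, 50, 200, 256}, the three bins have very different widths, which is what the single-intensity implementation above could not express.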
The reason for these array parameters will become clearer once we demonstrate an example of calculating multidimensional histograms with calcHist(). For the time being, we just wanted to clear up any immediate confusion the array parameters might have caused. Here is a detailed list of the arguments in the calcHist() function call, in order:

Source images: a pointer to the input image(s), passed as Mat objects
Number of source images
Channel indices: which channel of the source images to use for each histogram dimension
Mask: an optional mask restricting which pixels are counted; an empty Mat() means all pixels
Output histogram: the Mat that receives the bin counts
Dimensions (dims): the number of histogram dimensions
Histogram size: the number of bins in each dimension
Ranges: the bin boundaries in each dimension
Uniform flag: whether the bins of each dimension have equal width
Accumulate flag: if true, the output histogram is not cleared first, so counts can be accumulated over several calls

The last two arguments (the uniform and accumulate flags) have default values of true and false, respectively. Hence, the function call that you have just seen can equally well be written as follows:

    calcHist(&input_image, 1, channels, Mat(), histogram, 1, histSize, ranges);

Summary

In this article, we have studied the fundamentals of using histograms in OpenCV for image processing.

Resources for Article:

Further resources on this subject:
Remote Sensing and Histogram [article]
OpenCV: Image Processing using Morphological Filters [article]
Learn computer vision applications in Open CV [article]

Packt
25 Aug 2014
18 min read

Camera Calibration

This article by Robert Laganière, author of OpenCV Computer Vision Application Programming Cookbook Second Edition, explains that images are generally produced using a digital camera, which captures a scene by projecting light going through its lens onto an image sensor. The fact that an image is formed by the projection of a 3D scene onto a 2D plane implies the existence of important relationships between a scene and its image, and between different images of the same scene. Projective geometry is the tool used to describe and characterize, in mathematical terms, the process of image formation. In this article, we will introduce you to some of the fundamental projective relations that exist in multiview imagery and explain how these can be used in computer vision programming. You will learn how matching can be made more accurate through the use of projective constraints and how a mosaic can be composited from multiple images using two-view relations. Before we start the recipe, let's explore the basic concepts related to scene projection and image formation.

Image formation

Fundamentally, the process used to produce images has not changed since the beginning of photography. The light coming from an observed scene is captured by a camera through a frontal aperture; the captured light rays hit an image plane (or image sensor) located at the back of the camera. Additionally, a lens is used to concentrate the rays coming from the different scene elements. This process is illustrated by the following figure:

[Figure: light rays from an observed object passing through the lens and converging on the image plane]

Here, d_o is the distance from the lens to the observed object, d_i is the distance from the lens to the image plane, and f is the focal length of the lens. These quantities are related by the so-called thin lens equation:

$$\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i}$$

In computer vision, this camera model can be simplified in a number of ways.
First, we can neglect the effect of the lens by considering a camera with an infinitesimal aperture since, in theory, this does not change the image appearance. (However, by doing so, we ignore the focusing effect and create an image with an infinite depth of field.) In this case, only the central ray is considered. Second, since most of the time we have d_o >> d_i, we can assume that the image plane is located at the focal distance. Finally, we can note from the geometry of the system that the image on the plane is inverted. We can obtain an identical but upright image by simply positioning the image plane in front of the lens. Obviously, this is not physically feasible, but from a mathematical point of view, it is completely equivalent. This simplified model is often referred to as the pin-hole camera model, and it is represented as follows:

[Figure: the pin-hole camera model, with an upright image plane placed at focal distance f in front of the center of projection]

From this model, and using the law of similar triangles, we can easily derive the basic projective equation that relates a pictured object to its image:

$$h_i = f \frac{h_o}{d_o}$$

The size (h_i) of the image of an object (of height h_o) is therefore inversely proportional to its distance (d_o) from the camera, which is naturally true. In general, this relation describes where a 3D scene point will be projected on the image plane, given the geometry of the camera.

Calibrating a camera

From the introduction of this article, we learned that the essential parameters of a camera under the pin-hole model are its focal length and the size of the image plane (which defines the field of view of the camera). Also, since we are dealing with digital images, the number of pixels on the image plane (its resolution) is another important characteristic of a camera. Finally, in order to compute the position of a scene point's image in pixel coordinates, we need one additional piece of information.
Considering the line coming from the focal point that is orthogonal to the image plane, we need to know the pixel position at which this line pierces the image plane. This point is called the principal point. It might seem logical to assume that the principal point is at the center of the image plane, but in practice, it might be off by a few pixels, depending on the precision with which the camera was manufactured.

Camera calibration is the process by which the different camera parameters are obtained. One can obviously use the specifications provided by the camera manufacturer, but for some tasks, such as 3D reconstruction, these specifications are not accurate enough. Camera calibration proceeds by showing known patterns to the camera and analyzing the obtained images. An optimization process then determines the parameter values that best explain the observations. This is a complex process that has been made easy by the availability of OpenCV calibration functions.

How to do it...

To calibrate a camera, the idea is to show it a set of scene points whose 3D positions are known. Then, you need to observe where these points project on the image. With the knowledge of a sufficient number of 3D points and associated 2D image points, the exact camera parameters can be inferred from the projective equation. Obviously, for accurate results, we need to observe as many points as possible. One way to achieve this would be to take one picture of a scene with many known 3D points, but in practice, this is rarely feasible. A more convenient way is to take several images of the same set of 3D points from different viewpoints. This approach is simpler, but it requires you to compute the position of each camera view in addition to the internal camera parameters, which fortunately is feasible. OpenCV proposes that you use a chessboard pattern to generate the set of 3D scene points required for calibration.
This pattern creates points at the corners of each square, and since the pattern is flat, we can freely assume that the board is located at Z=0, with the X and Y axes well aligned with the grid. In this case, the calibration process simply consists of showing the chessboard pattern to the camera from different viewpoints. Here is one example of a 6x4 calibration pattern image:

[Figure: a chessboard calibration pattern with 6x4 inner corners]

The good thing is that OpenCV has a function that automatically detects the corners of this chessboard pattern. You simply provide an image and the size of the chessboard used (the number of horizontal and vertical inner corner points). The function returns the positions of these chessboard corners in the image. If the function fails to find the pattern, it simply returns false:

    // output vector of image points
    std::vector<cv::Point2f> imageCorners;
    // number of inner corners on the chessboard
    cv::Size boardSize(6, 4);
    // Get the chessboard corners
    bool found = cv::findChessboardCorners(image, boardSize, imageCorners);

The output parameter, imageCorners, will simply contain the pixel coordinates of the detected inner corners of the shown pattern. Note that this function accepts additional parameters if you need to tune the algorithm, which are not discussed here. There is also a special function that draws the detected corners on the chessboard image, with lines connecting them in sequence:

    // Draw the corners
    cv::drawChessboardCorners(image, boardSize, imageCorners,
                              found); // corners have been found

The following image is obtained:

[Figure: the chessboard image with the detected inner corners drawn and connected in order]

The lines that connect the points show the order in which the points are listed in the vector of detected image points. To perform a calibration, we now need to specify the corresponding 3D points. You can specify these points in the units of your choice (for example, centimeters or inches); however, the simplest is to assume that each square represents one unit.
In that case, the coordinates of the first point would be (0,0,0) (assuming that the board is located at a depth of Z=0), the coordinates of the second point would be (1,0,0), and so on, the last point being located at (5,3,0). There are a total of 24 points in this pattern, which is too few to obtain an accurate calibration. To get more points, you need to show more images of the same calibration pattern from various points of view. To do so, you can either move the pattern in front of the camera or move the camera around the board; from a mathematical point of view, this is completely equivalent. The OpenCV calibration function assumes that the reference frame is fixed on the calibration pattern and will calculate the rotation and translation of the camera with respect to the reference frame.

Let's now encapsulate the calibration process in a CameraCalibrator class. The attributes of this class are as follows (the undistortion members and the method declarations used later in this article are included so that the class definition is complete):

    #include <opencv2/opencv.hpp>

    class CameraCalibrator {
        // input points:
        // the points in world coordinates
        std::vector<std::vector<cv::Point3f>> objectPoints;
        // the point positions in pixels
        std::vector<std::vector<cv::Point2f>> imagePoints;
        // output Matrices
        cv::Mat cameraMatrix;
        cv::Mat distCoeffs;
        // flag to specify how calibration is done
        int flag = 0;
        // undistortion maps, built once per calibration by remap()
        cv::Mat map1, map2;
        bool mustInitUndistort = true;
      public:
        int addChessboardPoints(const std::vector<std::string>& filelist, cv::Size& boardSize);
        void addPoints(const std::vector<cv::Point2f>& imageCorners, const std::vector<cv::Point3f>& objectCorners);
        double calibrate(cv::Size& imageSize);
        cv::Mat remap(const cv::Mat& image);
    };

Note that the input vectors of scene and image points are in fact made of std::vector of point instances; each vector element is a vector of the points from one view.
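The 3D corner coordinates described above can be generated with a small helper. This plain C++ sketch uses a minimal Point3 struct as a stand-in for cv::Point3f, and its names are hypothetical; it assumes the corners are listed row by row, left to right, which is the order the OpenCV documentation gives for corners detected by cv::findChessboardCorners. The optional squareSize parameter expresses the coordinates in real units (for example, centimeters) instead of squares.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for cv::Point3f, so the sketch has no OpenCV dependency.
struct Point3 {
    float x, y, z;
};

// Hypothetical helper: 3D positions of the inner corners of a planar
// chessboard with cols x rows inner corners, listed row by row.
// X runs along the board width, Y along its height, and Z = 0 on the board.
std::vector<Point3> chessboardObjectPoints(int cols, int rows,
                                           float squareSize = 1.0f) {
    std::vector<Point3> pts;
    for (int i = 0; i < rows; ++i)          // row index gives Y
        for (int j = 0; j < cols; ++j)      // column index gives X
            pts.push_back({j * squareSize, i * squareSize, 0.0f});
    return pts;
}
```

For the 6x4 board, chessboardObjectPoints(6, 4) yields 24 points from (0,0,0) through (5,3,0), and passing a square size of 2.5 moves the last corner to (12.5, 7.5, 0).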
Here, we decided to add the calibration points by specifying a vector of chessboard image filenames as input:

    // Open chessboard images and extract corner points
    int CameraCalibrator::addChessboardPoints(
            const std::vector<std::string>& filelist,
            cv::Size & boardSize) {
        // the points on the chessboard
        std::vector<cv::Point2f> imageCorners;
        std::vector<cv::Point3f> objectCorners;
        // 3D Scene Points:
        // Initialize the chessboard corners in the chessboard reference frame,
        // row by row to match the order of the detected image corners.
        // The corner in row i, column j is at 3D location (X,Y,Z) = (j,i,0)
        for (int i = 0; i < boardSize.height; i++) {
            for (int j = 0; j < boardSize.width; j++) {
                objectCorners.push_back(cv::Point3f(j, i, 0.0f));
            }
        }
        // 2D Image points:
        cv::Mat image; // to contain chessboard image
        int successes = 0;
        // for all viewpoints
        for (size_t i = 0; i < filelist.size(); i++) {
            // Open the image in grayscale
            image = cv::imread(filelist[i], cv::IMREAD_GRAYSCALE);
            if (image.empty()) continue; // unreadable file
            // Get the chessboard corners
            bool found = cv::findChessboardCorners(image, boardSize, imageCorners);
            if (!found) continue; // pattern not fully detected in this view
            // Get subpixel accuracy on the corners
            cv::cornerSubPix(image, imageCorners,
                             cv::Size(5, 5), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::MAX_ITER +
                                              cv::TermCriteria::EPS,
                                              30,    // max number of iterations
                                              0.1)); // min accuracy
            // If we have a good board, add it to our data
            if (imageCorners.size() == static_cast<size_t>(boardSize.area())) {
                // Add image and scene points from one view
                addPoints(imageCorners, objectCorners);
                successes++;
            }
        }
        return successes;
    }

The first (nested) loop inputs the 3D coordinates of the chessboard corners, and the corresponding image points are the ones provided by the cv::findChessboardCorners function. This is done for all the available viewpoints. Moreover, to obtain more accurate image point locations, the cv::cornerSubPix function can be used; as the name suggests, the image points are then localized with subpixel accuracy. The termination criterion specified by the cv::TermCriteria object defines the maximum number of iterations and the minimum accuracy in subpixel coordinates.
Whichever of these two conditions is reached first stops the corner refinement process. When a set of chessboard corners has been successfully detected, these points are added to our vectors of image and scene points using our addPoints method. Once a sufficient number of chessboard images have been processed (and, consequently, a large number of 3D scene point / 2D image point correspondences are available), we can initiate the computation of the calibration parameters as follows:

    // Calibrate the camera
    // returns the re-projection error
    double CameraCalibrator::calibrate(cv::Size &imageSize) {
        // undistortion maps must be rebuilt after a new calibration
        mustInitUndistort = true;
        // Output rotations and translations
        std::vector<cv::Mat> rvecs, tvecs;
        // start calibration
        return cv::calibrateCamera(objectPoints, // the 3D points
                                   imagePoints,  // the image points
                                   imageSize,    // image size
                                   cameraMatrix, // output camera matrix
                                   distCoeffs,   // output distortion matrix
                                   rvecs, tvecs, // Rs, Ts
                                   flag);        // set options
    }

In practice, 10 to 20 chessboard images are sufficient, but these must be taken from different viewpoints at different depths. The two important outputs of this function are the camera matrix and the distortion parameters, which are described in the next section.

How it works...

In order to explain the result of the calibration, we need to go back to the figure in the introduction, which describes the pin-hole camera model. More specifically, we want to demonstrate the relationship between a point in 3D at position (X,Y,Z) and its image (x,y) on a camera, specified in pixel coordinates. Let's redraw this figure by adding a reference frame that we position at the center of the projection, as seen here:

[Figure: the pin-hole camera model redrawn with a reference frame at the center of projection]

Note that the y axis points downward, to get a coordinate system compatible with the usual convention that places the image origin at the upper-left corner. We learned previously that the point (X,Y,Z) will be projected onto the image plane at (fX/Z, fY/Z).
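The projection onto the image plane at (fX/Z, fY/Z), and the object-size relation h_i = f h_o / d_o of the pin-hole model, can be checked numerically. This is a plain C++ sketch with hypothetical names; coordinates are expressed in the camera reference frame and in world units (millimeters here), before any conversion to pixels.

```cpp
#include <cassert>
#include <cmath>

struct ImagePoint {
    double x, y;  // position on the image plane, in world units
};

// Pin-hole projection of a point (X, Y, Z) given in the camera frame:
// by similar triangles it lands at (f*X/Z, f*Y/Z) on the image plane.
ImagePoint projectPinhole(double f, double X, double Y, double Z) {
    assert(Z > 0.0);  // the point must lie in front of the camera
    return {f * X / Z, f * Y / Z};
}

// Image size of an object of height h_o at distance d_o: h_i = f * h_o / d_o.
double imageHeight(double f, double h_o, double d_o) {
    return f * h_o / d_o;
}
```

Doubling the distance halves the image size: with f = 50 mm, a 1,800 mm tall object images at 9 mm from 10 m away and at 4.5 mm from 20 m away.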
Now, if we want to translate this coordinate into pixels, we need to divide the 2D image position by the pixel's width (p_x) and height (p_y), respectively. Note that by dividing the focal length given in world units (generally millimeters) by p_x, we obtain the focal length expressed in (horizontal) pixels. Let's then define this term as f_x. Similarly, f_y = f/p_y is defined as the focal length expressed in vertical pixel units. Therefore, the complete projective equation is as follows:

$$x = \frac{f_x X}{Z} + u_0, \qquad y = \frac{f_y Y}{Z} + v_0$$

Here, (u_0, v_0) is the principal point, which is added to the result in order to move the origin to the upper-left corner of the image. These equations can be rewritten in matrix form through the introduction of homogeneous coordinates, in which 2D points are represented by 3-vectors and 3D points are represented by 4-vectors (the extra coordinate is simply an arbitrary scale factor, S, that needs to be removed when a 2D coordinate is extracted from a homogeneous 3-vector). Here is the rewritten projective equation:

$$S \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

The second matrix is a simple projection matrix. The first matrix includes all of the camera parameters, which are called the intrinsic parameters of the camera. This 3x3 matrix is one of the output matrices returned by the cv::calibrateCamera function. There is also a function called cv::calibrationMatrixValues that derives camera characteristics, such as the fields of view and the focal length in millimeters, from a calibration matrix. More generally, when the reference frame is not at the projection center of the camera, we need to add a rotation matrix (3x3) and a translation vector (3x1). Together, these describe the rigid transformation that must be applied to the 3D points in order to bring them back to the camera reference frame. Therefore, we can rewrite the projection equation in its most general form:

$$S \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} r_1 & r_2 & r_3 & t_1 \\ r_4 & r_5 & r_6 & t_2 \\ r_7 & r_8 & r_9 & t_3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Remember that in our calibration example, the reference frame was placed on the chessboard.
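As a minimal sketch of the general equation, the following plain C++ code (no OpenCV; all names hypothetical) applies the rigid transformation [R | t], then the intrinsic parameters, and divides by the scale factor S, which equals the depth of the point in camera coordinates. It also computes a root-mean-square re-projection error between projected and observed points, the kind of quantity cv::calibrateCamera minimizes and reports; the exact definition used here (square root of the mean squared point distance) is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

struct Pixel {
    double x, y;
};

struct Intrinsics {
    double fx, fy, u0, v0;  // focal lengths in pixels and principal point
};

// S [x y 1]^T = K [I|0] [R|t] [X Y Z 1]^T, where S is the camera-frame depth.
Pixel project(const Intrinsics& K, const Mat3& R, const Vec3& t, const Vec3& P) {
    Vec3 Pc{};  // the point expressed in the camera reference frame
    for (int r = 0; r < 3; ++r)
        Pc[r] = R[r][0] * P[0] + R[r][1] * P[1] + R[r][2] * P[2] + t[r];
    assert(Pc[2] > 0.0);  // only points in front of the camera project
    return {K.fx * Pc[0] / Pc[2] + K.u0, K.fy * Pc[1] / Pc[2] + K.v0};
}

// Root-mean-square distance, in pixels, between predicted and observed points.
double rmsError(const std::vector<Pixel>& predicted,
                const std::vector<Pixel>& observed) {
    assert(!predicted.empty() && predicted.size() == observed.size());
    double sumSq = 0.0;
    for (std::size_t k = 0; k < predicted.size(); ++k) {
        double dx = predicted[k].x - observed[k].x;
        double dy = predicted[k].y - observed[k].y;
        sumSq += dx * dx + dy * dy;  // squared Euclidean distance
    }
    return std::sqrt(sumSq / static_cast<double>(predicted.size()));
}
```

For instance, with f_x = f_y = 100, (u_0, v_0) = (160, 120), R = I and t = 0, the point (1, 2, 10) lands at pixel (170, 140); moving the reference frame so that t = (0, 0, 10) maps the board point (1, 2, 0) to the same pixel.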
Because the reference frame is attached to the chessboard, a rigid transformation (made of a rotation component, represented by the matrix entries r1 to r9, and a translation, represented by t1, t2, and t3) must be computed for each view. These are returned through the rvecs and tvecs output parameters of the cv::calibrateCamera function; each rotation is returned in compact rotation-vector form, which cv::Rodrigues converts to the 3x3 matrix. The rotation and translation components are often called the extrinsic parameters of the calibration, and they are different for each view. The intrinsic parameters remain constant for a given camera/lens system. The intrinsic parameters of our test camera, obtained from a calibration based on 20 chessboard images, are fx=167, fy=178, u0=156, and v0=119. These results are obtained by cv::calibrateCamera through an optimization process aimed at finding the intrinsic and extrinsic parameters that minimize the difference between the predicted image point positions, as computed from the projection of the 3D scene points, and the actual image point positions, as observed in the images. This difference, aggregated over all the points specified during the calibration, is called the re-projection error; cv::calibrateCamera returns it as a root-mean-square (RMS) value in pixels.

Let's now turn our attention to the distortion parameters. So far, we have said that under the pin-hole camera model, we can neglect the effect of the lens. However, this is only possible if the lens used to capture an image does not introduce significant optical distortions. Unfortunately, this is not the case with lower-quality lenses or with lenses that have a very short focal length. You may have already noted that the chessboard pattern shown in the image we used for our example is clearly distorted: the edges of the rectangular board are curved in the image. Also, note that this distortion becomes more pronounced as we move away from the center of the image. This is a typical distortion observed with a fish-eye lens, and it is called radial distortion.
The lenses used in common digital cameras usually do not exhibit such a high degree of distortion, but in the case of the lens used here, these distortions certainly cannot be ignored. It is possible to compensate for these deformations by introducing an appropriate distortion model. The idea is to represent the distortions induced by a lens by a set of mathematical equations. Once established, these equations can then be inverted in order to undo the distortions visible on the image. Fortunately, the exact parameters of the transformation that will correct the distortions can be obtained together with the other camera parameters during the calibration phase. Once this is done, any image from the newly calibrated camera can be undistorted. Therefore, we have added an additional method to our calibration class:

```cpp
// remove distortion in an image (after calibration)
// cameraMatrix, distCoeffs, map1, map2 and mustInitUndistort
// are members of the CameraCalibrator class
cv::Mat CameraCalibrator::remap(const cv::Mat &image) {

  cv::Mat undistorted;

  if (mustInitUndistort) { // called once per calibration

    cv::initUndistortRectifyMap(
        cameraMatrix,  // computed camera matrix
        distCoeffs,    // computed distortion matrix
        cv::Mat(),     // optional rectification (none)
        cv::Mat(),     // camera matrix to generate undistorted
        image.size(),  // size of undistorted
        CV_32FC1,      // type of output map
        map1, map2);   // the x and y mapping functions

    mustInitUndistort = false;
  }

  // Apply mapping functions
  cv::remap(image, undistorted, map1, map2,
            cv::INTER_LINEAR); // interpolation type

  return undistorted;
}
```

Running this code results in the following image:

As you can see, once the image is undistorted, we obtain a regular perspective image. To correct the distortion, OpenCV uses a polynomial function that is applied to the image points in order to move them to their undistorted position. By default, five coefficients are used; a model made of eight coefficients is also available.
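The default five-coefficient model can be written out explicitly. In OpenCV's documented convention, the coefficients are ordered (k1, k2, p1, p2, k3): k1, k2, k3 describe radial distortion and p1, p2 tangential distortion, and they act on normalized coordinates (X/Z, Y/Z) before the intrinsic matrix is applied. The following pure-Python sketch applies that model and then inverts it with a simple fixed-point iteration, which is one common way to undo it numerically; the coefficient values are hypothetical, and this is not a reimplementation of cv::initUndistortRectifyMap.

```python
def distort(x, y, k1, k2, p1, p2, k3):
    """Apply the 5-coefficient (k1, k2, p1, p2, k3) radial + tangential model
    to normalized image coordinates (x, y) = (X/Z, Y/Z)."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd


def undistort(xd, yd, k1, k2, p1, p2, k3, iterations=50):
    """Invert distort() numerically with a simple fixed-point iteration."""
    x, y = xd, yd  # initial guess: the distorted point itself
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)  # tangential terms
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x, y = (xd - dx) / radial, (yd - dy) / radial
    return x, y


def to_pixels(x, y, fx, fy, u0, v0):
    """Convert normalized coordinates to pixels with the intrinsic parameters."""
    return fx * x + u0, fy * y + v0


coeffs = (-0.3, 0.1, 0.001, -0.0005, 0.0)  # hypothetical barrel-type distortion
xd, yd = distort(0.2, 0.1, *coeffs)
print(to_pixels(xd, yd, 167.0, 178.0, 156.0, 119.0))  # where the point lands in the image
print(undistort(xd, yd, *coeffs))                      # approximately (0.2, 0.1)
```

A negative k1 pulls points toward the image center, which is the barrel shape visible in the fish-eye chessboard image; the effect grows with r², which is why it becomes more pronounced away from the center.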
Once these coefficients are obtained, it is possible to compute two cv::Mat mapping functions (one for the x coordinate and one for the y coordinate) that give the new undistorted position of an image point on a distorted image. This is computed by the cv::initUndistortRectifyMap function, and the cv::remap function remaps all the points of an input image to a new image. Note that because of the nonlinear transformation, some pixels of the input image now fall outside the boundary of the output image. You can expand the size of the output image to compensate for this loss of pixels, but you will then obtain output pixels that have no values in the input image (they will be displayed as black pixels).

There's more...

More options are available when it comes to camera calibration.

Calibration with known intrinsic parameters

When a good estimate of the camera's intrinsic parameters is known, it can be advantageous to input them into the cv::calibrateCamera function. They will then be used as initial values in the optimization process. To do so, you just need to add the CV_CALIB_USE_INTRINSIC_GUESS flag and input these values in the calibration matrix parameter. It is also possible to impose a fixed value for the principal point (CV_CALIB_FIX_PRINCIPAL_POINT), which can often be assumed to be the central pixel. You can also impose a fixed ratio for the focal lengths fx and fy (CV_CALIB_FIX_ASPECT_RATIO), in which case you typically assume square pixels. In the C++ API, these flags are also available as cv::CALIB_USE_INTRINSIC_GUESS, cv::CALIB_FIX_PRINCIPAL_POINT, and cv::CALIB_FIX_ASPECT_RATIO.

Using a grid of circles for calibration

Instead of the usual chessboard pattern, OpenCV also offers the possibility to calibrate a camera by using a grid of circles. In this case, the centers of the circles are used as calibration points.
The corresponding function is very similar to the function we used to locate the chessboard corners:

```cpp
cv::Size boardSize(7,7);
std::vector<cv::Point2f> centers;
bool found = cv::findCirclesGrid(image, boardSize, centers);
```

See also

The article "A flexible new technique for camera calibration" by Z. Zhang, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, 2000, is a classic paper on the problem of camera calibration.

Summary

In this article, we explored the projective relations that exist between two images of the same scene.

Savia Lobo
28 May 2019
8 min read

Speech2Face: A neural network that “imagines” faces from hearing voices. Is it too soon to worry about ethnic profiling?

Last week, researchers from MIT CSAIL and Google AI published a research study on reconstructing a facial image of a person from a short audio recording of that person speaking, in their paper titled “Speech2Face: Learning the Face Behind a Voice”. The researchers designed and trained a neural network on millions of natural Internet/YouTube videos of people speaking. They demonstrated that, during training, the model learns voice-face correlations that allow it to produce images capturing various physical attributes of the speakers, such as age, gender, and ethnicity. The entire training was done in a self-supervised manner, by utilizing the natural co-occurrence of faces and speech in Internet videos, without the need to model attributes explicitly. They further evaluated and numerically quantified how closely the Speech2Face reconstructions, obtained directly from audio, resemble the true face images of the speakers. For this, they tested their model both qualitatively and quantitatively on the AVSpeech and VoxCeleb datasets.

The Speech2Face model

The researchers utilized the VGG-Face model, a face recognition model pre-trained on a large-scale face dataset called DeepFace, and extracted a 4096-D face feature from the penultimate layer (fc7) of the network. These face features were shown to contain enough information to reconstruct the corresponding face images while being robust to many variations in the input images. The Speech2Face pipeline consists of two main components: 1) a voice encoder, which takes a complex spectrogram of speech as input and predicts a low-dimensional face feature that would correspond to the associated face; and 2) a face decoder, which takes the face feature as input and produces an image of the face in a canonical form (frontal-facing and with a neutral expression).
During training, the face decoder is fixed, and only the voice encoder, which predicts the face feature, is trained.

How were the facial features evaluated?

To quantify how well different facial attributes are captured in Speech2Face reconstructions, the researchers tested different aspects of the model.

Demographic attributes

The researchers used Face++, a leading commercial service for computing facial attributes. They evaluated and compared age, gender, and ethnicity by running the Face++ classifiers on the original images and on their Speech2Face reconstructions. The Face++ classifiers return either “male” or “female” for gender, a continuous number for age, and one of four values, “Asian”, “black”, “India”, or “white”, for ethnicity.

Source: Arxiv.org

Craniofacial attributes

Source: Arxiv.org

The researchers evaluated craniofacial measurements commonly used in the literature for capturing ratios and distances in the face. They computed the correlation between F2F reconstructions (faces decoded from the features of the true face images) and the corresponding S2F (Speech2Face) reconstructions. Face landmarks were computed using the DEST library. The results show a statistically significant (p < 0.001) positive correlation for several measurements. In particular, the highest correlations are measured for the nasal index (0.38) and nose width (0.35), features indicative of nose structures that may affect a speaker’s voice.

Feature similarity

The researchers further tested how well a person can be recognized from the face features predicted from speech. They first directly measured the cosine distance between the predicted features and the true ones obtained from the original face image of the speaker. The table above shows the average error over 5,000 test images for predictions using 3-second and 6-second audio segments. The use of longer audio clips yields consistent improvement in all error metrics; this further supports the qualitative improvement observed in the image below.
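To illustrate the cosine distance used in this evaluation, and how the same distance can rank a database of face features for retrieval, here is a minimal pure-Python sketch. It is not the authors' evaluation code: the 3-dimensional vectors and speaker names are hypothetical stand-ins for the 4096-D VGG-Face fc7 features.

```python
import math


def cosine_distance(a, b):
    """1 - cos(angle between a and b): 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_faces(predicted, database):
    """Sort database entries (name -> face feature) from closest to farthest."""
    return sorted(database, key=lambda name: cosine_distance(predicted, database[name]))


# Hypothetical 3-D stand-ins for 4096-D face features
database = {"speaker_a": [0.0, 1.0, 0.0], "speaker_b": [1.0, 0.0, 0.0], "speaker_c": [1.0, 1.0, 0.0]}
predicted_from_speech = [1.0, 0.1, 0.0]
print(rank_faces(predicted_from_speech, database))  # ['speaker_b', 'speaker_c', 'speaker_a']
```

Cosine distance ignores the length of the feature vectors and compares only their direction, so a feature predicted from speech can match a true face feature even if its overall scale differs.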
They further evaluated how accurately they could retrieve the true speaker from a database of face images. To do so, they used the Speech2Face model to predict a face feature from a person's speech and queried the database by computing the distance between this feature and the face features of all face images in the database.

Ethical considerations with Speech2Face model

The researchers said that the training data is a collection of educational videos from YouTube and does not represent the entire world population equally. Hence, the model may be affected by the uneven distribution of data. They also highlighted that “if a certain language does not appear in the training data, our reconstructions will not capture well the facial attributes that may be correlated with that language”. “In our experimental section, we mention inferred demographic categories such as “White” and “Asian”. These are categories defined and used by a commercial face attribute classifier and were only used for evaluation in this paper. Our model is not supplied with and does not make use of this information at any stage”, the paper mentions. They also advise that any further investigation or practical use of this technology should be carefully tested to ensure that the training data is representative of the intended user population. “If that is not the case, more representative data should be broadly collected”, the researchers state.

Limitations of the Speech2Face model

In order to test the stability of the Speech2Face reconstruction, the researchers used different speech segments of the same person, taken from different parts of the same video and from a different video. The reconstructed face images were consistent within and between the videos. They further probed the model with an example of an Asian man speaking the same sentence in English and in Chinese to qualitatively test the effect of language and accent.
While having the same reconstructed face in both cases would be ideal, the model inferred different faces based on the spoken language. In other examples, the model was able to successfully factor out the language, reconstructing a face with Asian features even though the girl in that example was speaking English with no apparent accent. “In general, we observed mixed behaviors and a more thorough examination is needed to determine to which extent the model relies on language. More generally, the ability to capture the latent attributes from speech, such as age, gender, and ethnicity, depends on several factors such as accent, spoken language, or voice pitch. Clearly, in some cases, these vocal attributes would not match the person’s appearance”, the researchers state in the paper.

Speech2Cartoon: Converting generated images into cartoon faces

The face images reconstructed from speech may also be used to generate personalized cartoons of speakers from their voices. The researchers used Gboard, the keyboard app available on Android phones, which is also capable of analyzing a selfie image to produce a cartoon-like version of the face. Such cartoon re-rendering of the face may be useful as a visual representation of a person during a phone or video conferencing call when the person’s identity is unknown or the person prefers not to share their picture. The reconstructed faces may also be used directly, to assign faces to machine-generated voices used in home devices and virtual assistants.

https://twitter.com/NirantK/status/1132880233017761792

A user on Hacker News commented, “This paper is a neat idea, and the results are interesting, but not in the way I'd expected. I had hoped it would the domain of how much person-specific information this can deduce from a voice, e.g. lip aperture, overbite, size of the vocal tract, openness of the nares. This is interesting from a speech perception standpoint.
Instead, it's interesting more in the domain of how much social information it can deduce from a voice. This appears to be a relatively efficient classifier for gender, race, and age, taking voice as input.” “I'm sure this isn't the first time it's been done, but it's pretty neat to see it in action, and it's a worthwhile reminder: If a neural net is this good at inferring social, racial, and gender information from audio, humans are even better. And the idea of speech as a social construct becomes even more relevant”, he further added.

This recent study is interesting because it takes AI a step further: it attempts to predict a face from audio recordings alone, without the need for DNA. However, there can be certain repercussions, especially when it comes to security. Such technology could easily be misused, for example to impersonate someone else. It will be interesting to see how this research develops in the near future. To learn more about the Speech2Face model in detail, head over to the research paper.

Sugandha Lahoti
27 Mar 2018
8 min read

How to perform full-text search (FTS) in PostgreSQL

This article is an excerpt from the book Mastering PostgreSQL 10, written by Hans-Jürgen Schönig. This book provides expert techniques on PostgreSQL 10 development and administration.

If you are looking up names or simple strings, you are usually querying the entire content of a field. In full-text search (FTS), this is different. The purpose of full-text search is to look for words or groups of words that can be found in a text. Therefore, FTS is more of a contains operation, as you are basically never looking for an exact string. In this article, we will show how to perform a full-text search operation in PostgreSQL.

In PostgreSQL, FTS can be done using GIN indexes. The idea is to dissect a text, extract valuable lexemes (= "preprocessed tokens of words"), and index those elements rather than the underlying text. To make your search even more successful, those words are preprocessed. Here is an example:

```
test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars');
                          to_tsvector
---------------------------------------------------------------
 'car':2,6,14 'even':10 'mani':13 'mind':11 'want':4 'would':8
(1 row)
```

The example shows a simple sentence. The to_tsvector function will take the string, apply English rules, and perform a stemming process. Based on the configuration (english), PostgreSQL will parse the string, throw away stop words, and stem individual words. For example, both car and cars will be transformed into car. Note that this is not about finding the linguistically correct word stem: in the case of many, PostgreSQL simply transforms the string into mani by applying standard rules that work nicely with the English language.

Note that the output of the to_tsvector function is highly language dependent. If you tell PostgreSQL to treat the string as Dutch, the result will be totally different:

```
test=# SELECT to_tsvector('dutch', 'A car, I want a car. I would not even mind having many cars');
                                      to_tsvector
-----------------------------------------------------------------------------------------
 'a':1,5 'car':2,6,14 'even':10 'having':12 'i':3,7 'many':13 'mind':11 'not':9 'would':8
(1 row)
```

To figure out which configurations are supported, consider running the following query:

```
SELECT cfgname FROM pg_ts_config;
```

Comparing strings

After taking a brief look at the stemming process, it is time to figure out how a stemmed text can be compared to a user query. The following code snippet checks for the word wanted:

```
test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars')
           @@ to_tsquery('english', 'wanted');
 ?column?
----------
 t
(1 row)
```

Note that wanted does not actually show up in the original text. Still, PostgreSQL will return true. The reason is that want and wanted are both transformed into the same lexeme, so the result is true. Practically, this makes a lot of sense. Imagine you are looking for a car on Google. If you find pages selling cars, this is totally fine. Finding common lexemes is, therefore, an intelligent idea.

Sometimes, people are not only looking for a single word, but want to find a set of words. With to_tsquery, this is possible, as shown in the next example:

```
test=# SELECT to_tsvector('english', 'A car, I want a car. I would not even mind having many cars')
           @@ to_tsquery('english', 'wanted & bmw');
 ?column?
----------
 f
(1 row)
```

In this case, false is returned because bmw cannot be found in our input string. In the to_tsquery function, & means AND, | means OR, and ! means NOT. It is therefore easy to build complex search strings.

Defining GIN indexes

If you want to apply text search to a column or a group of columns, there are basically two choices: create a functional index using GIN, or add a column containing ready-to-use tsvectors and a trigger to keep them in sync. In this section, both options will be outlined.
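The matching rules described in the comparison examples can be mimicked with a small toy model, which may help when reasoning about why a query matches. This Python sketch drops stop words (while keeping word positions), applies crude suffix stripping, and evaluates & and | queries. It is only an illustration under simplifying assumptions: the stop-word list and the toy_stem rules are hypothetical stand-ins for PostgreSQL's parser and Snowball stemmer (for example, it keeps many instead of producing mani), and it supports neither parentheses nor !.

```python
import re

# Hypothetical, tiny stop-word list; PostgreSQL's english configuration uses a larger one
STOP_WORDS = {"a", "i", "not", "having", "the", "to"}


def toy_stem(word):
    """Crude suffix stripping. NOT the Snowball stemmer PostgreSQL uses."""
    for suffix in ("ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_tsvector(text):
    """Map each lexeme to the 1-based word positions where it occurs."""
    vector = {}
    for pos, token in enumerate(re.findall(r"[a-z]+", text.lower()), start=1):
        if token not in STOP_WORDS:  # stop words are dropped but still use up a position
            vector.setdefault(toy_stem(token), []).append(pos)
    return vector


def toy_match(vector, query):
    """Evaluate terms joined by '&' (AND) and '|' (OR); '&' binds tighter, as in to_tsquery."""
    return any(
        all(toy_stem(term.strip().lower()) in vector for term in clause.split("&"))
        for clause in query.split("|")
    )


doc = toy_tsvector("A car, I want a car. I would not even mind having many cars")
print(doc["car"])                      # [2, 6, 14]
print(toy_match(doc, "wanted"))        # True: 'wanted' and 'want' share a lexeme
print(toy_match(doc, "wanted & bmw"))  # False: bmw is missing
print(toy_match(doc, "bmw | car"))     # True
```

Because both the document and the query pass through the same normalization, wanted matches want without either string appearing verbatim in the other, which is the behavior shown by the @@ operator above.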
To show how things work, I have created some sample data:

```
test=# CREATE TABLE t_fts AS SELECT comment FROM pg_available_extensions;
SELECT 43
```

Indexing the column directly with a functional index is definitely a slower but more space-efficient way to get things done:

```
test=# CREATE INDEX idx_fts_func ON t_fts
           USING gin(to_tsvector('english', comment));
CREATE INDEX
```

Deploying an index on the function is easy, but it can lead to some overhead. Adding a materialized column needs more space, but will lead to better runtime behavior:

```
test=# ALTER TABLE t_fts ADD COLUMN ts tsvector;
ALTER TABLE
```

The only trouble is, how do you keep this column in sync? The answer is by using a trigger:

```
test=# CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON t_fts
           FOR EACH ROW EXECUTE PROCEDURE
           tsvector_update_trigger(ts, 'pg_catalog.english', 'comment');
```

Fortunately, PostgreSQL already provides a C function that can be used by a trigger to sync the tsvector column. Just pass the name of the tsvector column, the desired text search configuration, and one or more text columns to the function, and you are done. The trigger function will take care of all that is needed. Keep in mind that the trigger only fires for rows inserted or updated after it has been created; rows that already exist need a one-off update, for example UPDATE t_fts SET ts = to_tsvector('english', comment);. Note that a trigger always operates within the same transaction as the statement making the modification. Therefore, there is no risk of inconsistency.

Debugging your search

Sometimes, it is not quite clear why a query matches a given search string. To debug your query, PostgreSQL offers the ts_debug function. From a user's point of view, it can be used just like to_tsvector. It reveals a lot about the inner workings of the FTS infrastructure:

```
test=# \x
Expanded display is on.
```
```
test=# SELECT * FROM ts_debug('english', 'go to www.postgresql-support.de');
-[ RECORD 1 ]+----------------------------
alias        | asciiword
description  | Word, all ASCII
token        | go
dictionaries | {english_stem}
dictionary   | english_stem
lexemes      | {go}
-[ RECORD 2 ]+----------------------------
alias        | blank
description  | Space symbols
token        |
dictionaries | {}
dictionary   |
lexemes      |
-[ RECORD 3 ]+----------------------------
alias        | asciiword
description  | Word, all ASCII
token        | to
dictionaries | {english_stem}
dictionary   | english_stem
lexemes      | {}
-[ RECORD 4 ]+----------------------------
alias        | blank
description  | Space symbols
token        |
dictionaries | {}
dictionary   |
lexemes      |
-[ RECORD 5 ]+----------------------------
alias        | host
description  | Host
token        | www.postgresql-support.de
dictionaries | {simple}
dictionary   | simple
lexemes      | {www.postgresql-support.de}
```

ts_debug will list every token found and display information about the token. You will see which token the parser found, the dictionary used, as well as the type of object. In my example, blanks, words, and hosts have been found. You might also see numbers, email addresses, and a lot more. Depending on the type of string, PostgreSQL will handle things differently. For example, it makes absolutely no sense to stem hostnames and email addresses.

Gathering word statistics

Full-text search can handle a lot of data. To give end users more insight into their texts, PostgreSQL offers the ts_stat function, which returns a list of words:

```
SELECT * FROM ts_stat('SELECT to_tsvector(''english'', comment)
                       FROM pg_available_extensions')
ORDER BY 2 DESC LIMIT 3;
   word   | ndoc | nentry
----------+------+--------
 function |   10 |     10
 data     |   10 |     10
 type     |    7 |      7
(3 rows)
```

The word column contains the stemmed word, ndoc tells us the number of documents in which a certain word occurs, and nentry indicates how often the word was found altogether.

Taking advantage of exclusion operators

So far, indexes have been used to speed things up and to ensure uniqueness. However, a couple of years ago, somebody came up with the idea of using indexes for even more. As you have seen in this chapter, GiST supports operations such as intersects, overlaps, contains, and a lot more. So, why not use those operations to manage data integrity? Here is an example:

```
test=# CREATE EXTENSION btree_gist;
test=# CREATE TABLE t_reservation (
           room     int,
           from_to  tsrange,
           EXCLUDE USING GiST (room with =, from_to with &&)
       );
CREATE TABLE
```

The btree_gist extension is needed so that GiST can also handle the plain equality comparison on the integer room column. The EXCLUDE USING GiST clause defines additional constraints. If you are selling rooms, you might want to allow different rooms to be booked at the same time. However, you don't want to sell the same room twice during the same period. What the EXCLUDE clause in my example says is this: if a room is booked twice at the same time, an error should be raised (the ranges in from_to must not overlap (&&) if they relate to the same room).
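The rule that this exclusion constraint enforces can be sketched in plain Python: a new booking is rejected if an existing booking for the same room has an overlapping range. This illustrates the semantics only, assuming closed (inclusive) bounds as in the '[...]' literals used in this section; PostgreSQL enforces the constraint through the GiST index rather than a linear scan, and the ReservationTable class is hypothetical.

```python
from datetime import date


class ReservationTable:
    """Toy model of EXCLUDE USING GiST (room WITH =, from_to WITH &&),
    assuming closed ranges such as '[2017-01-01, 2017-03-03]'."""

    def __init__(self):
        self.rows = []  # (room, start, end), both bounds inclusive

    def insert(self, room, start, end):
        for r, s, e in self.rows:
            # Closed ranges overlap iff each one starts no later than the other ends
            if r == room and start <= e and s <= end:
                raise ValueError(
                    f"conflicting key value: ({room}, [{start}, {end}]) "
                    f"overlaps existing ({r}, [{s}, {e}])"
                )
        self.rows.append((room, start, end))


table = ReservationTable()
table.insert(10, date(2017, 1, 1), date(2017, 3, 3))
table.insert(13, date(2017, 1, 1), date(2017, 3, 3))      # different room: allowed
try:
    table.insert(13, date(2017, 2, 2), date(2017, 8, 14))  # same room, overlapping dates
except ValueError as err:
    print(err)
```

Both conditions must hold at once, mirroring the WITH = and WITH && pair: equal rooms alone, or overlapping ranges alone, are not a conflict.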
The following two rows will not violate the constraint:

```
test=# INSERT INTO t_reservation VALUES (10, '["2017-01-01", "2017-03-03"]');
INSERT 0 1
test=# INSERT INTO t_reservation VALUES (13, '["2017-01-01", "2017-03-03"]');
INSERT 0 1
```

However, the next INSERT will cause a violation because the data overlaps:

```
test=# INSERT INTO t_reservation VALUES (13, '["2017-02-02", "2017-08-14"]');
ERROR:  conflicting key value violates exclusion constraint "t_reservation_room_from_to_excl"
DETAIL:  Key (room, from_to)=(13, ["2017-02-02 00:00:00","2017-08-14 00:00:00"]) conflicts with existing key (room, from_to)=(13, ["2017-01-01 00:00:00","2017-03-03 00:00:00"]).
```

Exclusion constraints are very useful and provide highly advanced means of handling integrity. To summarize, we learned how to perform full-text search operations in PostgreSQL. If you liked our article, check out the book Mastering PostgreSQL 10 to understand how to perform operations such as indexing, query optimization, concurrent transactions, table partitioning, server tuning, and more.
Savia Lobo
28 Nov 2017
7 min read

Building a classification system with logistic regression in OpenCV

This article is an excerpt from a book by Michael Beyeler titled Machine Learning for OpenCV. The code and related files are available on GitHub here.

A famous dataset in the world of machine learning is called the Iris dataset. The Iris dataset contains measurements of 150 iris flowers from three different species: setosa, versicolor, and virginica. These measurements include the length and width of the petals, and the length and width of the sepals, all measured in centimeters.

Understanding logistic regression

Despite its name, logistic regression can actually be used as a model for classification. It uses a logistic function (or sigmoid) to convert any real-valued input x into a predicted output value ŷ that takes values between 0 and 1, as shown in the following figure:

The logistic function

Rounding ŷ to the nearest integer effectively classifies the input as belonging either to class 0 or 1.

Of course, most often, our problems have more than one input or feature value, x. For example, the Iris dataset provides a total of four features. For the sake of simplicity, let's focus here on the first two features, sepal length (which we will call feature f1) and sepal width (which we will call f2). Using the tricks we learned when talking about linear regression, we know we can express the input x as a linear combination of the two features, f1 and f2:

$$x = w_1 f_1 + w_2 f_2$$

However, in contrast to linear regression, we are not done yet. From the previous section, we know that the sum of products would result in a real-valued output, but we are interested in a categorical value, zero or one. This is where the logistic function comes in: it acts as a squashing function, σ, that compresses the range of possible output values to the range [0, 1]:

$$\hat{y} = \sigma(x) = \frac{1}{1 + e^{-x}}$$

Because the output is always between 0 and 1, it can be interpreted as a probability. If we only have a single input variable x, the output value ŷ can be interpreted as the probability of x belonging to class 1.

Now let's apply this knowledge to the Iris dataset!

Loading the training data

The Iris dataset is included with scikit-learn. We first load all the necessary modules, as we did in our earlier examples:

```
In [1]: import numpy as np
   ...: import cv2
   ...: from sklearn import datasets
   ...: from sklearn import model_selection
   ...: from sklearn import metrics
   ...: import matplotlib.pyplot as plt
   ...: %matplotlib inline

In [2]: plt.style.use('ggplot')
```

Then, loading the dataset is a one-liner:

```
In [3]: iris = datasets.load_iris()
```

This function returns a dictionary-like object (a scikit-learn Bunch), which we call iris and which contains a bunch of different fields:

```
In [4]: dir(iris)
Out[4]: ['DESCR', 'data', 'feature_names', 'target', 'target_names']
```

Here, all the data points are contained in 'data'. There are 150 data points, each of which has four feature values:

```
In [5]: iris.data.shape
Out[5]: (150, 4)
```

These four features correspond to the sepal and petal dimensions mentioned earlier:

```
In [6]: iris.feature_names
Out[6]: ['sepal length (cm)',
         'sepal width (cm)',
         'petal length (cm)',
         'petal width (cm)']
```

For every data point, we have a class label stored in target:

```
In [7]: iris.target.shape
Out[7]: (150,)
```

We can also inspect the class labels, and find that there is a total of three classes:

```
In [8]: np.unique(iris.target)
Out[8]: array([0, 1, 2])
```

Making it a binary classification problem

For the sake of simplicity, we want to focus on a binary classification problem for now, where we only have two classes. The easiest way to do this is to discard all data points belonging to a certain class, such as class label 2, by selecting all the rows that do not belong to class 2:

```
In [9]: idx = iris.target != 2
   ...: data = iris.data[idx].astype(np.float32)
   ...: target = iris.target[idx].astype(np.float32)
```

Inspecting the data

Before you get started with setting up a model, it is always a good idea to have a look at the data. We did this earlier for the town map example, so let's continue our streak. Using Matplotlib, we create a scatter plot where the color of each data point corresponds to the class label:

```
In [10]: plt.scatter(data[:, 0], data[:, 1], c=target,
    ...:             cmap=plt.cm.Paired, s=100)
    ...: plt.xlabel(iris.feature_names[0])
    ...: plt.ylabel(iris.feature_names[1])
Out[10]: <matplotlib.text.Text at 0x23bb5e03eb8>
```

To make plotting easier, we limit ourselves to the first two features (iris.feature_names[0] being the sepal length and iris.feature_names[1] being the sepal width). We can see a nice separation of classes in the following figure:

Plotting the first two features of the Iris dataset

Splitting the data into training and test sets

We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn's many helper functions:

```
In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
    ...:     data, target, test_size=0.1, random_state=42
    ...: )
```

Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. By inspecting the return arguments, we note that we ended up with exactly 90 training data points and 10 test data points:

```
In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))

In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
```

Training the classifier

Creating a logistic regression classifier involves pretty much the same steps as setting up k-NN:

```
In [14]: lr = cv2.ml.LogisticRegression_create()
```

We then have to specify the desired training method. Here, we can choose cv2.ml.LogisticRegression_BATCH or cv2.ml.LogisticRegression_MINI_BATCH.
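The difference between the two training methods lies in when the weights are updated: batch training computes the gradient over all samples before each update, whereas mini-batch training with a batch size of 1 updates the weights after every single sample (stochastic gradient descent). The following minimal pure-Python sketch shows the latter, combining the linear combination of features with the logistic function. It is not OpenCV's implementation, and the four two-feature samples, learning rate, and number of epochs are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic function; written in two branches to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(weights, bias, features):
    """y_hat = sigma(w . f + b)."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def train_sgd(X, y, lr=0.5, epochs=500, seed=0):
    """Logistic regression with a mini-batch size of 1: the weights are
    updated after every single sample (stochastic gradient descent)."""
    rng = random.Random(seed)
    weights, bias = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a new random order each epoch
        for i in order:
            # Gradient of the log-loss w.r.t. the linear output x is (y_hat - y)
            error = predict_proba(weights, bias, X[i]) - y[i]
            weights = [w - lr * error * f for w, f in zip(weights, X[i])]
            bias -= lr * error
    return weights, bias


# Hypothetical two-feature samples (f1, f2) with labels 0 and 1
X = [[0.0, 0.0], [1.0, 0.5], [3.0, 2.5], [4.0, 3.0]]
y = [0, 0, 1, 1]
weights, bias = train_sgd(X, y)
print([round(predict_proba(weights, bias, f)) for f in X])  # reproduces y on this toy data
```

Rounding the predicted probability reproduces the 0/1 classification rule described earlier; a batch variant would instead average the per-sample gradients over the whole training set before each update.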
For now, all we need to know is that we want to update the model after every data point, which can be achieved with the following code:

```
In [15]: lr.setTrainMethod(cv2.ml.LogisticRegression_MINI_BATCH)
    ...: lr.setMiniBatchSize(1)
```

We also want to specify the number of iterations the algorithm should run before it terminates:

```
In [16]: lr.setIterations(100)
```

We can then call the training method of the object (in the exact same way as we did earlier), which will return True upon success:

```
In [17]: lr.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
Out[17]: True
```

As we just saw, the goal of the training phase is to find a set of weights that best transform the feature values into an output label. A single data point is given by its four feature values (f0, f1, f2, f3). Since we have four features, we should also get four weights, so that x = w0 f0 + w1 f1 + w2 f2 + w3 f3, and ŷ = σ(x). However, as discussed previously, the algorithm adds an extra weight that acts as an offset or bias. OpenCV does this by prepending a constant column of ones to the data, so the bias is the first of the five learned weights: x = w0 + w1 f0 + w2 f1 + w3 f2 + w4 f3. We can retrieve these weights as follows:

```
In [18]: lr.get_learnt_thetas()
Out[18]: array([[-0.04109113, -0.01968078, -0.16216497,  0.28704911,  0.11945518]], dtype=float32)
```

This means that the input to the logistic function is x = -0.0411 - 0.0197 f0 - 0.162 f1 + 0.287 f2 + 0.119 f3. Then, when we feed in a new data point (f0, f1, f2, f3) that belongs to class 1, the output ŷ = σ(x) should be above 0.5, so that it rounds to 1. But how well does that actually work?

Testing the classifier

Let's see for ourselves by calculating the accuracy score on the training set:

```
In [19]: ret, y_pred = lr.predict(X_train)

In [20]: metrics.accuracy_score(y_train, y_pred)
Out[20]: 1.0
```

Perfect score! However, this only means that the model was able to perfectly memorize the training dataset. This does not mean that the model would be able to classify a new, unseen data point. For this, we need to check the test dataset:

```
In [21]: ret, y_pred = lr.predict(X_test)
    ...: metrics.accuracy_score(y_test, y_pred)
Out[21]: 1.0
```

Luckily, we get another perfect score! With only 10 test samples and two classes that are easy to separate, this is encouraging rather than conclusive, but it does indicate that the model generalizes well on this problem. If you enjoyed building a classifier using logistic regression and would like to learn more machine learning tasks using OpenCV, be sure to check out the book, Machine Learning for OpenCV, where this section originally appears.

Sunith Shetty
03 Apr 2018
14 min read

Generative Models in action: How to create a Van Gogh with Neural Artistic Style Transfer

In today’s tutorial, we will learn the principles behind neural artistic style transfer and show a working example that transfers the style of Van Gogh's art onto an image. Neural artistic style transfer An image can be considered as a combination of style and content. The artistic style transfer technique transforms an image to look like a painting with a specific painting style. We will see how to code this idea up. The loss function compares the generated image with the content of the photo and with the style of the painting, so two values are calculated: one by comparing the content of the photo with the generated image, and one by comparing the style of the painting with the generated image. The optimization is therefore carried out over the pixels of the image, rather than over the weights of the network. Content loss Raw pixels are not a good basis for comparing content, so we will use the CNN features of various layers instead, as they are a better representation of the content. The initial layers respond to high-frequency details such as edges, corners, and textures, while the later layers represent objects and are hence better for content: a later layer can compare object to object better than pixels can.
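The two comparisons described above can be written compactly. The notation here is ours, not the book's: x is the generated image, p the content photo, a the style painting, F^l_i(·) the i-th activation of layer l of the frozen network, N_l the number of activations in that layer, and α, β are weights (the article's code later uses a factor of 1/10 on the content term and per-layer weights on the style term). The minimization runs over the pixels of x only.

```latex
\mathcal{L}_{\text{content}}(x, p) = \frac{1}{N_l} \sum_{i=1}^{N_l} \left( F^{l}_{i}(x) - F^{l}_{i}(p) \right)^{2}

\mathcal{L}_{\text{total}}(x) = \alpha \, \mathcal{L}_{\text{content}}(x, p) + \beta \, \mathcal{L}_{\text{style}}(x, a),
\qquad
x^{*} = \arg\min_{x} \, \mathcal{L}_{\text{total}}(x)
```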
But for this, we need to first import the required libraries, using the following code:
import numpy as np
from PIL import Image
from scipy.optimize import fmin_l_bfgs_b
from scipy.misc import imsave
from vgg16_avg import VGG16_Avg
from keras import metrics
from keras.models import Model
from keras import backend as K
Now, let's load the required image, using the following command:
content_image = Image.open(work_dir + 'bird_orig.png')
We will use the following image for this instance:
As we are using the VGG architecture for extracting the features, the mean of all the ImageNet images has to be subtracted from all the images, as shown in the following code:
imagenet_mean = np.array([123.68, 116.779, 103.939], dtype=np.float32)
def subtract_imagenet_mean(image):
    return (image - imagenet_mean)[:, :, :, ::-1]
Note that the channel order also changes: the pretrained VGG weights expect BGR input, so after subtracting the mean, the preprocessing function reverses the channels from RGB to BGR. The deprocessing function undoes both steps, reversing the channels back to RGB, adding the mean back and clipping the result to the valid pixel range, as shown in the following code:
def add_imagenet_mean(image, s):
    return np.clip(image.reshape(s)[:, :, :, ::-1] + imagenet_mean, 0, 255)
First, we will see how to create an image with the content from another image. The image is created starting from random noise, and its content is represented by the activations of some layer. We will minimize the difference between these activations for the random image and for the content image, which is termed the content loss. This loss is similar to a pixel-wise loss but is applied to layer activations, and hence it captures the content while leaving out the noise. Any CNN architecture can be used to do forward inference on the content image and the random noise. The activations are taken, and the mean squared error between the activations of these two outputs is calculated. The pixels of the random image are updated while the CNN weights are frozen. We will freeze the VGG network for this case.
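The mean subtraction and channel reversal described above are easy to get backwards, so here is the same pair of operations for a single pixel in plain Python. This is only an illustration of the order of operations used by subtract_imagenet_mean and add_imagenet_mean; the per-pixel functions are our own simplification of those batch functions.

```python
# ImageNet channel means in RGB order, the same values as the article's imagenet_mean
IMAGENET_MEAN_RGB = (123.68, 116.779, 103.939)


def preprocess_pixel(rgb):
    """Subtract the per-channel mean, then reverse the channel order (RGB -> BGR)."""
    centered = [value - mean for value, mean in zip(rgb, IMAGENET_MEAN_RGB)]
    return centered[::-1]


def deprocess_pixel(bgr):
    """Undo preprocess_pixel: BGR -> RGB, add the mean back, clip to [0, 255]."""
    rgb = [value + mean for value, mean in zip(bgr[::-1], IMAGENET_MEAN_RGB)]
    return [min(max(value, 0.0), 255.0) for value in rgb]


print(preprocess_pixel((255.0, 0.0, 0.0)))  # pure red: the red value ends up last
```

The mean is subtracted while the pixel is still in RGB order, which is why the RGB-ordered means can be used directly before the flip.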
Now, the VGG model can be loaded. Generated images are very sensitive to subsampling techniques such as max pooling: max pooling keeps only the largest value in each window, so the other pixel values cannot be recovered from its output. Average pooling is smoother, so the model is loaded through a function that converts VGG to use average pooling, as shown here:
vgg_model = VGG16_Avg(include_top=False)
Note that the weights of this model are the same as the original's, even though the pooling type has been changed. The ResNet and Inception models are not suited for this because of their inability to provide various abstractions. With the model frozen, we will take the activations of block5_conv1, the first convolutional layer of the last VGG block (the third-to-last convolutional layer overall), which has a wide receptive field. The code is given here for your reference:
content_layer = vgg_model.get_layer('block5_conv1').output
Now, a new model is created from the VGG network truncated at the layer that gives good features. The content image can then be passed through it to carry out forward inference and obtain the actual layer activations. A Keras backend (TensorFlow) variable is created to hold these activations, using the following code:
content_model = Model(vgg_model.input, content_layer)
content_image_array = subtract_imagenet_mean(np.expand_dims(np.array(content_image), 0))
content_image_shape = content_image_array.shape
target = K.variable(content_model.predict(content_image_array))
Let's define an evaluator class to compute the loss and gradients of the image.
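Returning briefly to the pooling choice made above: one way to see why average pooling suits image generation is to look at what each pooling layer sends back to its inputs during backpropagation, since that is what drives the pixel updates. This minimal sketch handles a single 2×2 window flattened to a list; the function names are illustrative, not part of Keras.

```python
def max_pool(window):
    return max(window)


def avg_pool(window):
    return sum(window) / len(window)


def max_pool_grad(window, upstream=1.0):
    """Backward pass of max pooling: only the largest input receives gradient."""
    winner = window.index(max(window))
    return [upstream if i == winner else 0.0 for i in range(len(window))]


def avg_pool_grad(window, upstream=1.0):
    """Backward pass of average pooling: the gradient is shared equally."""
    return [upstream / len(window)] * len(window)


window = [0.2, 0.9, 0.4, 0.1]  # a 2x2 window, flattened
print(max_pool(window), max_pool_grad(window))
print(avg_pool(window), avg_pool_grad(window))
```

With max pooling, three of the four pixels get no gradient through this window, so the optimizer cannot adjust them via this path; with average pooling, every pixel gets an equal share.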
The following class returns the loss and gradient values at any point of the iteration:
class ConvexOptimiser(object):
    def __init__(self, cost_function, tensor_shape):
        self.cost_function = cost_function
        self.tensor_shape = tensor_shape
        self.gradient_values = None
    def loss(self, point):
        # A single call computes both the loss and the gradients; the gradients are cached
        loss_value, self.gradient_values = self.cost_function([point.reshape(self.tensor_shape)])
        return loss_value.astype(np.float64)
    def gradients(self, point):
        # Return the gradients cached by the preceding loss() call
        return self.gradient_values.flatten().astype(np.float64)
The loss function can be defined as the mean squared error between the activations at a specific convolutional layer. The loss is computed between the layer activations of the generated image and those of the original content photo, as shown here:
mse_loss = metrics.mean_squared_error(content_layer, target)
The gradients of the loss can be computed with respect to the input of the model, as shown:
grads = K.gradients(mse_loss, vgg_model.input)
The input to the function is the input of the model, and the output is the list of the loss and gradient values, as shown:
cost_function = K.function([vgg_model.input], [mse_loss]+grads)
optimiser = ConvexOptimiser(cost_function, content_image_shape)
Since there is only one image and no mini-batches, the loss and its gradients are deterministic, so SGD is not required and a full-batch optimizer works well. We will use SciPy's L-BFGS-B optimizer (fmin_l_bfgs_b) for the final optimization; it takes the loss and the gradients as separate functions, which is why the class above makes both accessible. We can also save the image at every step of the iteration.
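The caching trick in ConvexOptimiser is worth isolating: one expensive call produces both the loss and the gradient, and the gradient is stored until the optimizer asks for it. Below is a minimal stdlib sketch of that pattern. A toy quadratic cost stands in for the Keras function, and plain gradient descent stands in for SciPy's L-BFGS-B; both substitutions, and the assumption that the gradient is requested right after the loss at the same point, are ours.

```python
class CachingEvaluator:
    """Wraps a cost function that returns (loss, gradient) in a single call.

    loss() runs the cost function and caches the gradient; gradients() returns
    the cached value, so each point costs one evaluation instead of two.
    """

    def __init__(self, cost_function):
        self.cost_function = cost_function
        self.gradient_values = None
        self.calls = 0  # how many times the (expensive) cost function ran

    def loss(self, point):
        self.calls += 1
        loss_value, self.gradient_values = self.cost_function(point)
        return float(loss_value)

    def gradients(self, point):
        # Assumes this is called right after loss() at the same point
        return list(self.gradient_values)


def quadratic_cost(point, target=(1.0, -2.0, 3.0)):
    """Toy cost: squared distance to a fixed target, plus its gradient."""
    diffs = [p - t for p, t in zip(point, target)]
    return sum(d * d for d in diffs), [2.0 * d for d in diffs]


def gradient_descent(evaluator, point, steps=50, learning_rate=0.1):
    for _ in range(steps):
        evaluator.loss(point)              # one evaluation computes both values
        grad = evaluator.gradients(point)  # no second evaluation needed here
        point = [p - learning_rate * g for p, g in zip(point, grad)]
    return point


evaluator = CachingEvaluator(quadratic_cost)
result = gradient_descent(evaluator, [0.0, 0.0, 0.0])
print(result, evaluator.calls)
```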
The optimizer can be defined using the following code:
def optimise(optimiser, iterations, point, tensor_shape, file_name):
    for i in range(iterations):
        point, min_val, info = fmin_l_bfgs_b(optimiser.loss, point.flatten(), fprime=optimiser.gradients, maxfun=20)
        point = np.clip(point, -127, 127)
        print('Current loss value:', min_val)
        imsave(work_dir + 'gen_' + file_name + '_{}.png'.format(i), add_imagenet_mean(point.copy(), tensor_shape)[0])
    return point
The optimizer takes the loss function, the current point, and the gradients, and returns the updated point. A random image needs to be generated as the starting point from which the content loss will be minimized, using the following code:
def generate_rand_img(shape):
    return np.random.uniform(-2.5, 2.5, shape)
generated_image = generate_rand_img(content_image_shape)
Here is the random image that is created:
The optimization can be run for 10 iterations to see the results, as shown:
iterations = 10
generated_image = optimise(optimiser, iterations, generated_image, content_image_shape, 'content')
If everything goes well, the loss should print as shown here, over the iterations:
Current loss value: 73.2010421753
Current loss value: 22.7840042114
Current loss value: 12.6585302353
Current loss value: 8.53817081451
Current loss value: 6.64649534225
Current loss value: 5.56395864487
Current loss value: 4.83072710037
Current loss value: 4.32800722122
Current loss value: 3.94804215431
Current loss value: 3.66387653351
Here is the image that is generated; it now almost looks like a bird. The optimization can be run for further iterations to improve it:
The optimizer took the random image and updated its pixels so that its content matches that of the content image. Although the result is not a perfect reproduction, it reproduces the content of the image to a certain extent. The images from successive iterations give a good intuition of how the image is generated. There is no batching involved in this process. In the next section, we will see how to create an image in the style of a painting.
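To close the content-reconstruction part, here is its essence with everything else stripped away: the network weights stay frozen, only the pixels change, and the loss compares features rather than pixels. In this minimal sketch, a fixed 2×3 linear map (FROZEN_WEIGHTS, hypothetical) stands in for VGG, a 3-value list stands in for the image, and plain gradient descent replaces L-BFGS-B; the step size and step count are assumptions. The clipping to [-127, 127] mirrors optimise().

```python
import random

# Hypothetical frozen "network": a fixed linear map from 3 pixels to 2 features
FROZEN_WEIGHTS = [[1.0, -0.5, 0.25],
                  [0.5, 1.0, -1.0]]


def features(pixels):
    return [sum(w * p for w, p in zip(row, pixels)) for row in FROZEN_WEIGHTS]


def content_loss_and_grad(pixels, target_features):
    """MSE between feature activations, and its gradient w.r.t. the pixels."""
    feats = features(pixels)
    n = len(feats)
    diffs = [f - t for f, t in zip(feats, target_features)]
    loss = sum(d * d for d in diffs) / n
    # d loss / d pixel_j = (2 / n) * sum_i diff_i * W[i][j]; W itself never changes
    grad = [2.0 / n * sum(diffs[i] * FROZEN_WEIGHTS[i][j] for i in range(n))
            for j in range(len(pixels))]
    return loss, grad


def reconstruct(target_pixels, steps=200, learning_rate=0.2, seed=0):
    rng = random.Random(seed)
    target_features = features(target_pixels)
    pixels = [rng.uniform(-2.5, 2.5) for _ in target_pixels]  # random start
    history = []
    for _ in range(steps):
        loss, grad = content_loss_and_grad(pixels, target_features)
        history.append(loss)
        # Gradient step on the pixels, then clip as optimise() does
        pixels = [min(max(p - learning_rate * g, -127.0), 127.0)
                  for p, g in zip(pixels, grad)]
    return pixels, history


target = [0.5, -1.0, 2.0]
pixels, history = reconstruct(target)
print(history[0], history[-1], pixels)
```

The loss drops to nearly zero, yet the pixels need not equal the target exactly: only what the frozen features can "see" is matched, which is the same reason the content reconstruction keeps the subject but not every detail.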
Style loss using the Gram matrix After creating an image that has the content of the original image, we will see how to create an image with just the style. Style can be thought of as a mix of the colour and texture of an image. For that purpose, we will define a style loss. First, we will load the image and resize it, as shown in the following code:
style_image = Image.open(work_dir + 'starry_night.png')
style_image = style_image.resize(np.divide(style_image.size, 3.5).astype('int32'))
Here is the style image we have loaded:
Now, we will convert this image to an array and preprocess it, dropping any alpha channel and changing the channel order, using the following code:
style_image_array = subtract_imagenet_mean(np.expand_dims(style_image, 0)[:, :, :, :3])
style_image_shape = style_image_array.shape
For this purpose, we will consider several layers, as shown in the following code:
model = VGG16_Avg(include_top=False, input_shape=style_image_shape[1:])
outputs = {l.name: l.output for l in model.layers}
Now, we will take the first convolutional layer of each of the first two blocks as an array of outputs, using the following code:
layers = [outputs['block{}_conv1'.format(o)] for o in range(1,3)]
A new model is now created that outputs all of those layers, and the target variables are assigned, using the following code:
layers_model = Model(model.input, layers)
targs = [K.variable(o) for o in layers_model.predict(style_image_array)]
Style loss is calculated using the Gram matrix. The Gram matrix is the product of a matrix and its transpose: the activation values are simply transposed and multiplied. This matrix is then used for computing the error between the style image and the random image. The Gram matrix loses the location information but preserves the texture information.
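Before the Keras definition, it may help to see the Gram matrix computed by hand. This plain-Python sketch takes one image's activations as height × width × channels nested lists, flattens the spatial positions, takes dot products between every pair of channels and divides by the total element count, following the normalization in grammian_matrix. The tiny activation values are made up, and style_mse averages over all entries here (a reduction detail that differs slightly from Keras's metrics.mse).

```python
def gram_matrix(activations):
    """activations: height x width x channels nested lists (one image).

    Returns the channels x channels matrix of channel dot products divided by
    the total number of elements.
    """
    pixels = [pix for row in activations for pix in row]  # flatten height and width
    channels = len(pixels[0])
    element_count = len(pixels) * channels
    return [[sum(p[a] * p[b] for p in pixels) / element_count
             for b in range(channels)]
            for a in range(channels)]


def style_mse(gram_x, gram_y):
    """Mean squared error between two Gram matrices, over all entries."""
    diffs = [(a - b) ** 2 for ra, rb in zip(gram_x, gram_y) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


# A made-up 2x2 "activation map" with 2 channels, and the same pixels moved around
acts = [[[1, 0], [0, 2]],
        [[3, 1], [1, 1]]]
shuffled = [[[1, 1], [3, 1]],
            [[0, 2], [1, 0]]]
print(gram_matrix(acts))
print(gram_matrix(shuffled))
```

Rearranging the pixels leaves the Gram matrix unchanged, which is exactly the loss of location information (and retention of texture statistics) described above.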
We will define the Gram matrix using the following code:
def grammian_matrix(matrix):
    # (height, width, channels) -> (channels, height * width)
    flattened_matrix = K.batch_flatten(K.permute_dimensions(matrix, (2, 0, 1)))
    # Dot products between every pair of channels
    matrix_transpose_dot = K.dot(flattened_matrix, K.transpose(flattened_matrix))
    element_count = matrix.get_shape().num_elements()
    return matrix_transpose_dot / element_count
As you might expect by now, it is a measure of the correlation between each pair of feature channels. The height and width dimensions are flattened out, so the matrix doesn't include any local information: the coordinate information is disregarded. The style loss computes the mean squared error between the Gram matrix of the input image and that of the target, as shown in the following code:
def style_mse_loss(x, y):
    return metrics.mse(grammian_matrix(x), grammian_matrix(y))
Now, let's compute the loss by summing the losses from the various layers, using the following code:
style_loss = sum(style_mse_loss(l1[0], l2[0]) for l1, l2 in zip(layers, targs))
grads = K.gradients(style_loss, model.input)
style_fn = K.function([model.input], [style_loss]+grads)
optimiser = ConvexOptimiser(style_fn, style_image_shape)
We then solve it the same way as before, by creating a random image.
This time, a Gaussian filter could also be applied to smooth the random image; the basic version, without the filter, is shown in the following code:
generated_image = generate_rand_img(style_image_shape)
The random image generated will look like this:
The optimization can be run for 10 iterations to see the results, as shown below:
generated_image = optimise(optimiser, iterations, generated_image, style_image_shape, 'style')
If everything goes well, the solver should print loss values similar to the following:
Current loss value: 5462.45556641
Current loss value: 189.738555908
Current loss value: 82.4192581177
Current loss value: 55.6530838013
Current loss value: 37.215713501
Current loss value: 24.4533748627
Current loss value: 15.5914745331
Current loss value: 10.9425945282
Current loss value: 7.66888141632
Current loss value: 5.84042310715
Here is the image that is generated:
Here, starting from random noise, we have created an image in a particular painting style without any location information. In the next section, we will see how to combine both the content loss and the style loss. Style transfer Now we know how to reconstruct an image, as well as how to construct an image that captures the style of an original image. The obvious idea is to combine these two approaches by weighting and adding the two loss functions. First, the content image is cropped to the size of the style image, as shown in the following code:
w, h = style_image.size
src = content_image_array[:, :h, :w]
Like before, we're going to grab a sequence of layer outputs to compute the style loss. However, we still only need one layer output to compute the content loss. How do we know which layer to grab? As we discussed earlier, the lower the layer, the more exact the content reconstruction will be. In merging content reconstruction with style, we might expect that a looser reconstruction of the content will allow more room for the style to have an effect. Furthermore, a later layer ensures that the image looks like the same subject, even if it doesn't have the same details.
The following code is used for this process:
style_layers = [outputs['block{}_conv2'.format(o)] for o in range(1,6)]
content_name = 'block4_conv2'
content_layer = outputs[content_name]
Now, a separate model for style is created with the required output layers, using the following code:
style_model = Model(model.input, style_layers)
style_targs = [K.variable(o) for o in style_model.predict(style_image_array)]
We will also create another model for the content with the content layer, using the following code:
content_model = Model(model.input, content_layer)
content_targ = K.variable(content_model.predict(src))
Now, merging the two approaches is as simple as merging their respective loss functions. Note that, as opposed to our previous functions, this loss combines three separate types of outputs: one for the original image, one for the image whose style we're emulating, and one for the random image whose pixels we are training. One way for us to tune how the reconstructions mix is by changing the factor on the content loss, which we have here as 1/10. If we increase that denominator, the style will have a larger effect on the image, and if it's too large, the original content of the image will be obscured by an unstructured style. Likewise, if it is too small, the image will not have enough style.
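The weighting described above is plain arithmetic, so it can be checked with made-up numbers before running the real model. This sketch reuses the article's five style-layer weights and the 1/10 content factor; the loss values themselves are hypothetical, and total_loss and content_share are illustrative helpers, not Keras functions.

```python
# Per-layer style weights from the article (block1_conv2 ... block5_conv2)
STYLE_WEIGHTS = [0.05, 0.2, 0.2, 0.25, 0.3]


def total_loss(style_losses, content_loss, style_weights=STYLE_WEIGHTS, content_divisor=10.0):
    """Weighted sum of the per-layer style losses plus content_loss / content_divisor."""
    if len(style_losses) != len(style_weights):
        raise ValueError("expected one style loss per style layer")
    style_term = sum(w * l for w, l in zip(style_weights, style_losses))
    return style_term + content_loss / content_divisor


def content_share(style_losses, content_loss, content_divisor=10.0):
    """Fraction of the total loss that comes from the content term."""
    content_term = content_loss / content_divisor
    return content_term / total_loss(style_losses, content_loss, content_divisor=content_divisor)


# Hypothetical loss values: every style layer at 100, content at 50
print(total_loss([100.0] * 5, 50.0))
print(content_share([100.0] * 5, 50.0))
print(content_share([100.0] * 5, 50.0, content_divisor=100.0))
```

Since the five weights sum to 1, the style term here is 100, and a content loss of 50 adds 50/10 = 5, for a total of 105. Raising the denominator to 100 shrinks the content contribution to 0.5, so the style dominates the gradient even more, as the text warns.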
We will use the following weights for the five style layers:
style_wgts = [0.05,0.2,0.2,0.25,0.3]
The loss function takes both the style and the content layers, as shown here:
loss = sum(style_mse_loss(l1[0], l2[0])*w for l1, l2, w in zip(style_layers, style_targs, style_wgts))
loss += metrics.mse(content_layer, content_targ)/10
grads = K.gradients(loss, model.input)
transfer_fn = K.function([model.input], [loss]+grads)
optimiser = ConvexOptimiser(transfer_fn, style_image_shape)
We will run the solver for 10 iterations as before, using the following code:
iterations = 10
generated_image = generate_rand_img(style_image_shape)
generated_image = optimise(optimiser, iterations, generated_image, style_image_shape, 'transfer')
The loss values should be printed as shown here:
Current loss value: 2557.953125
Current loss value: 732.533630371
Current loss value: 488.321166992
Current loss value: 385.827178955
Current loss value: 330.915924072
Current loss value: 293.238189697
Current loss value: 262.066864014
Current loss value: 239.34185791
Current loss value: 218.086700439
Current loss value: 203.045211792
These results are remarkable. Each one of them does a fantastic job of recreating the original image in the style of the artist. The generated image will look like the following:
We will now conclude the style transfer section. This operation is really slow but can work with any images. There are several ways to make this better, such as:
• Adding a Gaussian filter to the random image
• Adding different weights to the layers
• Using different layers and weights for the content
• Initializing with an image rather than a random image
• Preserving the color
• Using masks to specify what is required
• Converting any sketch to a painting
• Drawing a sketch and creating the image
• Converting any image to an artistic style by training a CNN to output such an image
To summarize, we learned how to transfer the style of one image to another while preserving the content as is. In the next section, we will see how to use a similar idea to create a super-resolution network.
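The first improvement listed, smoothing the random starting image with a Gaussian filter, can be sketched without any libraries. This minimal example blurs a single-channel grid with a separable Gaussian kernel (rows, then columns, repeating edge values at the borders). The sigma and radius are hypothetical choices; in practice a library routine such as scipy.ndimage.gaussian_filter would be applied to the noise array, per channel and spatial axis.

```python
import math


def gaussian_kernel(sigma, radius):
    """1-D Gaussian weights for offsets -radius..radius, normalised to sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_1d(values, kernel):
    """Convolve one row with the kernel, repeating the edge values at the borders."""
    radius = len(kernel) // 2
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * values[j]
        out.append(acc)
    return out


def gaussian_blur_2d(grid, sigma=1.0, radius=2):
    """Separable Gaussian blur of a 2-D grid (one channel): rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    rows = [blur_1d(row, kernel) for row in grid]
    cols = [blur_1d(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# A worst-case "noise" pattern: a +1/-1 checkerboard
noise = [[(-1.0) ** (r + c) for c in range(8)] for r in range(8)]
smooth = gaussian_blur_2d(noise)
print(max(abs(v) for row in smooth for v in row))  # smaller than 1 after smoothing
```

Because the kernel sums to 1, flat regions are left unchanged while pixel-to-pixel noise is damped, giving the optimizer a smoother starting point.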
You read an excerpt from a book written by Rajalingappaa Shanmugamani titled Deep Learning for Computer Vision. In this book, you will learn how to model and train advanced neural networks to implement a variety of Computer Vision tasks.

Packt
15 Oct 2009
20 min read

Measures and Measure Groups in Microsoft Analysis Services: Part 2

Measure groups All but the simplest data warehouses will contain multiple fact tables, and Analysis Services allows you to build a single cube on top of multiple fact tables through the creation of multiple measure groups. These measure groups can contain different dimensions and be at different granularities, but so long as you model your cube correctly, your users will be able to use measures from each of these measure groups in their queries easily and without worrying about the underlying complexity. Creating multiple measure groups To create a new measure group in the Cube Editor, go to the Cube Structure tab and right-click on the cube name in the Measures pane and select 'New Measure Group'. You'll then need to select the fact table to create the measure group from and then the new measure group will be created; any columns that aren't used as foreign key columns in the DSV will automatically be created as measures, and you'll also get an extra measure of aggregation type Count. It's a good idea to delete any measures you are not going to use at this stage. Once you've created a new measure group, BIDS will try to set up relationships between it and any existing dimensions in your cube based on the relationships you've defined in your DSV. Since doing this manually can be time-consuming, this is another great reason for defining relationships in the DSV. You can check the relationships that have been created on the Dimension Usage tab of the Cube Editor: In Analysis Services 2005, it was true in some cases that query performance was better on cubes with fewer measure groups, and that breaking a large cube with many measure groups up into many smaller cubes with only one or two measure groups could result in faster queries. This is no longer the case in Analysis Services 2008. 
Although there are other reasons why you might want to consider creating separate cubes for each measure group, this is still something of a controversial subject amongst Analysis Services developers. The advantages of a single cube approach are: All of your data is in one place. If your users need to display measures from multiple measure groups, or you need to create calculations that span measure groups, everything is already in place. You only have one cube to manage security and calculations on; with multiple cubes the same security and calculations might have to be duplicated. The advantages of the multiple cube approach are: If you have a complex cube but have to use Standard Edition, you cannot use Perspectives to hide complexity from your users. In this case, creating multiple cubes might be a more user-friendly approach. Depending on your requirements, security might be easier to manage with multiple cubes. It's very easy to grant or deny a role access to a cube; it's much harder to use dimension security to control which measures and dimensions in a multi-measure group cube a role can access. If you have complex calculations, especially MDX Script assignments, it's too easy to write a calculation that has an effect on part of the cube you didn't want to alter. With multiple cubes, the chances of this happening are reduced. Creating measure groups from dimension tables Measure groups don't always have to be created from fact tables. In many cases, it can be useful to build measure groups from dimension tables too. One common scenario where you might want to do this is when you want to create a measure that counts the number of days in the currently selected time period, so if you had selected a year on your Time dimension's hierarchy, the measure would show the number of days in the year. 
You could implement this with a calculated measure in MDX, but it would be hard to write code that worked in all possible circumstances, such as when a user multi-selects time periods. In fact, it's a better idea to create a new measure group from your Time dimension table containing a new measure with AggregateFunction Count, so you're simply counting the number of days as the number of rows in the dimension table. This measure will perform faster and always return the values you expect. This post on Mosha Pasumansky's blog discusses the problem in more detail: http://tinyurl.com/moshadays
MDX formulas vs pre-calculating values
If you can somehow model a calculation into the structure of your cube, or perform it in your ETL, you should do so in preference to doing it in MDX only so long as you do not compromise the functionality of your cube. A pure MDX approach will be the most flexible and maintainable since it only involves writing code, and if calculation logic needs to change, then you just need to redeploy your updated MDX Script; doing calculations upstream in the ETL can be much more time-consuming to implement and if you decide to change your calculation logic, then it could involve reloading one or more tables. However, an MDX calculation, even one that is properly tuned, will of course never perform as well as a pre-calculated value or a regular measure. The day count measure, discussed in the previous paragraph, is a perfect example of where a cube-modeling approach trumps MDX. If your aim was to create a measure that showed average daily sales, though, it would make no sense to try to pre-calculate all possible values since that would be far too time-consuming and would result in a non-aggregatable measure. The best solution here would be a hybrid: create real measures for sales and day count, and then create an MDX calculated measure that divides the former by the latter.
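The count-of-rows idea and the hybrid average are easy to model outside Analysis Services. This plain-Python sketch is only an analogy, not SSAS code: a hypothetical Time dimension table has one row per day, the day count is simply the number of distinct rows in the selection (which also handles multi-selected periods), and average daily sales divides a real sales total by that count.

```python
from datetime import date, timedelta


def month_rows(year, month):
    """Rows of a hypothetical Time dimension table (one row per day) for one month."""
    day = date(year, month, 1)
    rows = []
    while day.month == month:
        rows.append(day)
        day += timedelta(days=1)
    return rows


def day_count(selected_rows):
    """Count-style measure: the number of distinct day rows in the selection."""
    return len(set(selected_rows))


def average_daily_sales(sales_by_day, selected_rows):
    """Hybrid approach: a real Sales total divided by the Day Count measure."""
    days = set(selected_rows)
    if not days:
        return None
    total_sales = sum(sales_by_day.get(day, 0.0) for day in days)
    return total_sales / day_count(days)


january, february = month_rows(2009, 1), month_rows(2009, 2)
print(day_count(january + february))  # 59: multi-select of two months
```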
However, it's always necessary to consider the type of calculation, the volume of data involved and the chances of the calculation algorithm changing in the future before you can make an informed decision on which approach to take. Handling different dimensionality When you have different measure groups in a cube, they are almost always going to have different dimensions associated with them; indeed, if you have measure groups that have identical dimensionality, you might consider combining them into a single measure group if it is convenient to do so. As we've already seen, the Dimension Usage tab shows us which dimensions have relationships with which measure groups. When a dimension has a relationship with a measure group it goes without saying that making a selection on that dimension will affect the values that are displayed for measures on that measure group. But what happens to measures when you make a selection on a dimension that has no relationship with a measure group? In fact, you have two options here, controlled by the IgnoreUnrelatedDimensions property of a measure group: IgnoreUnrelatedDimensions=False displays a null value for all members below the root (the intersection of all of the All Members or default members on every hierarchy) of the dimension, except the Unknown member, or IgnoreUnrelatedDimensions=True repeats the value displayed at the root of the dimension for every member on every hierarchy of the dimension. This is the default state. 
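The two IgnoreUnrelatedDimensions settings can be summarized as a small decision function. This is a conceptual model of the behaviour described above, not how Analysis Services is implemented; the member names are hypothetical, and we assume, since the text only says the Unknown member is excluded from the nulls, that it shows the same value as the root when the property is False.

```python
def displayed_value(root_value, member, ignore_unrelated_dimensions=True):
    """Value shown for a measure on a member of a dimension that has NO
    relationship with the measure's measure group."""
    if ignore_unrelated_dimensions:
        return root_value  # True (default): repeat the root value on every member
    if member in ("All", "Unknown"):
        return root_value  # False: the root (and, by assumption, Unknown) keep the value
    return None            # False: null for every other member below the root


members = ["All", "Red", "Blue", "Unknown"]  # hypothetical dimension members
print([displayed_value(100, m) for m in members])
print([displayed_value(100, m, ignore_unrelated_dimensions=False) for m in members])
```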
The screenshot below shows what happens for two otherwise identical measures from measure groups which have IgnoreUnrelatedDimensions set to True and to False when they're displayed next to a dimension they have no relationship with: It's usually best to keep IgnoreUnrelatedDimensions set to True since if the users are querying measures from multiple measure groups, then they don't want some of their selected measures suddenly returning null if they slice by a dimension that has a regular relationship with their other selected measures. Handling different granularities Even when measure groups share the same dimensions, they may not share the same granularity. For example, we may hold sales information in one fact table down to the day level, but also hold sales quotas in another fact table at the quarter level. If we created measure groups from both these fact tables, then they would both have regular relationships with our Time dimension but at different granularities. Normally, when you create a regular relationship between a dimension and a measure group, Analysis Services will join the columns specified in the KeyColumns property of the key attribute of the dimension with the appropriate foreign key columns of the fact table (note that during processing, Analysis Services won't usually do the join in SQL; it does it internally). However, when you have a fact table at a coarser granularity, such as the quarter level, you need to change the granularity attribute property of the relationship to choose the attribute from the dimension you do want to join on instead: In the previous screenshot, we can see an amber warning triangle telling us that by selecting a non-key attribute, the server may have trouble aggregating measure values. What does this mean exactly? Let's take a look at the attribute relationships defined on our Time dimension again: If we're loading data at the Quarter level, what do we expect to see at the Month and Date level?
We can only expect to see useful values at the level of the granularity attribute we've chosen, and for only those attributes whose values can be derived from that attribute; this is yet another good reason to make sure your attribute relationships have been optimized. Below the granularity attribute, we've got the same options regarding what gets displayed as we had with dimensions that have no relationship at all with a measure group: either repeated values or null values. The IgnoreUnrelatedDimensions property is again used to control this behavior. Unfortunately, the default True setting for IgnoreUnrelatedDimensions is usually not the option you want to use in this scenario (users usually prefer to see nulls below the granularity of a measure in our experience) and this may conflict with how we want to set IgnoreUnrelatedDimensions to control the behavior of dimensions which have no relationship with a measure group. There are ways of resolving this conflict, such as using MDX Script assignments to set cell values to null or using the ValidMeasure() MDX function, but none are particularly elegant. Non-aggregatable measures: a different approach We've already seen how we can use parent/child hierarchies to load non-aggregatable measure values into our cube. However, given the problems associated with using parent/child hierarchies and knowing what we now know about measure groups, let's consider a different approach to solving this problem. A non-aggregatable measure will have, by its very nature, data stored for many different granularities of a dimension. Rather than storing all of these different granularities of values in the same fact table, we could create a separate fact table for each granularity. Having built measure groups from these fact tables, we would then be able to join our dimension to each of them with a regular relationship but at different granularities.
We'd then be in the position of having multiple measures representing the different granularities of a single, logical measure. What we actually want is a single non-aggregatable measure, and we can get this by using MDX Script assignments to combine the different granularities. Let's say we have a regular (non-parent/child) dimension called Employee with three attributes Manager, Team Leader and Sales Person, and a logical non-aggregatable measure called Sales Quota appearing in three measure groups as three measures called Sales Amount Quota_Manager, Sales Amount Quota_TeamLead and Sales Amount Quota for each of these three granularities. Here's a screenshot showing what a query against this cube would show at this stage: We can combine the three measures into one like this:
SCOPE([Measures].[Sales Amount Quota]);
    SCOPE([Employee].[Salesperson].[All]);
        THIS=[Measures].[Sales Amount Quota_TeamLead];
    END SCOPE;
    SCOPE([Employee].[Team Lead].[All]);
        THIS=[Measures].[Sales Amount Quota_Manager];
    END SCOPE;
END SCOPE;
This code takes the lowest granularity measure Sales Amount Quota, and then overwrites it twice: the first assignment replaces all of the values above the Sales Person granularity with the value of the measure containing Sales Amount Quota for Team Leaders; the second assignment then replaces all of the values above the Team Leader granularity with the value of the measure containing Sales Quotas for Managers. Once we've set Visible=False for the Sales Amount Quota_TeamLead and Sales Amount Quota_Manager measures, we're left with just the Sales Amount Quota measure visible, thus displaying the non-aggregatable values that we wanted. The user would then see this: Using linked dimensions and measure groups Creating linked dimensions and measure groups allows you to share the same dimensions and measure groups across separate Analysis Services databases, and the same measure group across multiple cubes.
To do this, all you need to do is to run the 'New Linked Object' wizard from the Cube Editor, either by clicking on the button in the toolbar on the Cube Structure or Dimension Usage tabs, or by selecting it from the right-click menu in the Measures pane of the Cube Structure tab. Doing this has the advantage of reducing the amount of processing and maintenance needed: instead of having many identical dimensions and measure groups to maintain and keep synchronized, all of which need processing separately, you can have a single object which only needs to be changed and processed once. At least that's the theory—in practice, linked objects are not as widely used as they could be because there are a number of limitations in their use: Linked objects represent a static snapshot of the metadata of the source object, and any changes to the source object are not passed through to the linked object. So for example, if you create a linked dimension and then add an attribute to the source dimension, you then have to delete and recreate the linked dimension—there's no option to refresh a linked object. You can also import the calculations defined in the MDX Script of the source cube using the wizard. However, you can only import the entire script and this may include references to objects present in the source cube that aren't in the target cube, and which may need to be deleted to prevent errors. The calculations that remain will also need to be updated manually when those in the source cube are changed, and if there are a lot, this can add an unwelcome maintenance overhead. A linked measure group can only be used with dimensions from the same database as the source measure group. This isn't a problem when you're sharing measure groups between cubes in the same database, but could be if you wanted to share measure groups across databases. As you would expect, when you query a linked measure group, your query is redirected to the source measure group. 
If the source measure group is on a different server, this may introduce some latency and hurt query performance. Analysis Services does try to mitigate this by doing some caching on the linked measure group's database, though. By default, it will cache data on a per-query basis, but if you change the RefreshPolicy property from ByQuery to ByInterval you can specify a time limit for data to be held in cache. Linked objects can be useful when cube development is split between multiple development teams, or when you need to create multiple cubes containing some shared data, but, in general, we recommend against using them widely because of these limitations. Role-playing dimensions It's also possible to add the same dimension to a cube more than once, and give each instance a different relationship to the same measure group. For example, in our Sales fact table, we might have several different foreign key columns that join to our Time dimension table: one which holds the date an order was placed on, one which holds the date it was shipped from the warehouse, and one which holds the date the order should arrive with the customer. In Analysis Services, we can create a single physical Time dimension in our database, which is referred to as a database dimension, and then add it three times to the cube to create three 'cube dimensions', renaming each cube dimension to something like Order Date, Ship Date and Due Date. These three cube dimensions are referred to as role-playing dimensions: the same dimension is playing three different roles in the same cube. Role-playing dimensions are a very useful feature. They reduce maintenance overheads because you only need to edit one dimension, and unlike linked dimensions, any changes made to the underlying database dimension are propagated to all of the cube dimensions that are based on it. They also reduce processing time because you only need to process the database dimension once.
However, role-playing dimensions have one frustrating limitation. You can override certain properties of the database dimension on a per-cube-dimension basis, but you can't change the name of any of the attributes or hierarchies of a cube dimension. So if you have a user hierarchy called 'Calendar' on your database dimension, all of your cube dimensions will also have a user hierarchy called 'Calendar'. Your users might then find it difficult to tell which hierarchy is which in reports or in certain client tools (Excel 2003 is particularly bad in this respect). Unfortunately, we have seen numerous cases where this problem alone meant role-playing dimensions couldn't be used.

Dimension/measure group relationships

So far we've seen dimensions either having no relationship with a measure group or having a regular relationship, but that's not the whole story. There are several different types of relationship that a dimension can have with a measure group. Here's the complete list: No relationship, Regular, Fact, Referenced, Many-to-Many and Data Mining.

Fact relationships

Fact or degenerate dimensions are dimensions that are built directly from columns in a fact table, not from a separate dimension table. From an Analysis Services dimension point of view, they are no different from any other kind of dimension. The only exception is that there is a special fact relationship type that a dimension can have with a measure group. There are in fact very few differences between a fact relationship and a regular relationship, and they are:

A fact relationship results in marginally more efficient SQL being generated when the fact dimension is used in ROLAP drillthrough.
Fact relationships are visible to client tools in the cube's metadata, so client tools may choose to display fact dimensions differently.
A fact relationship can only be defined on dimensions and measure groups that are based on the same table in the DSV.
A measure group can only have a fact relationship with one database dimension. It can have more than one fact relationship, but all of them have to be with cube dimensions based on the same database dimension.

It still makes sense, though, to define relationships as fact relationships when you can. Apart from the reasons given above, the functionality might change in future versions of Analysis Services, and fact relationship types might be further optimized in some way.

Referenced relationships

A referenced relationship is one where a dimension joins to a measure group through another dimension. For example, you might have a Customer dimension that includes geographic attributes up to and including a customer's country. Your organization might also divide the world into international regions for financial reporting, such as North America, Europe, Middle East and Africa (EMEA), Latin America (LATAM), Asia-Pacific and so on, and you might build a dimension for this too. Suppose your sales fact table only contained a foreign key for the Customer dimension, but you wanted to analyze sales by international region. You could then create a referenced relationship from the Region dimension through the Customer dimension to the Sales measure group.

When setting up a referenced relationship in the Define Relationship dialog in the Dimension Usage tab, you're first asked to choose the dimension that you wish to join through. You then choose which attribute on the reference dimension joins to which attribute on the intermediate dimension. As before, the join between the attributes you've chosen is actually made on the values in the columns defined in the KeyColumns property of each attribute. The Materialize checkbox is checked by default. This ensures maximum query performance by resolving the join between the dimensions at processing time, which can lead to a significant decrease in processing performance.
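The join-through idea, and the trade-off that the Materialize checkbox controls, can be sketched in plain Python. This is only a toy model with hypothetical Customer, Region and Sales data, using dictionaries to stand in for dimensions. It does not show how Analysis Services works internally.

- The materialized variant resolves the Customer-to-Region join once up front, as processing would.
- The non-materialized variant resolves the join for every fact row at query time.

```python
# Toy model of a referenced relationship (hypothetical data, not SSAS internals).
# Intermediate dimension: customer key -> country.
CUSTOMER = {1: "France", 2: "Brazil", 3: "Canada"}
# Reference dimension: country -> international region used for reporting.
REGION = {"France": "EMEA", "Brazil": "LATAM", "Canada": "North America"}
# Sales fact rows carry only the customer key and an amount.
SALES = [(1, 200), (2, 75), (3, 120), (1, 30)]


def materialize(customer, region):
    """'Processing time': resolve customer -> region once and keep the result."""
    return {cust: region[country] for cust, country in customer.items()}


def sales_by_region_materialized(sales, cust_to_region):
    """Query time: a single lookup per fact row in the precomputed mapping."""
    totals = {}
    for cust, amount in sales:
        reg = cust_to_region[cust]
        totals[reg] = totals.get(reg, 0) + amount
    return totals


def sales_by_region_on_the_fly(sales, customer, region):
    """Query time: join through the intermediate dimension for every fact row."""
    totals = {}
    for cust, amount in sales:
        reg = region[customer[cust]]
        totals[reg] = totals.get(reg, 0) + amount
    return totals


lookup = materialize(CUSTOMER, REGION)
print(sales_by_region_materialized(SALES, lookup))
print(sales_by_region_on_the_fly(SALES, CUSTOMER, REGION))
# Both print {'EMEA': 230, 'LATAM': 75, 'North America': 120}
```

Both variants give the same answer. The difference is only in when the join work is done, which is exactly what shifts cost between processing and querying.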
Unchecking this box means that no penalty is paid at processing time, but query performance may be worse.

The question you may well be asking yourself at this stage is: why bother to use referenced relationships at all? It is a good question to ask. In general, it's better to include all of the attributes you need in a single Analysis Services dimension built from multiple tables than to use a referenced relationship. The single-dimension approach will perform better and is more user-friendly. For example, you can't define user hierarchies that span a reference dimension and its intermediate dimension.

That said, there are situations where referenced relationships are useful because it's simply not feasible to add all of the attributes you need to a dimension. You might have a Customer dimension, for instance, with a number of attributes representing dates: the date of a customer's first purchase, the date of their tenth purchase, the date of their last purchase and so on. Suppose you had created these attributes with keys that matched the surrogate keys of your Time dimension. You could then create multiple referenced (but not materialized) role-playing Time dimensions, one joined to each of these attributes, so that you could analyze each of these dates. You certainly wouldn't want to duplicate all of the attributes from your Time dimension for each of these dates in your Customer dimension. Another good use for referenced relationships is when you want to create multiple parent/child hierarchies from the same dimension table.

Data mining relationships

The data mining functionality of Analysis Services is outside the scope of this article, so we won't spend much time on the data mining relationship type.
Suffice it to say that you can create an Analysis Services mining structure from data sourced from a cube. You then have the option of using that mining structure as the source for a special type of dimension, called a data mining dimension. The wizard will also create a new cube containing linked copies of all of the dimensions and measure groups in the source cube, plus the new data mining dimension, which has a data mining relationship with the measure groups.

Summary

In this part, we focused on how to create new measure groups and how to handle the problems of different dimensionality and granularity. We also looked at the different types of relationships that are possible between dimensions and measure groups.
Sugandha Lahoti
09 Mar 2018
11 min read

How to perform data partitioning in PostgreSQL 10

Partitioning means splitting: logically, it means breaking one large table into smaller physical pieces. PostgreSQL supports basic table partitioning. With the default 8 KB block size, a single table can store up to 32 TB of data. In fact, if PostgreSQL is compiled with 32 KB blocks, a single table can hold up to 128 TB. However, tables this large are not necessarily convenient. It makes sense to partition them to make processing easier and, in some cases, a bit faster. PostgreSQL 10.0 improves partitioning and makes it significantly easier for end users to handle. In this article, we will cover both the classic way to partition data and the new partitioning features available in PostgreSQL 10.0.

Creating partitions

First, we will learn the old method of partitioning data. Before digging deeper into the advantages of partitioning, I want to show how partitions can be created. The entire thing starts with a parent table:

test=# CREATE TABLE t_data (id serial, t date, payload text);
CREATE TABLE

In this example, the parent table has three columns. The date column will be used for partitioning, but more on that a bit later. Now that the parent table is in place, the child tables can be created. This is how it works:

test=# CREATE TABLE t_data_2016 () INHERITS (t_data);
CREATE TABLE
test=# \d t_data_2016
                 Table "public.t_data_2016"
 Column  |  Type   |                      Modifiers
---------+---------+-----------------------------------------------------
 id      | integer | not null default nextval('t_data_id_seq'::regclass)
 t       | date    |
 payload | text    |
Inherits: t_data

The table is called t_data_2016 and inherits from t_data. The empty parentheses () mean that no extra columns are added to the child table. As you can see, inheritance means that all columns from the parent are available in the child table.
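Creating one child table per year by hand quickly becomes repetitive, and a small helper can generate the statements instead. The sketch below is a hypothetical Python convenience, not part of PostgreSQL. It assumes the <parent>_<year> naming used in this article and plain, unquoted identifiers:

```python
from typing import Iterable, List


def child_table_ddl(parent: str, years: Iterable[int]) -> List[str]:
    """Build one CREATE TABLE ... INHERITS statement per year.

    Assumes plain, unquoted identifiers and the <parent>_<year> naming
    convention used in this article; no SQL escaping is attempted.
    """
    return [f"CREATE TABLE {parent}_{year} () INHERITS ({parent});" for year in years]


for stmt in child_table_ddl("t_data", range(2014, 2017)):
    print(stmt)
```

The printed statements can be pasted into psql. A child that needs extra columns of its own still has to be written by hand.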
Also note that the id column inherits the sequence from the parent, so all children can share the very same numbering. Let's create more tables:

test=# CREATE TABLE t_data_2015 () INHERITS (t_data);
CREATE TABLE
test=# CREATE TABLE t_data_2014 () INHERITS (t_data);
CREATE TABLE

So far, all the tables are identical and just inherit from the parent. However, there is more: child tables can actually have more columns than their parents. Adding more fields is simple:

test=# CREATE TABLE t_data_2013 (special text) INHERITS (t_data);
CREATE TABLE

In this case, a column called special has been added. It has no impact on the parent; it just enriches the child and makes it capable of holding more data. After creating a handful of tables, a row can be added:

test=# INSERT INTO t_data_2015 (t, payload) VALUES ('2015-05-04', 'some data');
INSERT 0 1

The most important thing now is that the parent table can be used to find all the data in the child tables:

test=# SELECT * FROM t_data;
 id |     t      |  payload
----+------------+-----------
  1 | 2015-05-04 | some data
(1 row)

Querying the parent gives you access to everything below the parent in a simple and efficient manner. To understand how PostgreSQL does partitioning, it makes sense to take a look at the plan:

test=# EXPLAIN SELECT * FROM t_data;
                              QUERY PLAN
-----------------------------------------------------------------------
 Append  (cost=0.00..84.10 rows=4411 width=40)
   ->  Seq Scan on t_data  (cost=0.00..0.00 rows=1 width=40)
   ->  Seq Scan on t_data_2016  (cost=0.00..22.00 rows=1200 width=40)
   ->  Seq Scan on t_data_2015  (cost=0.00..22.00 rows=1200 width=40)
   ->  Seq Scan on t_data_2014  (cost=0.00..22.00 rows=1200 width=40)
   ->  Seq Scan on t_data_2013  (cost=0.00..18.10 rows=810 width=40)
(6 rows)

The process is actually quite simple. PostgreSQL scans the parent and every child table, and the Append node combines the results. You therefore see all the content from all the tables at and below the table you are querying.
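To make the Append behaviour concrete, here is a toy Python model of the hierarchy built so far. The dictionaries stand in for the system catalog and the table contents; they are illustrative assumptions, not how PostgreSQL stores anything. The only=True flag mimics the ONLY keyword (SELECT * FROM ONLY t_data), which restricts a query to the named table itself:

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, str]

# Parent -> child tables, in the order they appear in the plan above.
CHILDREN: Dict[str, List[str]] = {
    "t_data": ["t_data_2016", "t_data_2015", "t_data_2014", "t_data_2013"],
}

# Rows physically stored in each table; only t_data_2015 holds data so far.
ROWS: Dict[str, List[Row]] = {
    "t_data": [],
    "t_data_2016": [],
    "t_data_2015": [(1, "2015-05-04", "some data")],
    "t_data_2014": [],
    "t_data_2013": [],
}


def scan(table: str, only: bool = False) -> List[Row]:
    """Return the rows of `table` plus, unless only=True, those of all descendants.

    This mirrors an Append node: each table is scanned separately and the
    results are simply concatenated.
    """
    result = list(ROWS[table])
    if not only:
        for child in CHILDREN.get(table, []):
            result.extend(scan(child))
    return result


print(scan("t_data"))             # [(1, '2015-05-04', 'some data')]
print(scan("t_data", only=True))  # []
```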
Note that all tables are independent and are connected only logically, through the system catalog.

Applying table constraints

What happens if filters are applied?

test=# EXPLAIN SELECT * FROM t_data WHERE t = '2016-01-04';
                              QUERY PLAN
-----------------------------------------------------------------------
 Append  (cost=0.00..95.12 rows=23 width=40)
   ->  Seq Scan on t_data  (cost=0.00..0.00 rows=1 width=40)
         Filter: (t = '2016-01-04'::date)
   ->  Seq Scan on t_data_2016  (cost=0.00..25.00 rows=6 width=40)
         Filter: (t = '2016-01-04'::date)
   ->  Seq Scan on t_data_2015  (cost=0.00..25.00 rows=6 width=40)
         Filter: (t = '2016-01-04'::date)
   ->  Seq Scan on t_data_2014  (cost=0.00..25.00 rows=6 width=40)
         Filter: (t = '2016-01-04'::date)
   ->  Seq Scan on t_data_2013  (cost=0.00..20.12 rows=4 width=40)
         Filter: (t = '2016-01-04'::date)
(11 rows)

PostgreSQL applies the filter to all the partitions in the structure. It does not know that the table name is somehow related to the content of the table. To the database, names are just names and have nothing to do with what you are looking for. This makes sense, of course, as there is no mathematical justification for doing anything else.

The point now is: how can we teach the database that the 2016 table only contains 2016 data, the 2015 table only contains 2015 data, and so on? Table constraints are here to do exactly that. They tell PostgreSQL about the content of those tables and therefore allow the planner to make smarter decisions than before. The feature is called constraint exclusion. It is controlled by the constraint_exclusion setting, whose default value, partition, applies it to inheritance child tables. In many cases, it can dramatically speed up queries.
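Conceptually, constraint exclusion lets the planner skip any child table whose constraint contradicts the WHERE clause. The toy Python model below makes two simplifying assumptions:

- each child carries a yearly, half-open date range, matching the CHECK constraints created in the next listing;
- the only filter is an equality condition on t.

PostgreSQL's planner reasons about filters and constraints much more generally, so treat this purely as an illustration. For the filter used above, the model keeps the same two tables as the pruned plan shown once the constraints exist:

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Half-open ranges [lo, hi) mirroring yearly CHECK constraints on column t.
# A None bound means "unbounded"; None for the whole entry means the table has
# no CHECK constraint at all (the parent), so it can never be excluded.
Range = Optional[Tuple[Optional[date], Optional[date]]]
CHECKS: Dict[str, Range] = {
    "t_data": None,
    "t_data_2013": (None, date(2014, 1, 1)),
    "t_data_2014": (date(2014, 1, 1), date(2015, 1, 1)),
    "t_data_2015": (date(2015, 1, 1), date(2016, 1, 1)),
    "t_data_2016": (date(2016, 1, 1), date(2017, 1, 1)),
}


def may_contain(check: Range, value: date) -> bool:
    """True unless the constraint proves that no row can satisfy t = value."""
    if check is None:
        return True
    lo, hi = check
    return (lo is None or value >= lo) and (hi is None or value < hi)


def tables_to_scan(value: date) -> List[str]:
    """Tables that survive pruning for the filter t = value."""
    return [name for name, check in CHECKS.items() if may_contain(check, value)]


print(tables_to_scan(date(2016, 1, 4)))  # ['t_data', 't_data_2016']
```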
The following listing shows how table constraints can be created:

test=# ALTER TABLE t_data_2013 ADD CHECK (t < '2014-01-01');
ALTER TABLE
test=# ALTER TABLE t_data_2014 ADD CHECK (t >= '2014-01-01' AND t < '2015-01-01');
ALTER TABLE
test=# ALTER TABLE t_data_2015 ADD CHECK (t >= '2015-01-01' AND t < '2016-01-01');
ALTER TABLE
test=# ALTER TABLE t_data_2016 ADD CHECK (t >= '2016-01-01' AND t < '2017-01-01');
ALTER TABLE

A CHECK constraint can be added to each table. PostgreSQL will only create the constraint if the data already in the table is correct, that is, if every single row satisfies the constraint. In contrast to MySQL, constraints in PostgreSQL are taken seriously and honored under all circumstances. In PostgreSQL, those constraints can overlap. This is not forbidden and can make sense in some cases. However, it is usually better to have non-overlapping constraints, because that allows PostgreSQL to prune more tables. Here is what happens after adding those table constraints:

test=# EXPLAIN SELECT * FROM t_data WHERE t = '2016-01-04';
                              QUERY PLAN
-----------------------------------------------------------------------
 Append  (cost=0.00..25.00 rows=7 width=40)
   ->  Seq Scan on t_data  (cost=0.00..0.00 rows=1 width=40)
         Filter: (t = '2016-01-04'::date)
   ->  Seq Scan on t_data_2016  (cost=0.00..25.00 rows=6 width=40)
         Filter: (t = '2016-01-04'::date)
(5 rows)

The planner is able to remove many of the tables from the query and keep only those that could potentially contain the data. The query can greatly benefit from a shorter and more efficient plan. In particular, if those tables are really large, removing them can boost speed considerably.

Modifying inherited structures

Once in a while, data structures have to be modified, and the ALTER TABLE command is here to do exactly that. The question is: how can partitioned tables be modified? Basically, all you have to do is alter the parent table and add or remove columns.
PostgreSQL will automatically propagate those changes to the child tables and ensure that they are made to all the relations, as follows:

test=# ALTER TABLE t_data ADD COLUMN x int;
ALTER TABLE
test=# \d t_data_2016
                 Table "public.t_data_2016"
 Column  |  Type   |                      Modifiers
---------+---------+-----------------------------------------------------
 id      | integer | not null default nextval('t_data_id_seq'::regclass)
 t       | date    |
 payload | text    |
 x       | integer |
Check constraints:
    "t_data_2016_t_check" CHECK (t >= '2016-01-01'::date AND t < '2017-01-01'::date)
Inherits: t_data

As you can see, the column is added to the parent and automatically added to the child table as well. This works for columns, constraints, and so on.

Indexes are a totally different story. In an inherited structure, every table has to be indexed separately. If you add an index to the parent table, it will only be present on the parent; it won't be deployed on the child tables. Indexing all those columns in all those tables is your task, and PostgreSQL is not going to make those decisions for you. Of course, this can be seen as a feature or as a limitation. On the upside, PostgreSQL gives you the flexibility to index each table separately and therefore potentially more efficiently. On the other hand, deploying all those indexes one by one is a lot more work.

Moving tables in and out of partitioned structures

Suppose you have an inherited structure in which data is partitioned by date, and you want to provide the most recent years to the end user. At some point, you might want to remove some data from the user's scope without actually touching it, for example to put it into an archive. PostgreSQL provides a simple means to achieve exactly that. First, a new parent can be created:

test=# CREATE TABLE t_history (LIKE t_data);
CREATE TABLE

The LIKE keyword allows you to create a table that has exactly the same layout as the t_data table.
If you have forgotten which columns the t_data table actually has, this might come in handy, as it saves you a lot of work. It is also possible to include indexes, constraints, and defaults, using options such as INCLUDING INDEXES, INCLUDING CONSTRAINTS, and INCLUDING DEFAULTS. Then, the table can be moved away from the old parent table and put below the new one. Here is how it works:

test=# ALTER TABLE t_data_2013 NO INHERIT t_data;
ALTER TABLE
test=# ALTER TABLE t_data_2013 INHERIT t_history;
ALTER TABLE

The entire process can, of course, be done in a single transaction to ensure that the operation stays atomic.

Cleaning up data

One advantage of partitioned tables is the ability to clean up data quickly. Suppose that we want to delete an entire year. If the data is partitioned accordingly, a simple DROP TABLE command can do the job:

test=# DROP TABLE t_data_2014;
DROP TABLE

As you can see, dropping a child table is easy. But what about the parent table? Other objects depend on it, and therefore PostgreSQL errors out to make sure that nothing unexpected happens:

test=# DROP TABLE t_data;
ERROR:  cannot drop table t_data because other objects depend on it
DETAIL:  default for table t_data_2013 column id depends on sequence t_data_id_seq
table t_data_2016 depends on table t_data
table t_data_2015 depends on table t_data
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The DROP TABLE command warns us that there are dependent objects and refuses to drop the table. The CASCADE clause is needed to force PostgreSQL to actually remove those objects along with the parent table:

test=# DROP TABLE t_data CASCADE;
NOTICE:  drop cascades to 3 other objects
DETAIL:  drop cascades to default for table t_data_2013 column id
drop cascades to table t_data_2016
drop cascades to table t_data_2015
DROP TABLE

Understanding PostgreSQL 10.0 partitioning

For many years, the PostgreSQL community has been working on built-in partitioning.
PostgreSQL 10.0 finally offers the first implementation of in-core partitioning, which is covered in this section. For now, the partitioning functionality is still pretty basic. However, a lot of infrastructure for future improvements is already in place. To show you how partitioning works, I have compiled a simple example featuring range partitioning:

CREATE TABLE data (
    payload integer
) PARTITION BY RANGE (payload);

CREATE TABLE negatives PARTITION OF data
    FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE positives PARTITION OF data
    FOR VALUES FROM (0) TO (MAXVALUE);

In this example, one partition holds all negative values while the other takes care of positive values. Zero also goes to the positives partition, because the lower bound of a range partition is inclusive and the upper bound is exclusive. When creating the parent table, you simply specify how you want to partition the data. PostgreSQL 10.0 supports range partitioning and list partitioning. Support for hash partitioning and the like might be available as soon as PostgreSQL 11.0.

Once the parent table has been created, it is time to create the partitions, using the newly added PARTITION OF clause. At this point, there are still some limitations. The most important one is that a tuple (that is, a row) cannot move from one partition to another. For example:

UPDATE data SET payload = -10 WHERE payload = 5;

If there were rows satisfying this condition, PostgreSQL would simply error out and refuse to change the value. In any case, in a good design, changing the partitioning key is a bad idea. Also, keep in mind that you have to think about indexing each partition.

In this article, we looked at both the old way of partitioning data and the new partitioning features introduced in PostgreSQL 10.0.

You read an excerpt from the book Mastering PostgreSQL 10, written by Hans-Jürgen Schönig. To learn about query optimization, stored procedures and other techniques in PostgreSQL 10.0, you may check out this book, Mastering PostgreSQL 10.
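Returning to the PostgreSQL 10.0 range-partitioning example in this excerpt, the following toy Python model does two things:

- it shows how rows of the data table map to its two partitions;
- it shows why an UPDATE that moves the partition key across a boundary is rejected in PostgreSQL 10.0.

Modelling MINVALUE and MAXVALUE as negative and positive infinity is an assumption of this sketch, as is the RowMovementError class. Its message is not PostgreSQL's actual error text:

```python
import math
from typing import List, Tuple

# (partition name, inclusive lower bound, exclusive upper bound), as in
# FOR VALUES FROM (...) TO (...). MINVALUE/MAXVALUE are modelled as -inf/+inf.
PARTITIONS: List[Tuple[str, float, float]] = [
    ("negatives", -math.inf, 0),
    ("positives", 0, math.inf),
]


class RowMovementError(Exception):
    """Raised when an update would move a row into a different partition."""


def route(payload: int) -> str:
    """Return the partition whose range contains `payload`."""
    for name, lo, hi in PARTITIONS:
        if lo <= payload < hi:
            return name
    raise ValueError(f"no partition accepts payload {payload}")


def update_payload(old: int, new: int) -> int:
    """Model the PostgreSQL 10.0 rule: the new value must stay in the same partition."""
    if route(old) != route(new):
        raise RowMovementError(
            f"{old} -> {new} would move the row from {route(old)} to {route(new)}"
        )
    return new


print(route(-10), route(0), route(5))  # negatives positives positives
print(update_payload(5, 7))            # 7 (the row stays in positives)
```

Calling update_payload(5, -10), the model's counterpart of the UPDATE statement above, raises RowMovementError.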

Bhagyashree R
21 Jan 2019
5 min read

Conversational AI in 2018: An arms race of new products, acquisitions, and more

Conversational AI is one of the most interesting applications of artificial intelligence in recent years. The trend isn’t yet as ubiquitous as recommendation systems (perhaps unsurprisingly), but a number of tech giants have successfully productized it in the form of Google Home and Amazon Echo (which is ‘powered by’ Alexa).

The conversational AI arms race

Arguably, 2018 saw a bit of an arms race in conversational AI. As well as Google and Amazon, the likes of IBM, Microsoft, and Apple wanted a piece of the action. Here are some of the new conversational AI tools and products these companies introduced in 2018:

Google

Google worked on enhancing its conversational interface development platform, Dialogflow. In July, at the Google Cloud Next event, it announced several improvements and new capabilities for Dialogflow. These included Text to Speech via DeepMind's WaveNet and Dialogflow Phone Gateway for telephony integration. It also launched a new product called Contact Center AI. It comes with Dialogflow Enterprise Edition, plus additional capabilities to assist live agents and perform analytics.

Google Assistant became better at having a back-and-forth conversation with the help of Continued Conversation, which was unveiled at the Google I/O conference. The assistant became multilingual in August, which means users can speak to it in more than one language at a time without having to adjust their language settings. Users can enable this multilingual functionality by selecting two of the supported languages. Following in the footsteps of Amazon, Google also launched its own smart display, Google Home Hub, at the ‘Made by Google’ event held in October.

Microsoft

In 2018, Microsoft introduced and improved various bot-building tools for developers. In May, at the Build conference, Microsoft announced major updates to its conversational AI tools: Azure Bot Service, Microsoft Cognitive Services Language Understanding, and QnAMaker.
To enable intelligent bots to learn from example interactions and handle common small talk, it launched new experimental projects named Conversation Learner and Personality Chat. At Microsoft Ignite, Bot Framework SDK V4.0 was made generally available. Later, in November, Microsoft announced the general availability of the Bot Framework Emulator V4 and the Web Chat control.

In May, to drive more research and development in its conversational AI products, Microsoft acquired Semantic Machines and established a conversational AI center of excellence in Berkeley. In November, it acquired the Austin-based bot startup XOXCO, a clear indication that it wants to get serious about using artificial intelligence for conversational bots. Its guidelines on developing ‘responsible’ conversational AI further confirmed that Microsoft wants to play a big part in the future evolution of the area. Microsoft was also chosen as the technology partner by the UK-based conversational AI startup ICS.ai. The team at ICS is using Azure and LUIS from Microsoft in its public sector AI chatbots, which are aimed at higher education, healthcare trusts and county councils.

Amazon

To improve Alexa’s capabilities, Amazon released the Alexa Skills Kit (ASK), which consists of APIs, tools, documentation, and code samples that developers can use to build new skills for Alexa. In September, it announced a preview of a new design language named Alexa Presentation Language (APL). With APL, developers can build visual skills that include graphics, images, slideshows, and video, and customize them for different device types. Amazon’s smart speaker Echo Dot was a great success, becoming the best seller in the smart speaker category on Amazon. At its 2018 hardware event in Seattle, Amazon announced a redesigned Echo Dot and a new addition to its Alexa-powered A/V devices called Echo Plus.
Alexa and the Amazon Echo continue to succeed. In addition, Amazon has launched the Alexa Fellowship at a number of leading academic institutions. This highlights that, for the biggest companies, conversational AI is as much about research and exploration as it is about products. Like Microsoft, Amazon appears well aware that conversational AI is still in its infancy. As well as great products, it requires clear thinking and cutting-edge insight to ensure that it develops in a way that is both safe and impactful.

What’s next?

This huge array of products is a result of advances in deep learning research. Conversational AI is no longer limited to small tasks like setting an alarm or searching for the best restaurant: we can have a back-and-forth conversation with a conversational agent. Needless to say, though, it still needs more work. Conversational agents have yet to meet user expectations for sensing and responding with emotion. In the coming years, we will see these systems do a better job of understanding and generating natural language. They will be able to hold reasonably natural conversations with humans in certain domains, grounded in context. Continued development in IoT will also provide AI systems with more context.