How-To Tutorials

#AskTensorFlow: Twitterati ask questions on TensorFlow 2.0 - TF prebuilt binaries, Tensorboard, Keras, and Python support

Sugandha Lahoti
10 Dec 2019
5 min read
TensorFlow 2.0 was released recently with tighter integration with Keras, eager execution enabled by default, three times faster training performance, a cleaned-up API, and more.

TensorFlow 2.0 includes a major API cleanup: many API symbols have been removed or renamed for better consistency and clarity. Eager execution is now enabled by default, which effectively means that your TensorFlow code runs like NumPy code. Keras has been introduced as the main high-level API, enabling developers to easily leverage Keras' various model-building APIs. TensorFlow 2.0 also has the SavedModel API, which allows you to save your trained machine learning model in a language-neutral format.

In May, Paige Bailey, Product Manager (TensorFlow), and Laurence Moroney, Developer Advocate at Google, sat down to discuss frequently asked questions on TensorFlow 2.0. They talked about TensorFlow prebuilt binaries, the TF 2.0 upgrade script, TensorFlow Datasets, and Python support.

Can I ask about any prebuilt binary for the RTX 2080 GPU on Ubuntu 16?

Prebuilt binaries for TensorFlow tend to be associated with a specific driver version from Nvidia. If you're looking at any of the prebuilt binaries, check which driver version is supported for that specific card. It's easy to go to the driver vendor and download the latest version, but that may not be the one that TensorFlow was built for or supports. So, just make sure that they actually match each other.

Do my TensorFlow scripts work with TensorFlow 2.0?

Generally, existing TensorFlow scripts do not work unchanged with TensorFlow 2.0, but TensorFlow 2.0 ships with an upgrade utility that is automatically downloaded with it. For more information, you can check out the Medium blog post that Paige and her colleague Anna created. It shows how you can run the upgrade script on any arbitrary Python file, or even Jupyter Notebooks. It will give you an export.txt file that lists all of the symbol renames, the added keywords, and the remaining manual changes.

When will TensorFlow be supported in Python 3.7 and hence be accessible in Anaconda 3?

TensorFlow has made the commitment that, as of January 1, 2020, it will no longer support Python 2. The team is firmly committed to Python 3 and Python 3 support.

Is it possible to run TensorBoard on Colab?

You can run TensorBoard on Colab and perform operations such as smoothing, changing some of the values, and using the embedding visualizer directly from your Colab notebook, in order to understand accuracies and debug model performance. You also don't have to specify ports, which means you don't need to keep track of multiple TensorBoard instances; TensorBoard automatically selects a port that is a good candidate.

How would you use [TensorFlow's] feature_columns with Keras?

TensorFlow's feature_columns API is quite useful for non-numerical feature processing. Feature columns are a way of getting your data efficiently into Estimators, and you can use them in Keras as well. TensorFlow 2.0 also has a migration guide if you want to migrate your models from Estimators to a more TensorFlow 2.0-style format with Keras.
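To make that answer a little more concrete, here is a minimal sketch of the pattern described above: feature column definitions fed into a Keras model through tf.keras.layers.DenseFeatures. The tiny in-memory dataset and column names are made up purely for illustration.

import tensorflow as tf

# Hypothetical toy data: one numeric and one categorical feature.
data = {
    "age": [25.0, 32.0, 47.0, 51.0],
    "city": ["NYC", "SF", "NYC", "LA"],
    "label": [0, 1, 1, 0],
}

# Define a feature column for each input feature.
feature_columns = [
    tf.feature_column.numeric_column("age"),
    tf.feature_column.indicator_column(
        tf.feature_column.categorical_column_with_vocabulary_list(
            "city", ["NYC", "SF", "LA"])),
]

# DenseFeatures turns the feature columns into a Keras input layer.
model = tf.keras.Sequential([
    tf.keras.layers.DenseFeatures(feature_columns),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# tf.data feeds dicts of features plus labels into the model.
labels = data.pop("label")
dataset = tf.data.Dataset.from_tensor_slices((dict(data), labels)).batch(2)
model.fit(dataset, epochs=1)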
What are some simple datasets for testing and comparing different training methods for artificial neural networks? Are there any in TensorFlow 2.0?

Although MNIST and Fashion-MNIST are great, TensorFlow 2.0 also has TensorFlow Datasets, which provides a collection of datasets ready to use with TensorFlow. It handles downloading and preparing the data and constructing a tf.data pipeline. TensorFlow Datasets is compatible with both TensorFlow eager mode and graph mode, and you can use the datasets with all of your deep learning and machine learning models with just a few lines of code.

What about all the web developers who are new to AI; how does TensorFlow 2.0 help them get started?

With TensorFlow 2.0, the models that you create using SavedModel can be deployed to TFLite or TensorFlow.js. The Keras layers are also supported in TensorFlow.js, so it's not just for Python developers but also for JavaScript developers, or even R developers.

You can watch Paige and Laurence answering more questions in a three-part video series available on YouTube. Some of the other questions asked were:

Is there any TensorFlow.js transfer learning example for object detection?
Are you going to publish an updated version of the TensorFlow for Poets tutorial from Pete Warden, implementing TF 2.0, TFLite 2.0, and NN-API for faster inference on Android devices equipped with NPU/DSP?
Will a frozen graph generated from TF 1.x work on TF 2.0?
Which is the preferred format for saving the model going forward: saved_model (SM) or HDF5?
What is the purpose of keeping Estimators and Keras as separate APIs?

If you want to quickly start building machine learning projects with TensorFlow 2.0, read our book TensorFlow 2.0 Quick Start Guide by Tony Holdroyd. In this book, you will get acquainted with the new practices introduced in TensorFlow 2.0 and learn to train your own models for effective prediction using the high-level Keras API.

TensorFlow.js contributor Kai Sasaki on how TensorFlow.js eases web-based machine learning application development
Introducing Spleeter, a TensorFlow-based Python library that extracts voice and sound from any music track
TensorFlow 2.0 released with tighter Keras integration, eager execution enabled by default, and more!
Brad Miro talks TensorFlow 2.0 features and how Google is using it internally

Build a custom news feed with Python [Tutorial]

Prasad Ramesh
10 Sep 2018
13 min read
To create a custom news feed model, we need data that we can train on. This training data will be fed into a model in order to teach it to discriminate between the articles that we'd be interested in and the ones that we would not. This article is an excerpt from a book written by Alexander T. Combs titled Python Machine Learning Blueprints: Intuitive data projects you can relate to. In this article, we will learn to build a custom news corpus and annotate a large number of articles corresponding to our interests. You can download the code and other relevant files used in this article from this GitHub link.

Creating a supervised training dataset

Before we can create a model of our taste in news articles, we need training data. This training data will be fed into our model in order to teach it to discriminate between the articles that we'd be interested in and the ones that we would not. To build this corpus, we will need to annotate a large number of articles that correspond to these interests. For each article, we'll label it either "y" or "n". This will indicate whether the article is one that we would want to have sent to us in our daily digest or not.

To simplify this process, we will use the Pocket app. Pocket is an application that allows you to save stories to read later. You simply install the browser extension, and then click on the Pocket icon in your browser's toolbar when you wish to save a story. The article is saved to your personal repository. One of the great features of Pocket for our purposes is its ability to save the article with a tag of your choosing. We'll use this feature to mark interesting articles as "y" and non-interesting articles as "n".

Installing the Pocket Chrome extension

We use Google Chrome here, but other browsers should work similarly. For Chrome, go into the Chrome Web Store and look for the Extensions section:

Image from https://chrome.google.com/webstore/search/pocket

Click on the blue Add to Chrome button. If you already have an account, log in, and if you do not have an account, go ahead and sign up (it's free). Once this is complete, you should see the Pocket icon in the upper right-hand corner of your browser. It will be greyed out, but once there is an article you wish to save, you can click on it. It will turn red once the article has been saved, as seen in the following images. The greyed out icon can be seen in the upper right-hand corner.

Image from https://news.ycombinator.com

When the icon is clicked, it turns red to indicate that the article has been saved.

Image from https://www.wsj.com

Now comes the fun part! Begin saving all articles that you come across. Tag the interesting ones with "y" and the non-interesting ones with "n". This is going to take some work. Your end results will only be as good as your training set, so you're going to need to do this for hundreds of articles. If you forget to tag an article when you save it, you can always go to the site, http://www.get.pocket.com, to tag it there.

Using the Pocket API to retrieve stories

Now that you've diligently saved your articles to Pocket, the next step is to retrieve them. To accomplish this, we'll use the Pocket API. You can sign up for an account at https://getpocket.com/developer/apps/new. Click on Create New App in the upper left-hand side and fill in the details to get your API key. Make sure to click all of the permissions so that you can add, change, and retrieve articles.

Image from https://getpocket.com/developer

Once you have filled this in and submitted it, you will receive your CONSUMER KEY. You can find this in the upper left-hand corner under My Apps. This will look like the following screen, but obviously with a real key:

Image from https://getpocket.com/developer

Once this is set, you are ready to move on to the next step, which is to set up the authorizations. It requires that you input your consumer key and a redirect URL. The redirect URL can be anything. Here I have used my Twitter account:

import requests

auth_params = {'consumer_key': 'MY_CONSUMER_KEY',
               'redirect_uri': 'https://www.twitter.com/acombs'}
tkn = requests.post('https://getpocket.com/v3/oauth/request', data=auth_params)
tkn.content

You will see the following output:

The output will have the code that you'll need for the next step. Place the following in your browser bar:

https://getpocket.com/auth/authorize?request_token=some_long_code&redirect_uri=https%3A//www.twitter.com/acombs

If you change the redirect URL to one of your own, make sure to URL encode it. There are a number of resources for this. One option is to use the Python library urllib; another is to use a free online source.
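As a quick aside on that URL-encoding step, the standard library's urllib can do it in one line; the Twitter URL here is just the example redirect used above:

from urllib.parse import quote

# Percent-encode the redirect URL before appending it as redirect_uri.
print(quote('https://www.twitter.com/acombs'))
# https%3A//www.twitter.com/acombs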
At this point, you should be presented with an authorization screen. Go ahead and approve it, and we can move on to the next step:

usr_params = {'consumer_key': 'my_consumer_key', 'code': 'some_long_code'}
usr = requests.post('https://getpocket.com/v3/oauth/authorize', data=usr_params)
usr.content

We'll use the output code here to move on to retrieving the stories. First, we retrieve the stories tagged "n":

no_params = {'consumer_key': 'my_consumer_key',
             'access_token': 'some_super_long_code',
             'tag': 'n'}
no_result = requests.post('https://getpocket.com/v3/get', data=no_params)
no_result.text

The preceding code generates the following output:

Note that we get back a long JSON string covering all the articles that we tagged "n". There are several keys in it, but we are really only interested in the URL at this point. We'll go ahead and create a list of all the URLs from this:

import json

no_jf = json.loads(no_result.text)
no_jd = no_jf['list']
no_urls = []
for i in no_jd.values():
    no_urls.append(i.get('resolved_url'))
no_urls

The preceding code generates the following output:

This list contains all the URLs of stories that we aren't interested in. Now, let's put this in a DataFrame object and tag it as such:

import pandas as pd

no_uf = pd.DataFrame(no_urls, columns=['urls'])
no_uf = no_uf.assign(wanted=lambda x: 'n')
no_uf

The preceding code generates the following output:

Now, we're all set with the unwanted stories. Let's do the same thing with the stories that we are interested in:

yes_params = {'consumer_key': 'my_consumer_key',
              'access_token': 'some_super_long_token',
              'tag': 'y'}
yes_result = requests.post('https://getpocket.com/v3/get', data=yes_params)
yes_jf = json.loads(yes_result.text)
yes_jd = yes_jf['list']
yes_urls = []
for i in yes_jd.values():
    yes_urls.append(i.get('resolved_url'))
yes_uf = pd.DataFrame(yes_urls, columns=['urls'])
yes_uf = yes_uf.assign(wanted=lambda x: 'y')
yes_uf

The preceding code generates the following output:

Now that we have both types of stories for our training data, let's join them together into a single DataFrame:

df = pd.concat([yes_uf, no_uf])
df.dropna(inplace=True)
df

The preceding code generates the following output:

Now that we're set with all our URLs and their corresponding tags in a single frame, we'll move on to downloading the HTML for each article.
We'll use another free service for this called embed.ly.

Using the embed.ly API to download story bodies

We have all the URLs for our stories, but unfortunately this isn't enough to train on. We'll need the full article body. By itself, this could become a huge challenge if we wanted to roll our own scraper, especially if we were going to be pulling stories from dozens of sites. We would need to write code to target the article body while carefully avoiding all the other site gunk that surrounds it. Fortunately, there are a number of free services that will do this for us. We're going to use embed.ly, but there are a number of other services that you could also use.

The first step is to sign up for embed.ly API access. You can do this at https://app.embed.ly/signup. This is a straightforward process. Once you confirm your registration, you will receive an API key. You just need to use this key in your HTTP request. Let's do this now:

import urllib.parse

def get_html(x):
    qurl = urllib.parse.quote(x)
    rhtml = requests.get('https://api.embedly.com/1/extract?url=' + qurl + '&key=some_api_key')
    ctnt = json.loads(rhtml.text).get('content')
    return ctnt

df.loc[:, 'html'] = df['urls'].map(get_html)
df.dropna(inplace=True)
df

The preceding code generates the following output:

With that, we have the HTML of each story. As the content is embedded in HTML markup and we want to feed plain text into our model, we'll use a parser to strip out the markup tags:

from bs4 import BeautifulSoup

def get_text(x):
    soup = BeautifulSoup(x, 'lxml')
    text = soup.get_text()
    return text

df.loc[:, 'text'] = df['html'].map(get_text)
df

The preceding code generates the following output:

With this, we have our training set ready. We can now move on to a discussion of how to transform our text into something that a model can work with.

Setting up your daily personal newsletter

In order to set up a personal e-mail with news stories, we're going to utilize IFTTT again. As in Chapter 3, Build an App to Find Cheap Airfares, we'll use the Maker Channel to send a POST request; however, this time the payload will be our news stories. If you haven't set up the Maker Channel, do this now. Instructions can be found in Chapter 3, Build an App to Find Cheap Airfares. You should also set up the Gmail channel. Once that is complete, we'll add a recipe to combine the two.

First, click on Create a Recipe from the IFTTT home page. Then, search for the Maker Channel:

Image from https://www.iftt.com

Select this, then select Receive a web request:

Image from https://www.iftt.com

Then, give the request a name. I'm using news_event:

Image from https://www.iftt.com

Finish by clicking on Create Trigger. Next, we'll set up the e-mail piece. Search for Gmail and click on the icon seen as follows:

Image from https://www.iftt.com

Once you have clicked on Gmail, click on Send an e-mail. From here, you can customize your e-mail message.

Image from https://www.iftt.com

Input your e-mail address, a subject line, and finally, include Value1 in the e-mail body. We will pass our story title and link into this with our POST request. Click on Create Recipe to finalize this. Now, we're ready to generate the script that will run on a schedule, automatically sending us articles of interest.
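One gap to be aware of: the text-vectorization and model-training step that produces the vect and model objects used below is covered in the book but not included in this excerpt. A minimal sketch of that step, assuming the same TfidfVectorizer and LinearSVC that the final script imports (the exact vectorizer settings here are illustrative, not the book's), might look like this:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Turn each article's plain text into TF-IDF features.
vect = TfidfVectorizer(ngram_range=(1, 3), stop_words='english', min_df=3)
X = vect.fit_transform(df['text'])

# Train a linear SVM to predict the 'y'/'n' label we assigned in Pocket.
model = LinearSVC()
model.fit(X, df['wanted'])

The important part is that the fitted vect and model objects are what get serialized in the next step.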
We're going to create a separate script for this, but one last thing that we need to do in our existing code is serialize our vectorizer and our model:

import pickle

pickle.dump(model, open(r'/Users/alexcombs/Downloads/news_model_pickle.p', 'wb'))
pickle.dump(vect, open(r'/Users/alexcombs/Downloads/news_vect_pickle.p', 'wb'))

With this, we have saved everything that we need from our model. In our new script, we will read these in to generate our new predictions. We're going to use the same scheduling library to run the code that we used in Chapter 3, Build an App to Find Cheap Airfares. Putting it all together, we have the following script:

# get our imports
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
import schedule
import time
import pickle
import json
import gspread
import requests
from bs4 import BeautifulSoup
from oauth2client.client import SignedJwtAssertionCredentials

# create our fetching function
def fetch_news():
    try:
        # load the serialized vectorizer and model
        vect = pickle.load(open(r'/Users/alexcombs/Downloads/news_vect_pickle.p', 'rb'))
        model = pickle.load(open(r'/Users/alexcombs/Downloads/news_model_pickle.p', 'rb'))

        # authorize against Google Sheets and pull down the stored stories
        json_key = json.load(open(r'/Users/alexcombs/Downloads/APIKEY.json'))
        scope = ['https://spreadsheets.google.com/feeds']
        credentials = SignedJwtAssertionCredentials(json_key['client_email'],
                                                    json_key['private_key'].encode(),
                                                    scope)
        gc = gspread.authorize(credentials)
        ws = gc.open("NewStories")
        sh = ws.sheet1
        zd = list(zip(sh.col_values(2), sh.col_values(3), sh.col_values(4)))
        zf = pd.DataFrame(zd, columns=['title', 'urls', 'html'])
        zf.replace('', pd.np.nan, inplace=True)
        zf.dropna(inplace=True)

        def get_text(x):
            soup = BeautifulSoup(x, 'lxml')
            text = soup.get_text()
            return text

        zf.loc[:, 'text'] = zf['html'].map(get_text)

        # score the stories with our trained model
        tv = vect.transform(zf['text'])
        res = model.predict(tv)
        rf = pd.DataFrame(res, columns=['wanted'])
        rez = pd.merge(rf, zf, left_index=True, right_index=True)

        # build the e-mail body from the stories predicted as 'y' and POST it to IFTTT
        news_str = ''
        for t, u in zip(rez[rez['wanted'] == 'y']['title'],
                        rez[rez['wanted'] == 'y']['urls']):
            news_str = news_str + t + '\n' + u + '\n'
        payload = {"value1": news_str}
        r = requests.post('https://maker.ifttt.com/trigger/news_event/with/key/IFTTT_KEY',
                          data=payload)

        # cleanup worksheet
        lenv = len(sh.col_values(1))
        cell_list = sh.range('A1:F' + str(lenv))
        for cell in cell_list:
            cell.value = ""
        sh.update_cells(cell_list)
        print(r.text)
    except:
        print('Failed')

schedule.every(480).minutes.do(fetch_news)

while 1:
    schedule.run_pending()
    time.sleep(1)

This script will run every 480 minutes (8 hours), pull down the news stories from Google Sheets, run the stories through the model, generate an e-mail by sending a POST request to IFTTT for the stories that are predicted to be of interest, and then, finally, clear out the stories in the spreadsheet so that only new stories get sent in the next e-mail.

Congratulations! You now have your own personalized news feed!

In this tutorial, we learned how to create a custom news feed. To know more about setting it up and other intuitive Python projects, check out Python Machine Learning Blueprints: Intuitive data projects you can relate to.

Writing web services with functional Python programming [Tutorial]
Visualizing data in R and Python using Anaconda [Tutorial]
Python 3.7 beta is available as the second generation Google App Engine standard runtime

How to make machine learning based recommendations using Julia [Tutorial]

Prasad Ramesh
08 Feb 2019
8 min read
In this article, we will look at machine learning based recommendations using Julia. We will make recommendations using a Julia package called 'Recommendation'. This article is an excerpt from a book written by Adrian Salceanu titled Julia Programming Projects. In this book, you will learn how to build simple-to-advanced applications through examples in Julia Lang 1.x using modern tools.

In order to ensure that your code will produce the same results as described in this article, it is recommended to use the same package versions. Here are the external packages used in this tutorial and their specific versions:

CSV@v.0.4.3
DataFrames@v0.15.2
Gadfly@v1.0.1
IJulia@v1.14.1
Recommendation@v0.1.0+

In order to install a specific version of a package you need to run:

pkg> add PackageName@vX.Y.Z

For example:

pkg> add IJulia@v1.14.1

Alternatively, you can install all the used packages by downloading the Project.toml file provided on GitHub. You can use pkg> instantiate as follows:

julia> download("https://raw.githubusercontent.com/PacktPublishing/Julia-Projects/master/Chapter07/Project.toml", "Project.toml")
pkg> activate .
pkg> instantiate

Julia's ecosystem provides access to Recommendation.jl, a package that implements a multitude of algorithms for both personalized and non-personalized recommendations. For model-based recommenders, it has support for SVD, MF, and content-based recommendations using TF-IDF scoring algorithms.

There's also another very good alternative: the ScikitLearn.jl package (https://github.com/cstjean/ScikitLearn.jl). This implements Python's very popular scikit-learn interface and algorithms in Julia, supporting both models from the Julia ecosystem and those of the scikit-learn library (via PyCall.jl). The Scikit website and documentation can be found at http://scikit-learn.org/stable/. It is very powerful and definitely worth keeping in mind, especially for building highly efficient recommenders for production usage. For learning purposes, we'll stick to Recommendation, as it provides for a simpler implementation.

Making recommendations with Recommendation

For our learning example, we'll use Recommendation. It is the simplest of the available options, and it's a good teaching device, as it will allow us to further experiment with its plug-and-play algorithms and configurable model generators. Before we can do anything interesting, though, we need to make sure that we have the package installed:

pkg> add Recommendation#master
julia> using Recommendation

Please note that I'm using the #master version, because the tagged version, at the time of writing this book, was not yet fully updated for Julia 1.0.

The workflow for setting up a recommender with Recommendation involves three steps:

Setting up the training data
Instantiating and training a recommender using one of the available algorithms
Once the training is complete, asking for recommendations

Let's implement these steps.

Setting up the training data

Recommendation uses a DataAccessor object to set up the training data. This can be instantiated with a set of Event objects. A Recommendation.Event is an object that represents a user-item interaction. It is defined like this:

struct Event
    user::Int
    item::Int
    value::Float64
end

In our case, the user field will represent the UserID, the item field will map to the ISBN, and the value field will store the Rating. However, a bit more work is needed to bring our data into the format required by Recommendation:

First of all, our ISBN data is stored as a string and not as an integer.
Second, internally, Recommendation builds a sparse matrix of user * item and stores the corresponding values, setting up the matrix using sequential IDs. However, our actual user IDs are large numbers, and Recommendation will set up a very large, sparse matrix, going all the way from the minimum to the maximum user IDs.

What this means is that, for example, we only have 69 users in our dataset (as confirmed by unique(training_data[:UserID]) |> size), with the largest ID being 277,427, while for books we have 9,055 unique ISBNs. If we go with this, Recommendation will create a 277,427 x 9,055 matrix instead of a 69 x 9,055 matrix. This matrix would be very large, sparse, and inefficient.

Therefore, we'll need to do a bit more data processing to map the original user IDs and the ISBNs to sequential integer IDs, starting from 1. We'll use two Dict objects that will store the mappings from the UserID and ISBN columns to the recommender's sequential user and book IDs. Each entry will be of the form dict[original_id] = sequential_id:

julia> user_mappings, book_mappings = Dict{Int,Int}(), Dict{String,Int}()

We'll also need two counters to keep track of, and increment, the sequential IDs:

julia> user_counter, book_counter = 0, 0

We can now prepare the Event objects for our training data:

julia> events = Event[]

julia> for row in eachrow(training_data)
           global user_counter, book_counter
           user_id, book_id, rating = row[:UserID], row[:ISBN], row[:Rating]
           haskey(user_mappings, user_id) || (user_mappings[user_id] = (user_counter += 1))
           haskey(book_mappings, book_id) || (book_mappings[book_id] = (book_counter += 1))
           push!(events, Event(user_mappings[user_id], book_mappings[book_id], rating))
       end

This will fill up the events array with instances of Recommendation.Event, which represents a unique UserID, ISBN, and Rating combination. To give you an idea, it will look like this:

julia> events
10005-element Array{Event,1}:
 Event(1, 1, 10.0)
 Event(1, 2, 8.0)
 Event(1, 3, 9.0)
 Event(1, 4, 8.0)
 Event(1, 5, 8.0)
# output omitted #

Please remember this very important aspect: in Julia, the for loop defines a new scope. This means that variables defined outside the for loop are not accessible inside it. To make them visible within the loop's body, we need to declare them as global.

Now, we are ready to set up our DataAccessor:

julia> da = DataAccessor(events, user_counter, book_counter)

Building and training the recommender

At this point, we have all that we need to instantiate our recommender. A very efficient and common implementation uses MF; unsurprisingly, this is one of the options provided by the Recommendation package, so we'll use it.

Matrix Factorization

The idea behind MF is that, if we're starting with a large sparse matrix like the one used to represent user x profile ratings, then we can represent it as the product of multiple smaller and denser matrices. The challenge is to find these smaller matrices so that their product is as close to our original matrix as possible. Once we have these, we can fill in the blanks in the original matrix so that the predicted values will be consistent with the existing ratings in the matrix. Our user x books rating matrix can be represented as the product between smaller and denser users and books matrices.

To perform the matrix factorization, we can use a couple of algorithms, among which the most popular are SVD and Stochastic Gradient Descent (SGD). Recommendation uses SGD to perform matrix factorization.
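Before running the code, it may help to see the generic formulation that this describes (a standard sketch of SGD-based matrix factorization, not a description of Recommendation's exact internals). We look for a users x k matrix U and a books x k matrix V, for some small k, such that the rating matrix R is approximately U * V'. Writing u_u and v_i for the rows of U and V, SGD sweeps over the known ratings and repeatedly applies updates of the form:

e_ui = r_ui - u_u . v_i
u_u <- u_u + learning_rate * (e_ui * v_i - reg * u_u)
v_i <- v_i + learning_rate * (e_ui * u_u - reg * v_i)

Here learning_rate controls the step size, reg is a regularization term that some implementations expose and others fix internally, and each sweep over the data is one iteration. This is exactly what the max_iter and learning_rate arguments discussed below control.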
The code for this looks as follows:

julia> recommender = MF(da)
julia> build(recommender)

We instantiate a new MF recommender and then we build it—that is, train it. The build step might take a while (a few minutes on a high-end computer using the small dataset that's provided on GitHub).

If we want to tweak the training process, since SGD implements an iterative approach for matrix factorization, we can pass a max_iter argument to the build function, asking it for a maximum number of iterations. The more iterations we do, in theory, the better the recommendations, but the longer it will take to train the model. If you want to speed things up, you can invoke the build function with a max_iter of 30 or less: build(recommender, max_iter = 30).

We can pass another optional argument for the learning rate, for example, build(recommender, learning_rate=15e-4, max_iter=100). The learning rate specifies how aggressively the optimization technique should vary between each iteration. If the learning rate is too small, the optimization will need to be run a lot of times. If it's too big, then the optimization might fail, generating worse results than the previous iterations.

Making recommendations

Now that we have successfully built and trained our model, we can ask it for recommendations. These are provided by the recommend function, which takes an instance of a recommender, a user ID (from the ones available in the training matrix), the number of recommendations, and an array of book IDs from which to make recommendations as its arguments:

julia> recommend(recommender, 1, 20, [1:book_counter...])

With this line of code, we retrieve the recommendations for the user with the recommender ID 1, which corresponds to the UserID 277427 in the original dataset. We're asking for up to 20 recommendations picked from all the available books. We get back an array of Pairs of book IDs and recommendation scores:

20-element Array{Pair{Int64,Float64},1}:
 5081 => 19.1974
 5079 => 19.1948
 5078 => 19.1946
 5077 => 17.1253
 5080 => 17.1246
# output omitted #

In this article, we learned how to make recommendations with machine learning in Julia. To learn more about machine learning recommendations in Julia and testing the model, check out the book Julia Programming Projects.

YouTube to reduce recommendations of 'conspiracy theory' videos that misinform users in the US
How to Build a music recommendation system with PageRank Algorithm
How to build a cold-start friendly content-based recommender using Apache Spark SQL

Managing Heroku from the Command Line

Packt
20 Nov 2014
27 min read
In this article by Mike Coutermarsh, author of Heroku Cookbook, we will cover the following topics:

Viewing application logs
Searching logs
Installing add-ons
Managing environment variables
Enabling the maintenance page
Managing releases and rolling back
Running one-off tasks and dynos
Managing SSH keys
Sharing and collaboration
Monitoring load average and memory usage

Heroku was built to be managed from its command-line interface. The better we learn it, the faster and more effective we will be in administering our application. The goal of this article is to get comfortable with using the CLI. We'll see that each Heroku command follows a common pattern. Once we learn a few of these commands, the rest will be relatively simple to master.

In this article, we won't cover every command available in the CLI, but we will focus on the ones that we'll be using the most. As we learn each command, we will also learn a little more about what is happening behind the scenes so that we get a better understanding of how Heroku works. The more we understand, the more we'll be able to take advantage of the platform.

Before we start, let's note that if we ever need to get a list of the available commands, we can run the following command:

$ heroku help

We can also quickly display the documentation for a single command:

$ heroku help command_name

Viewing application logs

Logging gets a little more complex for any application that is running multiple servers and several different types of processes. Having visibility into everything that is happening within our application is critical to maintaining it. Heroku handles this by combining and sending all of our logs to one place, the Logplex. The Logplex provides us with a single location to view a stream of our logs across our entire application. In this recipe, we'll learn how to view logs via the CLI and how to quickly get visibility into what's happening within our application.

How to do it…

To start, let's open up a terminal, navigate to an existing Heroku application, and perform the following steps:

First, to view our application's logs, we can use the logs command:

$ heroku logs
2014-03-31T23:35:51.195150+00:00 app[web.1]: Rendered pages/about.html.slim within layouts/application (25.0ms)
2014-03-31T23:35:51.215591+00:00 app[web.1]: Rendered layouts/_navigation_links.html.erb (2.6ms)
2014-03-31T23:35:51.230010+00:00 app[web.1]: Rendered layouts/_messages.html.slim (13.0ms)
2014-03-31T23:35:51.215967+00:00 app[web.1]: Rendered layouts/_navigation.html.slim (10.3ms)
2014-03-31T23:35:51.231104+00:00 app[web.1]: Completed 200 OK in 109ms (Views: 65.4ms | ActiveRecord: 0.0ms)
2014-03-31T23:35:51.242960+00:00 heroku[router]: at=info method=GET path=

Heroku logs anything that our application sends to STDOUT or STDERR. If we're not seeing logs, it's very likely that our application is not configured correctly.

We can also watch our logs in real time. This is known as tailing:

$ heroku logs --tail

Instead of --tail, we can also use -t. We'll need to press Ctrl + C to end the command and stop tailing the logs.

If we want to see the 100 most recent lines, we can use -n:

$ heroku logs -n 100

The Logplex stores a maximum of 1500 lines. To view more lines, we'll have to set up log storage.

We can filter the logs to only show a specific process type. Here, we will only see logs from our web dynos:

$ heroku logs -p web

If we want, we can be as granular as showing the logs from an individual dyno.
This will show only the logs from the second web dyno:

$ heroku logs -p web.2

We can use this for any process type; we can try it for our workers if we'd like:

$ heroku logs -p worker

The Logplex contains more than just logs from our application. We can also view logs generated by Heroku or the API. Let's try changing the source to heroku to only see the logs generated by Heroku. This will only show us logs related to the router and resource usage:

$ heroku logs --source heroku

To view logs for only our application, we can set the source to app:

$ heroku logs --source app

We can also view logs from the API. These logs will show any administrative actions we've taken, such as scaling dynos or changing configuration variables. This can be useful when multiple developers are working on an application:

$ heroku logs --source api

We can even combine the different flags. Let's try tailing the logs for only our web dynos:

$ heroku logs -p web --tail

That's it! Remember that if we ever need more information on how to view logs via the CLI, we can always use the help command:

$ heroku help logs

How it works

Under the covers, the Heroku CLI simply passes our request to Heroku's API and then uses Ruby to parse and display our logs. If you're interested in exactly how it works, the code is open source on GitHub at https://github.com/heroku/heroku/blob/master/lib/heroku/command/logs.rb.

Viewing logs via the CLI is most useful in situations where we need to see exactly what our application is doing right now. We'll find that we use it a lot around deploys and when debugging issues. Since the Logplex has a limit of 1500 lines, it's not meant for viewing historical data. For this, we'll need to set up log drains and enable a logging add-on.

Searching logs

Heroku does not have the built-in capability to search our logs from the command line. We can get around this limitation easily by making use of some other command-line tools. In this recipe, we will learn how to combine Heroku's logs with grep, a command-line tool for searching text. This will allow us to search our recent logs for keywords, helping us track down errors more quickly.

Getting ready

For this recipe, we'll need to have grep installed. For OS X and Linux machines, it should already be installed. We can check for and install grep using the following steps:

To check if we have grep installed, let's open up a terminal and type the following:

$ grep
usage: grep [-abcDEFGHhIiJLlmnOoPqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
       [-e pattern] [-f file] [--binary-files=value] [--color=when]
       [--context[=num]] [--directories=action] [--label] [--line-buffered]
       [--null] [pattern] [file ...]

If we do not see usage instructions, we can visit http://www.gnu.org/software/grep/ for the download and installation instructions.

How to do it…

Let's start searching our logs by opening a terminal and navigating to one of our Heroku applications using the following steps:

To search for a keyword in our logs, we need to pipe our logs into grep. This simply means that we will be passing our logs into grep and having grep search them for us. Let's try this now. The following command will search the output of heroku logs for the word error:

$ heroku logs | grep error

Sometimes, we might want to search for a longer string that includes special characters. We can do this by surrounding it with quotes:

$ heroku logs | grep "path=/pages/about host"

It can be useful to also see the lines surrounding the line that matched our search. We can do this as well.
The next command will show us the line that contains an error as well as the three lines above and below it:

$ heroku logs | grep error -C 3

We can even search with regular expressions. The next command will show us every line that matches a number that ends with MB. So, for example, lines with 100 MB, 25 MB, or 3 MB will all appear:

$ heroku logs | grep '\d*MB'

To learn more about regular expressions, visit http://regex.learncodethehardway.org/.

How it works…

Like most Unix-based tools, grep was built to accomplish a single task and to do it well. Global regular expression print (grep) is built to search a set of files for a pattern and then print all of the matches. Grep can also search anything it receives through standard input; this is exactly how we used it in this recipe. By piping the output of our Heroku logs into grep, we are passing our logs to grep as standard input.

See also

To learn more about grep, visit http://www.tutorialspoint.com/unix_commands/grep.htm

Installing add-ons

Our application needs some additional functionality provided by an outside service. What should we do? In the past, this would have involved creating accounts, managing credentials, and maybe even bringing up servers and installing software. This whole process has been simplified by the Heroku add-on marketplace. For any additional functionality that our application needs, our first stop should always be Heroku add-ons.

Heroku has made attaching additional resources to our application a plug-and-play process. If we need an additional database, caching, or error logging, they can be set up with a single command. In this recipe, we will learn the ins and outs of using the Heroku CLI to install and manage our application's add-ons.

How to do it...

To begin, let's open a terminal and navigate to one of our Heroku applications using the following steps:

Let's start by taking a look at all of the available Heroku add-ons. We can do this with the addons:list command:

$ heroku addons:list

There are so many add-ons that viewing them through the CLI is pretty difficult. For easier navigation and search, we should take a look at https://addons.heroku.com/.

If we want to see the currently installed add-ons for our application, we can simply type the following:

$ heroku addons
=== load-tester-rails Configured Add-ons
heroku-postgresql:dev       HEROKU_POSTGRESQL_MAROON
heroku-postgresql:hobby-dev HEROKU_POSTGRESQL_ONYX
librato:development
newrelic:stark

Remember that for any command, we can always add --app app_name to specify the application. Alternatively, our application's add-ons are also listed through the Heroku Dashboard, available at https://dashboard.heroku.com.

The installation of a new add-on is done with addons:add. Here, we are going to install the error logging service, Rollbar:

$ heroku addons:add rollbar
Adding rollbar on load-tester-rails... done, v22 (free)
Use `heroku addons:docs rollbar` to view documentation.

We can quickly open up the documentation for an add-on with addons:docs:

$ heroku addons:docs rollbar

Removing an add-on is just as simple. We'll need to type our application name to confirm. For this example, our application is called load-tester-rails:

$ heroku addons:remove rollbar
!   WARNING: Destructive Action
!   This command will affect the app: load-tester-rails
!   To proceed, type "load-tester-rails" or re-run this command with --confirm load-tester-rails
> load-tester-rails
Removing rollbar on load-tester-rails...
done, v23 (free)

Each add-on comes with different tiers of service. Let's try upgrading our rollbar add-on to the starter tier:

$ heroku addons:upgrade rollbar:starter
Upgrading to rollbar:starter on load-tester-rails... done, v26 ($12/mo)
Plan changed to starter
Use `heroku addons:docs rollbar` to view documentation.

Now, if we want, we can downgrade back to its original level with addons:downgrade:

$ heroku addons:downgrade rollbar
Downgrading to rollbar on load-tester-rails... done, v27 (free)
Plan changed to free
Use `heroku addons:docs rollbar` to view documentation.

If we ever forget any of the commands, we can always use help to quickly see the documentation:

$ heroku help addons

Some add-ons might charge you money. Before continuing, let's double-check that we only have the correct ones enabled, using the $ heroku addons command.

How it works…

Heroku has created a standardized process for all add-on providers to follow. This ensures a consistent experience when provisioning any add-on for our application. It starts when we request the creation of an add-on. Heroku sends an HTTP request to the provider, asking them to provision an instance of their service. The provider must then respond to Heroku with the connection details for their service in the form of environment variables. For example, if we were to provision Redis To Go, we will get back our connection details in a REDISTOGO_URL variable:

REDISTOGO_URL: redis://user:pass@server.redistogo.com:9652

Heroku adds these variables to our application and restarts it. On restart, the variables are available for our application, and we can connect to the service using them. The specifics on how to connect using the variables will be in the add-on's documentation. Installation will depend on the specific language or framework we're using.
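As an illustration of what that looks like in application code, a Python app might read the variable like the following sketch. This assumes the redis-py package and the REDISTOGO_URL example above; any other language follows the same pattern of reading the environment variable that the add-on provides.

import os
import redis  # redis-py, assumed to be listed in our requirements

# Heroku injects REDISTOGO_URL into the environment when the add-on is provisioned.
redis_url = os.environ.get('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

conn.set('greeting', 'hello from heroku')
print(conn.get('greeting'))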
See also

For details on creating our own add-ons, the process is well documented on Heroku's website at https://addons.heroku.com/provider
Check out Kensa, the CLI to create Heroku add-ons, at https://github.com/heroku/kensa

Managing environment variables

Our applications will often need access to various credentials in the form of API tokens, usernames, and passwords for integrations with third-party services. We can store this information in our Git repository, but then anyone with access to our code will also have a copy of our production credentials. We should instead use environment variables to store any configuration information for our application. Configuration information should be separate from our application's code and instead be tied to the specific deployment of the application.

Changing our application to use environment variables is simple. Let's look at an example in Ruby; let's assume that we currently have secret_api_token defined in our application's code:

secret_api_token = '123abc'

We can remove the token and replace it with an environment variable:

secret_api_token = ENV['SECRET_TOKEN']

In addition to protecting our credentials, using environment variables makes our application more configurable. We'll be able to quickly make configuration changes without having to change code and redeploy.

The terms "configuration variable" and "environment variable" are interchangeable. Heroku usually uses "configuration" due to how tightly the variables are coupled with the state of the application.

How to do it...

Heroku makes it easy to set our application's environment variables through the config command. Let's launch a terminal and navigate to an existing Heroku project to try it out, using the following steps:

We can use the config command to see a list of all our existing environment variables:

$ heroku config

To view only the value of a specific variable, we can use get:

$ heroku config:get DATABASE_URL

To set a new variable, we can use set:

$ heroku config:set VAR_NAME=var_value
Setting config vars and restarting load-tester-rails... done, v28
VAR_NAME: var_value

Each time we set a config variable, Heroku will restart our application. We can set multiple values at once to avoid multiple restarts:

$ heroku config:set SECRET=value SECRET2=value
Setting config vars and restarting load-tester-rails... done, v29
SECRET: value
SECRET2: value

To delete a variable, we use unset:

$ heroku config:unset SECRET
Unsetting SECRET and restarting load-tester-rails... done, v30

If we want, we can delete multiple variables with a single command:

$ heroku config:unset VAR_NAME SECRET2
Unsetting VAR_NAME and restarting load-tester-rails... done, v31
Unsetting SECRET2 and restarting load-tester-rails... done, v32

Heroku tracks each configuration change as a release. This makes it easy for us to roll back changes if we make a mistake.

How it works…

Environment variables are used on Unix-based operating systems to manage and share configuration information between applications. As they are so common, changing our application to use them does not lock us into deploying only to Heroku.

Heroku stores all of our configuration variables in one central location. Each change to these variables is tracked, and we can view the history by looking through our past releases. When Heroku spins up a new dyno, part of the process is taking all of our configuration settings and setting them as environment variables on the dyno. This is why, whenever we make a configuration change, Heroku restarts our dynos. As configuration variables are such a key part of our Heroku application, any change to them will also be included in our Heroku logs.

See also

Read about the Twelve-Factor app's rule on configuration at http://12factor.net/config

Enabling the maintenance page

Occasionally, we will need to make changes to our application that require downtime. The proper way to do this is to put up a maintenance page that displays a friendly message and responds to all incoming HTTP requests with a 503 Service Unavailable status. Doing this will keep our users informed and also avoid any negative SEO effects. Search engines understand that when they receive a 503 response, they should come back later to recrawl the site. If we didn't use a maintenance page and our application returned 404 or 500 errors instead, it's possible that a search engine crawler might remove the page from its index.

How to do it...

Let's open up a terminal and navigate to one of our Heroku projects to begin, using the following steps:

We can check whether our application's maintenance page is currently enabled with the maintenance command:

$ heroku maintenance
off

Let's try turning it on. This will stop traffic from being routed to our dynos and show the maintenance page as follows:

$ heroku maintenance:on
Enabling maintenance mode for load-tester-rails... done

Now, if we visit our application, we'll see the default Heroku maintenance page.

To disable the maintenance page and resume sending users to our application, we can use the maintenance:off command:

$ heroku maintenance:off
Disabling maintenance mode for load-tester-rails...
done

Managing releases and rolling back

What do we do if disaster strikes and our newly released code breaks our application? Luckily for us, Heroku keeps a copy of every deploy and configuration change to our application. This enables us to roll back to a previous version while we work to correct the errors in our latest release.

Heads up! Rolling back only affects application code and configuration variables. Add-ons and our database will not be affected by a rollback.

In this recipe, we will learn how to manage our releases and roll back code from the CLI.

How to do it...

In this recipe, we'll view and manage our releases from the Heroku CLI, using the releases command. Let's open up a terminal now and navigate to one of our Heroku projects by performing the following steps:

Heroku tracks every deploy and configuration change as a release. We can view all of our releases from both the CLI and the web dashboard with the releases command:

$ heroku releases
=== load-tester-rails Releases
v33 Add WEB_CON config vars    coutermarsh.mike@gmail.com 2014/03/30 11:18:49 (~ 5h ago)
v32 Remove SEC config vars     coutermarsh.mike@gmail.com 2014/03/29 19:38:06 (~ 21h ago)
v31 Remove VAR config vars     coutermarsh.mike@gmail.com 2014/03/29 19:38:05 (~ 21h ago)
v30 Remove config vars         coutermarsh.mike@gmail.com 2014/03/29 19:27:05 (~ 21h ago)
v29 Deploy 9218c1c             coutermarsh.mike@gmail.com 2014/03/29 19:24:29 (~ 21h ago)

Alternatively, we can view our releases through the Heroku dashboard. Visit https://dashboard.heroku.com, select one of our applications, and click on Activity.

We can view detailed information about each release using the info command. This shows us everything about the change and state of the application during this release:

$ heroku releases:info v33
=== Release v33
Addons: librato:development
        newrelic:stark
        rollbar:free
        sendgrid:starter
By:     coutermarsh.mike@gmail.com
Change: Add WEB_CONCURRENCY config vars
When:   2014/03/30 11:18:49 (~ 6h ago)
=== v33 Config Vars
WEB_CONCURRENCY: 3

We can revert to the previous version of our application with the rollback command:

$ heroku rollback
Rolling back load-tester-rails... done, v32
!   Warning: rollback affects code and config vars; it doesn't add or remove addons. To undo, run: heroku rollback v33

Rolling back creates a new version of our application in the release history. We can also specify a specific version to roll back to:

$ heroku rollback v30
Rolling back load-tester-rails... done, v30

The version we roll back to does not have to be an older version. Although it sounds contradictory, we can also roll back to newer versions of our application.

How it works…

Behind the scenes, each Heroku release is tied to a specific slug and set of configuration variables. As Heroku keeps a copy of each slug that we deploy, we're able to quickly roll back to previous versions of our code without having to rebuild our application.

Each deploy release created will include a reference to the Git SHA that was pushed to master. The Git SHA is a reference to the last commit made to our repository before it was deployed. This is useful if we want to know exactly what code was pushed out in that release. On our local machine, we can run the $ git checkout git-sha-here command to view our application's code in the exact state it was in when deployed.

Running one-off tasks and dynos

In more traditional hosting environments, developers will often log in to servers to perform basic administrative tasks or debug an issue.
With Heroku, we can do this by launching one-off dynos. These are dynos that contain our application code but do not serve web requests. For a Ruby on Rails application, one-off dynos are often used to run database migrations or launch a Rails console.

How to do it...

In this recipe, we will learn how to execute commands on our Heroku applications with the heroku run command. Let's launch a terminal now to get started with the following steps:

To have Heroku start a one-off dyno and execute any single command, we will use heroku run. Here, we can try it out by running a simple command to print some text to the screen:

$ heroku run echo "hello heroku"
Running `echo "hello heroku"` attached to terminal... up, run.7702
"hello heroku"

One-off dynos are automatically shut down after the command has finished running.

We can see that Heroku is running this command on a dyno with our application's code. Let's run ls to see a listing of the files on the dyno. They should look familiar:

$ heroku run ls
Running `ls` attached to terminal... up, run.5518
app bin config config.ru db Gemfile Gemfile.lock lib log Procfile public Rakefile README README.md tmp

If we want to run multiple commands, we can start up a bash session. Type exit to close the session:

$ heroku run bash
Running `bash` attached to terminal... up, run.2331
~ $ ls
app bin config config.ru db Gemfile Gemfile.lock lib log Procfile public Rakefile README README.md tmp
~ $ echo "hello"
hello
~ $ exit
exit

We can run tasks in the background using the detached mode. The output of the command goes to our logs rather than the screen:

$ heroku run:detached echo "hello heroku"
Running `echo hello heroku` detached... up, run.4534
Use `heroku logs -p run.4534` to view the output.

If we need more power, we can adjust the size of the one-off dynos. This command will launch a bash session in a 2X dyno:

$ heroku run --size=2X bash

If we are running one-off dynos in the detached mode, we can view their status and stop them in the same way we would stop any other dyno:

$ heroku ps
=== run: one-off processes
run.5927 (1X): starting 2014/03/29 16:18:59 (~ 6s ago)

$ heroku ps:stop run.5927

How it works…

When we issue the heroku run command, Heroku spins up a new dyno with our latest slug and runs the command. Heroku does not start our application; the only command that runs is the command that we explicitly pass to it.

One-off dynos act a little differently than standard dynos. If we create one in the detached mode, it will run until we stop it manually, or it will shut down automatically after 24 hours. It will not restart like a normal dyno would. If we run bash from a one-off dyno, it will run until we close the connection or reach an hour of inactivity.

Managing SSH keys

Heroku manages access to our application's Git repository with SSH keys. When we first set up the Heroku Toolbelt, we had to upload either a new or an existing public key to Heroku's servers. This key allows us to access our Heroku Git repositories without entering our password each time. If we ever want to deploy our Heroku applications from another computer, we'll either need to have the same key on that computer or provide Heroku with an additional one. It's easy enough to do this via the CLI, which we'll learn in this recipe.

How to do it…

To get started, let's fire up a terminal.
We'll be using the keys command in this recipe by performing the following steps:

First, let's view all of the existing keys in our Heroku account:

$ heroku keys
=== coutermarsh.mike@gmail.com Keys
ssh-rsa AAAAB3NzaC...46hEzt1Q== coutermarsh.mike@gmail.com
ssh-rsa AAAAB3NzaC...6EU7Qr3S/v coutermarsh.mike@gmail.com
ssh-rsa AAAAB3NzaC...bqCJkM4w== coutermarsh.mike@gmail.com

To remove an existing key, we can use keys:remove. To the command, we need to pass a string that matches one of the keys:

$ heroku keys:remove "7Qr3S/v coutermarsh.mike@gmail.com"
Removing 7Qr3S/v coutermarsh.mike@gmail.com SSH key... done

To add our current user's public key, we can use keys:add. This will look on our machine for a public key (~/.ssh/id_rsa.pub) and upload it:

$ heroku keys:add
Found existing public key: /Users/mike/.ssh/id_rsa.pub
Uploading SSH public key /Users/mike/.ssh/id_rsa.pub… done

To create a new SSH key, we can run $ ssh-keygen -t rsa.

If we'd like, we can also specify where the key is located if it is not in the default ~/.ssh/ directory:

$ heroku keys:add /path/to/key.pub

How it works…

SSH keys are the standard method for password-less authentication. There are two parts to each SSH key: a private key, which stays on our machine and should never be shared, and a public key, which we can freely upload and share.

Each key has its purpose. The public key is used to encrypt messages. The private key is used to decrypt messages. When we try to connect to our Git repositories, Heroku's server uses our public key to create an encrypted message that can only be decrypted by our private key. The server then sends the message to our machine; our machine's SSH client decrypts it and sends the response to the server. Sending the correct response successfully authenticates us.

SSH keys are not used for authentication to the Heroku CLI. The CLI uses an authentication token that is stored in our ~/.netrc file.

Sharing and collaboration

We can invite collaborators through both the web dashboard and the CLI. In this recipe, we'll learn how to quickly invite collaborators through the CLI.

How to do it…

To start, let's open a terminal and navigate to the Heroku application that we would like to share, using the following steps:

To see the current users who have access to our application, we can use the sharing command:

$ heroku sharing
=== load-tester-rails Access List
coutermarsh.mike@gmail.com owner
mike@form26.com            collaborator

To invite a collaborator, we can use sharing:add:

$ heroku sharing:add coutermarshmike@gmail.com
Adding coutermarshmike@gmail.com to load-tester-rails as collaborator... done

Heroku will send an e-mail to the user we're inviting, even if they do not already have a Heroku account.

If we'd like to revoke access to our application, we can do so with sharing:remove:

$ heroku sharing:remove coutermarshmike@gmail.com
Removing coutermarshmike@gmail.com from load-tester-rails collaborators... done

How it works…

When we add another collaborator to our Heroku application, they are granted the same abilities as us, except that they cannot manage paid add-ons or delete the application. Otherwise, they have full control to administer the application. If they have an existing Heroku account, their SSH key will be immediately added to the application's Git repository.

See also

Interested in using multiple Heroku accounts on a single machine? Take a look at the heroku-accounts plugin at https://github.com/ddollar/heroku-accounts.
Monitoring load average and memory usage

We can monitor the resource usage of our dynos from the command line using the log-runtime-metrics plugin. This will give us visibility into the CPU and memory usage of our dynos. With this data, we'll be able to tell whether our dynos are correctly sized, detect problems earlier, and decide whether we need to scale our application.

How to do it…

Let's open up a terminal; we'll be completing this recipe with the CLI by performing the following steps:

First, we'll need to install the log-runtime-metrics plugin via the CLI. We can do this easily through heroku labs:

$ heroku labs:enable log-runtime-metrics

Now that the runtime metrics plugin is installed, we'll need to restart our dynos for it to take effect:

$ heroku restart

Now that the plugin is installed and running, our dynos' resource usage will be printed to our logs. Let's view them now:

$ heroku logs
heroku[web.1]: source=web.1 dyno=heroku.21 sample#load_avg_1m=0.00 sample#load_avg_5m=0.00
heroku[web.1]: source=web.1 dyno=heroku.21 sample#memory_total=105.28MB sample#memory_rss=105.28MB sample#memory_cache=0.00MB sample#memory_swap=0.00MB sample#memory_pgpgin=31927pages sample#memory_pgpgout=4975pages

From the logs, we can see that for this application, our load average is 0, and this dyno is using a total of 105 MB of RAM.

How it works…

Now that we have some insight into how our dynos are using resources, we need to learn how to interpret these numbers. Understanding the utilization of our dynos will be key for us if we ever need to diagnose a performance-related issue.

In our logs, we will now see load_avg_1m and load_avg_5m. These are our dynos' load averages over a 1-minute and a 5-minute period. The two timeframes are helpful in determining whether we're experiencing a brief spike in activity or something more sustained. Load average is the amount of total computational work that the CPU has to complete. The 1X and 2X dynos have access to four virtual cores. A load average of four means that the dyno's CPU is fully utilized. Any value above four is a warning sign that the dyno might be overloaded, and response times could begin to suffer. Web applications are typically not CPU-intensive, so seeing low load averages for web dynos is expected. If we start seeing high load averages, we should consider either adding more dynos or using larger dynos to handle the load.

Our memory usage is also shown in the logs. The key value that we want to keep track of is memory_rss, which is the total amount of RAM being utilized by our application. It's best to keep this value no higher than 50 to 70 percent of the total RAM available on the dyno. For a 1X dyno with 512 MB of memory, this would mean keeping our memory usage no greater than 250 to 350 MB. This gives our application room to grow under load and helps us avoid any memory swapping. Seeing values above 70 percent is an indication that we need to either adjust our application's memory usage or scale up.

Memory swap occurs when our dyno runs out of RAM. To compensate, our dyno will begin using its hard drive to store data that would normally be stored in RAM. For any web application, swap should be considered evil; this value should always be zero. If our dyno starts swapping, we can expect it to significantly slow down our application's response times. Seeing any swap is an immediate indication that we must either reduce our application's memory consumption or start scaling.
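Because these metrics arrive as ordinary log lines, the usual command-line tools work on them. For example, to pull out only the metric samples we care about (the patterns here are just examples):

$ heroku logs -t | grep 'sample#load_avg'
$ heroku logs -n 200 | grep 'sample#memory_rss'

The first command tails the logs and watches the load averages as they change; the second searches the last 200 log lines for the resident memory figure.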
See also

Load average and memory usage are particularly useful when performing application load tests.

Summary

In this article, we learned various Heroku commands: viewing application logs, installing add-ons, enabling the maintenance page, running one-off dynos, managing SSH keys, sharing and collaboration, monitoring load average and memory usage, and so on.

Resources for Article: Further resources on this subject: Securing vCloud Using the vCloud Networking and Security App Firewall [article] vCloud Networks [article] Apache CloudStack Architecture [article]
Collaboration Using the GitHub Workflow

Packt
30 Sep 2015
12 min read
In this article by Achilleas Pipinellis, the author of the book GitHub Essentials, we look at how GitHub has come up with a workflow based on the features it provides and the power of Git, which it has named the GitHub workflow (https://guides.github.com/introduction/flow). We will learn how to work with branches and pull requests, the latter being the most powerful feature of GitHub.

Learn about pull requests

The pull request is the number one feature in GitHub that made it what it is today. It was introduced in early 2008 and has been used extensively among projects ever since. While everything else can be pretty much disabled in a project's settings (such as issues and the wiki), pull requests are always enabled.

Why pull requests are a powerful asset to work with

Whether you are working on a personal project where you are the sole contributor or on a big open source one with contributors from all over the globe, working with pull requests will certainly make your life easier. I like to think of pull requests as chunks of commits, and the GitHub UI helps you visualize more clearly what is about to be merged into the default branch or the branch of your choice. Pull requests are reviewable with an enhanced diff view. You can easily revert them with a simple button on GitHub, and they can be tested before merging if a CI service is enabled in the project.

The connection between branches and pull requests

There is a special connection between branches and pull requests. Thanks to this connection, GitHub will automatically show you a button to create a new pull request if you push a new branch to your repository. As we will explore in the following sections, this is tightly coupled to the GitHub workflow, and GitHub uses some special words to describe the from and to branches. As per GitHub's documentation: The base branch is where you think changes should be applied, the head branch is what you would like to be applied. So, in GitHub terms, head is your branch, and base is the branch you would like to merge into.

Create branches directly in a project – the shared repository model

The shared repository model, as GitHub aptly calls it, is when you push new branches directly to the source repository. From there, you can create a new pull request by comparing between branches, as we will see in the following sections. Of course, in order to be able to push to a repository you either have to be the owner or a collaborator; in other words, you must have write access.

Create branches in your fork – the fork and pull model

Forked repositories are related to their parent in a way that GitHub uses in order to compare their branches. The fork and pull model is usually used in projects when one does not have write access but is willing to contribute. After forking a repository, you push a branch to your fork and then create a pull request in the source repository asking its maintainer to merge the changes. This is the common practice for contributing to open source projects hosted on GitHub. You will not have access to their repository, but being open source, you can fork the public repository and work on your own copy.

How to create and submit a pull request

There are quite a few ways to initiate the creation of a pull request, as you will see in the following sections. The most common one is to push a branch to your repository and let GitHub's UI guide you. Let's explore this option first.
Use the Compare & pull request button

Whenever a new branch is pushed to a repository, GitHub shows a quick button to create a pull request. In reality, you are taken to the compare page, as we will explore in the next section, but some values are already filled out for you. Let's create, for example, a new branch named add_gitignore where we will add a .gitignore file with the following contents:

git checkout -b add_gitignore
echo -e '.bundle\n.sass-cache\n.vendor\n_site' > .gitignore
git add .gitignore
git commit -m 'Add .gitignore'
git push origin add_gitignore

Next, head over to your repository's main page and you will notice the Compare & pull request button, as shown in the following screenshot:

From here on, if you hit this button you will be taken to the compare page. Note that I am pushing to my repository following the shared repository model, so here is how GitHub greets me:

What would happen if I used the fork and pull repository model? For this purpose, I created another user to fork my repository and followed the same instructions to add a new branch named add_gitignore with the same changes. From here on, when you push the branch to your fork, the Compare & pull request button appears whether you are on your fork's page or on the parent repository. Here is how it looks if you visit your fork:

The following screenshot will appear if you visit the parent repository:

In the last case (captured in red), you can see which user this branch came from (axil43:add_gitignore). In either case, when using the fork and pull model, hitting the Compare & pull request button will take you to the compare page with slightly different options:

Since you are comparing across forks, there are more details. In particular, you can see the base fork and branch as well as the head fork and branch, which are the ones you are the owner of. GitHub considers the default branch set in your repository to be the one you want to merge into (base) when the Create Pull Request button appears. Before submitting it, let's explore the other two options that you can use to create a pull request. You can jump to the Submit a pull request section if you like.

Use the compare function directly

As mentioned in the previous section, the Compare & pull request button gets you to the compare page with some predefined values. The button appears right after you push a new branch and is there only for a few moments. In this section, we will see how to use the compare function directly in order to create a pull request. You can access the compare function by clicking on the green button next to the branch drop-down list on a repository's main page:

This is pretty powerful, as one can compare across forks or, in the same repository, pretty much everything—branches, tags, single commits, and time ranges. The default page when you land on the compare page is like the following one; you start by comparing your default branch, with GitHub proposing a list of recently created branches to choose from and compare:

In order to have something to compare, the base branch must be older than what you are comparing to. From here, if I choose the add_gitignore branch, GitHub compares it to master and shows the diff along with the message that it is able to be merged into the base branch without any conflicts. Finally, you can create the pull request:

Notice that I am using the compare function while I'm at my own repository.
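Incidentally, the compare page can also be reached directly by URL, without any button. Assuming a repository at github.com/<user>/<repo>, a URL of the following form opens the same base...head comparison — the user, repository, and branch names here are placeholders:

https://github.com/<user>/<repo>/compare/master...add_gitignore

This is handy when the quick button has already disappeared from the repository's main page.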
When comparing in a repository that is a fork of another, the compare function slightly changes and automatically includes more options as we have seen in the previous section. As you may have noticed the Compare & pull request quick button is just a shortcut for using compare manually. If you want to have more fine-grained control on the repositories and the branches compared, use the compare feature directly. Use the GitHub web editor So far, we have seen the two most well-known types of initiating a pull request. There is a third way as well: using entirely the web editor that GitHub provides. This can prove useful for people who are not too familiar with Git and the terminal, and can also be used by more advanced Git users who want to propose a quick change. As always, according to the model you are using (shared repository or fork and pull), the process is a little different. Let's first explore the shared repository model flow using the web editor, which means editing files in a repository that you own. The shared repository model Firstly, make sure you are on the branch that you wish to branch off; then, head over a file you wish to change and press the edit button with the pencil icon: Make the change you want in that file, add a proper commit message, and choose Create a new branch giving the name of the branch you wish to create. By default, the branch name is username-patch-i, where username is your username and i is an increasing integer starting from 1. Consecutive edits on files will create branches such as username-patch-1, username-patch-2, and so on. In our example, I decided to give the branch a name of my own: When ready, press the Propose file change button. From this moment on, the branch is created with the file edits you made. Even if you close the next page, your changes will not be lost. Let's skip the pull request submission for the time being and see how the fork and pull model works. The fork and pull model In the fork and pull model, you fork a repository and submit a pull request from the changes you make in your fork. In the case of using the web editor, there is a caveat. In order to get GitHub automatically recognize that you wish to perform a pull request in the parent repository, you have to start the web editor from the parent repository and not your fork. In the following screenshot, you can see what happens in this case: GitHub informs you that a new branch will be created in your repository (fork) with the new changes in order to submit a pull request. Hitting the Propose file change button will take you to the form to submit the pull request: Contrary to the shared repository model, you can now see the base/head repositories and branches that are compared. Also, notice that the default name for the new branch is patch-i, where i is an increasing integer number. In our case, this was the first branch created that way, so it was named patch-1. If you would like to have the ability to name the branch the way you like, you should follow the shared repository model instructions as explained in preceding section. Following that route, edit the file in your fork where you have write access, add your own branch name, hit the Propose file change button for the branch to be created, and then abort when asked to create the pull request. You can then use the Compare & pull request quick button or use the compare function directly to propose a pull request to the parent repository. 
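If you prefer to drive the whole fork and pull flow from the terminal rather than the web editor, it might look roughly like this — the repository URLs and the branch name are placeholders:

# Clone your fork and track the original project as "upstream"
git clone git@github.com:<you>/<repo>.git
cd <repo>
git remote add upstream https://github.com/<owner>/<repo>.git

# Branch off an up-to-date master, make your changes, and push to your fork
git fetch upstream
git checkout -b my_fix upstream/master
git add .
git commit -m 'Fix something small'
git push origin my_fix

After the push, the Compare & pull request quick button (or the compare page) lets you open the pull request against the parent repository.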
One last thing to consider when using the web editor, is the limitation of editing one file at a time. If you wish to include more changes in the same branch that GitHub created for you when you first edited a file, you must first change to that branch and then make any subsequent changes. How to change the branch? Simply choose it from the drop-down menu as shown in the following screenshot: Submit a pull request So far, we have explored the various ways to initiate a pull request. In this section, we will finally continue to submit it as well. The pull request form is identical to the form when creating a new issue. If you have write access to the repository that you are making the pull request to, then you are able to set labels, milestone, and assignee. The title of the pull request is automatically filled by the last commit message that the branch has, or if there are multiple commits, it will just fill in the branch name. In either case, you can change it to your liking. In the following image, you can see the title is taken from the branch name after GitHub has stripped the special characters. In a sense, the title gets humanized: You can add an optional description and images if you deem proper. Whenever ready, hit the Create pull request button. In the following sections, we will explore how the peer review works. Peer review and inline comments The nice thing about pull requests is that you have a nice and clear view of what is about to get merged. You can see only the changes that matter, and the best part is that you can fire up a discussion concerning those changes. In the previous section, we submitted the pull request so that it can be reviewed and eventually get merged. Suppose that we are collaborating with a team and they chime in to discuss the changes. Let's first check the layout of a pull request. Summary In this article, we explored the GitHub workflow and the various ways to perform a pull request, as well as the many features GitHub provides to make that workflow even smoother. This is how the majority of open source projects work when there are dozens of contributors involved. Resources for Article: Further resources on this subject: Git Teaches – Great Tools Don't Make Great Craftsmen[article] Maintaining Your GitLab Instance[article] Configuration [article]
Multithreading in Rust using Crates [Tutorial]

Aaron Lazar
15 Aug 2018
17 min read
The crates.io ecosystem in Rust can make use of approaches to improve our development speed as well as the performance of our code. In this tutorial, we'll learn how to use the crates ecosystem to manipulate threads in Rust. This article is an extract from Rust High Performance, authored by Iban Eguia Moraza. Using non-blocking data structures One of the issues we saw earlier was that if we wanted to share something more complex than an integer or a Boolean between threads and if we wanted to mutate it, we needed to use a Mutex. This is not entirely true, since one crate, Crossbeam, allows us to use great data structures that do not require locking a Mutex. They are therefore much faster and more efficient. Often, when we want to share information between threads, it's usually a list of tasks that we want to work on cooperatively. Other times, we want to create information in multiple threads and add it to a list of information. It's therefore not so usual for multiple threads to be working with exactly the same variables since as we have seen, that requires synchronization and it will be slow. This is where Crossbeam shows all its potential. Crossbeam gives us some multithreaded queues and stacks, where we can insert data and consume data from different threads. We can, in fact, have some threads doing an initial processing of the data and others performing a second phase of the processing. Let's see how we can use these features. First, add crossbeam to the dependencies of the crate in the Cargo.toml file. Then, we start with a simple example: extern crate crossbeam; use std::thread; use std::sync::Arc; use crossbeam::sync::MsQueue; fn main() { let queue = Arc::new(MsQueue::new()); let handles: Vec<_> = (1..6) .map(|_| { let t_queue = queue.clone(); thread::spawn(move || { for _ in 0..1_000_000 { t_queue.push(10); } }) }) .collect(); for handle in handles { handle.join().unwrap(); } let final_queue = Arc::try_unwrap(queue).unwrap(); let mut sum = 0; while let Some(i) = final_queue.try_pop() { sum += i; } println!("Final sum: {}", sum); } Let's first understand what this example does. It will iterate 1,000,000 times in 5 different threads, and each time it will push a 10 to a queue. Queues are FIFO lists, first input, first output. This means that the first number entered will be the first one to pop() and the last one will be the last to do so. In this case, all of them are a 10, so it doesn't matter. Once the threads finish populating the queue, we iterate over it and we add all the numbers. A simple computation should make you able to guess that if everything goes perfectly, the final number should be 50,000,000. If you run it, that will be the result, and that's not all. If you run it by executing cargo run --release, it will run blazingly fast. On my computer, it took about one second to complete. If you want, try to implement this code with the standard library Mutex and vector, and you will see that the performance difference is amazing. As you can see, we still needed to use an Arc to control the multiple references to the queue. This is needed because the queue itself cannot be duplicated and shared, it has no reference count. Crossbeam not only gives us FIFO queues. We also have LIFO stacks. LIFO comes from last input, first output, and it means that the last element you inserted in the stack will be the first one to pop(). 
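Before comparing the queue and the stack, here is a rough sketch of the Mutex-and-vector version mentioned above, in case you want to benchmark it against the MsQueue example yourself. This is not from the book; it is only a minimal equivalent for comparison:

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let queue = Arc::new(Mutex::new(Vec::new()));

    let handles: Vec<_> = (1..6)
        .map(|_| {
            let t_queue = queue.clone();
            thread::spawn(move || {
                for _ in 0..1_000_000 {
                    // Every push has to acquire the lock, which is exactly
                    // the overhead the lock-free MsQueue avoids.
                    t_queue.lock().unwrap().push(10u64);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let final_queue = Arc::try_unwrap(queue).unwrap().into_inner().unwrap();
    let sum: u64 = final_queue.iter().sum();
    println!("Final sum: {}", sum);
}

Running both versions with cargo run --release should make the difference plain: the result is the same 50,000,000, but this version spends much of its time waiting on the lock.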
Let's see the difference with a couple of threads: extern crate crossbeam; use std::thread; use std::sync::Arc; use std::time::Duration; use crossbeam::sync::{MsQueue, TreiberStack}; fn main() { let queue = Arc::new(MsQueue::new()); let stack = Arc::new(TreiberStack::new()); let in_queue = queue.clone(); let in_stack = stack.clone(); let in_handle = thread::spawn(move || { for i in 0..5 { in_queue.push(i); in_stack.push(i); println!("Pushed :D"); thread::sleep(Duration::from_millis(50)); } }); let mut final_queue = Vec::new(); let mut final_stack = Vec::new(); let mut last_q_failed = 0; let mut last_s_failed = 0; loop { // Get the queue match queue.try_pop() { Some(i) => { final_queue.push(i); last_q_failed = 0; println!("Something in the queue! :)"); } None => { println!("Nothing in the queue :("); last_q_failed += 1; } } // Get the stack match stack.try_pop() { Some(i) => { final_stack.push(i); last_s_failed = 0; println!("Something in the stack! :)"); } None => { println!("Nothing in the stack :("); last_s_failed += 1; } } // Check if we finished if last_q_failed > 1 && last_s_failed > 1 { break; } else if last_q_failed > 0 || last_s_failed > 0 { thread::sleep(Duration::from_millis(100)); } } in_handle.join().unwrap(); println!("Queue: {:?}", final_queue); println!("Stack: {:?}", final_stack); } As you can see in the code, we have two shared variables: a queue and a stack. The secondary thread will push new values to each of them, in the same order, from 0 to 4. Then, the main thread will try to get them back. It will loop indefinitely and use the try_pop() method. The pop() method can be used, but it will block the thread if the queue or the stack is empty. This will happen in any case once all values get popped since no new values are being added, so the try_pop() method will help not to block the main thread and end gracefully. The way it checks whether all the values were popped is by counting how many times it failed to pop a new value. Every time it fails, it will wait for 100 milliseconds, while the push thread only waits for 50 milliseconds between pushes. This means that if it tries to pop new values two times and there are no new values, the pusher thread has already finished. It will add values as they are popped to two vectors and then print the result. In the meantime, it will print messages about pushing and popping new values. You will understand this better by seeing the output: Note that the output can be different in your case, since threads don't need to be executed in any particular order. In this example output, as you can see, it first tries to get something from the queue and the stack but there is nothing there, so it sleeps. The second thread then starts pushing things, two numbers actually. After this, the queue and the stack will be [0, 1]. Then, it pops the first item from each of them. From the queue, it will pop the 0 and from the stack it will pop the 1 (the last one), leaving the queue as [1] and the stack as [0]. It will go back to sleep and the secondary thread will insert a 2 in each variable, leaving the queue as [1, 2] and the stack as [0, 2]. Then, the main thread will pop two elements from each of them. From the queue, it will pop the 1 and the 2, while from the stack it will pop the 2 and then the 0, leaving both empty. The main thread then goes to sleep, and for the next two tries, the secondary thread will push one element and the main thread will pop it, twice. 
It might seem a little bit complex, but the idea is that these queues and stacks can be used efficiently between threads without requiring a Mutex, and they accept any Send type. This means that they are great for complex computations, and even for multi-staged complex computations. The Crossbeam crate also has some helpers to deal with epochs and even some variants of the mentioned types. For multithreading, Crossbeam also adds a great utility: scoped threads. Scoped threads In all our examples, we have used standard library threads. As we have discussed, these threads have their own stack, so if we want to use variables that we created in the main thread we will need to send them to the thread. This means that we will need to use things such as Arc to share non-mutable data. Not only that, having their own stack means that they will also consume more memory and eventually make the system slower if they use too much. Crossbeam gives us some special threads that allow sharing stacks between them. They are called scoped threads. Using them is pretty simple and the crate documentation explains them perfectly; you will just need to create a Scope by calling crossbeam::scope(). You will need to pass a closure that receives the Scope. You can then call spawn() in that scope the same way you would do it in std::thread, but with one difference, you can share immutable variables among threads if they were created inside the scope or moved to it. This means that for the queues or stacks we just talked about, or for atomic data, you can simply call their methods without requiring an Arc! This will improve the performance even further. Let's see how it works with a simple example: extern crate crossbeam; fn main() { let all_nums: Vec<_> = (0..1_000_u64).into_iter().collect(); let mut results = Vec::new(); crossbeam::scope(|scope| { for num in &all_nums { results.push(scope.spawn(move || num * num + num * 5 + 250)); } }); let final_result: u64 = results.into_iter().map(|res| res.join()).sum(); println!("Final result: {}", final_result); } Let's see what this code does. It will first just create a vector with all the numbers from 0 to 1000. Then, for each of them, in a crossbeam scope, it will run one scoped thread per number and perform a supposedly complex computation. This is just an example, since it will just return a result of a simple second-order function. Interestingly enough, though, the scope.spawn() method allows returning a result of any type, which is great in our case. The code will add each result to a vector. This won't directly add the resulting number, since it will be executed in parallel. It will add a result guard, which we will be able to check outside the scope. Then, after all the threads run and return the results, the scope will end. We can now check all the results, which are guaranteed to be ready for us. For each of them, we just need to call join() and we will get the result. Then, we sum it up to check that they are actual results from the computation. This join() method can also be called inside the scope and get the results, but it will mean that if you do it inside the for loop, for example, you will block the loop until the result is generated, which is not efficient. The best thing is to at least run all the computations first and then start checking the results. If you want to perform more computations after them, you might find it useful to run the new computation in another loop or iterator inside the crossbeam scope. 
But, how does crossbeam allow you to use the variables outside the scope freely? Won't there be data races? Here is where the magic happens. The scope will join all the inner threads before exiting, which means that no further code will be executed in the main thread until all the scoped threads finish. This means that we can use the variables of the main thread, also called parent stack, due to the main thread being the parent of the scope in this case without any issue. We can actually check what is happening by using the println!() macro. If we remember from previous examples, printing to the console after spawning some threads would usually run even before the spawned threads, due to the time it takes to set them up. In this case, since we have crossbeam preventing it, we won't see it. Let's check the example: extern crate crossbeam; fn main() { let all_nums: Vec<_> = (0..10).into_iter().collect(); crossbeam::scope(|scope| { for num in all_nums { scope.spawn(move || { println!("Next number is {}", num); }); } }); println!("Main thread continues :)"); } If you run this code, you will see something similar to the following output: As you can see, scoped threads will run without any particular order. In this case, it will first run the 1, then the 0, then the 2, and so on. Your output will probably be different. The interesting thing, though, is that the main thread won't continue executing until all the threads have finished. Therefore, reading and modifying variables in the main thread is perfectly safe. There are two main performance advantages with this approach; Arc will require a call to malloc() to allocate memory in the heap, which will take time if it's a big structure and the memory is a bit full. Interestingly enough, that data is already in our stack, so if possible, we should try to avoid duplicating it in the heap. Moreover, the Arc will have a reference counter, as we saw. And it will even be an atomic reference counter, which means that every time we clone the reference, we will need to atomically increment the count. This takes time, even more than incrementing simple integers. Most of the time, we might be waiting for some expensive computations to run, and it would be great if they just gave all the results when finished. We can still add some more chained computations, using scoped threads, that will only be executed after the first ones finish, so we should use scoped threads more often than normal threads, if possible. Using thread pool So far, we have seen multiple ways of creating new threads and sharing information between them. Nevertheless, the ideal number of threads we should spawn to do all the work should be around the number of virtual processors in the system. This means we should not spawn one thread for each chunk of work. Nevertheless, controlling what work each thread does can be complex, since you have to make sure that all threads have work to do at any given point in time. Here is where thread pooling comes in handy. The Threadpool crate will enable you to iterate over all your work and for each of your small chunks, you can call something similar to a thread::spawn(). The interesting thing is that each task will be assigned to an idle thread, and no new thread will be created for each task. The number of threads is configurable and you can get the number of CPUs with other crates. Not only that, if one of the threads panics, it will automatically add a new one to the pool. 
To see an example, first, let's add threadpool and num_cpus as dependencies in our Cargo.toml file. Then, let's look at some example code:

extern crate num_cpus;
extern crate threadpool;

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use threadpool::ThreadPool;

fn main() {
    let pool = ThreadPool::with_name("my worker".to_owned(), num_cpus::get());
    println!("Pool threads: {}", pool.max_count());

    let result = Arc::new(AtomicUsize::new(0));

    for i in 0..1_000_000 {
        let t_result = result.clone();
        pool.execute(move || {
            t_result.fetch_add(i, Ordering::Relaxed);
        });
    }

    pool.join();

    let final_res = Arc::try_unwrap(result).unwrap().into_inner();
    println!("Final result: {}", final_res);
}

This code will create a thread pool with as many threads as your computer has logical CPUs. Then, it will add each number from 0 to 1,000,000 to an atomic usize, just to test parallel processing. Each addition will be performed by one thread. Doing this with one thread per operation (1,000,000 threads) would be really inefficient. In this case, though, it will use the appropriate number of threads, and the execution will be really fast.

There is another crate that gives thread pools an even more interesting parallel processing feature: Rayon.

Using parallel iterators

If you can see the big picture in these code examples, you'll have realized that most of the parallel work has a long loop, giving work to different threads. It happened with simple threads and it happens even more with scoped threads and thread pools. It's usually the case in real life, too. You might have a bunch of data to process, and you can probably separate that processing into chunks, iterate over them, and hand them over to various threads to do the work for you.

The main issue with that approach is that if you need to use multiple stages to process a given piece of data, you might end up with lots of boilerplate code that can make it difficult to maintain. Not only that, you might find yourself not using parallel processing sometimes due to the hassle of having to write all that code.

Luckily, Rayon has multiple data parallelism primitives around iterators that you can use to parallelize any iterative computation. You can almost forget about the Iterator trait and use Rayon's ParallelIterator alternative, which is as easy to use as the standard library trait!

Rayon uses a parallel iteration technique called work stealing. For each iteration of the parallel iterator, the new value or values get added to a queue of pending work. Then, when a thread finishes its work, it checks whether there is any pending work to do and, if there is, it starts processing it. This, in most languages, is a clear source of data races, but thanks to Rust, this is no longer an issue, and your algorithms can run extremely fast and in parallel.

Let's look at how to use it for an example similar to those we have seen in this chapter. First, add rayon to your Cargo.toml file and then let's start with the code:

extern crate rayon;

use rayon::prelude::*;

fn main() {
    let result = (0..1_000_000_u64)
        .into_par_iter()
        .map(|e| e * 2)
        .sum::<u64>();

    println!("Result: {}", result);
}

As you can see, this works just as you would write it with a sequential iterator, yet it's running in parallel.
Of course, running this example sequentially will be faster than running it in parallel thanks to compiler optimizations, but when you need to process data from files, for example, or perform very complex mathematical computations, parallelizing the input can give great performance gains. Rayon implements these parallel iteration traits to all standard library iterators and ranges. Not only that, it can also work with standard library collections, such as HashMap and Vec. In most cases, if you are using the iter() or into_iter() methods from the standard library in your code, you can simply use par_iter() or into_par_iter() in those calls and your code should now be parallel and work perfectly. But, beware, sometimes parallelizing something doesn't automatically improve its performance. Take into account that if you need to update some shared information between the threads, they will need to synchronize somehow, and you will lose performance. Therefore, multithreading is only great if workloads are completely independent and you can execute one without any dependency on the rest. If you found this article useful and would like to learn more such tips, head over to pick up this book, Rust High Performance, authored by Iban Eguia Moraza. Rust 1.28 is here with global allocators, nonZero types and more Java Multithreading: How to synchronize threads to implement critical sections and avoid race conditions Multithreading with Qt
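Before leaving Rayon, here is one more minimal sketch showing the iter()-to-par_iter() swap on an existing collection; the data and the closure are arbitrary examples, not taken from the book:

extern crate rayon;

use rayon::prelude::*;

fn main() {
    let words = vec!["parallel", "iterators", "borrow", "the", "collection"];
    // Same shape as words.iter().map(...).collect(), but the items are
    // processed across Rayon's thread pool.
    let lengths: Vec<usize> = words.par_iter().map(|w| w.len()).collect();
    println!("{:?}", lengths);
}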
Eric Evans at Domain-Driven Design Europe 2019 explains the different bounded context types and their relation with microservices

Bhagyashree R
17 Dec 2019
9 min read
The fourth edition of the Domain-Driven Design Europe 2019 conference was held early this year from Jan 31-Feb 1 at Amsterdam. Eric Evans, who is known for his book Domain-Driven Design: Tackling Complexity in Software kick-started the conference with a great talk titled "Language in Context". In his keynote, Evans explained some key Domain-driven design concepts including subdomains, context maps, and bounded context. He introduced some new concepts as well including bubble context, quaint context, patch on patch context, and more. He further talked about the relationship between the bounded context and microservices. Want to learn domain-driven design concepts in a practical way? Check out our book, Hands-On Domain-Driven Design with .NET Core by Alexey Zimarev. This book will guide you in involving business stakeholders when choosing the software you are planning to build for them. By figuring out the temporal nature of behavior-driven domain models, you will be able to build leaner, more agile, and modular systems. What is a bounded context? Domain-driven design is a software development approach that focuses on the business domain or the subject area. To solve problems related to that domain, we create domain models which are abstractions describing selected aspects of a domain. The terminology and concepts related to these models only make sense within a context. In domain-driven design, this is called bounded context.  Bounded context is one of the most important concepts in domain-driven design. Evans explained that bounded context is basically a boundary where we eliminate any kind of ambiguity. It is a part of the software where particular terms, definitions, and rules apply in a consistent way. Another important property of the bounded context is that a developer and other people in the team should be able to easily see that “boundary.” They should know whether they are inside or outside of the boundary.  Within this bounded context, we have a canonical context in which we explore different domain models, refine our language and develop ubiquitous language, and try to focus on the core domain. Evans says that though this is a very “tidy” way of creating software, this is not what we see in reality. “Nothing is that tidy! Certainly, none of the large software systems that I have ever been involved with,” he says. He further added that though the concept of bounded context has grabbed the interest of many within the community, it is often “misinterpreted.” Evans has noticed that teams often confuse between bounded context and subdomain. The reason behind this confusion is that in an “ideal” scenario they should coincide. Also, large corporations are known for reorganizations leading to changes in processes and responsibilities. This could result in two teams having to work in the same bounded contexts with an increased risk of ending up with a “big ball of mud.” The different ways of describing bounded contexts In their paper, Big Ball of Mud, Brian Foote and Joseph Yoder describe the big ball of mud as “a haphazardly structured, sprawling, sloppy, duct-tape and baling wire, spaghetti code jungle.” Some of the properties that Evans uses to describe it are incomprehensible interdependencies, inconsistent definitions, incomplete coverage, and risky to change. Needless to say, you would want to avoid the big ball of mud by all accounts. However, if you find yourself in such a situation, Evans says that building the system from the ground up is not an ideal solution. 
Instead, he suggests going for something called bubble context in which you create a new model that works well next to the already existing models. While the business is run by the big ball of mud, you can do an elegant design within that bubble. Another context that Evans explained was the mature productive context. It is the part of the software that is producing value but probably is built on concepts in the core domain that are outdated. He explained this particular context with an example of a garden. A “tidy young garden” that has been recently planted looks great, but you do not get much value from it. It is only a few months later when the plants start fruition and you get the harvest. Along similar lines, developers should plant seeds with the goal of creating order, but also embrace the chaotic abundance that comes with a mature system. Evans coined another term quaint context for a context that one would consider "legacy". He describes it as an old context that still does useful work but is implemented using old fashioned technology or is not aligned with the current domain vision. Another name he suggests is patch on patch context that also does something useful as it is, but its numerous interdependency “makes change risky and expensive.” Apart from these, there are many other types of context that we do not explicitly label. When you are establishing a boundary, it is good practice to analyze different subdomains and check the ones that are generic and ones that are specific to the business. Here he introduced the generic subdomain context. “Generic here means something that everybody does or a great range of businesses and so forth do. There’s nothing special about our business and we want to approach this is a conventional way. And to do that the best way I believe is to have a context, a boundary in which we address that problem,” he explains. Another generic context Evans mentioned was generic off the shelf (OTS), which can make setting the boundary easier as you are getting something off the shelf. Bounded context types in the microservice architecture Evans sees microservices as the biggest opportunity and risks the software engineering community has had in a long time. Looking at the hype around microservices it is tempting to jump on the bandwagon, but Evans suggests that it is important to see the capabilities microservices provide us to meet the needs of the business. A common misconception people have is that microservices are bounded context, which Evans calls oversimplification. He further shared four kinds of context that involve microservices: Service internal The first one is service internal that describes how a service actually works. Evans believes that this is the type of context that people think of when they say microservice is a bounded context. In this context, a service is isolated from other services and handled by an autonomous team. Though this definitely fits the definition of a bounded context, it is not the only aspect of microservices, Evans notes. If we only use this type, we would end up with a bunch of services that don't know how to interact with each other.  API of Service  The API of service context describes how a service talks to other services. In this context as well, an API is built by an autonomous team and anyone consuming their API is required to conform to them. This implies that all the development decisions are pretty much dictated by the data flow direction, however, Evans think there are other alternatives. 
Highly influential groups may create an API that other teams must conform to irrespective of the direction data is flowing. Cluster of codesigned services The cluster of codesigned services context refers to the cluster of services designed in close collaboration. Here, the bounded context consists of a cluster of services designed to work with each other to accomplish some tasks. Evans remarks that the internals of the individual services could be very different from the models used in the API. Interchange context The final type is interchange context. According to Evans, the interaction between services must also be modeled. The model will describe messages and definitions to use when services interact with other services. He further notes that there are no services in this context as it is all about messages, schemas, and protocols. How legacy systems can participate in microservices architecture Coming back to legacy systems and how they can participate in a microservices environment, Evans introduced a new concept called Exposed Legacy Asset. He suggests creating an interface that looks like a microservice and interacts with other microservices, but internally interacts with a legacy system. This will help us avoid corrupting the new microservices built and also keeps us from having to change the legacy system. In the end, looking back at 15 years of his book, Domain-Driven Design, he said that we now may need a new definition of domain-driven design. A challenge that he sees is how tight this definition should be. He believes that a definition should share a common vision and language, but also be flexible enough to encourage innovation and improvement. He doesn’t want the domain-driven design to become a club of happy members. He instead hopes for an intellectually honest community of practitioners who are “open to the possibility of being wrong about things.” If you tried to take the domain-driven design route and you failed at some point, it is important to question and reexamine. Finally, he summarized by defining domain-driven design as a set of guiding principles and heuristics. The key principles are focussing on the core domain, exploring models in a creative collaboration of domain experts and software experts, and speaking a ubiquitous language within a bounded context. [box type="shadow" align="" class="" width=""] “Let's practice DDD together, shake it up and renew,” he concludes. [/box] If you want to put these and other domain-driven design principles into practice, grab a copy of our book, Hands-On Domain-Driven Design with .NET Core by Alexey Zimarev. This book will help you discover and resolve domain complexity together with business stakeholders and avoid common pitfalls when creating the domain model. You will further study the concept of bounded context and aggregate, and much more. Gabriel Baptista on how to build high-performance software architecture systems with C# and .Net Core You can now use WebAssembly from .NET with Wasmtime! Exploring .Net Core 3.0 components with Mark J. Price, a Microsoft specialist
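To make the Exposed Legacy Asset idea slightly more concrete, here is a small, purely illustrative sketch. Evans showed no code in the talk, so the Java language choice, the class names, and the shape of the legacy call are all assumptions: the point is only that a thin facade exposes a microservice-style interface while delegating internally to the old system.

// A hypothetical interchange-model ticket that other services understand.
class Ticket {
    final String id;
    final String status;
    Ticket(String id, String status) { this.id = id; this.status = status; }
}

// Stand-in for the legacy system's awkward, record-oriented API.
class LegacyCrm {
    String fetchRawRecord(String id) { return id + "|OPEN|2019-02-01"; }
}

// The Exposed Legacy Asset: looks like any other service to its callers,
// but translates between the interchange model and the legacy format,
// so the new microservices never see the legacy representation.
class ExposedLegacyTickets {
    private final LegacyCrm legacy = new LegacyCrm();

    Ticket byId(String id) {
        String[] fields = legacy.fetchRawRecord(id).split("\\|");
        return new Ticket(fields[0], fields[1]);
    }

    public static void main(String[] args) {
        Ticket t = new ExposedLegacyTickets().byId("T-42");
        System.out.println(t.id + " -> " + t.status);
    }
}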
15 things every BI professional should know about Tableau

Fatema Patrawala
17 Dec 2019
8 min read
“The art and practice of visualizing data is becoming ever more important in bridging the human-computer gap to mediate analytical insight in a meaningful way.” ―Edd Dumbill

Tableau is a powerful data visualization and discovery tool. It is an important part of a data analyst's or data scientist's skill set, with many organizations specifying it as a key skill in job adverts. In this article, we'll take a look at a few things in Tableau you need to know to successfully make a mark in your business intelligence career.

While the architecture of traditional BI tools has hardware limitations, Tableau has no such dependencies: it can function independently and requires minimal hardware support. Traditional tools are based on a complex set of technologies, whereas Tableau is based on associative search technology, making it intuitive, fast, and dynamic. Tableau supports in-memory, multi-thread, and multi-core computing and other advanced capabilities that traditional BI tools do not offer.

Various Tableau products

Tableau Desktop is a self-service business analytics and data visualization suite that anyone can use. With Tableau Desktop, you can extract massive data offline from your data warehouse for live, up-to-date data analysis.

Tableau Online / Tableau Server is an online hosting platform designed for enterprise users. It lets users working in Tableau publish and share dashboards across organizations and teams.

Tableau Reader is a free desktop application that enables you to open and view visualizations that are built in Tableau Desktop.

Tableau Public is free Tableau software which you can use to make visualizations, but you will need to save your workbook or worksheets to Tableau Public for anyone else to view them.

Different data types in Tableau

All fields in a data source have a data type. The data type reflects the kind of information stored in that field, for example integers (410), dates (1/23/2015) and strings (“Wisconsin”). The data type of a field is identified in the Data pane by an icon. The data types are (source: Tableau website):

- Text (string) values
- Date values
- Date & Time values
- Numerical values
- Boolean values (relational only), for example True/False
- Geographic values (used with maps)
- Cluster Group

Measures and Dimensions in Tableau

Measures contain numeric, quantitative values that you can measure. Measures can be aggregated. When you drag a measure into the view, Tableau applies an aggregation to that measure (by default). Dimensions, on the other hand, contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal the details in your data. Dimensions affect the level of detail in the view.

Ways to connect data in Tableau

We can either connect live to a data set or extract data into Tableau.

Live: Connecting live to a data set leverages its computational processing and storage. New queries will go to the database and will be reflected as new or updated within the data.

Extract: The Extract API allows you to programmatically extract and combine any data sources for use in Tableau.

There can be multiple data source connections to different sources in the same workbook. Each connection will show up under the Data tab on the left sidebar. The benefit of a Tableau extract over a live connection is that the extract can be used anywhere without a connection, and you can build your own visualizations without connecting to the database.
You can read a complete section on how to extract data in Tableau in the book Learning Tableau 2019 - Third Edition, written by Joshua Milligan. The book takes you from the foundations of the Tableau 2019 paradigm through to advanced topics.

Joins and Blends in Tableau

Joining tables and blending data sources are two different ways to link related data together in Tableau. Joins are performed to link tables of data together on a row-by-row basis. Blends are performed to link together multiple data sources at an aggregate level.

Different filters in Tableau and the use cases in which each is most relevant

In Tableau, filters are used to restrict the data coming from the database. Often, you will want to filter data in Tableau in order to perform an analysis on a subset of data, narrow your focus, or drill into detail. Tableau offers multiple ways to filter data. If you want to limit the scope of your analysis to a subset of data, you can filter the data at the source using one of the following techniques:

Data Source Filters are applied before all other filters and are useful when you want to limit your analysis to a subset of data.

Extract Filters limit the data that is stored in an extract (.tde or .hyper). Data source filters are often converted into extract filters if they are present when you extract the data.

Custom SQL Filters can be accomplished using a live connection with custom SQL, which has a Tableau parameter in the WHERE clause.

Dual axis in Tableau

Dual axis is a useful feature in Tableau that lets users view the scales of two measures in the same graph. Many websites, such as Indeed.com, make use of a dual axis to show the comparison between two measures and their growth rates over a specific set of years. A dual axis lets you compare multiple measures at once, with two independent axes layered on top of one another.

Key components of a Tableau Dashboard

Horizontal – Horizontal layout containers allow the designer to group worksheets and dashboard components left to right across the page and edit the height of all elements at once.

Vertical – Vertical containers allow the user to group worksheets and dashboard components top to bottom down the page and edit the width of all elements at once.

Text – All textual fields.

Image Extract – A Tableau workbook is stored in XML format; in order to extract images, Tableau applies codes within that XML to reference the image.

Web [URL ACTION] – A URL action is a hyperlink that points to a web page, file, or other web-based resource outside of Tableau. You can use URL actions to link to more information about your data that may be hosted outside of your data source. To make the link relevant to your data, you can substitute field values of a selection into the URL as parameters.

If you want to learn how to design dashboards in Tableau, the book Learning Tableau 2019 gives you a step-by-step process for designing dashboards.

Why automate reports in Tableau

Once you have automated reporting, you'll have time to spend on innovative projects. What can be done manually could be performed by automation, delivering the same results in a fraction of the time. Reducing such time-consuming and repetitive tasks will make you more productive and more efficient.

What is a story in Tableau? Why would you create a story, and what is it used for?
A story is a sheet that contains a sequence of worksheets or dashboards that work together to convey information. You can create stories to show how facts are connected, provide context, demonstrate how decisions relate to outcomes, or simply make a compelling case. Each individual sheet in a story is called a story point. The primary objective of creating stories in Tableau is to communicate data to a certain audience with an intended result.  How can you create stories in Tableau? There is a feature in Tableau named as Stories that allows you to tell a story using interactive snapshots of dashboards and views. The snapshots become points in a story. This allows you to construct guided narrative or even an entire presentation. Read this chapter, ‘Telling a Data Story with Dashboards’ from this book, Learning Tableau 2019, to create insightful dashboards in Tableau.    How to embed views into Webpages? You can embed interactive Tableau views and dashboards into web pages, blogs, wiki pages, web applications, and intranet portals. Embedded views update as the underlying data changes, or as their workbooks are updated on Tableau Server. Embedded views follow the same licensing and permission restrictions used on Tableau Server. That is, to see a Tableau view that’s embedded in a web page, the person accessing the view must also have an account on Tableau Server. Alternatively, if your organization uses a core-based license on Tableau Server, a Guest account is available. This allows people in your organization to view and interact with Tableau views embedded in web pages without having to sign in to the server. Contact your server or site administrator to find out if the Guest user is enabled for the site you publish to.  What is Tableau Prep? Can we clean messy data with Tableau? Tableau Prep extends the Tableau platform with robust options for cleaning and structuring data for analysis in Tableau. In the same way that Tableau Desktop provides a hands-on, visual experience for visualizing and analyzing data, Tableau Prep provides a hands-on, visual experience for cleaning and shaping data. If you wish to know more about Tableau Prep or how to clean messy data to create powerful data visualizations and unlock intelligent business insights, read this book Learning Tableau 2019, written by Joshua N. Milligan. ‘Tableau Day’ highlights: Augmented Analytics, Tableau Prep Builder and Conductor, and more! Alteryx vs. Tableau: Choosing the right data analytics tool for your business How to do data storytelling well with Tableau [Video]
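Coming back to the embedding question answered above: a minimal embed using the Tableau JavaScript API might look roughly like the following. The server URL, view path, and element ID are placeholders, and the exact API version available on your server may differ:

<div id="vizContainer"></div>
<script src="https://YOUR-TABLEAU-SERVER/javascripts/api/tableau-2.min.js"></script>
<script>
  // Point the Viz object at a published view; viewers still need permission
  // on Tableau Server (or the Guest account) to actually see it.
  var viz = new tableau.Viz(
    document.getElementById("vizContainer"),
    "https://YOUR-TABLEAU-SERVER/views/Workbook/SheetName",
    { hideTabs: true, width: "800px", height: "600px" }
  );
</script>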
How to integrate a Medium editor in Angular 8

Guest Contributor
05 Sep 2019
5 min read
In the world of text editing, there is a new era of WYSIWYG (What You See Is What You Get). We all know how important styling and formatting are for your website, but most of the time it is tough to pick a simple, easy-to-use, and powerful editor. The good days are coming back with the Medium Editor!

Medium Editor is an independent JavaScript library that creates a floating toolbar which pops up when you select a piece of content on your page, inspired by the elegance of Medium.com. You can turn every field, from a message on your contact form to a whole article on the back-end, into professionally styled text containing quote blocks, headings, and hyperlinks, or just a few selected words. You can also incorporate the editor into Angular 8 for easy updates and edits of your content.

Angular 8 has released its latest feature set, beta 6, with attractive new functionality for testing your software and fixing bugs. One of them is Bazel, Google's open-source part of its internal build tool called Blaze, which is capable of performing incremental builds and tests.

Let us check how you can integrate a Medium editor in the Angular 8 platform.

Also Read: Angular CLI 8.3.0 releases with a new deploy command, faster production builds, and more

Steps to create an editor using Angular 8

Step 1: First things first, create a project with the Angular CLI; you can also make use of Bootstrap to make it look good by adding its CDN links in index.html. Once the command has finished installing all the dependencies, it generates an Angular starter application.

Step 2: Install the medium-editor npm package, and then include its CSS and JS files in the angular.json file.

Step 3: Create a component with a name of your choice (the post uses one named create).

Step 4: Open the newly created component's HTML file and add a div, giving it a template reference name; a few Bootstrap classes will give it some basic styling.

Step 5: In the component class, create an editor variable and use a view child property to get a reference to that div.

Step 6: Next, we will make use of one of Angular's lifecycle hooks, ngAfterViewInit, to instantiate the editor. At this point you may get an error about MediumEditor not being defined; to fix it, declare it at the top of the file. After making these changes, you have a small Medium editor to use for yourself: write anything, then select the text you have written to see the magic.

Step 7: After this, you may want some more options in your editor toolbar. To get them, pass a configuration object to the MediumEditor constructor. With those changes, you will see plenty of additional options.

Step 8: Now that you have an editor, you can easily get the data from it. If someone writes a post, you need the HTML of that post. Divide the screen into two parts: one half holds the editor and the other half shows a preview of the post. In the preview half, bind the editor's content to the element's innerHTML.
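For reference, steps 3 to 8 might come together in a component roughly like the one below. The original post shows its code as screenshots, so everything here — the selector, the toolbar buttons, and the two-pane binding — is an illustrative reconstruction rather than the author's exact code.

// create.component.ts — assumes medium-editor's JS and CSS are listed in angular.json
import { Component, ElementRef, ViewChild, AfterViewInit } from '@angular/core';

// Loaded globally through the scripts array in angular.json (step 2).
declare var MediumEditor: any;

@Component({
  selector: 'app-create',
  templateUrl: './create.component.html',
})
export class CreateComponent implements AfterViewInit {
  @ViewChild('editable', { static: true }) editable: ElementRef;
  editor: any;
  content = '';

  ngAfterViewInit() {
    // Configuration object passed to the MediumEditor constructor (step 7).
    this.editor = new MediumEditor(this.editable.nativeElement, {
      toolbar: {
        buttons: ['bold', 'italic', 'underline', 'anchor', 'h2', 'h3', 'quote'],
      },
    });
    // Keep the preview pane (step 8) in sync with the editor's HTML.
    this.editor.subscribe('editableInput', () => {
      this.content = this.editable.nativeElement.innerHTML;
    });
  }
}

create.component.html — editor on the left, live preview on the right:

<div class="row">
  <div class="col-6">
    <div #editable class="border p-3"></div>
  </div>
  <div class="col-6" [innerHTML]="content"></div>
</div>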
Wrap Up

Every framework has its pros and cons, and Angular is no exception. Angular offers clean code development in a high-performance framework that handles routing, provides seamless updates through its command-line interface, and can retrieve the state of location services. You can also debug templates in Angular 8, and it supports multiple applications in one domain. On the other hand, Angular can be confusing for newcomers, as there is no single accurate manual with complete documentation of the framework; the developer community is comparatively small, debugging options are limited, and routing has its own constraints. Still, Angular 8 supports multiple applications in one domain and is user-friendly across operating system versions.

So, here we come to the end of the article. We hope you have gained an understanding of how to integrate the latest Medium editor into Angular 8. Do give it a try! Till then, keep learning!

Author Bio

Dave Jarvis is working as a Business Development Executive at eTatvaSoft.com, an enterprise-level mobile & web application development company. He aims to sharpen his analytical skills, deepen his data understanding and broaden his business knowledge over the coming years of his career. Click here to find more information about the company. Follow him on Twitter.

Other interesting news in Web development

Google Chrome 76 now supports native lazy-loading

Laravel 6.0 releases with Laravel Vapor compatibility, LazyCollection, improved authorization response and more

#Reactgate forces React leaders to confront the community's toxic culture head on


How to build and deploy Microservices using Payara Micro

Gebin George
28 Mar 2018
9 min read
Payara Micro offers a new way to run Java EE or microservice applications. It is based on the Web profile of GlassFish and bundles a few additional APIs. The distribution is designed with modern containerized environments in mind. Payara Micro is available to download as a standalone executable JAR, as well as a Docker image. It's an open source MicroProfile compatible runtime. Today, we will learn to use Payara Micro to build and deploy microservices.

Here's a list of APIs that are supported in Payara Micro:

Servlets, JSTL, EL, and JSPs
WebSockets
JSF
JAX-RS
EJB lite
JTA
JPA
Bean Validation
CDI
Interceptors
JBatch
Concurrency
JCache

We will be exploring how to build our services using Payara Micro in the next section.

Building services with Payara Micro

Let's start building parts of our Issue Management System (IMS), which is going to be a one-stop destination for collaboration among teams. As the name implies, this system will be used for managing issues that are raised as tickets and get assigned to users for resolution. To begin the project, we will identify our microservice candidates based on the business model of IMS. Here, let's define three functional services, which will be hosted in their own independent Git repositories:

ims-micro-users
ims-micro-tasks
ims-micro-notify

You might wonder, why these three and why separate repositories? We could create much more fine-grained services and perhaps it wouldn't be wrong to do so. The answer lies in understanding the following points:

Isolating what varies: We need to be able to independently develop and deploy each unit. Changes to one business capability or domain shouldn't require changes in other services more often than desired.

Organisation or team structure: If you define teams by business capability, then they can work independently of others and release features with greater agility. The tasks team should be able to evolve independently of the teams that are handling users or notifications. The functional boundaries should allow independent version and release cycle management.

Transactional boundaries for consistency: Distributed transactions are not easy; creating services for related features that are too fine-grained can lead to more complexity than desired. You would need to become familiar with concepts like eventual consistency, but these are not easy to achieve in practice.

Source repository per service: Setting up a single repository that hosts all the services is ideal when it's the same team that works on these services and the project is relatively small. But we are building our fictional IMS, which is a large, complex system with many moving parts. Separate teams would get tightly coupled by sharing a repository. Moreover, versioning and tagging of releases will be yet another problem to solve.

The projects are created as standard Java EE projects, which are Skinny WARs, that will be deployed using the Payara Micro server. Payara Micro allows us to delay the decision of using a Fat JAR or Skinny WAR. This gives us flexibility in picking the deployment choice at a later stage.
As Maven is a widely adopted build tool among developers, we will use it to create our example projects, using the following commands:

mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-users -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-tasks -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

mvn archetype:generate -DgroupId=org.jee8ng -DartifactId=ims-micro-notify -DarchetypeArtifactId=maven-archetype-webapp -DinteractiveMode=false

Once the structure is generated, update the properties and dependencies section of pom.xml with the following contents, for all three projects:

<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
  <failOnMissingWebXml>false</failOnMissingWebXml>
</properties>
<dependencies>
  <dependency>
    <groupId>javax</groupId>
    <artifactId>javaee-api</artifactId>
    <version>8.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.12</version>
    <scope>test</scope>
  </dependency>
</dependencies>

Next, create a beans.xml file under the WEB-INF folder for all three projects:

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
       http://xmlns.jcp.org/xml/ns/javaee/beans_2_0.xsd"
       bean-discovery-mode="all">
</beans>

You can delete the index.jsp and web.xml files, as we won't be needing them. The project structure of ims-micro-users is used as the reference; the same structure applies to ims-micro-tasks and ims-micro-notify.

The package names for the users, tasks, and notify services will be as follows:

org.jee8ng.ims.users (inside ims-micro-users)
org.jee8ng.ims.tasks (inside ims-micro-tasks)
org.jee8ng.ims.notify (inside ims-micro-notify)

Each of the above will in turn have sub-packages called boundary, control, and entity. The structure follows the Boundary-Control-Entity (BCE)/Entity-Control-Boundary (ECB) pattern. The JaxrsActivator shown as follows is required to enable the JAX-RS API and thus needs to be placed in each of the projects:

import javax.ws.rs.ApplicationPath;
import javax.ws.rs.core.Application;

@ApplicationPath("resources")
public class JaxrsActivator extends Application {}

All three projects will have REST endpoints that we can invoke over HTTP. When doing RESTful API design, a popular convention is to use plural names for resources, especially if the resource could represent a collection. For example:

/users
/tasks

The resource class names in the projects use the plural form, as it's consistent with the resource URL naming used. This avoids confusion such as a resource URL being called a users resource while the class is named UserResource. Given that this is an opinionated approach, feel free to use singular class names if desired. Here's the relevant code for the ims-micro-users, ims-micro-tasks, and ims-micro-notify projects respectively.
Under ims-micro-users, define the UsersResource endpoint:

package org.jee8ng.ims.users.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("users")
public class UsersResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("user works").build();
  }
}

Under ims-micro-tasks, define the TasksResource endpoint:

package org.jee8ng.ims.tasks.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("tasks")
public class TasksResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("task works").build();
  }
}

Under ims-micro-notify, define the NotificationsResource endpoint:

package org.jee8ng.ims.notify.boundary;

import javax.ws.rs.*;
import javax.ws.rs.core.*;

@Path("notifications")
public class NotificationsResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public Response get() {
    return Response.ok("notification works").build();
  }
}

Once you build all three projects using mvn clean install, you will get your Skinny WAR files generated in the target directory, which can be deployed on the Payara Micro server.

Running services with Payara Micro

Download the Payara Micro server if you haven't already, from this link: https://www.payara.fish/downloads. The micro server will have the name payara-micro-xxx.jar, where xxx is the version number, which might be different when you download the file. Here's how you can start Payara Micro with our services deployed locally. When doing so, we need to ensure that the instances start on different ports, to avoid any port conflicts:

>java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-micro-users.war --port 8081
>java -jar payara-micro-xxx.jar --deploy ims-micro-tasks/target/ims-micro-tasks.war --port 8082
>java -jar payara-micro-xxx.jar --deploy ims-micro-notify/target/ims-micro-notify.war --port 8083

This will start three instances of Payara Micro running on the specified ports. This makes our applications available under these URLs:

http://localhost:8081/ims-micro-users/resources/users/
http://localhost:8082/ims-micro-tasks/resources/tasks/
http://localhost:8083/ims-micro-notify/resources/notifications/

Payara Micro can be started on a non-default port by using the --port parameter, as we did earlier. This is useful when running multiple instances on the same machine. Another option is to use the --autoBindHttp parameter, which will attempt to bind on 8080 as the default port, and if that port is unavailable, it will try to bind on the next port up, repeating until it finds an available port.

Uber JAR option: There's one more feature that Payara Micro provides. We can generate an Uber JAR as well, which would be the Fat JAR approach that we learnt about in the Fat JAR section. To package our ims-micro-users project as an Uber JAR, we can run the following command:

java -jar payara-micro-xxx.jar --deploy ims-micro-users/target/ims-micro-users.war --outputUberJar users.jar

This will generate the users.jar file in the directory where you run this command. The size of this JAR will naturally be larger than our WAR file, since it also bundles the Payara Micro runtime in it. Here's how you can start the application using the generated JAR:

java -jar users.jar

The server parameters that we used earlier can be passed to this runnable JAR file too. Apart from the two choices we saw for running our microservice projects, there's a third option as well.
Payara Micro provides an API-based approach, which can be used to programmatically start the embedded server. We will expand upon these three services as we progress further into the realm of cloud-based Java EE. We saw how to leverage the power of Payara Micro to run Java EE or microservice applications. You read an excerpt from the book Java EE 8 and Angular, written by Prashant Padmanabhan. This book helps you build high-performing enterprise applications using Java EE powered by Angular at the frontend.

Using the Registry and xlsxwriter modules

Packt
14 Apr 2016
12 min read
In this article by Chapin Bryce and Preston Miller, the authors of Learning Python for Forensics, we will learn about the features offered by the Registry and xlswriter modules. (For more resources related to this topic, see here.) Working with the Registry module The Registry module, developed by Willi Ballenthin, can be used to obtain keys and values from registry hives. Python provides a built-in registry module called _winreg; however, this module only works on Windows machines. The _winreg module interacts with the registry on the system running the module. It does not support opening external registry hives. The Registry module allows us to interact with the supplied registry hives and can be run on non-Windows machines. The Registry module can be downloaded from https://github.com/williballenthin/python-registry. Click on the releases section to see a list of all the stable versions and download the latest version. For this article, we use version 1.1.0. Once the archived file is downloaded and extracted, we can run the included setup.py file to install the module. In a command prompt, execute the following code in the module's top-level directory as shown: python setup.py install This should install the Registry module successfully on your machine. We can confirm this by opening the Python interactive prompt and typing import Registry. We will receive an error if the module is not installed successfully. With the Registry module installed, let's begin to learn how we can leverage this module for our needs. First, we need to import the Registry class from the Registry module. Then, we use the Registry function to open the registry object that we want to query. Next, we use the open() method to navigate to our key of interest. In this case, we are interested in the RecentDocs registry key. This key contains recent active files separated by extension as shown: >>> from Registry import Registry >>> reg = Registry.Registry('NTUSER.DAT') >>> recent_docs = reg.open('SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\RecentDocs') If we print therecent_docs variable, we can see that it contains 11 values with five subkeys, which may contain additional values and subkeys. Additionally, we can use thetimestamp() method to see the last written time of the registry key. >>> print recent_docs Registry Key CMI-CreateHive{B01E557D-7818-4BA7-9885-E6592398B44E}SoftwareMicrosoftWindowsCurrentVersionExplorerRecentDocs with 11 values and 5 subkeys >>> print recent_docs.timestamp() # Last Written Time 2012-04-23 09:34:12.099998 We can iterate over the values in the recent_docs key using the values() function in a for loop. For each value, we can access the name(), value(), raw_data(), value_type(), and value_type_str() methods. The value() and raw_data() represent the data in different ways. We will use the raw_data() function when we want to work with the underlying binary data and use the value() function to gather an interpreted result. The value_type() and value_type_str() functions display a number or string that identify the type of data, such as REG_BINARY, REG_DWORD, REG_SZ, and so on. >>> for i, value in enumerate(recent_docs.values()): ... print '{}) {}: {}'.format(i, value.name(), value.value()) ... 0) MRUListEx: ???? 1) 0: myDocument.docx 2) 4: oldArchive.zip 3) 2: Salaries.xlsx ... Another useful feature of the Registry module is the means provided for querying for a certain subkey or value. This is provided by the subkey(), value(), or find_key() functions. 
A RegistryKeyNotFoundException is generated when a subkey is not present while using the subkey() function: >>> if recent_docs.subkey('.docx'): ... print 'Found docx subkey.' ... Found docx subkey. >>> if recent_docs.subkey('.1234abcd'): ... print 'Found 1234abcd subkey.' ... Registry.Registry.RegistryKeyNotFoundException: ... The find_key() function takes a path and can find a subkey through multiple levels. The subkey() and value() functions only search child elements. We can use these functions to confirm that a key or value exists before trying to navigate to them. If a particular key or value cannot be found, a custom exception from the Registry module is raised. Be sure to add error handling to catch this error and also alert the user that the key was not discovered. With the Registry module, finding keys and their values becomes straightforward. However, when the values are not strings and are instead binary data we have to rely on another module to make sense of the mess. For all binary needs, the struct module is an excellent candidate. Read also: Tools for Working with Excel and Python Creating Spreadsheets with the xlsxwriter Module Xlsxwriter is a useful third-party module that writes Excel output. There are a plethora of Excel-supported modules for Python, but we chose this module because it was highly robust and well-documented. As the name suggests, this module can only be used to write Excel spreadsheets. The xlsxwriter module supports cell and conditional formatting, charts, tables, filters, and macros among others. Adding data to a spreadsheet Let's quickly create a script called simplexlsx.v1.py for this example. On lines 1 and 2 we import the xlsxwriter and datetime modules. The data we are going to be plotting, including the header column is stored as nested lists in the school_data variable. Each list is a row of information that we want to store in the output excel sheet, with the first element containing the column names. 001 import xlsxwriter 002 from datetime import datetime 003 004 school_data = [['Department', 'Students', 'Cumulative GPA', 'Final Date'], 005 ['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)], 006 ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)], 007 ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)], 008 ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]] The writeXLSX() function, defined on line 11, is responsible for writing our data in to a spreadsheet. First, we must create our Excel spreadsheet using the Workbook() function supplying the desired name of the file. On line 13, we create a worksheet using the add_worksheet() function. This function can take the desired title of the worksheet or use the default name 'Sheet N', where N is the specific sheet number. 011 def writeXLSX(data): 012 workbook = xlsxwriter.Workbook('MyWorkbook.xlsx') 013 main_sheet = workbook.add_worksheet('MySheet') The date_format variable stores a custom number format that we will use to display our datetime objects in the desired format. On line 17, we begin to enumerate through our data to write. The conditional on line 18 is used to handle the header column which is the first list encountered. We use the write() function and supply a numerical row and column. Alternatively, we can also use the Excel notation, i.e. A1. 
015 date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'}) 016 017 for i, entry in enumerate(data): 018 if i == 0: 019 main_sheet.write(i, 0, entry[0]) 020 main_sheet.write(i, 1, entry[1]) 021 main_sheet.write(i, 2, entry[2]) 022 main_sheet.write(i, 3, entry[3]) The write() method will try to write the appropriate type for an object when it can detect the type. However, we can use different write methods to specify the correct format. These specialized writers preserve the data type in Excel so that we can use the appropriate data type specific Excel functions for the object. Since we know the data types within the entry list, we can manually specify when to use the general write() function or the specific write_number() function. 023 else: 024 main_sheet.write(i, 0, entry[0]) 025 main_sheet.write_number(i, 1, entry[1]) 026 main_sheet.write_number(i, 2, entry[2]) For the fourth entry in the list, thedatetime object, we supply the write_datetime() function with our date_format defined on line 15. After our data is written to the workbook, we use the close() function to close and save our data. On line 32, we call the writeXLSX() function passing it to the school_data list we built earlier. 027 main_sheet.write_datetime(i, 3, entry[3], date_format) 028 029 workbook.close() 030 031 032 writeXLSX(school_data) A table of write functions and the objects they preserve is presented below. Function Supported Objects write_string str write_number int, float, long write_datetime datetime objects write_boolean bool write_url str When the script is invoked at the Command Line, a spreadsheet called MyWorkbook.xlsx is created. When we convert this to a table, we can sort it according to any of our values. Had we failed to preserve the data types values such as our dates might be identified as non-number types and prevent us from sorting them appropriately. Building a table Being able to write data to an Excel file and preserve the object type is a step-up over CSV, but we can do better. Often, the first thing an examiner will do with an Excel spreadsheet is convert the data into a table and begin the frenzy of sorting and filtering. We can convert our data range to a table. In fact, writing a table with xlsxwriter is arguably easier than writing each row individually. The following code will be saved into the file simplexlsx.v2.py. For this iteration, we have removed the initial list in the school_data variable that contained the header information. Our new writeXLSX() function writes the header separately. 004 school_data = [['Computer Science', 235, 3.44, datetime(2015, 07, 23, 18, 00, 00)], 005 ['Chemistry', 201, 3.26, datetime(2015, 07, 25, 9, 30, 00)], 006 ['Forensics', 99, 3.8, datetime(2015, 07, 23, 9, 30, 00)], 007 ['Astronomy', 115, 3.21, datetime(2015, 07, 19, 15, 30, 00)]] Lines 10 through 14 are identical to the previous iteration of the function. Representing our table on the spreadsheet is accomplished on line 16. 010 def writeXLSX(data): 011 workbook = xlsxwriter.Workbook('MyWorkbook.xlsx') 012 main_sheet = workbook.add_worksheet('MySheet') 013 014 date_format = workbook.add_format({'num_format': 'mm/dd/yy hh:mm:ss AM/PM'}) The add_table() function takes multiple arguments. First, we pass a string representing the top-left and bottom-right cells of the table in Excel notation. We use the length variable, defined on line 15, to calculate the necessary length of our table. 
The second argument is a little more confusing; this is a dictionary with two keys, named data and columns. The data key has a value of our data variable, which is perhaps poorly named in this case. The columns key defines each row header and, optionally, its format, as seen on line 19:

015 length = str(len(data) + 1)
016 main_sheet.add_table(('A1:D' + length), {'data': data,
017 'columns': [{'header': 'Department'}, {'header': 'Students'},
018 {'header': 'Cumulative GPA'},
019 {'header': 'Final Date', 'format': date_format}]})
020 workbook.close()

In fewer lines than the previous example, we've managed to create a more useful output built as a table. Now our spreadsheet has our specified data already converted into a table and ready to be sorted. There are more possible keys and values that can be supplied during the construction of a table. Please consult the documentation at http://xlsxwriter.readthedocs.org for more details on advanced usage. This process is simple when we are working with nested lists representing each row of a worksheet. Data structures not in the specified format require a combination of both methods demonstrated in our previous iterations to achieve the same effect. For example, we can define a table to span across a certain number of rows and columns and then use the write() function for those cells. However, to prevent unnecessary headaches we recommend keeping data in nested lists.

Creating charts with Python

Lastly, let's create a chart with xlsxwriter. The module supports a variety of different chart types including: line, scatter, bar, column, pie, and area. We use charts to summarize the data in meaningful ways. This is particularly useful when working with large data sets, allowing examiners to gain a high-level understanding of the data before getting into the weeds. Let's modify the previous iteration yet again to display a chart. We will save this modified file as simplexlsx.v3.py. On line 21, we are going to create a variable called department_grades. This variable will be our chart object, created by the add_chart() method. For this method, we pass in a dictionary specifying keys and values. In this case, we specify the type of the chart to be a column chart:

021 department_grades = workbook.add_chart({'type':'column'})

On line 22, we use the set_title() function and again pass it a dictionary of parameters. We set the name key equal to our desired title. At this point, we need to tell the chart what data to plot. We do this with the add_series() function. Each category key maps to the Excel notation specifying the horizontal axis data. The vertical axis is represented by the values key. With the data to plot specified, we use the insert_chart() function to plot the data in the spreadsheet. We give this function a string of the cell at which to place the top-left of the chart, and then the chart object itself:

022 department_grades.set_title({'name':'Department and Grade distribution'})
023 department_grades.add_series({'categories':'=MySheet!$A$2:$A$5', 'values':'=MySheet!$C$2:$C$5'})
024 main_sheet.insert_chart('A8', department_grades)
025 workbook.close()

Running this version of the script will convert our data into a table and generate a column chart comparing departments by their grades. We can clearly see that, unsurprisingly, the Forensic Science department has the highest GPA earners in the school's program. This information is easy enough to eyeball for such a small data set.
However, when working with data orders of larger magnitude, creating summarizing graphics can be particularly useful to understand the big picture. Be aware that there is a great deal of additional functionality in the xlsxwriter module that we will not use in our script. This is an extremely powerful module and we recommend it for any operation that requires writing Excel spreadsheets. Summary In this article, we began with introducing the Registry module and how it is used to obtain keys and values from registry hives. Next, we dealt with various aspects of spreadsheets, such as cells, tables, and charts using the xlswriter module. Resources for Article: Further resources on this subject: Test all the things with Python [article] An Introduction to Python Lists and Dictionaries [article] Python Data Science Up and Running [article]


Build your first Android app with Kotlin

Aarthi Kumaraswamy
13 Apr 2018
10 min read
Android application with Kotlin is an area which shines. Before getting started on this journey, we must set up our systems for the task at hand. A major necessity for developing Android applications is a suitable IDE - it is not a requirement but it makes the development process easier. Many IDE choices exist for Android developers. The most popular are: Android Studio Eclipse IntelliJ IDE Android Studio is by far the most powerful of the IDEs available with respect to Android development. As a consequence, we will be utilizing this IDE in all Android-related chapters in this book. Setting up Android Studio At the time of writing, the version of Android Studio that comes bundled with full Kotlin support is Android Studio 3.0. The canary version of this software can be downloaded from this website. Once downloaded, open the downloaded package or executable and follow the installation instructions. A setup wizard exists to guide you through the IDE setup procedure: Continuing to the next setup screen will prompt you to choose which type of Android Studio setup you'd like: Select the Standard setup and continue to the next screen. Click Finish on the Verify Settings screen. Android Studio will now download the components required for your setup. You will need to wait a few minutes for the required components to download: Click Finish once the component download has completed. You will be taken to the Android Studio landing screen. You are now ready to use Android Studio: [box type="note" align="" class="" width=""]You may also want to read Benefits of using Kotlin Java for Android programming.[/box] Building your first Android application with Kotlin Without further ado, let's explore how to create a simple Android application with Android Studio. We will be building the HelloApp. The HelloApp is an app that displays Hello world! on the screen upon the click of a button. On the Android Studio landing screen, click Start a new Android Studio project. You will be taken to a screen where you will specify some details that concern the app you are about to build, such as the name of the application, your company domain, and the location of the project. Type in HelloApp as the application name and enter a company domain. If you do not have a company domain name, fill in any valid domain name in the company domain input box – as this is a trivial project, a legitimate domain name is not required. Specify the location in which you want to save this project and tick the checkbox for the inclusion of Kotlin support. After filling in the required parameters, continue to the next screen: Here, we are required to specify our target devices. We are building this application to run on smartphones specifically, hence tick the Phone and Tablet checkbox if it's not already ticked. You will notice an options menu next to each device option. This dropdown is used to specify the target API level for the project being created. An API level is an integer that uniquely identifies the framework API division offered by a version of the Android platform. Select API level 15 if not already selected and continue to the next screen: On the next screen, we are required to select an activity to add to our application. An activity is a single screen with a unique user interface—similar to a window. We will discuss activities in more depth in Chapter 2, Building an Android Application – Tetris. For now, select the empty activity and continue to the next screen. 
Now, we need to configure the activity that we just specified should be created. Name the activity HelloActivityand ensure the Generate Layout File and Backwards Compatibility checkboxes are ticked: Now, click the Finish button. Android Studio may take a few minutes to set up your project. Once the setup is complete, you will be greeted by the IDE window containing your project files. [box type="note" align="" class="" width=""]Errors pertaining to the absence of required project components may be encountered at any point during project development. Missing components can be downloaded from the SDK manager. [/box] Make sure that the project window of the IDE is open (on the navigation bar, select View | Tool Windows | Project) and the Android view is currently selected from the drop-down list at the top of the Project window. You will see the following files at the left-hand side of the window: app | java | com.mydomain.helloapp | HelloActivity.java: This is the main activity of your application. An instance of this activity is launched by the system when you build and run your application: app | res | layout | activity_hello.xml: The user interface for HelloActivity is defined within this XML file. It contains a TextView element placed within the ViewGroup of a ConstraintLayout. The text of the TextView has been set to Hello World! app | manifests | AndroidManifest.xml: The AndroidManifest file is used to describe the fundamental characteristics of your application. In addition, this is the file in which your application's components are defined. Gradle Scripts | build.gradle: Two build.gradle files will be present in your project. The first build.gradle file is for the project and the second is for the app module. You will most frequently work with the module's build.gradle file for the configuration of the compilation procedure of Gradle tools and the building of your app. [box type="note" align="" class="" width=""]Gradle is an open source build automation system used for the declaration of project configurations. In Android, Gradle is utilized as a build tool with the goal of building packages and managing application dependencies. [/box] Creating a user interface A user interface (UI) is the primary means by which a user interacts with an application. The user interfaces of Android applications are made by the creation and manipulation of layout files. Layout files are XML files that exist in app | res | layout. To create the layout for the HelloApp, we are going to do three things: Add a LinearLayout to our layout file Place the TextView within the LinearLayout and remove the android:text attribute it possesses Add a button to the LinearLayout Open the activity_hello.xml file if it's not already opened. You will be presented with the layout editor. If the editor is in the Design view, change it to its Text view by toggling the option at the bottom of the layout editor. Now, your layout editor should look similar to that of the following screenshot: ViewGroup that arranges child views in either a horizontal or vertical manner within a single column. 
Copy the code snippet of our required LinearLayout from the following block and paste it within the ConstraintLayout preceding the TextView: <LinearLayout android:id="@+id/ll_component_container" android:layout_width="match_parent" android:layout_height="match_parent" android:orientation="vertical" android:gravity="center"> </LinearLayout> Now, copy and paste the TextView present in the activity_hello.xml file into the body of the LinearLayout element and remove the android:text attribute: <LinearLayout android:id="@+id/ll_component_container" android:layout_width="match_parent" android:layout_height="match_parent" android:orientation="vertical" android:gravity="center"> <TextView android:id="@+id/tv_greeting" android:layout_width="wrap_content" android:layout_height="wrap_content"       android:textSize="50sp" /> </LinearLayout> Lastly, we need to add a button element to our layout file. This element will be a child of our LinearLayout. To create a button, we use the Button element: <LinearLayout android:id="@+id/ll_component_container" android:layout_width="match_parent" android:layout_height="match_parent" android:orientation="vertical" android:gravity="center"> <TextView android:id="@+id/tv_greeting" android:layout_width="wrap_content" android:layout_height="wrap_content"       android:textSize="50sp" /> <Button       android:id="@+id/btn_click_me" android:layout_width="wrap_content" android:layout_height="wrap_content" android:layout_marginTop="16dp" android:text="Click me!"/> </LinearLayout> Toggle to the layout editor's design view to see how the changes we have made thus far translate when rendered on the user interface: Now we have our layout, but there's a problem. Our CLICK ME! button does not actually do anything when clicked. We are going to fix that by adding a listener for click events to the button. Locate and open the HelloActivity.java file and edit the function to add the logic for the CLICK ME! button's click event as well as the required package imports, as shown in the following code: package com.mydomain.helloapp import android.support.v7.app.AppCompatActivity import android.os.Bundle import android.text.TextUtils import android.widget.Button import android.widget.TextView import android.widget.Toast class HelloActivity : AppCompatActivity() { override fun onCreate(savedInstanceState: Bundle?) { super.onCreate(savedInstanceState) setContentView(R.layout.activity_hello) val tvGreeting = findViewById<TextView>(R.id.tv_greeting) val btnClickMe = findViewById<Button>(R.id.btn_click_me) btnClickMe.setOnClickListener { if (TextUtils.isEmpty(tvGreeting.text)) { tvGreeting.text = "Hello World!" } else { Toast.makeText(this, "I have been clicked!",                       Toast.LENGTH_LONG).show() } } } } In the preceding code snippet, we have added references to the TextView and Button elements present in our activity_hello layout file by utilizing the findViewById function. The findViewById function can be used to get references to layout elements that are within the currently-set content view. The second line of the onCreate function has set the content view of HelloActivity to the activity_hello.xml layout. Next to the findViewById function identifier, we have the TextView type written between two angular brackets. This is called a function generic. It is being used to enforce that the resource ID being passed to the findViewById belongs to a TextView element. After adding our reference objects, we set an onClickListener to btnClickMe. 
Listeners are used to listen for the occurrence of events within an application. In order to perform an action upon the click of an element, we pass a lambda containing the action to be performed to the element's setOnClickListener method. When btnClickMe is clicked, tvGreeting is checked to see whether it has been set to contain any text. If no text has been set to the TextView, then its text is set to Hello World!, otherwise a toast is displayed with the I have been clicked! text. Running the Android application In order to run the application, click the Run 'app' (^R) button at the top-right side of the IDE window and select a deployment target. The HelloApp will be built, installed, and launched on the deployment target: You may use one of the available prepackaged virtual devices or create a custom virtual device to use as the deployment target.  You may also decide to connect a physical Android device to your computer via USB and select it as your target. The choice is up to you. After selecting a deployment device, click OK to build and run the application. Upon launching the application, our created layout is rendered: When CLICK ME! is clicked, Hello World! is shown to the user: Subsequent clicks of the CLICK ME! button display a toast message with the text I have been clicked!: You enjoyed an excerpt from the book, Kotlin Programming By Example by Iyanu Adelekan. Start building and deploying Android apps with Kotlin using this book. Check out other related posts: Creating a custom layout implementation for your Android app Top 5 Must-have Android Applications OpenCV and Android: Making Your Apps See      


How to perform Audio-Video-Image Scraping with Python

Amarabha Banerjee
08 Mar 2018
9 min read
[box type="note" align="" class="" width=""]Our article is an excerpt from the book Web Scraping with Python, written by Richard Lawson. This book contains step by step tutorials on how to leverage Python programming techniques for ethical web scraping. [/box] A common practice in scraping is the download, storage, and further processing of media content (non-web pages or data files). This media can include images, audio, and video. To store the content locally (or in a service like S3) and to do it correctly, we need to know what is the type of media, and it isn’t enough to trust the file extension in the URL. Hence, we will learn how to download and correctly represent the media type based on information from the web server. Another common task is the generation of thumbnails of images, videos, or even a page of a website. We will examine several techniques of how to generate thumbnails and make website page screenshots. Many times these are used on a new website as thumbnail links to the scraped media which is stored locally. Finally, it is often the need to be able to transcode media, such as converting non-MP4 videos to MP4, or changing the bit-rate or resolution of a video. Another scenario is to extract only the audio from a video file. We won't look at video transcoding, but we will rip MP3 audio out of an MP4 file using ffmpeg. It's a simple step from there to also transcode video with ffmpeg. Downloading media content from the web Downloading media content from the web is a simple process: use Requests or another library and download it just like you would HTML content. Getting ready There is a class named URLUtility in the urls.py module in the util folder of the solution. This class handles several of the scenarios in this chapter with downloading and parsing URLs. We will be using this class in this recipe and a few others. Make sure the modules folder is in your Python path. Also, the example for this recipe is in the 04/01_download_image.py file. How to do it Here is how we proceed with the recipe: The URLUtility class can download content from a URL. The code in the recipe's file is the following: import const from util.urls import URLUtility util = URLUtility(const.ApodEclipseImage()) print(len(util.data)) When running this you will see the following output:  Reading URL: https://apod.nasa.gov/apod/image/1709/BT5643s.jpg Read 171014 bytes 171014 The example reads 171014 bytes of data. How it works The URL is defined as a constant const.ApodEclipseImage() in the const module: def ApodEclipseImage(): return "https://apod.nasa.gov/apod/image/1709/BT5643s.jpg" The constructor of the URLUtility class has the following implementation: def __init__(self, url, readNow=True): """ Construct the object, parse the URL, and download now if specified""" self._url = url self._response = None self._parsed = urlparse(url) if readNow: self.read() The constructor stores the URL, parses it, and downloads the file with the read() method. The following is the code of the read() method: def read(self): self._response = urllib.request.urlopen(self._url) self._data = self._response.read() This function uses urlopen to get a response object, and then reads the stream and stores it as a property of the object. That data can then be retrieved using the data property: @property def data(self): self.ensure_response() return self._data The code then simply reports on the length of that data, with the value of 171014. 
There's more This class will be used for other tasks such as determining content types, filename, and extensions for those files. We will examine parsing of URLs for filenames next. Parsing a URL with urllib to get the filename When downloading content from a URL, we often want to save it in a file. Often it is good enough to save the file in a file with a name found in the URL. But the URL consists of a number of fragments, so how can we find the actual filename from the URL, especially where there are often many parameters after the file name? Getting ready We will again be using the URLUtility class for this task. The code file for the recipe is 04/02_parse_url.py. How to do it Execute the recipe's file with your python interpreter. It will run the following code: util = URLUtility(const.ApodEclipseImage()) print(util.filename_without_ext) This results in the following output: Reading URL: https://apod.nasa.gov/apod/image/1709/BT5643s.jpg Read 171014 bytes The filename is: BT5643s How it works In the constructor for URLUtility, there is a call to urlib.parse.urlparse. The following demonstrates using the function interactively: >>> parsed = urlparse(const.ApodEclipseImage()) >>> parsed ParseResult(scheme='https', netloc='apod.nasa.gov', path='/apod/image/1709/BT5643s.jpg', params='', query='', fragment='') The ParseResult object contains the various components of the URL. The path element contains the path and the filename. The call to the .filename_without_ext property returns just the filename without the extension: @property def filename_without_ext(self): filename = os.path.splitext(os.path.basename(self._parsed.path))[0] return filename The call to os.path.basename returns only the filename portion of the path (including the extension). os.path.splittext() then separates the filename and the extension, and the function returns the first element of that tuple/list (the filename). There's more It may seem odd that this does not also return the extension as part of the filename. This is because we cannot assume that the content that we received actually matches the implied type from the extension. It is more accurate to determine this using headers returned by the web server. That's our next recipe. Determining the type of content for a URL When performing a GET requests for content from a web server, the web server will return a number of headers, one of which identities the type of the content from the perspective of the web server. In this recipe we learn to use that to determine what the web server considers the type of the content. Getting ready We again use the URLUtility class. The code for the recipe is in 04/03_determine_content_type_from_response.py. How to do it We proceed as follows: Execute the script for the recipe. It contains the following code: util = URLUtility(const.ApodEclipseImage()) print("The content type is: " + util.contenttype) With the following result: Reading URL: https://apod.nasa.gov/apod/image/1709/BT5643s.jpg Read 171014 bytes The content type is: image/jpeg How it works The .contentype property is implemented as follows: @property def contenttype(self): self.ensure_response() return self._response.headers['content-type'] The .headers property of the _response object is a dictionary-like class of headers. The content-type key will retrieve the content-type specified by the server. This call to the ensure_response() method simply ensures that the .read() function has been executed. There's more The headers in a response contain a wealth of information. 
If we look more closely at the headers property of the response, we can see the following headers are returned: >>> response = urllib.request.urlopen(const.ApodEclipseImage()) >>> for header in response.headers: print(header) Date Server Last-Modified ETag Accept-Ranges Content-Length Connection Content-Type Strict-Transport-Security And we can see the values for each of these headers. >>> for header in response.headers: print(header + " ==> " + response.headers[header]) Date ==> Tue, 26 Sep 2017 19:31:41 GMT Server ==> WebServer/1.0 Last-Modified ==> Thu, 31 Aug 2017 20:26:32 GMT ETag ==> "547bb44-29c06-5581275ce2b86" Accept-Ranges ==> bytes Content-Length ==> 171014 Connection ==> close Content-Type ==> image/jpeg Strict-Transport-Security ==> max-age=31536000; includeSubDomains Many of these we will not examine in this book, but for the unfamiliar it is good to know that they exist. Determining the file extension from a content type It is good practice to use the content-type header to determine the type of content, and to determine the extension to use for storing the content as a file. Getting ready We again use the URLUtility object that we created. The recipe's script is 04/04_determine_file_extension_from_contenttype.py):. How to do it Proceed by running the recipe's script. An extension for the media type can be found using the .extension property: util = URLUtility(const.ApodEclipseImage()) print("Filename from content-type: " + util.extension_from_contenttype) print("Filename from url: " + util.extension_from_url) This results in the following output: Reading URL: https://apod.nasa.gov/apod/image/1709/BT5643s.jpg Read 171014 bytes Filename from content-type: .jpg Filename from url: .jpg This reports both the extension determined from the file type, and also from the URL. These can be different, but in this case they are the same. How it works The following is the implementation of the .extension_from_contenttype property: @property def extension_from_contenttype(self): self.ensure_response() map = const.ContentTypeToExtensions() if self.contenttype in map: return map[self.contenttype] return None The first line ensures that we have read the response from the URL. The function then uses a python dictionary, defined in the const module, which contains a dictionary of content types to extension: def ContentTypeToExtensions(): return { "image/jpeg": ".jpg", "image/jpg": ".jpg", "image/png": ".png" } If the content type is in the dictionary, then the corresponding value will be returned. Otherwise, None is returned. Note the corresponding property, .extension_from_url: @property def extension_from_url(self): ext = os.path.splitext(os.path.basename(self._parsed.path))[1] return ext This uses the same technique as the .filename property to parse the URL, but instead returns the [1] element, which represents the extension instead of the base filename. To summarize, we discussed how effectively we can scrap audio, video and image content from the web using Python. If you liked our post, be sure to check out Web Scraping with Python, which gives more information on performing web scraping efficiently with Python.

Transforming Web Data with Browse AI

Merlyn Shelley
26 Mar 2024
14 min read
Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!Partnering with Browse AI Turn Web Data into Your Business Superpower!👉 Train a robot in 2 minutes, no coding needed. 🤖 👉 Ideal for web scraping and data monitoring. 🌐 Here’s what you get: Monitor Websites for Changes ✅ Download Data from Any Website ✅ Turn Any Website into an API ✅ Product data extraction ✅ Also, extract data from news, stocks, jobs, social media, and more. Check out this 1-minute explainer video on how to extract data to Excel, Airtable, and connect to 5,000+ apps using Zapier! Start for free with up to 50 credits, and for a limited time, enjoy free setup and onboarding for Team and Company plans, saving up to 20% on Annual plans. Get Scraping Today!👋 Hello,Welcome to DataPro#85 – Your one-stop shop for the latest in Data Science and ML Algorithms! 🚀 In this issue:⚙️ Keeping Up with LLMs & GPTs  Meet Devin: The pioneering AI software engineer. Google's Croissant: A fresh take on metadata for ML-ready datasets. INSTRUCTIR by Kaist AI: Setting new standards in instruction-following for information retrieval models. Spyx by Sussex AI: Turbocharging spiking neural networks with just-in-time compiled optimization. SynCode by VMware: Enhancing LLM code generation with a touch of grammar. Chatbot Arena: The ultimate battleground for evaluating LLMs by human preference. Apollo: Bringing medical AI to the masses with a multilingual medical LLM. ✨ On the RadarTop AI tools for code generation in 2024. Setting up a Pypi mirror in AWS with Terraform. Ensuring safer code changes with custom pre-commit hooks. Deciphering the AQLM Quantization Algorithm. AI's role in revolutionizing web browsing. Tackling tensors through three tricky errors. Running RStudio inside a container. Harnessing PyTorch and MLX for Apple Silicon. 🏭 Industry Highlights Google Research: Boosting LLMs with Cappy, evolving tables with Chain-of-table, and Scalable Instructable Multiworld Agent (SIMA). AWS: Streamlining code review with generative AI using Amazon Bedrock. OpenAI Updates: Leadership continuity and global news partnerships. 📚 New in Packt Library Practical Guide to Applied Conformal Prediction in Python by Valery Manokhin. DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ deepseek-ai/DeepSeek-VL: Open-source Vision-Language (VL) model for real-world tasks, handling logical diagrams, web pages, formulas, scientific literature, and more. 🛠️ OpenGVLab/VideoMamba: VideoMamba enhances 3D CNNs and video transformers, excelling in long-term video understanding with scalability and modality compatibility. 🛠️ showlab/DragAnything: DragAnything uses entity representation for motion control in video generation, offering user-friendly interaction and outperforming existing methods. 🛠️ pkunlp-icler/FastV: FastV accelerates large vision language models by pruning redundant visual tokens, achieving 45% FLOPs reduction without performance loss. 
🛠️ cnulab/RealNet: RealNet introduces SDAS for anomaly strength control, AFS for feature selection, and RRS for anomaly region identification. Partnering with SurfsharkSurfshark is allowing our readers to enjoy a full 2 years of their award-winning VPN protection for 79% off, plus 2 months free. With Surfshark One, you get: Unlimited devices and connections ✅ One account for the entire household ✅ Your online activity, made safe, secure, and invisible ✅ Plus, identity protection, ad blocking, antivirus, and data breach monitoring.Claim your VPN protection today! 📚 Expert Insights from Packt CommunityPractical Guide to Applied Conformal Prediction in Python - By Valery Manokhin Basic components of a conformal predictor We will now look at the basic components of a conformal predictor: Nonconformity measure: The nonconformity measure is a function that evaluates how much a new data point differs from the existing data points. It compares the new observation to either the entire dataset (in the full transductive version of conformal prediction) or the calibration set (in the most popular variant – ICP. The selection of the nonconformity measure is based on a particular machine learning task, such as classification, regression, or time series forecasting, as well as the underlying model. This will examine several nonconformity measures suitable for classification and regression tasks. Calibration set: The calibration set is a portion of the dataset used to calculate nonconformity scores for the known data points. These scores are a reference for establishing prediction intervals or regions for new test data points. The calibration set should be a representative sample of the entire data distribution and is typically randomly selected. The calibration set should contain a sufficient number of data points (at least 500). If the dataset is small and insufficient to reserve enough data for the calibration set, the user should consider other variants of conformal prediction – including TCP (see, for example, Mastering Classical Transductive Conformal Prediction in Action – https://medium.com/@valeman/how-to-use-full-transductive-conformal-prediction-7ed54dc6b72b). Test set: The test set contains new data points for generating predictions. For every data point in the test set, the conformal prediction model calculates a nonconformity score using the nonconformity measure and compares it to the scores from the calibration set. Using this comparison, the conformal predictor generates a prediction region that includes the target value with a user-defined confidence level. All these components work in tandem to create a conformal prediction framework that facilitates valid and efficient uncertainty quantification in a wide range of machine learning tasks. Discover more insights from 'Practical Guide to Applied Conformal Prediction in Python' by Valery Manokhin. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!   Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS ML Made Easy 🌀 Enhance code review and approval efficiency with generative AI using Amazon Bedrock: This post discusses the challenges faced by managers in overseeing code review and approval processes in software development, such as lack of technical expertise, time constraints, volume of change requests, manual effort, and the need for documentation. 
It also introduces a solution that leverages generative artificial intelligence and integrates it with AWS deployment tools to streamline the review and approval process. The solution includes automated change analysis, summarization, and an approval workflow. Google Research 🌀 Cappy: Outperforming and boosting large multi-task language models with a small scorer. This blog discusses advancements in large language models (LLMs) and their use in natural language processing (NLP). It introduces the concept of multi-task LLMs, such as T0, FLAN, and OPT-IML, which excel at understanding and solving various tasks. It also presents a new approach called Cappy, a lightweight pre-trained scorer that enhances the performance and efficiency of multi-task LLMs. 🌀 Chain-of-table: Evolving tables in the reasoning chain for table understanding. This research focuses on improving how large language models (LLMs) reason over tabular data, which is challenging due to the structured nature of tables. The proposed framework, Chain-of-Table, trains LLMs to iteratively update tables, mimicking human reasoning, resulting in improved performance on table understanding tasks. 🌀 Talk like a graph: Encoding graphs for large language models. This research explores how to teach large language models (LLMs) to reason with graph information, crucial for understanding interconnected data. They introduce GraphQA, a benchmark to evaluate LLMs on graph problems, revealing insights into effective graph encoding methods and improving LLM performance on graph tasks by up to 60%. 🌀 Scalable Instructable Multiworld Agent (SIMA): A generalist AI agent for 3D virtual environments. Google DeepMind has developed SIMA, a versatile AI agent trained on multiple video games to follow natural-language instructions, akin to human behavior. Collaborating with game studios, SIMA navigates various environments, showcasing potential for AI to understand and execute diverse tasks. OpenAI Updates 🌀 Review completed & Altman, Brockman to continue to lead OpenAI: The OpenAI Board completed a review by WilmerHale, expressing full confidence in Sam Altman and Greg Brockman's leadership. They also elected new board members and adopted governance enhancements. WilmerHale's review found a breakdown in trust between the prior Board and Mr. Altman, leading to his removal, but concluded that his conduct did not mandate removal. Following the review, the Board endorsed the decision to rehire Mr. Altman and Mr. Brockman. 🌀 Global news partnerships: Le Monde and Prisa Media: OpenAI has partnered with Le Monde and Prisa Media to bring French and Spanish news content to ChatGPT. This partnership aims to enhance user interaction with news content and contribute to the training of OpenAI's models. Through these partnerships, users will access summaries and links to original articles, expanding their news consumption experience. This collaboration supports the news industry and its role in providing reliable information globally. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 Introducing Devin, the first AI software engineer: Meet Devin, the autonomous AI software engineer, skilled in long-term reasoning and planning. Devin can learn new technologies, build and deploy apps, find and fix bugs, train AI models, and contribute to open source. Devin excels in resolving real-world GitHub issues, outperforming previous models. Cognition, the AI lab behind Devin, aims to unlock new possibilities beyond coding. 
🌀 Google’s Croissant: a metadata format for ML-ready datasets. Croissant is a new metadata format for ML datasets, aiming to simplify the use of existing datasets for training ML models. It standardizes dataset descriptions and organization, supporting responsible AI practices. Croissant builds upon schema.org and is supported by major tools and repositories like Kaggle, Hugging Face, and OpenML. It includes a specification, example datasets, a Python library, and a visual editor to facilitate dataset usage and publication.

🌀 KAIST AI’s INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models. This research focuses on enhancing search accuracy by improving retrievers to understand users' intentions, similar to language models. It introduces INSTRUCTIR, a benchmark for evaluating retrievers' ability to follow user-aligned instructions in retrieval tasks. The study addresses limitations in existing benchmarks and highlights potential overfitting issues in instruction-aware retrieval datasets.

🌀 Sussex AI’s Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks. Advancements in large neural architectures have led to powerful AI accelerators for training deep neural networks. However, these networks often incur high costs. Neuromorphic computing with Spiking Neural Networks (SNNs) offers energy-efficient alternatives, but training SNNs is challenging. Spyx, a new lightweight SNN simulation and optimization library designed in JAX, aims to facilitate SNN architecture investigation by bridging Python-based deep learning frameworks with custom compute kernels, achieving optimal hardware utilization.

🌀 VMware’s SynCode: Improving LLM Code Generation with Grammar Augmentation. SynCode is a novel framework for efficient syntactical decoding of code with large language models (LLMs). It leverages the grammar of a programming language using an offline-constructed, efficient lookup table called a Deterministic Finite Automaton (DFA) mask store. SynCode seamlessly integrates with any language defined by a context-free grammar (CFG), reducing syntax errors by 96.07% when combined with LLMs.

🌀 Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. Chatbot Arena is an open platform designed to evaluate Large Language Models (LLMs) based on human preferences. Using a pairwise comparison method and crowdsourced input, it assesses LLMs' alignment with user preferences. The platform, operational for months with over 240K votes, provides a credible and valuable resource for ranking LLMs (a toy illustration of pairwise-vote ranking follows at the end of this section). Check out the tool here.

🌀 Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People. The project aims to develop medical Large Language Models (LLMs) in the six most spoken languages, benefiting 6.1 billion people. This includes creating the ApolloCorpora multilingual medical dataset and the XMedBench benchmark, with Apollo models achieving top performance among models of similar sizes. The project will open-source training data, code, model weights, and evaluation benchmarks. You can check out the demo here.
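To make the pairwise-vote idea behind Chatbot Arena concrete, here is a toy Elo-style ranking sketch in Python. This is only an illustration of turning head-to-head human preferences into scores, not Chatbot Arena's actual methodology (the project uses a more careful statistical treatment); the model names, initial rating, and K-factor are invented for the example:

# Toy Elo-style ranking from pairwise preference votes.
# Model names, the initial rating, and the K-factor are illustrative only.
from collections import defaultdict

def expected_score(r_a, r_b):
    # Probability that A beats B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_ratings(votes, k=32):
    # votes: list of (winner, loser) pairs from human comparisons.
    ratings = defaultdict(lambda: 1000.0)
    for winner, loser in votes:
        e_w = expected_score(ratings[winner], ratings[loser])
        # Winner gains k * (1 - expected win prob); loser loses the same amount.
        ratings[winner] += k * (1.0 - e_w)
        ratings[loser] -= k * (1.0 - e_w)
    return dict(ratings)

votes = [("model-a", "model-b"), ("model-b", "model-c"), ("model-a", "model-c")]
print(sorted(update_ratings(votes).items(), key=lambda kv: -kv[1]))

Because each update only needs the two ratings involved, this kind of scheme scales naturally to a stream of crowdsourced votes.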
✨ On the Radar: Catch Up on What's Fresh

🌀 Top Artificial Intelligence (AI) Tools That Can Generate Code To Help Programmers (2024): The article discusses how AI is changing programming, with tools like OpenAI Codex and GitHub Copilot generating code. It explores AI's impact on code quality and development speed, showcasing various AI-powered tools like Tabnine, CodeT5, and Polycoder. Additionally, it mentions AI tools for code review, static code analysis, and AI-assisted coding in IDEs like PyCharm and Visual Studio.

🌀 PyPI mirror in a private AWS environment with Terraform: This article explains how to install Python packages in an AWS SageMaker Studio environment without internet access. It covers setting up SageMaker in VPC Only mode, using VPC Endpoint interfaces for network communications, and accessing the PyPI package repository through AWS CodeArtifact, which allows defining PyPI as an upstream repository.

🌀 Custom pre-commit hooks for safer code changes: This blog post explains the importance of using pre-commit hooks in software development, particularly with the git version control system. It discusses the challenges of maintaining coding standards in collaborative projects and provides a step-by-step tutorial on how to set up and use custom pre-commit hooks for a Python project, using the example of validating dataflow definitions for the Hamilton library.

🌀 AQLM Quantization Algorithm, explained: A new quantization algorithm, AQLM (Additive Quantization of Language Models), was recently released and integrated into Hugging Face Transformers and Hugging Face PEFT. AQLM sets a new state of the art for 2-bit quantization while also providing improvements in the 3-bit and 4-bit ranges, pushing the boundaries of model accuracy and memory footprint.

🌀 Revolutionize Web Browsing with AI: This article explores creating an AI agent using the gpt-4-vision-preview model from OpenAI, enabling it to navigate the web like a human. It discusses the agent's browser control, content browsing, and decision-making processes, showcasing potential use cases such as aiding visually challenged users and automating web browsing tasks.

🌀 Understanding Tensors: Learning a Data Structure Through 3 Pesky Errors. This article discusses transitioning from managing tabular data to working with tensors in TensorFlow, offering debugging tips and code recipes. It covers visualizing TensorFlow datasets, understanding tensor specs, and augmenting model summaries, while addressing common errors related to tensor rank and shape (a small related sketch appears in the P.S. below).

🌀 Running RStudio Inside a Container: This tutorial focuses on setting up RStudio using Docker, particularly leveraging the Rocker RStudio image. It covers pulling the image, launching RStudio in a container, and ensuring data persistence through volume mapping. The tutorial provides step-by-step instructions and explanations for each stage.

🌀 PyTorch and MLX for Apple Silicon: The blog discusses Apple's MLX framework, which is optimized for Apple Silicon and serves as a bridge between PyTorch, NumPy, and JAX. It details a comparison between MLX and PyTorch through a custom convolutional neural network implementation for image classification tasks. The discussion includes insights into MLX's features, such as its array class, lazy computation, and compilation for performance optimization. The post also highlights the ease of converting PyTorch code to MLX, despite some differences in API compatibility and coding conventions.

See you next time!

Affiliate Disclosure: This newsletter contains affiliate links. If you buy through them, we may earn a small commission at no extra cost to you. This supports our work and helps us keep providing useful content. We only recommend products and services we think will benefit our readers. Thanks for your support!
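P.S. To go with the "Understanding Tensors" item above, here is a small, self-contained illustration of the kind of rank/shape debugging that article covers. The toy data and target shapes below are invented for this example and are not taken from the article:

# Inspecting tf.data element specs to debug rank/shape mismatches.
# The toy arrays and shapes here are invented for illustration.
import numpy as np
import tensorflow as tf

images = np.random.rand(8, 28, 28).astype("float32")   # rank-3: (batch, h, w)
labels = np.random.randint(0, 10, size=(8,))

ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)
print(ds.element_spec)  # shows the TensorSpec shape and dtype of each component

# A Conv2D stack expects rank-4 input (batch, h, w, channels), so a rank-3
# batch triggers a shape error; adding a channel axis fixes it.
ds_fixed = ds.map(lambda x, y: (tf.expand_dims(x, axis=-1), y))
print(ds_fixed.element_spec)

Printing element_spec before calling model.fit is usually the quickest way to spot this kind of rank mismatch.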

How to auto-generate texts from Shakespeare writing using deep recurrent neural networks

Savia Lobo
16 Feb 2018
6 min read
Our article is an excerpt from the book Natural Language Processing with Python Cookbook, co-authored by Krishna Bhavsar, Naresh Kumar, and Pratap Dangeti. The book gives unique recipes covering the various aspects of performing Natural Language Processing with NLTK, a leading Python platform for NLP.

Today we will learn to use deep recurrent neural networks (RNNs) to predict the next character from a fixed-length window of text. Trained this way, a model can generate text continuously and, given enough training epochs, imitate the writing style of the original author.

Getting ready...

The Project Gutenberg eBook of the complete works of William Shakespeare is used as the dataset to train the network for automated text generation. The raw file used for training can be downloaded from http://www.gutenberg.org/:

>>> from __future__ import print_function
>>> import numpy as np
>>> import random
>>> import sys

The following code creates a dictionary mapping characters to indices, plus the reverse mapping, which we will use to convert text into indices at later stages. This is because deep learning models cannot understand English; everything needs to be mapped into indices before training:

>>> path = 'C:\\Users\\prata\\Documents\\book_codes\\NLP_DL\\shakespeare_final.txt'
>>> text = open(path).read().lower()
>>> characters = sorted(list(set(text)))
>>> print('corpus length:', len(text))
>>> print('total chars:', len(characters))
>>> char2indices = dict((c, i) for i, c in enumerate(characters))
>>> indices2char = dict((i, c) for i, c in enumerate(characters))

How to do it…

Before training the model, various preprocessing steps are involved. The major steps are:

1. Preprocessing: Prepare the X and y data from the story text file and convert them into a vectorized index format.
2. Deep learning model training and validation: Train and validate the deep learning model.
3. Text generation: Generate text with the trained model.

How it works...

The following lines of code describe the entire modeling process of generating text from Shakespeare's writings. Here the window length is set to 40 characters for predicting the next single character, which is a reasonable amount of context. The extraction also jumps ahead by three characters at a time so that consecutive sequences overlap less and the dataset is more varied:

# cut the text in semi-redundant sequences of maxlen characters
>>> maxlen = 40
>>> step = 3
>>> sentences = []
>>> next_chars = []
>>> for i in range(0, len(text) - maxlen, step):
...     sentences.append(text[i: i + maxlen])
...     next_chars.append(text[i + maxlen])
>>> print('nb sequences:', len(sentences))

The total number of sequences extracted is 193,798, which is enough data for text generation.
The next code block converts the data into a vectorized format for feeding into the deep learning model, as the model cannot work with text, words, or sentences directly. Arrays of all zeros are created in NumPy and then filled in the relevant places using the dictionary mappings:

# Converting indices into vectorized format
>>> X = np.zeros((len(sentences), maxlen, len(characters)), dtype=np.bool)
>>> y = np.zeros((len(sentences), len(characters)), dtype=np.bool)
>>> for i, sentence in enumerate(sentences):
...     for t, char in enumerate(sentence):
...         X[i, t, char2indices[char]] = 1
...     y[i, char2indices[next_chars[i]]] = 1

>>> from keras.models import Sequential
>>> from keras.layers import Dense, LSTM, Activation, Dropout
>>> from keras.optimizers import RMSprop

The deep learning model is an RNN, more specifically a Long Short-Term Memory (LSTM) network with 128 hidden neurons; the output layer has one unit per character in the vocabulary. Finally, the softmax activation is used together with the RMSprop optimizer. We encourage readers to experiment with other parameters to see how the results vary:

# Model Building
>>> model = Sequential()
>>> model.add(LSTM(128, input_shape=(maxlen, len(characters))))
>>> model.add(Dense(len(characters)))
>>> model.add(Activation('softmax'))
>>> model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01))
>>> print(model.summary())

As mentioned earlier, deep learning models train on numeric indices to map input to output (given a window of 40 characters, the model predicts the next best character). The following function converts the predicted probabilities back into a character index by rescaling them with the diversity parameter, sampling from the resulting distribution, and taking the index of the sampled character:

# Function to convert prediction into index
>>> def pred_indices(preds, metric=1.0):
...     preds = np.asarray(preds).astype('float64')
...     preds = np.log(preds) / metric
...     exp_preds = np.exp(preds)
...     preds = exp_preds / np.sum(exp_preds)
...     probs = np.random.multinomial(1, preds, 1)
...     return np.argmax(probs)

The model will be trained over 30 iterations with a batch size of 128. The diversity (sampling temperature) is also varied to see its impact on the predictions:

# Train and Evaluate the Model
>>> for iteration in range(1, 30):
...     print('-' * 40)
...     print('Iteration', iteration)
...     model.fit(X, y, batch_size=128, epochs=1)
...     start_index = random.randint(0, len(text) - maxlen - 1)
...     for diversity in [0.2, 0.7, 1.2]:
...         print('\n----- diversity:', diversity)
...         generated = ''
...         sentence = text[start_index: start_index + maxlen]
...         generated += sentence
...         print('----- Generating with seed: "' + sentence + '"')
...         sys.stdout.write(generated)
...         for i in range(400):
...             x = np.zeros((1, maxlen, len(characters)))
...             for t, char in enumerate(sentence):
...                 x[0, t, char2indices[char]] = 1.
...             preds = model.predict(x, verbose=0)[0]
...             next_index = pred_indices(preds, diversity)
...             pred_char = indices2char[next_index]
...             generated += pred_char
...             sentence = sentence[1:] + pred_char
...             sys.stdout.write(pred_char)
...             sys.stdout.flush()
...         print("\nOne combination completed\n")

Comparing the output after the first iteration (Iteration 1) with the output after the final iteration (Iteration 29), it is apparent that with enough training the generated text becomes noticeably more coherent.

Though the text generation can seem almost magical, we have generated text from Shakespeare's writings, showing that with the right training and handling we can imitate the writing style of any particular writer. A brief note on saving the trained model for later reuse follows at the end of this post.

If you found this post useful, you may check out the book Natural Language Processing with Python Cookbook to analyze sentence structure and master lexical analysis, syntactic and semantic analysis, pragmatic analysis, and other NLP techniques.
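As a follow-up to the recipe above, here is a minimal sketch of persisting the trained model and character mappings so that text can be generated later without retraining. This is not part of the original recipe; the file names are arbitrary choices, and the snippet assumes the model, char2indices, and indices2char objects defined earlier:

# Save the trained model and the character mappings for later reuse
# (not part of the original recipe; file names are arbitrary).
import json

model.save('shakespeare_lstm.h5')          # Keras HDF5 format
with open('char_maps.json', 'w') as f:
    json.dump({'char2indices': char2indices}, f)

# Later, reload both and rebuild the reverse mapping before generating text.
# Keep maxlen (40) and the character vocabulary identical to training.
from keras.models import load_model

model = load_model('shakespeare_lstm.h5')
with open('char_maps.json') as f:
    char2indices = json.load(f)['char2indices']
indices2char = {i: c for c, i in char2indices.items()}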