
How-To Tutorials

Merlyn Shelley
26 Mar 2024
14 min read

Transforming Web Data with Browse AI

Partnering with Browse AI

Turn Web Data into Your Business Superpower!
👉 Train a robot in 2 minutes, no coding needed. 🤖
👉 Ideal for web scraping and data monitoring. 🌐

Here’s what you get:
Monitor Websites for Changes ✅
Download Data from Any Website ✅
Turn Any Website into an API ✅
Product data extraction ✅

Also, extract data from news, stocks, jobs, social media, and more. Check out this 1-minute explainer video on how to extract data to Excel, Airtable, and connect to 5,000+ apps using Zapier! Start for free with up to 50 credits, and for a limited time, enjoy free setup and onboarding for Team and Company plans, saving up to 20% on Annual plans.

👋 Hello,

Welcome to DataPro #85, your one-stop shop for the latest in Data Science and ML Algorithms! 🚀

In this issue:

⚙️ Keeping Up with LLMs & GPTs
Meet Devin: The pioneering AI software engineer.
Google's Croissant: A fresh take on metadata for ML-ready datasets.
INSTRUCTIR by KAIST AI: Setting new standards in instruction-following for information retrieval models.
Spyx by Sussex AI: Turbocharging spiking neural networks with just-in-time compiled optimization.
SynCode by VMware: Enhancing LLM code generation with a touch of grammar.
Chatbot Arena: The ultimate battleground for evaluating LLMs by human preference.
Apollo: Bringing medical AI to the masses with a multilingual medical LLM.

✨ On the Radar
Top AI tools for code generation in 2024.
Setting up a PyPI mirror in AWS with Terraform.
Ensuring safer code changes with custom pre-commit hooks.
Deciphering the AQLM Quantization Algorithm.
AI's role in revolutionizing web browsing.
Tackling tensors through three tricky errors.
Running RStudio inside a container.
Harnessing PyTorch and MLX for Apple Silicon.
🏭 Industry Highlights
Google Research: Boosting LLMs with Cappy, evolving tables with Chain-of-table, and Scalable Instructable Multiworld Agent (SIMA).
AWS: Streamlining code review with generative AI using Amazon Bedrock.
OpenAI Updates: Leadership continuity and global news partnerships.

📚 New in Packt Library
Practical Guide to Applied Conformal Prediction in Python by Valery Manokhin.

DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today!

📥 Feedback on the Weekly Edition
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!

Cheers,
Merlyn Shelley
Editor-in-Chief, Packt

🔰 GitHub Finds: Any of These Repos in Your Toolbox?
🛠️ deepseek-ai/DeepSeek-VL: Open-source Vision-Language (VL) model for real-world tasks, handling logical diagrams, web pages, formulas, scientific literature, and more.
🛠️ OpenGVLab/VideoMamba: VideoMamba enhances 3D CNNs and video transformers, excelling in long-term video understanding with scalability and modality compatibility.
🛠️ showlab/DragAnything: DragAnything uses entity representation for motion control in video generation, offering user-friendly interaction and outperforming existing methods.
🛠️ pkunlp-icler/FastV: FastV accelerates large vision language models by pruning redundant visual tokens, achieving 45% FLOPs reduction without performance loss.
🛠️ cnulab/RealNet: RealNet introduces SDAS for anomaly strength control, AFS for feature selection, and RRS for anomaly region identification.

Partnering with Surfshark

Surfshark is allowing our readers to enjoy a full 2 years of their award-winning VPN protection for 79% off, plus 2 months free.
With Surfshark One, you get:
Unlimited devices and connections ✅
One account for the entire household ✅
Your online activity, made safe, secure, and invisible ✅
Plus, identity protection, ad blocking, antivirus, and data breach monitoring.

📚 Expert Insights from Packt Community

Practical Guide to Applied Conformal Prediction in Python - By Valery Manokhin

Basic components of a conformal predictor

We will now look at the basic components of a conformal predictor:

Nonconformity measure: The nonconformity measure is a function that evaluates how much a new data point differs from the existing data points. It compares the new observation to either the entire dataset (in the full transductive version of conformal prediction) or the calibration set (in the most popular variant, inductive conformal prediction, or ICP). The choice of nonconformity measure depends on the machine learning task, such as classification, regression, or time series forecasting, as well as on the underlying model. The book examines several nonconformity measures suitable for classification and regression tasks.

Calibration set: The calibration set is a portion of the dataset used to calculate nonconformity scores for the known data points. These scores are a reference for establishing prediction intervals or regions for new test data points. The calibration set should be a representative sample of the entire data distribution and is typically randomly selected. It should contain a sufficient number of data points (at least 500). If the dataset is too small to reserve enough data for the calibration set, the user should consider other variants of conformal prediction, including transductive conformal prediction (TCP); see, for example, Mastering Classical Transductive Conformal Prediction in Action: https://medium.com/@valeman/how-to-use-full-transductive-conformal-prediction-7ed54dc6b72b

Test set: The test set contains new data points for generating predictions.
For every data point in the test set, the conformal prediction model calculates a nonconformity score using the nonconformity measure and compares it to the scores from the calibration set. Using this comparison, the conformal predictor generates a prediction region that includes the target value with a user-defined confidence level.

All these components work in tandem to create a conformal prediction framework that facilitates valid and efficient uncertainty quantification in a wide range of machine learning tasks.

Discover more insights from 'Practical Guide to Applied Conformal Prediction in Python' by Valery Manokhin. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!

⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz!

AWS ML Made Easy

🌀 Enhance code review and approval efficiency with generative AI using Amazon Bedrock: This post discusses the challenges faced by managers in overseeing code review and approval processes in software development, such as lack of technical expertise, time constraints, volume of change requests, manual effort, and the need for documentation. It also introduces a solution that leverages generative artificial intelligence and integrates it with AWS deployment tools to streamline the review and approval process. The solution includes automated change analysis, summarization, and an approval workflow.

Google Research

🌀 Cappy: Outperforming and boosting large multi-task language models with a small scorer. This blog discusses advancements in large language models (LLMs) and their use in natural language processing (NLP). It introduces the concept of multi-task LLMs, such as T0, FLAN, and OPT-IML, which excel at understanding and solving various tasks. It also presents a new approach called Cappy, a lightweight pre-trained scorer that enhances the performance and efficiency of multi-task LLMs.
🌀 Chain-of-table: Evolving tables in the reasoning chain for table understanding. This research focuses on improving how large language models (LLMs) reason over tabular data, which is challenging due to the structured nature of tables. The proposed framework, Chain-of-Table, guides LLMs to iteratively update tables, mimicking human reasoning, resulting in improved performance on table understanding tasks.

🌀 Talk like a graph: Encoding graphs for large language models. This research explores how to teach large language models (LLMs) to reason with graph information, which is crucial for understanding interconnected data. The authors introduce GraphQA, a benchmark to evaluate LLMs on graph problems, revealing insights into effective graph encoding methods and improving LLM performance on graph tasks by up to 60%.

🌀 Scalable Instructable Multiworld Agent (SIMA): A generalist AI agent for 3D virtual environments. Google DeepMind has developed SIMA, a versatile AI agent trained on multiple video games to follow natural-language instructions much as a human player would. Developed in collaboration with game studios, SIMA navigates various environments, showcasing the potential for AI to understand and execute diverse tasks.

OpenAI Updates

🌀 Review completed & Altman, Brockman to continue to lead OpenAI: The OpenAI Board announced the completion of the review by WilmerHale and expressed full confidence in Sam Altman and Greg Brockman's leadership. The Board also elected new members and adopted governance enhancements. WilmerHale's review found a breakdown in trust between the prior Board and Mr. Altman, leading to his removal, but concluded that his conduct did not mandate removal. Following the review, the Board endorsed the decision to rehire Mr. Altman and Mr. Brockman.

🌀 Global news partnerships: Le Monde and Prisa Media: OpenAI has partnered with Le Monde and Prisa Media to bring French and Spanish news content to ChatGPT.
This partnership aims to enhance user interaction with news content and contribute to the training of OpenAI's models. Through these partnerships, users will access summaries and links to original articles, expanding their news consumption experience. This collaboration supports the news industry and its role in providing reliable information globally.

🔍 From Bits to BERT: Keeping Up with LLMs & GPTs

🌀 Introducing Devin, the first AI software engineer: Meet Devin, the autonomous AI software engineer, skilled in long-term reasoning and planning. Devin can learn new technologies, build and deploy apps, find and fix bugs, train AI models, and contribute to open source. Devin excels in resolving real-world GitHub issues, outperforming previous models. Cognition, the AI lab behind Devin, aims to unlock new possibilities beyond coding.

🌀 Google’s Croissant: a metadata format for ML-ready datasets. Croissant is a new metadata format for ML datasets, aiming to simplify the use of existing datasets for training ML models. It standardizes dataset descriptions and organization, supporting responsible AI practices. Croissant builds upon schema.org and is supported by major tools and repositories like Kaggle, Hugging Face, and OpenML. It includes a specification, example datasets, a Python library, and a visual editor to facilitate dataset usage and publication.

🌀 KAIST AI’s INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models. This research focuses on enhancing search accuracy by improving retrievers to understand users' intentions, similar to language models. It introduces INSTRUCTIR, a benchmark for evaluating retrievers' ability to follow user-aligned instructions in retrieval tasks. The study addresses limitations in existing benchmarks and highlights potential overfitting issues in instruction-aware retrieval datasets.
🌀 Sussex AI’s Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks. Advancements in large neural architectures have led to powerful AI accelerators for training deep neural networks. However, these networks often incur high costs. Neuromorphic computing with Spiking Neural Networks (SNNs) offers energy-efficient alternatives, but training SNNs is challenging. Spyx, a new lightweight SNN simulation and optimization library designed in JAX, aims to facilitate SNN architecture investigation by bridging Python-based deep learning frameworks with custom compute kernels, achieving optimal hardware utilization.

🌀 VMware’s SynCode: Improving LLM Code Generation with Grammar Augmentation. SynCode is a novel framework for efficient syntactical decoding of code with large language models (LLMs). It leverages the grammar of a programming language through an efficient lookup table, constructed offline, called the Deterministic Finite Automaton (DFA) mask store. SynCode integrates with any language defined by a context-free grammar (CFG), reducing syntax errors by 96.07% when combined with LLMs.

🌀 Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. Chatbot Arena is an open platform designed to evaluate Large Language Models (LLMs) by considering human preferences. Utilizing a pairwise comparison method and crowdsourced input, it assesses LLMs' alignment with user preferences. The platform, operational for months with over 240K votes, provides a credible and valuable resource for ranking LLMs. Check out the tool here.

🌀 Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People. The project aims to develop medical Large Language Models (LLMs) in the six most spoken languages, benefiting 6.1 billion people. This includes creating the ApolloCorpora multilingual medical dataset and the XMedBench benchmark, with Apollo models achieving top performance among models of similar sizes.
The project will open-source training data, code, model weights, and evaluation benchmarks. You can check out the demo here.

✨ On the Radar: Catch Up on What's Fresh

🌀 Top Artificial Intelligence (AI) Tools That Can Generate Code To Help Programmers (2024): The article discusses how AI is changing programming, with tools like OpenAI Codex and GitHub Copilot generating code. It explores AI's impact on code quality and development speed, showcasing various AI-powered tools like Tabnine, CodeT5, and Polycoder. Additionally, it mentions AI tools for code review, static code analysis, and AI-assisted coding in IDEs like PyCharm and Visual Studio.

🌀 PyPI mirror in a private AWS environment with Terraform: This article explains how to install Python packages in an Amazon SageMaker Studio environment without internet access. It covers setting up SageMaker in VPC Only mode, using VPC interface endpoints for network communications, and accessing the PyPI package repository through AWS CodeArtifact, which allows defining PyPI as an upstream repository.

🌀 Custom pre-commit hooks for safer code changes: This blog post explains the importance of using pre-commit hooks in software development, particularly with the git version control system. It discusses the challenges of maintaining coding standards in collaborative projects and provides a step-by-step tutorial on how to set up and use custom pre-commit hooks for a Python project, using the example of validating dataflow definitions for the Hamilton library.

🌀 AQLM Quantization Algorithm, explained: A new quantization algorithm, AQLM (Additive Quantization of Language Models), was recently released and integrated into HuggingFace Transformers and HuggingFace PEFT. AQLM sets a new state-of-the-art for 2-bit quantization while providing improvements for 3-bit and 4-bit ranges, pushing the boundaries of model accuracy and memory footprint.
🌀 Revolutionize Web Browsing with AI: This article explores creating an AI agent using the gpt-4-vision-preview model from OpenAI, enabling it to navigate the web like a human. It discusses the agent's browser control, content browsing, and decision-making processes, showcasing potential use cases such as aiding visually challenged users and automating web browsing tasks.

🌀 Understanding Tensors: Learning a Data Structure Through 3 Pesky Errors. This article discusses transitioning from managing tabular data to working with tensors in TensorFlow, offering debugging tips and code recipes. It covers visualizing TensorFlow datasets, understanding tensor specs, and augmenting model summaries, while addressing common errors related to tensor rank and shape.

🌀 Running RStudio Inside a Container: This tutorial focuses on setting up RStudio using Docker, particularly leveraging the Rocker RStudio image. It covers pulling the image, launching RStudio in a container, and ensuring persistence of data by using volume mapping. The tutorial provides step-by-step instructions and explanations for each stage.

🌀 PyTorch and MLX for Apple Silicon: The blog discusses Apple's MLX framework, which is optimized for Apple Silicon and serves as a bridge between PyTorch, NumPy, and JAX. It details a comparison between MLX and PyTorch through a custom convolutional neural network implementation for image classification tasks. The discussion includes insights into MLX's features, such as its array class, lazy computation, and compilation for performance optimization. The post also highlights the ease of converting PyTorch code to MLX, despite some differences in API compatibility and coding conventions.

See you next time!

Affiliate Disclosure: This newsletter contains affiliate links. If you buy through them, we may earn a small commission at no extra cost to you. This supports our work and helps us keep providing useful content.
We only recommend products and services we think will benefit our readers. Thanks for your support! 
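As a companion to the conformal prediction excerpt in this issue, here is a minimal sketch of how the nonconformity measure, calibration set, and test set fit together in inductive conformal prediction for regression. It is not taken from the book. The `predict` function is a hypothetical stand-in for an already-trained model, the data is synthetic, and the absolute residual is only one possible nonconformity measure.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to support this confidence level.
        return math.inf
    return sorted(scores)[rank - 1]


def predict(x):
    # Hypothetical stand-in for an already-trained point predictor.
    return 2.0 * x


rng = random.Random(0)


def sample(n):
    # Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
    return [(x, 2.0 * x + rng.gauss(0, 1))
            for x in (rng.uniform(0, 10) for _ in range(n))]


calibration = sample(500)  # the excerpt recommends at least 500 points
test = sample(2000)

# Nonconformity measure: absolute residual between target and prediction.
cal_scores = [abs(y - predict(x)) for x, y in calibration]

alpha = 0.1  # user-defined miscoverage, i.e. a 90% confidence level
q = conformal_quantile(cal_scores, alpha)

# The prediction interval for a test point x is [predict(x) - q, predict(x) + q].
covered = sum(1 for x, y in test if abs(y - predict(x)) <= q)
coverage = covered / len(test)
print(f"interval half-width: {q:.3f}, empirical coverage: {coverage:.3f}")
```

The empirical coverage on the test set should land close to the requested 90%, which is the validity property the excerpt refers to. A narrower interval at the same coverage would correspond to what it calls efficiency.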

Savia Lobo
16 Feb 2018
6 min read

How to auto-generate texts from Shakespeare writing using deep recurrent neural networks

Note: This article is an excerpt from Natural Language Processing with Python Cookbook, co-authored by Krishna Bhavsar, Naresh Kumar, and Pratap Dangeti. The book offers recipes covering various aspects of Natural Language Processing with NLTK, a leading Python platform for NLP.

Today we will learn to use deep recurrent neural networks (RNNs) to predict the next character from a fixed-length window of preceding characters. A model trained this way can generate text continuously and, given enough training epochs, imitate the writing style of the original author.

Getting ready...

The Project Gutenberg eBook of the complete works of William Shakespeare is used to train the network for automated text generation. The raw file used for training can be downloaded from http://www.gutenberg.org/:

>>> from __future__ import print_function
>>> import numpy as np
>>> import random
>>> import sys

The following code creates dictionaries that map characters to indices and back, which we will use to convert text into indices at later stages. Deep learning models operate on numbers rather than raw text, so every character has to be mapped to an index before training:

>>> # Adjust the path to wherever you saved the downloaded text file
>>> path = r'C:\Users\prata\Documents\book_codes\NLP_DL\shakespeare_final.txt'
>>> text = open(path).read().lower()
>>> characters = sorted(list(set(text)))
>>> print('corpus length:', len(text))
>>> print('total chars:', len(characters))
>>> char2indices = dict((c, i) for i, c in enumerate(characters))
>>> indices2char = dict((i, c) for i, c in enumerate(characters))

How to do it…

Before training the model, several preprocessing steps are needed. The major steps are:

Preprocessing: Prepare the X and Y data from the full text file and convert them into an index-based vectorized format.
Deep learning model training and validation: Train and validate the deep learning model.
Text generation: Generate the text with the trained model.

How it works...

The following lines of code describe the entire modeling process of generating text from Shakespeare's writings. We use a window length of 40 characters to predict the single next character, which is a reasonable amount of context. The extraction window moves forward three characters at a time, so consecutive windows overlap heavily without being identical, which keeps the dataset semi-redundant rather than fully redundant:

>>> # cut the text in semi-redundant sequences of maxlen characters
>>> maxlen = 40
>>> step = 3
>>> sentences = []
>>> next_chars = []
>>> for i in range(0, len(text) - maxlen, step):
...     sentences.append(text[i: i + maxlen])
...     next_chars.append(text[i + maxlen])
>>> print('nb sequences:', len(sentences))

The output reports the total number of sequences considered, 193798, which is enough data for text generation.

The next code block converts the data into a vectorized format for feeding into the deep learning model, as the model cannot work with text, words, or sentences directly. We first allocate NumPy arrays of zeros with the full dimensions, then set the positions given by the dictionary mappings to 1 (one-hot encoding):

>>> # Converting indices into vectorized format
>>> X = np.zeros((len(sentences), maxlen, len(characters)), dtype=bool)
>>> y = np.zeros((len(sentences), len(characters)), dtype=bool)
>>> for i, sentence in enumerate(sentences):
...     for t, char in enumerate(sentence):
...         X[i, t, char2indices[char]] = 1
...     y[i, char2indices[next_chars[i]]] = 1

>>> from keras.models import Sequential
>>> from keras.layers import Dense, LSTM, Activation, Dropout
>>> from keras.optimizers import RMSprop

The deep learning model is an RNN, more specifically a Long Short-Term Memory (LSTM) network with 128 hidden units, followed by a dense layer whose output size equals the number of distinct characters. A softmax activation turns the output into a probability for each character, and the model is trained with the RMSprop optimizer. We encourage readers to try other parameter values to see how the results vary:

>>> # Model Building
>>> model = Sequential()
>>> model.add(LSTM(128, input_shape=(maxlen, len(characters))))
>>> model.add(Dense(len(characters)))
>>> model.add(Activation('softmax'))
>>> # Older Keras releases spell this argument lr instead of learning_rate
>>> model.compile(loss='categorical_crossentropy', optimizer=RMSprop(learning_rate=0.01))
>>> model.summary()

As mentioned earlier, deep learning models train on numeric indices to map input to output (given a length of 40 characters, the model predicts the next character). The following function turns the predicted probabilities into a character index. It rescales the distribution by the diversity parameter (named metric here) and then samples one index from the result, rather than always picking the most probable character:

>>> # Function to convert prediction into index
>>> def pred_indices(preds, metric=1.0):
...     preds = np.asarray(preds).astype('float64')
...     preds = np.log(preds) / metric
...     exp_preds = np.exp(preds)
...     preds = exp_preds / np.sum(exp_preds)
...     probs = np.random.multinomial(1, preds, 1)
...     return np.argmax(probs)

The model is trained for 29 iterations (range(1, 30)) with a batch size of 128. After each iteration, text is generated at several diversity values to see their impact on the predictions:

>>> # Train and Evaluate the Model
>>> for iteration in range(1, 30):
...     print('-' * 40)
...     print('Iteration', iteration)
...     model.fit(X, y, batch_size=128, epochs=1)
...     start_index = random.randint(0, len(text) - maxlen - 1)
...     for diversity in [0.2, 0.7, 1.2]:
...         print('\n----- diversity:', diversity)
...         generated = ''
...         sentence = text[start_index: start_index + maxlen]
...         generated += sentence
...         print('----- Generating with seed: "' + sentence + '"')
...         sys.stdout.write(generated)
...         for i in range(400):
...             x = np.zeros((1, maxlen, len(characters)))
...             for t, char in enumerate(sentence):
...                 x[0, t, char2indices[char]] = 1.
...             preds = model.predict(x, verbose=0)[0]
...             next_index = pred_indices(preds, diversity)
...             pred_char = indices2char[next_index]
...             generated += pred_char
...             sentence = sentence[1:] + pred_char
...             sys.stdout.write(pred_char)
...             sys.stdout.flush()
...         print("\nOne combination completed \n")

Comparing the output of the first iteration (Iteration 1) with that of the final iteration (Iteration 29) shows that, with enough training, the generated text is much better than at Iteration 1.

Though the text generation seems magical, we have generated text using Shakespeare's writings, showing that with the right training and handling, we can imitate the writing style of a particular writer.

If you found this post useful, you may check out the book Natural Language Processing with Python Cookbook to analyze sentence structure and master lexical analysis, syntactic and semantic analysis, pragmatic analysis, and other NLP techniques.
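Two steps in the recipe above are easier to see on a tiny input: cutting the text into overlapping windows, and the effect of the diversity value on sampling. The following is a minimal standard-library sketch rather than the book's code; the helper names `make_windows`, `rescale`, and `sample_index` are hypothetical, and the probability vector is made up for illustration.

```python
import math
import random


def make_windows(text, maxlen, step):
    """Cut text into overlapping windows plus the character that follows each."""
    sentences, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        sentences.append(text[i:i + maxlen])
        next_chars.append(text[i + maxlen])
    return sentences, next_chars


def rescale(probs, diversity):
    """Raise each probability to 1 / diversity and renormalise.

    This is the same computation as exp(log(p) / diversity) followed by
    normalisation; zero probabilities stay at zero.
    """
    weights = [math.exp(math.log(p) / diversity) if p > 0 else 0.0 for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def sample_index(probs, diversity, rng=random):
    """Draw one index from the rescaled distribution."""
    return rng.choices(range(len(probs)), weights=rescale(probs, diversity), k=1)[0]


# With maxlen=5 and step=3, consecutive windows share maxlen - step = 2 characters.
sentences, next_chars = make_windows("to be or not to be", maxlen=5, step=3)
for window, following in zip(sentences, next_chars):
    print(repr(window), "->", repr(following))

# Made-up next-character probabilities for a three-character alphabet.
probs = [0.7, 0.2, 0.1]
for diversity in (0.2, 0.7, 1.2):
    print(diversity, [round(p, 3) for p in rescale(probs, diversity)])

print(sample_index(probs, 0.7, random.Random(0)))
```

A diversity below 1 sharpens the distribution toward the most likely character, which gives safer but more repetitive text. A diversity above 1 flattens it, which gives more varied text with more errors.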

Vijin Boricha
10 Apr 2018
9 min read

Applying Spring Security using JSON Web Token (JWT)

Today, we will learn about Spring Security and how it can be applied in various forms using powerful libraries like JSON Web Token (JWT). Spring Security is a powerful authentication and authorization framework that helps us build a secure application. By using Spring Security, we can keep all of our REST APIs secured and accessible only by authenticated and authorized calls.

Authentication and authorization

Let's look at an example to explain this. Assume you have a library with many books. Authentication provides the key to enter the library; authorization gives you permission to take a book. Without a key, you can't even enter the library. Even with a key to the library, you will be allowed to take only a few books.

JSON Web Token (JWT)

Spring Security can be applied in many forms, including XML configuration or libraries such as JSON Web Token. As most companies use JWT in their security, we will focus more on JWT-based security than on simple Spring Security, which can be configured in XML. JWTs are URL-safe and web browser-compatible, which makes them especially suitable for Single Sign-On (SSO) contexts. A JWT has three parts:

Header
Payload
Signature

The header names the algorithm used to sign the token. While authenticating, the client has to save the JWT that is returned by the server. Unlike traditional session creation approaches, this process doesn't need to store any cookies on the client side. JWT authentication is stateless, as the client state is never saved on a server.

JWT dependency

To use JWT in our application, we need to add a Maven dependency to the pom.xml file. You can get the Maven dependency from: https://mvnrepository.com/artifact/javax.xml.Bind.
We have used version 2.3.0 of the Maven dependency in our application:

    <dependency>
        <groupId>javax.xml.bind</groupId>
        <artifactId>jaxb-api</artifactId>
        <version>2.3.0</version>
    </dependency>

Note: As Java 9 no longer makes DatatypeConverter available by default, we need to add the preceding dependency to work with DatatypeConverter. We will cover DatatypeConverter in the following section.

Creating a JSON Web Token

To create a token, we add an abstract method called createToken to our SecurityService interface. The interface obliges every implementing class to provide a complete createToken method. In the createToken method, we use only the subject and expiry time, as these two options are the important ones when creating a token.

First, we declare the abstract method in the SecurityService interface. The concrete class (whichever class implements the SecurityService interface) has to implement it:

    public interface SecurityService {
        String createToken(String subject, long ttlMillis);
        // other methods
    }

In the preceding code, we defined the method for token creation in the interface. SecurityServiceImpl is the concrete class that implements the abstract method of the SecurityService interface by applying the business logic.
The following code shows how the JWT is created from the subject and expiry time:

    // Required imports: io.jsonwebtoken.JwtBuilder, io.jsonwebtoken.Jwts,
    // io.jsonwebtoken.SignatureAlgorithm, java.security.Key, java.util.Date,
    // javax.crypto.spec.SecretKeySpec, javax.xml.bind.DatatypeConverter
    private static final String secretKey = "4C8kum4LxyKWYLM78sKdXrzbBjDCFyfX";

    @Override
    public String createToken(String subject, long ttlMillis) {
        if (ttlMillis <= 0) {
            throw new RuntimeException("Expiry time must be greater than Zero :[" + ttlMillis + "] ");
        }
        // The JWT signature algorithm we will be using to sign the token
        SignatureAlgorithm signatureAlgorithm = SignatureAlgorithm.HS256;
        byte[] apiKeySecretBytes = DatatypeConverter.parseBase64Binary(secretKey);
        Key signingKey = new SecretKeySpec(apiKeySecretBytes, signatureAlgorithm.getJcaName());
        JwtBuilder builder = Jwts.builder()
                .setSubject(subject)
                .signWith(signatureAlgorithm, signingKey);
        long nowMillis = System.currentTimeMillis();
        builder.setExpiration(new Date(nowMillis + ttlMillis));
        return builder.compact();
    }

The preceding code creates the token for the subject. Here, we have hardcoded the secret key "4C8kum4LxyKWYLM78sKdXrzbBjDCFyfX" to simplify the token creation process. If needed, we can keep the secret key in the application.properties file to avoid hardcoding it in the Java code.

First, we verify that the time to live is greater than zero. If not, we throw the exception right away. We sign the token with HS256 (HMAC using SHA-256), as it is used in most applications.

Note: Secure Hash Algorithm (SHA) is a family of cryptographic hash functions. A cryptographic hash is a fixed-size digest computed from the input data. The SHA-256 algorithm generates an almost-unique, fixed-size 256-bit hash. SHA-256 is one of the more reliable hash functions.

The secret key is declared as a constant in this class:

    private static final String secretKey = "4C8kum4LxyKWYLM78sKdXrzbBjDCFyfX";

We decode the string key into a byte array and then pass it to a Java class, SecretKeySpec, to get a signingKey. This key will be used in the token builder.
Also, while creating the signing key, we pass the JCA name of our signature algorithm (signatureAlgorithm.getJcaName()).

Note: Java Cryptography Architecture (JCA) was introduced by Java to support modern cryptography techniques.

We use the JwtBuilder class to create the token and set the expiration time for it. The following code defines the token creation and expiry time setting:

    JwtBuilder builder = Jwts.builder()
            .setSubject(subject)
            .signWith(signatureAlgorithm, signingKey);
    long nowMillis = System.currentTimeMillis();
    builder.setExpiration(new Date(nowMillis + ttlMillis));

We pass the time to live in milliseconds when calling createToken; it is added to the current time in milliseconds to build the Date passed to setExpiration.

Finally, we have to call the createToken method in our HomeController. Before calling the method, we have to autowire the SecurityService as follows:

    @Autowired
    SecurityService securityService;

The createToken call is coded as follows. We take the subject as the parameter. To simplify the process, we have hardcoded the expiry time as 2 * 1000 * 60 (two minutes).

HomeController.java:

    @Autowired
    SecurityService securityService;

    @ResponseBody
    @RequestMapping("/security/generate/token")
    public Map<String, Object> generateToken(@RequestParam(value="subject") String subject) {
        String token = securityService.createToken(subject, (2 * 1000 * 60));
        Map<String, Object> map = new LinkedHashMap<>();
        map.put("result", token);
        return map;
    }

Generating a token

We can test the token by calling the API in a browser or any REST client. By calling this API, we can create a token. This token will be used for purposes such as user authentication.

A sample API call for creating a token is as follows:

    http://localhost:8080/security/generate/token?subject=one

Here we have used one as the subject. We can see the token in the following result.
This is how the token will be generated for all the subjects we pass to the API: { result: "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJvbmUiLCJleHAiOjE1MDk5MzY2ODF9.GknKcywiIG4-R2bRmBOsjomujP0MxZqdawrB8TO3P4" } Note: JWT is a string that has three parts, each separated by a dot (.). Each section is Base64URL-encoded. The first section is the header, which names the algorithm used to sign the JWT. The second section is the body, and the final section is the signature. Getting a subject from a JSON Web Token So far, we have created a JWT token. In this section, we are going to decode the token and get the subject from it. As usual, we have to define the method to get the subject. We will declare an abstract method called getSubject in the SecurityService interface. Later, we will implement this method in our concrete class: String getSubject(String token); In our concrete class, SecurityServiceImpl, we will implement the getSubject method. We can use the following code to get the subject from the token: @Override public String getSubject(String token) { Claims claims = Jwts.parser() .setSigningKey(DatatypeConverter.parseBase64Binary(secretKey)) .parseClaimsJws(token).getBody(); return claims.getSubject(); } In the preceding method, we use Jwts.parser to get the claims. We set the signing key by converting the secret key to binary and passing it to the parser, which uses it to check the token's signature. Once we get the Claims, we can simply get the subject by calling getSubject. Finally, we can call the method in our controller and pass the generated token to get the subject. 
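The three-part structure described in the note above can be inspected with the JDK alone. The following is a hedged sketch, assuming an HS256-signed token; the class JwtInspectSketch and its method names are hypothetical, it does no JSON parsing or expiry check, and it is not a substitute for Jwts.parser.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustration only: decodes JWT segments and re-checks an HS256 signature by hand.
final class JwtInspectSketch {

    // Decodes one Base64URL segment (header or body) into its JSON text.
    static String decodeSegment(String segment) {
        return new String(Base64.getUrlDecoder().decode(segment), StandardCharsets.UTF_8);
    }

    // Computes HMAC-SHA256 over the given text with the shared secret.
    static byte[] hmac(String data, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key invalid", e);
        }
    }

    // Recomputes the signature over "header.body" and compares it with the third segment.
    static boolean hasValidSignature(String token, byte[] key) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        try {
            byte[] expected = hmac(parts[0] + "." + parts[1], key);
            byte[] actual = Base64.getUrlDecoder().decode(parts[2]);
            return MessageDigest.isEqual(expected, actual);
        } catch (IllegalArgumentException e) {
            return false; // third segment is not valid Base64URL
        }
    }

    public static void main(String[] args) {
        // Header and body segments taken from the sample token in the article.
        System.out.println(decodeSegment("eyJhbGciOiJIUzI1NiJ9"));
        System.out.println(decodeSegment("eyJzdWIiOiJvbmUiLCJleHAiOjE1MDk5MzY2ODF9"));
    }
}
```

Decoding the first two segments of the sample token gives {"alg":"HS256"} and {"sub":"one","exp":1509936681}, which is where the subject one comes from. Note that the header and body are only encoded, not encrypted, so anyone holding the token can read them; the signature only protects them against modification.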
You can check the following code, where the controller is calling the getSubject method and returning the subject in the HomeController.java file: @ResponseBody @RequestMapping("/security/get/subject") public Map<String, Object> getSubject(@RequestParam(value="token") String token){ String subject = securityService.getSubject(token); Map<String, Object> map = new LinkedHashMap<>(); map.put("result", subject); return map; } Getting a subject from a token Previously, we created the code to get the subject from the token. Here we will test that method by calling the get subject API. By calling the REST API, we will get the subject that we passed earlier. Sample API: http://localhost:8080/security/get/subject?token=eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiJvbmUiLCJleHAiOjE1MDk5MzY2ODF9.GknKcywiI-G4-R2bRmBOsjomujP0MxZqdawrB8TO3P4 Since we used one as the subject when creating the token by calling the generateToken method, we will get "one" in the getSubject method: { result: "one" } Note: Usually, the token is sent in a request header; however, to avoid complexity, we have passed it as a request parameter and returned the subject directly in the response. You may not need to do it the same way in a real application. This is only for demo purposes. This article is an excerpt from the book Building RESTful Web Services with Spring 5 - Second Edition, written by Raja CSP Raman. This book involves techniques to deal with security in Spring and shows how to implement unit test and integration test strategies. You may also like How to develop RESTful web services in Spring, another tutorial from this book. Check out other posts on Spring Security: Spring Security 3: Tips and Tricks Opening up to OpenID with Spring Security Migration to Spring Security 3  
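The closing note of the JWT tutorial mentions that tokens are usually sent in a header rather than as a query parameter. As a rough sketch of that idea, assuming the common "Authorization: Bearer <token>" convention (the article does not say which header or scheme it has in mind), the token could be pulled out of the header value like this. BearerTokenSketch is a hypothetical helper, not part of Spring or jjwt.

```java
import java.util.Optional;

// Illustration only: extracts a token from an "Authorization: Bearer <token>" header value.
final class BearerTokenSketch {

    private static final String PREFIX = "Bearer ";

    // Returns the token if the header uses the Bearer scheme, otherwise an empty Optional.
    static Optional<String> extract(String authorizationHeader) {
        if (authorizationHeader == null) {
            return Optional.empty();
        }
        String value = authorizationHeader.trim();
        // The scheme name is matched case-insensitively.
        if (!value.regionMatches(true, 0, PREFIX, 0, PREFIX.length())) {
            return Optional.empty();
        }
        String token = value.substring(PREFIX.length()).trim();
        return token.isEmpty() ? Optional.empty() : Optional.of(token);
    }

    public static void main(String[] args) {
        System.out.println(extract("Bearer abc.def.ghi").orElse("(no token)"));
    }
}
```

In a controller, the header value would typically arrive through a request header parameter and the extracted token would then be handed to securityService.getSubject.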

Packt
21 Jun 2016
19 min read

Adding Media to Our Site

In this article, Neeraj Kumar et al., authors of the book Drupal 8 Development Beginner's Guide - Second Edition, explain that a text-only site is not going to hold the interest of visitors; a site needs some pizzazz and some spice! One way to add some pizzazz to your site is by adding some multimedia content, such as images, video, audio, and so on. But, we don't just want to add a few images here and there; in fact, we want an immersive and compelling multimedia experience that is easy to manage, configure, and extend. The File entity (https://drupal.org/project/file_entity) module for Drupal 8 will enable us to manage files very easily. In this article, we will discover how to integrate the File entity module to add images to our d8dev site, and will explore compelling ways to present images to users. This will include taking a look at the integration of a lightbox-type UI element for displaying the File-entity-module-managed images, and learning how we can create custom image styles through UI and code. The following topics will be covered in this article: The File entity module for Drupal 8 Adding a Recipe image field to your content types Code example: image styles for Drupal 8 Displaying recipe images in a lightbox popup Working with Drupal issue queues Introduction to the File entity module As per the module page at https://www.drupal.org/project/file_entity: File entity provides interfaces for managing files. It also extends the core file entity, allowing files to be fieldable, grouped into types, viewed (using display modes) and formatted using field formatters. File entity integrates with a number of modules, exposing files to Views, Entity API, Token and more. In our case, we need this module to easily edit image properties such as Title text and Alt text. These properties will then be used as captions in the Colorbox popup. 
Working with dev versions of modules There are times when you come across a module that introduces some major new features and is fairly stable, but not quite ready for use on a live/production website, and is therefore available only as a dev version. This is a perfect opportunity to provide a valuable contribution to the Drupal community. Just by installing and using a dev version of a module (in your local development environment, of course), you are providing valuable testing for the module maintainers. Of course, you should enter an issue in the project's issue queue if you discover any bugs or would like to request any additional features. Also, using a dev version of a module presents you with the opportunity to take on some custom Drupal development. However, it is important that you remember that a module is released as a dev version for a reason, and it is most likely not stable enough to be deployed on a public-facing site. Our use of the File entity module in this article is a good example of working with the dev version of a module. One thing to note: Drush will download official and dev module releases. But at this point in time, there is no official port for the File entity module in Drupal, so we will use the unofficial one, which lives on GitHub (https://github.com/drupal-media/file_entity). In the next step, we will be downloading the dev release with GitHub. Time for action – installing a dev version of the File entity module In Drupal, we use Drush to download and enable any module/theme, but there is no official port yet for the file entity module in Drupal, so we can use the unofficial one, which lives on GitHub at https://github.com/drupal-media/file_entity: Open the Terminal (Mac OS X) or Command Prompt (Windows) application, and go to the root directory of your d8dev site. Go inside the modules folder and download the File entity module from GitHub. 
We use the git command to download this module: $ git clone https://github.com/drupal-media/file_entity
Another way is to download a .zip file from https://github.com/drupal-media/file_entity and extract it in the modules folder. Next, on the Extend page (admin/modules), enable the File entity module. What just happened? We enabled the File entity module, and learned how to download and install it from GitHub. A new recipe for our site In this article, we are going to create a new recipe: Thai Basil Chicken. It gives us more realistic content to use as an example, so feel free to try the recipe out! Name: Thai Basil Chicken Description: A spicy, flavorful version of one of my favorite Thai dishes RecipeYield : Four servings PrepTime: 25 minutes CookTime: 20 minutes Ingredients : One pound boneless chicken breasts Two tablespoons of olive oil Four garlic cloves, minced Three tablespoons of soy sauce Two tablespoons of fish sauce Two large sweet onions, sliced Five cloves of garlic One yellow bell pepper One green bell pepper Four to eight Thai peppers (depending on the level of hotness you want) One-third cup of dark brown sugar dissolved in one cup of hot water One cup of fresh basil leaves Two cups of Jasmine rice Instructions: Prepare the Jasmine rice according to the directions. Heat the olive oil in a large frying pan over medium heat for two minutes. Add the chicken to the pan and then pour on soy sauce. Cook the chicken until there is no visible pinkness, approximately 8 to 10 minutes. Reduce heat to medium low. Add the garlic and fish sauce, and simmer for 3 minutes. Next, add the Thai chilies, onion, and bell pepper and stir to combine. Simmer for 2 minutes. Add the brown sugar and water mixture. Stir to mix, and then cover. Simmer for 5 minutes. Uncover, add basil, and stir to combine. Serve over rice. 
Time for action – adding a Recipe images field to our Recipe content type We will use the manage fields administrative page to add a Media field to our d8dev Recipe content type: Open up the d8dev site in your favorite browser, click on the Structure link in the Admin toolbar, and then click on the Content types link. Next, on the Content types administrative page, click on the Manage fields link for your Recipe content type: Now, on the Manage fields administrative page, click on the Add field link. On the next screen, select Image from the Add a new field dropdown and Label as Recipe images. Click on the Save field settings button. Next, on the Field settings page, select Unlimited as the allowed number of values. Click on the Save field settings button. On the next screen, leave all settings as they are and click on the Save settings button. Next, on the Manage form display page, select widget Editable file for the Recipe images field and click on the Save button. Now, on the Manage display page, for the Recipe images field, select Hidden as the label. Click on the settings icon. Then select Medium (220*220) as the image style, and click on the Update button. At the bottom, click on the Save button: Let's add some Recipe images to a recipe. Click on the Content link in the menu bar, and then click on Add content and Recipe. On the next screen, fill in the title as Thai Basil Chicken and other fields respectively as mentioned in the preceding recipe details. Now, scroll down to the new Recipe images field that you have added. Click on the Add a new file button or drag and drop images that you want to upload. Then click on the Save and Publish button: Reload your Thai Basil Chicken recipe page, and you should see something similar to the following: All the images are stacked on top of each other. 
So, we will add the following CSS just under the style for field--name-field-recipe-images and field--type-recipe-images in the /modules/d8dev/styles/d8dev.css file, to lay out the Recipe images in more of a grid: .node .field--type-recipe-images { float: none !important; } .field--name-field-recipe-images .field__item { display: inline-flex; padding: 6px; } Now we will load this d8dev.css file to apply this grid style. In Drupal 8, loading a CSS file has a process: Save the CSS to a file. Define a library, which can contain the CSS file. Attach the library to a render array in a hook. So, we have already saved a CSS file called d8dev.css under the styles folder; now we will define a library. To define one or more (asset) libraries, add a *.libraries.yml file to the root of your module (or theme) folder. Our module is named d8dev, so the filename should be d8dev.libraries.yml. Each library in the file is an entry detailing CSS, like this: d8dev: version: 1.x css: theme: styles/d8dev.css: {} Now, we implement the hook_page_attachments() function to load the CSS file. Add the following code inside the d8dev.module file. Use this hook when you want to conditionally add attachments to a page: /** * Implements hook_page_attachments(). */ function d8dev_page_attachments(array &$attachments) { $attachments['#attached']['library'][] = 'd8dev/d8dev'; } Now, we will need to clear the cache for our d8dev site by going to Configuration, clicking on the Performance link, and then clicking on the Clear all caches button. Reload your Thai Basil Chicken recipe page, and you should see something similar to the following: What just happened? We added and configured a media-based field for our Recipe content type. We updated the d8dev module with custom CSS code to lay out the Recipe images in more of a grid format. We also looked at how to attach a CSS file through a module. 
Creating a custom image style Before we configure the Colorbox feature, we are going to create a custom image style to use in the Colorbox content preview settings. Image styles for Drupal 8 are part of the core Image module. The core Image module provides three default image styles (thumbnail, medium, and large), as seen in the following Image style configuration page: Now, we are going to add a fourth, custom image style, one that will resize our images somewhere between the 100 x 100 thumbnail style and the 220 x 220 medium style. We will walk through the process of creating an image style through the Image style administrative page, and also walk through the process of programmatically creating an image style. Time for action – adding a custom image style through the image style administrative page First, we will use the Image style administrative page (admin/config/media/image-styles) to create a custom image style: Open the d8dev site in your favorite browser, click on the Configuration link in the Admin toolbar, and click on the Image styles link under the Media section. Once the Image styles administrative page has loaded, click on the Add style link. Next, enter small for the Image style name of your custom image style, and click on the Create new style button: Now, we will add the one and only effect for our custom image style by selecting Scale from the EFFECT options and then clicking on the Add button. On the Add Scale effect page, enter 160 for the width and 120 for the height. Leave the Allow Upscaling checkbox unchecked, and click on the Add effect button: Finally, just click on the Update style button on the Edit small style administrative page, and we are done. We now have a new custom small image style that we will be able to use to resize images for our site: What just happened? We learned how easy it is to add a custom image style with the administrative UI. 
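To get a feel for what a Scale effect of 160 x 120 with upscaling disabled does to different source images, the arithmetic can be sketched as follows. This is language-neutral arithmetic written in Java to match the other sketches on this page; it assumes the usual "fit inside the box, keep the aspect ratio" behavior and is not Drupal's actual code. The class ScaleSketch and its method are hypothetical.

```java
// Illustration only: aspect-ratio-preserving fit, assumed to mirror a typical Scale effect.
final class ScaleSketch {

    // Returns {width, height} after fitting the source inside boxW x boxH.
    static int[] scale(int srcW, int srcH, int boxW, int boxH, boolean upscale) {
        // With upscaling disabled, an image that already fits is left untouched.
        if (!upscale && srcW <= boxW && srcH <= boxH) {
            return new int[] {srcW, srcH};
        }
        // Use the smaller ratio so that both sides end up inside the box.
        double factor = Math.min((double) boxW / srcW, (double) boxH / srcH);
        int width = (int) Math.max(1, Math.round(srcW * factor));
        int height = (int) Math.max(1, Math.round(srcH * factor));
        return new int[] {width, height};
    }

    public static void main(String[] args) {
        int[] size = scale(1000, 500, 160, 120, false);
        System.out.println(size[0] + " x " + size[1]);
    }
}
```

Under these assumptions, a 1600 x 1200 photo becomes 160 x 120, a 1000 x 500 banner becomes 160 x 80, and a 100 x 80 icon stays at 100 x 80 because upscaling is off.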
Now, we are going to see how to add a custom image style by writing some code. The advantage of having code-based custom image styles is that it will allow us to utilize a source code repository, such as Git, to manage and deploy our custom image styles between different environments. For example, it would allow us to use Git to promote image styles from our development environment to a live production website. Otherwise, the manual configuration that we just did would have to be repeated for every environment. Time for action – creating a programmatic custom image style Now, we will see how we can add a custom image style with code: The first thing we need to do is delete the small image style that we just created. So, open your d8dev site in your favorite browser, click on the Configuration link in the Admin toolbar, and then click on the Image styles link under the Media section. Once the Image styles administrative page has loaded, click on the delete link for the small image style that we just added. Next, on the Optionally select a style before deleting small page, leave the default value for the Replacement style select list as No replacement, just delete, and click on the Delete button: In Drupal 8, image styles have been converted from an array to an object that extends ConfigEntity. All image styles provided by modules need to be defined as YAML configuration files in the config/install folder of each module. Suppose our module is located at modules/d8dev. Create a file called modules/d8dev/config/install/image.style.small.yml with the following content: uuid: b97a0bd7-4833-4d4a-ae05-5d4da0503041 langcode: en status: true dependencies: { } name: small label: small effects: c76016aa-3c8b-495a-9e31-4923f1e4be54: uuid: c76016aa-3c8b-495a-9e31-4923f1e4be54 id: image_scale weight: 1 data: width: 160 height: 120 upscale: false We need to use a UUID generator to assign unique IDs to image style effects. 
Do not copy/paste UUIDs from other pieces of code or from other image styles! Our custom style is named small, and that value is provided as both the name and the label. For each effect that we want to add to our image style, we specify the effect we want to use with the id key, and then pass in the settings for the effect under the data key. In the case of the image_scale effect that we are using here, we pass in the width, height, and upscale settings. Finally, the value for the weight key allows us to specify the order the effects should be processed in, and although it is not very useful when there is only one effect, it becomes important when there are multiple effects. Now, we will need to uninstall and install our d8dev module by going to the Extend page. On the next screen click on the Uninstall tab, check the d8dev checkbox and click on the Uninstall button. Now, click on the List tab, check d8dev, and click on the Install button. Then, go back to the Image styles administrative page and you will see our programmatically created small image style. What just happened? We created a custom image style with some custom code. We then configured our Recipe content type to use our custom image style for images added to the Recipe images field. Integrating the Colorbox and File entity modules The File entity module provides interfaces for managing files. For images, we will be able to edit Title text, Alt text, and Filenames easily. However, the images are taking up quite a bit of room. Let's create a pop-up lightbox gallery and show images in a popup. When someone clicks on an image, a lightbox will pop up and allow the user to cycle through larger versions of all associated images. Time for action – installing the Colorbox module Before we can display Recipe images in a Colorbox, we need to download and enable the module: Open the Mac OS X Terminal or Windows Command Prompt, and change to the d8dev directory. 
Next, use Drush to download and enable the current dev release of the Colorbox module (http://drupal.org/project/colorbox): $ drush dl colorbox-8.x-1.x-dev Project colorbox (8.x-1.x-dev) downloaded to /var/www/html/d8dev/modules/colorbox. [success] $ drush en colorbox The following extensions will be enabled: colorbox Do you really want to continue? (y/n): y colorbox was enabled successfully. [ok] The Colorbox module depends on the Colorbox jQuery plugin available at https://github.com/jackmoore/colorbox. The Colorbox module includes a Drush task that will download the required jQuery plugin into the /libraries directory: $ drush colorbox-plugin Colorbox plugin has been installed in libraries [success] Next, we will look into the Colorbox display formatter. Click on the Structure link in the Admin toolbar, then click on the Content types link, and finally click on the manage display link for your Recipe content type under the Operations dropdown: Next, click on the FORMAT select list for the Recipe images field, and you will see an option for Colorbox. Select Colorbox, and you will see the settings change. Then, click on the settings icon: Now, you will see the settings for Colorbox. Select Content image style as small and Content image style for first image as small in the dropdown, and use the default settings for other options. Click on the Update button and then on the Save button at the bottom: Reload our Thai Basil Chicken recipe page, and you should see something similar to the following (with the new image style, small): Now, click on any image and you will see the image loaded in the Colorbox popup: We have learned more about images for Colorbox, but Colorbox also supports videos. Another way to add some spice to our site is by adding videos, and there are several modules available to work with Colorbox for videos. 
The Video Embed Field module creates a simple field type that allows you to embed videos from YouTube and Vimeo and show their thumbnail previews simply by entering the video's URL. So you can try this module to add some pizzazz to your site! What just happened? We installed the Colorbox module and enabled it for the Recipe images field on our custom Recipe content type. Now, we can easily add images to our d8dev content with the Colorbox pop-up feature. Working with Drupal issue queues Drupal has its own issue queue for working with a team of developers around the world. If you need help for a specific project, core, module, or a theme related, you should go to the issue queue, where the maintainers, users, and followers of the module/theme communicate. The issue page provides a filter option, where you can search for specific issues based on Project, Assigned, Submitted by, Followers, Status, Priority, Category, and so on. We can find issues at https://www.drupal.org/project/issues/colorbox. Here, replace colorbox with the specific module name. For more information, see https://www.drupal.org/issue-queue. In our case, we have one issue with the colorbox module. Captions are working for the Automatic and Content title properties, but are not working for the Alt text and Title text properties. To check this issue, go to Structure | Content types and click on Manage display. On the next screen, click on the settings icon for the Recipe images field. Now select the Caption option as Title text or Alt text and click on the Update button. Finally, click on the Save button. Reload the Thai Basil Chicken recipe page, and click on any image. Then it opens in popup, but we cannot see captions for this. Make sure you have the Title text and Alt text properties updated for Recipe images field for the Thai Basil Chicken recipe. 
Time for action – creating an issue for the Colorbox module Now, before we go and try to figure out how to fix this functionality for the Colorbox module, let's create an issue: On https://www.drupal.org/project/issues/colorbox, click on the Create a new issue link: On the next screen we will see a form. We will fill in all the required fields: Title, Category as Bug report, Version as 8.x-1.x-dev, Component as Code, and the Issue summary field. Once I submitted my form, an issue was created at https://www.drupal.org/node/2645160. You should see an issue on Drupal (https://www.drupal.org/) like this: Next, the Maintainers of the colorbox module will look into this issue and reply accordingly. Actually, @frjo replied saying "I have never used that module but if someone who does sends in a patch I will take a look at it." He is a contributor to this module, so we will wait for some time and will see if someone can fix this issue by giving a patch or replying with useful comments. In case someone gives the patch, then we have to apply that to the colorbox module. This information is available on Drupal at https://www.drupal.org/patch/apply . What just happened? We understood and created an issue in the Colorbox module's issue queue list. We also looked into what the required fields are and how to fill them to create an issue in the Drupal module queue list form. Summary In this article, we looked at a way to use our d8dev site with multimedia, creating image styles using some custom code, and learned some new ways of interacting with the Drupal developer community. We also worked with the Colorbox module to add images to our d8dev content with the Colorbox pop-up feature. Lastly, we looked into the custom module to work with custom CSS files. Resources for Article: Further resources on this subject: Installing Drupal 8 [article] Drupal 7 Social Networking: Managing Users and Profiles [article] Drupal 8 and Configuration Management [article]

Emilio Rodriguez
07 Apr 2016
6 min read

Using Native SDKs and Libraries in React Native

When building an app in React Native we may end up needing to use third-party SDKs or libraries. Most of the time, these are only available in their native version, and, therefore, only accessible as Objective-C or Swift libraries in the case of iOS apps or as Java Classes for Android apps. Only in a few cases these libraries are written in JavaScript and even then, they may need pieces of functionality not available in React Native such as DOM access or Node.js specific functionality. In my experience, this is one of the main reasons driving developers and IT decision makers in general to run away from React Native when considering a mobile development framework for their production apps. The creators of React Native were fully aware of this potential pitfall and left a door open in the framework to make sure integrating third-party software was not only possible but also quick, powerful, and doable by any non-iOS/Android native developer (i.e. most of the React Native developers). As a JavaScript developer, having to write Objective-C or Java code may not be very appealing in the beginning, but once you realize the whole process of integrating a native SDK can take as little as eight lines of code split in two files (one header file and one implementation file), the fear quickly fades away and the feeling of being able to perform even the most complex task in a mobile app starts to take over. Suddenly, the whole power of iOS and Android can be at any React developer’s disposal. To better illustrate how to integrate a third-party SDK we will use one of the easiest to integrate payment providers: Paymill. If we take a look at their site, we notice that only iOS and Android SDKs are available for mobile payments. That should leave out every app written in React Native if it wasn’t for the ability of this framework to communicate with native modules. For the sake of convenience I will focus this article on the iOS module. 
Step 1: Create two native files for our bridge. We need to create an Objective-C class, which will serve as a bridge between our React code and Paymill’s native SDK. Normally, an Objective-C class is made up of two files, a .m and a .h, holding the module implementation and the header for this module respectively. To create the .h file we can right-click on our project’s main folder in Xcode > New File > Header file. In our case, I will call this file PaymillBridge.h. For React Native to communicate with our bridge, we need to make it implement the RCTBridgeModule protocol included in React Native. To do so, we only have to make sure our .h file looks like this: // PaymillBridge.h #import "RCTBridgeModule.h" @interface PaymillBridge : NSObject <RCTBridgeModule> @end We can follow a similar process to create the .m file: Right-click our project’s main folder in Xcode > New File > Objective-C file. The module implementation file should import our header and include the RCT_EXPORT_MODULE macro (also provided in any React Native project): // PaymillBridge.m #import "PaymillBridge.h" @implementation PaymillBridge RCT_EXPORT_MODULE(); @end A macro is a predefined piece of code that is expanded in place wherever it is invoked. This one makes sure React Native is aware of the module and makes it available for importing in your app. Now we need to expose the method we need in order to use Paymill’s services from our JavaScript code. For this example we will be using Paymill’s method to generate a token representing a credit card based on a public key and some credit card details: generateTokenWithPublicKey. To do so, we need to use another macro provided by React Native: RCT_EXPORT_METHOD. 
// PaymillBridge.m @implementation PaymillBridge RCT_EXPORT_MODULE(); RCT_EXPORT_METHOD(generateTokenWithPublicKey: (NSString *)publicKey cardDetails:(NSDictionary *)cardDetails callback:(RCTResponseSenderBlock)callback) { //… Implement the call as described in the SDK’s documentation … callback(@[[NSNull null], token]); } @end In this step we will have to write some Objective-C but most likely it would be a very simple piece of code using the examples stated in the SDK’s documentation. One interesting point is how to send data from the native SDK to our React code. To do so you need to pass a callback as you can see I did as the last parameter of our exported method. Callbacks in React Native’s bridges have to be defined as RCTResponseSenderBlock. Once we do this, we can call this callback passing an array of parameters, which will be sent as parameters for our JavaScript function in React Native (in our case we decided to pass two parameters back: an error set to null following the error handling conventions of node.js, and the token generated by Paymill natively). Step 2: Call our bridge from our React Native code. Once the module is properly set up, React Native makes it available in our app just by importing it from our JavaScript code: // PaymentComponent.js var Paymill = require('react-native').NativeModules.PaymillBridge; Paymill.generateTokenWithPublicKey( '56s4ad6a5s4sd5a6', cardDetails, function(error, token){ console.log(token); }); NativeModules holds the list of modules we created implementing the RCTBridgeModule. React Native makes them available by the name we chose for our Objective-C class name (PaymillBridge in our example). Then, we can call any exported native method as a normal JavaScript method from our React Native Component or library. Going Even Further That should do it for any basic SDK, but React Native gives developers a lot more control on how to communicate with native modules. 
For example, we may want to force the module to be run in the main thread. For that we just need to add an extra method to our native module implementation: // PaymillBridge.m @implementation PaymillBridge //... - (dispatch_queue_t)methodQueue { return dispatch_get_main_queue(); } @end Just by adding this method to our PaymillBridge.m, React Native will force all the functionality related to this module to be run on the main thread, which will be needed when calling main-thread-only iOS APIs. And there is more: promises, exporting constants, sending events to JavaScript, etc. More complex functionality can be found in the official documentation of React Native; the topics covered in this article, however, should solve 80 percent of the cases when integrating most third-party SDKs. About the Author Emilio Rodriguez started working as a software engineer for Sun Microsystems in 2006. Since then, he has focused his efforts on building a number of mobile apps with React Native while contributing to the React Native project. These contributions helped him understand how deep and powerful this framework is.

Aaron Lazar
03 Aug 2018
11 min read

Building microservices from a monolith Java EE app [Tutorial]

Microservices are one of the top buzzwords these days. It's easy to understand why: in a growing software industry where the amount of services, data, and users increases crazily, we really need a way to build and deliver faster, decoupled, and scalable solutions. In this tutorial, we'll help you get started with microservices or go deeper into your ongoing project. This article is an extract from the book Java EE 8 Cookbook, authored by Elder Moraes. One common question that I have heard dozens of times is, "how do I break down my monolith into microservices?", or, "how do I migrate from a monolith approach to microservices?" Well, that's what this recipe is all about. Getting ready with monolith and microservice projects For both monolith and microservice projects, we will use the same dependency: <dependency> <groupId>javax</groupId> <artifactId>javaee-api</artifactId> <version>8.0</version> <scope>provided</scope> </dependency> Working with entities and beans First, we need the entities that will represent the data kept by the application. 
Here is the User entity: @Entity public class User implements Serializable { private static final long serialVersionUID = 1L; @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; @Column private String name; @Column private String email; public User(){ } public User(String name, String email) { this.name = name; this.email = email; } public Long getId() { return id; } public void setId(Long id) { this.id = id; } public String getName() { return name; } public void setName(String name) { this.name = name; } public String getEmail() { return email; } public void setEmail(String email) { this.email = email; } } Here is the UserAddress entity: @Entity public class UserAddress implements Serializable { private static final long serialVersionUID = 1L; @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; @Column @ManyToOne private User user; @Column private String street; @Column private String number; @Column private String city; @Column private String zip; public UserAddress(){ } public UserAddress(User user, String street, String number, String city, String zip) { this.user = user; this.street = street; this.number = number; this.city = city; this.zip = zip; } public Long getId() { return id; } public void setId(Long id) { this.id = id; } public User getUser() { return user; } public void setUser(User user) { this.user = user; } public String getStreet() { return street; } public void setStreet(String street) { this.street = street; } public String getNumber() { return number; } public void setNumber(String number) { this.number = number; } public String getCity() { return city; } public void setCity(String city) { this.city = city; } public String getZip() { return zip; } public void setZip(String zip) { this.zip = zip; } } Now we define one bean to deal with the transaction over each entity. 
Here is the UserBean class: @Stateless public class UserBean { @PersistenceContext private EntityManager em; public void add(User user) { em.persist(user); } public void remove(User user) { em.remove(user); } public void update(User user) { em.merge(user); } public User findById(Long id) { return em.find(User.class, id); } public List<User> get() { CriteriaBuilder cb = em.getCriteriaBuilder(); CriteriaQuery<User> cq = cb.createQuery(User.class); Root<User> pet = cq.from(User.class); cq.select(pet); TypedQuery<User> q = em.createQuery(cq); return q.getResultList(); } } Here is the UserAddressBean class: @Stateless public class UserAddressBean { @PersistenceContext private EntityManager em; public void add(UserAddress address){ em.persist(address); } public void remove(UserAddress address){ em.remove(address); } public void update(UserAddress address){ em.merge(address); } public UserAddress findById(Long id){ return em.find(UserAddress.class, id); } public List<UserAddress> get() { CriteriaBuilder cb = em.getCriteriaBuilder(); CriteriaQuery<UserAddress> cq = cb.createQuery(UserAddress.class); Root<UserAddress> pet = cq.from(UserAddress.class); cq.select(pet); TypedQuery<UserAddress> q = em.createQuery(cq); return q.getResultList(); } } Finally, we build two services to perform the communication between the client and the beans. 
Here is the UserService class: @Path("userService") public class UserService { @EJB private UserBean userBean; @GET @Path("findById/{id}") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response findById(@PathParam("id") Long id){ return Response.ok(userBean.findById(id)).build(); } @GET @Path("get") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response get(){ return Response.ok(userBean.get()).build(); } @POST @Path("add") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response add(User user){ userBean.add(user); return Response.accepted().build(); } @DELETE @Path("remove/{id}") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response remove(@PathParam("id") Long id){ userBean.remove(userBean.findById(id)); return Response.accepted().build(); } } Here is the UserAddressService class: @Path("userAddressService") public class UserAddressService { @EJB private UserAddressBean userAddressBean; @GET @Path("findById/{id}") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response findById(@PathParam("id") Long id){ return Response.ok(userAddressBean.findById(id)).build(); } @GET @Path("get") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response get(){ return Response.ok(userAddressBean.get()).build(); } @POST @Path("add") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response add(UserAddress address){ userAddressBean.add(address); return Response.accepted().build(); } @DELETE @Path("remove/{id}") @Consumes(MediaType.APPLICATION_JSON) @Produces(MediaType.APPLICATION_JSON) public Response remove(@PathParam("id") Long id){ userAddressBean.remove(userAddressBean.findById(id)); return Response.accepted().build(); } } Now let's break it down! 
Building microservices from the monolith Our monolith deals with User and UserAddress. So we will break it down into three microservices: A user microservice A user address microservice A gateway microservice A gateway service is an API between the application client and the services. Using it allows you to simplify this communication, also giving you the freedom of doing whatever you like with your services without breaking the API contracts (or at least minimizing it). The user microservice The User entity, UserBean, and UserService will remain exactly as they are in the monolith. Only now they will be delivered as a separated unit of deployment. The user address microservice The UserAddress classes will suffer just a single change from the monolith version, but keep their original APIs (that is great from the point of view of the client). Here is the UserAddress entity: @Entity public class UserAddress implements Serializable { private static final long serialVersionUID = 1L; @Id @GeneratedValue(strategy = GenerationType.AUTO) private Long id; @Column private Long idUser; @Column private String street; @Column private String number; @Column private String city; @Column private String zip; public UserAddress(){ } public UserAddress(Long user, String street, String number, String city, String zip) { this.idUser = user; this.street = street; this.number = number; this.city = city; this.zip = zip; } public Long getId() { return id; } public void setId(Long id) { this.id = id; } public Long getIdUser() { return idUser; } public void setIdUser(Long user) { this.idUser = user; } public String getStreet() { return street; } public void setStreet(String street) { this.street = street; } public String getNumber() { return number; } public void setNumber(String number) { this.number = number; } public String getCity() { return city; } public void setCity(String city) { this.city = city; } public String getZip() { return zip; } public void setZip(String zip) { this.zip = 
zip; } } Note that User is no longer a property/field in the UserAddress entity, but only a number (idUser). We will get into more details about it in the following section. The gateway microservice First, we create a class that helps us deal with the responses: public class GatewayResponse { private String response; private String from; public String getResponse() { return response; } public void setResponse(String response) { this.response = response; } public String getFrom() { return from; } public void setFrom(String from) { this.from = from; } } Then, we create our gateway service: @Consumes(MediaType.APPLICATION_JSON) @Path("gatewayResource") @RequestScoped public class GatewayResource { private final String hostURI = "http://localhost:8080/"; private Client client; private WebTarget targetUser; private WebTarget targetAddress; @PostConstruct public void init() { client = ClientBuilder.newClient(); targetUser = client.target(hostURI + "ch08-micro_x_mono-micro-user/"); targetAddress = client.target(hostURI + "ch08-micro_x_mono-micro-address/"); } @PreDestroy public void destroy(){ client.close(); } @GET @Path("getUsers") @Produces(MediaType.APPLICATION_JSON) public Response getUsers() { WebTarget service = targetUser.path("webresources/userService/get"); Response response; try { response = service.request().get(); } catch (ProcessingException e) { return Response.status(408).build(); } GatewayResponse gatewayResponse = new GatewayResponse(); gatewayResponse.setResponse(response.readEntity(String.class)); gatewayResponse.setFrom(targetUser.getUri().toString()); return Response.ok(gatewayResponse).build(); } @POST @Path("addAddress") @Produces(MediaType.APPLICATION_JSON) public Response addAddress(UserAddress address) { WebTarget service = targetAddress.path("webresources/userAddressService/add"); Response response; try { response = service.request().post(Entity.json(address)); } catch (ProcessingException e) { return Response.status(408).build(); } return 
Response.fromResponse(response).build(); } }

As we receive the UserAddress entity in the gateway, we have to have a version of it in the gateway project too. For brevity, we will omit the code, as it is the same as in the UserAddress project.

Transformation to microservices

The monolith application couldn't be simpler: just a project with two services using two beans to manage two entities.

The microservices

So we split the monolith into three projects (microservices): the user service, the user address service, and the gateway service.

The user service classes remained unchanged after the migration from the monolith version. So there's nothing to comment on.

The UserAddress class had to be changed to become a microservice. The first change was made on the entity.

Here is the monolith version:

    @Entity
    public class UserAddress implements Serializable {
        ...
        @Column
        @ManyToOne
        private User user;
        ...
        public UserAddress(User user, String street, String number, String city, String zip) {
            this.user = user;
            this.street = street;
            this.number = number;
            this.city = city;
            this.zip = zip;
        }
        ...
        public User getUser() { return user; }
        public void setUser(User user) { this.user = user; }
        ...
    }

Here is the microservice version:

    @Entity
    public class UserAddress implements Serializable {
        ...
        @Column
        private Long idUser;
        ...
        public UserAddress(Long user, String street, String number, String city, String zip) {
            this.idUser = user;
            this.street = street;
            this.number = number;
            this.city = city;
            this.zip = zip;
        }
        public Long getIdUser() { return idUser; }
        public void setIdUser(Long user) { this.idUser = user; }
        ...
    }

Note that in the monolith version, user was an instance of the User entity:

    private User user;

In the microservice version, it became a number:

    private Long idUser;

This happened for two main reasons:

In the monolith, we have the two tables in the same database (User and UserAddress), and they both have physical and logical relationships (foreign key).
So it makes sense to also keep the relationship between the two objects.

The microservice should have its own database, completely independent from the other services. So we chose to keep only the user ID, as it is enough to load the address properly anytime the client needs it.

This change also resulted in a change in the constructor.

Here is the monolith version:

    public UserAddress(User user, String street, String number, String city, String zip)

Here is the microservice version:

    public UserAddress(Long user, String street, String number, String city, String zip)

The changed constructor signature could have forced a change of contract with the client. But thanks to the way the service was built, that wasn't necessary, because the service method keeps the same signature.

Here is the monolith version:

    public Response add(UserAddress address)

Here is the microservice version:

    public Response add(UserAddress address)

Even if the method had changed, it could easily be solved with the @Path annotation, or, if we really needed to change the client, it would be only the method name and not the parameters (which used to be more painful).

Finally, we have the gateway service, which is our implementation of the API gateway design pattern. Basically, it is the single point of access to the other services.

The nice thing about it is that your client doesn't need to care about whether the other services changed the URL, the signature, or even whether they are available. The gateway will take care of them.

The bad part is that it is also a single point of failure. In other words, without the gateway, all services are unreachable. But you can deal with that using a cluster, for example.

So now you've built microservices in Java EE code from what was once a monolith! If you found this tutorial helpful and would like to learn more, head over to the book Java EE 8 Cookbook, authored by Elder Moraes.
Bhavishya Pandit
26 Sep 2023
11 min read

ChatGPT for Time Series Analysis

Introduction

In the era of artificial intelligence, ChatGPT stands as a remarkable example of natural language understanding and generation. Developed by OpenAI, ChatGPT is an advanced language model designed to comprehend and generate human-like text, making it a versatile tool for a wide range of applications.

One of the critical domains where ChatGPT can make a significant impact is time series analysis. Time series data, consisting of sequential observations over time, is fundamental across industries such as finance, healthcare, and energy. It enables organizations to uncover trends, forecast future values, and detect anomalies, all of which are invaluable for data-driven decision-making. Whether it's predicting stock prices, monitoring patient health, or optimizing energy consumption, the ability to analyze time series data accurately is paramount.

The purpose of this article is to explore the synergy between ChatGPT and time series analysis. We will delve into how ChatGPT's natural language capabilities can be harnessed to streamline data preparation, improve forecasting accuracy, and enhance anomaly detection in time series data. Through practical examples and code demonstrations, we aim to illustrate how ChatGPT can be a powerful ally for data scientists and analysts in their quest for actionable insights from time series data.

1. Understanding Time Series Data

Time series data is a specialized type of data that records observations, measurements, or events at successive time intervals. Unlike cross-sectional data, which captures information at a single point in time, time series data captures data points in sequential order, often with a regular time interval between them. This temporal aspect makes time series data unique and valuable for various applications.
Characteristics of Time Series Data:

Temporal Order: Time series data is ordered chronologically, with each data point associated with a specific timestamp or time period.
Dependency: Data points in a time series are often dependent on previous observations, making them suitable for trend analysis and forecasting.
Seasonality: Many time series exhibit repetitive patterns or seasonality, which can be daily, weekly, monthly, or annual, depending on the domain.
Noise and Anomalies: Time series data may contain noise, irregularities, and occasional anomalies that need to be identified and addressed.

Real-World Applications of Time Series Analysis:

Time series analysis is a crucial tool in numerous domains, including:

Finance: Predicting stock prices, currency exchange rates, and market trends.
Healthcare: Monitoring patient vital signs, disease progression, and healthcare resource optimization.
Energy: Forecasting energy consumption, renewable energy generation, and grid management.
Climate Science: Analyzing temperature, precipitation, and climate patterns.
Manufacturing: Quality control, demand forecasting, and process optimization.
Economics: Studying economic indicators like GDP, inflation rates, and unemployment rates.

Emphasis on Powerful Tools and Techniques:

The complexity of time series data necessitates the use of powerful tools and techniques. Effective time series analysis often involves statistical methods, machine learning models, and data preprocessing steps to extract meaningful insights. In this article, we will explore how ChatGPT can complement these techniques to facilitate various aspects of time series analysis, from data preparation to forecasting and anomaly detection.

2. ChatGPT Overview

ChatGPT, developed by OpenAI, represents a groundbreaking advancement in natural language processing.
It builds upon the success of its predecessors, like GPT-3, with a focus on generating human-like text and facilitating interactive conversations.

Background: ChatGPT is powered by a deep neural network architecture called the Transformer, which excels at processing sequences of data, such as text. It has been pre-trained on a massive corpus of text from the internet, giving it a broad understanding of language and context.

Capabilities: ChatGPT possesses exceptional natural language understanding and generation abilities. It can comprehend and generate text in a wide range of languages and styles, making it a versatile tool for communication, content generation, and now, data analysis.

Aiding Data Scientists: For data scientists, ChatGPT offers invaluable assistance. Its ability to understand and generate text allows it to assist in data interpretation, data preprocessing, report generation, and even generating code snippets. In the context of time series analysis, ChatGPT can help streamline tasks, enhance communication, and contribute to more effective analysis by providing human-like interactions with data and insights. This article will explore how data scientists can harness ChatGPT's capabilities to their advantage in the realm of time series data.

3. Preparing Time Series Data

Data preprocessing is a critical step in time series analysis, as the quality of your input data greatly influences the accuracy of your results. Inaccurate or incomplete data can lead to flawed forecasts and unreliable insights. Therefore, it's essential to carefully clean and prepare time series data before analysis.

Importance of Data Preprocessing:

1. Missing Data Handling: Time series data often contains missing values, which need to be addressed. Missing data can disrupt calculations and lead to biased results.
2. Noise Reduction: Raw time series data can be noisy, making it challenging to discern underlying patterns.
Data preprocessing techniques can help reduce noise and enhance signal clarity.
3. Outlier Detection: Identifying and handling outliers is crucial, as they can significantly impact analysis and forecasting.
4. Normalization and Scaling: Scaling data to a consistent range is important, especially when using machine learning algorithms that are sensitive to the magnitude of input features.
5. Feature Engineering: Creating relevant features, such as lag values or rolling statistics, can provide additional information for analysis.

Code Examples for Data Preprocessing:

Here's an example of how to load, clean, and prepare time series data using Python libraries like Pandas and NumPy:

    import pandas as pd
    import numpy as np

    # Load time series data
    data = pd.read_csv("time_series_data.csv")

    # Clean and preprocess data
    data['Date'] = pd.to_datetime(data['Date'])
    data.set_index('Date', inplace=True)

    # Resample to a daily frequency, then forward-fill the missing days this exposes
    data_resampled = data.resample('D').mean()
    data_resampled = data_resampled.ffill()

    # Feature engineering (e.g., adding lag features)
    data_resampled['lag_1'] = data_resampled['Value'].shift(1)
    data_resampled['lag_7'] = data_resampled['Value'].shift(7)

    # Split data into training and testing sets (the last 30 days are held out)
    train_data = data_resampled['Value'][:-30]
    test_data = data_resampled['Value'][-30:]

4. ChatGPT for Time Series Forecasting

ChatGPT's natural language understanding and generation capabilities can be harnessed effectively for time series forecasting tasks. It can serve as a powerful tool to streamline forecasting processes, provide interactive insights, and facilitate communication within a data science team.

Assisting in Time Series Forecasting:

1. Generating Forecast Narratives: ChatGPT can generate descriptive narratives explaining forecast results in plain language. This helps in understanding and communicating forecasts to non-technical stakeholders.
2. Interactive Forecasting: Data scientists can interact with ChatGPT to explore different forecasting scenarios. By providing ChatGPT with context and queries, you can receive forecasts for various time horizons and conditions.
3. Forecast Sensitivity Analysis: You can use ChatGPT to explore the sensitivity of forecasts to different input parameters or assumptions. This interactive analysis can aid in robust decision-making.

Code Example for Using ChatGPT in Forecasting:

Below is a code example demonstrating how to use ChatGPT to generate forecasts based on prepared time series data. In this example, we use the OpenAI API to interact with ChatGPT for forecasting:

    import openai

    openai.api_key = "YOUR_API_KEY"

    def generate_forecast(query, historical_data):
        prompt = f"Forecast the next data point in the time series: '{historical_data}'. The trend appears to be {query}."
        # Note: openai.Completion is the legacy interface from pre-1.0 releases of the openai package
        response = openai.Completion.create(
            engine="text-davinci-002",
            prompt=prompt,
            max_tokens=20,  # Adjust for desired output length
            n=1,  # Number of responses to generate
            stop=None,  # Stop criteria
        )
        forecast = response.choices[0].text.strip()
        return forecast

    # Example usage
    query = "increasing"
    # Pass the plain values; formatting a pandas Series directly would embed its abbreviated repr in the prompt
    forecast = generate_forecast(query, train_data.tolist())
    print(f"Next data point in the time series: {forecast}")

5. ChatGPT for Anomaly Detection

ChatGPT can play a valuable role in identifying anomalies in time series data by leveraging its natural language understanding capabilities. Anomalies, which represent unexpected and potentially important events or errors, are crucial to detect in various domains, including finance, healthcare, and manufacturing.
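To make the notion of an anomaly concrete, it can help to have a conventional statistical baseline to compare any language-model output against. The following is a minimal sketch, not part of the original workflow: the function name `zscore_anomalies` and the threshold of 2.0 are assumptions chosen for illustration, and it uses only the Python standard library.

```python
from statistics import mean, pstdev

def zscore_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds the threshold.

    The threshold of 2.0 is an illustrative assumption, not a recommendation.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # a constant series has no point that stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical daily readings with one obvious spike at position 6
series = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
print(zscore_anomalies(series))
```

A baseline like this gives a reference list of flagged points, so that a narrative description produced by a language model can be checked against something reproducible.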
ChatGPT can assist in this process in the following ways:

Contextual Anomaly Descriptions: ChatGPT can provide human-like descriptions of anomalies, making it easier for data scientists and analysts to understand the nature and potential impact of detected anomalies.
Interactive Anomaly Detection: Data scientists can interact with ChatGPT to explore potential anomalies and receive explanations for detected outliers. This interactive approach can aid in identifying false positives and false negatives, enhancing the accuracy of anomaly detection.

Code Example for Using ChatGPT in Anomaly Detection:

Below is a code example demonstrating how to use ChatGPT to detect anomalies based on prepared time series data:

    import openai

    openai.api_key = "YOUR_API_KEY"

    def detect_anomalies(query, historical_data):
        prompt = f"Determine if there are any anomalies in the time series: '{historical_data}'. The trend appears to be {query}."
        # Note: openai.Completion is the legacy interface from pre-1.0 releases of the openai package
        response = openai.Completion.create(
            engine="text-davinci-002",
            prompt=prompt,
            max_tokens=20,  # Adjust for desired output length
            n=1,  # Number of responses to generate
            stop=None,  # Stop criteria
        )
        anomaly_detection_result = response.choices[0].text.strip()
        return anomaly_detection_result

    # Example usage
    query = "increasing with a sudden jump"
    # Pass the plain values; formatting a pandas Series directly would embed its abbreviated repr in the prompt
    anomaly_detection_result = detect_anomalies(query, train_data.tolist())
    print(f"Anomaly detection result: {anomaly_detection_result}")

6. Limitations and Considerations

While ChatGPT offers significant advantages in time series analysis, it is essential to be aware of its limitations and consider certain precautions for its effective utilization:

1. Lack of Domain-Specific Knowledge: ChatGPT lacks domain-specific knowledge. It may generate plausible-sounding but incorrect insights, especially in specialized fields. Data scientists should always validate its responses with domain expertise.
2. Sensitivity to Input Wording: ChatGPT's responses can vary based on the phrasing of input queries. Data scientists must carefully frame questions to obtain accurate and consistent results.
3. Biases in Training Data: ChatGPT can inadvertently perpetuate biases present in its training data. When interpreting its outputs, users should remain vigilant about potential biases and errors.
4. Limited Understanding of Context: ChatGPT's understanding of context has limitations. It may not remember information provided earlier in a conversation, which can lead to incomplete or contradictory responses.
5. Uncertainty Handling: ChatGPT does not provide uncertainty estimates for its responses. Data scientists should use it as an assistant and rely on robust statistical techniques for decision-making.

Best Practices

Domain Expertise: Combine ChatGPT's insights with domain expertise to ensure the accuracy and relevance of its recommendations.
Consistency Checks: Ask ChatGPT multiple variations of the same question to assess the consistency of its responses.
Fact-Checking: Verify critical information and predictions generated by ChatGPT with reliable external sources.
Iterative Usage: Incorporate ChatGPT iteratively into your workflow, using it to generate ideas and hypotheses that can be tested and refined with traditional time series analysis methods.
Bias Mitigation: Implement bias mitigation techniques when using ChatGPT in sensitive applications to reduce the risk of biased responses.

Understanding the strengths and weaknesses of ChatGPT and taking appropriate precautions will help data scientists harness its capabilities effectively while mitigating potential errors and biases in time series analysis tasks.

Conclusion

In summary, ChatGPT offers a transformative approach to time series analysis. It bridges the gap between natural language understanding and data analytics, providing data scientists with interactive insights, forecasting assistance, and anomaly detection capabilities.
Its potential to generate human-readable narratives, explain anomalies, and explore diverse scenarios makes it a valuable tool in various domains. However, users must remain cautious of its limitations, verify critical information, and employ it as a supportive resource alongside established analytical methods. As technology evolves, ChatGPT continues to demonstrate its promise as a versatile and collaborative companion in the pursuit of actionable insights from time series data.

Author Bio

Bhavishya Pandit is a Data Scientist at Rakuten. He has been extensively exploring GPT to find use cases and build products that solve real-world problems.
Prasad Ramesh
12 Sep 2018
9 min read

How to predict viral content using random forest regression in Python [Tutorial]

Understanding sharing behavior is big business. As consumers become blind to traditional advertising, the push is to go beyond simple pitches and tell engaging stories. In this article, we will build a predictive content scoring model that uses random forest regression to predict whether a piece of content will go viral.

This article is an excerpt from a book written by Alexander T. Combs titled Python Machine Learning Blueprints: Intuitive data projects you can relate to. You can download the code and other relevant files used in this article from this GitHub link.

What does research tell us about content virality?

Increasingly, the success of these endeavors is measured in social shares. Why go to so much trouble? Because as a brand, every share that I receive represents another consumer that I've reached, all without spending an additional cent. Due to this value, several researchers have examined sharing behavior in the hopes of understanding what motivates it. Among the reasons researchers have found:

To provide practical value to others (an altruistic motive)
To associate ourselves with certain ideas and concepts (an identity motive)
To bond with others around a common emotion (a communal motive)

With regard to the last motive, one particularly well-designed study looked at 7,000 pieces of content from the New York Times to examine the effect of emotion on sharing. They found that simple emotional sentiment was not enough to explain sharing behavior, but when combined with emotional arousal, the explanatory power was greater. For example, while sadness has a strong negative valence, it is considered to be a low arousal state. Anger, on the other hand, has a negative valence paired with a high arousal state. As such, stories that sadden the reader tend to generate far fewer shares than anger-inducing stories.

Source: “What Makes Online Content Viral?” by Jonah Berger and Katherine L.
Milkman

Building a predictive content scoring model

Let's create a model that can estimate the share counts for a given piece of content. Ideally, we would have a much larger sample of content, especially content that had more typical share counts. However, we'll make do with what we have here.

We're going to use an algorithm called random forest regression and attempt to predict the share counts directly. We could bucket our share counts into ranges and treat this as classification, but it is preferable to use regression when dealing with continuous variables.

To begin, we'll create a bare-bones model. We'll use the number of images, the site, and the word count as features, and we'll train our model on the number of Facebook likes.

We'll first import the scikit-learn library, then we'll prepare our data by removing the rows with nulls, resetting our index, and finally splitting the frame into our training and testing sets:

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    # dfc is the DataFrame of articles, assumed to have been prepared beforehand
    all_data = dfc.dropna(subset=['img_count', 'word_count'])
    all_data.reset_index(inplace=True, drop=True)

    train_index = []
    test_index = []
    for i in all_data.index:
        # Draw 0 (train) with probability 0.65 or 1 (test) with probability 0.35
        result = np.random.choice(2, p=[.65, .35])
        if result == 1:
            test_index.append(i)
        else:
            train_index.append(i)

We used a random number generator with probabilities of approximately 2/3 and 1/3 to determine which rows (based on their index) would be placed in each set. Setting the probabilities this way means that, on average, we get approximately twice as many rows in our training set as in the test set. We can check this as follows:

    print('test length:', len(test_index), '\ntrain length:', len(train_index))

(Output not shown.)

Now, we'll continue preparing our data. Next, we need to set up categorical encoding for our sites. Currently, our DataFrame object has the name of each site represented as a string. We need to use dummy encoding. This creates a column for each site.
If the row is for that particular site, then that column will be filled in with 1; all the other site columns will be filled in with 0. Let's do that now:

    sites = pd.get_dummies(all_data['site'])
    sites

(Output not shown.)

We'll now continue by splitting our data into training and test sets as follows:

    y_train = all_data.iloc[train_index]['fb'].astype(int)
    X_train_nosite = all_data.iloc[train_index][['img_count', 'word_count']]
    X_train = pd.merge(X_train_nosite, sites.iloc[train_index], left_index=True, right_index=True)

    y_test = all_data.iloc[test_index]['fb'].astype(int)
    X_test_nosite = all_data.iloc[test_index][['img_count', 'word_count']]
    X_test = pd.merge(X_test_nosite, sites.iloc[test_index], left_index=True, right_index=True)

With this, we've set up our X_test, X_train, y_test, and y_train variables. We'll use them now to build our model:

    clf = RandomForestRegressor(n_estimators=1000)
    clf.fit(X_train, y_train)

With these two lines of code, we have trained our model. Let's now use it to predict the Facebook likes for our testing set:

    y_pred = clf.predict(X_test)
    y_actual = y_test
    deltas = pd.DataFrame(list(zip(y_pred, y_actual, (y_pred - y_actual)/(y_actual))), columns=['predicted', 'actual', 'delta'])
    deltas

(Output not shown.)

Here we see the predicted value, the actual value, and the difference as a proportion of the actual value. Let's take a look at the descriptive stats for this:

    deltas['delta'].describe()

(Output not shown.)

Our median error is 0! Unfortunately, this isn't a particularly useful bit of information, as errors fall on both sides (positive and negative) and tend to cancel out, which is what we see here. Let's now look at a more informative metric to evaluate our model. We're going to look at root mean square error as a percentage of the actual mean.
To first illustrate why this is more useful, let's run the following scenario on two sample series:

a = pd.Series([10,10,10,10])
b = pd.Series([12,8,8,12])
np.sqrt(np.mean((b-a)**2))/np.mean(a)

This results in the following output:

Now compare this to the mean:

(b-a).mean()

This results in the following output:

Clearly the former is the more meaningful statistic. Let's now run this for our model:

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

The preceding code will generate the following output:

Let's now add another set of features, counts of the words in each title, and see if it helps our model. We'll use a count vectorizer to do this. Much like what we did with the site names, we'll transform individual words and n-grams into features:

from sklearn.feature_extraction.text import CountVectorizer

vect = CountVectorizer(ngram_range=(1,3))
X_titles_all = vect.fit_transform(all_data['title'])

X_titles_train = X_titles_all[train_index]
X_titles_test = X_titles_all[test_index]

X_test = pd.merge(X_test, pd.DataFrame(X_titles_test.toarray(), index=X_test.index), left_index=True, right_index=True)
X_train = pd.merge(X_train, pd.DataFrame(X_titles_train.toarray(), index=X_train.index), left_index=True, right_index=True)

In these lines, we joined our existing features to our new n-gram features. Let's now train our model and see if we have any improvement:

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
deltas = pd.DataFrame(list(zip(y_pred, y_actual, (y_pred - y_actual)/(y_actual))), columns=['predicted', 'actual', 'delta'])
deltas

The preceding code will generate the following output:

Checking our errors again, we see the following:

np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

This code results in the following output:

So, it appears that we have a modestly improved model.
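The error metric used in this section, root mean square error as a fraction of the actual mean, can be wrapped in a small helper so that every model variant is scored the same way. The following is a minimal sketch in plain Python; the function names normalized_rmse and mean_signed_error are illustrative and do not come from the article.

```python
import math

def normalized_rmse(predicted, actual):
    """Root mean square error divided by the mean of the actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))

def mean_signed_error(predicted, actual):
    """Plain average of the signed errors; positive and negative errors cancel."""
    return sum(p - a for p, a in zip(predicted, actual)) / len(actual)

# The two sample series from the text: every prediction is off by 2.
actual = [10, 10, 10, 10]
predicted = [12, 8, 8, 12]
print(normalized_rmse(predicted, actual))    # 0.2
print(mean_signed_error(predicted, actual))  # 0.0
```

The signed mean reports no error at all, while the normalized RMSE reports an error of 20% of the mean, which is why the latter is the more informative number here.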
Now, let's add another feature, the word count of the title, as follows:

all_data = all_data.assign(title_wc = all_data['title'].map(lambda x: len(x.split(' '))))
X_train = pd.merge(X_train, all_data[['title_wc']], left_index=True, right_index=True)
X_test = pd.merge(X_test, all_data[['title_wc']], left_index=True, right_index=True)

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
np.sqrt(np.mean((y_pred-y_actual)**2))/np.mean(y_actual)

The preceding code will generate the following output:

It appears that each feature has modestly improved our model. There are certainly more features that we could add. For example, we could add the day of the week and the hour of the posting, we could determine if the article is a listicle by running a regex on the headline, or we could examine the sentiment of each article. This only begins to touch on the features that could be important to model virality. We would certainly need to go much further to continue reducing the error in our model.

We have performed only the most cursory testing of our model. Each measurement should be run multiple times to get a more accurate representation of the true error rate. It is possible that there is no statistically discernible difference between our last two models, as we only performed one test.

To summarize, we learned how we can build a model to predict content virality using a random forest regression. To learn more about prediction and other machine learning projects in Python, check out Python Machine Learning Blueprints: Intuitive data projects you can relate to.
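The point about running each measurement multiple times can be sketched as a loop over several random train/test splits, reporting the mean and spread of the error instead of a single number. This is only an illustration under stated assumptions: repeated_holdout, split_indices, and the toy data y are hypothetical, and the "model" is a stand-in that predicts the training mean rather than the article's random forest.

```python
import math
import random
import statistics

def split_indices(n_rows, test_fraction, rng):
    """Assign each row index to train or test at random, as in the article."""
    train, test = [], []
    for i in range(n_rows):
        (test if rng.random() < test_fraction else train).append(i)
    return train, test

def repeated_holdout(score_fn, n_rows, n_runs=20, test_fraction=0.35, seed=0):
    """Run score_fn on n_runs random splits; return (mean, stdev) of the scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        train, test = split_indices(n_rows, test_fraction, rng)
        if not train or not test:
            continue  # skip degenerate splits with an empty side
        scores.append(score_fn(train, test))
    if len(scores) < 2:
        raise ValueError("need at least two usable splits")
    return statistics.mean(scores), statistics.stdev(scores)

# Toy stand-in for "fit a model and compute its error": predict the training mean.
y = [3, 8, 1, 15, 7, 2, 30, 4, 9, 6, 12, 5]

def mean_baseline_error(train, test):
    guess = sum(y[i] for i in train) / len(train)
    mse = sum((guess - y[i]) ** 2 for i in test) / len(test)
    return math.sqrt(mse) / (sum(y[i] for i in test) / len(test))

avg, spread = repeated_holdout(mean_baseline_error, len(y))
print(f"error over repeated splits: {avg:.2f} +/- {spread:.2f}")
```

If the difference between two feature sets is smaller than the spread across splits, a single run cannot tell them apart.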
Guest Contributor
15 Nov 2017
7 min read

Using R to implement Kriging - A Spatial Interpolation technique for Geostatistics data

The Kriging interpolation technique is being increasingly used in geostatistics these days. But how does Kriging work to create a prediction, after all?

To start with, Kriging is a method where the distance and direction between the sample data points indicate a spatial correlation. This correlation is then used to explain the different variations in the surface. In cases where the distance and direction give appropriate spatial correlation, Kriging will be able to predict surface variations in the most effective way. As such, we often see Kriging being used in Geology and Soil Sciences.

Kriging generates an optimal output surface for prediction, which it estimates from a scattered set of points with z-values. The procedure involves investigating the spatial behavior of the z-values in an 'interactive' manner, in which statistical relationships among the measured points (autocorrelation) are quantified.

Mathematically speaking, Kriging is somewhat similar to regression analysis: its whole idea is to predict the unknown value of a function at a given point by calculating the weighted average of the known values in the neighborhood. To get the output value for a location, we take the weighted sum of the already measured values in the surroundings (all the points that we intend to consider within a specific radius), using a formula such as the following:

Ẑ(s0) = Σ λi Z(si), with the sum running over i = 1, ..., N

Here Z(si) is the measured value at the i-th location, λi is the weight given to that measurement, s0 is the prediction location, and N is the number of measured values.

In a purely distance-based method, λi would depend only on how far the measured points are from the prediction location. In Kriging, however, λi reflects not just how far the measured points are from the prediction location, but also how the measured points are arranged spatially around it. First, the variograms and covariance functions are generated to describe the spatial autocorrelation of the data. Then, that fitted structure is used to make predictions.
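The variogram step mentioned above can be made concrete with the classical empirical semivariance estimator: for all pairs of points whose separation falls in a distance bin, average the squared differences of their z-values and divide by two. The sketch below is plain Python for illustration only; empirical_semivariogram and bin_width are hypothetical names, it is not how geoR's variog is implemented, and the sample points are the thirteen demonstration points used later in this tutorial.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, bin_width):
    """points: list of (x, y, z). Returns {bin_center: semivariance}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            dist = math.hypot(xi - xj, yi - yj)
            b = int(dist // bin_width)       # index of the distance bin
            sums[b] += (zi - zj) ** 2
            counts[b] += 1
    # gamma(h) = sum of squared differences / (2 * number of pairs in the bin)
    return {(b + 0.5) * bin_width: sums[b] / (2 * counts[b]) for b in sorted(counts)}

x = [1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 6, 6]
y = [4, 7, 3, 6, 2, 4, 6, 2, 6, 5, 1, 5, 7]
z = [5, 9, 2, 6, 3, 5, 9, 4, 8, 8, 3, 6, 7]
for distance, gamma in empirical_semivariogram(list(zip(x, y, z)), 1.0).items():
    print(f"h ~ {distance:.1f}: semivariance {gamma:.2f}")
```

A variogram model is then fitted to these binned values, and its parameters determine the Kriging weights.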
Unlike deterministic interpolation techniques such as Inverse Distance Weighted (IDW) and Spline interpolation, Kriging goes beyond just estimating a prediction surface: it also attaches a measure of certainty to that prediction surface. That is why experts rate Kriging so highly for prediction. Instead of a weather report that only forecasts 2 mm of rain on a certain Saturday, Kriging also tells you the "probability" of 2 mm of rain on that Saturday.

We hope you enjoy this simple R tutorial on Kriging by Berry Boessenkool.

Geostatistics: Kriging - spatial interpolation between points, using semivariance

We will be covering the following sections in our tutorial, with supporting illustrations:

Packages
Read shapefile
Variogram
Kriging
Plotting

Kriging: packages

install.packages("rgeos")
install.packages("sf")
install.packages("geoR")
library(sf)   # for st_read (read shapefiles),
              # st_centroid, st_area, st_union
library(geoR) # as.geodata, variog, variofit,
              # krige.control, krige.conv, legend.krige
## Warning: package 'sf' was built under R version 3.4.1

Kriging: read shapefile / few points for demonstration

x <- c(1,1,2,2,3,3,3,4,4,5,6,6,6)
y <- c(4,7,3,6,2,4,6,2,6,5,1,5,7)
z <- c(5,9,2,6,3,5,9,4,8,8,3,6,7)
plot(x,y, pch="+", cex=z/4)

Kriging: read shapefile II

GEODATA <- as.geodata(cbind(x,y,z))
plot(GEODATA)

Kriging: Variogram I

EMP_VARIOGRAM <- variog(GEODATA)
## variog: computing omnidirectional variogram
FIT_VARIOGRAM <- variofit(EMP_VARIOGRAM)
## variofit: covariance model used is matern
## variofit: weights used: npairs
## variofit: minimisation function used: optim
## Warning in variofit(EMP_VARIOGRAM): initial values not provided - running the default search
## variofit: searching for best initial value ... selected values:
##               sigmasq phi    tausq kappa
## initial.value "9.19"  "3.65" "0"   "0.5"
## status        "est"   "est"  "est" "fix"
## loss value: 401.578968904954

Kriging: Variogram II

plot(EMP_VARIOGRAM)
lines(FIT_VARIOGRAM)

Kriging: Kriging

res <- 0.1
grid <- expand.grid(seq(min(x),max(x),res), seq(min(y),max(y),res))
krico <- krige.control(type.krige="OK", obj.model=FIT_VARIOGRAM)
krobj <- krige.conv(GEODATA, locations=grid, krige=krico)
## krige.conv: model with constant mean
## krige.conv: Kriging performed using global neighbourhood
# KRigingObjekt

Kriging: Plotting I

library("berryFunctions") # scatterpoints by color; loaded here because colPoints is first used below
image(krobj, col=rainbow2(100))
legend.krige(col=rainbow2(100), x.leg=c(6.2,6.7), y.leg=c(2,6), vert=T, off=-0.5, values=krobj$predict)
contour(krobj, add=T)
colPoints(x,y,z, col=rainbow2(100), legend=F)
points(x,y)

Kriging: Plotting II

colPoints(x,y,z, add=F, cex=2, legargs=list(y1=0.8,y2=1))

Kriging: Plotting III

colPoints(grid[ ,1], grid[ ,2], krobj$predict, add=F, cex=2, col2=NA, legargs=list(y1=0.8,y2=1))

Time for a real dataset

Precipitation from ca 250 gauges in Brandenburg as Thiessen Polygons with steep gradients at edges:

Exercise 41: Kriging

1. Load and plot the shapefile in PrecBrandenburg.zip with sf::st_read.
2. With colPoints in the package berryFunctions, add the precipitation values at the centroids of the polygons.
3. Calculate the variogram and fit a semivariance curve.
4. Perform kriging on a grid with a useful resolution (keep in mind that computing time rises exponentially with grid size).
5. Plot the interpolated values with image or an equivalent (Rclick 4.15) and add contour lines.

What went wrong? (If you used the defaults, the result will be dissatisfying.) How can you fix it?
Solution for exercise 41.1-2: Kriging Data

# Shapefile:
p <- sf::st_read("data/PrecBrandenburg/niederschlag.shp", quiet=TRUE)
# Plot prep
pcol <- colorRampPalette(c("red","yellow","blue"))(50)
clss <- berryFunctions::classify(p$P1, breaks=50)$index
# Plot
par(mar = c(0,0,1.2,0))
plot(p, col=pcol[clss], max.plot=1) # P1: Precipitation
# kriging coordinates
cent <- sf::st_centroid(p)
berryFunctions::colPoints(cent$x, cent$y, p$P1, add=T, cex=0.7, legargs=list(y1=0.8,y2=1), col=pcol)
points(cent$x, cent$y, cex=0.7)

Solution for exercise 41.3: Variogram

library(geoR)
# Semivariance:
geoprec <- as.geodata(cbind(cent$x,cent$y,p$P1))
vario <- variog(geoprec, max.dist=130000)
## variog: computing omnidirectional variogram
fit <- variofit(vario)
## Warning in variofit(vario): initial values not provided - running the default search
## variofit: searching for best initial value ... selected values:
##               sigmasq   phi        tausq kappa
## initial.value "1326.72" "19999.93" "0"   "0.5"
## status        "est"     "est"      "est" "fix"
## loss value: 107266266.76371
plot(vario) ; lines(fit)
# distance to closest other point:
d <- sapply(1:nrow(cent), function(i) min(berryFunctions::distance(
     cent$x[i], cent$y[i], cent$x[-i], cent$y[-i])))
hist(d/1000, breaks=20, main="distance to closest gauge [km]")
mean(d) # 8 km
## [1] 8165.633

Solution for exercise 41.4-5: Kriging

# Kriging:
res <- 1000 # 1 km, since stations are 8 km apart on average
grid <- expand.grid(seq(min(cent$x),max(cent$x),res), seq(min(cent$y),max(cent$y),res))
krico <- krige.control(type.krige="OK", obj.model=fit)
krobj <- krige.conv(geoprec, locations=grid, krige=krico)
## krige.conv: model with constant mean
## krige.conv: Kriging performed using global neighbourhood
# Set values outside of Brandenburg to NA:
grid_sf <- sf::st_as_sf(grid, coords=1:2, crs=sf::st_crs(p))
isinp <- sapply(sf::st_within(grid_sf, p), length) > 0
krobj2 <- krobj
krobj2$predict[!isinp] <- NA

Solution for exercise 41.5: Kriging Visualization
geoR:::image.kriging(krobj2, col=pcol)
colPoints(cent$x, cent$y, p$P1, col=pcol, zlab="Prec", cex=0.7, legargs=list(y1=0.1,y2=0.8, x1=0.78, x2=0.87, horiz=F))
plot(p, add=T, col=NA, border=8)#; points(cent$x,cent$y, cex=0.7)

About the author

Berry started working with R in 2010 during his studies of Geoecology at Potsdam University, Germany. He has since given a number of R programming workshops and tutorials, including full-week workshops in Kyrgyzstan and Kazakhstan. He left the department for environmental science in summer 2017 to focus more on software development and teaching in the data science industry. Please follow the Github link for detailed explanations on Berry's R courses.
Ellison Leao
09 Jan 2015
5 min read

Creating a simple GameManager using Unity3D

Using so-called "Game Managers" is about as common as it gets when making games. Almost every game has a natural flow: Start -> Play -> Pause -> Die -> Game Over, etc. To handle these different game states, we need a proper manager that provides a mechanism to know when to change from state "A" to state "B" during gameplay. In this post we will show you how to create a simple game manager for Unity3D games. We will assume that you have some previous knowledge of Unity, but if you haven't had the chance to get to know it, please go to the official Learn Unity page and get started. We are going to create the scripts using the C# language.

1 - The Singleton Pattern

For the implementation, we will use the Singleton pattern. Why? Some reasons:

One instance for the whole game, with no possible duplications.
The instance is never destroyed on scene changes.
It stores the current game state so that it is accessible anytime.

We will not explain the design of the Singleton pattern because it's not the purpose of this post. If you wish to know more about it, you can go here.
2 - The GameManager code

Create a new project in Unity, add a first C# script called SimpleGameManager.cs, and add the following code:

using UnityEngine;
using System.Collections;

// Game States
// INTRO and MAIN_MENU are used by the Intro scene, GAME by the menu's Start button
public enum GameState { INTRO, MAIN_MENU, GAME }

public delegate void OnStateChangeHandler();

public class SimpleGameManager {
    protected SimpleGameManager() {}
    private static SimpleGameManager instance = null;
    public event OnStateChangeHandler OnStateChange;
    public GameState gameState { get; private set; }

    public static SimpleGameManager Instance {
        get {
            if (SimpleGameManager.instance == null) {
                // A plain C# object held in a static field is not tied to a scene,
                // so no DontDestroyOnLoad call is needed (or possible) here.
                SimpleGameManager.instance = new SimpleGameManager();
            }
            return SimpleGameManager.instance;
        }
    }

    public void SetGameState(GameState state) {
        this.gameState = state;
        // Only invoke the callback if somebody has subscribed.
        if (OnStateChange != null) {
            OnStateChange();
        }
    }

    public void OnApplicationQuit() {
        SimpleGameManager.instance = null;
    }
}

Explaining the code in parts, we have:

First we define an enum so we can easily check the Game State. For this example we will have:

public enum GameState { INTRO, MAIN_MENU, GAME }

Then we have an event delegate that we will use as a callback when the game state changes. This is ideal for changing scenes.

public delegate void OnStateChangeHandler();

Moving forward, we have the gameState property, which is a getter for the current Game State.

public GameState gameState { get; private set; }

Then we have our class. Taking a look at the singleton implementation, we can see that we use the Instance static property to get our current Game Manager instance, or to create a new one if it doesn't exist. Because SimpleGameManager is a plain C# class rather than a MonoBehaviour attached to a scene object, the instance held in the static field is not tied to any scene and is kept between scenes. DontDestroyOnLoad only applies to UnityEngine.Object instances such as GameObjects, so it is not called here; calling it before the instance exists, as an earlier version of this listing did, would pass it null. The method used to change the Game State is SetGameState, to which we only need to pass the GameState enum value as the parameter.
public void SetGameState(GameState state) {
    this.gameState = state;
    if (OnStateChange != null) {
        OnStateChange();
    }
}

It sets the new gameState for the instance and calls the OnStateChange callback, provided at least one handler is subscribed.

3 - Creating Sample Scenes

For testing our new Game Manager, we will create 2 Unity scenes: Intro and Menu. The Intro scene will just show some debug messages, simulating an Intro game scene, and after 3 seconds it will change to the Menu scene, where we have the Game Menu code. Create a new scene called Intro and create a C# script called Intro.cs. Put the following code into the script:

using UnityEngine;
using System.Collections;

public class Intro : MonoBehaviour {
    SimpleGameManager GM;

    void Awake () {
        GM = SimpleGameManager.Instance;
        GM.OnStateChange += HandleOnStateChange;
        Debug.Log("Current game state when Awakes: " + GM.gameState);
    }

    void Start () {
        Debug.Log("Current game state when Starts: " + GM.gameState);
        // Trigger the state change here, not inside the handler:
        // calling SetGameState from the handler would fire the event again and recurse forever.
        GM.SetGameState(GameState.MAIN_MENU);
    }

    void OnDestroy () {
        // Unsubscribe so the manager does not call into a destroyed object after the scene change.
        GM.OnStateChange -= HandleOnStateChange;
    }

    public void HandleOnStateChange () {
        Debug.Log("Handling state change to: " + GM.gameState);
        Invoke("LoadLevel", 3f);
    }

    public void LoadLevel () {
        Application.LoadLevel("Menu");
    }
}

You can see here that we just need to get the Game Manager instance inside the Awake method. The same initialization will happen in the other scripts, to get the current Game Manager state. After getting the Game Manager instance we subscribe to the OnStateChange event; the handler loads the Menu scene after 3 seconds. The state change itself is triggered in Start by calling the SetGameState method. If you run this scene now, however, you will get an error because we don't have the Menu scene yet. So let's create it! Create a new scene called Menu and add a C# script called Menu.cs to this scene.
Add the following code to Menu.cs:

using UnityEngine;
using System.Collections;

public class Menu : MonoBehaviour {
    SimpleGameManager GM;

    void Awake () {
        GM = SimpleGameManager.Instance;
        GM.OnStateChange += HandleOnStateChange;
    }

    public void HandleOnStateChange () {
        Debug.Log("OnStateChange!");
    }

    public void OnGUI () {
        //menu layout
        GUI.BeginGroup (new Rect (Screen.width / 2 - 50, Screen.height / 2 - 50, 100, 800));
        GUI.Box (new Rect (0, 0, 100, 200), "Menu");
        if (GUI.Button (new Rect (10, 40, 80, 30), "Start")) {
            StartGame();
        }
        if (GUI.Button (new Rect (10, 160, 80, 30), "Quit")) {
            Quit();
        }
        GUI.EndGroup();
    }

    public void StartGame () {
        //start game scene
        GM.SetGameState(GameState.GAME);
        Debug.Log(GM.gameState);
    }

    public void Quit () {
        Debug.Log("Quit!");
        Application.Quit();
    }
}

We added simple Unity GUI elements to this scene just as an example. Note that StartGame uses GameState.GAME, so that value must be present in the GameState enum. Run the Intro scene and check the Debug logs. You should see the messages when the Game State changes from the old state to the new state, with the same instance kept between scenes. And there you have it! You can add more GameStates for multiple screens like Credits, High Score, Levels, etc. The code for these examples is on GitHub; feel free to fork it and use it in your games! https://github.com/bttfgames/SimpleGameManager

About this Author

Ellison Leão (@ellisonleao) is a passionate software engineer with more than 6 years of experience in web projects and a contributor to the MelonJS framework and other open source projects. When he is not writing games, he loves to play drums.
Packt
22 Jun 2017
20 min read

Tangled Web? Not At All!

In this article by Clif Flynt, the author of the book Linux Shell Scripting Cookbook - Third Edition, we present a collection of shell-scripting recipes that talk to services on the Internet. This article is intended to help readers understand how to interact with the Web using shell scripts to automate tasks such as collecting and parsing data from web pages. The recipes cover sending POST and GET requests to web pages and writing clients for web services.

In this article, we will cover the following recipes:

Downloading a web page as plain text
Parsing data from a website
Image crawler and downloader
Web photo album generator
Twitter command-line client
Tracking changes to a website
Posting to a web page and reading response
Downloading a video from the Internet

The Web has become the face of technology and the central access point for data processing. The primary interface to the web is via a browser that's designed for interactive use. That's great for searching and reading articles on the web, but you can also do a lot to automate your interactions with shell scripts. For instance, instead of checking a website daily to see if your favorite blogger has added a new post, you can automate the check and be informed when there's new information. Similarly, Twitter is the current hot technology for getting up-to-the-minute information. But if I subscribe to my local newspaper's Twitter account because I want the local news, Twitter will send me all news, including high-school sports that I don't care about. With a shell script, I can grab the tweets and customize my filters to match my desires, rather than rely on their filters.

Downloading a web page as plain text

Web pages are simply text with HTML tags, JavaScript and CSS. The HTML tags define the content of a web page, which we can parse for specific content. Bash scripts can parse web pages. An HTML file can be viewed in a web browser to see it properly formatted.
Parsing a text document is simpler than parsing HTML data because we aren't required to strip off the HTML tags. Lynx is a command-line web browser which downloads a web page as plain text.

Getting Ready

Lynx is not installed in all distributions, but is available via the package manager.

# yum install lynx

or

# apt-get install lynx

How to do it...

Let's save the web page view, in ASCII character representation, to a text file by using the -dump flag with the lynx command:

$ lynx URL -dump > webpage_as_text.txt

This command will list all the hyperlinks <a href="link"> separately under a heading References, as the footer of the text output. This lets us parse links separately with regular expressions. For example:

$ lynx -dump http://google.com > plain_text_page.txt

You can see the plain text version of the page by using the cat command:

$ cat plain_text_page.txt
Search [1]Images [2]Maps [3]Play [4]YouTube [5]News [6]Gmail [7]Drive [8]More »
[9]Web History | [10]Settings | [11]Sign in
[12]St. Patrick's Day 2017
_______________________________________________________
Google Search I'm Feeling Lucky [13]Advanced search [14]Language tools
[15]Advertising Programs [16]Business Solutions [17]+Google [18]About Google
© 2017 - [19]Privacy - [20]Terms
References

Parsing data from a website

The lynx, sed, and awk commands can be used to mine data from websites.

How to do it...

Let's go through the commands used to parse details of actresses from the website:

$ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | grep -o "Rank-.*" | sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/' | sort -nk 1 > actresslist.txt

The output is:

# Only 3 entries shown. All others omitted due to space limits
1 Keira Knightley
2 Natalie Portman
3 Monica Bellucci

How it works...

Lynx is a command-line web browser. It can dump a text version of a website as we would see it in a web browser, instead of returning the raw HTML as wget or cURL do. This saves the step of removing HTML tags.
The -nolist option shows the links without numbers. Parsing and formatting the lines that contain Rank is done with sed:

sed -e 's/ *Rank-\([0-9]*\) *\(.*\)/\1\t\2/'

These lines are then sorted according to the ranks.

See also

The Downloading a web page as plain text recipe in this article explains the lynx command.

Image crawler and downloader

Image crawlers download all the images that appear in a web page. Instead of going through the HTML page by hand to pick the images, we can use a script to identify the images and download them automatically.

How to do it...

This Bash script will identify and download the images from a web page:

#!/bin/bash
#Desc: Images downloader
#Filename: img_downloader.sh

if [ $# -ne 3 ]; then
  echo "Usage: $0 URL -d DIRECTORY"
  exit -1
fi

# $1 is quoted so that the test fails cleanly once the arguments run out
while [ -n "$1" ]
do
  case $1 in
    -d) shift; directory=$1; shift ;;
    *) url=${url:-$1}; shift ;;
  esac
done

mkdir -p $directory;
baseurl=$(echo $url | egrep -o "https?://[a-z.-]+")

echo Downloading $url
curl -s $url | egrep -o "<img src=[^>]*>" |
  sed 's/<img src="\([^"]*\).*/\1/g' |
  sed "s,^/,$baseurl/," > /tmp/$$.list

cd $directory;

while read filename; do
  echo Downloading $filename
  curl -s -O "$filename" --silent
done < /tmp/$$.list

An example usage is:

$ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images

How it works...

The image downloader script reads an HTML page, strips out all tags except <img>, parses src="URL" from the <img> tag, and downloads the images to the specified directory. This script accepts a web page URL and the destination directory as command-line arguments.

The [ $# -ne 3 ] statement checks whether the total number of arguments to the script is three; if not, the script prints a usage example and exits. If the argument count is correct, this code parses the URL and destination directory:

while [ -n "$1" ]
do
  case $1 in
    -d) shift; directory=$1; shift ;;
    *) url=${url:-$1}; shift ;;
  esac
done

The while loop runs until all the arguments are processed.
The shift command shifts arguments to the left so that $1 will take the next argument's value; that is, $2, and so on. Hence, we can evaluate all arguments through $1 itself.

The case statement checks the first argument ($1). If that matches -d, the next argument must be a directory name, so the arguments are shifted and the directory name is saved. If the argument is any other string, it is a URL.

The advantage of parsing arguments in this way is that we can place the -d argument anywhere in the command line:

$ ./img_downloader.sh -d DIR URL

Or:

$ ./img_downloader.sh URL -d DIR

The egrep -o "<img src=[^>]*>" code will print only the matching strings, which are the <img> tags including their attributes. The phrase [^>]* matches all the characters except the closing >, that is, <img src="image.jpg">.

sed 's/<img src="\([^"]*\).*/\1/g' extracts the URL from the string src="url".

There are two types of image source paths: relative and absolute. Absolute paths contain full URLs that start with http:// or https://. Relative URLs start with / or with the image name itself. An example of an absolute URL is http://example.com/image.jpg. An example of a relative URL is /image.jpg.

For relative URLs, the starting / should be replaced with the base URL to transform it to http://example.com/image.jpg. The script initializes baseurl by extracting it from the initial URL with the command:

baseurl=$(echo $url | egrep -o "https?://[a-z.-]+")

The output of the previously described sed command is piped into another sed command to replace a leading / with the baseurl, and the results are saved in a file named for the script's PID: /tmp/$$.list.

sed "s,^/,$baseurl/," > /tmp/$$.list

The final while loop iterates through each line of the list and uses curl to download the images. The --silent argument is used with curl to avoid extra progress messages from being printed on the screen.
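The extraction and base-URL steps described above can also be sketched with an HTML parser instead of regular expressions. This is a minimal illustration using only the Python standard library; ImgSrcCollector and image_urls are hypothetical names, and the sample HTML is made up for the example. Unlike the sed substitution, urljoin also resolves relative paths that do not start with a slash.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag as an absolute URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # urljoin leaves absolute URLs alone and resolves relative ones
                self.sources.append(urljoin(self.page_url, src))

def image_urls(html_text, page_url):
    collector = ImgSrcCollector(page_url)
    collector.feed(html_text)
    return collector.sources

sample = '<p><img src="/logo.png"><img alt="x" src="http://cdn.example.com/a.jpg"></p>'
print(image_urls(sample, "http://example.com/gallery/index.html"))
# ['http://example.com/logo.png', 'http://cdn.example.com/a.jpg']
```

A parser-based approach also copes with attributes appearing before src, which the fixed "<img src=" pattern does not match.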
Web photo album generator

Web developers frequently create photo albums of full-sized and thumbnail images. When a thumbnail is clicked, a large version of the picture is displayed. This requires resizing and placing many images. These actions can be automated with a simple bash script. The script creates thumbnails, places them in exact directories, and generates the code fragment for <img> tags automatically.

Getting ready

This script uses a for loop to iterate over every image in the current directory. The usual Bash utilities such as cat and convert (from the Image Magick package) are used. These will generate an HTML album, using all the images, in index.html.

How to do it...

This Bash script will generate an HTML album page:

#!/bin/bash
#Filename: generate_album.sh
#Description: Create a photo album using images in current directory

echo "Creating album.."
mkdir -p thumbs
cat <<EOF1 > index.html
<html>
<head>
<style>
body { width:470px; margin:auto; border: 1px dashed grey; padding:10px; }
img { margin:5px; border: 1px solid black; }
</style>
</head>
<body>
<center><h1> #Album title </h1></center>
<p>
EOF1

for img in *.jpg; do
  convert "$img" -resize "100x" "thumbs/$img"
  echo "<a href=\"$img\">" >> index.html
  echo "<img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html
done

cat <<EOF2 >> index.html
</p>
</body>
</html>
EOF2

echo Album generated to index.html

Run the script as follows:

$ ./generate_album.sh
Creating album..
Album generated to index.html

How it works...
The initial part of the script is used to write the header part of the HTML page. The following redirects all the contents up to EOF1 to index.html:

cat <<EOF1 > index.html
contents...
EOF1

The header includes the HTML and CSS styling.

for img in *.jpg; iterates over the file names and evaluates the body of the loop.

convert "$img" -resize "100x" "thumbs/$img" creates images of 100 px width as thumbnails.

The following statements generate the required <img> tag and append it to index.html:

echo "<a href=\"$img\">" >> index.html
echo "<img src=\"thumbs/$img\" title=\"$img\" /></a>" >> index.html

Finally, the footer HTML tags are appended with cat, as done in the first part of the script.

Twitter command-line client

Twitter is the hottest micro-blogging platform, as well as the latest buzz in online social media. We can use the Twitter API to read tweets on our timeline from the command line! Let's see how to do it.

Getting ready

Recently, Twitter stopped allowing people to log in by using plain HTTP Authentication, so we must use OAuth to authenticate ourselves. Perform the following steps:

1. Download the bash-oauth library from https://github.com/livibetter/bash-oauth/archive/master.zip, and unzip it to any directory.
2. Go to that directory and then, inside the subdirectory bash-oauth-master, run make install-all as root.
3. Go to https://apps.twitter.com/ and register a new app. This will make it possible to use OAuth.
4. After registering the new app, go to your app's settings and change Access type to Read and Write.
5. Now, go to the Details section of the app and note two things, Consumer Key and Consumer Secret, so that you can substitute these in the script we are going to write.

Great, now let's write the script that uses this.

How to do it...
This Bash script uses the OAuth library to read tweets or send your own updates.

#!/bin/bash
#Filename: twitter.sh
#Description: Basic twitter client

oauth_consumer_key=YOUR_CONSUMER_KEY
oauth_consumer_secret=YOUR_CONSUMER_SECRET

config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc

if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]]; then
  echo -e "Usage: $0 tweet status_message\n OR\n $0 read\n"
  exit -1;
fi

#source /usr/local/bin/TwitterOAuth.sh
source bash-oauth-master/TwitterOAuth.sh
TO_init

if [ ! -e $config_file ]; then
  TO_access_token_helper
  if (( $? == 0 )); then
    echo oauth_token=${TO_ret[0]} > $config_file
    echo oauth_token_secret=${TO_ret[1]} >> $config_file
  fi
fi

source $config_file

if [[ "$1" = "read" ]]; then
  TO_statuses_home_timeline '' 'YOUR_TWEET_NAME' '10'
  echo $TO_ret | sed 's/,"/\n/g' | sed 's/":/~/' |
    awk -F~ '{} {if ($1 == "text") {txt=$2;} else if ($1 == "screen_name") printf("From: %s\n Tweet: %s\n\n", $2, txt);} {}' | tr '"' ' '
elif [[ "$1" = "tweet" ]]; then
  shift
  TO_statuses_update '' "$@"
  echo 'Tweeted :)'
fi

Run the script as follows:

$ ./twitter.sh read
Please go to the following link to get the PIN: https://api.twitter.com/oauth/authorize?oauth_token=LONG_TOKEN_STRING
PIN: PIN_FROM_WEBSITE
Now you can create, edit and present Slides offline. - by A Googler

$ ./twitter.sh tweet "I am reading Packt Shell Scripting Cookbook"
Tweeted :)

$ ./twitter.sh read | head -2
From: Clif Flynt
 Tweet: I am reading Packt Shell Scripting Cookbook

How it works...

First of all, we use the source command to include the TwitterOAuth.sh library, so we can use its functions to access Twitter. The TO_init function initializes the library.

Every app needs to get an OAuth token and token secret the first time it is used. If these are not present, we use the library function TO_access_token_helper to acquire them. Once we have the tokens, we save them to a config file so we can simply source it the next time the script is run.
The library function TO_statuses_home_timeline fetches the tweets from Twitter. This data is returned as a single long string in JSON format, which starts like this:

    [{"created_at":"Thu Nov 10 14:45:20 +0000 2016","id":7...9,"id_str":"7...9","text":"Dining...

Each tweet starts with the created_at tag and includes a text and a screen_name tag. The script extracts the text and screen name data and displays only those fields.

The script assigns the long string to the variable TO_ret.

The JSON format uses quoted strings for the key and may or may not quote the value. The key/value pairs are separated by commas, and the key and value are separated by a colon (:).

The first sed replaces each ," character pair with a newline, putting each key/value pair on a separate line. These lines are piped to another sed command that replaces the first occurrence of ": on each line with a tilde (~), which creates a line like this:

    screen_name~"Clif_Flynt"

The final awk script reads each line. The -F~ option splits the line into fields at the tilde, so $1 is the key and $2 is the value. The if command checks for text or screen_name. The text comes first in the tweet, but the output is easier to read if we report the sender first, so the script saves the text value until it sees a screen_name, then prints the current value of $2 and the saved text.

The TO_statuses_update library function generates a tweet. The empty first parameter defines our message as being in the default format, and the message is passed as the second parameter.

Tracking changes to a website

Tracking website changes is useful to both web developers and users. Checking a website manually is impractical, but a change-tracking script can be run at regular intervals. When a change occurs, it generates a notification.

Getting ready

Tracking changes in terms of Bash scripting means fetching a website at different times and comparing the copies with the diff command. We can use curl and diff to do this.

How to do it...
This Bash script combines different commands to track changes in a webpage:

    #!/bin/bash
    #Filename: track_changes.sh
    #Desc: Script to track changes to webpage

    if [ $# -ne 1 ];
    then
      echo -e "Usage: $0 URL\n"
      exit 1
    fi

    first_time=0
    # Not first time

    if [ ! -e "last.html" ];
    then
      first_time=1
      # Set it is first time run
    fi

    curl --silent "$1" -o recent.html

    if [ $first_time -ne 1 ];
    then
      changes=$(diff -u last.html recent.html)
      if [ -n "$changes" ];
      then
        echo -e "Changes:\n"
        echo "$changes"
      else
        echo -e "\nWebsite has no changes"
      fi
    else
      echo "[First run] Archiving.."
    fi

    cp recent.html last.html

Let's look at the output of the track_changes.sh script on a website you control. First we'll see the output when a web page is unchanged, and then after making changes.

Note that you should change MyWebSite.org to your website name.

First, run the following command:

    $ ./track_changes.sh http://www.MyWebSite.org
    [First run] Archiving..

Second, run the command again:

    $ ./track_changes.sh http://www.MyWebSite.org
    Website has no changes

Third, run the following command after making changes to the web page:

    $ ./track_changes.sh http://www.MyWebSite.org
    Changes:
    --- last.html 2010-08-01 07:29:15.000000000 +0200
    +++ recent.html 2010-08-01 07:29:43.000000000 +0200
    @@ -1,3 +1,4 @@
    +added line :)
    data

How it works...

The script checks whether it is running for the first time by using [ ! -e "last.html" ];. If last.html doesn't exist, this is the first run, and the webpage must be downloaded and saved as last.html.

If it is not the first run, the script downloads the new copy as recent.html and compares it with last.html using the diff utility. Any changes are displayed as diff output. Finally, recent.html is copied to last.html.

Note that pointing the script at a different website will generate a huge diff the first time you check it. If you need to track multiple pages, you can create a folder for each website you intend to watch.
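The same first-run / compare / archive logic can be sketched in Python with the standard difflib module. This is only an illustration under stated assumptions, not part of the recipe: the names fetch and track_changes are hypothetical, and the comparison is kept separate from the download so it can be tried on plain strings.

```python
import difflib
from pathlib import Path
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Download a page as text (plays the role of curl --silent URL)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def track_changes(new_text: str, archive: Path) -> str:
    """Compare new_text with the archived copy, then archive new_text.

    Returns "" on the first run or when nothing changed; otherwise a
    unified diff similar to what diff -u last.html recent.html prints.
    """
    first_run = not archive.exists()
    old_text = "" if first_run else archive.read_text()
    archive.write_text(new_text)  # same role as: cp recent.html last.html
    if first_run:
        return ""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="last.html",
        tofile="recent.html",
    )
    return "".join(diff)
```

A call such as track_changes(fetch(url), Path("last.html")) would mirror one run of the shell script; giving each site its own archive path corresponds to the one-folder-per-website suggestion.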
Posting to a web page and reading the response

POST and GET are two types of HTTP requests used to send information to, or retrieve information from, a website. In a GET request, we send parameters (name-value pairs) through the webpage URL itself. A POST request places the key/value pairs in the message body instead of the URL. POST is commonly used when submitting long forms or to conceal the submitted information from a casual glance.

Getting ready

For this recipe, we will use the sample guestbook website included in the tclhttpd package. You can download tclhttpd from http://sourceforge.net/projects/tclhttpd and then run it on your local system to create a local web server. The guestbook page requests a name and URL, which it adds to a guestbook to show who has visited the site when the user clicks the Add me to your guestbook button.

This process can be automated with a single curl (or wget) command.

How to do it...

Download the tclhttpd package and cd to the bin folder. Start the tclhttpd daemon with this command:

    tclsh httpd.tcl

The format to POST and read the HTML response from a generic website resembles this:

    $ curl URL -d "postvar=postdata&postvar2=postdata2"

Consider the following example:

    $ curl http://127.0.0.1:8015/guestbook/newguest.html -d "name=Clif&url=www.noucorp.com&http=www.noucorp.com"

curl prints a response page like this:

    <HTML>
    <Head>
    <title>Guestbook Registration Confirmed</title>
    </Head>
    <Body BGCOLOR=white TEXT=black>
    <a href="www.noucorp.com">www.noucorp.com</a>
    <DL>
    <DT>Name
    <DD>Clif
    <DT>URL
    <DD>
    </DL>
    www.noucorp.com
    </Body>

-d is the argument used for posting. The string argument for -d is similar to the GET request semantics: var=value pairs are delimited by &.

You can POST the data with wget by using --post-data "string". For example:

    $ wget http://127.0.0.1:8015/guestbook/newguest.cgi --post-data "name=Clif&url=www.noucorp.com&http=www.noucorp.com" -O output.html

Use the same format as cURL for name-value pairs.
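For comparison, here is a minimal sketch of how the same name-value body could be built with Python's standard library. It only constructs the request and does not send it; the helper name build_post is hypothetical, and the newguest.cgi target is taken from the wget example rather than confirmed against a running tclhttpd.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post(url: str, fields: dict) -> Request:
    """Encode fields as var=value pairs joined by & and attach them as the body."""
    body = urlencode(fields).encode("ascii")
    # Supplying data makes urllib treat the request as a POST.
    return Request(url, data=body)


request = build_post(
    "http://127.0.0.1:8015/guestbook/newguest.cgi",
    {"name": "Clif", "url": "www.noucorp.com", "http": "www.noucorp.com"},
)
print(request.get_method())          # POST
print(request.data.decode("ascii"))  # name=Clif&url=www.noucorp.com&http=www.noucorp.com
```

Passing the request to urllib.request.urlopen would submit it and return the response page. Because urlencode escapes characters such as & and spaces inside values, no shell-style quoting is involved here.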
The text in output.html is the same as that returned by the cURL command.

The string passed to the post arguments (for example, to -d or --post-data) should always be given in quotes. If quotes are not used, the shell interprets & as a request to run the command as a background process.

How it works...

If you look at the website source (use the View Source option from the web browser), you will see an HTML form defined, similar to the following code:

    <form action="newguest.cgi" method="post">
    <ul>
    <li> Name: <input type="text" name="name" size="40">
    <li> Url: <input type="text" name="url" size="40">
    <input type="submit">
    </ul>
    </form>

Here, newguest.cgi is the target URL. When the user enters the details and clicks on the Submit button, the name and url inputs are sent to newguest.cgi as a POST request, and the response page is returned to the browser.

Downloading a video from the internet

There are many reasons for downloading a video. If you are on a metered service, you might want to download videos during off-hours when the rates are cheaper. You might want to watch videos where the bandwidth doesn't support streaming, or you might just want to make certain that you always have that video of cute cats to show your friends.

Getting ready

One program for downloading videos is youtube-dl. It is not included in most distributions, and the repositories may not be up to date, so it's best to go to the youtube-dl main site: http://yt-dl.org

You'll find links and information on that page for downloading and installing youtube-dl.

How to do it…

Using youtube-dl is easy. Open your browser and find a video you like. Then copy and paste that URL onto the youtube-dl command line:

    youtube-dl https://www.youtube.com/watch?v=AJrsl3fHQ74

While youtube-dl is downloading the file, it prints a status line on your terminal.

How it works…

The youtube-dl program works by sending a GET message to the server, just as a browser would do.
It masquerades as a browser so that YouTube or other video providers will serve the video as if the device were streaming it.

The --list-formats (-F) option lists the formats in which a video is available, and the --format (-f) option specifies which format to download. This is useful if you want to download a higher-resolution video than your internet connection can reliably stream.

Summary

In this article we learned how to download and parse website data, send data to forms, and automate website-usage tasks and similar activities. We can automate many activities that we perform interactively through a browser with a few lines of scripting.

Resources for Article:

Further resources on this subject:
Linux Shell Scripting – various recipes to help you [article]
Linux Shell Script: Tips and Tricks [article]
Linux Shell Script: Monitoring Activities [article]
Chaitanya Yadav
20 Oct 2023
10 min read

ChatGPT for SQL Queries

Introduction

ChatGPT is a capable language model that can be used for a range of tasks, including writing SQL queries. In this article, you will see how to use ChatGPT to craft, optimize, and debug SQL queries so that they return the results you expect.

You need a working knowledge of SQL before you can use ChatGPT to create SQL queries. SQL is the language used to communicate with relational databases. It is used to create, read, update, and delete data, and it is a core component of many existing applications because it works with structured data stored in tables.

There are many kinds of SQL statements, but the most common include the following:

SELECT: selects data from a database.
INSERT: inserts new data into a database.
UPDATE: updates existing data in a database.
DELETE: deletes data from a database.

Using ChatGPT to write SQL queries

Once you have a basic understanding of SQL, you can start using ChatGPT to write SQL queries. To do this, give ChatGPT a description of the query that you want to write, and it will generate the SQL code for you.

For example, you could give ChatGPT the prompt below to write an SQL query that selects all of the customers in your database.

Select all of the customers in my database

ChatGPT will then provide the SQL code shown below:

SELECT * FROM customers;

This query selects every column of every row in the customers table.
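To try the four statement types listed above without setting up a server, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The customers schema and the sample rows are assumptions made up for illustration; they do not come from the article.

```python
import sqlite3

# In-memory database with a hypothetical customers table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)

# INSERT: add new rows
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ada", "United Kingdom"), ("Grace", "United States")],
)

# UPDATE: change existing data
conn.execute("UPDATE customers SET country = ? WHERE name = ?", ("UK", "Ada"))

# DELETE: remove a row
conn.execute("DELETE FROM customers WHERE name = ?", ("Grace",))

# SELECT: the query generated in the example above
rows = conn.execute("SELECT * FROM customers;").fetchall()
print(rows)  # [(1, 'Ada', 'UK')]
```

Running generated SQL against a throwaway database like this is one way to check what a query actually returns before using it on real data.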
Additionally, ChatGPT can be used to create more complex SQL statements.

How to Use ChatGPT to Describe Your Intentions

Now let's look at some examples where we ask ChatGPT to generate SQL code from a description.

For example, we will ask ChatGPT to set up a sample restaurant database with two tables.

ChatGPT prompt:

Create a sample database with two tables: GuestInfo and OrderRecords. The GuestInfo table should have the following columns: guest_id, first_name, last_name, email_address, and contact_number. The OrderRecords table should have the following columns: order_id, guest_id, product_id, quantity_ordered, and order_date.

ChatGPT SQL Query Output:

In this example, we asked ChatGPT to create a database and two tables, and it generated the SQL code to do so. The code is meant to be executed in SQL Server Management Studio (SSMS), and the code we got from ChatGPT executed successfully there.

How ChatGPT Can Be Used for Optimizing, Crafting, and Debugging Your Queries

SQL is an efficient tool for manipulating and querying data in a database. However, writing efficient SQL queries can be difficult, particularly for very complex datasets. ChatGPT is a robust language model that can help with many tasks, including optimizing SQL queries.

Generating SQL queries

Creating SQL queries from natural language statements is one of the most common ways that ChatGPT can be used when working with SQL.
Users who don't know SQL may find this helpful, as may users who want to quickly create a query for a specific task.

For example, you could give ChatGPT the following prompt:

Generate an SQL query to select all customers who have placed an order in the last month.

ChatGPT would then generate the following query:

SELECT * FROM customers WHERE order_date >= CURRENT_DATE - INTERVAL 1 MONTH;

Optimizing existing queries

ChatGPT can also help optimize existing SQL queries. Give ChatGPT the query whose performance you want to improve, and it will suggest improvements.

For example, you could give ChatGPT the following query:

SELECT * FROM products WHERE product_name LIKE '%shirt%';

ChatGPT might suggest the following optimizations:

Add an index to the products table on the product_name column.
Use a full-text search index on the product_name column.
Use a more specific condition, such as WHERE product_name = 'shirt', if you know that the product name will be an exact match.

Crafting queries

By acting as an interface between natural language and SQL, ChatGPT can help with drafting complicated SQL queries. This is helpful for users who are not familiar with SQL and need to create a query quickly for a specific task.

For example, let's say we want a SQL query that finds customers who have placed an order within the last month and spent more than $100 on it. The following query could be generated by using ChatGPT:

SELECT * FROM customers WHERE order_date >= CURRENT_DATE - INTERVAL 1 MONTH AND order_total > 100;

This query is relatively simple, but ChatGPT can also be used to create more complicated queries.
For example, to select all customers who have placed an order in the last month and who have purchased a specific product, we could use ChatGPT to generate a query:

SELECT * FROM customers WHERE order_date >= CURRENT_DATE - INTERVAL 1 MONTH AND order_items LIKE '%product_name%';

ChatGPT can also generate queries that involve more than one table. For example, to select all customers who have placed an order in the last month and have also purchased a specific product from a specific category, we could use ChatGPT to generate a query:

SELECT customers.*
FROM customers
INNER JOIN orders ON customers.id = orders.customer_id
INNER JOIN order_items ON orders.id = order_items.order_id
WHERE order_date >= CURRENT_DATE - INTERVAL 1 MONTH
AND order_items.product_id = (SELECT id FROM products WHERE product_name = 'product_name')
AND product_category_id = (SELECT id FROM product_categories WHERE category_name = 'category_name');

ChatGPT can assist with the creation of complex SQL queries like this one. By providing a natural language interface to SQL, it helps users write efficient and accurate queries.

Debugging SQL queries

ChatGPT can also be used for debugging SQL queries. To get started, give ChatGPT a query that does not return the anticipated results, and it will try to figure out why.

For example, you could give ChatGPT the following query:

SELECT * FROM customers WHERE country = 'United States';

Let's say that this query returns more results than expected.
ChatGPT may suggest that the customers table contains duplicate rows, or that the country column is not populated correctly for all customers.

How ChatGPT can help diagnose SQL query errors and suggest potential fixes

ChatGPT is useful for diagnosing and identifying problems, as well as suggesting possible remedies, when you encounter errors or unexpected results in your SQL queries.

To illustrate how ChatGPT could help you diagnose and correct SQL queries, we'll go over a hands-on example.

Scenario: You are working with a database of online store transactions. You would like to get the total revenue for a specific product named "Laptop" from the 'Products' table, but you get unexpected results when running your SQL query.

Your SQL Query:

SELECT SUM(price) AS total_revenue FROM Products WHERE product_name = 'Laptop';

Issue: The query is not providing the expected results, and you're not sure what went wrong.

ChatGPT Assistance:

Diagnosing the Issue:

You can ask ChatGPT something like, "What could be the issue with my SQL query to calculate the total revenue of 'Laptop' from the Products table?"

ChatGPT's Response:

ChatGPT suggests that the problem may lie in the WHERE clause. Because product names may not be unique, and there might be many entries called 'Laptop', it recommends filtering on the product ID rather than the product name. The query could be modified as follows:

SELECT SUM(price) AS total_revenue FROM Products WHERE product_id = (SELECT product_id FROM Products WHERE product_name = 'Laptop');

Note that the = comparison expects the subquery to return exactly one product_id; if several rows share the name, most databases reject the query, and IN would be needed instead.

Explanation and Hands-on Practice:

ChatGPT explains the reasoning behind this adjustment.
To check whether the revised query produces the expected total revenue for the 'Laptop' product, you can then try running it:

SELECT SUM(price) AS total_revenue FROM Products WHERE product_id = (SELECT product_id FROM Products WHERE product_name = 'Laptop');

With this query we obtained the correct total revenue for the 'Laptop' product, which resolved the unexpected results.

This hands-on example demonstrates how ChatGPT can help you diagnose and resolve SQL problems, provide tailored suggestions, explain the fixes, and strengthen your SQL skills through practical application.

Conclusion

This article looked at the role ChatGPT can play in generating efficient SQL queries. Because SQL is central to managing the structured data that modern applications depend on, a solid knowledge of SQL is needed to use ChatGPT effectively when creating queries. We explored how ChatGPT can help you generate, optimize, and analyze SQL queries through practical examples and use cases.

We also saw how ChatGPT can diagnose SQL errors and propose solutions, which helps users resolve unexpected results and improve their SQL skills. In today's data-driven world, where effective data manipulation is a necessity, ChatGPT is a useful ally for those who want to speed up SQL query development, improve accuracy, and increase productivity. It opens up new possibilities for data professionals and developers, allowing them to interact more effectively with databases.

If you want to learn more about SQL, you can read this book: SQL for Data Analytics

This book goes beyond basic SQL and teaches you how to analyze real-world data using advanced techniques like joins, window functions, and statistical analysis.
It includes hands-on exercises and case studies to help you build practical, job-ready data analytics skills.

Author Bio

Chaitanya Yadav is a data analyst and a machine learning and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow. He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.

In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer. He is the Co-founder of a tech community called "CS Infostics" which is dedicated to sharing opportunities to learn and grow in the field of IT.
Bhagyashree R
29 Oct 2019
9 min read

React Conf 2019: Concurrent Mode preview out, CSS-in-JS, React docs in 40 languages, and more

React Conf 2019 wrapped up last week. It kicked off with a keynote by Tom Occhino and Yuzhi Zheng from the React team, who both talked about Concurrent Mode and Suspense. They were followed by Frank Yan, also from the React team, who explained how they are building the “new Facebook” with React and Relay. One of the major highlights of his talk was the CSS-in-JS library that will be open-sourced once ready. Sophie Alpert, former manager of the React team, gave a talk on building a custom React renderer. To demonstrate, she implemented a small version of ReactDOM in just 30 minutes. There were many other lightning talks and presentations on translating the React docs, building inclusive apps by improving their accessibility, and much more.

React Conf 2019 was a two-day event that took place on Oct 24-25 at Lake Las Vegas, Nevada. The conference brought together front-end and full-stack developers to “share knowledge, skills, to network, and just to have fun.”

React's long-term goal: "Making it easier to build great user experiences"

Tom Occhino, Engineering Director of the React group, took to the stage to talk about the goals for React and the community. He says that React’s long-term goal is to make it easier for developers to build great user experiences. “Easier to build” means improving the developer experience. The three factors that contribute to a great developer experience are a low barrier to entry, developer productivity, and the ability to scale. The React team is constantly working towards improving the developer experience by introducing new features. Two such features are Concurrent Mode and Suspense.

Concurrent Mode

Concurrent Mode is a set of features to make React apps more responsive by rendering component trees without blocking the main thread. It gives React the ability to interrupt big blocks of low-priority work in order to focus on higher priority work like responding to user input.
This will enable React to work on several state updates concurrently and remove jarring, overly frequent DOM updates. The team also released the first early community preview of Concurrent Mode last week.

https://twitter.com/reactjs/status/1187411505001746432

Suspense

Suspense was introduced as an improvement to the developer experience when dealing with asynchronous data fetching within React apps. It suspends your component rendering and shows a fallback until some condition is met. Occhino describes Suspense as a “React system for orchestrating asynchronous loading of code, data, and resources.” He adds, “Suspense lets the component wait for something before they render. This helps consolidate nested dependencies and nested spinners and things behind the single simple loading experience.”

Towards the end of his keynote, Occhino also touched upon how the team plans to make the React community more inclusive and diverse. He said, “Over the past 10 years, I have learned that diverse teams build better products and make better decisions. Everyone working on React shares my conviction about this.” He adds, “Up until recently we have taken a pretty passive stance to building and shaping the React community. We have a responsibility to you all and I feel like we let many of you down. We are committed to doing better!” As a first step, the team has now replaced the React code of conduct with the Contributor Covenant.

Read also: #Reactgate forces React leaders to confront community’s toxic culture head on

What the React team is working on

Yuzhi Zheng, Engineering Manager for the React and Relay team at Facebook, gave an insight into what projects the core teams are working on. She started off by giving a recap of Hooks, which was one of the most-awaited React features announced at React Conf 2018.
“Hooks are designed for the future of React in the way that it naturally encourages code that is compatible with all the plumbing features such as accessibility, server-side rendering, suspense, and concurrent mode. Since its release, the reception of Hooks has been really positive,” she shared. If you want to understand the fundamentals of React Hooks and use them for implementing responsive design and more, check out our book, Learning Hooks.

Another long-term project that the team is focusing on is giving developers a way to easily build accessibility features in React. Currently, developers can create accessible websites using standard HTML techniques, but this approach has some limitations. To help build accessibility directly into React, the team is working on two areas: managing focus and input interfaces. For managing focus, the team plans to add primitives that provide “a more structured way of making sure component flows well” for cases like React portals and Suspense fallbacks, and that are accessible by default. For input interfaces, they plan to add support for rich gestures that work across platforms and are accessible by default.

The team is also focusing on improving initial render times. Server-side rendering helps to some extent by reducing the CPU usage on the client for the initial render, but it has some limitations. To address these limitations, the team plans to add built-in support for server-side rendering. This will work with lazily loaded components to reduce the bytes needed on the client, support streaming markup down in chunks, and be fully compatible with Concurrent Mode and Suspense.

The CSS-in-JS library

Frank Yan, Engineering Manager in the React group at Facebook, talked about how the team has rebuilt and redesigned the Facebook website and the key lessons they have learned along the way.
The new Facebook website is a single-page app with React organizing the HTML and JavaScript into components from the top down and with GraphQL and Relay colocating the queries declaratively in the components. The only key part that the team did not reorganize was CSS. They instead created a new library to embed styles in components called CSS-in-JS. It aims to make the styles easier to read, understand, and update. Its syntax is inspired by React Native and other frameworks. Since it enables you to embed styles inside JavaScript files, you can also use JavaScript tooling like type checkers and linters. React docs translated into 40 languages Nat Alison is a freelance front-end developer who helped the React team coordinate translations of reactjs.org into 40 languages. She shared why and how they were able to translate the docs for this massively popular library. She shared, “More than 80% of the world’s population does not know English. If we restrict React, one of the most popular JavaScript frameworks, we restrict who gets to create and shape the web.” Providing the officially translated docs will make it easier for several non-English speaking React developers to understand and use it in their projects. This will also prevent users from creating unofficial translations, which can be incorrect, outdated, or difficult to find. Initially, they thought of integrating a SaaS platform that allows users to submit translations, but this was not a feasible solution. Then they decided to check out the solution used by Vue, which is maintaining separate repositories for each language forked from the original repo. Similar to Vue, the team also created a bot that periodically tracks for changes in the English repo and submits pull requests whenever there is a change. If you want to contribute to translating React docs in your language, check out the IsReactTranslatedYet website. 
Developing accessible apps

Brittany Feenstra, a developer at Formidable, took to the stage to talk about why accessibility is important and how you can approach it. Accessibility, or a11y, means making your apps and websites usable for everyone, including people with any kind of disability. There are four types of disabilities that developers need to design for: visual, auditory, motor, and cognitive. Feenstra mentioned that though we are all aware of the importance of accessibility, we often “end up saving it for later” because of tight deadlines.

Feenstra compares accessibility to running a marathon. It is not something that you can achieve in just one sprint, she says. You should instead look at it as a training program of the kind you would follow when preparing for a marathon. You need to take a step-by-step approach to make an accessible app. If we do that, “we will be way less fatigued and well-equipped,” she adds.

Sharing some starting tips, she said that we need to focus on three areas. First, learn to run, or in the accessibility context, understand HTML semantics and then explore reference patterns, navigation, and focus traps.

Second, improve nutritional habits, or in the accessibility context, use environments and tools that help us write sturdier code. She recommends using axe, an accessibility checker for WCAG 2 and Section 508 accessibility. Also check out the tools that simulate how people with visual impairments will see your UI, such as NoCoffee and I want to see like the colour blind. She emphasizes linting and testing your code for accessibility with the help of eslint-plugin-jsx-a11y and accessibility assessment automation tools.

Third, cross-train and stretch, or in the accessibility context, learn to “interact with the UI in ways that let us understand the update we are making to our code.”

“React is Fiction”

This was a talk by Jenn Creighton, a Frontend Architect at The Wing, who comes from a creative writing background.
“Writing React to me felt like coming home. It was really familiar in a way that I could not pinpoint,” she said. Then she realized that writing React reminded her of fiction and merging the two disciplines helped her write better components. Creighton drew the similarities between developing in React and creative writing. One of the key principles of creative writing is “Show, don’t tell” that advises authors to describe a situation instead of just telling it. This will help engage the readers as they will be able to picture the situation in their heads. According to Creighton, React also has a similar principle: “Declarative, not imperative.”  React is declarative, which allows developers to describe what the final state should be, instead of listing all the steps to reach that state. There were many other exciting talks about progressive web animations, building React-Select, and more. Check out the live streams to watch the full talks: Day1: https://www.youtube.com/watch?v=RCiccdQObpo Day2: https://www.youtube.com/watch?v=JDDxR1a15Yo&t=2376s Ionic React released; Ionic Framework pivots from Angular to a native React version ReactOS 0.4.12 releases with kernel improvements, Intel e1000 NIC driver support, and more React Native 0.61 introduces Fast Refresh for reliable hot reloading
Packt
19 Aug 2015
20 min read

The strange relationship between objects, functions, generators and coroutines

In this article, I’d like to investigate some relationships between functions, objects, generators and coroutines in Python. At a theoretical level, these are very different concepts, but because of Python’s dynamic nature, many of them can appear to be used interchangeably. I discuss useful applications of all of these in my book, Python 3 Object-oriented Programming - Second Edition. In this essay, we’ll examine their relationship in a more whimsical light; most of the code examples below are ridiculous and should not be attempted in a production setting!

Let’s start with functions, which are simplest. A function is an object that can be executed. When executed, the function is entered at one place, accepting a group of (possibly zero) objects as parameters. The function exits at exactly one place and always returns a single object.

Already we see some complications; that last sentence is true, but you might have several counter-questions:

What if a function has multiple return statements? Only one of them will be executed in any one call to the function.
What if the function doesn’t return anything? Then it will implicitly return the None object.
Can’t you return multiple objects separated by a comma? Yes, but the returned object is actually a single tuple.

Here’s a look at a function:

    def average(sequence):
        avg = sum(sequence) / len(sequence)
        return avg

    print(average([3, 5, 8, 7]))

which outputs:

    5.75

That’s probably nothing you haven’t seen before. Similarly, you probably know what an object and a class are in Python. I define an object as a collection of data and associated behaviors. A class represents the “template” for an object. Usually the data is represented as a set of attributes and the behavior is represented as a collection of method functions, but this doesn’t have to be true.
Here’s a basic object:

    class Statistics:
        def __init__(self, sequence):
            self.sequence = sequence

        def calculate_average(self):
            return sum(self.sequence) / len(self.sequence)

        def calculate_median(self):
            # Only meaningful if self.sequence is already sorted.
            length = len(self.sequence)
            is_middle = int(not length % 2)
            return (
                self.sequence[length // 2 - is_middle] +
                self.sequence[-length // 2]) / 2

    statistics = Statistics([5, 2, 3])
    print(statistics.calculate_average())

which outputs:

    3.3333333333333335

This object has one piece of data attached to it: the sequence. It also has two methods besides the initializer. Only one of these methods is used in this particular example, but as Jack Diederich said in his famous Stop Writing Classes talk, a class with only one function besides the initializer should just be a function. So I included a second one to make it look like a useful class (It’s not. The new statistics module in Python 3.4 should be used instead. Never define for yourself that which has been defined, debugged, and tested by someone else).

Classes like this are also things you’ve seen before, but with this background in place, we can now look at some bizarre things you might not expect (or indeed, want) to be able to do with a function. For example, did you know that functions are objects? In fact, anything that you can interact with in Python is defined in the source code for the CPython interpreter as a PyObject structure. This includes functions, objects, basic primitives, containers, classes, modules, you name it. This means we can attach attributes to a function just as with any standard object. Ah, but if functions are objects, can you attach functions to functions?
Don’t try this at home (and especially don’t do it at work):

    def statistics(sequence):
        statistics.sequence = sequence
        return statistics

    def calculate_average():
        return sum(statistics.sequence) / len(statistics.sequence)

    statistics.calculate_average = calculate_average
    print(statistics([1, 5, 8, 4]).calculate_average())

which outputs:

    4.5

This is a pretty crazy example (but we’re just getting started). The statistics function is being set up as an object that has two attributes: sequence is a list and calculate_average is another function object. For fun, the function returns itself so that the print function can call the calculate_average function all in one line. Note that the statistics function here is an object, not a class. Rather than emulating the Statistics class in the previous example, it is more similar to the statistics instance of that class.

It is hard to imagine any reason that you would want to write code like this in real life. Perhaps it could be used to implement the Singleton (anti-)pattern popular with some other languages. Since there can only ever be one statistics function, it is not possible to create two distinct instances with two distinct sequence attributes the way we can with the Statistics class. There is generally little need for such code in Python, though, because of its ‘consenting adults’ nature.

We can more closely simulate a class by using a function like a constructor:

    def Statistics(sequence):
        def self():
            return self.average()

        self.sequence = sequence

        def average():
            return sum(self.sequence) / len(self.sequence)

        self.average = average
        return self

    statistics = Statistics([2, 1, 1])
    print(Statistics([1, 4, 6, 2]).average())
    print(statistics())

which outputs:

    3.25
    1.3333333333333333

That looks an awful lot like JavaScript, doesn’t it? The Statistics function acts like a constructor that returns an object (that happens to be a function, named self).
That function object has had a couple of attributes attached to it, so our function is now an object with both data and behavior. The last three lines show that we can instantiate two separate Statistics “objects” just as if it were a class. Finally, since the statistics object in the last line really is a function, we can even call it directly. It proxies the call through to the average function (or is it a method at this point? I can’t tell anymore) defined on itself.

Before we go on, note that this simulated overlapping of functionality does not mean that we are getting exactly the same behavior out of the Python interpreter. While functions are objects, not all objects are functions. The underlying implementation is different, and if you try to do this in production code, you’ll quickly find confusing anomalies. In normal code, the fact that functions can have attributes attached to them is rarely useful. I’ve seen it used for interesting diagnostics or testing, but it’s generally just a hack.

However, knowing that functions are objects allows us to pass them around to be called at a later time. Consider this basic partial implementation of an observer pattern:

    class Observers(list):
        register_observer = list.append

        def notify(self):
            for observer in self:
                observer()

    observers = Observers()

    def observer_one():
        print('one was called')

    def observer_two():
        print('two was called')

    observers.register_observer(observer_one)
    observers.register_observer(observer_two)
    observers.notify()

which outputs:

    one was called
    two was called

At line 2, I’ve intentionally reduced the comprehensibility of this code to conform to my initial ‘most of the examples in this article are ridiculous’ thesis. This line creates a new class attribute named register_observer which points to the list.append function.
Since the Observers class inherits from the list class, this line essentially creates a shortcut to a method that would look like this:

    def register_observer(self, item):
        self.append(item)

And this is how you should do it in your code. Nobody’s going to understand what’s going on if you follow my version. The part of this code that you might want to use in real life is the way the callback functions are passed into the registration function at lines 16 and 17. Passing functions around like this is quite common in Python. The alternative, if functions were not objects, would be to create a bunch of classes that have a single method with an uninformative name like execute and pass those around instead.

Observers are a bit too useful, though, so in the spirit of keeping things ridiculous, let’s make a silly function that returns a function object:

    def silly():
        print("silly")
        return silly

    silly()()()()()()

which outputs:

    silly
    silly
    silly
    silly
    silly
    silly

Since we’ve seen some ways that functions can (sort of) imitate objects, let’s now make an object that can behave like a function:

    class Function:
        def __init__(self, message):
            self.message = message

        def __call__(self, name):
            return "{} says '{}'".format(name, self.message)

    function = Function("I am a function")
    print(function('Cheap imitation function'))

which outputs:

    Cheap imitation function says 'I am a function'

I don’t use this feature often, but it can be useful in a few situations. For example, if you write a function and call it from many different places, but later discover that it needs to maintain some state, you can change the function to an object and implement the __call__ method without changing all the call sites. Or if you have a callback implementation that normally passes functions around, you can use a callable object when you need to store more complicated state. I’ve also seen Python decorators made out of objects when additional state or behavior is required.

Now, let’s talk about generators.
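Before the generators arrive, here is a minimal sketch of the "decorator made out of an object" idea mentioned above. The CountCalls class and its calls attribute are hypothetical names invented for this illustration; the only state it keeps is how many times the wrapped function has run.

```python
import functools


class CountCalls:
    """Decorator object that remembers how often the wrapped function ran."""

    def __init__(self, func):
        # Copy the name and docstring of the wrapped function onto this object.
        functools.update_wrapper(self, func)
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # The state lives on the instance, so it survives between calls.
        self.calls += 1
        return self.func(*args, **kwargs)


@CountCalls
def average(sequence):
    return sum(sequence) / len(sequence)


average([1, 2, 3])
average([4, 6])
print(average.calls)
```

The call sites still look like plain function calls, which is the property described above: the function became an object without any caller having to change.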
As you might expect by now, we’ll start with the silliest way to implement generation code. We can use the idea of a function that returns an object to create a rudimentary generatorish object that calculates the Fibonacci sequence:

    def FibFunction():
        a = b = 1

        def next():
            nonlocal a, b
            a, b = b, a + b
            return b
        return next

    fib = FibFunction()
    for i in range(8):
        print(fib(), end=' ')

which outputs:

    2 3 5 8 13 21 34 55

This is a pretty wacky thing to do, but the point is that it is possible to build functions that are able to maintain state between calls. The state is stored in the surrounding closure; we can access that state by referencing it with Python 3’s nonlocal keyword. It is kind of like global except it accesses the state from the surrounding function, not the global namespace.

We can, of course, build a similar construct using classic (or classy) object notation:

    class FibClass():
        def __init__(self):
            self.a = self.b = 1

        def __call__(self):
            self.a, self.b = self.b, self.a + self.b
            return self.b

    fib = FibClass()
    for i in range(8):
        print(fib(), end=' ')

which outputs:

    2 3 5 8 13 21 34 55

Of course, neither of these obeys the iterator protocol. No matter how I wrangle it, I was not able to get FibFunction to work with Python’s builtin next() function, even after looking through the CPython source code for a couple of hours. As I mentioned earlier, using the function syntax to build pseudo-objects quickly leads to frustration. However, it’s easy to tweak the object-based FibClass to fulfill the iterator protocol:

    class FibIterator():
        def __init__(self):
            self.a = self.b = 1

        def __next__(self):
            self.a, self.b = self.b, self.a + self.b
            return self.b

        def __iter__(self):
            return self

    fib = FibIterator()
    for i in range(8):
        print(next(fib), end=' ')

which outputs:

    2 3 5 8 13 21 34 55

This class is a standard implementation of the iterator pattern. But it’s kind of ugly and verbose. Luckily, we can get the same effect in Python with a function that includes a yield statement to construct a generator.
Here’s the Fibonacci sequence as a generator:

    def FibGenerator():
        a = b = 1
        while True:
            a, b = b, a + b
            yield b

    fib = FibGenerator()
    for i in range(8):
        print(next(fib), end=' ')
    print('\n', fib)

which outputs (the memory address will differ from run to run):

    2 3 5 8 13 21 34 55
     <generator object FibGenerator at 0x...>

The generator version is a bit more readable than the other two implementations. The thing to pay attention to here is that a generator is not a function. The FibGenerator function returns an object as illustrated by the words “generator object” in the last line of output above. Unlike a normal function, a generator function does not execute any code inside it when we call the function. Instead it constructs a generator object and returns that. You could think of this as something like an implicit Python decorator; the Python interpreter sees the yield keyword and wraps it in a decorator that returns an object instead. To start the function code executing, we have to use the next function (either explicitly as in the examples, or implicitly by using a for loop or yield from).

While a generator is technically an object, it is often convenient to think of the function that creates it as a function that can have data passed in at one place and can return values multiple times. It’s sort of like a generic version of a function (which can have data passed in at one place and return a value at only one place). It is easy to make a generator that behaves not completely unlike a function, by yielding only one value:

    def average(sequence):
        yield sum(sequence) / len(sequence)

    print(next(average([1, 2, 3])))

which outputs:

    2.0

Unfortunately, the call site at line 4 is less readable than a normal function call, since we have to throw that pesky next() in there. The obvious way around this would be to add a __call__ method to the generator but this fails if we try to use attribute assignment or inheritance. There are optimizations that make generators run quickly in C code and also don’t let us assign attributes to them.
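Both failures can be observed directly. The following is a small sketch rather than anything from the original experiments; the variable names attribute_error and subclass_error and the class name CallableGenerator are made up for the illustration.

```python
import types


def average(sequence):
    yield sum(sequence) / len(sequence)


gen = average([1, 2, 3])

try:
    # Generator objects have no instance __dict__, so this assignment fails.
    gen.__call__ = lambda: next(gen)
except AttributeError as error:
    attribute_error = error
    print("attribute assignment failed:", error)

try:
    # The built-in generator type also refuses to be used as a base class.
    class CallableGenerator(types.GeneratorType):
        def __call__(self):
            return next(self)
except TypeError as error:
    subclass_error = error
    print("inheritance failed:", error)
```

Neither attempt touches the generator itself, so next(gen) still works afterwards.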
We can, however, wrap the generator in a function-like object using a ludicrous decorator:

    def gen_func(func):
        def wrapper(*args, **kwargs):
            gen = func(*args, **kwargs)
            return next(gen)
        return wrapper

    @gen_func
    def average(sequence):
        yield sum(sequence) / len(sequence)

    print(average([1, 6, 3, 4]))

which outputs:

    3.5

Of course this is an absurd thing to do. I mean, just write a normal function for pity’s sake! But taking this idea a little further, it could be tempting to create a slightly different wrapper:

    def callable_gen(func):
        class CallableGen:
            def __init__(self, *args, **kwargs):
                self.gen = func(*args, **kwargs)

            def __next__(self):
                return self.gen.__next__()

            def __iter__(self):
                return self

            def __call__(self):
                return next(self)
        return CallableGen

    @callable_gen
    def FibGenerator():
        a = b = 1
        while True:
            a, b = b, a + b
            yield b

    fib = FibGenerator()
    for i in range(8):
        print(fib(), end=' ')

which outputs:

    2 3 5 8 13 21 34 55

To completely wrap the generator, we’d need to proxy a few other methods through to the underlying generator including send, close, and throw. This generator wrapper can be used to call a generator any number of times without calling the next function. I’ve been tempted to do this to make my code look cleaner if there are a lot of next calls in it, but I recommend not yielding into this temptation. Coders reading your code, including yourself, will go berserk trying to figure out what that “function call” is doing. Just get used to the next function and ignore this decorator business.

So we’ve drawn some parallels between generators, objects and functions. Let’s talk now about one of the more confusing concepts in Python: coroutines. In Python, coroutines are usually defined as “generators that you can send values into”. At an implementation level, this is probably the most sensible definition.
In the theoretical sense, however, it is more accurate to define coroutines as constructs that can accept values at one or more locations and return values at one or more locations. Therefore, while in Python it is easy to think of a generator as a special type of function that has yield statements and a coroutine as a special type of generator that we can send data into at different points, a better taxonomy is to think of a coroutine that can accept and return values at multiple locations as a general case, and generators and functions as special types of coroutines that are restricted in where they can accept or return values.

So let’s see a coroutine:

    def LineInserter(lines):
        out = []
        for line in lines:
            to_append = yield line
            out.append(line)
            if to_append is not None:
                out.append(to_append)
        return out

    emily = """I died for beauty, but was scarce
    Adjusted in the tomb,
    When one who died for truth was lain
    In an adjoining room.
    He questioned softly why I failed?
    “For beauty,” I replied.
    “And I for truth,—the two are one;
    We brethren are,” he said.
    And so, as kinsmen met a night,
    We talked between the rooms,
    Until the moss had reached our lips,
    And covered up our names.
    """

    inserter = LineInserter(iter(emily.splitlines()))
    count = 1
    try:
        line = next(inserter)
        while True:
            line = next(inserter) if count % 4 else inserter.send('-------')
            count += 1
    except StopIteration as ex:
        print('\n' + '\n'.join(ex.value))

which outputs:

    I died for beauty, but was scarce
    Adjusted in the tomb,
    When one who died for truth was lain
    In an adjoining room.
    -------
    He questioned softly why I failed?
    “For beauty,” I replied.
    “And I for truth,—the two are one;
    We brethren are,” he said.
    -------
    And so, as kinsmen met a night,
    We talked between the rooms,
    Until the moss had reached our lips,
    And covered up our names.
    -------

The LineInserter object is called a coroutine rather than a generator only because the yield statement is placed on the right side of an assignment operator. Now whenever we yield a line, it stores any value that might have been sent back into the coroutine in the to_append variable. As you can see in the driver code, we can send a value back in using inserter.send. If you instead just use next, the to_append variable gets a value of None. Don’t ask me why next is a function and send is a method when they both do nearly the same thing!
In this example, we use the send call to insert a ruler every four lines to separate stanzas in Emily Dickinson’s famous poem. But I used the exact same coroutine in a program that parses the source file for this article. It checks if any line contains the string !#python, and if so, it executes the subsequent code block and inserts the output (see the ‘which outputs’ lines throughout this article) into the article. Coroutines can provide that little extra something when normal ‘one way’ iteration doesn’t quite cut it.

The coroutine in the last example is really nice and elegant, but I find the driver code a bit annoying. I think it’s just me, but something about the indentation of a try…except statement always frustrates me. Recently, I’ve been emulating Python 3.4’s contextlib.suppress context manager to replace except clauses with a callback. For example:

    def LineInserter(lines):
        out = []
        for line in lines:
            to_append = yield line
            out.append(line)
            if to_append is not None:
                out.append(to_append)
        return out

    from contextlib import contextmanager

    @contextmanager
    def generator_stop(callback):
        try:
            yield
        except StopIteration as ex:
            callback(ex.value)

    def lines_complete(all_lines):
        print('\n' + '\n'.join(all_lines))

    emily = """I died for beauty, but was scarce
    Adjusted in the tomb,
    When one who died for truth was lain
    In an adjoining room.
    He questioned softly why I failed?
    “For beauty,” I replied.
    “And I for truth,—the two are one;
    We brethren are,” he said.
    And so, as kinsmen met a night,
    We talked between the rooms,
    Until the moss had reached our lips,
    And covered up our names.
    """

    inserter = LineInserter(iter(emily.splitlines()))
    count = 1
    with generator_stop(lines_complete):
        line = next(inserter)
        while True:
            line = next(inserter) if count % 4 else inserter.send('-------')
            count += 1

which outputs:

    I died for beauty, but was scarce
    Adjusted in the tomb,
    When one who died for truth was lain
    In an adjoining room.
    -------
    He questioned softly why I failed?
    “For beauty,” I replied.
    “And I for truth,—the two are one;
    We brethren are,” he said.
    -------
    And so, as kinsmen met a night,
    We talked between the rooms,
    Until the moss had reached our lips,
    And covered up our names.
    -------

The generator_stop now encapsulates all the ugliness, and the context manager can be used in a variety of situations where StopIteration needs to be handled.
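As one illustration of that reuse, here is a sketch that applies the same generator_stop context manager to an unrelated generator. The running_total generator and the results list are hypothetical names introduced only for this example; generator_stop is repeated so the snippet runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def generator_stop(callback):
    try:
        yield
    except StopIteration as ex:
        # Hand the generator's return value to the callback.
        callback(ex.value)


def running_total(numbers):
    total = 0
    for number in numbers:
        total += number
        yield total
    # The return value travels inside the StopIteration exception.
    return total


results = []
totals = running_total([1, 2, 3])
with generator_stop(results.append):
    while True:
        next(totals)

print(results)  # [6]
```

Any callable works as the callback; a bound method such as results.append is enough when the return value only needs to be collected.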
Since coroutines are undifferentiated from generators, they can emulate functions just as we saw with generators. We can even call into the same coroutine multiple times as if it were a function:

    def IncrementBy(increment):
        sequence = yield
        while True:
            sequence = yield [i + increment for i in sequence]

    sequence = [10, 20, 30]
    increment_by_5 = IncrementBy(5)
    increment_by_8 = IncrementBy(8)
    next(increment_by_5)
    next(increment_by_8)
    print(increment_by_5.send(sequence))
    print(increment_by_8.send(sequence))
    print(increment_by_5.send(sequence))

which outputs:

    [15, 25, 35]
    [18, 28, 38]
    [15, 25, 35]

Note the two calls to next at lines 9 and 10. These effectively “prime” the generator by advancing it to the first yield statement. Then each call to send essentially looks like a single call to a function. The driver code for this coroutine doesn’t look anything like calling a function, but with some evil decorator magic, we can make it look less disturbing:

    def evil_coroutine(func):
        def wrapper(*args, **kwargs):
            gen = func(*args, **kwargs)
            next(gen)

            def gen_caller(arg=None):
                return gen.send(arg)
            return gen_caller
        return wrapper

    @evil_coroutine
    def IncrementBy(increment):
        sequence = yield
        while True:
            sequence = yield [i + increment for i in sequence]

    sequence = [10, 20, 30]
    increment_by_5 = IncrementBy(5)
    increment_by_8 = IncrementBy(8)
    print(increment_by_5(sequence))
    print(increment_by_8(sequence))
    print(increment_by_5(sequence))

which outputs:

    [15, 25, 35]
    [18, 28, 38]
    [15, 25, 35]

The decorator accepts a function and returns a new wrapper function that gets assigned to the IncrementBy variable. Whenever this new IncrementBy is called, it constructs a generator using the original function, and advances it to the first yield statement using next (the priming action from before). It returns a new function that calls send on the generator each time it is called. This function makes the argument default to None so that it can also work if we call next instead of send.
The new driver code is definitely more readable, but once again, I would not recommend using this coding style to make coroutines behave like hybrid object/functions. The argument that other coders aren’t going to understand what is going through your head still stands. Plus, since send can only accept one argument, the callable is quite restricted.

Before we leave our discussion of the bizarre relationships between these concepts, let’s look at how the stanza processing code could look without an explicit coroutine, generator, function, or object:

    emily = """I died for beauty, but was scarce
    Adjusted in the tomb,
    When one who died for truth was lain
    In an adjoining room.
    He questioned softly why I failed?
    “For beauty,” I replied.
    “And I for truth,—the two are one;
    We brethren are,” he said.
    And so, as kinsmen met a night,
    We talked between the rooms,
    Until the moss had reached our lips,
    And covered up our names.
    """

    for index, line in enumerate(emily.splitlines(), start=1):
        print(line)
        if not index % 4:
            print('------')

which outputs:

    I died for beauty, but was scarce
    Adjusted in the tomb,
    When one who died for truth was lain
    In an adjoining room.
    ------
    He questioned softly why I failed?
    “For beauty,” I replied.
    “And I for truth,—the two are one;
    We brethren are,” he said.
    ------
    And so, as kinsmen met a night,
    We talked between the rooms,
    Until the moss had reached our lips,
    And covered up our names.
    ------

This code is so simple and elegant! This happens to me nearly every time I try to use coroutines. I keep refactoring and simplifying it until I discover that coroutines are making my code less, not more, readable. Unless I am explicitly modeling a state-transition system or trying to do asynchronous work using the terrific asyncio library (which wraps all the possible craziness with StopIteration, cascading exceptions, etc.), I rarely find that coroutines are the right tool for the job. That doesn’t stop me from attempting them though, because they are fun.

For the record, the LineInserter coroutine actually is useful in the markdown code executor I use to parse this source file. I need to keep track of more transitions between states (am I currently looking for a code block? am I in a code block? Do I need to execute the code block and record the output?) than in the stanza marking example used here.
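To give a rough feel for that kind of state tracking, here is a minimal sketch of a coroutine that labels each line it is sent according to whether it falls inside a code block. It is not the executor described above: the block_tracker name, the "~~~" fence marker and the three labels are all assumptions made up for this illustration.

```python
def block_tracker(fence="~~~"):
    """Coroutine: send in one line at a time, receive a label for that line."""
    state = "text"
    line = yield  # priming yield; the first real line arrives via send()
    while True:
        if line.strip() == fence:
            # A fence line flips between the two states.
            state = "code" if state == "text" else "text"
            line = yield "fence"
        else:
            line = yield state


tracker = block_tracker()
next(tracker)  # advance to the first yield, as with IncrementBy earlier

document = ["Some prose", "~~~", "print('hi')", "~~~", "More prose"]
labels = [tracker.send(line) for line in document]
print(labels)  # ['text', 'fence', 'code', 'fence', 'text']
```

The state variable survives between send calls, which is what makes a coroutine a natural (if not always the most readable) fit for a state-transition system.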
So, it has become clear that in Python, there is more than one way to do a lot of things. Luckily, most of these ways are not very obvious, and there is usually “one, and preferably only one, obvious way to do things”, to quote The Zen Of Python. I hope by this point that you are more confused about the relationship between functions, objects, generators and coroutines than ever before. I hope you now know how to write much code that should never be written. But mostly I hope you’ve enjoyed your exploration of these topics. If you’d like to see more useful applications of these and other Python concepts, grab a copy of Python 3 Object-oriented Programming, Second Edition.

Amrata Joshi
09 May 2019
7 min read

ICLR 2019 Highlights: Algorithmic fairness, AI for social good, climate change, protein structures, GAN magic, adversarial ML and much more

The ongoing ICLR 2019 (International Conference on Learning Representations) has brought a pack full of surprises and key specimens of innovation. The conference started on Monday this week, and today is already the last day! This article covers the highlights of ICLR 2019 and introduces you to the ongoing research carried out by experts in the fields of deep learning, data science, computational biology, machine vision, speech recognition, text understanding, robotics and much more.

The team behind ICLR 2019 invited papers on unsupervised objectives for agents, curiosity and intrinsic motivation, few-shot reinforcement learning, model-based planning and exploration, representation learning for planning, learning unsupervised goal spaces, unsupervised skill discovery and evaluation of unsupervised agents.

https://twitter.com/alfcnz/status/1125399067490684928

ICLR 2019, sponsored by Google, marks the presence of 200 researchers contributing to and learning from the academic research community by presenting papers and posters.

ICLR 2019 Day 1 highlights: Neural network, Algorithmic fairness, AI for social good and much more

Algorithmic fairness

https://twitter.com/HanieSedghi/status/1125401294880083968

The first day of the conference started with a talk on Highlights of Recent Developments in Algorithmic Fairness by Cynthia Dwork, an American computer scientist at Harvard University. She focused on "group fairness" notions that address the relative treatment of different demographic groups. She also talked about research in the ML community that explores fairness via representations. The investigation of scoring, classifying, ranking, and auditing fairness was also discussed in this talk.
Generating high fidelity images with Subscale Pixel Networks and Multidimensional Upscaling

https://twitter.com/NalKalchbrenner/status/1125455415553208321

Jacob Menick, a senior research engineer at Google DeepMind, and Nal Kalchbrenner, staff research scientist and co-creator of the Google Brain Amsterdam research lab, talked about Generating high fidelity images with Subscale Pixel Networks and Multidimensional Upscaling. They discussed the challenges involved in generating such images and how they address them with the help of the Subscale Pixel Network (SPN), a conditional decoder architecture that generates an image as a sequence of image slices of equal size. They also explained how Multidimensional Upscaling is used to grow an image in both size and depth via intermediate stages corresponding to distinct SPNs.

In all, 10 workshops on AI and deep learning were conducted on the same day, covering topics such as:

The 2nd Learning from Limited Labeled Data (LLD) Workshop: Representation Learning for Weak Supervision and Beyond
Deep Reinforcement Learning Meets Structured Prediction
AI for Social Good
Debugging Machine Learning Models

The first day also witnessed a few interesting talks on neural networks, covering topics such as The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks and How Powerful are Graph Neural Networks? Overall, the first day was quite enriching and informative.

ICLR 2019 Day 2 highlights: AI in climate change, Protein structure, adversarial machine learning, CNN models and much more

AI’s role in climate change

https://twitter.com/natanielruizg/status/1125763990158807040

Tuesday, the second day of the conference, started with an interesting talk on Can Machine Learning Help to Conduct a Planetary Healthcheck? by Emily Shuckburgh, a climate scientist and deputy head of the Polar Oceans team at the British Antarctic Survey.
She talked about the sophisticated numerical models of the Earth’s systems which have been developed so far based on physics, chemistry and biology. She then highlighted a set of "grand challenge" problems and discussed various ways in which machine learning is helping to advance our capacity to address these.

Protein structure with a differentiable simulator

On the second day of ICLR 2019, Chris Sander, computational biologist, John Ingraham, Adam J Riesselman, and Debora Marks from Harvard University talked about Learning protein structure with a differentiable simulator. They discussed the protein folding problem and their aim to bridge the gap between the expressive capacity of energy functions and the practical capabilities of their simulators by using an unrolled Monte Carlo simulation as a model for data. They also composed a neural energy function with a novel and efficient simulator based on Langevin dynamics to build an end-to-end-differentiable model of atomic protein structure given amino acid sequence information. They also discussed certain techniques for stabilizing backpropagation and demonstrated the model's capacity to make multimodal predictions.

Adversarial Machine Learning

https://twitter.com/natanielruizg/status/1125859734744117249

Day 2 was long and featured Ian Goodfellow, a machine learning researcher and inventor of GANs, speaking on Adversarial Machine Learning. He talked about how supervised learning works, making machine learning private, getting machine learning to work for new tasks, and reducing the dependency on large amounts of labeled data. He then discussed how adversarial techniques in machine learning are involved in the latest research frontiers.

Day 2 also covered poster presentations and a few talks, such as Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset and Learning to Remember More with Less Memorization.
ICLR 2019 Day 3 highlights: GAN, Autonomous learning and much more

Developmental autonomous learning: AI, Cognitive Sciences and Educational Technology

https://twitter.com/drew_jaegle/status/1125522499150721025

Day 3 of ICLR 2019 started with a talk by Pierre-Yves Oudeyer, research director at Inria, on Developmental Autonomous Learning: AI, Cognitive Sciences and Educational Technology. He presented a research program that focuses on computational modeling of child development and learning mechanisms. He then discussed the several developmental forces that guide exploration in large real-world spaces. He also talked about models of curiosity-driven autonomous learning that enable machines to sample and explore their own goals and learning strategies. He then explained how these models and techniques can be successfully applied in the domain of educational technologies.

Generating knockoffs for feature selection using Generative Adversarial Networks (GAN)

Another interesting topic on the third day of ICLR 2019 was Generating knockoffs for feature selection using Generative Adversarial Networks (GAN) by James Jordon from Oxford University, Jinsung Yoon from California University, and Mihaela Schaar, professor at UCLA. The experts talked about the Generative Adversarial Networks framework that helps in generating knockoffs with no assumptions on the feature distribution. They also talked about the model they created, which consists of four networks: a generator, a discriminator, a stability network and a power network. They further demonstrated the capability of their model to perform feature selection.

These were followed by a few more interesting talks, such as Deterministic Variational Inference for Robust Bayesian Neural Networks, and a series of poster presentations.

ICLR 2019 Day 4 highlights: Neural networks, RNN, neuro-symbolic concepts and much more

Learning natural language interfaces with neural models

Today’s focus was more on neural models and neuro-symbolic concepts.
The day started with a talk on Learning natural language interfaces with neural models by Mirella Lapata, a computer scientist. She gave an overview of recent progress on learning natural language interfaces which allow users to interact with various devices and services using everyday language. She also addressed the structured prediction problem of mapping natural language utterances onto machine-interpretable representations. She further outlined the various challenges it poses and described a general modeling framework based on neural networks which tackles these challenges.

Ordered neurons: Integrating tree structures into Recurrent Neural Networks

https://twitter.com/mmjb86/status/1126272417444311041

The next interesting talk was on Ordered neurons: Integrating tree structures into Recurrent Neural Networks by Professors Yikang Shen, Aaron Courville and Shawn Tan from Montreal University, and Alessandro Sordoni, a researcher at Microsoft. In this talk, the experts presented their proposal for a new RNN unit, ON-LSTM, which achieves good performance on four different tasks: language modeling, unsupervised parsing, targeted syntactic evaluation, and logical inference.

The last day of ICLR 2019 was exciting: researchers presented their innovations, and attendees got a chance to interact with the experts. For a complete overview of each of these sessions, you can head over to ICLR’s Facebook page.