How-To Tutorials

article-image-how-configure-squid-proxy-server

25 Apr 2011

8 min read

How to Configure Squid Proxy Server

25 Apr 2011

0
7
64292

article-image-chatgpt-for-data-governance

Jyoti Pathak

22 Mar 2024

11 min read

ChatGPT for Data Governance

Jyoti Pathak

22 Mar 2024

11 min read

0
0
64268

article-image-how-to-create-sales-analysis-app-in-qlik-sense-using-dar-method-tutorial

Savia Lobo

19 Jul 2019

14 min read

How to create sales analysis app in Qlik Sense using DAR method [Tutorial]

Savia Lobo

19 Jul 2019

14 min read

A Qlik Sense application combines data in the form of a structured data model, visualizations, sheets, and stories. Dimensions, measures, and visualizations can be created as data items that can be reused between several visualizations (charts) and sheets (visualizations). This article is taken from the book Hands-On Business Intelligence with Qlik Sense by Kaushik Solanki, Pablo Labbe, Clever Anjos, and Jerry DiMaso. By the end of this book, you will be well-equipped to run successful business intelligence applications using Qlik Sense's functionality, data modeling techniques, and visualization best practices. To follow along with the examples implemented in this article, you can download the code from the book’s GitHub repository. In this article, we will create a sales analysis application to explore and analyze the data model that you can find on GitHub. While developing the application, we will apply the Dashboard, Analysis, Reporting (DAR) methodology. Technical requirements We will use the application we previously created, as a starting point, with a loaded data model to eliminate the process of loading and modeling the data all over again. You can also download the initial and final versions of the application from the book repository on GitHub. After downloading the initial version of the application, follow these steps: If you are using Qlik Sense Desktop, place the QVF application file in the QlikSense\Apps folder, under your document's personal folder. If you are using Qlik Sense Cloud, upload the application to your personal workspace. The problem statement Suppose that the company is a worldwide wholesale seller of food and beverages. This means that they need to review information about their sales in several countries. They need to know their performance in each of them, and they require this information to be detailed by the customer name, category name, and product name. They also need to know the average percentage of discounts, how many orders they have issued, and the total sales amount. Each of these key indicators should be dictated by the month and year. That explains the basis for this case. We need to find the perfect solution by accessing the right dashboard and displaying all of the data in the most consolidated fashion. Creating the dashboard sheet We will begin by creating a new sheet with the name Dashboard: Open the app and click on Create new sheet: Set the Title of the sheet to Dashboard: Click on the sheet icon to save the title, and open the sheet to start creating visualizations. Creating KPI visualizations A KPI visualization is used to get an overview of the performance values that are important to our company. To add the KPI visualizations to the sheet, follow these steps: Click on the Edit button located on the toolbar to enter the edit mode: Click on the Master items button on the asset panel and click on the Measures heading: Click on Sales $ and drag and drop it into the empty space on the sheet: Qlik Sense will create a new visualization of the KPI type because we have selected a measure: Resize the visualization toward the top-left of the sheet: Repeat steps 1 through 5 to add two visualizations for the Avg Discount % and Orders # measures. Place the objects to the right of the previously added visualization: To change the type of visualization from Gauge to KPI, click on the chart type selector: Select the KPI chart type: Now, all three of the measures are visualized as KPI: Creating a pie chart with Sales $ by Categories To add the pie chart with Sales $ by Categories onto the sheet, follow these steps: Click on the Charts button on the asset panel, which is on the left-hand side of the screen, to open the chart selector panel. Click on Pie chart and drag and drop it into the empty space on the sheet: Click on the Add dimension button and select Category in the Dimensions section: Click on the Add measure button and select Sales $ in the Measures section: The pie chart will look like this: Now, we will enhance the presentation of the chart by removing the Dimension label and adding a title to the chart: To remove the Dimension label, select the Appearance button that lies in the properties panel at the right-hand side of the screen and expand Presentation, under which you will find the Dimension label. Turn it off by simply clicking on the toggle button: Click on the title of the object and type Sales $ share by Category: Click on Done in the toolbar to enter the visualization mode: Creating a bar chart with Sales $ by Top 10 Customers To add the bar chart with the top 10 customers by sales $ to the sheet, carry out these steps: Before adding the bar chart, resize the pie chart: Click on the Charts button that lies on the asset panel to open the chart selector panel. Click on Bar chart and drag and drop it into the empty space in the center of the sheet: Click on the Add dimension button and select the Customer option in the Dimensions section. Click on the Add measure button and select Sales $ in the Measures section. The bar chart will look like this: To enhance the presentation of the chart, we will limit the number of customers that are depicted in the chart to 10, and add a title to the chart: Select Data in the properties panel on the right-hand side of the screen and expand the Customer dimension. Set the Limitation values as Fixed number, Top and type 11 in the limitation box: Click on the title of the chart and type Top 10 Customers by Sales $. Click on Done to enter the visualization mode. The bar chart will look like this: Creating the geographical map of sales by country To add the geographical map of sales by country to the sheet, follow these steps: Before adding the map chart, resize the bar chart: Click on the Charts button that lies on the asset panel to open the chart selector panel. Click on the Map button and drag and drop the chart into the empty space on the right-hand side of the sheet: The map visualization will show a default world map with no data, as follows: Here, we need to add an Area layer to plot the countries, and add a Sales $ measure to fill in the area of each country with a color scale: Click on the Add Layer button in the properties panel on the right-hand side of the screen: Select the Area layer: Add the Country dimension, as it contains the information to plot the area: The map will show the country areas filled in with a single color, as follows: To add the Sales $ measure to set the color scale for each country, go to the asset panel at the left-hand side of the screen and click on the Master items heading in the Measures section. Drag and drop the Sales $ measure on top of the map: In the pop-up menu for the map, select Use in "Country"(Area Layer): After that, select Color by: Sales $: The map will now show the countries with more Sales $ in a dark color, and those with lower Sales $ in a light color: Now, click on the title of the object and type Sales $ by Country. Click on the Done button to enter the visualization mode. The sheet will look like this, but it will vary according to your screen resolution: Creating the analysis sheet While the dashboard sheet shows information on several topics for a quick overview, the analysis sheet focuses on a single topic for data exploration. We will create the analysis sheet with the following visualizations: A filter panel, with the dimensions: OrderYear, OrderMonth, Country, Customer, Category, and Product KPI Sales $ KPI Avg Discount % A combo chart for Pareto (80/20) analysis by customer A table with customer data Let's start with creating a new sheet with the name Customer Analysis: Click on the Sheet selection button at the top-right of the screen to open the sheet overview panel. Click on the Create new sheet button and set the title of the sheet to Customer Analysis. To finish this example, click on the sheet icon to save the title, and open the sheet to start creating visualizations. Adding a filter pane with main dimensions We will now build the customer analysis sheet by adding a filter pane by following these steps: Click on the Edit button to enter the edit mode. Click on the Charts button on the asset panel and drag and drop Filter pane into the empty space on the sheet: Click on the Add dimension button and select Order Year in the Dimensions section: Since we need to add more dimensions to our Filter pane, click on the Add dimension button in the properties on the right-hand side of the screen, and select Order Month in the Dimensions section. Repeat the previous step to add the Country, Customer, Category, and Product dimensions. The Filter pane will look like what's shown in the following screenshot: Now, resize the width of the filter panel to fit three columns of the grid: We also need to add the Filter pane as a master visualization, which is to be reused across the analysis and reporting sheets that we will create next: Right-click on the filter pane and select Add to master items: Set the name of the master item to Default Filter and the description to A filter pane to be reused across sheets: Click on the Add button: Adding KPI visualizations To add the KPIs of Sales $ and Avg Discount % to the sheet, we have two options. The first option is to add the KPI visualizations to the Master items library, and add them to the new sheet: Go to the dashboard sheet. Add the KPI visualizations of Sales $ and Avg Discount % to the Master item. Name them KPI Sales $ and KPI Avg Discount %, respectively. From the visualization section in the Master items library, simply drag and drop each of the KPIs into the top end of the sheet. The second option is to copy and paste the KPI visualizations between sheets: Go to the dashboard sheet. Select the KPI visualization for Sales $. Press Ctrl + C or right-click on the visualization object and select Copy in the context menu. Go back to the Customer Analysis sheet. Press Ctrl + V or right-click in the empty area of the sheet and select Paste in the context menu. Repeat the same steps for KPI Avg Discount %. The sheet editor will look like this: Creating a combo chart for Pareto (80/20) analysis A Pareto analysis helps us to identify which groups of customers contribute to the first 80% of our sales. To create a Pareto analysis, we will use a combo chart as it allows us to combine metrics with different shapes such as bars, lines, and symbols. We will represent the data in two axes; the primary axis is found at the left-hand side of the chart, and the secondary axis is found at the right-hand side of the chart. In our example, the chart has a bar for Sales $ in the primary axis, as well as two lines: one for the Cumulative % of sales, and the other as static, with 80% in the secondary axis. In the following screenshot, you can see the highlighted customers contributing to the first 80% of the sales: To create the Pareto analysis chart, follow these steps: Click on the Charts button on the asset panel and find the Combo chart. Drag and drop the Combo chart into the empty space at the right-hand side of the sheet. Click on Add Dimension and select Customer in the Dimension section. Click on Add Measure and select Sales $ in the Measures section. The combo chart will look like this: We need to add two other measures, represented by lines. The first is the cumulative percentage of sales, and the second is the reference line at 80%. To add the cumulative sales line, go to the properties panel, expand the Data section, and click on the Add button in Measures: Click on the fx button to open the expression editor: Type the following expression in the expression editor to calculate a cumulative ratio of the sales for each customer, over the whole amount of the sales of all customers: RangeSum(Above(Sum(SalesAmount), 0, RowNo())) / Sum(total SalesAmount) Click on the Apply button to close the expression editor and save the expression. Set the Label of the new measure to Cumulative Sales %. Check if the properties Line and Secondary axis are selected for the measure: Change the number formatting to Number, set the formatting option to Simple, and select 12.3%. Now, find the Add button in the Measure pane to add another measure: the reference line for 80%. Open the Expression editor, type 0.8, and click on the Apply button. Set the Label to 80%. Check if the properties Line is selected and that the Secondary axis is selected for the measure: We also need to fix the sort order into a descending fashion, by Sales $: Go to the properties panel and expand the Sorting section. Click on Customer to expand the Sorting configuration for the dimension. Switch off the Auto sorting. Click on the checkbox for Sort by expression to select the option. Open the Expression editor and type sum(SalesAmount). Click on Apply to close the expression editor and apply the changes. Set the Title of the chart to Pareto Analysis. Change the sorting order to Descending. Deselect other sorting options if they are selected. The Sorting pane will look like this: Finally, the combo chart will look like this: Creating a reporting sheet Reporting sheets allow the user to see the data in a more granular form. This type of sheet provides information that allows the user to take action at an operational level. We will start this example by creating a new sheet with the name Reporting: Click on the Sheet selection button at the top-right of the screen to open the sheet overview panel Click on the Create new sheet button and set the Title of the sheet to Product Analysis Click on the sheet icon to save the title, open the sheet to start creating visualizations, and enter the edit mode Adding a default filter pane We will start to build the reporting sheet by adding the default filter pane that has already been added to the Master items library: Click on the Edit button to enter the edit mode. Click on the Master items button on the asset panel and find Default filter in the Visualization section. Click on Default filter pane and drag and drop it into the empty space at the top of the sheet. Resize the height of the filter pane to fit one row of the grid. The sheet will then look like this: Next, we will add the table chart to the sheet, as follows: Click on the Charts button on the asset panel and find the Table visualization. Click on Table and drag and drop it into the empty space at the center of the sheet. Click on the Add dimension button and select OrderID from the Field list. Click on Add measure and select Sales $ from the Dimensions list. Click on the Master items button on the asset panel, which is on the left-hand side of the screen, and click the Dimensions heading to expand it. We will then add more dimensions. Drag and drop the Customer dimension on the table. Select Add "Customer" from the floating menu. Repeat the process, using the drag and drop feature to add Country, Category, Product, EmployeesFirstName to the table. Click on the Measures heading in Master items to expand it. Drag and drop the Avg Discount % and Quantity # measures onto the table. Select Add in the floating menu for each of the selected measure. Click on the Fields button on the asset panel, which is on the left-hand side of the screen. Find the OrderID field in the list. Drag and drop the OrderID field onto the table. Select Add OrderID from the floating menu. Repeat the same steps to add the OrderDate field to the table. The table will look like this: In this article, we saw how to create a Qlik Sense application using the DAR methodology, which will help you to explore and analyze an application's information. If you found this post useful, do check out the book, Hands-On Business Intelligence with Qlik Sense. This book teaches you how to create dynamic dashboards to bring interactive data visualization to your enterprise using Qlik Sense. Best practices for deploying self-service BI with Qlik Sense Four self-service business intelligence user types in Qlik Sense How Qlik Sense is driving self-service Business Intelligence

0
0
64178

How-To Tutorials

article-image-using-chatgpt-for-customer-service

Amita Kapoor

07 Mar 2024

10 min read

Using ChatGPT for Customer Service

Amita Kapoor

07 Mar 2024

10 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionCustomer service bots of old can often feel robotic, rigid, and painfully predictable. But enter ChatGPT: the fresher, more dynamic contender in the bot arena.ChatGPT isn't just another bot. It's been meticulously trained on a vast sea of text and code, equipping it to grapple with questions that would stump its predecessors. And it's not limited to just customer queries; this versatile bot can craft a range of text formats, from poems to programming snippets.But the standout feature? ChatGPT's touch of humour. It's not just about answering questions; it's about engaging in a way that's both informative and entertaining. So if you're in search of a customer service experience that's more captivating than the norm, it might be time to chat with ChatGPT. Onboarding ChatGPT: A Quick and Easy GuideReady to set sail with ChatGPT? Here's your easy guide to make sure you're all set and ready to roll:1. Obtain the API Key: First, you'll need to get an API key from OpenAI. This is like your secret password to the world of ChatGPT. To get an API key, head to the OpenAI platform and sign up. Once you're signed in, go to the API section and click on "Create New Key."2. Integrate ChatGPT with Your System: Once you have your API key, you can integrate ChatGPT with your system. This is like introducing ChatGPT to your system and making sure they're friends, ready to work together smoothly. To integrate ChatGPT, you'll need to add your API key into your system's code. The specific steps involved will vary depending on your system, but there are many resources available online to help you. Here is an example of how you can do it in Python:import openai import os # Initialize OpenAI API Client api_key = os.environ.get("OPENAI_API_KEY") # Retrieve the API key from environment variables openai.api_key = api_key # Set the API key # API parameters model = "gpt-3.5-turbo" # Choose the appropriate engine max_tokens = 150 # Limit the response length3. Fine-Tune ChatGPT (Optional): ChatGPT is super smart, but sometimes you might need it to learn some specific stuff about your company. That's where fine-tuning comes in. To fine-tune ChatGPT, you can provide it with training data that is specific to your company. This could include product information, customer service FAQs, or even just examples of the types of conversations that you want ChatGPT to be able to handle. Fine-tuning is not required, but it can help to improve the performance of ChatGPT on your specific tasks. [https://www.packtpub.com/article-hub/fine-tuning-gpt-35-and-4].And that's it! With these three steps, ChatGPT will be all set to jump in and take your customer service to the next level. Ready, set, ChatGPT!Utilise ChatGPT for Seamless Question AnsweringIn the ever-evolving world of customer service, stand out by integrating ChatGPT into your service channels, making real-time, accurate response a seamless experience for your customers. Let’s delve into an example to understand the process better.Example: EdTech Site with Online K-12 CoursesImagine operating a customer service bot for an EdTech site with online courses for K-12. You want to ensure that the bot provides answers only on relevant questions, enhancing the user experience and ensuring the accuracy and efficiency of responses. Here's how you can achieve this:1. Pre-defined Context:Initiate the conversation with a system message that sets the context for the bot’s role.role_gpt = "You are a customer service assistant for an EdTech site that offers online K-12 courses. Provide information and assistance regarding the courses, enrollment, and related queries." This directive helps guide the model's responses, ensuring they align with the expected topics.2. Keyword Filtering:Implement keyword filtering to review user’s queries for relevance to topics the bot handles. If the query includes keywords related to courses, enrollment, etc., the bot answers; otherwise, it informs the user about the limitation. Here's a basic example of a keyword filtering function in Python. This function is_relevant_query checks if the query contains certain keywords related to the services offered by the EdTech site.def is_relevant_query(query, keywords): """ Check if the query contains any of the specified keywords. :param query: str, the user's query :param keywords: list of str, keywords to check for :return: bool, True if query contains any keyword, False otherwise """ query = query.lower() return any(keyword in query for keyword in keywords) # Usage example: keywords = ['enrollment', 'courses', 'k-12', 'online learning'] query = "Tell me about the enrollment process." is_relevant = is_relevant_query(query, keywords)Next, we combine the bot role and user query to build the complete messagemessages = [ { "role": "system", "content": f"{role_gpt}" }, {"role": "user", "content": f"{query}"} ]We now make the openAI API can only when the question is relevant:is_relevant = is_relevant_query(query, keywords) if is_relevant: # Process the query with ChatGPT # Make API call response = openai.ChatCompletion.create( model=model, messages=messages ) # Extract and print chatbot's reply chatbot_reply = response['choices'][0]['message']['content' print("ChatGPT: ", chatbot_reply) else: print("I'm sorry, I can only answer questions related to enrollment, courses, and online learning for K-12.")To elevate the user experience, prompt your customers to use specific questions. This subtle guidance helps funnel their queries, ensuring they stay on-topic and receive the most relevant information quickly. Continuous observation of user interactions and consistent collection of their feedback is paramount. This valuable insight allows you to refine your bot, making it more intuitive and adept at handling various questions. Further enhancing the bot's efficiency, enable a feature where it can politely ask for clarification on vague or ambiguous inquiries. This ensures your bot continues to provide precise and relevant answers, solidifying its role as an invaluable resource for your customers.Utilise ChatGPT to tackle Frequently Asked QuestionsAmidst the myriad of queries in customer service, frequently asked questions (FAQs) create a pattern. With ChatGPT, transform the typical, monotonous FAQ experience into an engaging and efficient one.Example: A Hospital ChatbotConsider the scenario of a hospital chatbot. Patients might have numerous questions before and after appointments. They might be inquiring about the hospital’s visitor policies, appointment scheduling, post-consultation care, or the availability of specialists. A well-implemented ChatGPT can swiftly and accurately tackle these questions, giving relief to both the hospital staff and the patients. Here is a tentative role setting for such a bot:role_gpt = "You are a friendly assistant for a hospital, guiding users with appointment scheduling, hospital policies, and post-consultation care."This orientation anchors the bot within the healthcare context, offering relevant and timely patient information. For optimal results, a finely tuned ChatGPT model for this use case is ideal. This enhancement allows for precise, context-aware processing of healthcare-related queries, ensuring your chatbot stands as a trustworthy, efficient resource for patient inquiries.The approach outlined above can be seamlessly adapted to various other sectors. Imagine a travel agency, where customers frequently inquire about trip details, booking procedures, and cancellation policies. Or consider a retail setting, where questions about product availability, return policies, and shipping details abound. Universities can employ ChatGPT to assist students and parents with admission queries, course details, and campus information. Even local government offices can utilize ChatGPT to provide citizens with instant information about public services, documentation procedures, and local regulations. In each scenario, a tailored ChatGPT, possibly fine-tuned for the specific industry, can provide swift, clear, and accurate responses, elevating the customer experience and allowing human staff to focus on more complex tasks. The possibilities are boundless, underscoring the transformative potential of integrating ChatGPT in customer service across diverse sectors. Adventures in AI Land🐙 Octopus Energy: Hailing from the UK's bustling lanes, Octopus Energy unleashed ChatGPT into the wild world of customer inquiries. Lo and behold, handling nearly half of all questions, ChatGPT isn’t just holding the fort – it’s conquering, earning accolades and outshining its human allies in ratings!📘 Chegg: Fear not, night-owl students! The world of academia isn’t left behind in the AI revolution. Chegg, armed with the mighty ChatGPT (aka Cheggmate), stands as the valiant knight ready to battle those brain-teasing queries when the world sleeps at 2 AM. Say goodbye to the midnight oil blues!🥤 PepsiCo: Oh, the fizz and dazzle! The giants aren’t just watching from the sidelines. PepsiCo, joining forces with Bain & Company, bestowed upon ChatGPT the quill to script their advertisements. Now every pop and fizz of their beverages echo with the whispers of AI, making each gulp a symphony of allure and refreshment.Ethical Considerations for Customer Service ChatGPTIn the journey of enhancing customer service with ChatGPT, companies should hold the compass of ethical considerations steadfast. Navigate through the AI world with a responsible map that ensures not just efficiency and innovation but also the upholding of ethical standards. Below are the vital checkpoints to ensure the ethical use of ChatGPT in customer service:Transparency: Uphold honesty by ensuring customers know they are interacting with a machine learning model. This clarity builds a foundation of trust and sets the right expectations.Data Privacy: Safeguard customer data with robust security measures, ensuring protection against unauthorized access and adherence to relevant data protection regulations. For further analysis or training, use anonymized data, safeguarding customer identity and sensitive information.Accountability: Keep a watchful eye on AI interactions, ensuring the responses are accurate, relevant, and appropriate. Establish a system for accountability and continuous improvement.Legal Compliance: Keep the use of AI in customer service within the bounds of relevant laws and regulations, ensuring compliance with AI, data protection, and customer rights laws.User Autonomy: Ensure customers have the choice to switch to a human representative, maintaining their comfort and ensuring their queries are comprehensively addressed.TConclusionTo Wrap it Up (with a Bow), if you're all about leveling up your customer service game, ChatGPT's your partner-in-crime. But like any good tool, it's all about how you wield it. So, gear up, fine-tune, and dive into this AI adventure!Author BioAmita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita retired early and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. After her retirement, Amita founded NePeur, a company providing data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

0
0
63928

article-image-different-types-of-nosql-databases-and-when-to-use-them

Richard Gall

10 Sep 2019

8 min read

Different types of NoSQL databases and when to use them

Richard Gall

10 Sep 2019

8 min read

Why NoSQL databases? The popularity of NoSQL databases over the last decade or so has been driven by an explosion of data. Before what’s commonly described as ‘the big data revolution’, relational databases were the norm - these are databases that contain structured data. Structured data can only be structured if it is based on an existing schema that defines the relationships (hence relational) between the data inside the database. However, with the vast quantities of data that are now available to just about every business with an internet connection, relational databases simply aren’t equipped to handle the complexity and scale of large datasets. Why not SQL databases? This is for a couple of reasons. The defined schemas that are a necessary component of every relational database will not only undermine the richness and integrity of the data you’re working with, relational databases are also hard to scale. Relational databases can only scale vertically, not horizontally. That’s fine to a certain extent, but when you start getting high volumes of data - such as when millions of people use a web application, for example - things get really slow and you need more processing power. You can do this by upgrading your hardware, but that isn’t really sustainable. By scaling out, as you can with NoSQL databases, you can use a distributed network of computers to handle data. That gives you more speed and more flexibility. This isn’t to say that relational and SQL databases have had their day. They still fulfil many use cases. The only difference is that NoSQL can offers a level of far greater power and control for data intensive use cases. Indeed, using a NoSQL database when SQL will do is only going to add more complexity to something that just doesn’t need it. Seven NoSQL Databases in a Week Different types of NoSQL databases and when to use them So, now we’ve looked at why NoSQL databases have grown in popularity in recent years, lets dig into some of the different options available. There are a huge number of NoSQL databases out there - some of them open source, some premium products - many of them built for very different purposes. Broadly speaking there are 4 different models of NoSQL databases: Key-Value pair-based databases Column-based databases Document-oriented databases Graph databases Let’s take a look at these four models, how they’re different from one another, and some examples of the product options in each. Key Value pair-based NoSQL database management systems Key/Value pair based NoSQL databases store data in, as you might expect, pairs of keys and values. Data is stored with a matching key - keys have no relation or structure (so, keys could be height, age, hair color, for example). When should you use a key/value pair-based NoSQL DBMS? Key/value pair based NoSQL databases are the most basic type of NoSQL database. They’re useful for storing fairly basic information, like details about a customer. Which key/value pair-based DBMS should you use? There are a number of different key/value pair databases. The most popular is Redis. Redis is incredibly fast and very flexible in terms of the languages and tools it can be used with. It can be used for a wide variety of purposes - one of the reasons high-profile organizations use it, including Verizon, Atlassian, and Samsung. It’s also open source with enterprise options available for users with significant requirements. Redis 4.x Cookbook Other than Redis, other options include Memcached and Ehcache. As well as those, there are a number of other multi-model options (which will crop up later, no doubt) such as Amazon DynamoDB, Microsoft’s Cosmos DB, and OrientDB. Hands-On Amazon DynamoDB for Developers [Video] RDS PostgreSQL and DynamoDB CRUD: AWS with Python and Boto3 [Video] Column-based NoSQL database management systems Column-based databases separate data into discrete columns. Instead of using rows - whereby the row ID is the main key - column-based database systems flip things around to make the data the main key. By using columns you can gain much greater speed when querying data. Although it’s true that querying a whole row of data would take longer in a column-based DBMS, the use cases for column based databases mean you probably won’t be doing this. Instead you’ll be querying a specific part of the data rather than the whole row. When should you use a column-based NoSQL DBMS? Column-based systems are most appropriate for big data and instances where data is relatively simple and consistent (they don’t particularly handle volatility that well). Which column-based NoSQL DBMS should you use? The most popular column-based DBMS is Cassandra. The software prizes itself on its performance, boasting 100% availability thanks to lacking a single point of failure, and offering impressive scalability at a good price. Cassandra’s popularity speaks for itself - Cassandra is used by 40% of the Fortune 100. Mastering Apache Cassandra 3.x - Third Edition Learn Apache Cassandra in Just 2 Hours [Video] There are other options available, such as HBase and Cosmos DB. HBase High Performance Cookbook Document-oriented NoSQL database management systems Document-oriented NoSQL systems are very similar to key/value pair database management systems. The only difference is that the value that is paired with a key is stored as a document. Each document is self-contained, which means no schema is required - giving a significant degree of flexibility over the data you have. For software developers, this is essential - it’s for this reason that document-oriented databases such as MongoDB and CouchDB are useful components of the full-stack development tool chain. Some search platforms such as ElasticSearch use mechanisms similar to standard document-oriented systems - so they could be considered part of the same family of database management systems. When should you use a document-oriented DBMS? Document-oriented databases can help power many different types of websites and applications - from stores to content systems. However, the flexibility of document-oriented systems means they are not built for complex queries. Which document-oriented DBMS should you use? The leader in this space is, MongoDB. With an amazing 40 million downloads (and apparently 30,000 more every single day), it’s clear that MongoDB is a cornerstone of the NoSQL database revolution. MongoDB 4 Quick Start Guide MongoDB Administrator's Guide MongoDB Cookbook - Second Edition There are other options as well as MongoDB - these include CouchDB, CouchBase, DynamoDB and Cosmos DB. Learning Azure Cosmos DB Guide to NoSQL with Azure Cosmos DB Graph-based NoSQL database management systems The final type of NoSQL database is graph-based. The notable distinction about graph-based NoSQL databases is that they contain the relationships between different data. Subsequently, graph databases look quite different to any of the other databases above - they store data as nodes, with the ‘edges’ of the nodes describing their relationship to other nodes. Graph databases, compared to relational databases, are multidimensional in nature. They display not just basic relationships between tables and data, but more complex and multifaceted ones. When should you use a graph database? Because graph databases contain the relationships between a set of data (customers, products, price etc.) they can be used to build and model networks. This makes graph databases extremely useful for applications ranging from fraud detection to smart homes to search. Which graph database should you use? The world’s most popular graph database is Neo4j. It’s purpose built for data sets that contain strong relationships and connections. Widely used in the industry in companies such as eBay and Walmart, it has established its reputation as one of the world’s best NoSQL database products. Back in 2015 Packt’s Data Scientist demonstrated how he used Neo4j to build a graph application. Read more. Learning Neo4j 3.x [Video] Exploring Graph Algorithms with Neo4j [Video] NoSQL databases are the future - but know when to use the right one for the job Although NoSQL databases will remain a fixture in the engineering world, SQL databases will always be around. This is an important point - when it comes to databases, using the right tool for the job is essential. It’s a valuable exercise to explore a range of options and get to know how they work - sometimes the difference might just be a personal preference about usability. And that’s fine - you need to be productive after all. But what’s ultimately most essential is having a clear sense of what you’re trying to accomplish, and choosing the database based on your fundamental needs.

0
0
63498

article-image-chatgpt-for-business-intelligence

Chaitanya Yadav

08 Feb 2024

7 min read

ChatGPT for Business Intelligence

Chaitanya Yadav

08 Feb 2024

7 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!IntroductionA large language model (LLM) chatbot called ChatGPT was created by OpenAI. It is a strong tool that may be applied to many different activities, including business intelligence (BI).Business Intelligence is the act of gathering, analyzing, and interpreting data to derive insights that may be applied to improve business choices. Many BI-related processes can be automated with ChatGPT, freeing up BI analysts to work on more strategic projects.What is Business Intelligence?Business intelligence is the process of converting unprocessed data into insights that can be used to make decisions. BI gives businesses a competitive edge by assisting them in making well-informed decisions and streamlining operations. Data gathering, processing, reporting, and visualization all have been associated. For BI, ChatGPT can be applied in a number of ways, including the following:Data preparation and cleaning: Data preparation and cleaning, which is frequently a tedious and time-consuming activity, can be automated by using ChatGPT. Data problems can be found and fixed by ChatGPT, and data can also be transformed into a format that BI tools can use.Data analysis: Data analysis and pattern recognition can be done with ChatGPT. Additionally, ChatGPT can be used to create predictive models that can be applied for predicting outcomes in the future.Data visualization: Data visualizations, such as graphs and charts, can be created using ChatGPT and utilized to share findings with others.Improved Efficiency: By streamlining operations and assisting firms in identifying areas for development, BI helps them become more productive and cost-effective.Real-Time Monitoring: Businesses may use BI to track performance in real-time, which makes it simpler to respond to changes and trends as they emerge.Examples:Step 1: Import Libraries and Set API KeyIn the first step, we begin by importing and installing the openai package.pip install openaiNow you will need to enter the OpenAI API key before running the code as shown below. with the help of this code, you will be able to use the OpenAI API to ask queries about business intelligence and receive answers from ChatGPT.import openai # Enter your API key here api_key = "sk-kPe290Nfc5yjg08gYTR3T3B1bkFJfghIOkIvj1zObNvlc" openai.api_key = api_key[DP1] Step 2: Define the Interaction FunctionHere, we define a Python function called ask_chatgpt which will take a question as input and interact with ChatGPT using the OpenAI API. Inside this function, we make a request to the OpenAI API with the question provided as the prompt.def ask_chatgpt(question): response = openai.Completion.create( engine="text-davinci-002", prompt=f"ChatGPT for Business Intelligence: {question}", max_tokens=150 # Here you can adjust according to your need ) return response.choices[0].textStep 3: Define Example QuestionsWe create a collection of sample questions in this step. You should use them as your questions or prompts when using ChatGPT.questions = [ "What is Business Intelligence?", "How can BI benefit businesses?", "Can you provide an example of data analysis in BI?", ]Step 4: Interact with ChatGPT and Print ResponsesWe go across the set of sample questions in the final stage. The ask_chatgpt function is used to communicate with ChatGPT and receive responses for each query. After that, the console is printed with the responses.# Now we interact with ChatGPT and print the responses for question in questions: response = ask_chatgpt(question) print(f"Q: {question}") print(f"A: {response}\n") Output:Q: What is Business Intelligence? A: BI transforms data into actionable insights for informed decision-making. Q: How can BI benefit businesses? A: BI enhances decision-making, efficiency, and customer experiences. Q: Can you provide an example of data analysis in BI? A: A business might use BI to analyze sales data to identify trends, target specific customer segments, optimize marketing campaigns.In this output, ChatGPT perfectly answers each query, showcasing its capacity to understand and explain business intelligence-related ideas. The responses highlight key aspects of BI, emphasizing its function in turning data into insightful knowledge that informs strategic business choices. Example of ChatGPT PlaygroundWe can utilize ChatGPT Playground to obtain a hands-on experience with ChatGPT for BI. Through the ChatGPT Playground, a web-based interface, you may communicate with ChatGPT and create text, translate languages, create other types of creative output, and receive insightful responses to your queries.To start using the ChatGPT Playground, simply go to the ChatGPT website and click on the "Playground" button. Once you're in the Playground, you can start implementing your prompts and queries. After that text will be generated by ChatGPT in response to your queries and prompts.For example, we can use ChatGPT to analyze sales data. We can simply type in the following prompt:Analyze the sales data for the past quarter and identify any trends or patterns.ChatGPT will then generate a response that analyzes the sales data and identifies any trends or patterns. For example, ChatGPT might respond with the following:The sales data for the past quarter shows that sales of product A have increased by 15%, while sales of product B have decreased by 10%. This suggests that there is a growing demand for product A, and a declining demand for product B.Best Practices for Harnessing ChatGPT in Business IntelligenceUnderstand the limitations: ChatGPT's an extensive language model, but it doesn't act as a human analyst. It's able to produce creative and interesting texts, but it hasn't been perfect. It's possible to make mistakes, and it can be prejudiced. To minimize the risk of errors, it is important to be aware of this limitation and use ChatGPT as effectively as possible.Protect confidential information: If you are using ChatGPT as a cloud-based service, it is important to take into account the data that you're sharing with it. Do not divulge to ChatGPT any classified or sensitive information.Verify information: The ChatGPT is trained on a massive database of text and code, in order to make sure that it is accurate and up to date, crosscheck the output from ChatGPT with other sources of information.Provide feedback: The ChatGPT is still being developed, and there will always be improvements. Please give feedback to help developers make the model better if you do not feel satisfied with the output generated by ChatGPT.ConclusionChatGPT can automate lengthy BI operations including data cleaning, preparation, and analysis. In addition, it can provide data visualizations, predictive models, and real-time monitoring—all essential elements of efficient business intelligence. ChatGPT's automation of these tasks allows the BI analysts to work on more complex and significant initiatives, which in turn improves company productivity and efficiency.The article describes how to include ChatGPT into the BI process in a step-by-step manner. These processes include developing interaction functions, importing libraries, and offering sample queries. It also highlights the interactive aspect of the ChatGPT Playground, where users may engage with ChatGPT directly to analyze data, ask questions, and get intelligent answers. Overall, ChatGPT is a useful tool in the constantly changing field of business analytics and decision-making because of its capacity to automate operations and offer useful information.Author BioChaitanya Yadav is a data analyst, machine learning, and cloud computing expert with a passion for technology and education. He has a proven track record of success in using technology to solve real-world problems and help others to learn and grow. He is skilled in a wide range of technologies, including SQL, Python, data visualization tools like Power BI, and cloud computing platforms like Google Cloud Platform. He is also 22x Multicloud Certified.In addition to his technical skills, he is also a brilliant content creator, blog writer, and book reviewer. He is the Co-founder of a tech community called "CS Infostics" which is dedicated to sharing opportunities to learn and grow in the field of IT.

0
0
63432

article-image-top-10-deep-learning-frameworks

Amey Varangaonkar

25 May 2017

9 min read

Top 10 deep learning frameworks

Amey Varangaonkar

25 May 2017

9 min read

Deep learning frameworks are powering the artificial intelligence revolution. Without them, it would be almost impossible for data scientists to deliver the level of sophistication in their deep learning algorithms that advances in computing and processing power have made possible. Put simply, deep learning frameworks make it easier to build deep learning algorithms of considerable complexity. This follows a wider trend that you can see in other fields of programming and software engineering; open source communities are continually are developing new tools that simplify difficult tasks and minimize arduous ones. The deep learning framework you choose to use is ultimately down to what you're trying to do and how you work already. But to get you started here is a list of 10 of the best and most popular deep learning frameworks being used today. What are the best deep learning frameworks? Tensorflow One of the most popular Deep Learning libraries out there, Tensorflow, was developed by the Google Brain team and open-sourced in 2015. Positioned as a ‘second-generation machine learning system’, Tensorflow is a Python-based library capable of running on multiple CPUs and GPUs. It is available on all platforms, desktop, and mobile. It also has support for other languages such as C++ and R and can be used directly to create deep learning models, or by using wrapper libraries (for e.g. Keras) on top of it. In November 2017, Tensorflow announced a developer preview for Tensorflow Lite, a lightweight machine learning solution for mobile and embedded devices. The machine learning paradigm is continuously evolving - and the focus is now slowly shifting towards developing machine learning models that run on mobile and portable devices in order to make the applications smarter and more intelligent. Learn how to build a neural network with TensorFlow. If you're just starting out with deep learning, TensorFlow is THE go-to framework. It’s Python-based, backed by Google, has a very good documentation, and there are tons of tutorials and videos available on the internet to guide you. You can check out Packt’s TensorFlow catalog here. Keras Although TensorFlow is a very good deep learning library, creating models using only Tensorflow can be a challenge, as it is a pretty low-level library and can be quite complex to use for a beginner. To tackle this challenge, Keras was built as a simplified interface for building efficient neural networks in just a few lines of code and it can be configured to work on top of TensorFlow. Written in Python, Keras is very lightweight, easy to use, and pretty straightforward to learn. Because of these reasons, Tensorflow has incorporated Keras as part of its core API. Despite being a relatively new library, Keras has a very good documentation in place. If you want to know more about how Keras solves your deep learning problems, this interview by our best-selling author Sujit Pal should help you. Read now: Why you should use Keras for deep learning [box type="success" align="" class="" width=""]If you have some knowledge of Python programming and want to get started with deep learning, this is one library you definitely want to check out![/box] Caffe Built with expression, speed, and modularity in mind, Caffe is one of the first deep learning libraries developed mainly by Berkeley Vision and Learning Center (BVLC). It is a C++ library which also has a Python interface and finds its primary application in modeling Convolutional Neural Networks. One of the major benefits of using this library is that you can get a number of pre-trained networks directly from the Caffe Model Zoo, available for immediate use. If you’re interested in modeling CNNs or solve your image processing problems, you might want to consider this library. Following the footsteps of Caffe, Facebook also recently open-sourced Caffe2, a new light-weight, modular deep learning framework which offers greater flexibility for building high-performance deep learning models. Torch Torch is a Lua-based deep learning framework and has been used and developed by big players such as Facebook, Twitter and Google. It makes use of the C/C++ libraries as well as CUDA for GPU processing. Torch was built with an aim to achieve maximum flexibility and make the process of building your models extremely simple. More recently, the Python implementation of Torch, called PyTorch, has found popularity and is gaining rapid adoption. PyTorch PyTorch is a Python package for building deep neural networks and performing complex tensor computations. While Torch uses Lua, PyTorch leverages the rising popularity of Python, to allow anyone with some basic Python programming language to get started with deep learning. PyTorch improves upon Torch’s architectural style and does not have any support for containers - which makes the entire deep modeling process easier and transparent to you. Still wondering how PyTorch and Torch are different from each other? Make sure you check out this interesting post on Quora. Deeplearning4j DeepLearning4j (or DL4J) is a popular deep learning framework developed in Java and supports other JVM languages as well. It is very slick and is very widely used as a commercial, industry-focused distributed deep learning platform. The advantage of using DL4j is that you can bring together the power of the whole Java ecosystem to perform efficient deep learning, as it can be implemented on top of the popular Big Data tools such as Apache Hadoop and Apache Spark. [box type="success" align="" class="" width=""]If Java is your programming language of choice, then you should definitely check out this framework. It is clean, enterprise-ready, and highly effective. If you’re planning to deploy your deep learning models to production, this tool can certainly be of great worth![/box] MXNet MXNet is one of the most languages-supported deep learning frameworks, with support for languages such as R, Python, C++ and Julia. This is helpful because if you know any of these languages, you won’t need to step out of your comfort zone at all, to train your deep learning models. Its backend is written in C++ and cuda, and is able to manage its own memory like Theano. MXNet is also popular because it scales very well and is able to work with multiple GPUs and computers, which makes it very useful for the enterprises. This is also one of the reasons why Amazon made MXNet its reference library for Deep Learning too. In November, AWS announced the availability of ONNX-MXNet, which is an open source Python package to import ONNX (Open Neural Network Exchange) deep learning models into Apache MXNet. Read why MXNet is a versatile deep learning framework here. Microsoft Cognitive Toolkit Microsoft Cognitive Toolkit, previously known by its acronym CNTK, is an open-source deep learning toolkit to train deep learning models. It is highly optimized and has support for languages such as Python and C++. Known for its efficient resource utilization, you can easily implement efficient Reinforcement Learning models or Generative Adversarial Networks (GANs) using the Cognitive Toolkit. It is designed to achieve high scalability and performance and is known to provide high-performance gains when compared to other toolkits like Theano and Tensorflow when running on multiple machines. Here is a fun comparison of TensorFlow versus CNTK, if you would like to know more. deeplearn.js Gone are the days when you required serious hardware to run your complex machine learning models. With deeplearn.js, you can now train neural network models right on your browser! Originally developed by the Google Brain team, deeplearn.js is an open-source, JavaScript-based deep learning library which runs on both WebGL 1.0 and WebGL 2.0. deeplearn.js is being used today for a variety of purposes - from education and research to training high-performance deep learning models. You can also run your pre-trained models on the browser using this library. BigDL BigDL is distributed deep learning library for Apache Spark and is designed to scale very well. With the help of BigDL, you can run your deep learning applications directly on Spark or Hadoop clusters, by writing them as Spark programs. It has a rich deep learning support and uses Intel’s Math Kernel Library (MKL) to ensure high performance. Using BigDL, you can also load your pre-trained Torch or Caffe models into Spark. If you want to add deep learning functionalities to a massive set of data stored on your cluster, this is a very good library to use. [box type="shadow" align="" class="" width=""]Editor's Note: We have removed Theano and Lasagne from the original list due to the Theano retirement announcement. RIP Theano! Before Tensorflow, Caffe or PyTorch came to be, Theano was the most widely used library for deep learning. While it was a low-level library supporting CPU as well as GPU computations, you could wrap it with libraries like Keras to simplify the deep learning process. With the release of version 1.0, it was announced that the future development and support for Theano would be stopped. There would be minimal maintenance to keep it working for the next one year, after which even the support activities on the library would be suspended completely. “Supporting Theano is no longer the best way we can enable the emergence and application of novel research ideas”, said Prof. Yoshua Bengio, one of the main developers of Theano. Thank you Theano, you will be missed! Goodbye Lasagne Lasagne is a high-level deep learning library that runs on top of Theano. It has been around for quite some time now and was developed with the aim of abstracting the complexities of Theano, and provide a more friendly interface to the users to build and train neural networks. It requires Python and finds many similarities to Keras, which we just saw above. However, if we are to find differences between the two, Keras is faster and has a better documentation in place.[/box] There are many other deep learning libraries and frameworks available for use today – DSSTNE, Apache Singa, Veles are just a few worth an honorable mention. Which deep learning frameworks will best suit your needs? Ultimately, it depends on a number of factors. If you want to get started with deep learning, your safest bet would be to use a Python-based framework like Tensorflow, which are quite popular. For seasoned professionals, the efficiency of the trained model, ease of use, speed and resource utilization are all important considerations for choosing the best deep learning framework.

0
0
63122

article-image-airflow-ops-best-practices-observation-and-monitoring

Dylan Intorf, Kendrick van Doorn, Dylan Storey

12 Nov 2024

15 min read

Airflow Ops Best Practices: Observation and Monitoring

Dylan Intorf, Kendrick van Doorn, Dylan Storey

12 Nov 2024

15 min read

This article is an excerpt from the book, "Apache Airflow Best Practices", by Dylan Intorf, Kendrick van Doorn, Dylan Storey. With practical approach and detailed examples, this book covers newest features of Apache Airflow 2.x and it's potential for workflow orchestration, operational best practices, and data engineering.IntroductionIn this article, we will continue to explore the application of modern “ops” practices within Apache Airflow, focusing on the observation and monitoring of your systems and DAGs after they’ve been deployed.We’ll divide this observation into two segments – the core Airflow system and individual DAGs. Each segment will cover specific metrics and measurements you should be monitoring for alerting and potential intervention.When we discuss monitoring in this section, we will consider two types of monitoring – active and suppressive.In an active monitoring scenario, a process will actively check a service’s health state, recording its state and potentially taking action directly on the return value.In a suppressive monitoring scenario, the absence of a state (or state change) is usually meaningful. In these scenarios, the monitored application sends an active schedule to a process to inform it that it is OK, usually suppressing an action (such as an alert) from occurring.This chapter covers the following topics:Monitoring core Airflow componentsMonitoring your DAGsTechnical requirementsBy now, we expect you to have a good understanding of Airflow and its core components, along with functional knowledge in the deployment and operation of Airflow and Airflow DAGs.We will not be covering specific observability aggregators or telemetry tools; instead, we will focus on the activities you should be keeping an eye on. We strongly recommend that you work closely with your ops teams to understand what tools exist in your stack and how to configure them for capture and alerting your deployments.Monitoring core Airflow componentsAll of the components we will discuss here are critical to ensuring a functioning Airflow deployment. Generally, all of them should be monitored with a bare minimum check of Is it on? and if a component is not, an alert should surface to your team for investigation. The easiest way to check this is to query the REST API on the web server at `/health/`; this will return a JSON object that can be parsed to determine whether components are healthy and, if not, when they were last seen.SchedulerThis component needs to be running and working effectively in order for tasks to be scheduled for execution.When the scheduler service is started, it also starts a `/health` endpoint that can be checked by an external process with an active monitoring approach.The returned signal does not always indicate that the scheduler is working properly, as its state is simply indicative that the service is up and running. There are many scenarios where the scheduler may be operating but unable to schedule jobs; as a result, many deployments will include a canary dag to their deployment that has a single task, acting to suppress an external alert from going off.Import metrics that airflow exposes for you include the following:scheduler.scheduler_loop_duration: This should be monitored to ensure that your scheduler is able to loop and schedule tasks for execution. As this metric increases, you will see tasks beginning to schedule more slowly, to the point where you may begin missing SLAs because tasks fail to reach a schedulable state.scheduler.tasks.starving: This indicates how many tasks cannot be scheduled because there are no slots available. Pools are a mechanism that Airflow uses to balance large numbers of submitted task executions versus a finite amount of execution throughput. It is likely that this number will not be zero, but being high for extended periods of time may point to an issue in how DAGs are being written to schedule work.scheduler.tasks.executable: This indicates how many tasks are ready for execution (i.e., queued). This number will sometimes not be zero, and that is OK, but if the number increases and stays high for extended periods of time, it indicates that you may need additional computer resources to handle the load. Look at your executor to increase the number of workers it can run. Metadata databaseThe metadata database is used to store and track all of the metadata for your Airflow deployments’ previous DAG/task executions, along with information about your environment’s roles and permissions. Losing data from this database can interrupt normal operations and cause unintended consequences, with DAG runs being repeated.While critical, because it is architecturally ubiquitous, the database is also least likely to encounter issues, and if it does, they are absolutely catastrophic in nature.We generally suggest you utilize a managed service for provisioning and operating your backing database, ensuring that a disaster recovery plan for your metadata database is in place at all times.Some active areas to monitor on your database include the following:Connection pool size/usage: Monitor both the connection pool size and usage over time to ensure appropriate configuration, and identify potential bottlenecks or resource contention arising from Airflow components’ concurrent connections.Query performance: Measure query latency to detect inefficient queries or performance issues, while monitoring query throughput to ensure effective workload handling by the database.Storage metrics: Monitor the disk space utilization of the metadata database to ensure that it has sufficient storage capacity. Set up alerts for low disk space conditions to prevent database outages due to storage constraints.Backup status: Monitor the status of database backups to ensure that they are performed regularly and successfully. Verify backup integrity and retention policies to mitigate the risk of data loss if there is a database failure.TriggererThe Triggerer instance manages all of the asynchronous operations of deferrable operators in a deferred state. As such, major operational concerns generally relate to ensuring that individual deferred operators don’t cause major blocking calls to the event loop. If this occurs, your deferrable tasks will not be able to check their state changes as frequently, and this will impact scheduling performance.Import metrics that airflow exposes for you include the following:triggers.blocked_main_thread: The number of triggers that have blocked the main thread. This is a counter and should monotonically increase over time; pay attention to large differences between recording (or quick acceleration) counts, as it’s indicative of a larger problem.triggers.running: The number of triggers currently on a triggerer instance. This metric should be monitored to determine whether you need to increase the number of triggerer instances you are running. While the official documentation claims that up to tens of thousands of triggers can be on an instance, the common operational number is much lower. Tune at your discretion, but depending on the complexity of your triggers, you may need to add a new instance for every few hundred consistent triggers you run.Executors/workersDepending on the executor you use, you will need to monitor your executors and workers a bit differently.The Kubernetes executor will utilize the Kubernetes API to schedule tasks for execution; as such, you should utilize the Kubernetes events and metrics servers to gather logs and metrics for your task instances. Common metrics to collect on an individual task are CPU and memory usage. This is crucial for tuning requests or mutating individual task resource requests to ensure that they execute safely.The Celery worker has additional components and long-lived processes that you need to metricize. You should monitor an individual Celery worker’s memory and CPU utilization to ensure that it is not over- or under-provisioned, tuning allocated resources accordingly. You also need to monitor the message broker (usually Redis or RabbitMQ) to ensure that it is appropriately sized. Finally, it is critical to measure the queue length of your message broker and ensure that too much “back pressure” isn’t being created in the system. If you find that your tasks are sitting in a queued state for a long period of time and the queue length is consistently growing, it’s a sign that you should start an additional Celery worker to execute on scheduled tasks. You should also investigate using the native Celery monitoring tool Flower (https://flower.readthedocs.io/en/latest/) for additional, more nuanced methods of monitoring.Web serverThe Airflow web server is the UI for not just your Airflow deployment but also the RESTful interface. Especially if you happen to be controlling Airflow scheduling behavior with API calls, you should keep an eye on the following metrics:Response time: Measure the time taken for the API to respond to requests. This metric indicates the overall performance of the API and can help identify potential bottlenecks.Error rate: Monitor the rate of errors returned by the API, such as 4xx and 5xx HTTP status codes. High error rates may indicate issues with the API implementation or underlying systems.Request rate: Track the rate of incoming requests to the API over time. Sudden spikes or drops in request rates can impact performance and indicate changes in usage patterns.System resource utilization: Monitor resource utilization metrics such as CPU, memory, disk I/O, and network bandwidth on the servers hosting the API. High resource utilization can indicate potential performance bottlenecks or capacity limits.Throughput: Measure the number of successful requests processed by the API per unit of time. Throughput metrics provide insights into the API’s capacity to handle incoming traffic.Now that you have some basic metrics to collect from your core architectural components and can monitor the overall health of an application, we need to monitor the actual DAGs themselves to ensure that they function as intended.Monitoring your DAGsThere are multiple aspects to monitoring your DAGs, and while they’re all valuable, they may not all be necessary. Take care to ensure that your monitoring and alerting stack match your organizational needs with regard to operational parameters for resiliency and, if there is a failure, recovery times. No matter how much or how little you choose to implement, knowing that your DAGs work and if and how they fail is the first step in fixing problems that will arise.LoggingAirflow writes logs for tasks in a hierarchical structure that allows you to see each task’s logs in the Airflow UI. The community also provides a number of providers to utilize other services for backing log storage and retrieval. A complete list of supported providers is available at https://airflow.apache.org/docs/apache-airflow-providers/core-extensions/logging.html.Airflow uses the standard Python logging framework to write logs. If you’re writing custom operators or executing Python functions with a PythonOperator, just make sure that you instantiate a Python logger instance, and then the associated methods will handle everything for you.AlertingAirflow provides mechanisms for alerting on operational aspects of your executing workloads that can be configured within your DAG:Email notifications: Email notifications can be sent if a task is put into a marked or retry state with the `email_on_failure` or `email_on_retry` state, respectively. These arguments can be provided to all tasks in the DAG with the `default_args` key work in the DAG, or individual tasks by setting the keyword argument individually.Callbacks: Callbacks are special actions that are executed if a specific state change occurs. Generally, these callbacks should be thoughtfully leveraged to send alerts that are critical operationally:on_success_callback: This callback will be executed at both the task and DAG levels when entering a successful state. Unless it is critical that you know whether something succeeds, we generally suggest not using this for alerting.on_failure_callback: This callback is invoked when a task enters a failed state. Generally, this callback should always be set and, in critical scenarios, alert on failures that require intervention and support.on_execute_callback: This is invoked right before a task executes and only exists at the task level. Use sparingly for alerting, as it can quickly become a noisy alert when overused.on_retry_callback: This is invoked when a task is placed in a retry state. This is another callback to be cautious about as an alert, as it can become noisy and cause false alarms.sla_miss_callback: This is invoked when a DAG misses its defined SLA. This callback is only executed at the end of a DAG’s execution cycle so tends to be a very reactive notification that something has gone wrong.SLA monitoringAs awesome of a tool as Airflow is, it is a well-known fact in the community that SLAs, while largely functional, have some unfortunate details with regard to implementation that can make them problematic at best, and they are generally regarded as a broken feature in Airflow. We suggest that if you require SLA monitoring on your workflows, you deploy a CRON job monitoring tool such as healthchecks (https://github.com/healthchecks/healthchecks) that allows you to create suppressive alerts for your services through its rest API to manage SLAs. By pairing this third- party service with either HTTP operators or simple requests from callbacks, you can ensure that your most critical workflows achieve dynamic and resilient SLA alerting.Performance profilingThe Airflow UI is a great tool for profiling the performance of individual DAGs:The Gannt chart view: This is a great visualization for understanding the amount of time spent on individual tasks and the relative order of execution. If you’re worried about bottlenecks in your workflow, start here.Task duration: This allows you to profile the run characteristics of tasks within your DAG over a historical period. This tool is great at helping you understand temporal patterns in execution time and finding outliers in execution. Especially if you find that a DAG slows down over time, this view can help you understand whether it is a systemic issue and which tasks might need additional development.Landing times: This shows the delta between task completion and the start of the DAG run. This is an un-intuitive but powerful metric, as increases in it, when paired with stable task durations in upstream tasks, can help identify whether a scheduler is under heavy load and may need tuning.Additional metrics that have proven to be useful (but may need to be calculated) include the following:Task startup time: This is an especially useful metric when operating with a Kubernetes executor. To calculate this, you will need to calculate the difference between `start_date` and `execution_date` on each task instance. This metric will especially help you identify bottlenecks outside of Airflow that may impact task run times.Task failure and retry counts: Monitoring the frequency of task failures and retries can help identify information about the stability and robustness of your environment. Especially if these types of failure can be linked back to patterns in time or execution, it can help debug interactions with other services.DAG parsing time: Monitoring the amount of time a DAG takes to parse is very important to understand scheduler load and bottlenecks. If an individual DAG takes a long time to load (either due to heavy imports or long blocking calls being executed during parsing), it can have a material impact on the timeliness of scheduling tasks.ConclusionIn this article, we covered some essential strategies to effectively monitor both the core Airflow system and individual DAGs post-deployment. We highlighted the importance of active and suppressive monitoring techniques and provided insights into the critical metrics to track for each component, including the scheduler, metadata database, triggerer, executors/workers, and web server. Additionally, we discussed logging, alerting mechanisms, SLA monitoring, and performance profiling techniques to ensure the reliability, scalability, and efficiency of Airflow workflows. By implementing these monitoring practices and leveraging the insights gained, operators can proactively manage and optimize their Airflow deployments for optimal performance and reliability.Author BioDylan Intorf is a solutions architect and data engineer with a BS from Arizona State University in Computer Science. He has 10+ years of experience in the software and data engineering space, delivering custom tailored solutions to Tech, Financial, and Insurance industries.Kendrick van Doorn is an engineering and business leader with a background in software development, with over 10 years of developing tech and data strategies at Fortune 100 companies. In his spare time, he enjoys taking classes at different universities and is currently an MBA candidate at Columbia University.Dylan Storey has a B.Sc. and M.Sc. from California State University, Fresno in Biology and a Ph.D. from University of Tennessee, Knoxville in Life Sciences where he leveraged computational methods to study a variety of biological systems. He has over 15 years of experience in building, growing, and leading teams; solving problems in developing and operating data products at a variety of scales and industries.

2
0
62994

article-image-data-cleaning-worst-part-of-data-analysis

Amey Varangaonkar

04 Jun 2018

5 min read

Data cleaning is the worst part of data analysis, say data scientists

Amey Varangaonkar

04 Jun 2018

5 min read

The year was 2012. Harvard Business Review had famously declared the role of data scientist as the ‘sexiest job of the 21st century’. Companies were slowly working with more data than ever before. The real actionable value of the data that could be used for commercial purposes was slowly beginning to uncover. Someone who could derive these actionable insights from the data was needed. The demand for data scientists was higher than ever. Fast forward to 2018 - more data has been collected in the last 2 years than ever before. Data scientists are still in high demand, and the need for insights is higher than ever. There has been one significant change, though - the process of deriving insights has become more complex. If you ask the data scientists, the first initial phase of this process, which involves data cleansing, has become a lot more cumbersome. So much so, that it is no longer a myth that data scientists spend almost 80% of their time cleaning and readying the data for analysis. Why data cleaning is a nightmare In the recently conducted Packt Skill-Up survey, we asked data professionals what the worst part of the data analysis process was, and a staggering 50% responded with data cleaning. Source: Packt Skill Up Survey We dived deep into this, and tried to understand why many data science professionals have this common feeling of dislike towards data cleaning, or scrubbing - as many call it. Read the Skill Up report in full. Sign up to our weekly newsletter and download the PDF for free. There is no consistent data format Organizations these days work with a lot of data. Some of it is in a structured, readily understandable format. This kind of data is usually quite easy to clean, parse and analyze. However, some of the data is really messy, and cannot be used as is for analysis. This includes missing data, irregularly formatted data, and irrelevant data which is not worth analyzing at all. There is also the problem of working with unstructured data which needs to be pre-processed to get the data worth analyzing. Audio or video files, email messages, presentations, xml documents and web pages are some classic examples of this. There’s too much data to be cleaned The volume of data that businesses deal with on a day to day basis is in the scale of terabytes or even petabytes. Making sense of all this data, coming from a variety of sources and in different formats is, undoubtedly, a huge task. There are a whole host of tools designed to ease this process today, but it remains an incredibly tricky challenge to sift through the large volumes of data and prepare it for analysis. Data cleaning is tricky and time-consuming Data cleansing can be quite an exhaustive and time-consuming task, especially for data scientists. Cleaning the data requires removal of duplications, removing or replacing missing entries, correcting misfielded values, ensuring consistent formatting and a host of other tasks which take a considerable amount of time. Once the data is cleaned, it needs to be placed in a secure location. Also, a log of the entire process needs to be kept to ensure the right data goes through the right process. All of this requires the data scientists to create a well-designed data scrubbing framework to avoid the risk of repetition. All of this is more of a grunt work and requires a lot of manual effort. Sadly, there are no tools in the market which can effectively automate this process. Outsourcing the process is expensive Given that data cleaning is a rather tedious job, many businesses think of outsourcing the task to third party vendors. While this reduces a lot of time and effort on the company’s end, it definitely increases the cost of the overall process. Many small and medium scale businesses may not be able to afford this, and thus are heavily reliant on the data scientist to do the job for them. You can hate it, but you cannot ignore it It is quite obvious that data scientists need clean, ready-to-analyze data if they are to to extract actionable business insights from it. Some data scientists equate data cleaning to donkey work, suggesting there’s not a lot of innovation involved in this process. However, some believe data cleaning is rather important, and pay special attention to it given once it is done right, most of the problems in data analysis are solved. It is very difficult to take advantage of the intrinsic value offered by the dataset if it does not adhere to the quality standards set by the business, making data cleaning a crucial component of the data analysis process. Now that you know why data cleaning is essential, why not dive deeper into the technicalities? Check out our book Practical Data Wrangling for expert tips on turning your noisy data into relevant, insight-ready information using R and Python. Read more Cleaning Data in PDF Files 30 common data science terms explained How to create a strong data science project portfolio that lands you a job

0
2
62843

article-image-how-we-are-thinking-about-generative-ai

Packt

18 Jul 2024

10 min read

How we are Thinking About Generative AI

Packt

18 Jul 2024

10 min read

How we are Thinking About Generative AI for Developers and Tech LearningPackt is a global tech publisher serving developers and tech professionals (TechPros). Over the last 20 years, we have published over 8,000 books and videos, gaining deep insights into the evolving challenges tech professionals face. Recently, the rapid emergence of generative AI (GenAI) technologies like CoPilot, ChatGPT, and Gemini has transformed the tech landscape, affecting everyone from software developers to business strategists.The rapid emergence of generative AI (GenAI) technologies like CoPilot, ChatGPT, and Gemini has transformed the tech landscape.The rapid emergence of generative AI (GenAI) technologies like CoPilot, ChatGPT, and Gemini has transformed the tech landscape. These changes affect everyone from software developers to business strategists. The tech industry is at a critical inflection point with technology use, development, and education. At Packt, we are actively exploring generative AI's impact on the industry and TechPros' daily work and learning. Here, we outline our thoughts on how GenAI reshapes professional activities and tech learning, and our strategic responses to it. We would love to hear your feedback on this document and your thoughts on the issues raised within it. Please do send any comments to: GenAI_feedback@packt.com. The Impact of GenAI on TechPro WorkThe rapid pace of advancement in Generative AI makes it difficult to predict, but we believe, on balance, that it is a force for good in software development. A core Packt value that we share with our TechPro users is a belief in and commitment to the power of technology for progress. Our default setting is to get on board with change.GenAI is already changing the nature of many development jobs, but it will not mean the end of software development. We are fundamentally optimistic about the future for TechPros powered by GenAI. It will mean more, faster, better work.This is how we at Packt see these changes: Increased Software ProductionHumanity continuously evolves, adapts, and advances, maintaining a need for more sophisticated software solutions – whether those are built on traditional software platforms or on top of AI models themselves. GenAI is already transforming the economics of supply by making engineers more productive and enabling more engineering tasks. The demand for more, better software will remain, leading to an increase in the number of professionals building, designing, adapting, and managing software. Shifts in Software DevelopmentMuch of what engineers spend time doing can be quite generic. GenAI is beginning to automate these middle-tier, routine activities, allowing developers to focus on higher-value, more creative tasks. This shift redistributes work in three dimensions from the center of the development stack. Work moves ‘up the stack’ into architecture, domain expertise, and design, ‘down the stack’ into complex algorithm development, infrastructure, and tooling, and outwards to the edges with specific integrations and implementations. To meet the increased demand for software, there will be significantly more designers and implementors at those development edges, with increasing business and domain focus and specialization. There will be a continuously hard-to-meet need for deep tech engineers building the tools and infrastructure that enable this automation to operate efficiently at scale and speed. This will be seen at the hardware and firmware level as well as operating systems, cloud platforms, and the models and algorithms that modern software is built upon. Increased Domain and Business SpecializationAs GenAI moves tasks from generic operations upwards and outwards to more specialized domains, engineers will increasingly make decisions that require greater judgment and domain expertise. This will lead to a greater focus on domain experience and knowledge, and a higher value on business relationships.GenAI also democratizes the development and management of systems, making these processes accessible to more users and transforming many jobs from direct task execution to overseeing AI agents that perform the work. This evolution could significantly expand the roles involving aspects of software design or delivery. Impact on Tech Pro LearningGenAI integrates automation and problem solving, leading to profound change in how TechPros learn and solve problems. We see the core changes as being:Shift Toward Just-In-Time (JIT) Continuous LearningDevelopers have always preferred to learn by doing—starting work and solving problems on the fly. GenAI makes this the only viable approach. The ROI of upfront Just-In-Case (JIC) learning, where developers research technologies that might be useful in future, declines when co-pilots can accelerate initial builds and troubleshoot during development. GenAI tools can escalate to rapid Just-in-Time [JIT] learning sprints to backfill knowledge gaps as they are discovered.GenAI tools can help engineers to rapidly understand and work on existing complex and often undocumented code bases, again backfilling knowledge gaps JIT. Entry Level Learning Moves to Simulated EnvironmentsThe JIT learning-by-doing model also applies to students and juniors, but the study work they do will be “as good as real.” Traditional, linear courseware will be replaced by personalized, hands-on projects in rich simulated environments. These environments provide shorter, contextual learning experiences that effectively bridge the gap between theory and practice, reducing the training load on increasingly busy senior developers. Growth in Demand for Real World Experience and Peer InteractionAs development increasingly moves up the stack and routine tasks are automated, there is a growing need for TechPros to understand specific real-world applications of systems and solutions. Highly specific, detailed, and objective case studies with high relevance to a specific problem area and technical solution will become increasingly valuable. Demand for discussion and interaction with experienced fellow professionals to share knowledge and insights will also grow. Such authentic content not only aids learning but also enhances the training of AI models. Authoritative and Expert Insight Remains KeyDespite the shift towards more automated and JIT learning approaches, a thorough understanding of core concepts remains crucial. Books will continue to be one of the most powerful and authoritative ways for technology originators to share their foundational knowledge. This will remain the key long-term use-case for tech books. Continuing Need for Creator Trust and AuthenticityGen AI enables the rapid creation of written work. In the tech publishing domain, we estimate that up to around 50% of titles in certain categories on Amazon might already be AI-generated or derived. This AI content meets certain user needs, and this proliferation will continue across store platforms. We believe that human-generated work fulfils a different user need and that there will always be value in authentic creator insight and expertise. We continue to build direct relationships with tech professionals and authors to create and publish this content. The Future is UncertainHow this evolves is hard to know. The pace of change both in the technology and in the landscape around it has surfaced issues with reliability, compliance, cost, and memory/reasoning limitations. GenAI technology is moving extremely fast but has serious technical challenges. GenAI technology is moving extremely fast but has serious technical challenges.These issues will be resolved over time, but they limit the pace of actual deployment. A Cautious Approach to ChangeThe case for changing existing systems, practices, and organizational models should be approached with caution. Enterprises have a high bar for adopting core systems and the deployment phase will be long and require detailed work. Uncertainty in Computing PlatformsIt remains uncertain whether GenAI might evolve into the dominant general purpose computing platform or how it will evolve past the current transformer architecture. It may become a ubiquitous implementation layer for all services over time; we do not know. However, we share the view that this is a pivotal phase for technology and for humanity. A Mixed Economy of the Old and the NewWe see a long phase of a mixed economy of old methods and new GenAI tools. There will be pockets of rapid adoption of GenAI tooling, like we see in coding co-pilots and in application areas, such as customer service agents. However, with every deployment there will be a lot of “old style” engineering: problem solving, integrations, QA, optimization. The shifts to high level working will be gradual and not immediately noticeable. Friction in Human SystemsHuman systems inherently resist change. Individuals stick with working and learning systems with which they are comfortable. Teaching methods evolve slowly, and we see different generations working and learning in different ways. While a shift toward Just-In-Time (JIT) learning is underway, structured, long-form learning will continue to play a crucial role. Rapid Adoption Among DevelopersThe pace at which individual developers have adopted co-pilots and are using GenAI for problem solving is striking. We expect these trends of grassroots, individual adoption to continue and accelerate. How Packt is RespondingThe insights gained from talking with TechPros combined with our thinking about the impact of GenAI on TechPro work and learning has resulted in these strategic initiatives:Shift to the Edges of the Development Stack in PublishingWe are pioneering new approaches to developing and publishing real world practical case studies to answer the crucial questions: “What are people actually building with this right now?” and, “How are they actually doing it?”What are people actually building with this right now? How are they actually doing it?We will increase our focus on publishing specific, definitive, deep, technical books from the creators and builders of new technology to help TechPros broaden their skills across the development stack. We will continue to build the tech book canon in the era of GenAI.License for LLM Training ResponsiblyThe uniquely high-quality content tech authors create has immense value for LLM training. We want to support the evolution of this technology while developing model training as a potentially valuable new channel for published content.We want authors to get fair value and the recognition they are due, and we will pursue all agreements with partners in a pragmatic but principled way. Use GenAI to Enable a Step Change in Content Engineering and Derived WorksGenAI tools and automations can reduce the cost and effort of keeping a title up to date as technology evolves, and of creating a rich portfolio of derived works from the initial content. We call this BODE: Build Once, Deploy Everywhere.We are exploring exciting use-cases to increase the value of the original work, and its reach into new platforms, formats, languages, and versions. Build Packt Models and Explore JITWe have already delivered experimental AI agents fine-tuned on specific Packt titles. We are expanding this to topic, role, and whole-library models. We are exploring integration of the Packt corpus into co-pilots and tools to deliver workflow-embedded JIT knowledge and learning escalation. Build Professional MembershipsRecognizing the increased value of live interactions in a post-GenAI world, we are committed to enabling Tech Professionals to engage in high-quality, trustworthy interactions with peers working on similar roles and projects.Thoughts? Feedback?Please send any comments to:GenAI_feedback@packt.com

3
0
62818

article-image-getting-started-with-web-scraping-using-python-tutorial

Melisha Dsouza

29 Nov 2018

15 min read

Getting started with Web Scraping using Python [Tutorial]

Melisha Dsouza

29 Nov 2018

15 min read

0
0
62599

How-To Tutorials

article-image-fingerprint-detection-using-opencv

Packt

07 Oct 2015

11 min read

Fingerprint detection using OpenCV 3

Packt

07 Oct 2015

11 min read

In this article by Joseph Howse, Quan Hua, Steven Puttemans, and Utkarsh Sinha, the authors of OpenCV Blueprints, we delve into the aspect of fingerprint detection using OpenCV. (For more resources related to this topic, see here.) Fingerprint identification, how is it done? We have already discussed the use of the first biometric, which is the face of the person trying to login to the system. However since we mentioned that using a single biometric can be quite risky, we suggest adding secondary biometric checks to the system, like the fingerprint of a person. There are many of the shelf fingerprint scanners which are quite cheap and return you the scanned image. However you will still have to write your own registration software for these scanners and which can be done by using OpenCV software. Examples of such fingerprint images can be found below. Examples of single individual thumb fingerprint in different scanning positions This dataset can be downloaded from the FVC2002 competition website released by the University of Bologna. The website (http://bias.csr.unibo.it/fvc2002/databases.asp) contains 4 databases of fingerprints available for public download of the following structure: Four fingerprint capturing devices DB1 - DB4. For each device, the prints of 10 individuals are available. For each person, 8 different positions of prints were recorded. We will use this publicly available dataset to build our system upon. We will focus on the first capturing device, using up to 4 fingerprints of each individual for training the system and making an average descriptor of the fingerprint. Then we will use the other 4 fingerprints to evaluate our system and make sure that the person is still recognized by our system. You could apply exactly the same approach on the data grabbed from the other devices if you would like to investigate the difference between a system that captures almost binary images and one that captures grayscale images. However we will provide techniques for doing the binarization yourself. Implement the approach in OpenCV 3 The complete fingerprint software for processing fingerprints derived from a fingerprint scanner can be found at https://github.com/OpenCVBlueprints/OpenCVBlueprints/tree/master/chapter_6/source_code/fingerprint/fingerprint_process/. In this subsection we will describe how you can implement this approach in the OpenCV interface. We will start by grabbing the image from the fingerprint system and apply binarization. This will enable us to remove any desired noise from the image as well as help us to make the contrast better between the kin and the wrinkled surface of the finger. // Start by reading in an image Mat input = imread("/data/fingerprints/image1.png", CV_LOAD_GRAYSCALE); // Binarize the image, through local thresholding Mat input_binary; threshold(input, input_binary, 0, 255, CV_THRESH_BINARY | CV_THRESH_OTSU); The Otsu thresholding will automatically choose the best generic threshold for the image to obtain a good contrast between foreground and background information. This is because the image contains a bimodal distribution (which means that we have an image with a 2 peak histogram) of pixel values. For that image, we can approximately take a value in the middle of those peaks as threshold value. (for images which are not bimodal, binarization won't be accurate.) Otsu allows us to avoid using a fixed threshold value, and thus making the system more general for any capturing device. However we do acknowledge that if you have only a single capturing device, then playing around with a fixed threshold value could result in a better image for that specific setup. The result of the thresholding can be seen below. In order to make thinning from the next skeletization step as effective as possible we need the inverse binary image. Comparison between grayscale and binarized fingerprint image Once we have a binary image we are actually already set to go to calculate our feature points and feature point descriptors. However, in order to improve the process a bit more, we suggest to skeletize the image. This will create more unique and stronger interest points. The following piece of code can apply the skeletization on top of the binary image. The skeletization is based on the Zhang-Suen line thinning approach. Special thanks to @bsdNoobz of the OpenCV Q&A forum who supplied this iteration approach. #include <opencv2/imgproc.hpp> #include <opencv2/highgui.hpp> using namespace std; using namespace cv; // Perform a single thinning iteration, which is repeated until the skeletization is finalized void thinningIteration(Mat& im, int iter) { Mat marker = Mat::zeros(im.size(), CV_8UC1); for (int i = 1; i < im.rows-1; i++) { for (int j = 1; j < im.cols-1; j++) { uchar p2 = im.at<uchar>(i-1, j); uchar p3 = im.at<uchar>(i-1, j+1); uchar p4 = im.at<uchar>(i, j+1); uchar p5 = im.at<uchar>(i+1, j+1); uchar p6 = im.at<uchar>(i+1, j); uchar p7 = im.at<uchar>(i+1, j-1); uchar p8 = im.at<uchar>(i, j-1); uchar p9 = im.at<uchar>(i-1, j-1); int A = (p2 == 0 && p3 == 1) + (p3 == 0 && p4 == 1) + (p4 == 0 && p5 == 1) + (p5 == 0 && p6 == 1) + (p6 == 0 && p7 == 1) + (p7 == 0 && p8 == 1) + (p8 == 0 && p9 == 1) + (p9 == 0 && p2 == 1); int B = p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9; int m1 = iter == 0 ? (p2 * p4 * p6) : (p2 * p4 * p8); int m2 = iter == 0 ? (p4 * p6 * p8) : (p2 * p6 * p8); if (A == 1 && (B >= 2 && B <= 6) && m1 == 0 && m2 == 0) marker.at<uchar>(i,j) = 1; } } im &= ~marker; } // Function for thinning any given binary image within the range of 0-255. If not you should first make sure that your image has this range preset and configured! void thinning(Mat& im) { // Enforce the range tob e in between 0 - 255 im /= 255; Mat prev = Mat::zeros(im.size(), CV_8UC1); Mat diff; do { thinningIteration(im, 0); thinningIteration(im, 1); absdiff(im, prev, diff); im.copyTo(prev); } while (countNonZero(diff) > 0); im *= 255; } The code above can then simply be applied to our previous steps by calling the thinning function on top of our previous binary generated image. The code for this is: Apply thinning algorithm Mat input_thinned = input_binary.clone(); thinning(input_thinned); This will result in the following output: Comparison between binarized and thinned fingerprint image using skeletization techniques. When we got this skeleton image, the following step would be to look for crossing points on the ridges of the fingerprint, which are being then called minutiae points. We can do this by a keypoint detector that looks at a large change in local contrast, like the Harris corner detector. Since the Harris corner detector is both able to detect strong corners and edges, this is ideally for the fingerprint problem, where the most important minutiae are short edges and bifurcation, the positions where edges come together. More information about minutae points and Harris Corner detection can be found in the following publications: Ross, Arun A., Jidnya Shah, and Anil K. Jain. "Toward reconstructing fingerprints from minutiae points." Defense and Security. International Society for Optics and Photonics, 2005. Harris, Chris, and Mike Stephens. "A combined corner and edge detector." Alvey vision conference. Vol. 15. 1988. Calling the Harris Corner operation on a skeletonized and binarized image in OpenCV is quite straightforward. The Harris corners are stored as positions corresponding in the image with their cornerness response value. If we want to detect points with a certain cornerness, than we should simply threshold the image. Mat harris_corners, harris_normalised; harris_corners = Mat::zeros(input_thinned.size(), CV_32FC1); cornerHarris(input_thinned, harris_corners, 2, 3, 0.04, BORDER_DEFAULT); normalize(harris_corners, harris_normalised, 0, 255, NORM_MINMAX, CV_32FC1, Mat()); We now have a map with all the available corner responses rescaled to the range of [0 255] and stored as float values. We can now manually define a threshold, that will generate a good amount of keypoints for our application. Playing around with this parameter could improve performance in other cases. This can be done by using the following code snippet: float threshold = 125.0; vector<KeyPoint> keypoints; Mat rescaled; convertScaleAbs(harris_normalised, rescaled); Mat harris_c(rescaled.rows, rescaled.cols, CV_8UC3); Mat in[] = { rescaled, rescaled, rescaled }; int from_to[] = { 0,0, 1,1, 2,2 }; mixChannels( in, 3, &harris_c, 1, from_to, 3 ); for(int x=0; x<harris_normalised.cols; x++){ for(int y=0; y<harris_normalised.rows; y++){ if ( (int)harris_normalised.at<float>(y, x) > threshold ){ // Draw or store the keypoint location here, just like you decide. In our case we will store the location of the keypoint circle(harris_c, Point(x, y), 5, Scalar(0,255,0), 1); circle(harris_c, Point(x, y), 1, Scalar(0,0,255), 1); keypoints.push_back( KeyPoint (x, y, 1) ); } } } Comparison between thinned fingerprint and Harris corner response, as well as the selected Harris corners. Now that we have a list of keypoints we will need to create some of formal descriptor of the local region around that keypoint to be able to uniquely identify it among other keypoints. Chapter 3, Recognizing facial expressions with machine learning, discusses in more detail the wide range of keypoints out there. In this article, we will mainly focus on the process. Feel free to adapt the interface with other keypoint detectors and descriptors out there, for better or for worse performance. Since we have an application where the orientation of the thumb can differ (since it is not a fixed position), we want a keypoint descriptor that is robust at handling these slight differences. One of the mostly used descriptors for that is the SIFT descriptor, which stands for scale invariant feature transform. However SIFT is not under a BSD license and can thus pose problems to use in commercial software. A good alternative in OpenCV is the ORB descriptor. In OpenCV you can implement it in the following way. Ptr<Feature2D> orb_descriptor = ORB::create(); Mat descriptors; orb_descriptor->compute(input_thinned, keypoints, descriptors); This will enable us to calculate only the descriptors using the ORB approach, since we already retrieved the location of the keypoints using the Harris corner approach. At this point we can retrieve a descriptor for each detected keypoint of any given fingerprint. The descriptors matrix will contain a row for each keypoint containing the representation. Let us now start from the case where we have only a single reference image for each fingerprint. In that case we will have a database containing a set of feature descriptors for the training persons in the database. We then have a single new entry, consisting of multiple descriptors for the keypoints found at registration time. We now have to match these descriptors to the descriptors stored in the database, to see which one has the best match. The most simple way is by performing a brute force matching using the hamming distance criteria between descriptors of different keypoints. // Imagine we have a vector of single entry descriptors as a database // We will still need to fill those once we compare everything, by using the code snippets above vector<Mat> database_descriptors; Mat current_descriptors; // Create the matcher interface Ptr<DescriptorMatcher> matcher = DescriptorMatcher::create("BruteForce-Hamming"); // Now loop over the database and start the matching vector< vector< DMatch > > all_matches; for(int entry=0; i<database_descriptors.size();entry++){ vector< DMatch > matches; matcheràmatch(database_descriptors[entry], current_descriptors, matches); all_matches.push_back(matches); } We now have all the matches stored as DMatch objects. This means that for each matching couple we will have the original keypoint, the matched keypoint and a floating point score between both matches, representing the distance between the matched points. The idea about finding the best match seems pretty straightforward. We take a look at the amount of matches that have been returned by the matching process and weigh them by their Euclidean distance in order to add some certainty. We then look for the matching process that yielded the biggest score. This will be our best match and the match we want to return as the selected one from the database. If you want to avoid an imposter getting assigned to the best matching score, you can again add a manual threshold on top of the scoring to avoid matches that are not good enough, to be ignored. However it is possible, and should be taken into consideration, that if you increase the score to high, that people with a little change will be rejected from the system, like for example in the case where someone cuts his finger and thus changing his pattern to drastically. Fingerprint matching process visualized. To summarize, we saw how to detect fingerprints and implement it using OpenCV 3. Resources for Article: Further resources on this subject: Making subtle color shifts with curves [article] Tracking Objects in Videos [article] Hand Gesture Recognition Using a Kinect Depth Sensor [article]

0
1
62522

article-image-understanding-data-structures-in-swift

Avi Tsadok

22 Oct 2024

10 min read

Understanding Data Structures in Swift

Avi Tsadok

22 Oct 2024

10 min read

This article is an excerpt from the book, The Ultimate iOS Interview Playbook, by Avi Tsadok. The iOS Interview Guide is an essential book for iOS developers who want to maximize their skills and prepare for the competitive world of interviews on their way to getting their dream job. The book covers all the crucial aspects, from writing a resume to reviewing interview questions, and passing the architecture interview successfully.Introduction In iOS development, data structures are fundamental tools for managing and organizing data. Whether you are preparing for a technical interview or building robust iOS applications, mastering data structures like arrays, dictionaries, and sets is essential. This tutorial will guide you through the essential data structures in Swift, explaining their importance and providing practical examples of how to use them effectively. By the end of this tutorial, you will have a solid understanding of how to work with these data structures, making your code more efficient, modular, and reusable. Prerequisites Before diving into the tutorial, make sure you have the following prerequisites: Familiarity with Swift Programming Language: A basic understanding of Swift syntax, including variables, functions, and control flow, is essential for this tutorial. Xcode Installed: Ensure you have Xcode installed on your Mac. You can download it from the Mac App Store if you haven’t done so already. Basic Understanding of Object-Oriented Programming: Knowing concepts such as classes and objects will help you better understand the examples provided in this tutorial. Step-by-Step Instructions 1. Learning the Importance of Data Structures Data structures play a crucial role in iOS development. They allow you to store, manage, and manipulate data efficiently, which is especially important in performance-sensitive applications. Whether you're handling user data, managing app state, or working with APIs, choosing the right data structure can significantly impact your app's performance and scalability. Swift provides several built-in data structures, including arrays, dictionaries, and sets. Each of these data structures offers unique advantages and is suitable for different use cases. Understanding when and how to use each of them is a key skill for any iOS developer. 2. Working with Arrays Arrays are one of the most commonly used data structures in Swift. They allow you to store ordered collections of elements, making them ideal for tasks that require sequential access to data. Declaring and Initializing an Array To declare and initialize an array in Swift, you can use the following syntax: var numbers: [Int] = [1, 2, 3, 4, 5] This creates an array of integers containing the values 1 through 5. Arrays in Swift are type-safe, meaning you can only store elements of the specified type (in this case, Int). Removing Duplicates from an Array A common task in programming is to remove duplicate elements from an array. Swift makes this easy by converting the array into a Set, which automatically removes duplicates, and then converting it back into an array: let arrayWithDuplicates = [1, 2, 3, 3, 4, 5, 5] let arrayWithNoDuplicates = Array(Set(arrayWithDuplicates)) This approach is efficient and concise, leveraging the unique properties of sets to remove duplicates. Iterating Over an Array Arrays provide several methods for iterating over their elements. The most common approach is to use a for-in loop: for number in numbers { print(number) } This loop prints each element in the array to the console. You can also use methods like map, filter, and reduce for more advanced operations on arrays. 3. Implementing a Queue Using an Array A queue is a data structure that follows the First-In-First-Out (FIFO) principle, where the first element added is the first one to be removed. Queues are commonly used in scenarios like task scheduling, breadth-first search algorithms, and managing requests in networking. In Swift, you can implement a basic queue using an array. Here’s an example: struct Queue<Element> { private var array: [Element] = [] var isEmpty: Bool { return array.isEmpty } var count: Int { return array.count } mutating func enqueue(_ element: Element) { array.append(element) } mutating func dequeue() -> Element? { return array.isEmpty ? nil : array.removeFirst() } } In this implementation: The enqueue method adds an element to the end of the array. The dequeue method removes and returns the first element in the array. Queues are useful in many scenarios, such as managing tasks in a multi-threaded environment or implementing a breadth-first search algorithm. 4. Dictionaries in Swift Dictionaries are another powerful data structure in Swift. They store data in key-value pairs, allowing you to quickly look up values based on their associated keys. Dictionaries are ideal for tasks where you need fast access to data based on a unique identifier. Declaring and Initializing a Dictionary Here’s how you can declare and initialize a dictionary in Swift: var userAges: [String: Int] = ["Alice": 25, "Bob": 30] In this example, the keys are strings representing user names, and the values are integers representing their ages. Accessing and Modifying Dictionary Values You can access and modify values in a dictionary using the key: if let age = userAges["Alice"] { print("Alice is \(age) years old.") } userAges["Alice"] = 26 This code snippet retrieves Alice's age and updates it to 26. Dictionaries are highly efficient for lookups, making them a valuable tool when working with large datasets. Adding and Removing Key-Value Pairs To add a new key-value pair to a dictionary, simply assign a value to a new key: userAges["Charlie"] = 22 To remove a key-value pair, use the removeValue(forKey:) method: userAges.removeValue(forKey: "Bob") 5. Exploring Sets Sets in Swift are similar to arrays, but with one key difference: they do not allow duplicate elements. Sets are unordered collections of unique elements, making them ideal for tasks like checking membership, ensuring uniqueness, and performing set operations (e.g., union, intersection). Declaring and Initializing a Set Here’s how you can declare and initialize a set in Swift: let uniqueNumbers: Set = [1, 2, 3, 4, 5] Unlike arrays, sets do not maintain the order of elements. However, they are more efficient for operations like checking if an element exists. Performing Set Operations Swift sets support various operations that are common in set theory, such as union, intersection, and subtraction: let evenNumbers: Set = [2, 4, 6, 8] let oddNumbers: Set = [1, 3, 5, 7] let union = evenNumbers.union(oddNumbers) // All unique elements from both sets let intersection = evenNumbers.intersection([4, 5, 6]) // Elements common to both sets let difference = evenNumbers.subtracting([4, 6]) // Elements in evenNumbers but not in the other set 6. Understanding the Codable Protocol The Codable protocol in Swift simplifies encoding and decoding data, making it easier to work with JSON and other data formats. This is especially useful when interacting with web APIs or saving data to disk. Defining a Codable Struct Here’s an example of a struct that conforms to the Codable protocol: struct Person: Codable { var name: String var age: Int var address: String } With Codable, you can easily encode and decode instances of Person using JSONEncoder and JSONDecoder: let person = Person(name: "Alice", age: 25, address: "123 Main St") let jsonData = try JSONEncoder().encode(person) let decodedPerson = try JSONDecoder().decode(Person.self, from: jsonData) Output and Explanation For each code snippet, you should test and verify that the output matches the expected results. For example, when implementing the queue structure, enqueue and dequeue elements to ensure the correct order of processing. Similarly, when working with dictionaries, confirm that you can retrieve, add, and remove key-value pairs correctly. Conclusion This tutorial has covered fundamental data structures in Swift, including arrays, dictionaries, and sets, and their practical applications in iOS development. Understanding these data structures will make you a better Swift developer and prepare you for technical interviews and real-world projects. Author BioAvi Tsadok, seasoned iOS developer with a 13-year career, has proven his expertise leading projects for notable companies like Any.do, a top productivity app, and currently at Melio Payments, where he steers the mobile team. Known for his ability to simplify complex tech concepts, Avi has written four books and published 40+ tutorials and articles that enlighten and empower aspiring iOS developers. His voice resonates beyond the page, as he's a recognized public speaker and has conducted numerous interviews with fellow iOS professionals, furthering the field's discourse and development.

0
0
62442

article-image-manipulating-text-data-using-python-regular-expressions-regex

Sugandha Lahoti

16 Feb 2018

8 min read

Manipulating text data using Python Regular Expressions (regex)

Sugandha Lahoti

16 Feb 2018

8 min read

0
0
62366

Packt

13 Jan 2014

6 min read

Using IPv6 on Packet Tracer

Packt

13 Jan 2014

6 min read

This article is written by Jesin A the author of Packet Tracer Network Simulator. Cisco Packet Tracer is a powerful network simulation program and provides simulation, visualization, authoring, assessment, and shows collaboration capabilities of a network. This article explains the IPv6 addresses used in Packet Tracer. IPv4 has 4.3 billion addresses, which may seem mindboggling. However, it took only two decades for it to reach its depletion. IPv6 has come to the rescue in the form of 128-bit addresses. Packet Tracer supports a wide array of IPv6 features. We'll start by learning how to assign IP addresses to different devices and how to configure routing between them. Finally, we'll create a setup that enables IPv6 communication over IPv4 devices. Assigning IPv6 addresses Starting from Packet Trace Version 6, the IP Configuration utility under the Desktop tab of end devices has an option to enter an IPv6 address. Let's begin with a simple topology consisting of two PCs and a router connected to a switch, as shown in the following screenshot: There are three ways of assigning IPv6 addresses to a device and we'll see each one of them. Autoconfiguration Autoconfiguration requires the least amount of configuration but makes it difficult to remember the IPv6 addresses. This method uses the MAC address of the device to create an IPv6 address with the FE80:: prefix. Carry out the following steps to assign IPv6 addresses using Autoconfiguration: Begin by configuring the router. Enter the interface configuration mode and enable IPv6 on the interface. R0(config)#ipv6 unicast-routing R0(config)#interface FastEthernet0/0 R0(config-if)#ipv6 enable Next, we will configure a link local address and a global unicast address on this interface. We'll use eui-64 to reduce the configuration. R0(config-if)#ipv6 address autoconfig R0(config-if)#ipv6 add 2000::/64 eui-64 R0(config-if)#no shutdown Verify that the interface is up and has two IPv6 addresses. R0>sh ipv6 interface brief FastEthernet0/0 [up/up] FE80::2D0:58FF:FE65:E701 2000::2D0:58FF:FE65:E701 These IPv6 addresses may vary when you try them out, as they are based on the MAC address. Enable routing so that this router can be identified as a default gateway. R0(config)#ipv6 unicast-routing The configuration of the router is now done, let's move on to the PCs. Go to the Desktop tab of the PC, open IP Configuration , and under the IPv6 Configuration section, choose Auto Config . The gateway and the PC's IP address will be assigned automatically, as shown in the following screenshot: Use the simple PDU tool to test the connectivity; you'll see ICMPv6 packets moving between the nodes. To view the IPv6 address from the command line of PCs, use the ipv6config command. Static IPv6 IPv6 addresses can also be assigned statically on all devices. We'll use the same topology for this section too. We'll carry out the following steps to configure IPv6 addresses statically: Begin by configuring a static IPv6 address on the router. R0(config)#interface fastethernet0/0 R0(config-if)#ipv6 enable R0(config-if)#ipv6 address 2000::1/64 R0(config-if)#no shutdown Go to the Desktop tab of PC, open the IP Configuration utility, and enter an IPv6 address with the same prefix. Now use the simple PDU tool to test the connectivity. Once both the methods work fine, you can have a look at the IPv6 neighbors table. This is similar to the ARP table of IPv4. R0#sh ipv6 neighbor IPv6 Address Age Link-layer Addr State Interface 2000::2 0 00E0.A39E.05C4 REACH Fa0/0 2000::3 0 0001.43B9.0268 REACH Fa0/0 Now that we have configured IPv6 addresses on a single network, let's configure them on more networks and enable routing between them. IPv6 static and dynamic routing Similar to IPv4, IPv6 too supports both static and dynamic routing. Configuration commands for its static routing are similar to IPv4. Static routing Modifying the same topology that we used previously, let's add a router, switch, and two PCs to create a separate network, as shown in the following screenshot: The first network will use addresses starting from 2000:1::/64 and the second network will use addresses starting from 2000:2::/64. The link between both the routers will have IP addresses 2001::10/64 and 2001::20/64. Here is a table describing the topology: Device Interface IP address R1 FastEthernet0/0 2000:1::1/64 FastEthernet0/1 2001::10/64 PC0 FastEthernet 2000:1::2/64 PC1 FastEthernet 2000:1::3/64 R2 FastEthernet0/0 2000:2::1/64 FastEthernet0/1 2001::20/64 PC2 FastEthernet 2000:2::2/64 PC3 FastEthernet 2000:2::3/64 After the necessary IP addresses and gateways have been assigned, open the CLI tab for the R1 router, and start configuring routing by following the given commands: R1(config)#ipv6 unicast-routing R1(config)#ipv6 route 2000:2::/64 2001::20 Next, open the CLI tab for R2 and configure routing on it. R2(config)#ipv6 unicast-routing R2(config)#ipv6 route 2000:1::/64 2001::10 Now use the simple PDU tool to test the connectivity. You may also use the tracert command on a PC to see the path a packet takes. PC>tracert 2000:2::3 Tracing route to 2000:2::3 over a maximum of 30 hops: 1 63 ms 63 ms 47 ms 2000:1::1 2 94 ms 78 ms 94 ms 2001::20 3 156 ms 109 ms 129 ms 2000:2::3 Trace complete. Dynamic routing Packet Tracer offers the same dynamic routing protocols for IPv6: RIPv6, EIGRP, and OSPF. We'll be configuring RIPv6 in this section. Note that RIPv6 does not represent RIP Version 6; it is RIP for IPv6 addresses. For this exercise, we'll use the topology shown in the following screenshot: The additional IP assignment details alone are shown in the following table: Device Interface IPv6 Address R2 FastEthernet1/0 2001:1::10/64 R3 FastEthernet0/0 2000:3::1/64 FastEthernet0/1 2001:1::20/64 PC2 FastEthernet 2000:3::2/64 We'll see how to configure RIP on one router and you can do the same on the others. R1(config)#interface FastEthernet0/0 R1(config-if)#ipv6 address 2000:1::1/64 R1(config-if)#ipv6 rip Net1 enable R1(config-if)#ipv6 enable R1(config-if)#interface FastEthernet0/1 R1(config-if)#ipv6 address 2001::10/64 R1(config-if)#ipv6 rip Net1 enable R1(config-if)#ipv6 enable Note that the ipv6 rip command is used to enable RIP on a particular interface. Entering ipv6 rip Net1 enable on the first interface begins the RIPv6 process. The Net1 string can be any name that can be used to name the RIP process. Once configured, use the usual diagnostic tools (ping to simple PDU) to check the connectivity. To view the RIP database, use the following command: R1#sh ipv6 rip database RIP process "Net1" local RIB 2000:2::/64, metric 2, installed FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec 2000:3::/64, metric 3, installed FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec 2001::/64, metric 2 FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec 2001:1::/64, metric 2, installed FastEthernet0/1/FE80::201:97FF:FE87:E5A9, expires in 173 sec RIP process "LINK" local RIB Trace the route of the packet to see the path it takes. PC>tracert 2000:3::2 Tracing route to 2000:3::2 over a maximum of 30 hops: 1 31 ms 32 ms 31 ms 2000:1::1 2 50 ms 50 ms 63 ms 2001::20 3 94 ms 94 ms 94 ms 2001:1::20 4 125 ms 109 ms 125 ms 2000:3::2 Trace complete. Summary In this article, we learned how to use IPv6 with Packet Tracer. We saw the limitation of the IPv4 addresses. We also learned how to assign IPv6 addresses and how to configure IPv6 static and dynamic routing. Resources for Article : How to edit the attributes in QGIS Troubleshooting OpenStack Compute problems Creating Identity and Resource Pools in Cisco Unified Computing System

0
0
61806

How to Configure Squid Proxy Server

ChatGPT for Data Governance

How to create sales analysis app in Qlik Sense using DAR method [Tutorial]

Using ChatGPT for Customer Service

Different types of NoSQL databases and when to use them

ChatGPT for Business Intelligence

Top 10 deep learning frameworks

Airflow Ops Best Practices: Observation and Monitoring

Data cleaning is the worst part of data analysis, say data scientists

How we are Thinking About Generative AI

Trending Topics

Getting started with Web Scraping using Python [Tutorial]

Fingerprint detection using OpenCV 3

Understanding Data Structures in Swift

Manipulating text data using Python Regular Expressions (regex)

Using IPv6 on Packet Tracer

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access